***** To join INSNA, visit http://www.insna.org *****
Although the corpora you're looking at are different, you might try looking at some of the scholarship emerging from the digital humanities.
There are some "in-development" software tools and processes in this area that do exactly what you seem to be describing -- coding for words and patterns of words in large blocks of text to facilitate the identification of patterns.
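For concreteness, here is a minimal sketch in Python of the projection described in the quoted message below: binary two-mode document-by-word data projected to weighted one-mode networks. The toy documents and the function names (word_sets, project_words, project_docs) are made up for illustration, and splitting on whitespace is only an assumption about tokenization, so treat this as one possible reading rather than the actual workflow.

```python
from collections import Counter
from itertools import combinations

# Toy corpus: hypothetical short documents, invented for illustration.
docs = [
    "housing support needed",
    "need housing and food support",
    "food bank hours",
]

def word_sets(documents):
    """Binary two-mode data: each document becomes the set of words it contains."""
    return [set(d.lower().split()) for d in documents]

def project_words(doc_words):
    """Weighted one-mode word network: weight = number of documents two words share."""
    weights = Counter()
    for words in doc_words:
        # Sorting gives each unordered word pair a single canonical key.
        for a, b in combinations(sorted(words), 2):
            weights[(a, b)] += 1
    return weights

def project_docs(doc_words):
    """Weighted one-mode document network: weight = number of words two documents share."""
    weights = Counter()
    for (i, wi), (j, wj) in combinations(enumerate(doc_words), 2):
        shared = len(wi & wj)
        if shared:
            weights[(i, j)] = shared
    return weights

doc_words = word_sets(docs)
word_net = project_words(doc_words)
doc_net = project_docs(doc_words)
```

The document projection compares every pair of documents, which may be slow at 10K - 40K documents; a sparse matrix product would be one alternative, though that is a design choice and not something implied by the original question.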
On Jun 29, 2012, at 6:17 AM, Nate Doogan wrote:
> Hi all.
> I've got several large corpuses (10K - 40K documents each) of small
> documents (2 - 30 words). With these I can develop networks of words
> given membership to documents and vice versa (binary two-mode nets
> projected to weighted one-mode nets). My present goal is to codify
> documents and use the codes in further analysis. Are there any
> literature recommendations that might help me think through the best
> way to go about codifying documents (or words) in this way?
> Nathan Doogan
> Doctoral Candidate -- Social Work
> The Ohio State University
> SOCNET is a service of INSNA, the professional association for social
> network researchers (http://www.insna.org). To unsubscribe, send
> an email message to [log in to unmask] containing the line
> UNSUBSCRIBE SOCNET in the body of the message.
School of Information
University of Texas at Austin