***** To join INSNA, visit http://www.sfu.ca/~insna/ *****
We did some work on this two years ago. A related paper can be found at:
----- Original Message -----
From: "Kennedy, Mark" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Thursday, March 25, 2004 6:31 PM
Subject: Re: Extracting SNA matrices from e-mail fields
> >I'm currently working on a research project employing social network
> >analysis on an e-mail archive. My problem is how to automatically
> >extract a relational matrix from the "from", "to" and "cc" fields of
> >EACH e-mail. Does anyone have an idea or suggestion (some software or short
> I had to solve a similar problem myself and found it necessary to write
> the software to do it. (I was extracting networks showing who is
> co-mentioned with whom in about 60,000 single-spaced pages of archived
> news coverage.)
> As other posters have suggested, the task is easier if you can put a
> decent number of e-mails together in a largeish file.
> Once you have the file, read it closely and develop a graph that
> captures the syntax of the fields you care about. The main thing is
> that you need to know when messages begin and end and what you want from
> them. Since the stuff you care about occurs after well-defined tokens
> that come at the beginning of lines, the parsing logic is pretty
> simple. A super-simple version would go something like this:
> 1. Get a line from your input file.
> 2. Until it starts with "From:", dump it and get another.
> 3. Extract everything to the end of line as the from data
> 4. Get a line from your input file.
> 5. Until it starts with "To:", dump it and get another.
> 6. Extract everything to the end of line as the to data
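The six steps above could be sketched roughly as follows in Python (my choice of language, not one named in the message). The function names and the sample addresses are made up for illustration, and the sketch assumes each header sits on a single line and that no body line happens to start with "From:" or "To:".

```python
import io

def _next_field(lines, token):
    """Discard lines until one starts with `token`; return the rest of that line."""
    for line in lines:
        if line.startswith(token):
            return line[len(token):].strip()
    return None  # input exhausted before the token was seen

def extract_from_to(source):
    """Yield (from_data, to_data) pairs by repeating steps 1-6 until input ends."""
    lines = iter(source)
    while True:
        from_data = _next_field(lines, "From:")   # steps 1-3
        if from_data is None:
            return
        to_data = _next_field(lines, "To:")       # steps 4-6
        if to_data is None:
            return
        yield from_data, to_data

# Hypothetical two-message archive, already concatenated into one file.
sample = io.StringIO(
    "From: alice@example.org\n"
    "Date: Thu, 25 Mar 2004 18:31:00\n"
    "To: bob@example.org, carol@example.org\n"
    "\n"
    "body text\n"
    "From: bob@example.org\n"
    "To: alice@example.org\n"
)
pairs = list(extract_from_to(sample))
```

Each pair still holds the raw field text, so a "To:" value with several recipients comes back as one string at this stage.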
> Of course, you'll want to tease apart multiple recipients, and you may
> want other fields as well. The more you may change what you want, the
> more it makes sense to specify what you are looking for by using a graph
> that describes the order of the fields you care about.
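For teasing apart multiple recipients and getting to the relational matrix the original question asks for, a minimal Python sketch might look like this. It relies on `email.utils.getaddresses` from the standard library; `build_matrix` and the sample messages are hypothetical, and counting one tie per sender-recipient pair per message is my assumption, not something the message specifies.

```python
from email.utils import getaddresses

def build_matrix(messages):
    """messages: iterable of (from_field, to_field) raw header strings.

    Returns (people, matrix), where matrix[i][j] counts the messages
    sent by people[i] to people[j].
    """
    edges = []
    for from_field, to_field in messages:
        # getaddresses splits comma-separated lists and strips display names.
        senders = [addr.lower() for _, addr in getaddresses([from_field]) if addr]
        recipients = [addr.lower() for _, addr in getaddresses([to_field]) if addr]
        for s in senders:
            for r in recipients:
                edges.append((s, r))
    people = sorted({p for edge in edges for p in edge})
    index = {p: i for i, p in enumerate(people)}
    matrix = [[0] * len(people) for _ in people]
    for s, r in edges:
        matrix[index[s]][index[r]] += 1
    return people, matrix

# Hypothetical input, e.g. the output of a From:/To: extraction pass.
messages = [
    ('"Alice A." <alice@example.org>', "bob@example.org, Carol <carol@example.org>"),
    ("bob@example.org", "alice@example.org"),
    ("alice@example.org", "bob@example.org"),
]
people, matrix = build_matrix(messages)
```

Rows are senders and columns are recipients, so the matrix is directed. Cc: data could be folded in by passing it alongside the To: field, or kept in a separate matrix if the distinction matters.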
> Say you wanted to allow for link decay and you wanted to capture notions
> of principals, stakeholders and confidantes, you'd also want the date,
> cc: and bcc: fields. We can write the graph as text by putting each
> token you care about at the beginning of a line and then putting the
> tokens that could follow on the same line. That way, your syntax graph
> could look like this:
> Date: To:
> To: CC: BCC: Reply-To: From:
> CC: BCC: Reply-To: From:
> BCC: Reply-To: From:
> Reply-To: From:
> From: To:
> (This is all linear text, so you don't have to worry about cycles.)
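Because the graph is written as plain text, loading it into a program takes only a few lines. Here is a hedged Python sketch; `parse_syntax_graph` is a made-up name, and lowercasing the tokens is my own choice so that "Reply-to:" and "Reply-To:" are treated as the same node.

```python
GRAPH_TEXT = """\
Date: To:
To: CC: BCC: Reply-To: From:
CC: BCC: Reply-To: From:
BCC: Reply-To: From:
Reply-To: From:
From: To:
"""

def parse_syntax_graph(text):
    """Map each leading token to the list of tokens allowed to follow it."""
    graph = {}
    for line in text.splitlines():
        tokens = line.lower().split()
        if tokens:  # skip blank lines
            graph[tokens[0]] = tokens[1:]
    return graph

graph = parse_syntax_graph(GRAPH_TEXT)
```

Keeping the graph in a text file rather than in the code means the set of fields can be changed by editing that file alone.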
> Just as above, if you only care about the field info that follows the
> token marking the data, you can just get lines and eat them if they
> don't start with tokens that match nodes in your syntax graph. Doing
> this makes it possible to change what you are looking for without
> rewriting your code.
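One way to read this in code is a scanner driven entirely by the graph: lines that do not start with a known token are eaten, and a known token is kept only if the graph allows it after the previously kept one. The Python sketch below is an interpretation, not Mark's implementation. In particular, treating "Date:" as the token that opens a new message is my assumption, as are the `scan` name and the sample text.

```python
# Syntax graph as a dict: token -> tokens that may follow it (all lowercased).
GRAPH = {
    "date:": ["to:"],
    "to:": ["cc:", "bcc:", "reply-to:", "from:"],
    "cc:": ["bcc:", "reply-to:", "from:"],
    "bcc:": ["reply-to:", "from:"],
    "reply-to:": ["from:"],
    "from:": ["to:"],
}

def scan(lines, graph, start="date:"):
    """Return one dict of captured fields per message."""
    records, current, prev = [], None, None
    for line in lines:
        name, sep, value = line.partition(":")
        token = name.lower() + ":"
        if not sep or token not in graph:
            continue  # eat lines that do not start with a node of the graph
        if token == start:
            current = {}  # assumed rule: the start token opens a new record
            records.append(current)
        elif prev is None or token not in graph[prev]:
            continue  # known token, but the graph does not allow it here
        current[token] = value.strip()
        prev = token
    return records

sample = """\
Date: Thu, 25 Mar 2004 18:31:00
To: bob@example.org
CC: carol@example.org
Subject: hello
From: alice@example.org

body text
Date: Fri, 26 Mar 2004 09:00:00
To: alice@example.org
From: bob@example.org
""".splitlines()

records = scan(sample, GRAPH)
```

Changing which fields are captured then means editing `GRAPH`, not `scan`. Note that the graph as given expects "To:" before "From:" within a message, so an archive with a different header order would need a different graph.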
> If you're not programming-savvy, CS undergrads who have taken compilers
> and done well can usually handle this kind of task without too much
> agony. Perl, Java, C++ and even Visual Basic are all reasonable
> languages for the job.
> If you are developing very many networks with techniques like this, I
> recommend you embed routines for computing the network measures you want
> into your program. Saves a lot of pointing and clicking in UCINET,
> Pajek, or whatever else. For that, I recommend NetStat+ if you are
> using C or C++ (that's what I did -- it works great) or JUNG if you are
> using Java.
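To illustrate what embedding the measures in the program can mean, here is a small Python sketch that computes out-degree, in-degree and density straight from a sender-by-recipient matrix. It uses neither NetStat+ nor JUNG; the `degree_summary` function and the sample matrix are hypothetical, and ties are dichotomized (any count above zero is one tie), which is my assumption.

```python
def degree_summary(people, matrix):
    """Return (out_degree, in_degree, density) for a directed count matrix."""
    n = len(people)
    out_degree = {
        p: sum(1 for j in range(n) if j != i and matrix[i][j] > 0)
        for i, p in enumerate(people)
    }
    in_degree = {
        p: sum(1 for i in range(n) if i != j and matrix[i][j] > 0)
        for j, p in enumerate(people)
    }
    ties = sum(out_degree.values())
    # Density: observed directed ties over the n*(n-1) possible ones.
    density = ties / (n * (n - 1)) if n > 1 else 0.0
    return out_degree, in_degree, density

# Hypothetical 3-person network: rows are senders, columns are recipients.
people = ["alice", "bob", "carol"]
matrix = [
    [0, 2, 1],
    [1, 0, 0],
    [0, 0, 0],
]
out_degree, in_degree, density = degree_summary(people, matrix)
```

Running this at the end of the extraction step avoids exporting each network and loading it into a separate tool just to get basic measures.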
> Hope that helps and good luck,
> Mark T. Kennedy, Ph.D.
> Department of Management & Organization
> Marshall School of Business | University of Southern California
> [log in to unmask]
> SOCNET is a service of INSNA, the professional association for social
> network researchers (http://www.sfu.ca/~insna/). To unsubscribe, send
> an email message to [log in to unmask] containing the line
> UNSUBSCRIBE SOCNET in the body of the message.