*****  To join INSNA, visit http://www.sfu.ca/~insna/  *****

We did some work on this two years ago. A related paper can be found at:
http://www-personal.si.umich.edu/~junzh/papers.html.

Jun

----- Original Message -----
From: "Kennedy, Mark" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Thursday, March 25, 2004 6:31 PM
Subject: Re: Extracting SNA matrices from e-mail fields


> >I'm currently working on a research project employing social network
> >analysis on an e-mail archive. My problem is how to automatically
> >extract a relational matrix from the "from", "to" and "cc" fields of
> >EACH e-mail. Does anybody have an idea or suggestion (some software or
> >shortcut)?
>
> I had to solve a similar problem myself and found it necessary to write
> the software to do it.  (I was extracting networks showing who is
> co-mentioned with whom in about 60,000 single-spaced pages of archived
> news coverage.)
>
> As other posters have suggested, the task is easier if you can put a
> decent number of e-mails together in a largeish file.
>
> Once you have the file, read it closely and develop a graph that
> captures the syntax of the fields you care about.  The main thing is
> that you need to know when messages begin and end and what you want from
> them.  Since the stuff you care about occurs after well-defined tokens
> that come at the beginning of lines, the parsing logic is pretty
> simple.  A super-simple version would go something like this:
>
> 1.  Get a line from your input file.
> 2.  Until it starts with "From:", dump it and get another.
> 3.  Extract everything to the end of the line as the from data.
> 4.  Get a line from your input file.
> 5.  Until it starts with "To:", dump it and get another.
> 6.  Extract everything to the end of the line as the to data.
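
The six steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the software described in the message: the function names are made up for the example, and it assumes each header sits on a single line that starts with exactly "From:" or "To:" (no folded headers, no case variation).

```python
def scan_for(lines, token):
    """Dump lines until one starts with `token`; return the rest of that line."""
    for line in lines:
        if line.startswith(token):
            return line[len(token):].strip()
    return None  # input ran out before the token was seen


def extract_from_to(lines):
    """Return a list of (from_data, to_data) pairs, one per message."""
    it = iter(lines)  # a single shared iterator, so each scan resumes where the last stopped
    pairs = []
    while True:
        from_data = scan_for(it, "From:")   # steps 1-3
        if from_data is None:
            break
        to_data = scan_for(it, "To:")       # steps 4-6
        if to_data is None:
            break
        pairs.append((from_data, to_data))
    return pairs


sample = [
    "From: a@x.org",
    "Subject: hi",
    "To: b@x.org, c@x.org",
    "some body text",
    "From: b@x.org",
    "To: a@x.org",
]
pairs = extract_from_to(sample)
```

For a file on disk, `extract_from_to(open(path))` would work the same way, since a file object yields its lines one at a time.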
>
> Of course, you'll want to tease apart multiple recipients, and you may
> want other fields as well.  The more you may change what you want, the
> more it makes sense to specify what you are looking for by using a graph
> that describes the order of the fields you care about.
>
> Say you wanted to allow for link decay and you wanted to capture notions
> of principals, stakeholders and confidantes, you'd also want the date,
> cc: and bcc: fields.  We can write the graph as text by putting each
> token you care about at the beginning of a line and then putting the
> tokens that could follow on the same line.  That way, your syntax graph
> could look like this:
>
> Date: To:
> To: CC: BCC: Reply-To: From:
> CC: BCC: Reply-To: From:
> BCC: Reply-To: From:
> Reply-To: From:
> From: To:
>
> (This is all linear text, so you don't have to worry about cycles.)
>
> Just as above, if you only care about the field info that follows the
> token marking the data, you can just get lines and eat them if they
> don't start with tokens that match nodes in your syntax graph.  Doing
> this makes it possible to change what you are looking for without
> rewriting your code.
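
A graph-driven parser along these lines might look like the sketch below. It reads the syntax graph from text (first token on a line is a node, the remaining tokens are what may follow it) and eats every line that does not start with a node. Several details are assumptions rather than part of the description above: tokens are matched case-insensitively, "Date:" is treated as the start of a new message, a token that is not allowed after the previous one is ignored, and only the first value seen for a field is kept.

```python
GRAPH_TEXT = """\
Date: To:
To: CC: BCC: Reply-To: From:
CC: BCC: Reply-To: From:
BCC: Reply-To: From:
Reply-To: From:
From: To:
"""


def load_graph(text):
    """Map each lower-cased token to the set of tokens allowed to follow it."""
    graph = {}
    for line in text.splitlines():
        tokens = line.lower().split()
        if tokens:
            graph[tokens[0]] = set(tokens[1:])
    return graph


def parse_messages(lines, graph, start="date:"):
    """Return one dict per message, keyed by lower-cased token."""
    records, record, prev = [], None, None
    for line in lines:
        name, sep, value = line.partition(":")
        token = name.lower() + ":"
        if not sep or token not in graph:
            continue  # eat lines that do not start with a node of the graph
        if token == start:
            record = {}  # the start token opens a new message
            records.append(record)
        elif record is None or token not in graph[prev]:
            continue  # token is out of order for this graph, so ignore it
        record.setdefault(token, value.strip())
        prev = token
    return records


sample = [
    "Date: Thu, 25 Mar 2004",
    "To: a@x.org",
    "CC: b@x.org",
    "Subject: hi",
    "From: c@x.org",
    "",
    "hello",
    "Date: Fri, 26 Mar 2004",
    "To: c@x.org",
    "From: a@x.org",
]
records = parse_messages(sample, load_graph(GRAPH_TEXT))
```

Changing which fields are captured then means editing `GRAPH_TEXT` (or a file holding it), not the parsing code.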
>
> If you're not programming-savvy, CS undergrads who have taken compilers
> and done well can usually handle this kind of task without too much
> agony.  Perl, Java, C++ and even Visual Basic are all reasonable
> languages for the job.
>
> If you are developing very many networks with techniques like this, I
> recommend you embed routines for computing the network measures you want
> into your program.  Saves a lot of pointing and clicking in UCINet,
> Pajek, or whatever else.  For that, I recommend NetStat+ if you are
> using C or C++ (that's what I did -- it works great) or JUNG if you are
> using Java.
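
To illustrate what embedding the measures in the program can mean, here is a small plain-Python stand-in that computes degree and density straight from a matrix. It does not use or imitate the NetStat+ or JUNG APIs; the function names are hypothetical, and the matrix is assumed to be square, directed, with a nonzero cell meaning a tie from the row actor to the column actor.

```python
def out_degree(matrix):
    """Number of distinct actors each row actor sends to."""
    return [sum(1 for weight in row if weight) for row in matrix]


def in_degree(matrix):
    """Number of distinct actors each column actor receives from."""
    size = len(matrix)
    return [sum(1 for row in matrix if row[j]) for j in range(size)]


def density(matrix):
    """Share of possible directed ties (self-ties excluded) that are present."""
    size = len(matrix)
    if size < 2:
        return 0.0
    ties = sum(
        1
        for i in range(size)
        for j in range(size)
        if i != j and matrix[i][j]
    )
    return ties / (size * (size - 1))


example = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]
measures = (out_degree(example), in_degree(example), density(example))
```

Running such functions inside the same loop that builds each matrix is what saves the pointing and clicking when there are many networks to process.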
>
> Hope that helps and good luck,
> -mk.
>
> Mark T. Kennedy, Ph.D.
> Department of Management & Organization
> Marshall School of Business | University of Southern California
> [log in to unmask]
> 213.821.5668
>
> _____________________________________________________________________
> SOCNET is a service of INSNA, the professional association for social
> network researchers (http://www.sfu.ca/~insna/). To unsubscribe, send
> an email message to [log in to unmask] containing the line
> UNSUBSCRIBE SOCNET in the body of the message.
>
>
