***** To join INSNA, visit http://www.insna.org *****
I think work like that of Daryl Pregibon and colleagues is one obvious
direction that such large-scale analysis would take.
Of course, from what little I have seen of the published work, their use of
social network data is very limited. But they could certainly have added
slightly more sophisticated network measures than they were using four years
ago. Nonetheless, using basic summary statistics and measures like "is this
number receiving calls from numbers that, in the past, contacted numbers we
shut down for fraud?", they were able to analyze the massive data stream of
AT&T long-distance calls in near enough to real time to catch fraudulent
users within a few days. The AT&T long-distance data amounts to about 300
million edges per day (if I recall correctly).
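To make the heuristic concrete, here is a minimal sketch of that kind of guilt-by-association query over a stream of call records. The function name, the record format, and the toy data are my own illustration, not AT&T's actual method; it assumes records arrive in time order and keeps only two small sets of state, which is roughly what makes such a check feasible on a massive stream.

```python
# Sketch of the heuristic described above: flag a number if it
# receives calls from numbers that have previously called numbers
# already shut down for fraud. Hypothetical names and toy data.

def flag_suspects(call_records, shut_down):
    """call_records: iterable of (caller, callee) pairs in time order.
    shut_down: set of numbers already closed for fraud.
    Returns numbers one "hop of suspicion" away."""
    tainted = set()    # numbers that have called a shut-down number
    suspects = set()   # numbers receiving calls from tainted numbers
    for caller, callee in call_records:
        if callee in shut_down:
            tainted.add(caller)
        elif caller in tainted:
            suspects.add(callee)
    return suspects

calls = [("A", "FRAUD1"), ("A", "B"), ("C", "D")]
print(flag_suspects(calls, {"FRAUD1"}))  # {'B'}
```

A real system would of course carry this state across days rather than a single pass, but the point is that the per-record work is a couple of set lookups, not a full graph computation.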
Hancock is a C-based language, and I believe it stores some of the data at
the level of bits rather than bytes, thus saving some space. It also
lacks (or lacked) the GUI niceties, visualization ability, and nuanced
positional measures typical of the programs familiar to us.
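The bits-versus-bytes point is easy to illustrate. The sketch below is my own toy example, not Hancock's actual representation: storing one boolean flag per phone number in a packed bit array takes one eighth the space of the naive one-byte-per-flag layout, which matters when the universe of numbers runs into the hundreds of millions.

```python
# Toy illustration of bit-level vs byte-level storage of one
# boolean flag per phone number. My own sketch, not Hancock's
# actual data layout.
N = 1_000_000

flags_bytes = bytearray(N)        # one byte per number: N bytes
bitmap = bytearray((N + 7) // 8)  # one bit per number: N/8 bytes

def set_flag(bitmap, i):
    bitmap[i >> 3] |= 1 << (i & 7)   # set bit i

def get_flag(bitmap, i):
    return (bitmap[i >> 3] >> (i & 7)) & 1

set_flag(bitmap, 12345)
print(get_flag(bitmap, 12345), get_flag(bitmap, 12346))  # 1 0
print(len(flags_bytes) // len(bitmap))                   # 8
```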
Pregibon is at Google now, so I guess we won't be seeing any more
publications that would reveal where his work is going.
That shift points to what I feel is a very disturbing (ever more
disturbing?) development of recent years: the expansion of research
resources within "closed" research communities like the NSA, the military,
and now Google. This is coupled with the obvious drain of talent (and
resources) away from open research communities, such as those in academia
and at firms like AT&T, Yahoo, etc., where researchers are encouraged to
publish and share their findings with the academic community.
What are the likely implications of this shift from open to closed
research, coupled with a shift from small-scale to massive,
resource-intensive work?
Anyway, the Hancock work is interesting as a computer-science solution to
the problem of analyzing big data.
On 5/25/06, George Barnett <[log in to unmask]> wrote:
> I recently reviewed an NSF proposal to use the GRID for the analysis of
> 2-3 petabytes (1,056 terabytes) of data. So, the capacity is there for
> the storage and analysis of these quantities of data. As suggested in
> previous postings, software is probably the issue.
> George Barnett
SOCNET is a service of INSNA, the professional association for social
network researchers (http://www.insna.org). To unsubscribe, send
an email message to [log in to unmask] containing the line
UNSUBSCRIBE SOCNET in the body of the message.