***** To join INSNA, visit http://www.insna.org *****
a dense matrix for an undirected network has n(n-1)/2 distinct cells, which
for a 1M node network is about 5 x 10^11 cells. at one byte per cell that
is 500+GB of data, plus overhead. probably a couple of terabytes of data,
in the end. expensive!
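the arithmetic can be checked with a short sketch. the one-byte-per-cell figure is an assumption made here to reproduce the 500GB estimate; packing each presence/absence cell into a single bit would cut it by a factor of eight, which is still far too much to hold in memory on a desktop.

```python
def dense_cells(n: int) -> int:
    """Distinct off-diagonal cells in an undirected n-node adjacency matrix."""
    return n * (n - 1) // 2

n = 1_000_000
cells = dense_cells(n)

# assumption: one byte per cell; a bit-packed matrix needs 1/8 of that
gb_one_byte = cells / 1e9
gb_one_bit = cells / 8 / 1e9

print(f"{cells:,} cells")
print(f"{gb_one_byte:,.1f} GB at one byte per cell")
print(f"{gb_one_bit:,.1f} GB at one bit per cell")
```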
storing the raw matrix isn't typically done with super-huge networks. it
is horribly wasteful when networks are often actually quite sparse... you
want to store the sparse nets.
look for things related to the Harwell-Boeing format. pretty common.
or just store lists of nodes and edges, and query them out of your data
store as you need them. think about what your research questions are, and
store the data in ways that you can query to answer those questions.
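as a rough illustration of the nodes-and-edges approach, here is a minimal sketch using SQLite from the python standard library. the table names, column names, and toy data are all made up for the example, and storing each tie once with src < dst is just one possible convention. only the ties that exist are stored, so the size grows with the number of edges rather than with n squared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real data
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, has_attr INTEGER NOT NULL);
    CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL,
                        PRIMARY KEY (src, dst));
    CREATE INDEX edges_by_dst ON edges (dst);
""")

# toy data, hypothetical: node 3 is the only one with the attribute
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(1, 0), (2, 0), (3, 1), (4, 0)])
# each undirected tie is stored once, with src < dst
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (2, 3), (3, 4)])

def neighbors(conn, node):
    """Return the sorted ids of all nodes tied to `node`."""
    rows = conn.execute(
        "SELECT dst FROM edges WHERE src = ? "
        "UNION SELECT src FROM edges WHERE dst = ?",
        (node, node))
    return sorted(r[0] for r in rows)

def neighbors_with_attr(conn, node):
    """Return the neighbors of `node` that have the attribute."""
    query = "SELECT has_attr FROM nodes WHERE id = ?"
    return [n for n in neighbors(conn, node)
            if conn.execute(query, (n,)).fetchone()[0]]

print(neighbors(conn, 2))
print(neighbors_with_attr(conn, 2))
```

the index on dst matters because an undirected lookup has to search both columns; without it, half of every neighbor query would scan the whole edge table.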
when people start talking about million-node networks, we usually start
talking about sampling and modeling. you CAN NOT process a network of
that size with available hardware. it just isn't going to happen. even
very small (this is always relative, of course), sampled networks will
chew up all of the resources available on a high-end desktop machine.
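one common way to sample is a snowball sample: start from a few seed nodes and follow ties outward for a fixed number of steps. the sketch below is only an illustration of that idea, not necessarily the sampling design you would want for a given question; the function name, the adjacency-dict layout, and the toy network are assumptions.

```python
from collections import deque

def snowball_sample(adj, seeds, max_steps):
    """Return {node: steps from nearest seed} for nodes within max_steps ties."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_steps:
            continue  # do not expand past the sampling radius
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

# toy network, hypothetical: a chain 1-2-3-4-5
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
sample = snowball_sample(adj, seeds=[1], max_steps=2)
print(sorted(sample))
```

in a real setting the neighbor lookup would come from the data store rather than an in-memory dict, so that only the sampled part of the network is ever loaded.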
sorry if this seems a little low-level. i'd be happy to talk off-list
about things that we've tried and that we've found helpful or not --
particularly with regard to very large social networks.
--elijah
On Tue, 12 Jul 2005, Corey Phelps wrote:
> Date: Tue, 12 Jul 2005 17:45:40 -0700
> From: Corey Phelps <[log in to unmask]>
> To: [log in to unmask]
> Subject: 2 questions: data file size & construction of data file
>
> A colleague of mine has asked for help, which I hope the members of this
> list can provide. There are 2 questions:
>
> 1) He has data on the relationships (presence/absence) among nearly 1
> million individuals. He would like to store this in a single, flat file
> as an adjacency matrix. Furthermore, he needs to be able to calculate
> network measures on the individuals in this network (e.g., betweenness
> centrality). I have never worked with a data set this large. Is this
> possible? If so, what file format would work and what SNA program or
> programming language should he use?
>
> 2) In addition to the relational data, he also has data on an attribute
> of each individual (coded as dichotomous: present/absent). He would like
> to be able to combine these two types of data in order to calculate the
> path length between a focal individual and an individual who possesses
> the attribute. For example, if person A is connected to person B (who
> has the attribute), the path length would be 1. If person A is connected
> to person B (who does NOT have the attribute), who is connected to
> person C (who has the attribute), then the path length would be 2. He
> would like to use the ability to calculate such path lengths to
> calculate a type of Information Centrality (Stephenson & Zelen, 1989)
> for each actor in the network. This measure would only consider the path
> lengths between ego and those alters who possess the attribute. If you
> have recommendations on how to do any of these steps and/or recommended
> references, please let me know.
>
> Thanks in advance.
>
> Corey Phelps, PhD
> Asst. Professor, Management & Organization
> University of Washington Business School
> Box 353200
> Seattle, WA 98195
> (206) 543-6579
> [log in to unmask]
>
> _____________________________________________________________________
> SOCNET is a service of INSNA, the professional association for social
> network researchers (http://www.insna.org). To unsubscribe, send
> an email message to [log in to unmask] containing the line
> UNSUBSCRIBE SOCNET in the body of the message.
>