*****  To join INSNA, visit http://www.insna.org  *****

A couple of suggestions:

- You can compare the measured clustering coefficient against a null model, e.g. a random graph with the same degree sequence (http://www.pnas.org/content/99/suppl.1/2566.full). If the measured value is much larger than the null model's, the clustering is not just a by-product of the degree distribution. You can also plot the local clustering coefficient as a function of degree.

- Assortativity: http://en.wikipedia.org/wiki/Assortativity 

- One way to visualize huge graphs is to build a coarse-grained representation: find communities and draw the relationships between them (the network of communities). For example, check out Fig. 3 of http://arxiv.org/pdf/0803.0476v2.pdf

- There are many community identification methods that scale quite well. For instance, the Louvain method is known to scale very well. Our link community method can also be applied to networks with millions of nodes if the network doesn't have very large hubs.
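
As a rough sketch of the first two suggestions (clustering against a degree-preserving null model, clustering as a function of degree, and degree assortativity), something like the following could be used on a small sample of the graph. It is plain Python for illustration only and will not scale to 100M edges as written. All function names and the toy edge list are made up for this example, and the null model here is a simple double-edge-swap shuffle, which is one common way to keep the degree sequence fixed, not necessarily the construction used in the paper linked above.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected simple graph as {node: set(neighbors)}; self-loops are dropped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def local_clustering(adj):
    """Fraction of each node's neighbor pairs that are themselves linked."""
    cc = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[node] = 0.0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        cc[node] = 2.0 * links / (k * (k - 1))
    return cc


def average_clustering(adj):
    cc = local_clustering(adj)
    return sum(cc.values()) / len(cc)


def clustering_by_degree(adj):
    """Mean local clustering coefficient for each degree value k."""
    buckets = defaultdict(list)
    for node, c in local_clustering(adj).items():
        buckets[len(adj[node])].append(c)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Null model: attempt n_swaps double-edge swaps; every node keeps its degree.

    Assumes `edges` is a simple graph (no duplicate edges, no self-loops).
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:          # pick one of the two possible rewirings
            c, d = d, c
        if len({a, b, c, d}) < 4:       # would create a self-loop
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue                    # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each edge."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:                  # each edge is counted in both directions
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    mean = sum(xs) / len(xs)            # same mean for xs and ys by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var > 0 else float("nan")


# Toy example: two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = build_adjacency(edges)
null_values = [
    average_clustering(build_adjacency(degree_preserving_shuffle(edges, 200, seed=s)))
    for s in range(20)
]
print("observed C:", average_clustering(adj))
print("null-model mean C:", sum(null_values) / len(null_values))
print("C(k):", clustering_by_degree(adj))
print("degree assortativity:", degree_assortativity(adj))
```

For the real data, the same comparison would presumably be run with a graph library or an approximate (sampled) clustering coefficient; the point of the sketch is only the shape of the comparison: observed value versus the spread of values from shuffled copies.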

Louvain method: https://sites.google.com/site/findcommunities/
Link community paper: http://www.nature.com/nature/journal/v466/n7307/abs/nature09182.html
C++ version of link clustering: https://github.com/bagrow/linkcomm/tree/master/cpp
A review on community detection: http://arxiv.org/abs/0906.0612
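
Once a community detection program has produced a node-to-community assignment, the "network of communities" mentioned above can be built in a single pass over the edge list. The sketch below assumes that assignment is already available as a dictionary (here called `membership`, a hypothetical name, with made-up labels); it does not run Louvain or link clustering itself. It also computes the modularity of the partition from the aggregated counts as a quick sanity check.

```python
from collections import Counter


def community_graph(edges, membership):
    """Collapse each community into one super-node.

    Returns (sizes, internal, between):
      sizes[c]          number of nodes in community c
      internal[c]       number of edges with both ends in c
      between[(c1, c2)] number of edges between c1 and c2 (c1 < c2)
    """
    sizes = Counter(membership.values())
    internal = Counter()
    between = Counter()
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            internal[cu] += 1
        else:
            between[(min(cu, cv), max(cu, cv))] += 1
    return sizes, internal, between


def modularity(internal, between):
    """Newman-Girvan modularity of the partition, from the aggregated counts."""
    m = sum(internal.values()) + sum(between.values())
    degree_sum = Counter()              # total degree of the nodes in each community
    for c, w in internal.items():
        degree_sum[c] += 2 * w
    for (c1, c2), w in between.items():
        degree_sum[c1] += w
        degree_sum[c2] += w
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy example: two triangles joined by one edge, one community per triangle.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
sizes, internal, between = community_graph(edges, membership)
print("community sizes:", dict(sizes))
print("edges inside communities:", dict(internal))
print("edges between communities:", dict(between))
print("modularity:", modularity(internal, between))
```

The resulting graph has one node per community, so it is usually small enough to draw directly, with node size taken from `sizes` and edge width from `between`.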


Best, 
yy

--
Yong-Yeol Ahn
Assistant Professor
School of Informatics and Computing
Indiana University Bloomington
Web: http://yongyeol.com

On Jun 6, 2012, at 5:17 PM, Hyokun Yun wrote:

> Dear list members,
> 
> 
> I would like to gather suggestions on: 
> 
> When confronted with large network data (1M to 10M nodes, 100M to 1B edges),
> what are your favorite first steps to understand the data and to check that the data makes sense?
> 
> 
> I agree that it may depend on the objective of the analysis, but I think there should be
> certain steps that are common to many projects irrespective of the goal.
> 
> 
> 
> Here is a list of things that come to mind:
> 
> 1) draw (in/out) degree distribution and check whether it is long-tailed / follows power-law
> 2) hop plot (distribution of the number of pairs as a function of geodesic distance), approximated by ANF method
> 3) number of (weak/strong) components and their sizes
> 4) apply scalable clustering algorithms (e.g. METIS/Graclus) and get profile information for each cluster
> 
> 
> The following are some other methods that I am not sure how to apply:
> 
> 1) calculate approximate clustering coefficient - but what should I do with this number? Compare with clustering coefficients of previously known networks?
> 2) visualize the graph - but I am not sure whether there is an algorithm that will scale to 1M nodes, and even if there is, how I would make sense of the result.
> 3) apply community detection algorithms - but is there an algorithm that would scale to 1M node graphs? How would it be different from graph clustering algorithms such as Graclus?
> 
> 
> Suggestions in any form, including pointers to papers/books, would be much appreciated!
> 
> 
> 
> Thanks,
> Hyokun Yun
> 
> Ph.D Student
> Department of Statistics
> Purdue University
> _____________________________________________________________________ SOCNET is a service of INSNA, the professional association for social network researchers (http://www.insna.org). To unsubscribe, send an email message to [log in to unmask] containing the line UNSUBSCRIBE SOCNET in the body of the message.
