***** To join INSNA, visit http://www.insna.org *****
Dear Socnetters,

I am conducting a study that applies available link prediction algorithms to a longitudinal network. My training network has 70,000 edges, and its edge-weight distribution is as follows:


Weight   Number of edges
     1             50111
     2              9419
     3              3535
     4              1880
     5              1187
     6               816
     7               239
     8               201
     9               195
    10               126
    11               105
    12                82
    13                74
    14                53
    15                65
    16                68
    17                48
    18               434
    19               110
    20                33
    50                 9
   100                 3
   200                 3
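
To make the trade-off concrete, here is a minimal Python sketch that tallies how many training edges survive each minimum-weight cutoff. It uses only the weight and count figures listed above. Two assumptions are mine: each listed weight (including 50, 100 and 200) is treated as an exact value rather than a bin, and the helper name edges_kept is purely illustrative. Note that the listed counts sum to 68,796, slightly under the 70,000 quoted.

```python
# Edge counts per weight, copied from the figures listed above.
# Assumption: 50, 100 and 200 are treated as exact weights, not bins.
weights = list(range(1, 21)) + [50, 100, 200]
counts = [50111, 9419, 3535, 1880, 1187, 816, 239, 201, 195, 126, 105, 82,
          74, 53, 65, 68, 48, 434, 110, 33, 9, 3, 3]


def edges_kept(min_weight):
    """Number of listed edges whose weight is at least min_weight."""
    return sum(c for w, c in zip(weights, counts) if w >= min_weight)


total = edges_kept(1)
for cutoff in (1, 2, 3, 5, 10):
    kept = edges_kept(cutoff)
    print(f"weight >= {cutoff:>2}: {kept:>6} edges ({kept / total:.1%} of listed)")
```

Dropping only the weight-1 edges already removes roughly three quarters of the listed edges, which is why a simple cutoff loses so many infrequent nodes.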


You can observe that most of the edges have weight 1, and these edges involve nodes of both high and low importance. On the other hand, if I keep only the edges with weight greater than 1, I lose some important but infrequent nodes that appear in the test network. If I keep all edges (weight 1 and greater than 1) and then apply link prediction algorithms to around 5000 nodes, the resulting predicted network is highly dense (50%), with more than 2 million edges. By comparison, my test network is much smaller, with only 10k-20k edges. In evaluation, this mismatch will hurt performance by producing a large number of false positives.
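
As a rough illustration of the false-positive problem, the Python sketch below computes the best precision achievable when every predicted edge is counted as a positive. It uses only the figures quoted above (about 2 million predicted edges, 10k-20k test edges). The function name max_precision is hypothetical, and the bound assumes the most favourable case, where every test edge is among the predictions.

```python
def max_precision(num_predicted, num_test_edges):
    """Upper bound on precision: at best, every test edge is a true positive."""
    best_true_positives = min(num_predicted, num_test_edges)
    return best_true_positives / num_predicted


predicted = 2_000_000  # "more than 2 million edges" in the predicted network
for test_edges in (10_000, 20_000):  # the quoted 10k-20k test network size
    bound = max_precision(predicted, test_edges)
    print(f"{test_edges} test edges: precision <= {bound:.2%}")
```

Even in this best case, at least 99% of the predicted edges would be false positives.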

Therefore, I am looking for suggestions and/or any related publications on this problem, so that I can balance the sizes of the training and test networks. I am wondering what the best way is to select edges when their weights follow such a skewed distribution.

Thanks and regards
Nazim


_____________________________________________________________________ SOCNET is a service of INSNA, the professional association for social network researchers (http://www.insna.org). To unsubscribe, send an email message to [log in to unmask] containing the line UNSUBSCRIBE SOCNET in the body of the message.