*****  To join INSNA, visit http://www.insna.org  *****

Dear Socnetters,

I am conducting a study that applies existing link prediction algorithms to a longitudinal network. My training network has 70000 edges, and its weight distribution is as follows:


Weight    Number of edges
1         50111
2         9419
3         3535
4         1880
5         1187
6         816
7         239
8         201
9         195
10        126
11        105
12        82
13        74
14        53
15        65
16        68
17        48
18        434
19        110
20        33
50        9
100       3
200       3



As you can see, most of the edges have weight 1, and these edges involve both high- and low-importance nodes. If I keep only the edges with weight greater than 1, I lose some
important but infrequent nodes that appear in the test network. If I keep all edges (weight 1 and greater than 1) and apply link prediction algorithms to the roughly 5000 nodes, the resulting predicted network is
highly dense (50%), with more than 2 million edges. By comparison, my test network is much smaller, with only 10k-20k edges, so the evaluation will be dominated by false positives.
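
A rough sketch of the arithmetic behind this concern, in Python. It assumes the weights and counts listed above pair up in the order given, and it uses the approximate figures quoted in this message (2 million predicted edges, at most 20k test edges). The variable names are purely illustrative.

```python
# Weight -> number of training edges, paired in the order listed above
# (assumption: the i-th weight corresponds to the i-th count).
weight_counts = {
    1: 50111, 2: 9419, 3: 3535, 4: 1880, 5: 1187, 6: 816, 7: 239,
    8: 201, 9: 195, 10: 126, 11: 105, 12: 82, 13: 74, 14: 53,
    15: 65, 16: 68, 17: 48, 18: 434, 19: 110, 20: 33,
    50: 9, 100: 3, 200: 3,
}

total_edges = sum(weight_counts.values())
share_weight_one = weight_counts[1] / total_edges

# Approximate figures quoted in the message.
predicted_edges = 2_000_000  # "more than 2 million" predicted edges
test_edges_max = 20_000      # upper end of the 10k-20k test network

# Even if every test edge were among the predictions,
# precision could not exceed this ratio.
precision_upper_bound = test_edges_max / predicted_edges

print(f"edges in table:      {total_edges}")
print(f"share with weight 1: {share_weight_one:.1%}")
print(f"best-case precision: {precision_upper_bound:.1%}")
```

Under these assumptions, roughly 73% of the listed edges have weight 1, and precision is capped at about 1% no matter how good the predictor is.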

Therefore, I am looking for suggestions and/or related publications on how to balance the sizes of the training and test networks. In particular, I am wondering what the best way is to select edges
when their weights follow such a skewed distribution.

Thanks and regards
Nazim

_____________________________________________________________________
SOCNET is a service of INSNA, the professional association for social
network researchers (http://www.insna.org). To unsubscribe, send
an email message to [log in to unmask] containing the line
UNSUBSCRIBE SOCNET in the body of the message.