***** To join INSNA, visit http://www.insna.org *****

Hi George,

Thank you for pointing out these publications. They are really interesting, and we will compare our results. While reading through them, I found the dataset you mentioned, which contains around 15 billion hyperlinks, but I do not understand why our dataset, with 128 billion hyperlinks, would be the smaller one. Could you also point out where I can get the dataset you used for your analyses, so that I can dig a little deeper into it as well?

Thanks a lot for your help,
Robert

On 18.11.2013 19:03, George Barnett wrote:
Robert,
You're wrong. Han Woo Park and I have published a series of papers with over 15 billion hyperlinks. Also, you might want to look at a paper by Barnett et al. in Social Network Analysis and Mining.

Park, H.W., Barnett, G.A., & Chung, C.J. (2011). Structural Changes in the Global Hyperlink Network 2003-2009. Global Networks, 11(4), 522-544.

Barnett, G.A., Chung, C.J., & Park, H.W. (2011). Uncovering transnational hyperlink patterns and web-mediated contents: A new approach based on cracking .com domain. Social Science Computer Review, 29(3), 369-384.

Barnett, G.A., & Park, H.W. (2012). Examining the International Internet Using Multiple Measures: New methods for measuring the communication base of globalized cyberspace. Quality and Quantity. DOI 10.1007/s11135-012-9787-z

Barnett, G.A., Ruiz, J., Hammond, J., & Xin, Z. (2013). An examination of the relationship between international telecommunication networks, terrorism and global news coverage. Social Network Analysis and Mining. DOI 10.1007/s13278-013-0117-9

George


George A. Barnett, Ph.D.
Professor & Chair
Department of Communication
University of California, Davis
Davis, CA 95616 USA

On Sun, Nov 17, 2013 at 11:18 PM, Robert Meusel <[log in to unmask]> wrote:
Hi all,

The Web Data Commons team is happy to announce the publication of a new large hyperlink graph.

The graph has been extracted from the Common Crawl 2012 web corpus [1] and covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, the graph is the largest hyperlink graph that is available to the public.

The graph can be downloaded in various formats from

http://webdatacommons.org/hyperlinkgraph

We provide initial statistics about the topology of the graph at

http://webdatacommons.org/hyperlinkgraph/topology.html

We hope that the graph will be useful for researchers who

·         develop search algorithms that rank results based on the hyperlinks between pages.

·         develop spam detection methods that identify networks of web pages published in order to trick search engines.

·         develop graph analysis algorithms and can use the hyperlink graph to test the scalability and performance of their tools.

·         work in Web Science and want to analyze the linking patterns within specific topical domains in order to identify the social mechanisms that govern these domains.

We want to thank the Common Crawl project for providing their great web crawl and thus enabling the creation of the WDC Hyperlink Graph.

The creation of the WDC Hyperlink Graph was supported by the EU research project PlanetData and by Amazon Web Services. We thank our sponsors a lot.

Best Regards,

Chris, Oliver & Robert

 [1] http://commoncrawl.org/

_____________________________________________________________________ SOCNET is a service of INSNA, the professional association for social network researchers (http://www.insna.org). To unsubscribe, send an email message to [log in to unmask] containing the line UNSUBSCRIBE SOCNET in the body of the message.


-- 
Robert Meusel
Chair of Information Systems V
Web-based Systems Group
Universität Mannheim
B6, 26, Room C1.04
D-68159 Mannheim
Phone: +49 621 181 2648
Mail: [log in to unmask]
Web: dws.informatik.uni-mannheim.de