***** To join INSNA, visit http://www.insna.org *****
Robert,

Thank you very much for the link.

For those of you who can't wait to see the network parameters, I collected some descriptive statistics.
The network has 838,105 nodes and 1,208,613 edges. There are 11,816 connected components; the largest one (the giant connected component, or GCC) has 730,601 nodes and 1,104,306 edges. The component sizes follow a roughly power-law distribution with an exponent of -0.026.
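For anyone who wants to reproduce component counts like these, here is a minimal, stdlib-only sketch. It assumes the edges and nodes have already been loaded as Python pairs and IDs. It is not necessarily how the figures above were computed. Union-find (disjoint sets) is used instead of recursive graph traversal so that it copes with a million-edge list. Isolated nodes that appear in no edge must be passed separately through the `nodes` argument, or they will not be counted as components. The function names are hypothetical.

```python
from collections import Counter


def connected_components(edges, nodes=()):
    """Group nodes into connected components with union-find (disjoint sets).

    edges: iterable of (u, v) pairs; nodes: optional extra nodes, so that
    isolated nodes (which appear in no edge) still count as components.
    Returns a dict mapping each component's root node to its member list.
    """
    parent, size = {}, {}

    def add(x):
        if x not in parent:
            parent[x] = x
            size[x] = 1

    def find(x):
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x in nodes:
        add(x)
    for u, v in edges:
        add(u)
        add(v)
        ru, rv = find(u), find(v)
        if ru != rv:
            # Union by size: attach the smaller tree under the larger one.
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]

    components = {}
    for x in parent:
        components.setdefault(find(x), []).append(x)
    return components


def component_summary(edges, nodes=()):
    """Return (number of components, GCC node count, GCC edge count, size histogram)."""
    edges = list(edges)
    comps = connected_components(edges, nodes)
    gcc = set(max(comps.values(), key=len))
    # Both endpoints of an edge share a component, so checking one endpoint suffices.
    gcc_edges = sum(1 for u, _ in edges if u in gcc)
    size_hist = Counter(len(members) for members in comps.values())
    return len(comps), len(gcc), gcc_edges, size_hist
```

For example, `component_summary([(1, 2), (2, 3), (4, 5)], nodes=[6])` reports 3 components, with a largest component of 3 nodes and 2 edges. The returned histogram maps component size to count, which is what one would plot on log-log axes to check the power-law shape.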

The GCC has an excellent community structure: 2009 communities, with modularity m ≈ 0.92. The largest community consists of 106,184 nodes.
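The post does not say which algorithm produced the 2009 communities (Louvain is a common choice for graphs this size, but that is only a guess). Whatever the method, the modularity of the resulting partition can be checked independently. Below is a minimal sketch of the standard Newman-Girvan modularity for an undirected, unweighted graph, Q = sum over communities c of [L_c/m - (d_c/2m)^2]. Here L_c is the number of edges inside c, d_c is the total degree of c, and m is the number of edges. The function name and the input format (an edge list plus a node-to-label dict) are assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = 0
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of nodes in community c
    for u, v in edges:
        m += 1
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    if m == 0:
        return 0.0
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

As a sanity check, two triangles joined by a single bridge edge, split into the two triangles, give Q = 5/14 ≈ 0.357. Putting every node in one community always gives Q = 0. A value of 0.92 therefore indicates very few edges running between communities.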

One of the communities contains references to Sergey Roldugin, a Russian cellist and close friend of the Russian president. The SVG file showing the structure of that community can be found at https://github.com/dzinoviev/PanamaPapers/blob/master/cc32.svg. (Right-click the "Raw" button and select "Save link as...". I believe most modern web browsers can display SVG files directly.) Node sizes represent degrees, and colors represent second-level communities. Numeric node names represent "THE BEARER" (apparently an anonymous official).
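A figure like this one is built by cutting one community out of the full graph and scaling each node by its degree. Here is a minimal sketch of those two steps, under stated assumptions. It assumes community labels from a prior detection step. It also assumes degree is measured inside the community subgraph, which the post does not specify; degree in the full graph is another option. The helper names and the `min_size`/`max_size` drawing parameters are hypothetical. Layout, the second-level coloring, and SVG export are left to whatever drawing tool you use.

```python
from collections import Counter


def community_subgraph(edges, community, label):
    """Return (nodes, edges) of the subgraph induced by one community label."""
    nodes = {n for n, c in community.items() if c == label}
    sub_edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, sub_edges


def degree_sizes(nodes, sub_edges, min_size=2.0, max_size=20.0):
    """Map each node to a drawing size that grows linearly with its degree."""
    if not nodes:
        return {}
    deg = Counter()
    for u, v in sub_edges:
        deg[u] += 1
        deg[v] += 1
    lo = min(deg[n] for n in nodes)
    hi = max(deg[n] for n in nodes)
    span = hi - lo or 1  # avoid division by zero when all degrees are equal
    return {n: min_size + (deg[n] - lo) / span * (max_size - min_size) for n in nodes}
```

Linear scaling keeps the code simple. With heavy-tailed degree distributions like this one, a square-root or logarithmic scale often keeps hubs from swamping the picture.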

If anyone is interested, I can upload Python files for reading the dataset into the repository.
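Until those files are posted, here is a minimal sketch of streaming an edge list out of one of the CSV files. Because the edge list has more rows than an Excel worksheet can hold, it reads one row at a time with the standard `csv` module rather than loading the whole file at once. The column names `node_1` and `node_2` are assumptions about the file header. Check the actual header and pass the right names. This is not the author's reader.

```python
import csv

# Largest number of rows an Excel worksheet can hold (2**20).
EXCEL_MAX_ROWS = 1_048_576


def read_edges(f, src_col="node_1", dst_col="node_2"):
    """Yield (source, target) ID pairs from an open CSV edge-list file.

    The default column names are assumptions; pass the ones in your file's header.
    Rows are streamed one at a time, so memory use does not grow with file size.
    """
    for row in csv.DictReader(f):
        src, dst = row[src_col], row[dst_col]
        if src and dst:  # skip rows with a missing endpoint
            yield src, dst
```

Typical use would be `with open(path, newline="", encoding="utf-8") as f: edges = list(read_edges(f))`, where `path` points at whichever edge-list file you downloaded. With 1,208,613 edges, `len(edges) > EXCEL_MAX_ROWS` holds, which is why a spreadsheet will not do.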

On Mon, May 9, 2016 at 3:50 PM, Robert Marriott <[log in to unmask]> wrote:
Valdis Krebs mentioned the Panama Papers a couple of weeks ago. Well, the full dataset of the Panama Papers, as well as a related offshore entity investigation, has just been released by the ICIJ. I'm not affiliated with the releasing organizations in any way, but there's likely to be a feeding frenzy over this, so I wanted to make sure the listserv was notified as soon as the data became available.

The dataset is already online in a graphical network format for casual use by the public and the media. More interesting for our purposes, the whole enchilada is available in multiple CSV formats.


A word of warning: it is truly enormous, and the edge list file exceeds Excel's row limit. Also bear in mind that the dataset is subject to some form of GPL-style open data license that I'm not familiar with, so I recommend checking those terms before you rev up your preferred analysis tool.

In case folks haven't been following the news, the Panama Papers are a massive leak of offshore corporate entity information (~320,000 entities) from the Panamanian law firm Mossack Fonseca. Offshore entities of the sort included in the leak can have legitimate or legal purposes, but they are primarily seen as a means of tax avoidance or evasion, as well as a vehicle for organized criminal activity. This is particularly true of firms like the one targeted in the leak. Until today, information from the leak had been released only selectively, but individual disclosures had already brought down the Icelandic PM. This seems like a hot potato, but the sheer size of the leak is making it difficult for the responsible organizations, or the press, to process.

Regards,

Robert Marriott

Penn State University

_____________________________________________________________________ SOCNET is a service of INSNA, the professional association for social network researchers (http://www.insna.org). To unsubscribe, send an email message to [log in to unmask] containing the line UNSUBSCRIBE SOCNET in the body of the message.



--
Dmitry Zinoviev
Professor of Computer Science
Suffolk University, Boston, MA 02114