*****  To join INSNA, visit http://www.insna.org  *****

I spent some time on the Panama Papers metadata and the first thing that
comes to mind for making the problem more manageable is the elimination of
things which no longer exist.

Unique_ID;Entity_ID1;Entity_ID2;description_;date_from;date_to;direction;chinesePos;linkType

There are 562016 entities in edges_1DNW.csv but only 230565 have a date
attribute. I spent some time pawing around with awk and it looks like only
34853 entities have both a start and end date.
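
For anyone who would rather not use awk, the same kind of tally can be sketched in Python. This is only an illustration: it assumes the file is semicolon-delimited with the header row shown above, and that an empty field means "no date". The sample rows and the function name count_dated_rows are made up for the example and are not taken from the real data.

```python
import csv
import io


def count_dated_rows(handle):
    """Count rows that have a start date, an end date, and both."""
    reader = csv.DictReader(handle, delimiter=";")
    counts = {"rows": 0, "date_from": 0, "date_to": 0, "both": 0}
    for row in reader:
        # Assumption: a missing date is stored as an empty (or blank) field.
        has_from = bool((row.get("date_from") or "").strip())
        has_to = bool((row.get("date_to") or "").strip())
        counts["rows"] += 1
        counts["date_from"] += has_from
        counts["date_to"] += has_to
        counts["both"] += has_from and has_to
    return counts


# Made-up sample rows using the header quoted above (not real data).
SAMPLE = """\
Unique_ID;Entity_ID1;Entity_ID2;description_;date_from;date_to;direction;chinesePos;linkType
1;100;200;officer of;2001-01-01;2009-03-05;;;officer
2;101;200;shareholder of;2004-02-12;;;;officer
3;102;201;registered address;;;;;address
"""

counts = count_dated_rows(io.StringIO(SAMPLE))
print(counts)
```

To run it against the real file, pass an open file handle instead of the sample, for example open("edges_1DNW.csv", newline="", encoding="utf-8"); the encoding is a guess.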

We have to keep in mind that this is metadata created from an overwhelmingly
large document cache. The lack of end dates gives me the feeling that the
start dates are explicit on most documents and were perhaps extracted
automatically. The end dates may be inferred and/or manual entries that exist
only for the documents that were of great interest.
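
Because end dates are so sparse, "eliminating things which no longer exist" can only safely drop rows with an explicit end date in the past. A rough sketch of that filter follows. The date format is an assumption (ISO dates are used here and the real file may well differ, so it is a parameter), and the sample rows and the name still_active are hypothetical.

```python
import csv
import io
from datetime import date, datetime


def still_active(rows, cutoff, date_format="%Y-%m-%d"):
    """Yield rows that are not known to have ended before `cutoff`."""
    for row in rows:
        raw_end = (row.get("date_to") or "").strip()
        if not raw_end:
            yield row  # no end date recorded: keep the row
            continue
        try:
            ended = datetime.strptime(raw_end, date_format).date()
        except ValueError:
            yield row  # unparseable date: keep rather than silently drop
            continue
        if ended >= cutoff:
            yield row


# Made-up sample rows (not real data); date format is assumed.
SAMPLE = """\
Unique_ID;Entity_ID1;Entity_ID2;description_;date_from;date_to;direction;chinesePos;linkType
1;100;200;officer of;2001-01-01;2009-03-05;;;officer
2;101;200;shareholder of;2004-02-12;;;;officer
3;102;201;officer of;2010-06-30;2016-12-31;;;officer
"""

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
kept = [row["Unique_ID"] for row in still_active(reader, date(2016, 5, 14))]
print(kept)
```

Rows without an end date are kept on purpose: a missing end date says nothing about whether the relationship still exists.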


I'm new here and definitely not academic. Part of what I do for a living is
digging through intrusion data to see what can be discerned from it. Having
this metadata is great, but it's just a partial outline of what's in those
documents. Processing the full text might reveal other links, things like
multiple buildings in an office park where the only commonality is the
burst of development at a specific place and time. Really getting at the
goodies is going to require people with domain-specific knowledge pulling
threads and correlating them with external events.









On Sat, May 14, 2016 at 6:55 AM, Moses Boudourides <
[log in to unmask]> wrote:

> Hello,
>
> I've also been spending some time with the Panama Papers dataset.
> However, as far as the network structure that could be extracted
> from this dataset is concerned, it's not yet clear to me which
> relations could be used for that purpose. The relational part of the
> dataset is the file all_edges.csv, which contains 3 columns: the first
> and the third columns contain node_ids and the second refers to the
> following five types of relations that associate a node of the first
> column with the corresponding node of the third column:
> 'intermediary_of', 'officer_of', 'registered_address', 'similar',
> 'underlying'. Apparently only the fourth type ('similar') is symmetric
> (undirected) and all the other four types are obviously directed.
>
> Given that the total number of all relations (edges) is very high
> (1265690), I was wondering what sort of aggregations among types of
> relations might reduce the complexity of the Panama Papers network.
>
> I would appreciate it if someone were willing to share any ideas about
> a meaningful aggregation scheme for relations. Of course, one could
> disregard any sort of relational aggregation and treat the network as
> a multilayered (multiplex) one, although the size and the complexity
> of the Panama Papers network appear to be rather prohibitive.
>
> I should add that, following Dmitry Zinoviev's original work on this
> dataset, at the moment I can make a number of computations and
> visualizations for parts of the network (I'm using Python's NetworkX
> and Lightning-Python for interactive visualizations). For instance,
> motivated by what Dmitry is doing, I've managed to analyze the
> ego-networks extracted from egos which are nodes of a certain type
> (officers, intermediaries, addresses, entities) associated with a
> particular country and connected to alters according to a
> certain relationship type.
>
> For instance, this is the (symmetric) network in the case that egos
> are Greek officers and alters correspond to international entities and
> addresses (aggregated by all types of relations-edges):
>
>
> http://public.lightning-viz.org/visualizations/0a2669fc-f8bd-4cbf-b66f-1110f63c49df/public/
>
> (This is just an example: I can produce such ego-centric networks for
> any country in the Panama Papers data.)
>
> Admittedly, I'm not pleased with the aggregation of relations I'm
> doing here (perhaps the inclusion of addresses was redundant too) and,
> thus, I would ask for your ideas, comments or suggestions.
>
> --Moses
>
> On Mon, May 9, 2016 at 10:50 PM, Robert Marriott <[log in to unmask]> wrote:
> > Valdis Krebs mentioned the Panama Papers a couple of weeks ago. Well, the
> > full dataset of the Panama Papers, as well as a related offshore entity
> > investigation, has just been released by the ICIJ. I'm not affiliated with
> > the releasing organizations in any way, but there's likely to be a feeding
> > frenzy on this, so I wanted to be sure the listserv received immediate
> > notice once the data became available.
> >
> > The dataset is already up online in a graphical network format for the
> > casual use of the public and media. More interesting for our purposes, the
> > whole enchilada is available in multiple CSV formats.
> >
> > https://www.occrp.org/en/panamapapers/database
> >
> > A word of warning: it is truly enormous; the edgelist file is beyond the
> > line limit for Excel. Also bear in mind that the dataset is subject to some
> > form of GPL-style open data licensing that I'm not familiar with. I
> > recommend checking that information before you rev up your preferred
> > analysis tool.
> >
> > In case folks haven't been following the news, the Panama Papers
> > represent a massive leak of offshore corporate entity information (~320,000
> > entities) from the Panamanian law firm Mossack Fonseca. Offshore entities of
> > the sort included in the leak can have legitimate or legal purposes, but
> > they are primarily seen as a means of tax avoidance or evasion, as well as a
> > venue of organized criminal activity. This is particularly the case with
> > firms like the one targeted in the leak. Until today, released information
> > from the leak had been more selective, but individual disclosures had
> > already brought down the Icelandic PM. This seems like a hot potato, but the
> > sheer size of the leak is making it difficult for the responsible
> > organizations, or the press, to process.
> >
> > Regards,
> >
> > Robert Marriott
> >
> > Penn State University
> >
> > _____________________________________________________________________
> > SOCNET is a service of INSNA, the professional association for social
> > network researchers (http://www.insna.org). To unsubscribe, send an email
> > message to [log in to unmask] containing the line UNSUBSCRIBE SOCNET in
> > the body of the message.
>



-- 
mailto:[log in to unmask]
