Skip to content

Instantly share code, notes, and snippets.

@rvanbruggen
Last active November 10, 2021 21:55
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save rvanbruggen/00d259a453de13106091e2d507c2d86c to your computer and use it in GitHub Desktop.
Save rvanbruggen/00d259a453de13106091e2d507c2d86c to your computer and use it in GitHub Desktop.
Pandora Papers - the Power Players

The Pandora Papers Guide

Graphic courtesy of ICIJ

Background

The Pandora Papers have rocked the world. News organisations began publishing their explosive contents on October 3, the giant leak has dominated headlines and posed questions of some of the world’s most powerful people and their financial propriety.

Some highlights:

  • a 2.94 terabyte data trove
  • from more than 200 countries and territories
  • more than 330 politicians
  • 130 Forbes billionaires, as well as celebrities, fraudsters, drug dealers, royal family members and leaders of religious groups around the world.

The Pandora Papers investigation is the world’s largest-ever journalistic collaboration, involving more than 600 journalists from 150 media outlets in 117 countries.


Lots of information can be found online

Here are some great articles:


The Pandora Papers Graph

Not all the data has been made available yet - but some data is already there.

Here's Data model:

Our friend @Mesirii has also created a couple of nice elements that we can include: take a look at this Github page.


The Power Players dataset

Here's the import statement:

// import nodes from JSON which is an array of nodes/edges for each power-player

CALL apoc.load.jsonArray("https://gist.githubusercontent.com/jexp/8afb65325bf99014c8de68a9511a549b/raw/pandora.json") yield value
WITH collect(value) as values
UNWIND range(0, size(values)-1) as set
WITH apoc.convert.toMap(values[set]) as value, set
UNWIND value.nodes as n
CALL apoc.merge.node(n.data.categories, {node_id: n.data.properties.node_id},n.data.properties) YIELD node
SET node.id=set+"-"+n.id
RETURN count(*);

CALL apoc.load.jsonArray("https://gist.githubusercontent.com/jexp/8afb65325bf99014c8de68a9511a549b/raw/pandora.json") yield value
WITH collect(value) as values
UNWIND range(0, size(values)-1) as set
WITH apoc.convert.toMap(values[set]) as value, set
UNWIND value.edges as e
MATCH (n) where n.id=set+"-"+e.source
MATCH (m) where m.id=set+"-"+e.target
CALL apoc.create.relationship(n, e.data.type, 
apoc.map.clean(e.data.properties,["edge_id","power_player_profile_id"],[]) ,m) yield rel
RETURN count(*);

// merge duplicate entities by name
MATCH (e:Entity)
WITH e.name as name, collect(e) as nodes
WHERE size(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes, {properties:"discard", mergeRels:true}) YIELD node return count(*);

CREATE INDEX on :Entity(name);
CREATE INDEX on :Officer(name);
CREATE INDEX on :Entity(provider);

Some refactoring / graphifying:

Some additional refactoring that I have found useful - basically pulling the properties out of the Officer and Entity nodes, and making them into their own Nodes that are then connected:

  • Linking Entity to Country:
    MATCH (n:Entity)
    MERGE (c:Country {name: coalesce(n.jurisdiction,"UNKNOWN")})
    MERGE (n)-[:RESIDES_IN]->(c);
  • Linking Officer to Country:
    MATCH (o:Officer)
    MERGE (c:Country {name: coalesce(o.country, "UNKNOWN")})
    MERGE (o)-[:CITIZEN_OF]->(c);
  • Linking Entities to data Providers:
    MATCH (n:Entity)
    MERGE (p:Provider {name: coalesce(n.provider,"UNKNOWN")})
    MERGE (n)-[:PROVIDED_BY]->(p);

This gives for a richer graph model:

Exploring the Power Player dataset

Entities covered and their jurisdiction:

MATCH (n:Entity)
RETURN distinct n.jurisdiction, count(n);

Entities in the British Virgin Islands:

MATCH (o:Officer)-[rel]->(e:Entity) 
WHERE e.jurisdiction CONTAINSBritish Virgin IslandsRETURN o, rel, e;

Most frequent providers

MATCH (e:Entity) return e.provider, count(*) as c order by c desc;

Most frequent jurisdiction

MATCH (e:Entity) return e.jurisdiction, count(*) as c order by c desc;

Most frequent Officers / Entities in countries

MATCH (c1:Country)<--(o:Officer)-->(e:Entity)--(c2:Country)
WITH distinct c1.name as OfficerCountry, c2.name as EntityCountry, count(*) as PatternFrequency
WHERE PatternFrequency >= 5
RETURN OfficerCountry, EntityCountry, PatternFrequency
ORDER BY PatternFrequency desc;

Here’s an example for the Aliyev family in Azerbaijan.

MATCH (o:Officer)-->(e:Entity)
WHERE toLower(o.name) CONTAINS 'aliyev'
RETURN *;

Or look at the Blairs

MATCH (o:Officer)-[*..2]->(conn)
WHERE o.name contains "Blair"
RETURN *;

Clearly there's a lot more to come here, and I know that I will be eagerly looking out for more details from our heroes at the ICIJ.

Stay tuned!

{
"title": "Pandora Papers Dashboard",
"version": "1.1",
"editable": false,
"pagenumber": 0,
"pages": [
{
"title": "INTRODUCTION",
"reports": [
{
"title": "The Pandora Papers Guide",
"width": 12,
"height": 8,
"type": "text",
"query": "![](https://dist.neo4j.com/wp-content/uploads/20211005124023/PowerPlayers_2.png)\n\nGraphic [courtesy of ICIJ](www.icij.org)\n\n## Background\n[The Pandora Papers](https://www.icij.org/investigations/pandora-papers/global-investigation-tax-havens-offshore/) have rocked the world. News organisations began publishing their explosive contents on October 3, the giant leak has dominated headlines and posed questions of some of the world’s most powerful people and their financial propriety. \n\nSome highlights:\n* a 2.94 terabyte data trove \n* from more than 200 countries and territories\n* more than 330 politicians \n* 130 Forbes billionaires, as well as celebrities, fraudsters, drug dealers, royal family members and leaders of religious groups around the world.\n\nThe Pandora Papers investigation is the world’s largest-ever journalistic collaboration, involving more than 600 journalists from 150 media outlets in 117 countries.\n\n## Lots of information can be found online\nHere are some great articles:\n\n* It news Asia: [How new tech helped to expose the Pandora Papers leak](https://www.itnews.asia/news/how-new-tech-helped-to-expose-the-pandora-papers-leak-571485)\n* Data Storage Asea: [Connecting the Dots: How Graph Technology Made the Pandora Papers Possible](https://datastorageasean.com/expert-opinions-executive-interviews/connecting-dots-how-graph-technology-made-pandora-papers)\n* Wired: [How the secrets of the Pandora Papers were freed](https://www.wired.co.uk/article/pandora-papers-leak)\n* (Pandora Papers: An offshore data tsunami)[https://www.icij.org/investigations/pandora-papers/about-pandora-papers-leak-dataset/]\n",
"page": 1,
"properties": [],
"parameters": "",
"refresh": 0
},
{}
]
},
{
"title": "IMPORTING THE DATA",
"reports": [
{
"title": "The Power Players dataset",
"width": 12,
"height": 4,
"type": "text",
"query": "#### Import nodes from JSON which is an array of nodes/edges for each power-player\n\n```cypher\nCALL apoc.load.jsonArray(\"https://gist.githubusercontent.com/jexp/8afb65325bf99014c8de68a9511a549b/raw/pandora.json\") yield value\nWITH collect(value) as values\nUNWIND range(0, size(values)-1) as set\nWITH apoc.convert.toMap(values[set]) as value, set\nUNWIND value.nodes as n\nCALL apoc.merge.node(n.data.categories, {node_id: n.data.properties.node_id},n.data.properties) YIELD node\nSET node.id=set+\"-\"+n.id\nRETURN count(*);\n\nCALL apoc.load.jsonArray(\"https://gist.githubusercontent.com/jexp/8afb65325bf99014c8de68a9511a549b/raw/pandora.json\") yield value\nWITH collect(value) as values\nUNWIND range(0, size(values)-1) as set\nWITH apoc.convert.toMap(values[set]) as value, set\nUNWIND value.edges as e\nMATCH (n) where n.id=set+\"-\"+e.source\nMATCH (m) where m.id=set+\"-\"+e.target\nCALL apoc.create.relationship(n, e.data.type, \napoc.map.clean(e.data.properties,[\"edge_id\",\"power_player_profile_id\"],[]) ,m) yield rel\nRETURN count(*);\n```\n#### Merge duplicate entities by name\n```cypher\nMATCH (e:Entity)\nWITH e.name as name, collect(e) as nodes\nWHERE size(nodes) > 1\nCALL apoc.refactor.mergeNodes(nodes, {properties:\"discard\", mergeRels:true}) YIELD node return count(*);\n```\n\n#### And then create some indexes:\n```cypher\nCREATE INDEX on :Entity(name);\nCREATE INDEX on :Officer(name);\nCREATE INDEX on :Entity(provider);\n```\n",
"page": 1,
"properties": [],
"parameters": "",
"refresh": 0
},
{}
]
},
{
"title": "ENHANCING THE DATA",
"reports": [
{
"title": "Some refactoring / graphifying:",
"width": 12,
"height": 8,
"type": "text",
"query": "Some additional refactoring that I have found useful - basically pulling the properties out of the `Officer` and `Entity` nodes, and making them into their own Nodes that are then connected:\n\n* Linking Entity to Country:\n ```cypher\n MATCH (n:Entity)\n MERGE (c:Country {name: coalesce(n.jurisdiction,\"UNKNOWN\")})\n MERGE (n)-[:RESIDES_IN]->(c);\n ```\n* Linking Officer to Country:\n ```cypher\n MATCH (o:Officer)\n MERGE (c:Country {name: coalesce(o.country, \"UNKNOWN\")})\n MERGE (o)-[:CITIZEN_OF]->(c);\n ```\n* Linking Entities to data Providers:\n ```cypher\n MATCH (n:Entity)\n MERGE (p:Provider {name: coalesce(n.provider,\"UNKNOWN\")})\n MERGE (n)-[:PROVIDED_BY]->(p);\n ```\n\nThis gives for a richer graph model:\n![](https://drive.google.com/uc?export=view&id=1bHk09Uw1icWAA-HwQPDTdwDLHKGboIw6)",
"page": 1,
"properties": [],
"parameters": "",
"refresh": 0
},
{}
]
},
{
"title": "EXPLORING THE DATA",
"reports": [
{
"title": "Entities covered and their jurisdiction:",
"width": 6,
"height": 4,
"type": "table",
"query": "MATCH (n:Entity)\nRETURN distinct n.jurisdiction, count(n);",
"page": 1,
"properties": [],
"parameters": "",
"refresh": 0
},
{
"title": "Most frequent providers",
"width": 6,
"height": 4,
"type": "table",
"query": "MATCH (e:Entity) return e.provider, count(*) as c order by c desc;\n",
"page": 1,
"properties": [],
"parameters": "",
"refresh": 0
},
{
"title": "Entities in the British Virgin Islands:",
"width": 12,
"height": 8,
"type": "graph",
"query": "MATCH (o:Officer)-[rel]->(e:Entity) \nWHERE e.jurisdiction CONTAINS 'British Virgin Islands'\nRETURN o, rel, e;\n",
"page": 29,
"properties": [
"name",
"name"
],
"parameters": "",
"refresh": 0
},
{
"title": "Most frequent Officers / Entities in countries",
"width": 12,
"height": 4,
"type": "table",
"query": "MATCH (c1:Country)<--(o:Officer)-->(e:Entity)--(c2:Country)\nWITH distinct c1.name as OfficerCountry, c2.name as EntityCountry, count(*) as PatternFrequency\nWHERE PatternFrequency >= 5\nRETURN OfficerCountry, EntityCountry, PatternFrequency\nORDER BY PatternFrequency desc;",
"page": 1,
"properties": [],
"parameters": "",
"refresh": 0
},
{}
]
},
{
"title": "ABOUT ALIYEV and BLAIR",
"reports": [
{
"title": "Example for the Aliyev family in Azerbaijan",
"width": 12,
"height": 8,
"type": "graph",
"query": "MATCH (o:Officer)-[r*..2]->(e:Entity)\nWHERE toLower(o.name) CONTAINS 'aliyev'\nRETURN *;\n",
"page": 28,
"properties": [
"name",
"name"
],
"parameters": "",
"refresh": 0
},
{
"title": "Example for the Blairs",
"width": 12,
"height": 4,
"type": "graph",
"query": "MATCH (o:Officer)-[r*..3]->(conn)\nWHERE o.name contains \"Blair\"\nRETURN *;",
"page": 15,
"properties": [
"name",
"name",
"name",
"name"
],
"parameters": "",
"refresh": 0
},
{}
]
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment