Skip to content

Instantly share code, notes, and snippets.

@jexp
Last active March 27, 2019 02:54
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jexp/0706b1668a29451a917b08ffd7aa9b97 to your computer and use it in GitHub Desktop.
Save jexp/0706b1668a29451a917b08ffd7aa9b97 to your computer and use it in GitHub Desktop.
More detailed Neo4j Import of TrumpWorld from Buzzfeed https://www.buzzfeed.com/johntemplon/help-us-map-trumpworld

The TrumpWorld Graph

Importing the Buzzfeed TrumpWorld Dataset into Neo4j

I came across this tweet by Sanchez Castro (@SCHZCAS) which I’m more than happy to support.

Hey @neo4j please do this once again!!! #panamapapers #neo4j
https://t.co/B8pjxNKCyA

— sanchezcastro (@SCHZCAS) January 15, 2017

As part of the Buzzfeed article Help Us Map TrumpWorld, the four investigative journalists, John Templon, Alex Campbell, Anthony Cormier, and Jeremy Singer-Vine asked the public to help them map and analyze the data that they investigated, confirmed and published:

Now we are asking the public to use our data to find connections we may have missed, and to give us context we don’t currently understand. We hope you will help us — and the public — learn more about TrumpWorld and how this unprecedented array of businesses might affect public policy.

Setup with Local Database

If you want to work with this data locally, please download and install Neo4j 3.1.

When using Neo4j Desktop, you can find configuration and directories on the Options pane which you open with the button of the same name.

If you want to run this guide inside of your Neo4j-Browser, you can edit the database configuration from the Options pane and add the following setting:

browser.remote_content_hostname_whitelist=http://portal.graphgist.org

Then restart the database and you’re ready to roll, by running this command:

:play http://portal.graphgist.org/graph_gists/trumpworld-graph/graph_guide
Neo4j Desktop Options

To import the data with richer relationships, we need a few user defined functions and procedures from the APOC (Awesome Procedures for Cypher) library. Just grab the release for Neo4j 3.1 and place it in the $NEO4J_HOME/plugins directory.

(In the Neo4j Desktop, the plugins directory resides inside the graph database directory).

Looking at the Data

The Buzzfeed article linked int he tweet points to a Google Spreadsheet with collected and verified/researched data of relationships of Trump organizations to other organizations, in total we have 770 organizations (as of today) with 611 relationships.

The data is fortunately available as a public google doc, so we can use our old trick of loading the CSV-Download URL with LOAD CSV into Neo4j.

Let’s have a look at the data first, here as a sample:

WITH
'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=1996904412' AS url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row.`Entity A`,row.`Entity A Type`, row.`Entity B`,row.`Entity B Type`,row.Connection, row.`Source(s)`
LIMIT 5
╒════════════════════════════════════╤═════════════════════╤═══════════════════════════════╤═════════════════════╤════════════════╤══════════════════════════════════════════════════════════════════════════════════════╕
│"row.`Entity A`"                    │"row.`Entity A Type`"│"row.`Entity B`"               │"row.`Entity B Type`"│"row.Connection"│"row.`Source(s)`"                                                                     │
╞════════════════════════════════════╪═════════════════════╪═══════════════════════════════╪═════════════════════╪════════════════╪══════════════════════════════════════════════════════════════════════════════════════╡
│"4 SHADOW TREE LANE MEMBER CORP."   │"Organization"       │"4 SHADOW TREE LANE LLC"       │"Organization"       │"Ownership"     │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│
├────────────────────────────────────┼─────────────────────┼───────────────────────────────┼─────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│"40 WALL DEVELOPMENT ASSOCIATES LLC"│"Organization"       │"40 WALL STREET LLC"           │"Organization"       │"Ownership"     │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│
├────────────────────────────────────┼─────────────────────┼───────────────────────────────┼─────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│"40 WALL STREET LLC"                │"Organization"       │"40 WALL STREET COMMERCIAL LLC"│"Organization"       │"Ownership"     │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│
├────────────────────────────────────┼─────────────────────┼───────────────────────────────┼─────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│"40 WALL STREET MEMBER CORP."       │"Organization"       │"40 WALL STREET LLC"           │"Organization"       │"Ownership"     │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│
├────────────────────────────────────┼─────────────────────┼───────────────────────────────┼─────────────────────┼────────────────┼──────────────────────────────────────────────────────────────────────────────────────┤
│"401 MEZZ VENTURE LLC"              │"Organization"       │"401 NORTH WABASH VENTURE LLC" │"Organization"       │"Ownership"     │"https://www.documentcloud.org/documents/2838696-Trump-2016-Financial-Disclosure.html"│
└────────────────────────────────────┴─────────────────────┴───────────────────────────────┴─────────────────────┴────────────────┴──────────────────────────────────────────────────────────────────────────────────────┘

The most frequently mentioned organizations found by this query:

WITH
'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=1996904412' AS url
LOAD CSV WITH HEADERS FROM url AS row
WITH row
WHERE row.`Entity A Type` = 'Organization' AND row.`Entity B Type` = 'Organization'
UNWIND  [row.`Entity A`, row.`Entity B`] AS org
RETURN org, count(*)
ORDER BY count(*) DESC LIMIT 10

We see the usual suspect on top.

╒══════════════════════════════╤══════════╕
│"org"                         │"count(*)"│
╞══════════════════════════════╪══════════╡
│"THRIVE CAPITAL"              │"84"      │
├──────────────────────────────┼──────────┤
│"MERCER FAMILY FOUNDATION"    │"41"      │
├──────────────────────────────┼──────────┤
│"40 WALL STREET LLC"          │"40"      │
├──────────────────────────────┼──────────┤
│"DJT HOLDINGS LLC"            │"35"      │
├──────────────────────────────┼──────────┤
│"KUSHNER COMPANIES"           │"30"      │
├──────────────────────────────┼──────────┤
│"TRUMP HOTELS & CASINO RESORTS│"28"      │
│, INC."                       │          │
├──────────────────────────────┼──────────┤
│"TRUMP TOWER COMMERCIAL LLC"  │"21"      │
├──────────────────────────────┼──────────┤
│"TRUMP ORGANIZATION LLC"      │"18"      │
├──────────────────────────────┼──────────┤
│"THE TRUMP ORGANIZATION, INC."│"13"      │
├──────────────────────────────┼──────────┤
│"TTTT VENTURE LLC"            │"8"       │
└──────────────────────────────┴──────────┘

What kind of relationships exist in the data:

WITH
'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=1996904412' AS url
LOAD CSV WITH HEADERS FROM url AS row
RETURN row.Connection AS type, count(*)
ORDER BY count(*) DESC
╒════════════════════════════════════════════════════════════════════════════════════════════════════╤══════════╕
│"type"                                                                                              │"count(*)"│
╞════════════════════════════════════════════════════════════════════════════════════════════════════╪══════════╡
│"Ownership"                                                                                         │309       │
├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│"Investor"                                                                                          │90        │
├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│"Owns collateralized debt"                                                                          │54        │
├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│"Subsidiary"                                                                                        │41        │
├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│"NKA/FKA"                                                                                           │28        │
├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│"DBA"                                                                                               │16        │
....
│"Anbang would become one of the equity partners in the [666 Fifth Avenue] redevelopment if an agreem│1         │
│ent is finalized, per New York Times"                                                               │          │
├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│"an investment firm involved in Trump Organization projects abroad, per New York Times"             │1         │
├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│"Kushner-affiliated corporate entity, per New York Magazine"                                        │1         │
│"Lobbied for"                                                                                       │1         │
├────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────┤
│"Partners on Trump Hotel Rio de Janeiro"                                                            │1         │
└────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────┘

While some of them like Ownership, or Investor are straightforward others (Anbang would become one of the equity partners in the [666 Fifth Avenue] redevelopment if an agreement is finalized, per New York Times) are very specific, and probably not a good choice for relationship-type to query on.

So we have two options, one could be to use a [simple, direct import] using generic relationships and put all the Connection information into a property, or alternatively we do some cleanup/unification and have a richer set of relationships.

Here we want to look into the more detailed variant.

We want to map those relationship types to fewer, more distinct ones.

Cleaning Relationship-Types

WITH
'https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=csv&gid=1996904412' AS url,
['LOAN','LOBBIED','SALE','SUPPLIER','SHAREHOLDER','LICENSES','AFFILIATED','TIES','NEGOTIATION','INVOLVED','PARTNER'] AS terms
LOAD CSV WITH HEADERS FROM url AS row
WITH terms, row WHERE row.`Entity A Type` = 'Organization' AND row.`Entity B Type` = 'Organization'
WITH apoc.text.regreplace(toUpper(row.Connection),'\\W+','_') AS type, row, terms
WITH head(filter(term IN terms WHERE type CONTAINS term)) AS found, type, row
RETURN substring(coalesce(found,type),0,30) AS type, count(*), collect(distinct row.Connection)[0..5] AS connections
ORDER BY count(*) DESC
╒══════════════════════════╤══════════╤════════════════════════════════════════════════════════════════════════════════════════════════════╕
│"type"                    │"count(*)"│"connections"                                                                                       │
╞══════════════════════════╪══════════╪════════════════════════════════════════════════════════════════════════════════════════════════════╡
│"OWNERSHIP"               │309       │["Ownership"]                                                                                       │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"INVESTOR"                │90        │["Investor"]                                                                                        │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"OWNS_COLLATERALIZED_DEBT"│54        │["Owns collateralized debt"]                                                                        │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"SUBSIDIARY"              │41        │["Subsidiary"]                                                                                      │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"NKA_FKA"                 │28        │["NKA/FKA"]                                                                                         │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"PARTNER"                 │19        │["Anbang would become one of the equity partners in the [666 Fifth Avenue] redevelopment if an agree│
│                          │          │ment is finalized, per New York Times","Real-estate partner","Partners on Trump Hotel Rio de Janeiro│
│                          │          │","Partners on The Apprentice, Seasons 1–13","Partners on The Apprentice, Seasons 15–","Partnership"│
│                          │          │,"Partnership on Trump Tower Century City","Business partnership","Partners on Trump Towers Pune","d│
│                          │          │evelopment partner in India, per New York Times","Partners on The Apprentice, Season 14"]           │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"DBA"                     │16        │["DBA"]                                                                                             │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"LOAN"                    │15        │["Loaned money","Loan","springing loan","Term loan and mortgage"]                                   │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"BOUGHT_BUILDING"         │3         │["Bought building"]                                                                                 │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"LICENSES"                │3         │["Licenses \"Trump\" name for Trump Tower Punta del Este","Licenses \"Trump\" name for Trump Towers │
│                          │          │Rio"]                                                                                               │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
....
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"ACQUIRED"                │1         │["Acquired"]                                                                                        │
├──────────────────────────┼──────────┼────────────────────────────────────────────────────────────────────────────────────────────────────┤
│"BUSINESS_RELATIONSHIP"   │1         │["Business relationship"]                                                                           │
└──────────────────────────┴──────────┴────────────────────────────────────────────────────────────────────────────────────────────────────┘

This looks much better now, so that we can start importing the data.

Import Organization Relationships

@flq
Copy link

flq commented Jan 20, 2017

Hey Michael,
I am trying to follow this article and import data into a Neo instance, however, I am getting the following which seems to suggest that the data has somehow changed since you tried this out...

At https://docs.google.com/spreadsheets/u/1/d/1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss/export?format=tsv&id=1Z5Vo5pbvxKJ5XpfALZXvCzW26Cl4we3OaN73K9Ae5Ss&gid=634968401:22 
-  there's a field starting with a quote and whereas it ends that quote there seems to be characters in that field after that ending quote. 
That isn't supported. This is what I read: 
'Anbang would become one of the equity partners in the [666 Fifth Avenue] redevelopment if an agreement is finalized,"'

Update 22.01.
It's working fine again, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment