Skip to content

Instantly share code, notes, and snippets.

@jbarrasa
Last active July 27, 2016 14:04
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save jbarrasa/de9dc7e588c3ee3ed52d3bfdddbd5955 to your computer and use it in GitHub Desktop.

European Cities, Political Parties and Ideologies

I guess there are two interesting things in this gist, one is the graph itself but the other ones is the fact that it’s built directly from querying an RDF triple store via a SPARQL endpoint and consuming the output of the SPARQL query directly with 'LOAD CSV' in Cypher.

The dataset

For this example I’ve used DBpedia data about European cities, the political parties in their local governments and their ideologies. So big disclaimer, this data is not meant to be complete or more accurate than what can be expected from DBpedia/Wikipedia data. The DBpedia stores data using the RDF model so in order to query it we’ll have to use the SPARQL query language.

Here is the SPARQL query I’ve used to extract the cities and countries and the political parties currently in the local government. The query can be tested on the DBPedia SPARQL endpoint.

select distinct ?party ?city ?cityName ?ctr ?ctrName ?pop
where {

     ?city a dbo:Location ;
          rdfs:label ?cityName ;
           dbo:country ?ctr ;
           dbo:leaderParty ?party .
     FILTER(langMatches(lang(?cityName), "EN"))

     ?ctr dct:subject dbc:Countries_in_Europe ;
          rdfs:label ?ctrName .
     FILTER(langMatches(lang(?ctrName), "EN"))

     optional { ?city dbo:populationTotal ?pop }
}

And a second SPARQL query that returns for each party, the ideologies they are associated with. Again, according to DBpedia/Wikipedia of the different political parties.

select distinct ?party ?partyName ?ideology ?ideologyName where {

     ?city a dbo:Location ;
           dbo:country ?ctr ;
           dbo:leaderParty ?party .

     ?ctr dct:subject dbc:Countries_in_Europe .

     ?party dbo:ideology ?ideology ;
            rdfs:label ?partyName .
     FILTER(langMatches(lang(?partyName), "EN"))

     ?ideology rdfs:label ?ideologyName
     FILTER(langMatches(lang(?ideologyName), "EN"))
}

SPARQL queries of type SELECT return a set of variable bindings, or in other words, a tabular structure that the endpoint can serialise as CSV. This is quite convenient because it can be used directly by the LOAD CSV instruction in Cypher.

Loading the data into Neo4j

Here are the two Cypher statements that create the model in Neo4j. You may also want to create an index on city nodes first to get better performance:

CREATE INDEX ON :City(city_uri))
WITH "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fparty+%3Fcity+%3FcityName+%3Fctr+%3FctrName+%3Fpop+where+%7B%0D%0A%3Fcity+a+dbo%3ALocation+%3B%0D%0A++++++dbo%3Acountry+%3Fctr+%3B%0D%0A++++++dbo%3AleaderParty+%3Fparty+.%0D%0A%0D%0A%3Fctr+dct%3Asubject+dbc%3ACountries_in_Europe+.%0D%0A%0D%0A%3Fparty+dbo%3Aideology+%3Fideology+.%0D%0A%0D%0Aoptional+%7B+%3Fcity+dbo%3ApopulationTotal+%3Fpop+%7D%0D%0A%0D%0A%3Fcity+rdfs%3Alabel+%3FcityName%0D%0AFILTER%28langMatches%28lang%28%3FcityName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%3Fctr+rdfs%3Alabel+%3FctrName%0D%0AFILTER%28langMatches%28lang%28%3FctrName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%7D&format=csv" AS queryOnSPARQLEndpoint
LOAD CSV WITH HEADERS FROM queryOnSPARQLEndpoint AS row
WITH row LIMIT 500
MERGE (cty:City {city_uri: row.city}) SET cty.city_pop= coalesce(row.pop,0), cty.city_name = row.cityName
MERGE (ctr:Country {ctry_uri: row.ctr}) SET ctr.ctry_name = row.ctrName
MERGE (pty:Party {party_uri: row.party})
MERGE (cty)-[:CTY_IN_COUNTRY]->(ctr)
MERGE (cty)-[:GOVERNING_PARTY]->(pty)

Note that the query runs directly off the DBpedia SPARQL endpoint (same for the next one) so if the endpoint is down for maintenance or any other reason, this graphgist won’t do much :)

Another point worth mentioning is the 'LIMIT 500' in the query. This is just to avoid hitting the limit when running the query as a GraphGist, but of course feel free to remove it if you run it on your own Neo4j instance.

WITH "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fparty+%3FpartyName+%3Fideology+%3FideologyName+where+%7B%0D%0A%0D%0A%3Fcity+a+dbo%3ALocation+%3B%0D%0A++++++dbo%3Acountry+%3Fctr+%3B%0D%0A++++++dbo%3AleaderParty+%3Fparty+.%0D%0A%0D%0A%3Fctr+dct%3Asubject+dbc%3ACountries_in_Europe+.%0D%0A%0D%0A%3Fparty+dbo%3Aideology+%3Fideology+%3B%0D%0A+++++++rdfs%3Alabel+%3FpartyName+.%0D%0AFILTER%28langMatches%28lang%28%3FpartyName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%3Fideology+rdfs%3Alabel+%3FideologyName%0D%0AFILTER%28langMatches%28lang%28%3FideologyName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%7D&format=csv" AS queryOnSPARQLEndpoint
LOAD CSV WITH HEADERS FROM queryOnSPARQLEndpoint AS row
WITH row
MERGE (pty:Party {party_uri: row.party}) SET pty.party_name = row.partyName
MERGE (ide:Ideology {ideology_uri: row.ideology}) SET ide.ideology_name = row.ideologyName
MERGE (pty)-[:HAS_IDEOLOGY]->(ide)

And that’s pretty much it. The data in your graph should look something like this for a given city. Bern in Switzerland:

MATCH  (ctr:Country)<-[cic:CTY_IN_COUNTRY]->(cty:City {city_name:'Bern'})-[gp:GOVERNING_PARTY]->(party:Party)-[hi:HAS_IDEOLOGY]->(id:Ideology) return *

Or like this for a few cities in Spain:

MATCH  (ctr:Country {ctry_name: 'Spain'})<-[cic:CTY_IN_COUNTRY]->(cty:City) WITH ctr, cic,  cty LIMIT 10
MATCH (cty)-[gp:GOVERNING_PARTY]->(party:Party)-[hi:HAS_IDEOLOGY]->(id:Ideology) return *

Some queries

Now a couple of interesting queries:

Which are the most transnational ideologies in Europe?

MATCH (ctr:Country)<-[:CTY_IN_COUNTRY]-(city)-[:GOVERNING_PARTY]->(party)-[:HAS_IDEOLOGY]->(i:Ideology)
RETURN i.ideology_name, COUNT(DISTINCT(ctr)) AS presentInCountries, COLLECT(DISTINCT(ctr.ctry_name)) AS CountryList
ORDER BY presentInCountries DESC LIMIT 10

Which are the most ideologically similar parties from different countries?

MATCH (p1:Party)-[:HAS_IDEOLOGY]->(i)<-[:HAS_IDEOLOGY]-(p2:Party)
WHERE (ID(p1) < ID(p2))
WITH p1, p2, COUNT(DISTINCT(i)) AS sharedIdeologyCount, COLLECT(DISTINCT(i.ideology_name)) AS sharedIdeologies
WHERE sharedIdeologyCount > 2
MATCH (p1)<-[:GOVERNING_PARTY]-()-[:CTY_IN_COUNTRY]->(ctr1), (p2)<-[:GOVERNING_PARTY]-()-[:CTY_IN_COUNTRY]->(ctr2)
WHERE ctr1 <> ctr2
RETURN DISTINCT p1.party_name, ctr1.ctry_name, p2.party_name, ctr2.ctry_name, sharedIdeologyCount, sharedIdeologies ORDER BY sharedIdeologyCount DESC LIMIT 10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment