@joernhees
Created November 19, 2016 22:04
DBpedia JSON Extractor
{"comment": "Kaiserslautern ([\u02ccka\u026az\u0250s\u02c8la\u028at\u0250n]) is a city in southwest Germany, located in the Bundesland (State) of Rhineland-Palatinate (Rheinland-Pfalz) at the edge of the Palatinate Forest (Pf\u00e4lzerwald). The historic centre dates to the 9th century. It is 459 kilometres (285 miles) from Paris, 117 km (73 miles) from Frankfurt am Main, and 159 km (99 miles) from Luxembourg.", "pic": "http://commons.wikimedia.org/wiki/Special:FilePath/Kaiserslautern-Stadtwappen.svg", "uri": "http://dbpedia.org/resource/Kaiserslautern", "label": "Kaiserslautern", "areaTotal": "1.3972e+08", "types": ["http://www.w3.org/2002/07/owl#Thing", "http://www.wikidata.org/entity/Q3957", "http://www.wikidata.org/entity/Q486972", "http://dbpedia.org/ontology/PopulatedPlace", "http://dbpedia.org/ontology/Settlement", "http://dbpedia.org/ontology/Town", "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing", "http://schema.org/Place", "http://dbpedia.org/ontology/Place", "http://dbpedia.org/ontology/Location", "http://umbel.org/umbel/rc/Town", "http://dbpedia.org/class/yago/AdministrativeDistrict108491826", "http://dbpedia.org/class/yago/City108524735", "http://dbpedia.org/class/yago/District108552138", "http://dbpedia.org/class/yago/GeographicalArea108574314", "http://dbpedia.org/class/yago/Location100027167", "http://dbpedia.org/class/yago/Municipality108626283", "http://dbpedia.org/class/yago/Object100002684", "http://dbpedia.org/class/yago/PhysicalEntity100001930", "http://dbpedia.org/class/yago/Region108630985", "http://dbpedia.org/class/yago/Town108665504", "http://dbpedia.org/class/yago/UrbanArea108675967", "http://dbpedia.org/class/yago/YagoGeoEntity", "http://dbpedia.org/class/yago/YagoLegalActorGeo", "http://dbpedia.org/class/yago/YagoPermanentlyLocatedEntity", "http://dbpedia.org/class/yago/WikicatCitiesInRhineland-Palatinate", "http://dbpedia.org/class/yago/WikicatImperialFreeCities", 
"http://dbpedia.org/class/yago/WikicatTownsInRhineland-Palatinate", "http://dbpedia.org/class/yago/WikicatUniversityTownsInGermany"]}
{"comment": "The Federal City of Bonn ([\u02c8b\u0254n]; Latin: Bonna) is a city on the banks of the Rhine and northwest of theSiebengebirge (Seven Mountains) in the German state of North Rhine-Westphalia, with a population of 311,287 within its administrative limits. Bonn serves alongside the capital Berlin as the seat of government of Germany. The city is the second official seat and second official residence of the President of Germany, the Chancellor of Germany, the Bundesrat, and the first official seat and first official residence of six German federal ministries. Bonn is located in the southernmost part of the Rhine-Ruhr region, the largest metropolitan area of Germany, with over 11 million inhabitants.", "pic": "http://commons.wikimedia.org/wiki/Special:FilePath/Bonn\u00dcbersicht.jpg", "uri": "http://dbpedia.org/resource/Bonn", "label": "Bonn", "areaTotal": "1.4106e+08", "types": ["http://www.w3.org/2002/07/owl#Thing", "http://www.wikidata.org/entity/Q486972", "http://www.wikidata.org/entity/Q515", "http://dbpedia.org/ontology/City", "http://dbpedia.org/ontology/PopulatedPlace", "http://dbpedia.org/ontology/Settlement", "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing", "http://schema.org/City", "http://schema.org/Place", "http://dbpedia.org/ontology/Place", "http://dbpedia.org/ontology/Location", "http://umbel.org/umbel/rc/PopulatedPlace", "http://umbel.org/umbel/rc/Village", "http://umbel.org/umbel/rc/Location_Underspecified", "http://dbpedia.org/class/yago/Abstraction100002137", "http://dbpedia.org/class/yago/AdministrativeDistrict108491826", "http://dbpedia.org/class/yago/Area108497294", "http://dbpedia.org/class/yago/Artifact100021939", "http://dbpedia.org/class/yago/Camp102944826", "http://dbpedia.org/class/yago/Capital108518505", "http://dbpedia.org/class/yago/Center108523483", "http://dbpedia.org/class/yago/City108524735", "http://dbpedia.org/class/yago/District108552138", "http://dbpedia.org/class/yago/GeographicalArea108574314", 
"http://dbpedia.org/class/yago/Housing103546340", "http://dbpedia.org/class/yago/LanguageUnit106284225", "http://dbpedia.org/class/yago/LivingQuarters103679384", "http://dbpedia.org/class/yago/Location100027167", "http://dbpedia.org/class/yago/MilitaryQuarters103763727", "http://dbpedia.org/class/yago/Municipality108626283", "http://dbpedia.org/class/yago/Name106333653", "http://dbpedia.org/class/yago/NationalCapital108691669", "http://dbpedia.org/class/yago/Object100002684", "http://dbpedia.org/class/yago/Part113809207", "http://dbpedia.org/class/yago/PhysicalEntity100001930", "http://dbpedia.org/class/yago/Region108630985", "http://dbpedia.org/class/yago/Relation100031921", "http://dbpedia.org/class/yago/Seat108647945", "http://dbpedia.org/class/yago/Site108651247", "http://dbpedia.org/class/yago/Structure104341686", "http://dbpedia.org/class/yago/Town108665504", "http://dbpedia.org/class/yago/Tract108673395", "http://dbpedia.org/class/yago/TradeName106845599", "http://dbpedia.org/class/yago/UrbanArea108675967", "http://dbpedia.org/class/yago/Whole100003553", "http://dbpedia.org/class/yago/YagoGeoEntity", "http://dbpedia.org/class/yago/YagoLegalActorGeo", "http://dbpedia.org/class/yago/YagoPermanentlyLocatedEntity", "http://dbpedia.org/class/yago/WikicatBeerBrandsOfGermany", "http://dbpedia.org/class/yago/WikicatCitiesInGermany", "http://dbpedia.org/class/yago/WikicatCitiesInNorthRhine-Westphalia", "http://dbpedia.org/class/yago/WikicatFormerNationalCapitals", "http://dbpedia.org/class/yago/WikicatMunicipalitiesInNorthRhine-Westphalia", "http://dbpedia.org/class/yago/WikicatPopulatedPlacesOnTheRhine", "http://dbpedia.org/class/yago/WikicatRomanLegions'CampsInGermany", "http://dbpedia.org/class/yago/WikicatRomanTownsAndCities", "http://dbpedia.org/class/yago/WikicatRomanTownsAndCitiesInGermany", "http://dbpedia.org/class/yago/WikicatTownsInNorthRhine-Westphalia", "http://dbpedia.org/class/yago/WikicatUniversityTownsInGermany"]}
{"comment": "Cologne (/k\u0259\u02c8lo\u028an/; German K\u00f6ln [k\u0153ln], Colognian: K\u00f6lle [\u02c8k\u0153\u026b\u0259]), is the largest city both in the federal State of North Rhine-Westphalia in Germany and is the fourth-largest city in the country (after Berlin, Hamburg, and Munich). It is located within the Rhine-Ruhr metropolitan region, one of the major European metropolitan regions and the largest in Germany with more than ten million inhabitants.", "pic": "http://commons.wikimedia.org/wiki/Special:FilePath/Cologne_montage.png", "uri": "http://dbpedia.org/resource/Cologne", "label": "Cologne", "areaTotal": "4.0515e+08", "types": ["http://www.w3.org/2002/07/owl#Thing", "http://www.wikidata.org/entity/Q486972", "http://www.wikidata.org/entity/Q515", "http://dbpedia.org/ontology/City", "http://dbpedia.org/ontology/PopulatedPlace", "http://dbpedia.org/ontology/Settlement", "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing", "http://schema.org/City", "http://schema.org/Place", "http://dbpedia.org/ontology/Place", "http://dbpedia.org/ontology/Location", "http://umbel.org/umbel/rc/City", "http://umbel.org/umbel/rc/PopulatedPlace", "http://umbel.org/umbel/rc/Village", "http://umbel.org/umbel/rc/Location_Underspecified", "http://dbpedia.org/class/yago/AdministrativeDistrict108491826", "http://dbpedia.org/class/yago/Artifact100021939", "http://dbpedia.org/class/yago/Camp102944826", "http://dbpedia.org/class/yago/City108524735", "http://dbpedia.org/class/yago/District108552138", "http://dbpedia.org/class/yago/GeographicalArea108574314", "http://dbpedia.org/class/yago/Housing103546340", "http://dbpedia.org/class/yago/LivingQuarters103679384", "http://dbpedia.org/class/yago/Location100027167", "http://dbpedia.org/class/yago/MilitaryQuarters103763727", "http://dbpedia.org/class/yago/Municipality108626283", "http://dbpedia.org/class/yago/Object100002684", "http://dbpedia.org/class/yago/PhysicalEntity100001930", "http://dbpedia.org/class/yago/Region108630985", 
"http://dbpedia.org/class/yago/Site108651247", "http://dbpedia.org/class/yago/Structure104341686", "http://dbpedia.org/class/yago/Town108665504", "http://dbpedia.org/class/yago/Tract108673395", "http://dbpedia.org/class/yago/UrbanArea108675967", "http://dbpedia.org/class/yago/Whole100003553", "http://dbpedia.org/class/yago/YagoGeoEntity", "http://dbpedia.org/class/yago/YagoLegalActorGeo", "http://dbpedia.org/class/yago/YagoPermanentlyLocatedEntity", "http://dbpedia.org/class/yago/WikicatCatholicPilgrimageSites", "http://dbpedia.org/class/yago/WikicatCitiesInGermany", "http://dbpedia.org/class/yago/WikicatCitiesInNorthRhine-Westphalia", "http://dbpedia.org/class/yago/WikicatHolyCities", "http://dbpedia.org/class/yago/WikicatImperialFreeCities", "http://dbpedia.org/class/yago/WikicatMunicipalitiesInNorthRhine-Westphalia", "http://dbpedia.org/class/yago/WikicatPopulatedPlacesEstablishedInThe1stCenturyBC", "http://dbpedia.org/class/yago/WikicatPopulatedPlacesOnTheRhine", "http://dbpedia.org/class/yago/WikicatRomanLegions'CampsInGermany", "http://dbpedia.org/class/yago/WikicatRomanTownsAndCitiesInGermany", "http://dbpedia.org/class/yago/WikicatTownsInNorthRhine-Westphalia", "http://dbpedia.org/class/yago/WikicatUniversityTownsInGermany", "http://dbpedia.org/class/yago/WikicatWorldHeritageSitesInGermany"]}
{"comment": "Berlin (/b\u0259r\u02c8l\u026an/, [b\u025b\u0250\u032f\u02c8li\u02d0n]) is the capital of Germany and one of the 16 states of Germany. With a population of 3.5 million people, it is the second most populous city proper and the seventh most populous urban area in the European Union. Located in northeastern Germany on the banks of Rivers Spree and Havel, it is the centre of the Berlin-Brandenburg Metropolitan Region, which has about six million residents from over 180 nations. Due to its location in the European Plain, Berlin is influenced by a temperate seasonal climate. Around one-third of the city's area is composed of forests, parks, gardens, rivers and lakes.", "pic": "http://commons.wikimedia.org/wiki/Special:FilePath/Coat_of_arms_of_Berlin.svg", "uri": "http://dbpedia.org/resource/Berlin", "label": "Berlin", "areaTotal": "8.9185e+08", "types": ["http://www.w3.org/2002/07/owl#Thing", "http://www.wikidata.org/entity/Q3455524", "http://www.wikidata.org/entity/Q486972", "http://dbpedia.org/ontology/AdministrativeRegion", "http://dbpedia.org/ontology/PopulatedPlace", "http://dbpedia.org/ontology/Region", "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing", "http://schema.org/AdministrativeArea", "http://schema.org/Place", "http://dbpedia.org/ontology/Place", "http://dbpedia.org/ontology/Location", "http://dbpedia.org/ontology/%3Chttp://purl.org/dc/terms/Jurisdiction%3E", "http://umbel.org/umbel/rc/City", "http://umbel.org/umbel/rc/PopulatedPlace", "http://umbel.org/umbel/rc/Village", "http://umbel.org/umbel/rc/Location_Underspecified", "http://dbpedia.org/class/yago/AdministrativeDistrict108491826", "http://dbpedia.org/class/yago/Area108497294", "http://dbpedia.org/class/yago/Capital108518505", "http://dbpedia.org/class/yago/Center108523483", "http://dbpedia.org/class/yago/City108524735", "http://dbpedia.org/class/yago/District108552138", "http://dbpedia.org/class/yago/GeographicalArea108574314", "http://dbpedia.org/class/yago/Location100027167", 
"http://dbpedia.org/class/yago/Municipality108626283", "http://dbpedia.org/class/yago/Object100002684", "http://dbpedia.org/class/yago/PhysicalEntity100001930", "http://dbpedia.org/class/yago/Region108630985", "http://dbpedia.org/class/yago/Seat108647945", "http://dbpedia.org/class/yago/Section108648322", "http://dbpedia.org/class/yago/Site108651247", "http://dbpedia.org/class/yago/StateCapital108695539", "http://dbpedia.org/class/yago/Town108665504", "http://dbpedia.org/class/yago/Tract108673395", "http://dbpedia.org/class/yago/UrbanArea108675967", "http://dbpedia.org/class/yago/Vicinity108641113", "http://dbpedia.org/class/yago/YagoGeoEntity", "http://dbpedia.org/class/yago/YagoLegalActorGeo", "http://dbpedia.org/class/yago/YagoPermanentlyLocatedEntity", "http://dbpedia.org/class/yago/WikicatCapitalsInEurope", "http://dbpedia.org/class/yago/WikicatCitiesInGermany", "http://dbpedia.org/class/yago/WikicatCitiesWithMillionsOfInhabitants", "http://dbpedia.org/class/yago/WikicatGermanStateCapitals", "http://dbpedia.org/class/yago/WikicatLocalitiesOfBerlin", "http://dbpedia.org/class/yago/WikicatMunicipalitiesOfGermany", "http://dbpedia.org/class/yago/WikicatPopulatedPlacesEstablishedInThe13thCentury", "http://dbpedia.org/class/yago/WikicatStatesAndTerritoriesEstablishedIn1237", "http://dbpedia.org/class/yago/WikicatStatesOfGermany", "http://dbpedia.org/class/yago/WikicatUniversityTownsInGermany", "http://dbpedia.org/class/yago/WikicatWorldHeritageSitesInGermany"]}
{"comment": "Leipzig (/\u02c8la\u026aps\u026a\u0261/; [\u02c8la\u026apts\u026a\u00e7]) is the largest city in the federal state of Saxony, Germany. With its population of 544,479 inhabitants (1,001,220 residents in the larger urban zone) Leipzig, one of Germany's top 15 cities by population, is located about 160 kilometers (99 miles) southwest of Berlin at the confluence of the White Elster, Pleisse, and Parthe rivers at the southerly end of the North German Plain.", "pic": "http://commons.wikimedia.org/wiki/Special:FilePath/Flag_of_Leipzig.svg", "uri": "http://dbpedia.org/resource/Leipzig", "label": "Leipzig", "areaTotal": "2.9736e+08", "types": ["http://www.w3.org/2002/07/owl#Thing", "http://www.wikidata.org/entity/Q3957", "http://www.wikidata.org/entity/Q486972", "http://dbpedia.org/ontology/PopulatedPlace", "http://dbpedia.org/ontology/Settlement", "http://dbpedia.org/ontology/Town", "http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing", "http://schema.org/Place", "http://dbpedia.org/ontology/Place", "http://dbpedia.org/ontology/Location", "http://umbel.org/umbel/rc/PopulatedPlace", "http://umbel.org/umbel/rc/Town", "http://umbel.org/umbel/rc/Village", "http://umbel.org/umbel/rc/Location_Underspecified", "http://dbpedia.org/class/yago/AdministrativeDistrict108491826", "http://dbpedia.org/class/yago/City108524735", "http://dbpedia.org/class/yago/District108552138", "http://dbpedia.org/class/yago/GeographicalArea108574314", "http://dbpedia.org/class/yago/Location100027167", "http://dbpedia.org/class/yago/Municipality108626283", "http://dbpedia.org/class/yago/Object100002684", "http://dbpedia.org/class/yago/PhysicalEntity100001930", "http://dbpedia.org/class/yago/Region108630985", "http://dbpedia.org/class/yago/Town108665504", "http://dbpedia.org/class/yago/UrbanArea108675967", "http://dbpedia.org/class/yago/YagoGeoEntity", "http://dbpedia.org/class/yago/YagoLegalActorGeo", "http://dbpedia.org/class/yago/YagoPermanentlyLocatedEntity", 
"http://dbpedia.org/class/yago/WikicatCitiesInGermany", "http://dbpedia.org/class/yago/WikicatCitiesInSaxony", "http://dbpedia.org/class/yago/WikicatUniversityTownsInGermany"]}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# DBpedia JSON Extractor\n",
"\n",
"The following shows how you can retrieve information from DBpedia for a large number of URIs in form of a JSON file.\n",
"\n",
"To be a bit more precise, we'll generate what is called JSONL: each line of the output file will contain a full JSON doc.\n",
"\n",
"If you want to extract information for more than 1000 URIs, please don't run the following code against the online DBpedia, but [set up an own endpoint](https://joernhees.de/blog/2015/11/23/setting-up-a-linked-data-mirror-from-rdf-dumps-dbpedia-2015-04-freebase-wikidata-linkedgeodata-with-virtuoso-7-2-1-and-docker-optional/)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:rdflib:RDFLib Version: 4.2.1\n"
]
}
],
"source": [
"import json\n",
"from rdflib import URIRef\n",
"import SPARQLWrapper"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Settings"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"OUTFILE='dbpedia.jsonl' # a line containing 1 json doc per line\n",
"INPUT_URIFILE='uris.txt' # file with input URIs, 1 per line\n",
"BLOCKSIZE=100 # how many URIs to query in a single transaction\n",
"SPARQL_ENDPOINT = 'http://dbpedia.org/sparql' # please run an own endpoint if INPUT_URIFILE is big!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### URI File\n",
"\n",
"The accompanying example URI file ([uris.txt](./uris.txt)) is just a sample. Extend it as desired, or write a SPARQL query to create the file.\n",
"\n",
"In case you want to retrieve information for the whole DBpedia, we run into issues with SPARQL sadly not offering an efficient way to \"iterate\" over results. In those cases i'd suggest to manually create a URI listing from all subjects in the dump files directly, for example as described [here](https://joernhees.de/blog/2015/01/28/dbpedia-2014-stats-top-subjects-predicates-and-objects/).\n",
"\n",
"### Blocksize\n",
"\n",
"The round trip time for a single query typically is big in comparison to the actual work the enpoint has to perform and the result it sends. The code below performs some magic to bundle several of these requests into one query. This obviously is a trade-off between speed, single result size and query timeouts. So experiment what is quickest for your queries!\n",
"\n",
"### Query Template (what to extract)\n",
"\n",
"Below you'll find a sample query that defines what you want to extract per URI.\n",
"- `SELECT`: be sure to select all variables you'd like to extract.\n",
"- `VALUES`: this does the `BLOCKSIZE` magic. It's for efficiency reasons only, don't touch.\n",
"- `OPTIONAL`: make sure that you use `OPTIONAL` in case you'd prefer an empty variable for a URI over no results for that URI at all.\n",
"- `GROUP_CONCAT`: It's a pretty good idea to `GROUP_CONCAT` variables like `?types` that can occur several times per URI. Especially if you have different ones, your results will otherwise be artificially inflated by the cartesian product of all of their combinations! Make sure you put them into the `GROUP_FIELD_SEPARATORS` and the code below will take care to split them into a json list. As spaces can't occur in URIs, they are a great separator for URIs. If you want to extract a multi-string attribute, make sure to choose a separator that isn't contained in the strings (can be several chars like `'MAgIcString~'`!\n",
"\n",
"### Warning: Loss of meta-information from RDF to JSON\n",
"\n",
"RDF for example distinguishes between the following:\n",
"- `<http://dbpedia.org/resource/Kaiserslautern>` (a URI)\n",
"- `\"http://dbpedia.org/resource/Kaiserslautern\"` (a String)\n",
"- `\"Kaiserslautern\"@en` (a langstring)\n",
"- `\"17\"^^xsd:integer` (an integer)\n",
"\n",
"I guess that your intention is to easily use the outputs, so you probably don't want to parse N3 from the generated JSON.\n",
"\n",
"This means, that your resulting JSON will just see the string values for all of the above:\n",
"- `\"http://dbpedia.org/resource/Kaiserslautern\"`\n",
"- `\"http://dbpedia.org/resource/Kaiserslautern\"`\n",
"- `\"Kaiserslautern\"`\n",
"- `\"17\"`"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"QUERY_TEMPLATE='''\n",
"SELECT ?uri ?label ?areaTotal ?pic (GROUP_CONCAT(?type; separator=\" \") as ?types) ?comment {\n",
" VALUES (?uri) {%(uris)s}\n",
" ?uri rdfs:label ?label . FILTER(lang(?label) = \"en\")\n",
" ?uri rdfs:comment ?comment . FILTER(lang(?comment) = \"en\")\n",
" OPTIONAL { ?uri dbo:areaTotal ?areaTotal . }\n",
" OPTIONAL { ?uri foaf:depiction ?pic . }\n",
" ?uri a ?type .\n",
"}\n",
"'''\n",
"GROUP_FIELD_SEPARATORS = {\n",
" \"types\": \" \", # remove the leading '?' from vars!\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Helpers"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# some helper for chunking queries into blocks...\n",
"from itertools import izip_longest\n",
"def chunker(iterable, n, fillvalue=None):\n",
" \"\"\"Like a grouper but last tuple is shorter.\n",
"\n",
" >>> list(chunker([1, 2, 3, 4, 5], 3))\n",
" [[1, 2, 3], [4, 5]]\n",
" \"\"\"\n",
" if n < 1:\n",
" raise ValueError(\"can't chunk by n=%d\" % n)\n",
" args = [iter(iterable)] * n\n",
" return (\n",
" [e for e in t if e is not None]\n",
" for t in izip_longest(*args, fillvalue=fillvalue)\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def uri_generator(uri_filename):\n",
" \"\"\"Reads URI file line by line and yields them.\"\"\"\n",
" with open(uri_filename, 'r') as f:\n",
" for line in f:\n",
" uri = line.strip().strip('<>').strip()\n",
" if uri:\n",
" yield uri\n",
"\n",
"\n",
"def result_generator(sparql_endpoint, uris):\n",
" \"\"\"A generator that retrieves results in blocks and yields them one by one.\"\"\"\n",
" sparql = SPARQLWrapper.SPARQLWrapper(sparql_endpoint, returnFormat=SPARQLWrapper.JSON)\n",
" for uri_chunk in chunker(uris, BLOCKSIZE):\n",
" q = QUERY_TEMPLATE % {\n",
" 'uris': ' '.join(['(%s)' % URIRef(u).n3() for u in uri_chunk])\n",
" }\n",
" # print q\n",
" sparql.setQuery(q)\n",
" res = sparql.queryAndConvert()\n",
" vars_ = res.get('head', []).get('vars', [])\n",
" bindings = res.get('results', []).get('bindings', [])\n",
" for row in bindings:\n",
" out = {}\n",
" for var in vars_:\n",
" value = row.get(var, {}).get('value')\n",
" if value:\n",
" if var in GROUP_FIELD_SEPARATORS:\n",
" value = value.split(GROUP_FIELD_SEPARATORS[var])\n",
" out[var] = value\n",
" # print json.dumps(out, indent=2)\n",
" yield out\n",
"\n",
"\n",
"def retrieve(sparql_endpoint=SPARQL_ENDPOINT, uri_filename=INPUT_URIFILE, out_filename=OUTFILE):\n",
" \"\"\"Calls the result generator and writes its results into the outfile.\"\"\"\n",
" count = 0\n",
" with open(out_filename, 'w') as outf:\n",
" for res_dict in result_generator(sparql_endpoint, uri_generator(uri_filename)):\n",
" outf.write(json.dumps(res_dict) + '\\n')\n",
" count += 1\n",
" return count"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retrieve()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
http://dbpedia.org/resource/Kaiserslautern
http://dbpedia.org/resource/Leipzig
http://dbpedia.org/resource/Berlin
http://dbpedia.org/resource/Bonn
http://dbpedia.org/resource/Cologne
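
To see what the notebook's result handling does without contacting an endpoint, the following sketch applies the same kind of flattening to a hand-written response in the SPARQL JSON results format. The `response` dict and the function name `flatten_bindings` are invented for illustration; only the splitting via `GROUP_FIELD_SEPARATORS` mirrors the notebook.

```python
GROUP_FIELD_SEPARATORS = {"types": " "}


def flatten_bindings(res, separators=GROUP_FIELD_SEPARATORS):
    """Turn SPARQL JSON results into plain dicts, dropping type/lang info."""
    vars_ = res.get("head", {}).get("vars", [])
    rows = []
    for row in res.get("results", {}).get("bindings", []):
        out = {}
        for var in vars_:
            # Unbound OPTIONAL variables are simply missing from the row.
            value = row.get(var, {}).get("value")
            if value:
                if var in separators:
                    value = value.split(separators[var])
                out[var] = value
        rows.append(out)
    return rows


# Hand-written example response (hypothetical, not fetched from DBpedia).
response = {
    "head": {"vars": ["uri", "label", "types", "pic"]},
    "results": {"bindings": [{
        "uri": {"type": "uri", "value": "http://dbpedia.org/resource/Bonn"},
        "label": {"type": "literal", "xml:lang": "en", "value": "Bonn"},
        "types": {"type": "literal",
                  "value": "http://schema.org/City http://schema.org/Place"},
    }]},
}

rows = flatten_bindings(response)
```

Note how the URI and the language-tagged label both end up as plain strings, and how the unbound `pic` variable is left out of the resulting dict rather than stored as an empty value.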