Reading Wikidata dumps via Spark
%%time
# Takes around 30 minutes just to show df.head()
# `sql` is assumed to be an existing SparkSession (or SQLContext)
wikidata_dump_path = "/path/to/latest-all.json.bz2"
df = sql.read.option("multiLine", "true").json(wikidata_dump_path)
df.head()
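A possible way to avoid the long wait, sketched under some assumptions: the Wikidata JSON dump is one large array, but each entity sits on its own line, so it can be read as plain text and parsed line by line instead of as a single multi-line document. The helper names `clean_dump_line` and `read_entities` are hypothetical and not part of the snippet above, and `spark` is assumed to be a SparkSession. Schema inference over the full dump is still expensive, so this is not a guaranteed fix.

```python
import json
from typing import Optional


def clean_dump_line(line: str) -> Optional[str]:
    """Return the JSON object on one dump line, or None for the array brackets."""
    stripped = line.strip().rstrip(",")  # entity lines end with a trailing comma
    if stripped in ("", "[", "]"):
        return None
    return stripped


def read_entities(spark, path):
    """Hypothetical helper: load the dump line by line, not as one document."""
    lines = spark.read.text(path)  # one row per line, in a column named "value"
    cleaned = (
        lines.rdd
        .map(lambda row: clean_dump_line(row.value))
        .filter(lambda s: s is not None)
    )
    return spark.read.json(cleaned)  # schema is inferred from the cleaned lines


# Small local check of the line cleaner; no Spark needed for this part.
sample = ["[", '{"id": "Q1", "type": "item"},', '{"id": "Q2", "type": "item"}', "]"]
entities = [json.loads(s) for s in map(clean_dump_line, sample) if s is not None]
ids = [e["id"] for e in entities]
```

With a session available, `df = read_entities(spark, wikidata_dump_path)` would stand in for the multi-line read. Passing an explicit schema to the reader, rather than letting it infer one, may cut the time further.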