@mikesname
Last active September 22, 2016 16:44
Fetch id, name, and scope-content data for documentary units and write as tab-separated values.
#!/usr/bin/env python3
# Fetch id, name, and scope-content data for documentary units
# and write as tab-separated values.

import csv
import sys
import urllib.parse  # a bare "import urllib" does not reliably expose urllib.parse

import requests

# sys.argv[0] is the script name, so the URL argument requires len(sys.argv) >= 2.
if len(sys.argv) < 2:
    sys.stderr.write("usage: scopecontent.py <initial-api-url>\n")
    sys.exit(1)

csvwriter = csv.writer(sys.stdout, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
csvwriter.writerow(["id", "name", "scopeAndContent"])  # header


def scope_content(url):
    sys.stderr.write("Fetching: %s\n" % url)
    r = requests.get(url)
    data = r.json()
    for item in data["data"]:
        try:
            # fetch the ID and first description...
            item_id = item["id"]
            description = item["attributes"]["descriptions"][0]
            name = description["name"]
            scopecontent = description["scopeAndContent"]
            csvwriter.writerow([item_id, name, scopecontent])
        except (IndexError, KeyError):
            # no description or scope and content found... skipping...
            pass
    # fetch the next page of data...
    if data.get("links") and data["links"].get("next"):
        scope_content(urllib.parse.unquote(data["links"]["next"]))


scope_content(sys.argv[1])
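
The script follows each page's "next" link by calling itself, so a very long result set could in principle reach Python's recursion limit. Below is a minimal sketch of the same traversal written as a loop. It is not part of the original gist: the names extract_row, iter_rows, fetch_page and the PAGES sample data are hypothetical, and the response shape is assumed to match what the script reads ("data", "attributes", "descriptions", "links"). The page fetcher is passed in as a function so the sketch runs without network access.

```python
from urllib.parse import unquote


def extract_row(item):
    """Return [id, name, scopeAndContent] for one item, or None if a field is missing."""
    try:
        description = item["attributes"]["descriptions"][0]
        return [item["id"], description["name"], description["scopeAndContent"]]
    except (IndexError, KeyError):
        # No description, or no scope and content: skip the item.
        return None


def iter_rows(fetch_page, url):
    """Yield rows from every page, following links.next with a loop instead of recursion.

    fetch_page is a hypothetical callable that takes a URL and returns the
    decoded JSON body as a dict.
    """
    while url:
        data = fetch_page(url)
        for item in data.get("data", []):
            row = extract_row(item)
            if row is not None:
                yield row
        next_url = (data.get("links") or {}).get("next")
        url = unquote(next_url) if next_url else None


# Made-up two-page response standing in for the API (assumed shape only).
PAGES = {
    "page1": {
        "data": [
            {"id": "a", "attributes": {"descriptions": [
                {"name": "A", "scopeAndContent": "First"}]}},
            {"id": "b", "attributes": {"descriptions": []}},  # skipped: no description
        ],
        "links": {"next": "page2"},
    },
    "page2": {
        "data": [
            {"id": "c", "attributes": {"descriptions": [
                {"name": "C", "scopeAndContent": "Third"}]}},
        ],
        "links": {},
    },
}

rows = list(iter_rows(PAGES.__getitem__, "page1"))
```

To use this against a live endpoint, one could pass something like lambda u: requests.get(u).json() as fetch_page and hand each yielded row to csvwriter.writerow.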