@thlor
Created November 14, 2018 15:54
CKAN crawler
# First install the ckanapi module from the command line:
#   pip3 install ckanapi
from ckanapi import RemoteCKAN
import json

with RemoteCKAN("https://www.data.gv.at/katalog/", get_only=True) as ckan:
    page = 0
    rows = 100
    limit_pages = 10  # Maximum number of pages to crawl (for debugging). Set to -1 to crawl all pages.
    while True:
        # Fetch one page of dataset metadata; "start" is the offset of the first result.
        metadatas = ckan.action.package_search(rows=rows, start=page * rows)
        page = page + 1
        if len(metadatas["results"]) == 0:
            break
        for metadata in metadatas["results"]:
            # Place logic working with the "metadata" variable here:
            print(json.dumps(metadata)[0:100] + " ...")
        # Check the page limit only after processing, so the last fetched page is not discarded.
        if page == limit_pages:
            break
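
The loop above leaves a placeholder for per-dataset logic. As one possible way to fill it, here is a minimal sketch that reduces a dataset record to a few fields. The helper name `summarize_dataset` and the sample record are hypothetical and not part of the original gist; the sketch assumes each record carries the usual CKAN package keys `name`, `title`, and `resources` (a list of dicts with a `format` entry), and it falls back gracefully when they are missing.

```python
def summarize_dataset(metadata):
    """Return a small summary of one CKAN dataset dict (hypothetical helper)."""
    # Treat a missing or null "resources" entry as an empty list.
    resources = metadata.get("resources") or []
    # Collect distinct, non-empty resource formats in upper case, sorted for stable output.
    formats = sorted({r["format"].upper() for r in resources if r.get("format")})
    return {
        "name": metadata.get("name"),
        "title": metadata.get("title"),
        "num_resources": len(resources),
        "formats": formats,
    }


# Made-up sample record for illustration; real records come from metadatas["results"].
sample = {
    "name": "example-dataset",
    "title": "Example dataset",
    "resources": [
        {"format": "csv", "url": "https://example.org/a.csv"},
        {"format": "JSON", "url": "https://example.org/a.json"},
        {"format": "", "url": "https://example.org/a"},
    ],
}

print(summarize_dataset(sample))
```

Inside the crawler, the `print(json.dumps(metadata)[0:100] + " ...")` line could then be swapped for `print(summarize_dataset(metadata))`.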