@jjjake
Created August 13, 2012 21:04
Given an Internet Archive (IA) catalog.php URL, returns a list of dictionaries (one dictionary per row).
#!/usr/bin/env python
import json
import os

import requests

# Archive.org session cookies, read from the environment.
COOKIES = {
    'logged-in-sig': os.environ['LOGGED_IN_SIG'],
    'logged-in-user': os.environ['LOGGED_IN_USER'],
    'verbose': '1',
}

# Column order of each row in the catalog.php JSON output.
FIELDS = ('identifier', 'srvr', 'cmd', 'time', 'submitter', 'args', 'extra')


def get_rows(url):
    """Return one dictionary per catalog row for the given catalog.php URL."""
    params = {'json': 1, 'output': 'json', 'callback': 'foo', 'verbose': 1}
    req = requests.get(url, params=params, cookies=COOKIES)
    # The parsing assumes one row per line: drop the 'foo' callback name,
    # any trailing comma, and the closing '])' before decoding each row.
    # Lines of 20 characters or fewer (such as the opening wrapper) are skipped.
    rows = [json.loads(x.strip(',').replace('])', ''))
            for x in req.text.replace('foo', '').split('\n') if len(x) > 20]
    return [dict(zip(FIELDS, row)) for row in rows]


# USAGE:
URL = 'http://www-tracey.us.archive.org/catalog.php?mode=users'
print(get_rows(URL))
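
The line-by-line parsing above removes every occurrence of 'foo' from the response, so an identifier that happens to contain "foo" would be altered. A minimal sketch of an alternative follows, assuming the response is a single JSONP payload shaped like foo([[...], [...]]); that shape is inferred from the parsing code, not confirmed against catalog.php. The function name parse_jsonp_rows and the sample data are hypothetical.

```python
import json

# Column order taken from the script above.
FIELDS = ('identifier', 'srvr', 'cmd', 'time', 'submitter', 'args', 'extra')


def parse_jsonp_rows(text, callback='foo'):
    """Decode a JSONP payload shaped like foo([[...], [...]]) into dicts.

    Assumption: the whole body is one callback(...) wrapper around a JSON
    array of rows. Only the wrapper is stripped, so row data is untouched.
    """
    body = text.strip().rstrip(';')
    prefix = callback + '('
    if not (body.startswith(prefix) and body.endswith(')')):
        raise ValueError('unexpected JSONP wrapper')
    rows = json.loads(body[len(prefix):-1])
    return [dict(zip(FIELDS, row)) for row in rows]


# Made-up sample data, for illustration only.
SAMPLE = (
    'foo([\n'
    '["item1","srv1","derive.php","0","user@example.com","a=1","x"],\n'
    '["foo_item","srv2","archive.php","1","user@example.com","b=2","y"]])'
)
rows = parse_jsonp_rows(SAMPLE)
print(rows[1]['identifier'])
```

With this approach get_rows could pass req.text straight to parse_jsonp_rows instead of splitting on newlines, at the cost of failing loudly if the wrapper differs from the assumed shape.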