@rubinsztajn
Last active December 21, 2015 00:58
Simple script to download MARCXML records and PDFs from the Internet Archive
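#!/usr/bin/env python
# Usage: pass a text file with one Internet Archive identifier per line
# as the first argument. Requires wget on the PATH.
import os, sys

with open(sys.argv[1]) as ids:
    for line in ids:
        item_id = line.strip()
        if not item_id:
            continue  # skip blank lines
        # Build one wget command each for the PDF and the MARCXML record.
        pdf_cmd = "wget http://archive.org/download/%s/%s.pdf" % (item_id, item_id)
        marc_cmd = "wget http://archive.org/download/%s/%s_archive_marc.xml" % (item_id, item_id)
        os.system(pdf_cmd)
        os.system(marc_cmd)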
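
A possible variant, sketched here as an assumption rather than as part of the original gist: the same two URL patterns, but with wget invoked through an argument list so the identifier is never interpreted by the shell. The names `BASE_URL`, `build_urls`, `read_ids`, and `download_all` are hypothetical helpers introduced only for this illustration.

```python
import subprocess

# Same host and path prefix that the script above uses.
BASE_URL = "http://archive.org/download"


def build_urls(item_id):
    """Return the PDF and MARCXML URLs for one identifier."""
    return [
        "%s/%s/%s.pdf" % (BASE_URL, item_id, item_id),
        "%s/%s/%s_archive_marc.xml" % (BASE_URL, item_id, item_id),
    ]


def read_ids(lines):
    """Strip whitespace from each line and drop lines that are empty."""
    return [line.strip() for line in lines if line.strip()]


def download_all(path):
    """Run wget once per URL for every identifier listed in the file at path."""
    with open(path) as handle:
        for item_id in read_ids(handle):
            for url in build_urls(item_id):
                # A list of arguments bypasses the shell, so unusual
                # characters in an identifier cannot alter the command.
                subprocess.call(["wget", url])
```

Calling `download_all("ids.txt")` would fetch both files for each identifier listed in a file named `ids.txt` (a placeholder name). Keeping URL construction in its own function makes it possible to check the patterns without touching the network.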