@rgegriff
Created July 27, 2012 02:08
A script to download all of the epubs from a calibre library
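#!/usr/bin/env python3
# Usage: python3 scrape.py "http://some.url.nyud.net/" [resume_index]
import re
import shutil
import sys
import time
import urllib.parse
import urllib.request

url = sys.argv[1]
search_str = "mobile?search=;order=descending;sort=author;num=10000000;start=1"

# Fetch the mobile listing page and keep every href that ends in "epub".
page = urllib.request.urlopen(url + search_str).read().decode("utf-8", "replace")
epub_list = [l for l in re.findall('href=["\'](.[^"\']+)["\']', page, re.I)
             if l.endswith("epub")]

# An optional second argument is the zero-based index to resume from.
if len(sys.argv) == 3:
    print("RESUMING AT #" + sys.argv[2])
    count = int(sys.argv[2])
else:
    count = 0

# total is the index of the last book, so the loop must include it (<=).
total = len(epub_list) - 1
while count <= total:
    # After a network error the inner loop restarts at the book that failed.
    for book_url in epub_list[count:]:
        book_fname = urllib.parse.unquote(book_url.split('/')[-1])
        print("Downloading ", count, "/", total, book_fname)
        try:
            start = time.time()
            # The with block closes the file; no explicit close() is needed.
            with open(book_fname, "wb") as f:
                shutil.copyfileobj(urllib.request.urlopen(url + book_url), f)
            end = time.time()
            print("Done!........ ", end - start)
            count += 1
        except IOError:
            print("Network Error... Retrying.")
            break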
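
The link-extraction and file-naming steps of the script can be tried in isolation, without touching the network. The following is a minimal sketch written for Python 3; the helper names `extract_epub_links` and `local_filename`, and the sample HTML string, are hypothetical and only for illustration. It reuses the script's regular expression and its "last path segment, URL-decoded" naming rule.

```python
import re
from urllib.parse import unquote

# Same pattern as the script: capture the value of every href attribute.
HREF_RE = re.compile(r'href=["\'](.[^"\']+)["\']', re.I)


def extract_epub_links(html):
    """Return every href value in html that ends with 'epub' (hypothetical helper)."""
    return [link for link in HREF_RE.findall(html) if link.endswith("epub")]


def local_filename(book_url):
    """Use the URL-decoded last path segment as the local file name (hypothetical helper)."""
    return unquote(book_url.split("/")[-1])


if __name__ == "__main__":
    # Made-up listing fragment; a real calibre page may be laid out differently.
    sample = ('<a href="/get/epub/Some%20Book_12.epub">epub</a> '
              '<a href="/get/mobi/Other_13.mobi">mobi</a>')
    for link in extract_epub_links(sample):
        print(link, "->", local_filename(link))
```

Keeping these two steps as small functions makes it easy to check what the regular expression matches before starting a long download run.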