@maurobaraldi
Forked from anonymous/gist:dc9c249318f3f92d716b
Created December 29, 2015 16:56
Download math books from Springer
from urllib2 import urlopen
from re import findall, search

base_url = "http://link.springer.com"

# Collect the link to every book listed on search-result pages 1 to 12.
links = []
for i in xrange(1, 13):
    index = base_url + '/search/page/%d?facet-series="136"&facet-content-type="Book"&showAll=false' % i
    links.extend(findall('<a class="title" href="(.*?)"', urlopen(index).read()))

# Visit each book page, read its title and PDF link, and save the PDF.
for link in links:
    page = urlopen(base_url + link).read()
    name = search('<h1 id="title">(.*?)<', page).group(1)
    pdf = search('<a id="toc-download-book-pdf.*?href="(.*?)"', page).group(1)
    print 'Downloading', name
    # Binary mode keeps the PDF bytes intact on every platform.
    with open(name + '.pdf', 'wb') as book:
        book.write(urlopen(base_url + pdf).read())
print 'Done'
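
The parsing steps of the script can be pulled apart from the network calls so they can be checked against saved HTML. What follows is only a minimal sketch, written for Python 3 (the script itself targets Python 2). The function names `search_page_url`, `book_links`, `book_details`, and `safe_filename` are hypothetical. The regular expressions are the ones from the script, so they assume the page markup it was written against. `safe_filename` is an added assumption: the script uses the title as the file name unchanged, which would probably fail for a title containing a path separator.

```python
import re

BASE_URL = "http://link.springer.com"

# Patterns taken from the script; they assume the markup it was written against.
TITLE_LINK = re.compile(r'<a class="title" href="(.*?)"')
BOOK_TITLE = re.compile(r'<h1 id="title">(.*?)<')
PDF_LINK = re.compile(r'<a id="toc-download-book-pdf.*?href="(.*?)"')


def search_page_url(page_number):
    """Build the URL of one search-result page (hypothetical helper)."""
    query = '/search/page/%d?facet-series="136"&facet-content-type="Book"&showAll=false'
    return BASE_URL + query % page_number


def book_links(search_html):
    """Return the relative link of every book listed on a search page."""
    return TITLE_LINK.findall(search_html)


def book_details(book_html):
    """Return (title, pdf_path), or None when either one is missing."""
    title = BOOK_TITLE.search(book_html)
    pdf = PDF_LINK.search(book_html)
    if title is None or pdf is None:
        return None
    return title.group(1).strip(), pdf.group(1)


def safe_filename(title):
    """Replace path separators so the title can be used as a file name."""
    return re.sub(r'[\\/]', '_', title) + '.pdf'
```

Returning `None` from `book_details` lets a caller skip a page with no matching title or download link instead of stopping on an `AttributeError`.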