@mlafeldt
Created July 2, 2011 18:07
[Python] Find download links to all PragPub magazines
#!/usr/bin/env python
"""
find download links to all PragPub magazines

usage:
  $ ./pragpub_get.py [pdf | html | epub | mobi] > pragpub.lst
  $ wget -c -i pragpub.lst

written by Mathias Lafeldt <mathias.lafeldt@gmail.com>
"""

import sys, re, urllib

if len(sys.argv) > 1:
    ext = sys.argv[1]
else:
    ext = 'pdf'

url = 'http://pragprog.com/magazines'
pattern = re.compile(r'"(%s/download/.+?\.%s)"' % (url, ext), flags=re.IGNORECASE)

page = 1
while True:
    html = urllib.urlopen(url + '?page=' + str(page)).read()
    links = pattern.findall(html)
    if not links:
        break
    for l in links:
        print urllib.urlopen(l).geturl()
    page += 1
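
The script above targets Python 2 (`print` statement, `urllib.urlopen`). As a rough sketch only, the same paginate-and-match loop could look like this in Python 3. The helper names `build_pattern`, `fetch`, and `iter_links` are hypothetical and not part of the original gist, and the final redirect-resolving step (`geturl()`) is left out to keep the example short. It also assumes the pages decode as UTF-8 and that the site still serves the same link layout, which may no longer be true.

```python
import re
import urllib.request

BASE_URL = 'http://pragprog.com/magazines'  # same URL as the original script


def build_pattern(ext, base_url=BASE_URL):
    """Compile a regex matching quoted download links ending in .<ext>."""
    # Assumption: escaping is an addition here. The original interpolates
    # the URL and extension unescaped, so '.' would match any character.
    return re.compile(
        r'"(%s/download/.+?\.%s)"' % (re.escape(base_url), re.escape(ext)),
        flags=re.IGNORECASE,
    )


def fetch(url):
    """Hypothetical helper: return the body of url as text (assumes UTF-8)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode('utf-8', errors='replace')


def iter_links(ext='pdf', base_url=BASE_URL, fetch_page=fetch):
    """Yield download links page by page until a page contains none."""
    pattern = build_pattern(ext, base_url)
    page = 1
    while True:
        links = pattern.findall(fetch_page('%s?page=%d' % (base_url, page)))
        if not links:
            break
        yield from links
        page += 1


if __name__ == '__main__':
    import sys
    ext = sys.argv[1] if len(sys.argv) > 1 else 'pdf'
    for link in iter_links(ext):
        print(link)
```

Passing the page fetcher in as `fetch_page` is a design choice of this sketch: it lets the pagination logic be exercised against canned HTML without touching the network.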