@starenka
Created October 5, 2013 11:27
Fetches all images from http://historje.tumblr.com and builds a PDF from them.
#!/usr/bin/env python
from pyquery import PyQuery as pq
import requests

bigs = []
for y in (2012, 2013):
    for m in range(1, 13):
        try:
            # Fetch the archive page for the given year and month.
            doc = pq(requests.get('http://historje.tumblr.com/archive/%d/%d' % (y, m)).content)
            imgs = doc.find('div.has_imageurl')
            # Swap the 500px image URL for the 1280px variant.
            bigs += [pq(x).attr('data-imageurl').replace('_500.', '_1280.') for x in imgs]
        except Exception as e:
            print(e)

# Write the de-duplicated image URLs, one per line.
with open('links', 'w') as fh:
    fh.write('\n'.join(set(bigs)))

# Print the shell steps that download the images and build the PDF.
print("""
./historje.py
mkdir img
cd img
cat ../links | uniq | xargs -P 10 -r -n 1 wget -nv
convert * do_$(date +"%d-%m-%y").pdf
cd -
""")
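
The script rewrites each URL with a plain string replace inside one broad try/except, so a single element without a `data-imageurl` value would abort the whole month. A minimal sketch of a more defensive rewrite step follows. The helper names `to_large` and `collect_large` are hypothetical and not part of the original script, and the sketch assumes the size marker always sits directly before the file extension.

```python
import re

# Hypothetical helpers, not part of the original script.
# Assumption: the size marker "_500" appears right before the file extension.
_SIZE_SUFFIX = re.compile(r'_500(\.[A-Za-z0-9]+)$')


def to_large(url):
    """Return the 1280px variant of an image URL, or None if url is missing."""
    if not url:
        return None
    # Only the trailing "_500.<ext>" is rewritten; other text is left alone.
    return _SIZE_SUFFIX.sub(r'_1280\1', url)


def collect_large(urls):
    """Rewrite and de-duplicate URLs, keeping first-seen order."""
    seen, out = set(), []
    for url in urls:
        big = to_large(url)
        if big and big not in seen:
            seen.add(big)
            out.append(big)
    return out
```

Under these assumptions, the list passed to `'\n'.join(...)` could be built with `collect_large(pq(x).attr('data-imageurl') for x in imgs)`, which skips missing values instead of raising.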