@seanh
Created June 8, 2013 22:34
Scraper for the 130 1600x1200 wallpapers from the [National Geographic Photo Contest 2012](http://ngm.nationalgeographic.com/ngm/photo-contest/2012/); it saves having to click through and download each one manually.
#!/usr/bin/env python2
"""
Scrapes all the 1600x1200 wallpapers from the National Geographic Photo Contest
2012: <http://ngm.nationalgeographic.com/ngm/photo-contest/2012>
This just prints out the URLs; pipe them to a file and then use wget
to download them:
./scrape.py > wallpapers.txt && wget -i wallpapers.txt
Requires Python 2 and Beautiful Soup 4.
"""
from bs4 import BeautifulSoup
import urllib2

wallpapers = []

# One gallery page per week of the contest (weeks 1-13).
weeks = ["http://ngm.nationalgeographic.com/ngm/photo-contest/2012/entries/wallpaper/nature-week-{num}/".format(num=num) for num in range(1, 14)]

for week in weeks:
    soup = BeautifulSoup(urllib2.urlopen(week).read())
    # The 1600x1200 downloads are the <a> tags whose classes are exactly
    # "download wallpaper monitor".
    for wallpaper in [a['href'] for a in soup.find_all('a')
                      if a.get('class') == [u'download', u'wallpaper', u'monitor']]:
        wallpapers.append(wallpaper)

for wallpaper in wallpapers:
    print "http://ngm.nationalgeographic.com" + wallpaper