@justinwyer
Created March 3, 2013 21:02
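import codecs
import multiprocessing

import requests


def process_url(page):
    """Download one listing page and save it under eastern-cape/."""
    print(page)
    url = "http://www.southafricaschools.co.za/schools2?page=%d&tid=13&tid_1=All&tid_2=All&title=" % page
    r = requests.get(url)
    # The eastern-cape/ directory must already exist.
    with codecs.open('eastern-cape/page-%d' % page, 'w', 'utf-8') as out:
        out.write(r.text)


if __name__ == '__main__':
    # The guard keeps worker processes from re-running the pool setup on import.
    pool = multiprocessing.Pool(16)  # how much parallelism?
    pool.map(process_url, range(304))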