@bwhite
Created September 23, 2012 10:22
Gevent Crawler Demo
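A toy concurrent web crawler in a few lines: gevent's monkey patching turns requests' blocking HTTP calls into cooperative ones, so 100 greenlets can crawl random links in parallel, starting from two seed URLs.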
from gevent import monkey, spawn, joinall
monkey.patch_all()  # Patch the stdlib so requests' blocking sockets cooperate with gevent
import re, requests, random

# Shared set of every URL seen so far, across all greenlets
URLS = set(['http://umd.edu', 'http://nytimes.com'])


def crawl(urls):
    try:
        while True:
            # Pick a random URL from the current frontier and mark it as seen
            url = random.choice(list(urls))
            URLS.add(url)
            print(url)
            d = requests.get(url).text
            # Extract outgoing links and drop any already-seen URLs
            urls = re.findall('href="(http[^"]+)"', d)
            urls = set(urls) - URLS
    except Exception:
        # On any failure (request error, empty frontier), respawn from the full set
        spawn(crawl, URLS)


# Launch 100 concurrent crawler greenlets and wait on them
joinall([spawn(crawl, URLS) for _ in range(100)])
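As written, the greenlets crawl until the process is interrupted. If you want the demo to stop on its own, gevent's joinall accepts a timeout; a minimal sketch (the 30-second figure is illustrative, not from the original gist):

# Same launch, but stop waiting after 30 seconds so the demo terminates on its own
joinall([spawn(crawl, URLS) for _ in range(100)], timeout=30)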