@stevenpollack
Last active April 25, 2016 14:13
Perform an async crawl of google.com/movies when we're not sure how many pages we need to crawl...
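# Assumes Tornado's generator-based coroutines; next_page_link, async_fetch,
# fetch_url and parse are helpers defined elsewhere.
from tornado.gen import coroutine, Return

def crawl(self, current_page):
    # modifies various attributes of self depending on the
    # html in current_page and returns nothing.
    pass

@coroutine
def coro(current_page):
    # resolves to the body of the "next" page, or None if there is none
    next_page_url = next_page_link(current_page)
    if next_page_url is None:
        raise Return(None)
    response = yield async_fetch(next_page_url)
    raise Return(response.body)

@coroutine
def crawl_all(self):
    # driver (name introduced here): start fetching the next page, parse the
    # current one while that fetch is in flight, then wait for the result
    current_page = fetch_url("http://google.com/movies?near=Berlin")
    while current_page:
        next_page_future = coro(current_page)
        parse(self, current_page)
        current_page = yield next_page_future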
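
The same idea (start fetching the next page, parse the current one while that request is in flight, and stop when a page has no "next" link) can be sketched with the standard-library `asyncio` instead of Tornado. This is only an illustration under stated assumptions: `FAKE_PAGES`, `fetch` and the upper-casing "parse" step are hypothetical stand-ins for the real HTTP request and parser, not part of the original gist.

```python
import asyncio

# Hypothetical in-memory "site": each URL maps to (body, next URL or None).
FAKE_PAGES = {
    "page1": ("movies A", "page2"),
    "page2": ("movies B", "page3"),
    "page3": ("movies C", None),
}

async def fetch(url):
    # Stand-in for a non-blocking HTTP request.
    await asyncio.sleep(0)
    return FAKE_PAGES[url]

async def crawl(start_url):
    parsed = []
    body, next_url = await fetch(start_url)
    while True:
        # Schedule the next fetch before parsing so the two can overlap.
        next_task = asyncio.ensure_future(fetch(next_url)) if next_url else None
        parsed.append(body.upper())  # stand-in for parse(self, current_page)
        if next_task is None:
            break  # no "next" link: this was the last page
        body, next_url = await next_task
    return parsed

results = asyncio.run(crawl("page1"))
print(results)
```

The loop never needs to know the page count up front: each iteration discovers whether another page exists from the page it just fetched.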