@rmax
Created September 28, 2016 15:13
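import scrapy

# NOTE: API sketch. scrapy.CrawlerBot is used here as a proposed interface;
# it is not defined anywhere in this snippet.
settings = {}
bot = scrapy.CrawlerBot(name="mybot/1.0", settings=settings)


def follow_links(response):
    # Schedule every link found on the page, then emit one item for the page.
    for link in response.iter_links():
        bot.crawl(link.url, callback=follow_links, referer=response)
    bot.emit({
        "url": response.url,
        "status": response.status,
    })


# Seed the crawl; follow_links is the only callback defined here.
bot.crawl("http://google.com", callback=follow_links)
bot.start()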
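
The snippet above only shows how a bot-style interface would be called; it does not show what `crawl`, `emit`, and `start` do. As a rough illustration, here is a toy, standard-library-only stand-in. Everything in it is an assumption made for this sketch, not Scrapy's API: the `ToyCrawlerBot`, `Response`, and `Link` classes, the injected `fetch` callable, the in-memory `PAGES` map, and the guesses that `crawl` enqueues a URL once, `start` drains the queue in order, and `emit` collects items in a list.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Link:
    url: str


@dataclass
class Response:
    url: str
    status: int
    links: list = field(default_factory=list)

    def iter_links(self):
        return (Link(u) for u in self.links)


class ToyCrawlerBot:
    """Hypothetical stand-in for the CrawlerBot interface used above."""

    def __init__(self, name, fetch, settings=None):
        self.name = name
        self.fetch = fetch              # assumed: callable url -> Response
        self.settings = settings or {}
        self.items = []                 # everything passed to emit()
        self._queue = deque()
        self._seen = set()

    def crawl(self, url, callback, referer=None):
        # Schedule each URL at most once so link cycles terminate.
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append((url, callback))

    def emit(self, item):
        self.items.append(item)

    def start(self):
        # Drain the queue in FIFO order, handing each response to its callback.
        while self._queue:
            url, callback = self._queue.popleft()
            callback(self.fetch(url))


# Two fake pages that link to each other, so no network access is needed.
PAGES = {
    "http://a.example/": ["http://b.example/"],
    "http://b.example/": ["http://a.example/"],
}


def fake_fetch(url):
    return Response(url=url, status=200, links=PAGES.get(url, []))


bot = ToyCrawlerBot(name="mybot/1.0", fetch=fake_fetch)


def follow_links(response):
    for link in response.iter_links():
        bot.crawl(link.url, callback=follow_links, referer=response)
    bot.emit({"url": response.url, "status": response.status})


bot.crawl("http://a.example/", callback=follow_links)
bot.start()
```

After `bot.start()` returns, `bot.items` holds one dict per visited page. Injecting `fetch` keeps the sketch testable offline; a real crawler would also need politeness delays, error handling, and concurrency, none of which are modelled here.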