@Ddedalus
Created June 24, 2020 06:45
Capture scrapy crawler results to a Python list
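```python
import scrapy
from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.signalmanager import dispatcher


class MySpider(scrapy.Spider):
    ...  # spider definition elided


# gather the results, see: https://stackoverflow.com/a/40240712
nurseries = []


def crawler_results(signal, sender, item, response, spider):
    # Called once for every item that passes the item pipelines
    nurseries.append(item)


# item_scraped is the documented name; item_passed is a legacy alias of it
dispatcher.connect(crawler_results, signal=signals.item_scraped)

# Run the Spider; start() blocks until crawling is finished,
# after which nurseries holds every scraped item
process = CrawlerProcess()
process.crawl(MySpider)
process.start()
```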