@amferraz
Last active December 23, 2015 22:29
An example of ItemPipeline with FollowAllSpider
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import signals
from testspiders.spiders.followall import FollowAllSpider


class MyPipeline(object):
    def process_item(self, item, spider):
        print(item['url'])
        # A pipeline must return the item (or raise DropItem) so that
        # any later pipeline components still receive it.
        return item


spider = FollowAllSpider(domain='scrapinghub.com')

# take a look
# https://scrapy.readthedocs.org/en/latest/topics/item-pipeline.html?#activating-an-item-pipeline-component
# The key is the import path of the pipeline class; 'main.MyPipeline'
# presumably means this file is saved as main.py.
settings = Settings(
    {
        'ITEM_PIPELINES': {
            'main.MyPipeline': 1
        }
    }
)

crawler = Crawler(settings)
# Stop the Twisted reactor once the spider has finished.
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
crawler.stats
reactor.run()
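
The pipeline contract used above can be checked without starting a crawl. The following is only a minimal sketch: `UrlCollectorPipeline` is a hypothetical class that is not part of the gist, plain dicts stand in for scraped items, and `None` stands in for the spider. It illustrates that `process_item` receives each item and should return it so later pipeline components still see it.

```python
class UrlCollectorPipeline(object):
    """Hypothetical pipeline: records each item's URL and passes the item on."""

    def __init__(self):
        self.urls = []

    def process_item(self, item, spider):
        self.urls.append(item['url'])
        # Returning the item hands it to the next pipeline component.
        return item


# Exercise the pipeline by hand. The dicts below are made-up items and
# None replaces the spider, which this pipeline never touches.
pipeline = UrlCollectorPipeline()
items = [{'url': 'http://example.com/a'}, {'url': 'http://example.com/b'}]
passed_on = [pipeline.process_item(item, None) for item in items]
print(pipeline.urls)
```

Collecting into a list instead of printing makes the behaviour easy to assert on in a unit test; in a real crawl Scrapy creates the pipeline instance and calls `process_item` itself.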