@nahali
Last active December 22, 2015 06:59
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.exceptions import DropItem
from scrapy.http import Request


class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        item['image_paths'] = image_paths
        return item

    def process_item(self, item, spider):
        if spider.name == 'vinNico3':
            print "inside process_item"

#class Projetvinnicolas3Pipeline(object):
#    def process_item(self, item, spider):
#        return item
@redapple

redapple commented Sep 6, 2013

You have to call the superclass's `process_item` method and return its result;
otherwise your override returns nothing and you are discarding all items.

    def process_item(self, item, spider):
        if spider.name == 'vinNico3':
            print "inside process_item"
        return super(MyImagesPipeline, self).process_item(item, spider)

You can also simply not define `process_item` at all, in which case the inherited implementation is used.
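
To illustrate why the return value matters, here is a minimal sketch that does not use Scrapy at all. `BasePipeline`, `BrokenPipeline`, `FixedPipeline`, and `InheritingPipeline` are hypothetical stand-ins invented for this example, and the `processed` key is only a marker; the real `ImagesPipeline.process_item` does considerably more work, but the inheritance behaviour is the same.

```python
# Hypothetical stand-in for a pipeline base class; NOT Scrapy's ImagesPipeline.
class BasePipeline(object):
    def process_item(self, item, spider):
        # Pretend the base class does the real work and hands the item on.
        item = dict(item)
        item["processed"] = True
        return item


class BrokenPipeline(BasePipeline):
    def process_item(self, item, spider):
        # Overrides the base method but never returns anything,
        # so the caller receives None and the item is lost.
        print("inside process_item")


class FixedPipeline(BasePipeline):
    def process_item(self, item, spider):
        print("inside process_item")
        # Delegate to the base class and return its result.
        return super(FixedPipeline, self).process_item(item, spider)


class InheritingPipeline(BasePipeline):
    # No process_item defined: the base implementation is used as-is.
    pass


item = {"image_urls": ["http://example.com/a.jpg"]}
broken_result = BrokenPipeline().process_item(item, spider=None)
fixed_result = FixedPipeline().process_item(item, spider=None)
inherited_result = InheritingPipeline().process_item(item, spider=None)
```

Here `broken_result` is `None`, while `fixed_result` and `inherited_result` both carry the item produced by the base class, which mirrors the two fixes suggested above.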
