@jbinfo
Last active March 7, 2022 05:49
How to create a Scrapy CSV exporter with a custom delimiter and field order

Create a Scrapy exporter module in the root of your Scrapy project package. Assuming the project is named my_project, the module can be called my_project_csv_item_exporter.py:

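# Note: newer Scrapy releases moved CsvItemExporter to scrapy.exporters
# (from scrapy.exporters import CsvItemExporter), and scrapy.conf has since
# been deprecated; scrapy.utils.project.get_project_settings() is the usual
# replacement for reading settings.
from scrapy.conf import settings
from scrapy.contrib.exporter import CsvItemExporter


class MyProjectCsvItemExporter(CsvItemExporter):

    def __init__(self, *args, **kwargs):
        # Read the delimiter from the project settings, defaulting to a comma.
        delimiter = settings.get('CSV_DELIMITER', ',')
        kwargs['delimiter'] = delimiter

        # Restrict and order the exported columns only when the setting is non-empty.
        fields_to_export = settings.get('FIELDS_TO_EXPORT', [])
        if fields_to_export:
            kwargs['fields_to_export'] = fields_to_export

        super(MyProjectCsvItemExporter, self).__init__(*args, **kwargs)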
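
The constructor above only adjusts the keyword arguments and then delegates to the parent class. A minimal sketch of that pattern without Scrapy, assuming a hypothetical `BaseExporter` stand-in and a plain dict in place of the project settings (neither is a Scrapy API, both exist only for illustration):

```python
import csv
import io


class BaseExporter:
    """Hypothetical stand-in for CsvItemExporter (not a Scrapy class)."""

    def __init__(self, file, fields_to_export=None, **csv_kwargs):
        self.fields_to_export = fields_to_export
        # Remaining keyword arguments (e.g. delimiter) are handed to csv.writer.
        self.writer = csv.writer(file, **csv_kwargs)


# Plain dict standing in for the Scrapy settings object (assumption).
SETTINGS = {"CSV_DELIMITER": ";", "FIELDS_TO_EXPORT": ["id", "name"]}


class ConfiguredExporter(BaseExporter):
    def __init__(self, *args, **kwargs):
        # Inject the delimiter, falling back to a comma.
        kwargs["delimiter"] = SETTINGS.get("CSV_DELIMITER", ",")
        fields = SETTINGS.get("FIELDS_TO_EXPORT", [])
        if fields:
            kwargs["fields_to_export"] = fields
        # Run the parent constructor with the adjusted arguments.
        super().__init__(*args, **kwargs)


buffer = io.StringIO()
exporter = ConfiguredExporter(buffer)
exporter.writer.writerow(["1", "Ada"])
```

The call to `super().__init__` is what lets the subclass reuse all of the parent's setup while changing only the arguments it cares about.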

In settings.py, register this exporter and list the fields to export in the order you want them written, like this:

FEED_EXPORTERS = {
    'csv': 'my_project.my_project_csv_item_exporter.MyProjectCsvItemExporter',
}
FIELDS_TO_EXPORT = [
    'id',
    'name',
    'email',
    'address'
]
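
To illustrate roughly what the field list does to the output, here is a small sketch using only Python's standard `csv` module rather than Scrapy itself. The `item` dict and its values are made up for the example; the point is that the columns follow `FIELDS_TO_EXPORT`, not the order of the item's keys, and that keys missing from the list are left out.

```python
import csv
import io

FIELDS_TO_EXPORT = ["id", "name", "email", "address"]

# Hypothetical scraped item: keys are in a different order and include an extra one.
item = {
    "email": "ada@example.com",
    "name": "Ada",
    "id": 1,
    "address": "London",
    "phone": "n/a",
}

buffer = io.StringIO()
# extrasaction="ignore" drops keys that are not listed in fieldnames.
writer = csv.DictWriter(buffer, fieldnames=FIELDS_TO_EXPORT, extrasaction="ignore")
writer.writeheader()
writer.writerow(item)
csv_text = buffer.getvalue()
```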

You can set the CSV delimiter either in settings.py or on the command line when you run the spider. In settings.py:

CSV_DELIMITER = "\t" # For tab

Or on the command line, using -s to override the setting (-a would pass a spider argument instead):

scrapy crawl my_spider -o output.csv -t csv -s CSV_DELIMITER="\t"
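
One thing to watch when the delimiter comes from the command line: many shells pass "\t" as two characters (a backslash and a t), while Python's `csv` module only accepts a one-character delimiter. A minimal sketch of one way to handle that, assuming a hypothetical helper `normalize_delimiter` that you would call yourself before handing the value to the exporter (it is not part of Scrapy):

```python
import csv
import io


def normalize_delimiter(raw):
    """Turn an escaped delimiter such as backslash + t into the real character."""
    decoded = raw.encode("utf-8").decode("unicode_escape")
    if len(decoded) != 1:
        raise ValueError("CSV delimiter must be a single character, got %r" % raw)
    return decoded


# "\\t" is the two-character string a shell typically passes for "\t".
delimiter = normalize_delimiter("\\t")

buffer = io.StringIO()
csv.writer(buffer, delimiter=delimiter).writerow(["1", "Ada"])
```

Already-valid delimiters such as ";" pass through unchanged, and anything longer than one character is rejected early with a clear error.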

@star-szr

star-szr commented Oct 9, 2014

I just wanted to say thanks for sharing this! This seems much saner than the other approaches I've seen.

@jbinfo
Author

jbinfo commented Jun 1, 2015

@cottser you are welcome :)

@cantgetthatping

Thank you so much @cottser, you saved my day and much more to come. Please enjoy your day.

@cantgetthatping

Thank you so much @jbinfo, you saved my day and much more to come. Please enjoy your day.

@bnussey

bnussey commented Oct 22, 2015

Hey @jbinfo is it possible to use this with a pipeline? Thanks

@dianjuar

Nice work. Is it possible to set custom column names?

@dramoslance

You could just pass fields_to_export, by using self.exporter.fields_to_export to avoid global settings.

@zhorifiandi

Thanks, this really saved me from confusion 🤣

@oussama-ht

thanks :D

@k4mrul

k4mrul commented Jun 18, 2019

Thank you! However I need to change:
from scrapy.contrib.exporter import CsvItemExporter
to
from scrapy.exporters import CsvItemExporter

@TheCodingLady

Thank you! However I need to change:
from scrapy.contrib.exporter import CsvItemExporter
to
from scrapy.exporters import CsvItemExporter

SAME HERE. Also many thanks from me too!
Just a note to you that the delimiter did not work and I had part of one field going into neighbouring fields. So I took out the tab delimiter and it just put in commas. Everything else was good! Happy Coding!

@Josemariabordes

There is a much easier way if you only want to change the comma to another character: pass the parameter delimiter=";" to CsvItemExporter(). It works for me:

CsvItemExporter(file, delimiter = ";")

That's my pipeline.py. Take a look at the spider_opened function; hope that helps:

from scrapy import Request, signals
from scrapy.exporters import CsvItemExporter
from scrapy.pipelines.images import ImagesPipeline

class MercadoPipeline(object):
    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        file = open('%s_items.csv' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = CsvItemExporter(file, delimiter = ";")
        self.exporter.fields_to_export = ['titulo', 'precio']

        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

class MercadoImagenesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        return [Request(x, meta={'image_name': item["image_name"]})
                for x in item.get('image_urls', [])]

    def file_path(self, request, response=None, info=None):
        return '%s.jpg' % request.meta['image_name']

@bdmorin

bdmorin commented Mar 20, 2020

super(MyProjectCsvItemExporter, self).__init__(*args, **kwargs)

What does this do?
