@WooodHead
Forked from samiujan/scrapy-installation.md
Created August 4, 2019 05:53
How to install Scrapy on Ubuntu

Scrapy is the web-scraper's scraper: it handles typical issues like distributed, asynchronous crawling, retrying during downtime, throttling download speeds, pagination, and image downloads, generates beautiful logs, and does much, much more.

You need a few system packages to run Scrapy on an Ubuntu/Debian machine (I used a cloud-based Ubuntu 14.04.4 LTS).

The steps (and some recommendations) follow.

Everything below was executed on a vanilla DigitalOcean Ubuntu droplet (5 USD per month, 512 MB RAM). I feel this is sufficient to run a Scrapy crawler at approximately 1 HTTP request per second (with auto-throttle and delays turned on).

sudo apt-get update
sudo apt-get install -y build-essential autoconf libtool pkg-config python-opengl python-imaging python-pyrex python-pyside.qtopengl idle-python2.7 qt4-dev-tools qt4-designer libqtgui4 libqtcore4 libqt4-xml libqt4-test libqt4-script libqt4-network libqt4-dbus python-qt4 python-qt4-gl libgle3 python-dev libffi-dev libssl-dev libxml2-dev libxslt1-dev python-pip libjpeg-dev

I would recommend using virtualenv to keep the system Python installation pure

pip install virtualenv

virtualenv venv

source venv/bin/activate

pip install pillow


Now install Scrapy using pip (the version packaged for Ubuntu is outdated):

pip install scrapy
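
Before moving on, it can help to confirm that the install landed inside the virtualenv rather than in the system Python. The sketch below is one way to check; `have_cmd` is a hypothetical helper name introduced here, and it assumes the `activate` script has set `VIRTUAL_ENV` as virtualenv normally does.

```shell
# have_cmd is a hypothetical helper: returns 0 if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# The activate script sets VIRTUAL_ENV; an empty value means no virtualenv is active.
if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "warning: no virtualenv is active"
fi

if have_cmd scrapy; then
    scrapy version    # prints the installed Scrapy version
else
    echo "scrapy not found on PATH"
fi
```

If the last branch fires, re-run `source venv/bin/activate` and try again.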

(Scrapy.org also lists an installation process for Ubuntu, but I ran into quite a few problems with it.)

Create or clone a project, go to the project root and run:

scrapy crawl <spider_name> -o <filename>.json
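
To stay near the roughly 1 request per second mentioned earlier, the throttle settings can also be passed on the command line for a single run. This is a minimal sketch, not the only way to do it: the spider name `example`, the output file `items.json`, and the helper `build_crawl_cmd` are all assumptions for illustration. `-s NAME=VALUE` overrides one Scrapy setting per flag.

```shell
# Hypothetical names; replace with your own spider and output file.
SPIDER="example"
OUTPUT="items.json"

# Build the crawl command with auto-throttle on and a 1 second download delay.
build_crawl_cmd() {
    echo "scrapy crawl $1 -o $2 -s AUTOTHROTTLE_ENABLED=True -s DOWNLOAD_DELAY=1"
}

CRAWL_CMD=$(build_crawl_cmd "$SPIDER" "$OUTPUT")
echo "$CRAWL_CMD"

# Run it only if scrapy is on PATH and we are in a project root (scrapy.cfg present).
if command -v scrapy >/dev/null 2>&1 && [ -f scrapy.cfg ]; then
    $CRAWL_CMD
fi
```

For anything longer-lived, the same setting names can go in the project's `settings.py` instead of being repeated on every run.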

If you have written one or two web crawlers or scrapers before, the best way to learn Scrapy is through the official Scrapy tutorial.
