Installing Scrapy 1.1 on Ubuntu 16.04 with Python 3, using virtualenvwrapper
# If bash cannot find pip right after installing it, clear bash's cached command locations:
hash -r
1. sudo apt-get install python3 python-dev python3-dev build-essential libssl-dev libffi-dev libxml2-dev libxslt-dev python3-pip
2. sudo pip3 install virtualenvwrapper
3. workon [try this command in the terminal; if it does not work, go to step 4]
4. source /usr/local/bin/virtualenvwrapper.sh or source ~/.local/bin/virtualenvwrapper.sh, depending on where pip installed the script (to load it automatically in every new shell, see the snippet after this list)
5. workon [try the command again; it should now work]
6. mkvirtualenv --python=python3 scrapy.py3 # Create a Python 3 virtual environment for the Scrapy project
7. pip3 install scrapy
8. scrapy
9. scrapy version -v
10. scrapy shell http://scrapy.org
11. deactivate
12. rmvirtualenv scrapy.py3 # Delete the virtual environment once it is no longer needed
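
# --- optional: load virtualenvwrapper automatically in new shells ---
# Append these two lines to ~/.bashrc (a minimal sketch, assuming the script is
# in /usr/local/bin; use ~/.local/bin if that is where pip put it):

export WORKON_HOME=$HOME/.virtualenvs        # default home for the environments
source /usr/local/bin/virtualenvwrapper.sh

# then reload the current shell:
$ source ~/.bashrc
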
# 1. --- install system dependencies (sudo apt-get install)

$ sudo apt-get install python3 python-dev python3-dev \
>     build-essential libssl-dev libffi-dev \
>     libxml2-dev libxslt-dev \
>     python-pip
[sudo] password for scrapyuser: 
Reading package lists... Done


# 2. --- install virtualenvwrapper (sudo pip install)
# see https://virtualenvwrapper.readthedocs.io/en/latest/install.html
# also check http://roundhere.net/journal/virtualenv-ubuntu-12-10/
$ sudo pip install virtualenvwrapper

Installing collected packages: virtualenv-clone, pbr, six, stevedore, virtualenv, virtualenvwrapper
  Running setup.py install for virtualenv-clone ... done
Successfully installed pbr-1.10.0 six-1.10.0 stevedore-1.15.0 virtualenv-15.0.2 virtualenv-clone-0.2.6 virtualenvwrapper-4.7.1
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.



$ workon
bash: workon: command not found

# 3. --- need to load startup file
# see https://virtualenvwrapper.readthedocs.io/en/latest/install.html#shell-startup-file
$ source /usr/local/bin/virtualenvwrapper.sh

$ workon 


# 4. --- create a Python 3 virtual environment
# also see https://virtualenv.pypa.io/en/stable/reference/#virtualenv-command for options

$ mkvirtualenv --python=python3 scrapy.py3
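
# A quick sanity check that the fresh environment really runs Python 3
# (a small sketch; versions vary, output not shown):
(scrapy.py3) $ which python     # should point into ~/.virtualenvs/scrapy.py3/bin
(scrapy.py3) $ python --version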

# 5. --- install scrapy in the virtualenv
$ pip install scrapy

# 6. --- testing scrapy commands
$ scrapy

Scrapy 1.1.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  commands      
  fetch         Fetch a URL using the Scrapy downloader
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

Use "scrapy <command> -h" to see more info about a command


# 7. --- check that you're running Scrapy with Python 3
(scrapy.py3) scrapyuser@8fb08da8f18b:/$ scrapy version -v
Scrapy    : 1.1.0
lxml      : 3.6.0.0
libxml2   : 2.9.3
Twisted   : 16.2.0
Python    : 3.5.1+ (default, Mar 30 2016, 22:46:26) - [GCC 5.3.1 20160330]
pyOpenSSL : 16.0.0 (OpenSSL 1.0.2g-fips  1 Mar 2016)
Platform  : Linux-4.4.0-24-generic-x86_64-with-Ubuntu-16.04-xenial


# 8. --- test scrapy shell
(scrapy.py3) scrapyuser@8fb08da8f18b:/$ scrapy shell http://scrapy.org
2016-06-17 11:05:49 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-06-17 11:05:49 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-06-17 11:05:49 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats']
2016-06-17 11:05:50 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-06-17 11:05:50 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-06-17 11:05:50 [scrapy] INFO: Enabled item pipelines:
[]
2016-06-17 11:05:50 [scrapy] INFO: Spider opened
2016-06-17 11:05:50 [scrapy] DEBUG: Crawled (200) <GET http://scrapy.org> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f5d174d65c0>
[s]   item       {}
[s]   request    <GET http://scrapy.org>
[s]   response   <200 http://scrapy.org>
[s]   settings   <scrapy.settings.Settings object at 0x7f5d1b7b1390>
[s]   spider     <DefaultSpider 'default' at 0x7f5d166f2a90>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
>>> response.xpath('//h1')
[]
>>> response.xpath('//title').extract_first()
'<title>Scrapy | A Fast and Powerful Scraping and Web Crawling Framework</title>'
>>> 
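
# 9. --- bonus: the same extraction as a self-contained spider
# A minimal sketch (hypothetical file name title_spider.py); it reuses the XPath
# tried in the shell above and runs via runspider, so no project is needed:

import scrapy

class TitleSpider(scrapy.Spider):
    name = 'title'
    start_urls = ['http://scrapy.org']

    def parse(self, response):
        # //title/text() selects just the text node, without the <title> tags
        yield {'title': response.xpath('//title/text()').extract_first()}

# Run it and export the scraped item as JSON:
(scrapy.py3) $ scrapy runspider title_spider.py -o titles.json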