@redapple
redapple / console.txt
Created July 26, 2016 09:04
StackOverflow #38577374
$ scrapy runspider sitemapspider.py
2016-07-26 10:41:29 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-07-26 10:41:29 [scrapy] INFO: Overridden settings: {}
2016-07-26 10:41:32 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
2016-07-26 10:41:34 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
redapple / console.log
Created June 29, 2016 09:52
Scrapy spider returning multiple requests in a callback after some delay. Based on https://gist.github.com/dangra/2781744
$ scrapy runspider delayspider.py
2016-06-29 11:52:19 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-06-29 11:52:19 [scrapy] INFO: Overridden settings: {}
2016-06-29 11:52:19 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
2016-06-29 11:52:19 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
redapple / in_container_console_logs.txt
Created June 17, 2016 11:21
Installing scrapy 1.1 on Ubuntu 16.04 on Python 3, using virtualenvwrapper
# --- install system dependencies (sudo apt-get install)
scrapyuser@8fb08da8f18b:/$ sudo apt-get install python3 python-dev python3-dev \
> build-essential libssl-dev libffi-dev \
> libxml2-dev libxslt-dev \
> python-pip
[sudo] password for scrapyuser:
Reading package lists... Done
Building dependency tree
Reading state information... Done
redapple / Dockerfile
Created June 17, 2016 10:44
Installing scrapy 1.1 with Python 2 on Ubuntu 16.04, installing system dependencies and using pip in user scheme
# adapted from http://stackoverflow.com/questions/25845538/using-sudo-inside-a-docker-container
FROM ubuntu:16.04
RUN apt-get update
RUN apt-get -y install sudo
RUN useradd -m scrapyuser && echo "scrapyuser:scrapypwd" | chpasswd && adduser scrapyuser sudo
USER scrapyuser
CMD /bin/bash
redapple / Dockerfile
Last active May 4, 2016 11:00
Dockerized install of scrapy 1.1 RC4 on CentOS 7
FROM centos:centos7
RUN yum update -y
# Install Python and dev headers
RUN yum install -y \
python-devel
# Install cryptography
# https://cryptography.io/en/latest/installation/#building-cryptography-on-linux
redapple / Dockerfile
Created April 11, 2016 15:56
Scrapy Ubuntu Trusty (14.04) Dockerfile
FROM ubuntu:trusty
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update
# Install Python3 and dev headers
RUN apt-get install -y \
python3 \
python-dev \
redapple / console.txt
Created April 5, 2016 22:17
StackOverflow #36391781 error
$ scrapy crawl httpbin
2016-04-06 00:16:58 [scrapy] INFO: Scrapy 1.1.0rc3 started (bot: mwtest)
2016-04-06 00:16:58 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'mwtest.spiders', 'SPIDER_MODULES': ['mwtest.spiders'], 'BOT_NAME': 'mwtest'}
2016-04-06 00:16:58 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2016-04-06 00:16:58 [py.warnings] WARNING: /home/paul/tmp/mwtest/mwtest/middlewares.py:1: ScrapyDeprecationWarning: Module `scrapy.log` has been deprecated, Scrapy now relies on the builtin Python library for logging. Read the updated logging entry in the documentation to learn more.
from scrapy import log, signals
$ scrapy shell
2016-02-01 12:41:35 [scrapy] INFO: Scrapy 1.1.0dev1 started (bot: scrapybot)
2016-02-01 12:41:35 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-02-01 12:41:35 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2016-02-01 12:41:35 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
$ scrapy shell
2016-01-28 18:21:43 [scrapy] INFO: Scrapy 1.1.0dev1 started (bot: scrapybot)
2016-01-28 18:21:43 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter'}
2016-01-28 18:21:43 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2016-01-28 18:21:44 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
redapple / scrapyshell
Created December 30, 2014 14:25
YouTube js2xml
$ scrapy shell "https://www.youtube.com/watch?v=1EFnX1UkXVU"
/usr/local/lib/python2.7/dist-packages/twisted/internet/_sslverify.py:184: UserWarning: You do not have the service_identity module installed. Please install it from <https://pypi.python.org/pypi/service_identity>. Without the service_identity module and a recent enough pyOpenSSL tosupport it, Twisted can perform only rudimentary TLS client hostnameverification. Many valid certificate/hostname mappings may be rejected.
verifyHostname, VerificationError = _selectVerifyImplementation()
2014-12-30 15:18:08+0100 [scrapy] INFO: Scrapy 0.24.4 started (bot: scrapybot)
2014-12-30 15:18:08+0100 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2014-12-30 15:18:08+0100 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-12-30 15:18:08+0100 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-12-30 15:18:08+0100 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddlewa