@dangra
Created June 22, 2016 14:07
$ docker run -it --rm scrapinghub/scrapinghub-stack-hworker scrapy fetch https://dk.trustpilot.com/review/www.telia.dk
[sudo] password for daniel:
Unable to find image 'scrapinghub/scrapinghub-stack-hworker:latest' locally
latest: Pulling from scrapinghub/scrapinghub-stack-hworker
4edf76921243: Already exists
044c0d9e0cd9: Already exists
331fbd6c3dec: Already exists
8f76788f1cb3: Already exists
a3ed95caeb02: Already exists
5621025737cb: Already exists
0c6148c6d8f8: Already exists
8d183022f597: Already exists
4c63841c88b5: Already exists
93cf7ad7b15e: Already exists
5a36e6dd55b0: Already exists
29f6da5d1090: Already exists
8e96e8f9d540: Already exists
8ef7c8d77ef8: Already exists
0d100e2fca6f: Already exists
694dd37ae0bc: Already exists
a5eae757e960: Already exists
Digest: sha256:037841d944a7523092eac4b0e5413d4c668b51729f51614b438dc5eab94619e1
Status: Downloaded newer image for scrapinghub/scrapinghub-stack-hworker:latest
2016-06-22 14:03:40 [scrapy] INFO: Scrapy 1.0.5 started (bot: scrapybot)
2016-06-22 14:03:40 [scrapy] INFO: Optional features available: ssl, http11, boto
2016-06-22 14:03:40 [scrapy] INFO: Overridden settings: {}
2016-06-22 14:03:40 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2016-06-22 14:03:40 [boto] DEBUG: Retrieving credentials from metadata server.
2016-06-22 14:03:41 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>
2016-06-22 14:03:41 [boto] ERROR: Unable to read instance data, giving up
2016-06-22 14:03:41 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-06-22 14:03:41 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-06-22 14:03:41 [scrapy] INFO: Enabled item pipelines:
2016-06-22 14:03:41 [scrapy] INFO: Spider opened
2016-06-22 14:03:41 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-22 14:03:41 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-06-22 14:03:41 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 1 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2016-06-22 14:03:41 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 2 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2016-06-22 14:03:41 [scrapy] DEBUG: Gave up retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 3 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2016-06-22 14:03:41 [scrapy] ERROR: Error downloading <GET https://dk.trustpilot.com/review/www.telia.dk>: [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2016-06-22 14:03:41 [scrapy] INFO: Closing spider (finished)
2016-06-22 14:03:41 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/scrapy.xlib.tx._newclient.ResponseNeverReceived': 3,
 'downloader/request_bytes': 702,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 3,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 6, 22, 14, 3, 41, 871462),
 'log_count/DEBUG': 5,
 'log_count/ERROR': 3,
 'log_count/INFO': 7,
 'scheduler/dequeued': 3,
 'scheduler/dequeued/memory': 3,
 'scheduler/enqueued': 3,
 'scheduler/enqueued/memory': 3,
 'start_time': datetime.datetime(2016, 6, 22, 14, 3, 41, 588967)}
2016-06-22 14:03:41 [scrapy] INFO: Spider closed (finished)
$ docker run -it --rm scrapinghub/scrapinghub-stack-scrapy:1.0 scrapy fetch https://dk.trustpilot.com/review/www.telia.dk
Unable to find image 'scrapinghub/scrapinghub-stack-scrapy:1.0' locally
1.0: Pulling from scrapinghub/scrapinghub-stack-scrapy
efd26ecc9548: Pull complete
a3ed95caeb02: Pull complete
d1784d73276e: Pull complete
72e581645fc3: Pull complete
9709ddcc4d24: Pull complete
2d600f0ec235: Pull complete
def2162116ce: Pull complete
498b2ad5725d: Pull complete
406b2ac41563: Pull complete
6358b959e7b0: Pull complete
d6b81cbbf698: Pull complete
Digest: sha256:97218a7558ef42ff2d01f7d9bca2e1abdad1315c49264de6beec82a171cd8d7a
Status: Downloaded newer image for scrapinghub/scrapinghub-stack-scrapy:1.0
2016-06-22 14:05:22 [scrapy] INFO: Scrapy 1.0.5 started (bot: scrapybot)
2016-06-22 14:05:22 [scrapy] INFO: Optional features available: ssl, http11
2016-06-22 14:05:22 [scrapy] INFO: Overridden settings: {}
2016-06-22 14:05:22 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2016-06-22 14:05:22 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-06-22 14:05:22 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-06-22 14:05:22 [scrapy] INFO: Enabled item pipelines:
2016-06-22 14:05:22 [scrapy] INFO: Spider opened
2016-06-22 14:05:22 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-22 14:05:22 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-06-22 14:05:22 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 1 times): 500 Internal Server Error
2016-06-22 14:05:22 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 2 times): 500 Internal Server Error
2016-06-22 14:05:22 [scrapy] DEBUG: Gave up retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 3 times): 500 Internal Server Error
2016-06-22 14:05:22 [scrapy] DEBUG: Crawled (500) <GET https://dk.trustpilot.com/review/www.telia.dk> (referer: None)
<!DOCTYPE html><html lang="da-dk"><head>
...
<title>Trustpilot</title><meta charset="UTF-8" /><script type="text/javascript">window.NREUM||(NREUM={});NREUM.info = {"beacon":"bam.nr-data.net","errorBeacon":"bam.nr-data.net","licenseKey":"5596035ebd",
...
$ docker run -it --rm scrapinghub/scrapinghub-stack-scrapy:1.1 scrapy fetch https://dk.trustpilot.com/review/www.telia.dk
Unable to find image 'scrapinghub/scrapinghub-stack-scrapy:1.1' locally
1.1: Pulling from scrapinghub/scrapinghub-stack-scrapy
8b87079b7a06: Already exists
a3ed95caeb02: Already exists
1bb8eaf3d643: Already exists
3e04171ce2e5: Already exists
0b73d3fea769: Already exists
167a085f33b1: Already exists
a498799bc49b: Already exists
c2e64a7ec940: Already exists
3be26987bd94: Already exists
0d6c0868d65b: Already exists
1a8d951cd3af: Already exists
089603c4e105: Already exists
a98e15826cda: Already exists
Digest: sha256:3c3f419a51cc694a32faff8a2f6388234a068d951af5af19f6009cff1e8e2fe3
Status: Downloaded newer image for scrapinghub/scrapinghub-stack-scrapy:1.1
2016-06-22 14:06:49 [scrapy] INFO: Scrapy 1.1.0rc4 started (bot: scrapybot)
2016-06-22 14:06:49 [scrapy] INFO: Overridden settings: {}
2016-06-22 14:06:49 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-06-22 14:06:49 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-06-22 14:06:49 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-06-22 14:06:49 [scrapy] INFO: Enabled item pipelines:
[]
2016-06-22 14:06:49 [scrapy] INFO: Spider opened
2016-06-22 14:06:49 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-22 14:06:49 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-06-22 14:06:49 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 1 times): 500 Internal Server Error
2016-06-22 14:06:49 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 2 times): 500 Internal Server Error
2016-06-22 14:06:49 [scrapy] DEBUG: Gave up retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 3 times): 500 Internal Server Error
2016-06-22 14:06:49 [scrapy] DEBUG: Crawled (500) <GET https://dk.trustpilot.com/review/www.telia.dk> (referer: None)
<!DOCTYPE html><html lang="da-dk"><head>