Gist dangra/5b274d2eab3e95196687328ae2a23804, created June 22, 2016 14:07
$ docker run -it --rm scrapinghub/scrapinghub-stack-hworker scrapy fetch https://dk.trustpilot.com/review/www.telia.dk
[sudo] password for daniel:
Unable to find image 'scrapinghub/scrapinghub-stack-hworker:latest' locally
latest: Pulling from scrapinghub/scrapinghub-stack-hworker
4edf76921243: Already exists
044c0d9e0cd9: Already exists
331fbd6c3dec: Already exists
8f76788f1cb3: Already exists
a3ed95caeb02: Already exists
5621025737cb: Already exists
0c6148c6d8f8: Already exists
8d183022f597: Already exists
4c63841c88b5: Already exists
93cf7ad7b15e: Already exists
5a36e6dd55b0: Already exists
29f6da5d1090: Already exists
8e96e8f9d540: Already exists
8ef7c8d77ef8: Already exists
0d100e2fca6f: Already exists
694dd37ae0bc: Already exists
a5eae757e960: Already exists
Digest: sha256:037841d944a7523092eac4b0e5413d4c668b51729f51614b438dc5eab94619e1
Status: Downloaded newer image for scrapinghub/scrapinghub-stack-hworker:latest
2016-06-22 14:03:40 [scrapy] INFO: Scrapy 1.0.5 started (bot: scrapybot)
2016-06-22 14:03:40 [scrapy] INFO: Optional features available: ssl, http11, boto
2016-06-22 14:03:40 [scrapy] INFO: Overridden settings: {}
2016-06-22 14:03:40 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2016-06-22 14:03:40 [boto] DEBUG: Retrieving credentials from metadata server.
2016-06-22 14:03:41 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
    r = opener.open(req, timeout=timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>
2016-06-22 14:03:41 [boto] ERROR: Unable to read instance data, giving up
2016-06-22 14:03:41 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-06-22 14:03:41 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-06-22 14:03:41 [scrapy] INFO: Enabled item pipelines:
2016-06-22 14:03:41 [scrapy] INFO: Spider opened
2016-06-22 14:03:41 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-22 14:03:41 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-06-22 14:03:41 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 1 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2016-06-22 14:03:41 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 2 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2016-06-22 14:03:41 [scrapy] DEBUG: Gave up retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 3 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2016-06-22 14:03:41 [scrapy] ERROR: Error downloading <GET https://dk.trustpilot.com/review/www.telia.dk>: [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
2016-06-22 14:03:41 [scrapy] INFO: Closing spider (finished)
2016-06-22 14:03:41 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/scrapy.xlib.tx._newclient.ResponseNeverReceived': 3,
 'downloader/request_bytes': 702,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 3,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 6, 22, 14, 3, 41, 871462),
 'log_count/DEBUG': 5,
 'log_count/ERROR': 3,
 'log_count/INFO': 7,
 'scheduler/dequeued': 3,
 'scheduler/dequeued/memory': 3,
 'scheduler/enqueued': 3,
 'scheduler/enqueued/memory': 3,
 'start_time': datetime.datetime(2016, 6, 22, 14, 3, 41, 588967)}
2016-06-22 14:03:41 [scrapy] INFO: Spider closed (finished)
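In this first run the boto traceback is a side issue: the request itself fails at the TLS layer (`OpenSSL.SSL.Error`, counted in the stats under `ResponseNeverReceived`). The boto lines appear because this image has boto installed ("Optional features available: ssl, http11, boto"), and they seem to come from Scrapy 1.0's S3 download handler setting up a boto connection when the downloader starts; with no AWS keys configured, boto tries the EC2 metadata server ("Retrieving credentials from metadata server.") and times out outside EC2. The `scrapinghub-stack-scrapy:1.0` run does not list boto and shows no such error. If the noise gets in the way, Scrapy's documented way to disable a built-in download handler is to map its URI scheme to `None` in `DOWNLOAD_HANDLERS`. A minimal `settings.py` sketch, assuming the project never needs `s3://` URLs; it would only silence the boto lookup and is not expected to change the SSL failure:

```python
# settings.py (sketch): disable Scrapy's built-in S3 download handler so boto
# is not asked for AWS credentials at startup. Assumes no s3:// URLs are
# fetched; mapping a scheme to None turns that handler off.
DOWNLOAD_HANDLERS = {
    "s3": None,
}
```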
$ docker run -it --rm scrapinghub/scrapinghub-stack-scrapy:1.0 scrapy fetch https://dk.trustpilot.com/review/www.telia.dk
Unable to find image 'scrapinghub/scrapinghub-stack-scrapy:1.0' locally
1.0: Pulling from scrapinghub/scrapinghub-stack-scrapy
efd26ecc9548: Pull complete
a3ed95caeb02: Pull complete
d1784d73276e: Pull complete
72e581645fc3: Pull complete
9709ddcc4d24: Pull complete
2d600f0ec235: Pull complete
def2162116ce: Pull complete
498b2ad5725d: Pull complete
406b2ac41563: Pull complete
6358b959e7b0: Pull complete
d6b81cbbf698: Pull complete
Digest: sha256:97218a7558ef42ff2d01f7d9bca2e1abdad1315c49264de6beec82a171cd8d7a
Status: Downloaded newer image for scrapinghub/scrapinghub-stack-scrapy:1.0
2016-06-22 14:05:22 [scrapy] INFO: Scrapy 1.0.5 started (bot: scrapybot)
2016-06-22 14:05:22 [scrapy] INFO: Optional features available: ssl, http11
2016-06-22 14:05:22 [scrapy] INFO: Overridden settings: {}
2016-06-22 14:05:22 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2016-06-22 14:05:22 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-06-22 14:05:22 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-06-22 14:05:22 [scrapy] INFO: Enabled item pipelines:
2016-06-22 14:05:22 [scrapy] INFO: Spider opened
2016-06-22 14:05:22 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-22 14:05:22 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-06-22 14:05:22 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 1 times): 500 Internal Server Error
2016-06-22 14:05:22 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 2 times): 500 Internal Server Error
2016-06-22 14:05:22 [scrapy] DEBUG: Gave up retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 3 times): 500 Internal Server Error
2016-06-22 14:05:22 [scrapy] DEBUG: Crawled (500) <GET https://dk.trustpilot.com/review/www.telia.dk> (referer: None)
<!DOCTYPE html><html lang="da-dk"><head>
...
<title>Trustpilot</title><meta charset="UTF-8" /><script type="text/javascript">window.NREUM||(NREUM={});NREUM.info = {"beacon":"bam.nr-data.net","errorBeacon":"bam.nr-data.net","licenseKey":"5596035ebd",
...
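This run gets past the TLS handshake and the server answers with `500 Internal Server Error`. The two `Retrying` lines followed by one `Gave up retrying` line follow from Scrapy's default retry behaviour: 500 is among the status codes `RetryMiddleware` retries by default, and `RETRY_TIMES` defaults to 2, so the request is sent 1 + 2 = 3 times before the last response is handed on (the first run's stats likewise show `'downloader/request_count': 3`, there for a connection-level exception rather than a 500). The sketch below is a simplified model of that counting only, not Scrapy's actual `RetryMiddleware` code; the function name `retry_log` is hypothetical.

```python
from typing import List


def retry_log(url: str, reason: str, retry_times: int = 2) -> List[str]:
    """Hypothetical, simplified model of the retry lines Scrapy 1.x logs when
    every attempt fails with the same retryable reason.

    retry_times mirrors the RETRY_TIMES setting (default 2). The first attempt
    is not a retry, so the request is sent 1 + retry_times times in total.
    """
    lines = []
    for failures in range(1, retry_times + 2):
        if failures <= retry_times:
            # Still within the retry budget: schedule another attempt.
            lines.append("Retrying <GET %s> (failed %d times): %s" % (url, failures, reason))
        else:
            # Budget exhausted: the failure is passed on instead of retried.
            lines.append("Gave up retrying <GET %s> (failed %d times): %s" % (url, failures, reason))
    return lines


for line in retry_log("https://dk.trustpilot.com/review/www.telia.dk", "500 Internal Server Error"):
    print(line)
```

To see what the server returns with a single request while debugging, retries can be switched off for one run by passing Scrapy's `-s RETRY_ENABLED=0` option to `scrapy fetch`.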
$ docker run -it --rm scrapinghub/scrapinghub-stack-scrapy:1.1 scrapy fetch https://dk.trustpilot.com/review/www.telia.dk
Unable to find image 'scrapinghub/scrapinghub-stack-scrapy:1.1' locally
1.1: Pulling from scrapinghub/scrapinghub-stack-scrapy
8b87079b7a06: Already exists
a3ed95caeb02: Already exists
1bb8eaf3d643: Already exists
3e04171ce2e5: Already exists
0b73d3fea769: Already exists
167a085f33b1: Already exists
a498799bc49b: Already exists
c2e64a7ec940: Already exists
3be26987bd94: Already exists
0d6c0868d65b: Already exists
1a8d951cd3af: Already exists
089603c4e105: Already exists
a98e15826cda: Already exists
Digest: sha256:3c3f419a51cc694a32faff8a2f6388234a068d951af5af19f6009cff1e8e2fe3
Status: Downloaded newer image for scrapinghub/scrapinghub-stack-scrapy:1.1
2016-06-22 14:06:49 [scrapy] INFO: Scrapy 1.1.0rc4 started (bot: scrapybot)
2016-06-22 14:06:49 [scrapy] INFO: Overridden settings: {}
2016-06-22 14:06:49 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2016-06-22 14:06:49 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-06-22 14:06:49 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-06-22 14:06:49 [scrapy] INFO: Enabled item pipelines:
[]
2016-06-22 14:06:49 [scrapy] INFO: Spider opened
2016-06-22 14:06:49 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-06-22 14:06:49 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-06-22 14:06:49 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 1 times): 500 Internal Server Error
2016-06-22 14:06:49 [scrapy] DEBUG: Retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 2 times): 500 Internal Server Error
2016-06-22 14:06:49 [scrapy] DEBUG: Gave up retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 3 times): 500 Internal Server Error
2016-06-22 14:06:49 [scrapy] DEBUG: Crawled (500) <GET https://dk.trustpilot.com/review/www.telia.dk> (referer: None)
<!DOCTYPE html><html lang="da-dk"><head>
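Taken together, the key difference between the three runs is how the same request fails: the `scrapinghub-stack-hworker` image never completes the TLS handshake (`OpenSSL.SSL.Error`), while both `scrapinghub-stack-scrapy` images (Scrapy 1.0.5 and 1.1.0rc4) connect and receive a 500 page from the server. To compare saved logs like these without reading them line by line, a small hypothetical helper can extract the Scrapy version, the attempt count, the final failure reason and the final HTTP status. It is a sketch that assumes logs in the `[scrapy]` text format shown above; the regular expressions are written against these specific lines and may need adjusting for other Scrapy versions.

```python
import re
from typing import Dict, Optional, Union

# Patterns written against the Scrapy 1.0/1.1 log lines in these transcripts.
VERSION = re.compile(r"Scrapy (?P<version>\S+) started")
GAVE_UP = re.compile(
    r"Gave up retrying <GET (?P<url>\S+)> \(failed (?P<n>\d+) times\): (?P<reason>.+?)\s*$"
)
CRAWLED = re.compile(r"Crawled \((?P<status>\d{3})\) <GET (?P<url>\S+)>")

Summary = Dict[str, Optional[Union[str, int]]]


def summarize(log_text: str) -> Summary:
    """Hypothetical helper: summarize one `scrapy fetch` log.

    status stays None when no response was ever received (for example,
    when the TLS handshake failed), since no "Crawled (NNN)" line is logged.
    """
    summary: Summary = {"version": None, "attempts": None, "reason": None, "status": None}
    for line in log_text.splitlines():
        m = VERSION.search(line)
        if m:
            summary["version"] = m.group("version")
        m = GAVE_UP.search(line)
        if m:
            summary["attempts"] = int(m.group("n"))
            summary["reason"] = m.group("reason")
        m = CRAWLED.search(line)
        if m:
            summary["status"] = m.group("status")
    return summary


hworker_log = """\
2016-06-22 14:03:40 [scrapy] INFO: Scrapy 1.0.5 started (bot: scrapybot)
2016-06-22 14:03:41 [scrapy] DEBUG: Gave up retrying <GET https://dk.trustpilot.com/review/www.telia.dk> (failed 3 times): [<twisted.python.failure.Failure <class 'OpenSSL.SSL.Error'>>]
"""
print(summarize(hworker_log))
```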