Created
August 28, 2018 09:56
Performance comparison
Executing scrapy-bench --n-runs 10 --book_url http://localhost:8080/books.toscrape.com/ bookworm in /home/nikita/ves/scrapy-bench | |
2018-08-28 09:07:03 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:07:03 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:07:03 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:07:03 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:07:03 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:07:03 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:07:03 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:07:03 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:07:03 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:07:03 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:07:07 [scrapy.extensions.logstats] INFO: Crawled 224 pages (at 4480 pages/min), scraped 202 items (at 4040 items/min) | |
2018-08-28 09:07:10 [scrapy.extensions.logstats] INFO: Crawled 492 pages (at 5360 pages/min), scraped 472 items (at 5400 items/min) | |
2018-08-28 09:07:13 [scrapy.extensions.logstats] INFO: Crawled 772 pages (at 5600 pages/min), scraped 743 items (at 5420 items/min) | |
2018-08-28 09:07:15 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:07:16 [scrapy.extensions.logstats] INFO: Crawled 1078 pages (at 6120 pages/min), scraped 1042 items (at 5980 items/min) | |
2018-08-28 09:07:16 [scrapy.extensions.feedexport] INFO: Stored csv feed (1078 items) in: items.csv | |
The average speed of the spider is 85.19939933195813 items/sec | |
2018-08-28 09:07:16 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 414549, | |
'downloader/request_count': 1078, | |
'downloader/request_method_count/GET': 1078, | |
'downloader/response_bytes': 23642440, | |
'downloader/response_count': 1078, | |
'downloader/response_status_count/200': 1078, | |
'dupefilter/filtered': 15023, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 7, 16, 239236), | |
'item_scraped_count': 1078, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52228096, | |
'memusage/startup': 52228096, | |
'request_depth_max': 9, | |
'response_received_count': 1078, | |
'scheduler/dequeued': 1078, | |
'scheduler/dequeued/memory': 1078, | |
'scheduler/enqueued': 1101, | |
'scheduler/enqueued/memory': 1101, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 7, 3, 917687)} | |
2018-08-28 09:07:16 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
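The per-minute figures in the logstats lines of the run above look consistent with a simple rule: the change in the cumulative counter since the previous report, scaled from the 3-second `LOGSTATS_INTERVAL` to one minute. The sketch below reproduces the "pages/min" values under that assumption. The helper name `per_minute_rates` is hypothetical and this is not Scrapy's own code.

```python
LOGSTATS_INTERVAL = 3  # seconds, taken from the overridden settings in the log


def per_minute_rates(cumulative_counts, interval=LOGSTATS_INTERVAL):
    """Turn cumulative counters sampled every `interval` seconds into per-minute rates.

    Assumption: each rate is (current - previous) * 60 / interval, with the
    counter starting at zero before the first sample.
    """
    multiplier = 60.0 / interval
    rates = []
    previous = 0
    for count in cumulative_counts:
        rates.append((count - previous) * multiplier)
        previous = count
    return rates


# Cumulative "Crawled N pages" values copied from the first run.
pages = [0, 224, 492, 772, 1078]
rates = per_minute_rates(pages)
print(rates)  # [0.0, 4480.0, 5360.0, 5600.0, 6120.0]
```

The same rule applied to the "scraped N items" counters of that run (0, 202, 472, 743, 1042) gives 0, 4040, 5400, 5420 and 5980 items/min, matching the logged values.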
2018-08-28 09:07:16 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:07:16 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:07:16 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:07:16 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:07:16 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:07:16 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:07:16 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:07:16 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:07:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:07:16 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:07:19 [scrapy.extensions.logstats] INFO: Crawled 212 pages (at 4240 pages/min), scraped 205 items (at 4100 items/min) | |
2018-08-28 09:07:22 [scrapy.extensions.logstats] INFO: Crawled 511 pages (at 5980 pages/min), scraped 469 items (at 5280 items/min) | |
2018-08-28 09:07:25 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 5280 pages/min), scraped 767 items (at 5960 items/min) | |
2018-08-28 09:07:28 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:07:28 [scrapy.extensions.logstats] INFO: Crawled 1069 pages (at 5880 pages/min), scraped 1065 items (at 5960 items/min) | |
2018-08-28 09:07:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (1069 items) in: items.csv | |
The average speed of the spider is 88.36135238416237 items/sec | |
2018-08-28 09:07:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 411082, | |
'downloader/request_count': 1069, | |
'downloader/request_method_count/GET': 1069, | |
'downloader/response_bytes': 23500203, | |
'downloader/response_count': 1069, | |
'downloader/response_status_count/200': 1069, | |
'dupefilter/filtered': 15006, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 7, 28, 906855), | |
'item_scraped_count': 1069, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 52129792, | |
'memusage/startup': 52129792, | |
'request_depth_max': 10, | |
'response_received_count': 1069, | |
'scheduler/dequeued': 1069, | |
'scheduler/dequeued/memory': 1069, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 7, 16, 818949)} | |
2018-08-28 09:07:28 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
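Each run prints one "The average speed of the spider is ... items/sec" line, so comparing runs means collecting those values. A minimal sketch of that step follows, assuming the log is available as text. The regular expression and the helper name `average_speeds` are illustrative and not part of scrapy-bench, and the sample contains only the two speed lines reported so far.

```python
import re
import statistics

# Sample lines copied from the two runs above; in practice the whole log
# would be read from wherever it was saved.
LOG = """\
The average speed of the spider is 85.19939933195813 items/sec
2018-08-28 09:07:16 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
The average speed of the spider is 88.36135238416237 items/sec
"""

SPEED_RE = re.compile(r"The average speed of the spider is ([0-9.]+) items/sec")


def average_speeds(log_text):
    """Return every per-run speed reported in the log, in order of appearance."""
    return [float(match.group(1)) for match in SPEED_RE.finditer(log_text)]


speeds = average_speeds(LOG)
mean_speed = statistics.mean(speeds)
print(f"{len(speeds)} runs, mean {mean_speed:.2f} items/sec")
```

With all runs of a benchmark collected this way, `statistics.stdev(speeds)` gives a rough idea of run-to-run noise before two configurations are compared.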
2018-08-28 09:07:29 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:07:29 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:07:29 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:07:29 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:07:29 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:07:29 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:07:29 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:07:29 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:07:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:07:29 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:07:32 [scrapy.extensions.logstats] INFO: Crawled 209 pages (at 4180 pages/min), scraped 203 items (at 4060 items/min) | |
2018-08-28 09:07:35 [scrapy.extensions.logstats] INFO: Crawled 475 pages (at 5320 pages/min), scraped 470 items (at 5340 items/min) | |
2018-08-28 09:07:38 [scrapy.extensions.logstats] INFO: Crawled 803 pages (at 6560 pages/min), scraped 753 items (at 5660 items/min) | |
2018-08-28 09:07:41 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:07:41 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 5520 pages/min), scraped 1055 items (at 6040 items/min) | |
2018-08-28 09:07:41 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 86.64585359347458 items/sec | |
2018-08-28 09:07:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 415124, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23693886, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 7, 41, 762999), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52117504, | |
'memusage/startup': 52117504, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 7, 29, 505044)} | |
2018-08-28 09:07:41 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:07:42 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:07:42 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:07:42 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:07:42 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:07:42 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:07:42 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:07:42 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:07:42 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:07:42 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:07:42 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:07:45 [scrapy.extensions.logstats] INFO: Crawled 225 pages (at 4500 pages/min), scraped 202 items (at 4040 items/min) | |
2018-08-28 09:07:48 [scrapy.extensions.logstats] INFO: Crawled 493 pages (at 5360 pages/min), scraped 475 items (at 5460 items/min) | |
2018-08-28 09:07:51 [scrapy.extensions.logstats] INFO: Crawled 777 pages (at 5680 pages/min), scraped 746 items (at 5420 items/min) | |
2018-08-28 09:07:53 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:07:54 [scrapy.extensions.logstats] INFO: Crawled 1074 pages (at 5940 pages/min), scraped 1056 items (at 6200 items/min) | |
2018-08-28 09:07:54 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 86.9510758781473 items/sec | |
2018-08-28 09:07:54 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 414850, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23693886, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 7, 54, 748334), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52105216, | |
'memusage/startup': 52105216, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 7, 42, 358562)} | |
2018-08-28 09:07:54 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:07:55 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:07:55 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:07:55 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:07:55 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:07:55 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:07:55 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:07:55 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:07:55 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:07:55 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:07:55 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:07:58 [scrapy.extensions.logstats] INFO: Crawled 211 pages (at 4220 pages/min), scraped 205 items (at 4100 items/min) | |
2018-08-28 09:08:01 [scrapy.extensions.logstats] INFO: Crawled 519 pages (at 6160 pages/min), scraped 467 items (at 5240 items/min) | |
2018-08-28 09:08:04 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 5120 pages/min), scraped 767 items (at 6000 items/min) | |
2018-08-28 09:08:07 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:08:07 [scrapy.extensions.logstats] INFO: Crawled 1069 pages (at 5880 pages/min), scraped 1053 items (at 5720 items/min) | |
2018-08-28 09:08:07 [scrapy.extensions.feedexport] INFO: Stored csv feed (1069 items) in: items.csv | |
The average speed of the spider is 86.69472099064264 items/sec | |
2018-08-28 09:08:07 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 411007, | |
'downloader/request_count': 1069, | |
'downloader/request_method_count/GET': 1069, | |
'downloader/response_bytes': 23500340, | |
'downloader/response_count': 1069, | |
'downloader/response_status_count/200': 1069, | |
'dupefilter/filtered': 15006, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 8, 7, 544255), | |
'item_scraped_count': 1069, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 52109312, | |
'memusage/startup': 52109312, | |
'request_depth_max': 9, | |
'response_received_count': 1069, | |
'scheduler/dequeued': 1069, | |
'scheduler/dequeued/memory': 1069, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 7, 55, 346436)} | |
2018-08-28 09:08:07 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:08:08 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:08:08 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:08:08 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:08:08 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:08:08 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:08:08 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:08:08 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:08:08 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:08:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:08:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:08:11 [scrapy.extensions.logstats] INFO: Crawled 212 pages (at 4240 pages/min), scraped 205 items (at 4100 items/min) | |
2018-08-28 09:08:14 [scrapy.extensions.logstats] INFO: Crawled 502 pages (at 5800 pages/min), scraped 483 items (at 5560 items/min) | |
2018-08-28 09:08:17 [scrapy.extensions.logstats] INFO: Crawled 809 pages (at 6140 pages/min), scraped 753 items (at 5400 items/min) | |
2018-08-28 09:08:19 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:08:20 [scrapy.extensions.logstats] INFO: Crawled 1056 pages (at 4940 pages/min), scraped 1049 items (at 5920 items/min) | |
2018-08-28 09:08:20 [scrapy.extensions.feedexport] INFO: Stored csv feed (1057 items) in: items.csv | |
The average speed of the spider is 86.72703503427279 items/sec | |
2018-08-28 09:08:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 406518, | |
'downloader/request_count': 1057, | |
'downloader/request_method_count/GET': 1057, | |
'downloader/response_bytes': 23209920, | |
'downloader/response_count': 1057, | |
'downloader/response_status_count/200': 1057, | |
'dupefilter/filtered': 14790, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 8, 20, 307838), | |
'item_scraped_count': 1057, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52531200, | |
'memusage/startup': 52531200, | |
'request_depth_max': 9, | |
'response_received_count': 1057, | |
'scheduler/dequeued': 1057, | |
'scheduler/dequeued/memory': 1057, | |
'scheduler/enqueued': 1080, | |
'scheduler/enqueued/memory': 1080, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 8, 8, 128974)} | |
2018-08-28 09:08:20 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:08:20 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:08:20 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:08:20 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:08:20 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:08:20 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:08:20 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:08:20 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:08:20 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:08:20 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:08:20 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:08:24 [scrapy.extensions.logstats] INFO: Crawled 226 pages (at 4520 pages/min), scraped 196 items (at 3920 items/min) | |
2018-08-28 09:08:27 [scrapy.extensions.logstats] INFO: Crawled 486 pages (at 5200 pages/min), scraped 470 items (at 5480 items/min) | |
2018-08-28 09:08:30 [scrapy.extensions.logstats] INFO: Crawled 762 pages (at 5520 pages/min), scraped 748 items (at 5560 items/min) | |
2018-08-28 09:08:32 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:08:32 [scrapy.extensions.logstats] INFO: Crawled 1053 pages (at 5820 pages/min), scraped 1049 items (at 6020 items/min) | |
2018-08-28 09:08:33 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv | |
The average speed of the spider is 86.12669129960925 items/sec | |
2018-08-28 09:08:33 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 406997, | |
'downloader/request_count': 1058, | |
'downloader/request_method_count/GET': 1058, | |
'downloader/response_bytes': 23261366, | |
'downloader/response_count': 1058, | |
'downloader/response_status_count/200': 1058, | |
'dupefilter/filtered': 14863, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 8, 33, 164731), | |
'item_scraped_count': 1058, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52207616, | |
'memusage/startup': 52207616, | |
'request_depth_max': 9, | |
'response_received_count': 1058, | |
'scheduler/dequeued': 1058, | |
'scheduler/dequeued/memory': 1058, | |
'scheduler/enqueued': 1081, | |
'scheduler/enqueued/memory': 1081, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 8, 20, 902393)} | |
2018-08-28 09:08:33 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:08:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:08:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:08:33 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:08:33 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:08:33 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:08:33 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:08:33 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:08:33 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:08:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:08:33 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:08:36 [scrapy.extensions.logstats] INFO: Crawled 209 pages (at 4180 pages/min), scraped 204 items (at 4080 items/min) | |
2018-08-28 09:08:39 [scrapy.extensions.logstats] INFO: Crawled 481 pages (at 5440 pages/min), scraped 477 items (at 5460 items/min) | |
2018-08-28 09:08:42 [scrapy.extensions.logstats] INFO: Crawled 765 pages (at 5680 pages/min), scraped 754 items (at 5540 items/min) | |
2018-08-28 09:08:45 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:08:45 [scrapy.extensions.logstats] INFO: Crawled 1070 pages (at 6100 pages/min), scraped 1048 items (at 5880 items/min) | |
2018-08-28 09:08:46 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 86.4397849642877 items/sec | |
2018-08-28 09:08:46 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 414962, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23693886, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 8, 46, 159024), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52101120, | |
'memusage/startup': 52101120, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 8, 33, 751394)} | |
2018-08-28 09:08:46 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:08:46 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:08:46 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:08:46 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:08:46 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:08:46 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:08:46 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:08:46 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:08:46 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:08:46 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:08:46 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:08:49 [scrapy.extensions.logstats] INFO: Crawled 207 pages (at 4140 pages/min), scraped 205 items (at 4100 items/min) | |
2018-08-28 09:08:52 [scrapy.extensions.logstats] INFO: Crawled 529 pages (at 6440 pages/min), scraped 469 items (at 5280 items/min) | |
2018-08-28 09:08:56 [scrapy.extensions.logstats] INFO: Crawled 797 pages (at 5360 pages/min), scraped 762 items (at 5860 items/min) | |
2018-08-28 09:08:58 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:08:58 [scrapy.extensions.logstats] INFO: Crawled 1059 pages (at 5240 pages/min), scraped 1057 items (at 5900 items/min) | |
2018-08-28 09:08:58 [scrapy.extensions.feedexport] INFO: Stored csv feed (1059 items) in: items.csv | |
The average speed of the spider is 87.85250088725208 items/sec | |
2018-08-28 09:08:58 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 407320, | |
'downloader/request_count': 1059, | |
'downloader/request_method_count/GET': 1059, | |
'downloader/response_bytes': 23312675, | |
'downloader/response_count': 1059, | |
'downloader/response_status_count/200': 1059, | |
'dupefilter/filtered': 14936, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 8, 58, 801872), | |
'item_scraped_count': 1059, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52125696, | |
'memusage/startup': 52125696, | |
'request_depth_max': 10, | |
'response_received_count': 1059, | |
'scheduler/dequeued': 1059, | |
'scheduler/dequeued/memory': 1059, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 8, 46, 749978)} | |
2018-08-28 09:08:58 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:08:59 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:08:59 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:08:59 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:08:59 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:08:59 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:08:59 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:08:59 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:08:59 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:08:59 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:08:59 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:09:02 [scrapy.extensions.logstats] INFO: Crawled 207 pages (at 4140 pages/min), scraped 205 items (at 4100 items/min) | |
2018-08-28 09:09:05 [scrapy.extensions.logstats] INFO: Crawled 492 pages (at 5700 pages/min), scraped 479 items (at 5480 items/min) | |
2018-08-28 09:09:08 [scrapy.extensions.logstats] INFO: Crawled 773 pages (at 5620 pages/min), scraped 763 items (at 5680 items/min) | |
2018-08-28 09:09:10 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:09:11 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 6120 pages/min), scraped 1070 items (at 6140 items/min) | |
2018-08-28 09:09:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 88.45882549677393 items/sec | |
2018-08-28 09:09:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 414962, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23693886, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 9, 11, 569799), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52412416, | |
'memusage/startup': 52412416, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 8, 59, 389989)} | |
2018-08-28 09:09:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '10' | |
Mean : 86.94572398605807 Median : 86.71087801245771 Std Dev : 0.9639990239084966 | |
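The summary above aggregates the per-run "average speed" lines printed after each crawl. A minimal sketch of how such a summary could be recomputed from a saved log follows. The helper names `extract_speeds` and `summarize` are hypothetical and not part of scrapy-bench, and the use of the sample standard deviation is an assumption, since the log does not say which estimator the tool uses. The sample text contains only four of the ten runs, so its figures are not expected to match the reported Mean, Median, or Std Dev.

```python
import re
import statistics

# Matches the per-run line the benchmark prints after each crawl.
SPEED_RE = re.compile(r"The average speed of the spider is ([0-9.]+) items/sec")


def extract_speeds(log_text):
    """Return every per-run speed (items/sec) found in the log text."""
    return [float(m.group(1)) for m in SPEED_RE.finditer(log_text)]


def summarize(speeds):
    """Aggregate per-run speeds; sample stdev is an assumption, not confirmed by the log."""
    return {
        "mean": statistics.mean(speeds),
        "median": statistics.median(speeds),
        "stdev": statistics.stdev(speeds),
    }


# Four of the ten runs, copied from the log above (trailing table residue kept on purpose).
sample_log = """\
The average speed of the spider is 86.12669129960925 items/sec | |
The average speed of the spider is 86.4397849642877 items/sec | |
The average speed of the spider is 87.85250088725208 items/sec | |
The average speed of the spider is 88.45882549677393 items/sec | |
"""

speeds = extract_speeds(sample_log)
summary = summarize(speeds)
print(summary)
```

Running the same helpers over the full log of one benchmark invocation, instead of the four-line sample, yields one speed per run, which can then be compared against the summary lines.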
Executing scrapy-bench --n-runs 10 --book_url http://localhost:8080/books.toscrape.com/ bookworm in /home/nikita/ves/scrapy-bench | |
2018-08-28 09:09:29 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:09:29 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:09:29 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:09:29 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:09:29 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:09:29 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:09:29 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:09:29 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:09:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:09:29 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:09:32 [scrapy.extensions.logstats] INFO: Crawled 211 pages (at 4220 pages/min), scraped 204 items (at 4080 items/min) | |
2018-08-28 09:09:35 [scrapy.extensions.logstats] INFO: Crawled 482 pages (at 5420 pages/min), scraped 480 items (at 5520 items/min) | |
2018-08-28 09:09:38 [scrapy.extensions.logstats] INFO: Crawled 800 pages (at 6360 pages/min), scraped 767 items (at 5740 items/min) | |
2018-08-28 09:09:41 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:09:41 [scrapy.extensions.logstats] INFO: Crawled 1072 pages (at 5440 pages/min), scraped 1067 items (at 6000 items/min) | |
2018-08-28 09:09:41 [scrapy.extensions.feedexport] INFO: Stored csv feed (1080 items) in: items.csv | |
The average speed of the spider is 87.72160455346388 items/sec | |
2018-08-28 09:09:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 415333, | |
'downloader/request_count': 1080, | |
'downloader/request_method_count/GET': 1080, | |
'downloader/response_bytes': 23745195, | |
'downloader/response_count': 1080, | |
'downloader/response_status_count/200': 1080, | |
'dupefilter/filtered': 15169, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 9, 41, 974612), | |
'item_scraped_count': 1080, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52178944, | |
'memusage/startup': 52178944, | |
'request_depth_max': 10, | |
'response_received_count': 1080, | |
'scheduler/dequeued': 1080, | |
'scheduler/dequeued/memory': 1080, | |
'scheduler/enqueued': 1103, | |
'scheduler/enqueued/memory': 1103, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 9, 29, 722831)} | |
2018-08-28 09:09:41 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:09:42 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:09:42 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:09:42 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:09:42 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:09:42 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:09:42 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:09:42 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:09:42 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:09:42 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:09:42 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:09:45 [scrapy.extensions.logstats] INFO: Crawled 206 pages (at 4120 pages/min), scraped 201 items (at 4020 items/min) | |
2018-08-28 09:09:48 [scrapy.extensions.logstats] INFO: Crawled 472 pages (at 5320 pages/min), scraped 465 items (at 5280 items/min) | |
2018-08-28 09:09:51 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 6060 pages/min), scraped 765 items (at 6000 items/min) | |
2018-08-28 09:09:54 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:09:54 [scrapy.extensions.logstats] INFO: Crawled 1058 pages (at 5660 pages/min), scraped 1055 items (at 5800 items/min) | |
2018-08-28 09:09:54 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv | |
The average speed of the spider is 87.18518182155935 items/sec | |
2018-08-28 09:09:54 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 407139, | |
'downloader/request_count': 1058, | |
'downloader/request_method_count/GET': 1058, | |
'downloader/response_bytes': 23261366, | |
'downloader/response_count': 1058, | |
'downloader/response_status_count/200': 1058, | |
'dupefilter/filtered': 14863, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 9, 54, 679296), | |
'item_scraped_count': 1058, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52494336, | |
'memusage/startup': 52494336, | |
'request_depth_max': 9, | |
'response_received_count': 1058, | |
'scheduler/dequeued': 1058, | |
'scheduler/dequeued/memory': 1058, | |
'scheduler/enqueued': 1081, | |
'scheduler/enqueued/memory': 1081, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 9, 42, 554629)} | |
2018-08-28 09:09:54 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:09:55 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:09:55 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:09:55 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:09:55 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:09:55 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:09:55 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:09:55 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:09:55 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:09:55 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:09:55 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:09:58 [scrapy.extensions.logstats] INFO: Crawled 203 pages (at 4060 pages/min), scraped 197 items (at 3940 items/min) | |
2018-08-28 09:10:01 [scrapy.extensions.logstats] INFO: Crawled 501 pages (at 5960 pages/min), scraped 480 items (at 5660 items/min) | |
2018-08-28 09:10:04 [scrapy.extensions.logstats] INFO: Crawled 799 pages (at 5960 pages/min), scraped 747 items (at 5340 items/min) | |
2018-08-28 09:10:07 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:10:07 [scrapy.extensions.logstats] INFO: Crawled 1055 pages (at 5120 pages/min), scraped 1051 items (at 6080 items/min) | |
2018-08-28 09:10:07 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 85.799653992805 items/sec | |
2018-08-28 09:10:07 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 414973, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23693886, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 10, 7, 652824), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52105216, | |
'memusage/startup': 52105216, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 9, 55, 270759)} | |
2018-08-28 09:10:07 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:10:08 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:10:08 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:10:08 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:10:08 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:10:08 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:10:08 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:10:08 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:10:08 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:10:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:10:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:10:11 [scrapy.extensions.logstats] INFO: Crawled 211 pages (at 4220 pages/min), scraped 205 items (at 4100 items/min) | |
2018-08-28 09:10:14 [scrapy.extensions.logstats] INFO: Crawled 507 pages (at 5920 pages/min), scraped 471 items (at 5320 items/min) | |
2018-08-28 09:10:17 [scrapy.extensions.logstats] INFO: Crawled 777 pages (at 5400 pages/min), scraped 762 items (at 5820 items/min) | |
2018-08-28 09:10:19 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:10:20 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 6040 pages/min), scraped 1064 items (at 6040 items/min) | |
2018-08-28 09:10:20 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 86.19604560587408 items/sec | |
2018-08-28 09:10:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 414860, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23693886, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 10, 20, 463788), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52125696, | |
'memusage/startup': 52125696, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 10, 8, 244426)} | |
2018-08-28 09:10:20 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:10:21 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:10:21 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:10:21 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:10:21 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:10:21 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:10:21 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:10:21 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:10:21 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:10:21 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:10:21 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:10:24 [scrapy.extensions.logstats] INFO: Crawled 211 pages (at 4220 pages/min), scraped 207 items (at 4140 items/min) | |
2018-08-28 09:10:27 [scrapy.extensions.logstats] INFO: Crawled 487 pages (at 5520 pages/min), scraped 480 items (at 5460 items/min) | |
2018-08-28 09:10:30 [scrapy.extensions.logstats] INFO: Crawled 772 pages (at 5700 pages/min), scraped 766 items (at 5720 items/min) | |
2018-08-28 09:10:32 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:10:33 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 6140 pages/min), scraped 1078 items (at 6240 items/min) | |
2018-08-28 09:10:33 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 87.88084122833638 items/sec | |
2018-08-28 09:10:33 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 414975, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23693886, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 10, 33, 139077), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52408320, | |
'memusage/startup': 52408320, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 10, 21, 48833)} | |
2018-08-28 09:10:33 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:10:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:10:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:10:33 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:10:33 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:10:33 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:10:33 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:10:33 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:10:33 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:10:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:10:33 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:10:36 [scrapy.extensions.logstats] INFO: Crawled 223 pages (at 4460 pages/min), scraped 203 items (at 4060 items/min) | |
2018-08-28 09:10:39 [scrapy.extensions.logstats] INFO: Crawled 495 pages (at 5440 pages/min), scraped 473 items (at 5400 items/min) | |
2018-08-28 09:10:42 [scrapy.extensions.logstats] INFO: Crawled 783 pages (at 5760 pages/min), scraped 747 items (at 5480 items/min) | |
2018-08-28 09:10:45 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:10:45 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 5920 pages/min), scraped 1059 items (at 6240 items/min) | |
2018-08-28 09:10:45 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 86.54747083527113 items/sec | |
2018-08-28 09:10:45 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 414864, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23693886, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 10, 45, 959290), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52027392, | |
'memusage/startup': 52027392, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 10, 33, 722323)} | |
2018-08-28 09:10:45 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:10:46 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:10:46 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:10:46 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:10:46 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:10:46 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:10:46 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:10:46 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:10:46 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:10:46 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:10:46 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:10:49 [scrapy.extensions.logstats] INFO: Crawled 226 pages (at 4520 pages/min), scraped 199 items (at 3980 items/min) | |
2018-08-28 09:10:52 [scrapy.extensions.logstats] INFO: Crawled 484 pages (at 5160 pages/min), scraped 471 items (at 5440 items/min) | |
2018-08-28 09:10:55 [scrapy.extensions.logstats] INFO: Crawled 770 pages (at 5720 pages/min), scraped 764 items (at 5860 items/min) | |
2018-08-28 09:10:58 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:10:58 [scrapy.extensions.logstats] INFO: Crawled 1070 pages (at 6000 pages/min), scraped 1065 items (at 6020 items/min) | |
2018-08-28 09:10:58 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 87.01701995983392 items/sec | |
2018-08-28 09:10:58 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 415045, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23693886, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 10, 58, 863905), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52129792, | |
'memusage/startup': 52129792, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 10, 46, 547978)} | |
2018-08-28 09:10:58 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:10:59 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:10:59 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:10:59 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:10:59 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:10:59 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:10:59 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:10:59 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:10:59 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:10:59 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:10:59 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:11:02 [scrapy.extensions.logstats] INFO: Crawled 209 pages (at 4180 pages/min), scraped 204 items (at 4080 items/min) | |
2018-08-28 09:11:05 [scrapy.extensions.logstats] INFO: Crawled 495 pages (at 5720 pages/min), scraped 486 items (at 5640 items/min) | |
2018-08-28 09:11:08 [scrapy.extensions.logstats] INFO: Crawled 771 pages (at 5520 pages/min), scraped 767 items (at 5620 items/min) | |
2018-08-28 09:11:10 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:11:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1057 items) in: items.csv | |
The average speed of the spider is 86.25596305414507 items/sec | |
2018-08-28 09:11:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 406642, | |
'downloader/request_count': 1057, | |
'downloader/request_method_count/GET': 1057, | |
'downloader/response_bytes': 23209920, | |
'downloader/response_count': 1057, | |
'downloader/response_status_count/200': 1057, | |
'dupefilter/filtered': 14790, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 11, 11, 411467), | |
'item_scraped_count': 1057, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52207616, | |
'memusage/startup': 52207616, | |
'request_depth_max': 9, | |
'response_received_count': 1057, | |
'scheduler/dequeued': 1057, | |
'scheduler/dequeued/memory': 1057, | |
'scheduler/enqueued': 1080, | |
'scheduler/enqueued/memory': 1080, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 10, 59, 450315)} | |
2018-08-28 09:11:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:11:11 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:11:11 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:11:11 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:11:11 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:11:11 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:11:11 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:11:11 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:11:11 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:11:11 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:11:11 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:11:15 [scrapy.extensions.logstats] INFO: Crawled 211 pages (at 4220 pages/min), scraped 205 items (at 4100 items/min) | |
2018-08-28 09:11:18 [scrapy.extensions.logstats] INFO: Crawled 512 pages (at 6020 pages/min), scraped 465 items (at 5200 items/min) | |
2018-08-28 09:11:21 [scrapy.extensions.logstats] INFO: Crawled 764 pages (at 5040 pages/min), scraped 746 items (at 5620 items/min) | |
2018-08-28 09:11:23 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:11:24 [scrapy.extensions.logstats] INFO: Crawled 1054 pages (at 5800 pages/min), scraped 1048 items (at 6040 items/min) | |
2018-08-28 09:11:24 [scrapy.extensions.feedexport] INFO: Stored csv feed (1078 items) in: items.csv | |
The average speed of the spider is 85.46989085226573 items/sec | |
2018-08-28 09:11:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 414607, | |
'downloader/request_count': 1078, | |
'downloader/request_method_count/GET': 1078, | |
'downloader/response_bytes': 23642440, | |
'downloader/response_count': 1078, | |
'downloader/response_status_count/200': 1078, | |
'dupefilter/filtered': 15023, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 11, 24, 362757), | |
'item_scraped_count': 1078, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52387840, | |
'memusage/startup': 52387840, | |
'request_depth_max': 9, | |
'response_received_count': 1078, | |
'scheduler/dequeued': 1078, | |
'scheduler/dequeued/memory': 1078, | |
'scheduler/enqueued': 1101, | |
'scheduler/enqueued/memory': 1101, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 11, 11, 992710)} | |
2018-08-28 09:11:24 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-08-28 09:11:24 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-08-28 09:11:24 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid | |
2018-08-28 09:11:24 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-08-28 09:11:24 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-08-28 09:11:24 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-08-28 09:11:24 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-08-28 09:11:24 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-08-28 09:11:24 [scrapy.core.engine] INFO: Spider opened | |
2018-08-28 09:11:24 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-08-28 09:11:24 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-08-28 09:11:27 [scrapy.extensions.logstats] INFO: Crawled 207 pages (at 4140 pages/min), scraped 201 items (at 4020 items/min) | |
2018-08-28 09:11:31 [scrapy.extensions.logstats] INFO: Crawled 491 pages (at 5680 pages/min), scraped 477 items (at 5520 items/min) | |
2018-08-28 09:11:34 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 5680 pages/min), scraped 768 items (at 5820 items/min) | |
2018-08-28 09:11:36 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-08-28 09:11:37 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 6080 pages/min), scraped 1071 items (at 6060 items/min) | |
2018-08-28 09:11:37 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 85.69234060623224 items/sec | |
2018-08-28 09:11:37 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 414962, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23693886, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 8, 28, 9, 11, 37, 132935), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 13, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52129792, | |
'memusage/startup': 52129792, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 8, 28, 9, 11, 24, 957884)} | |
2018-08-28 09:11:37 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '10' | |
Mean : 86.57660125097868 Median : 86.4017169447081 Std Dev : 0.8022010534209844 | |