Created November 29, 2018 10:31
scrapy-bench results for the round-robin (default) and DownloaderAware priority queues
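The two logs below differ only in whether `SCHEDULER_PRIORITY_QUEUE` is overridden. As a minimal sketch (values taken from the "Overridden settings" log lines; the `books` project name is from the logs), the benchmarked Scrapy project's `settings.py` would look roughly like this:

```python
# settings.py -- sketch of the overrides reported in the logs.
# The last line is present only for the DownloaderAware runs; the
# round-robin runs use Scrapy's default priority queue.
BOT_NAME = 'books'
SPIDER_MODULES = ['books.spiders']
NEWSPIDER_MODULE = 'books.spiders'

CONCURRENT_REQUESTS = 120        # high concurrency against localhost
RETRY_ENABLED = False
CLOSESPIDER_ITEMCOUNT = 1000     # stop each run after ~1000 items
LOG_LEVEL = 'INFO'
LOGSTATS_INTERVAL = 3            # log throughput every 3 seconds
MEMDEBUG_ENABLED = True

FEED_FORMAT = 'csv'
FEED_URI = 'items.csv'

# Only in the DownloaderAware configuration:
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'
```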
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-2.7/
2018-11-29 10:17:09 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:17:09 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:17:09 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-11-29 10:17:09 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:17:09 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:17:09 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:17:09 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:17:09 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:17:09 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:17:09 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:17:12 [scrapy.extensions.logstats] INFO: Crawled 120 pages (at 2400 pages/min), scraped 104 items (at 2080 items/min)
2018-11-29 10:17:15 [scrapy.extensions.logstats] INFO: Crawled 296 pages (at 3520 pages/min), scraped 277 items (at 3460 items/min)
2018-11-29 10:17:18 [scrapy.extensions.logstats] INFO: Crawled 469 pages (at 3460 pages/min), scraped 445 items (at 3360 items/min)
2018-11-29 10:17:21 [scrapy.extensions.logstats] INFO: Crawled 628 pages (at 3180 pages/min), scraped 621 items (at 3520 items/min)
2018-11-29 10:17:24 [scrapy.extensions.logstats] INFO: Crawled 802 pages (at 3480 pages/min), scraped 794 items (at 3460 items/min)
2018-11-29 10:17:27 [scrapy.extensions.logstats] INFO: Crawled 986 pages (at 3680 pages/min), scraped 978 items (at 3680 items/min)
2018-11-29 10:17:27 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:17:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 54.3871510199 items/sec
2018-11-29 10:17:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362478,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 17, 28, 621201),
'item_scraped_count': 1044,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46665728,
'memusage/startup': 46665728,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 17, 9, 394821)}
2018-11-29 10:17:28 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
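The per-run "average speed" line comes from scrapy-bench's own timer, but it can be approximated from the dumped stats alone (`item_scraped_count` divided by `finish_time - start_time`). Using the first run's numbers, the result lands slightly below the reported 54.3871 items/sec, presumably because the dumped timestamps include a little engine overhead the tool's timer excludes:

```python
from datetime import datetime

# Stats copied from the first run's "Dumping Scrapy stats" block above.
start = datetime(2018, 11, 29, 10, 17, 9, 394821)
finish = datetime(2018, 11, 29, 10, 17, 28, 621201)
items = 1044  # item_scraped_count

elapsed = (finish - start).total_seconds()
speed = items / elapsed
print('%.1f items/sec over %.2f s' % (speed, elapsed))  # ~54.3 items/sec
```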
2018-11-29 10:17:29 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:17:29 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:17:29 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-11-29 10:17:29 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:17:29 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:17:29 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:17:29 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:17:29 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:17:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:17:29 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:17:32 [scrapy.extensions.logstats] INFO: Crawled 113 pages (at 2260 pages/min), scraped 105 items (at 2100 items/min)
2018-11-29 10:17:35 [scrapy.extensions.logstats] INFO: Crawled 317 pages (at 4080 pages/min), scraped 284 items (at 3580 items/min)
2018-11-29 10:17:38 [scrapy.extensions.logstats] INFO: Crawled 479 pages (at 3240 pages/min), scraped 464 items (at 3600 items/min)
2018-11-29 10:17:41 [scrapy.extensions.logstats] INFO: Crawled 660 pages (at 3620 pages/min), scraped 637 items (at 3460 items/min)
2018-11-29 10:17:44 [scrapy.extensions.logstats] INFO: Crawled 824 pages (at 3280 pages/min), scraped 816 items (at 3580 items/min)
2018-11-29 10:17:47 [scrapy.extensions.logstats] INFO: Crawled 1012 pages (at 3760 pages/min), scraped 989 items (at 3460 items/min)
2018-11-29 10:17:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:17:48 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 54.9992442872 items/sec
2018-11-29 10:17:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363581,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 17, 48, 320235),
'item_scraped_count': 1049,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46661632,
'memusage/startup': 46661632,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 17, 29, 261736)}
2018-11-29 10:17:48 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:17:48 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:17:48 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:17:48 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-11-29 10:17:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:17:48 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:17:48 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:17:48 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:17:48 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:17:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:17:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:17:52 [scrapy.extensions.logstats] INFO: Crawled 120 pages (at 2400 pages/min), scraped 105 items (at 2100 items/min)
2018-11-29 10:17:54 [scrapy.extensions.logstats] INFO: Crawled 289 pages (at 3380 pages/min), scraped 282 items (at 3540 items/min)
2018-11-29 10:17:58 [scrapy.extensions.logstats] INFO: Crawled 464 pages (at 3500 pages/min), scraped 454 items (at 3440 items/min)
2018-11-29 10:18:00 [scrapy.extensions.logstats] INFO: Crawled 646 pages (at 3640 pages/min), scraped 618 items (at 3280 items/min)
2018-11-29 10:18:03 [scrapy.extensions.logstats] INFO: Crawled 822 pages (at 3520 pages/min), scraped 791 items (at 3460 items/min)
2018-11-29 10:18:07 [scrapy.extensions.logstats] INFO: Crawled 963 pages (at 2820 pages/min), scraped 963 items (at 3440 items/min)
2018-11-29 10:18:07 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:18:08 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 53.5703756542 items/sec
2018-11-29 10:18:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362729,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 18, 8, 455063),
'item_scraped_count': 1044,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46657536,
'memusage/startup': 46657536,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 17, 48, 946302)}
2018-11-29 10:18:08 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:18:09 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:18:09 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:18:09 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-11-29 10:18:09 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:18:09 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:18:09 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:18:09 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:18:09 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:18:09 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:18:09 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:18:12 [scrapy.extensions.logstats] INFO: Crawled 126 pages (at 2520 pages/min), scraped 103 items (at 2060 items/min)
2018-11-29 10:18:15 [scrapy.extensions.logstats] INFO: Crawled 289 pages (at 3260 pages/min), scraped 281 items (at 3560 items/min)
2018-11-29 10:18:18 [scrapy.extensions.logstats] INFO: Crawled 470 pages (at 3620 pages/min), scraped 445 items (at 3280 items/min)
2018-11-29 10:18:21 [scrapy.extensions.logstats] INFO: Crawled 635 pages (at 3300 pages/min), scraped 624 items (at 3580 items/min)
2018-11-29 10:18:24 [scrapy.extensions.logstats] INFO: Crawled 797 pages (at 3240 pages/min), scraped 790 items (at 3320 items/min)
2018-11-29 10:18:27 [scrapy.extensions.logstats] INFO: Crawled 975 pages (at 3560 pages/min), scraped 962 items (at 3440 items/min)
2018-11-29 10:18:27 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:18:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (1043 items) in: items.csv
The average speed of the spider is 53.8518536026 items/sec
2018-11-29 10:18:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362230,
'downloader/request_count': 1043,
'downloader/request_method_count/GET': 1043,
'downloader/response_bytes': 22941621,
'downloader/response_count': 1043,
'downloader/response_status_count/200': 1043,
'dupefilter/filtered': 14684,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 18, 28, 483271),
'item_scraped_count': 1043,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 19,
'memusage/max': 46653440,
'memusage/startup': 46653440,
'request_depth_max': 10,
'response_received_count': 1043,
'scheduler/dequeued': 1043,
'scheduler/dequeued/memory': 1043,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 18, 9, 113817)}
2018-11-29 10:18:28 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:18:29 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:18:29 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:18:29 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-11-29 10:18:29 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:18:29 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:18:29 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:18:29 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:18:29 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:18:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:18:29 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:18:32 [scrapy.extensions.logstats] INFO: Crawled 118 pages (at 2360 pages/min), scraped 104 items (at 2080 items/min)
2018-11-29 10:18:35 [scrapy.extensions.logstats] INFO: Crawled 307 pages (at 3780 pages/min), scraped 287 items (at 3660 items/min)
2018-11-29 10:18:38 [scrapy.extensions.logstats] INFO: Crawled 464 pages (at 3140 pages/min), scraped 462 items (at 3500 items/min)
2018-11-29 10:18:41 [scrapy.extensions.logstats] INFO: Crawled 648 pages (at 3680 pages/min), scraped 641 items (at 3580 items/min)
2018-11-29 10:18:44 [scrapy.extensions.logstats] INFO: Crawled 827 pages (at 3580 pages/min), scraped 817 items (at 3520 items/min)
2018-11-29 10:18:47 [scrapy.extensions.logstats] INFO: Crawled 1014 pages (at 3740 pages/min), scraped 989 items (at 3440 items/min)
2018-11-29 10:18:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:18:48 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 55.2377428746 items/sec
2018-11-29 10:18:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362630,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 18, 48, 44454),
'item_scraped_count': 1044,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46661632,
'memusage/startup': 46661632,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 18, 29, 121805)}
2018-11-29 10:18:48 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec):
Test = 'Book Spider' Iterations = '5'
Mean : 54.4092734877 Median : 54.3871510199 Std Dev : 0.640216053568
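The summary line can be reproduced from the five per-run speeds reported above; note that the "Std Dev" matches the population (not sample) standard deviation:

```python
import statistics

# Per-run speeds from the five DownloaderAware runs above.
speeds = [54.3871510199, 54.9992442872, 53.5703756542,
          53.8518536026, 55.2377428746]

mean = statistics.mean(speeds)
median = statistics.median(speeds)
stdev = statistics.pstdev(speeds)  # population std dev, as reported
print(mean, median, stdev)
```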
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-2.7/
2018-11-29 09:51:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:51:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:51:33 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 09:51:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 09:51:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:51:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:51:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:51:33 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:51:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:51:33 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:51:36 [scrapy.extensions.logstats] INFO: Crawled 130 pages (at 2600 pages/min), scraped 111 items (at 2220 items/min)
2018-11-29 09:51:39 [scrapy.extensions.logstats] INFO: Crawled 315 pages (at 3700 pages/min), scraped 298 items (at 3740 items/min)
2018-11-29 09:51:42 [scrapy.extensions.logstats] INFO: Crawled 487 pages (at 3440 pages/min), scraped 467 items (at 3380 items/min)
2018-11-29 09:51:45 [scrapy.extensions.logstats] INFO: Crawled 668 pages (at 3620 pages/min), scraped 639 items (at 3440 items/min)
2018-11-29 09:51:48 [scrapy.extensions.logstats] INFO: Crawled 843 pages (at 3500 pages/min), scraped 816 items (at 3540 items/min)
2018-11-29 09:51:51 [scrapy.extensions.logstats] INFO: Crawled 1007 pages (at 3280 pages/min), scraped 982 items (at 3320 items/min)
2018-11-29 09:51:52 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:51:53 [scrapy.extensions.feedexport] INFO: Stored csv feed (1089 items) in: items.csv
The average speed of the spider is 55.14306299 items/sec
2018-11-29 09:51:53 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 378764,
'downloader/request_count': 1089,
'downloader/request_method_count/GET': 1089,
'downloader/response_bytes': 23896282,
'downloader/response_count': 1089,
'downloader/response_status_count/200': 1089,
'dupefilter/filtered': 15230,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 51, 53, 523689),
'item_scraped_count': 1089,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 27,
'memusage/max': 46616576,
'memusage/startup': 46616576,
'request_depth_max': 10,
'response_received_count': 1089,
'scheduler/dequeued': 1089,
'scheduler/dequeued/memory': 1089,
'scheduler/enqueued': 1115,
'scheduler/enqueued/memory': 1115,
'start_time': datetime.datetime(2018, 11, 29, 9, 51, 33, 770465)}
2018-11-29 09:51:53 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:51:54 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:51:54 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:51:54 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 09:51:54 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 09:51:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:51:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:51:54 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:51:54 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:51:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:51:54 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:51:57 [scrapy.extensions.logstats] INFO: Crawled 113 pages (at 2260 pages/min), scraped 106 items (at 2120 items/min)
2018-11-29 09:52:00 [scrapy.extensions.logstats] INFO: Crawled 293 pages (at 3600 pages/min), scraped 289 items (at 3660 items/min)
2018-11-29 09:52:03 [scrapy.extensions.logstats] INFO: Crawled 466 pages (at 3460 pages/min), scraped 452 items (at 3260 items/min)
2018-11-29 09:52:06 [scrapy.extensions.logstats] INFO: Crawled 633 pages (at 3340 pages/min), scraped 603 items (at 3020 items/min)
2018-11-29 09:52:09 [scrapy.extensions.logstats] INFO: Crawled 792 pages (at 3180 pages/min), scraped 785 items (at 3640 items/min) | |
2018-11-29 09:52:12 [scrapy.extensions.logstats] INFO: Crawled 958 pages (at 3320 pages/min), scraped 956 items (at 3420 items/min) | |
2018-11-29 09:52:12 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:52:13 [scrapy.extensions.feedexport] INFO: Stored csv feed (1059 items) in: items.csv | |
The average speed of the spider is 53.5057043813 items/sec | |
2018-11-29 09:52:13 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 367329, | |
'downloader/request_count': 1059, | |
'downloader/request_method_count/GET': 1059, | |
'downloader/response_bytes': 23312675, | |
'downloader/response_count': 1059, | |
'downloader/response_status_count/200': 1059, | |
'dupefilter/filtered': 14936, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 52, 13, 969656), | |
'item_scraped_count': 1059, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 46620672, | |
'memusage/startup': 46620672, | |
'request_depth_max': 10, | |
'response_received_count': 1059, | |
'scheduler/dequeued': 1059, | |
'scheduler/dequeued/memory': 1059, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 51, 54, 166172)} | |
2018-11-29 09:52:13 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:52:14 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 09:52:14 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:52:14 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-11-29 09:52:14 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 09:52:14 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:52:14 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:52:14 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:52:14 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:52:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:52:14 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 09:52:17 [scrapy.extensions.logstats] INFO: Crawled 115 pages (at 2300 pages/min), scraped 108 items (at 2160 items/min) | |
2018-11-29 09:52:20 [scrapy.extensions.logstats] INFO: Crawled 306 pages (at 3820 pages/min), scraped 287 items (at 3580 items/min) | |
2018-11-29 09:52:23 [scrapy.extensions.logstats] INFO: Crawled 452 pages (at 2920 pages/min), scraped 452 items (at 3300 items/min) | |
2018-11-29 09:52:26 [scrapy.extensions.logstats] INFO: Crawled 646 pages (at 3880 pages/min), scraped 615 items (at 3260 items/min) | |
2018-11-29 09:52:29 [scrapy.extensions.logstats] INFO: Crawled 801 pages (at 3100 pages/min), scraped 799 items (at 3680 items/min) | |
2018-11-29 09:52:32 [scrapy.extensions.logstats] INFO: Crawled 972 pages (at 3420 pages/min), scraped 965 items (at 3320 items/min) | |
2018-11-29 09:52:33 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:52:34 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 53.599917227 items/sec | |
2018-11-29 09:52:34 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363375, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 52, 34, 214603), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 46620672, | |
'memusage/startup': 46620672, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 52, 14, 622975)} | |
2018-11-29 09:52:34 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:52:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 09:52:34 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:52:34 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-11-29 09:52:34 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 09:52:34 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:52:34 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:52:34 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:52:34 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:52:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:52:34 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 09:52:37 [scrapy.extensions.logstats] INFO: Crawled 122 pages (at 2440 pages/min), scraped 110 items (at 2200 items/min) | |
2018-11-29 09:52:40 [scrapy.extensions.logstats] INFO: Crawled 308 pages (at 3720 pages/min), scraped 292 items (at 3640 items/min) | |
2018-11-29 09:52:43 [scrapy.extensions.logstats] INFO: Crawled 474 pages (at 3320 pages/min), scraped 460 items (at 3360 items/min) | |
2018-11-29 09:52:46 [scrapy.extensions.logstats] INFO: Crawled 651 pages (at 3540 pages/min), scraped 643 items (at 3660 items/min) | |
2018-11-29 09:52:49 [scrapy.extensions.logstats] INFO: Crawled 817 pages (at 3320 pages/min), scraped 811 items (at 3360 items/min) | |
2018-11-29 09:52:52 [scrapy.extensions.logstats] INFO: Crawled 1018 pages (at 4020 pages/min), scraped 991 items (at 3600 items/min) | |
2018-11-29 09:52:53 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:52:53 [scrapy.extensions.feedexport] INFO: Stored csv feed (1028 items) in: items.csv | |
The average speed of the spider is 55.1098586806 items/sec | |
2018-11-29 09:52:53 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 356943, | |
'downloader/request_count': 1028, | |
'downloader/request_method_count/GET': 1028, | |
'downloader/response_bytes': 22689867, | |
'downloader/response_count': 1028, | |
'downloader/response_status_count/200': 1028, | |
'dupefilter/filtered': 14613, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 52, 53, 510264), | |
'item_scraped_count': 1028, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 46624768, | |
'memusage/startup': 46624768, | |
'request_depth_max': 10, | |
'response_received_count': 1028, | |
'scheduler/dequeued': 1028, | |
'scheduler/dequeued/memory': 1028, | |
'scheduler/enqueued': 1061, | |
'scheduler/enqueued/memory': 1061, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 52, 34, 846166)} | |
2018-11-29 09:52:53 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:52:54 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 09:52:54 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:52:54 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-11-29 09:52:54 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 09:52:54 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:52:54 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:52:54 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:52:54 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:52:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:52:54 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 09:52:57 [scrapy.extensions.logstats] INFO: Crawled 115 pages (at 2300 pages/min), scraped 106 items (at 2120 items/min) | |
2018-11-29 09:53:00 [scrapy.extensions.logstats] INFO: Crawled 300 pages (at 3700 pages/min), scraped 281 items (at 3500 items/min) | |
2018-11-29 09:53:03 [scrapy.extensions.logstats] INFO: Crawled 460 pages (at 3200 pages/min), scraped 454 items (at 3460 items/min) | |
2018-11-29 09:53:06 [scrapy.extensions.logstats] INFO: Crawled 640 pages (at 3600 pages/min), scraped 620 items (at 3320 items/min) | |
2018-11-29 09:53:09 [scrapy.extensions.logstats] INFO: Crawled 823 pages (at 3660 pages/min), scraped 787 items (at 3340 items/min) | |
2018-11-29 09:53:12 [scrapy.extensions.logstats] INFO: Crawled 976 pages (at 3060 pages/min), scraped 970 items (at 3660 items/min) | |
2018-11-29 09:53:12 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:53:13 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv | |
The average speed of the spider is 54.0053977703 items/sec | |
2018-11-29 09:53:13 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 367314, | |
'downloader/request_count': 1058, | |
'downloader/request_method_count/GET': 1058, | |
'downloader/response_bytes': 23258387, | |
'downloader/response_count': 1058, | |
'downloader/response_status_count/200': 1058, | |
'dupefilter/filtered': 14863, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 53, 13, 733749), | |
'item_scraped_count': 1058, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 25, | |
'memusage/max': 46620672, | |
'memusage/startup': 46620672, | |
'request_depth_max': 10, | |
'response_received_count': 1058, | |
'scheduler/dequeued': 1058, | |
'scheduler/dequeued/memory': 1058, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 52, 54, 162092)} | |
2018-11-29 09:53:13 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '5' | |
Mean : 54.2727882098 Median : 54.0053977703 Std Dev : 0.717040552928 | |
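The Mean / Median / Std Dev line above can be reproduced from the per-run "average speed of the spider" values. A minimal sketch with Python's `statistics` module: the last four speeds are taken verbatim from this log, while the first run precedes this excerpt, so its value here is back-solved from the reported mean and is an assumption. Note the reported Std Dev matches the *population* standard deviation (`pstdev`), not the sample one.

```python
# Recompute the summary statistics scrapy-bench prints for the
# DownloaderAwarePriorityQueue runs ("Mean / Median / Std Dev" above).
import statistics

speeds = [
    55.1430629898,  # run 1 (back-solved from the reported mean; not shown in this excerpt)
    53.5057043813,  # run 2
    53.599917227,   # run 3
    55.1098586806,  # run 4
    54.0053977703,  # run 5
]

mean = statistics.mean(speeds)
median = statistics.median(speeds)
stddev = statistics.pstdev(speeds)  # population std dev matches the report

print(mean, median, stddev)
```

Using `statistics.stdev` (the sample estimator, dividing by n-1) would give a noticeably larger value, which is how one can tell the benchmark reports the population form.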
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-2.7/ | |
2018-11-29 10:09:52 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:09:52 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:09:52 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.RoundRobinPriorityQueue'} | |
2018-11-29 10:09:52 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 10:09:52 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:09:52 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:09:52 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:09:52 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:09:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:09:52 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:09:55 [scrapy.extensions.logstats] INFO: Crawled 123 pages (at 2460 pages/min), scraped 104 items (at 2080 items/min) | |
2018-11-29 10:09:58 [scrapy.extensions.logstats] INFO: Crawled 316 pages (at 3860 pages/min), scraped 287 items (at 3660 items/min) | |
2018-11-29 10:10:01 [scrapy.extensions.logstats] INFO: Crawled 467 pages (at 3020 pages/min), scraped 456 items (at 3380 items/min) | |
2018-11-29 10:10:04 [scrapy.extensions.logstats] INFO: Crawled 630 pages (at 3260 pages/min), scraped 630 items (at 3480 items/min) | |
2018-11-29 10:10:07 [scrapy.extensions.logstats] INFO: Crawled 822 pages (at 3840 pages/min), scraped 803 items (at 3460 items/min) | |
2018-11-29 10:10:10 [scrapy.extensions.logstats] INFO: Crawled 989 pages (at 3340 pages/min), scraped 972 items (at 3380 items/min) | |
2018-11-29 10:10:10 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:10:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 54.0142018659 items/sec | |
2018-11-29 10:10:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363934, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 10, 11, 505091), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 46661632, | |
'memusage/startup': 46661632, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 9, 52, 76579)} | |
2018-11-29 10:10:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:10:12 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:10:12 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:10:12 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.RoundRobinPriorityQueue'} | |
2018-11-29 10:10:12 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 10:10:12 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:10:12 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:10:12 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:10:12 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:10:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:10:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:10:15 [scrapy.extensions.logstats] INFO: Crawled 123 pages (at 2460 pages/min), scraped 101 items (at 2020 items/min) | |
2018-11-29 10:10:18 [scrapy.extensions.logstats] INFO: Crawled 298 pages (at 3500 pages/min), scraped 290 items (at 3780 items/min) | |
2018-11-29 10:10:21 [scrapy.extensions.logstats] INFO: Crawled 455 pages (at 3140 pages/min), scraped 447 items (at 3140 items/min) | |
2018-11-29 10:10:24 [scrapy.extensions.logstats] INFO: Crawled 641 pages (at 3720 pages/min), scraped 627 items (at 3600 items/min) | |
2018-11-29 10:10:27 [scrapy.extensions.logstats] INFO: Crawled 826 pages (at 3700 pages/min), scraped 802 items (at 3500 items/min) | |
2018-11-29 10:10:30 [scrapy.extensions.logstats] INFO: Crawled 1000 pages (at 3480 pages/min), scraped 975 items (at 3460 items/min) | |
2018-11-29 10:10:30 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:10:31 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 54.6088343637 items/sec | |
2018-11-29 10:10:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363475, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 10, 31, 374879), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 46657536, | |
'memusage/startup': 46657536, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 10, 12, 160928)} | |
2018-11-29 10:10:31 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:10:31 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:10:31 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:10:31 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.RoundRobinPriorityQueue'} | |
2018-11-29 10:10:31 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 10:10:32 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:10:32 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:10:32 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:10:32 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:10:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:10:32 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:10:35 [scrapy.extensions.logstats] INFO: Crawled 117 pages (at 2340 pages/min), scraped 103 items (at 2060 items/min) | |
2018-11-29 10:10:38 [scrapy.extensions.logstats] INFO: Crawled 307 pages (at 3800 pages/min), scraped 281 items (at 3560 items/min) | |
2018-11-29 10:10:41 [scrapy.extensions.logstats] INFO: Crawled 480 pages (at 3460 pages/min), scraped 454 items (at 3460 items/min) | |
2018-11-29 10:10:44 [scrapy.extensions.logstats] INFO: Crawled 646 pages (at 3320 pages/min), scraped 626 items (at 3440 items/min) | |
2018-11-29 10:10:47 [scrapy.extensions.logstats] INFO: Crawled 805 pages (at 3180 pages/min), scraped 798 items (at 3440 items/min) | |
2018-11-29 10:10:50 [scrapy.extensions.logstats] INFO: Crawled 1013 pages (at 4160 pages/min), scraped 975 items (at 3540 items/min) | |
2018-11-29 10:10:50 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:10:51 [scrapy.extensions.feedexport] INFO: Stored csv feed (1046 items) in: items.csv | |
The average speed of the spider is 54.728090246 items/sec | |
2018-11-29 10:10:51 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 362920, | |
'downloader/request_count': 1046, | |
'downloader/request_method_count/GET': 1046, | |
'downloader/response_bytes': 22962086, | |
'downloader/response_count': 1046, | |
'downloader/response_status_count/200': 1046, | |
'dupefilter/filtered': 14627, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 10, 51, 145362), | |
'item_scraped_count': 1046, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 35, | |
'memusage/max': 46657536, | |
'memusage/startup': 46657536, | |
'request_depth_max': 9, | |
'response_received_count': 1046, | |
'scheduler/dequeued': 1046, | |
'scheduler/dequeued/memory': 1046, | |
'scheduler/enqueued': 1080, | |
'scheduler/enqueued/memory': 1080, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 10, 32, 21111)} | |
2018-11-29 10:10:51 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:10:51 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:10:51 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:10:51 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.RoundRobinPriorityQueue'} | |
2018-11-29 10:10:51 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 10:10:51 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:10:51 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:10:51 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:10:51 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:10:51 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:10:51 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:10:54 [scrapy.extensions.logstats] INFO: Crawled 135 pages (at 2700 pages/min), scraped 104 items (at 2080 items/min) | |
2018-11-29 10:10:57 [scrapy.extensions.logstats] INFO: Crawled 301 pages (at 3320 pages/min), scraped 295 items (at 3820 items/min) | |
2018-11-29 10:11:00 [scrapy.extensions.logstats] INFO: Crawled 477 pages (at 3520 pages/min), scraped 446 items (at 3020 items/min) | |
2018-11-29 10:11:03 [scrapy.extensions.logstats] INFO: Crawled 648 pages (at 3420 pages/min), scraped 631 items (at 3700 items/min) | |
2018-11-29 10:11:06 [scrapy.extensions.logstats] INFO: Crawled 803 pages (at 3100 pages/min), scraped 801 items (at 3400 items/min) | |
2018-11-29 10:11:10 [scrapy.extensions.logstats] INFO: Crawled 996 pages (at 3860 pages/min), scraped 985 items (at 3680 items/min) | |
2018-11-29 10:11:10 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:11:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv | |
The average speed of the spider is 54.5775567775 items/sec | |
2018-11-29 10:11:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 367892, | |
'downloader/request_count': 1058, | |
'downloader/request_method_count/GET': 1058, | |
'downloader/response_bytes': 23261366, | |
'downloader/response_count': 1058, | |
'downloader/response_status_count/200': 1058, | |
'dupefilter/filtered': 14863, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 11, 11, 185001), | |
'item_scraped_count': 1058, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 46657536, | |
'memusage/startup': 46657536, | |
'request_depth_max': 9, | |
'response_received_count': 1058, | |
'scheduler/dequeued': 1058, | |
'scheduler/dequeued/memory': 1058, | |
'scheduler/enqueued': 1081, | |
'scheduler/enqueued/memory': 1081, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 10, 51, 787689)} | |
2018-11-29 10:11:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:11:11 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:11:11 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:11:11 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.RoundRobinPriorityQueue'} | |
2018-11-29 10:11:11 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 10:11:11 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:11:11 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:11:11 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:11:11 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:11:11 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:11:11 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:11:14 [scrapy.extensions.logstats] INFO: Crawled 116 pages (at 2320 pages/min), scraped 108 items (at 2160 items/min) | |
2018-11-29 10:11:17 [scrapy.extensions.logstats] INFO: Crawled 309 pages (at 3860 pages/min), scraped 283 items (at 3500 items/min) | |
2018-11-29 10:11:20 [scrapy.extensions.logstats] INFO: Crawled 456 pages (at 2940 pages/min), scraped 456 items (at 3460 items/min) | |
2018-11-29 10:11:23 [scrapy.extensions.logstats] INFO: Crawled 648 pages (at 3840 pages/min), scraped 635 items (at 3580 items/min) | |
2018-11-29 10:11:27 [scrapy.extensions.logstats] INFO: Crawled 817 pages (at 3380 pages/min), scraped 805 items (at 3400 items/min) | |
2018-11-29 10:11:29 [scrapy.extensions.logstats] INFO: Crawled 986 pages (at 3380 pages/min), scraped 979 items (at 3480 items/min) | |
2018-11-29 10:11:30 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:11:31 [scrapy.extensions.feedexport] INFO: Stored csv feed (1059 items) in: items.csv | |
The average speed of the spider is 54.2897166857 items/sec | |
2018-11-29 10:11:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 367769, | |
'downloader/request_count': 1059, | |
'downloader/request_method_count/GET': 1059, | |
'downloader/response_bytes': 23312675, | |
'downloader/response_count': 1059, | |
'downloader/response_status_count/200': 1059, | |
'dupefilter/filtered': 14936, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 11, 31, 244085), | |
'item_scraped_count': 1059, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 46657536, | |
'memusage/startup': 46657536, | |
'request_depth_max': 10, | |
'response_received_count': 1059, | |
'scheduler/dequeued': 1059, | |
'scheduler/dequeued/memory': 1059, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 11, 11, 823783)} | |
2018-11-29 10:11:31 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '5' | |
Mean : 54.4436799878 Median : 54.5775567775 Std Dev : 0.258551470022 | |
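The Mean/Median/Std Dev summary above aggregates the per-run "average speed" figures that scrapy-bench prints after each of the 5 iterations. A minimal sketch of that aggregation using Python's stdlib is shown below; the `speeds` values are hypothetical (only three of the five per-run speeds are visible in this excerpt), and which standard-deviation estimator (population vs. sample) scrapy-bench uses is an assumption here.

```python
import statistics

# Hypothetical per-run speeds in items/sec -- NOT the actual benchmark
# values; only three of the five runs' speeds appear in this log excerpt.
speeds = [54.0, 54.5, 54.6, 54.7, 54.4]

mean = statistics.mean(speeds)      # arithmetic mean across runs
median = statistics.median(speeds)  # middle value of the sorted speeds
std_dev = statistics.pstdev(speeds) # population std dev (estimator choice is an assumption)

print("Mean : %s Median : %s Std Dev : %s" % (mean, median, std_dev))
```

With five runs the median is simply the third-fastest run, which makes it robust to a single outlier iteration (e.g. one run slowed by a cold cache), while the standard deviation indicates how repeatable the benchmark is.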
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-2.7/ | |
2018-11-29 10:20:13 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books) | |
2018-11-29 10:20:13 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:20:13 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-11-29 10:20:13 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 10:20:13 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:20:13 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:20:13 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:20:13 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:20:13 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:20:16 [scrapy.extensions.logstats] INFO: Crawled 117 pages (at 2340 pages/min), scraped 106 items (at 2120 items/min) | |
2018-11-29 10:20:19 [scrapy.extensions.logstats] INFO: Crawled 292 pages (at 3500 pages/min), scraped 284 items (at 3560 items/min) | |
2018-11-29 10:20:22 [scrapy.extensions.logstats] INFO: Crawled 477 pages (at 3700 pages/min), scraped 441 items (at 3140 items/min) | |
2018-11-29 10:20:25 [scrapy.extensions.logstats] INFO: Crawled 629 pages (at 3040 pages/min), scraped 623 items (at 3640 items/min) | |
2018-11-29 10:20:28 [scrapy.extensions.logstats] INFO: Crawled 816 pages (at 3740 pages/min), scraped 800 items (at 3540 items/min) | |
2018-11-29 10:20:31 [scrapy.extensions.logstats] INFO: Crawled 998 pages (at 3640 pages/min), scraped 980 items (at 3600 items/min) | |
2018-11-29 10:20:31 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:20:32 [scrapy.extensions.feedexport] INFO: Stored csv feed (1028 items) in: items.csv | |
The average speed of the spider is 54.0763760286 items/sec | |
2018-11-29 10:20:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 356434, | |
'downloader/request_count': 1028, | |
'downloader/request_method_count/GET': 1028, | |
'downloader/response_bytes': 22689867, | |
'downloader/response_count': 1028, | |
'downloader/response_status_count/200': 1028, | |
'dupefilter/filtered': 14613, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 20, 32, 40008), | |
'item_scraped_count': 1028, | |
'log_count/INFO': 14, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 46452736, | |
'memusage/startup': 46452736, | |
'request_depth_max': 10, | |
'response_received_count': 1028, | |
'scheduler/dequeued': 1028, | |
'scheduler/dequeued/memory': 1028, | |
'scheduler/enqueued': 1061, | |
'scheduler/enqueued/memory': 1061, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 20, 13, 120222)} | |
2018-11-29 10:20:32 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:20:32 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books) | |
2018-11-29 10:20:32 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:20:32 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-11-29 10:20:32 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 10:20:32 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:20:32 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:20:32 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:20:32 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:20:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:20:35 [scrapy.extensions.logstats] INFO: Crawled 123 pages (at 2460 pages/min), scraped 103 items (at 2060 items/min) | |
2018-11-29 10:20:38 [scrapy.extensions.logstats] INFO: Crawled 290 pages (at 3340 pages/min), scraped 288 items (at 3700 items/min) | |
2018-11-29 10:20:41 [scrapy.extensions.logstats] INFO: Crawled 470 pages (at 3600 pages/min), scraped 457 items (at 3380 items/min) | |
2018-11-29 10:20:44 [scrapy.extensions.logstats] INFO: Crawled 658 pages (at 3760 pages/min), scraped 626 items (at 3380 items/min) | |
2018-11-29 10:20:47 [scrapy.extensions.logstats] INFO: Crawled 826 pages (at 3360 pages/min), scraped 811 items (at 3700 items/min) | |
2018-11-29 10:20:50 [scrapy.extensions.logstats] INFO: Crawled 994 pages (at 3360 pages/min), scraped 987 items (at 3520 items/min) | |
2018-11-29 10:20:51 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:20:51 [scrapy.extensions.feedexport] INFO: Stored csv feed (1052 items) in: items.csv | |
The average speed of the spider is 54.9769549264 items/sec | |
2018-11-29 10:20:51 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 365067, | |
'downloader/request_count': 1052, | |
'downloader/request_method_count/GET': 1052, | |
'downloader/response_bytes': 23177070, | |
'downloader/response_count': 1052, | |
'downloader/response_status_count/200': 1052, | |
'dupefilter/filtered': 14873, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 20, 51, 778587), | |
'item_scraped_count': 1052, | |
'log_count/INFO': 14, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 31, | |
'memusage/max': 46456832, | |
'memusage/startup': 46456832, | |
'request_depth_max': 10, | |
'response_received_count': 1052, | |
'scheduler/dequeued': 1052, | |
'scheduler/dequeued/memory': 1052, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 20, 32, 620265)} | |
2018-11-29 10:20:51 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:20:52 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books) | |
2018-11-29 10:20:52 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:20:52 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-11-29 10:20:52 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 10:20:52 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:20:52 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:20:52 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:20:52 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:20:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:20:55 [scrapy.extensions.logstats] INFO: Crawled 121 pages (at 2420 pages/min), scraped 107 items (at 2140 items/min) | |
2018-11-29 10:20:58 [scrapy.extensions.logstats] INFO: Crawled 300 pages (at 3580 pages/min), scraped 287 items (at 3600 items/min) | |
2018-11-29 10:21:01 [scrapy.extensions.logstats] INFO: Crawled 461 pages (at 3220 pages/min), scraped 455 items (at 3360 items/min) | |
2018-11-29 10:21:04 [scrapy.extensions.logstats] INFO: Crawled 634 pages (at 3460 pages/min), scraped 626 items (at 3420 items/min) | |
2018-11-29 10:21:07 [scrapy.extensions.logstats] INFO: Crawled 801 pages (at 3340 pages/min), scraped 788 items (at 3240 items/min) | |
2018-11-29 10:21:10 [scrapy.extensions.logstats] INFO: Crawled 994 pages (at 3860 pages/min), scraped 960 items (at 3440 items/min) | |
2018-11-29 10:21:11 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:21:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 53.7050977679 items/sec | |
2018-11-29 10:21:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 364014, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 21, 11, 885192), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 14, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 46456832, | |
'memusage/startup': 46456832, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 20, 52, 342563)} | |
2018-11-29 10:21:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:21:12 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books) | |
2018-11-29 10:21:12 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:21:12 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-11-29 10:21:12 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-11-29 10:21:12 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:21:12 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:21:12 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:21:12 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:21:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:21:15 [scrapy.extensions.logstats] INFO: Crawled 108 pages (at 2160 pages/min), scraped 106 items (at 2120 items/min) | |
2018-11-29 10:21:18 [scrapy.extensions.logstats] INFO: Crawled 293 pages (at 3700 pages/min), scraped 285 items (at 3580 items/min) | |
2018-11-29 10:21:21 [scrapy.extensions.logstats] INFO: Crawled 470 pages (at 3540 pages/min), scraped 451 items (at 3320 items/min) | |
2018-11-29 10:21:24 [scrapy.extensions.logstats] INFO: Crawled 630 pages (at 3200 pages/min), scraped 624 items (at 3460 items/min) | |
2018-11-29 10:21:27 [scrapy.extensions.logstats] INFO: Crawled 808 pages (at 3560 pages/min), scraped 797 items (at 3460 items/min) | |
2018-11-29 10:21:30 [scrapy.extensions.logstats] INFO: Crawled 981 pages (at 3460 pages/min), scraped 968 items (at 3420 items/min) | |
2018-11-29 10:21:31 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:21:31 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 53.8195090969 items/sec
2018-11-29 10:21:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362732,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 21, 31, 862245),
'item_scraped_count': 1044,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46452736,
'memusage/startup': 46452736,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 21, 12, 461530)}
2018-11-29 10:21:31 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:21:32 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:21:32 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:21:32 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:21:32 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:21:32 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:21:32 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:21:32 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:21:32 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:21:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:21:35 [scrapy.extensions.logstats] INFO: Crawled 130 pages (at 2600 pages/min), scraped 103 items (at 2060 items/min)
2018-11-29 10:21:38 [scrapy.extensions.logstats] INFO: Crawled 291 pages (at 3220 pages/min), scraped 290 items (at 3740 items/min)
2018-11-29 10:21:41 [scrapy.extensions.logstats] INFO: Crawled 480 pages (at 3780 pages/min), scraped 452 items (at 3240 items/min)
2018-11-29 10:21:44 [scrapy.extensions.logstats] INFO: Crawled 645 pages (at 3300 pages/min), scraped 635 items (at 3660 items/min)
2018-11-29 10:21:47 [scrapy.extensions.logstats] INFO: Crawled 813 pages (at 3360 pages/min), scraped 811 items (at 3520 items/min)
2018-11-29 10:21:50 [scrapy.extensions.logstats] INFO: Crawled 998 pages (at 3700 pages/min), scraped 990 items (at 3580 items/min)
2018-11-29 10:21:50 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:21:51 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 54.5206981948 items/sec
2018-11-29 10:21:51 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362562,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 21, 51, 540612),
'item_scraped_count': 1044,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46452736,
'memusage/startup': 46452736,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 21, 32, 413729)}
2018-11-29 10:21:51 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec):
Test = 'Book Spider' Iterations = '5'
Mean : 54.2197272029 Median : 54.0763760286 Std Dev : 0.471072162224
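For reference, the only configuration difference between the benchmarked runs is the scheduler priority queue: the runs above override SCHEDULER_PRIORITY_QUEUE with scrapy.pqueues.DownloaderAwarePriorityQueue, while the runs below leave the default queue in place. A minimal settings.py sketch of that toggle, with the other values copied from the "Overridden settings" log lines (this is an illustrative reconstruction, not the actual scrapy-bench project file):

```python
# settings.py fragment for the scrapy-bench "books" project (reconstructed
# from the "Overridden settings" lines in the log above; not the real file).
BOT_NAME = 'books'
SPIDER_MODULES = ['books.spiders']
NEWSPIDER_MODULE = 'books.spiders'

CONCURRENT_REQUESTS = 120
RETRY_ENABLED = False
CLOSESPIDER_ITEMCOUNT = 1000
LOG_LEVEL = 'INFO'
LOGSTATS_INTERVAL = 3
MEMDEBUG_ENABLED = True
FEED_FORMAT = 'csv'
FEED_URI = 'items.csv'

# The single toggle under test: enable for the downloader-aware runs,
# comment out to fall back to Scrapy's default priority queue.
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'
```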
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-2.7/
2018-11-29 10:26:33 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:26:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:26:33 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:26:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:26:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:26:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:26:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:26:33 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:26:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:26:36 [scrapy.extensions.logstats] INFO: Crawled 112 pages (at 2240 pages/min), scraped 96 items (at 1920 items/min)
2018-11-29 10:26:39 [scrapy.extensions.logstats] INFO: Crawled 289 pages (at 3540 pages/min), scraped 273 items (at 3540 items/min)
2018-11-29 10:26:42 [scrapy.extensions.logstats] INFO: Crawled 442 pages (at 3060 pages/min), scraped 427 items (at 3080 items/min)
2018-11-29 10:26:45 [scrapy.extensions.logstats] INFO: Crawled 609 pages (at 3340 pages/min), scraped 594 items (at 3340 items/min)
2018-11-29 10:26:48 [scrapy.extensions.logstats] INFO: Crawled 773 pages (at 3280 pages/min), scraped 758 items (at 3280 items/min)
2018-11-29 10:26:51 [scrapy.extensions.logstats] INFO: Crawled 933 pages (at 3200 pages/min), scraped 920 items (at 3240 items/min)
2018-11-29 10:26:53 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:26:54 [scrapy.extensions.feedexport] INFO: Stored csv feed (1070 items) in: items.csv
The average speed of the spider is 51.9762194939 items/sec
2018-11-29 10:26:54 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 371166,
'downloader/request_count': 1070,
'downloader/request_method_count/GET': 1070,
'downloader/response_bytes': 4169684,
'downloader/response_count': 1070,
'downloader/response_status_count/200': 1070,
'dupefilter/filtered': 15079,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 26, 54, 312540),
'item_scraped_count': 1070,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46456832,
'memusage/startup': 46456832,
'request_depth_max': 10,
'response_received_count': 1070,
'scheduler/dequeued': 1070,
'scheduler/dequeued/memory': 1070,
'scheduler/enqueued': 1103,
'scheduler/enqueued/memory': 1103,
'start_time': datetime.datetime(2018, 11, 29, 10, 26, 33, 708329)}
2018-11-29 10:26:54 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:26:54 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:26:54 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:26:54 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:26:54 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:26:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:26:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:26:54 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:26:54 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:26:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:26:57 [scrapy.extensions.logstats] INFO: Crawled 109 pages (at 2180 pages/min), scraped 90 items (at 1800 items/min)
2018-11-29 10:27:00 [scrapy.extensions.logstats] INFO: Crawled 277 pages (at 3360 pages/min), scraped 262 items (at 3440 items/min)
2018-11-29 10:27:03 [scrapy.extensions.logstats] INFO: Crawled 433 pages (at 3120 pages/min), scraped 421 items (at 3180 items/min)
2018-11-29 10:27:06 [scrapy.extensions.logstats] INFO: Crawled 601 pages (at 3360 pages/min), scraped 585 items (at 3280 items/min)
2018-11-29 10:27:09 [scrapy.extensions.logstats] INFO: Crawled 765 pages (at 3280 pages/min), scraped 750 items (at 3300 items/min)
2018-11-29 10:27:12 [scrapy.extensions.logstats] INFO: Crawled 929 pages (at 3280 pages/min), scraped 915 items (at 3300 items/min)
2018-11-29 10:27:14 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:27:15 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 51.4176928926 items/sec
2018-11-29 10:27:15 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362970,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 4072056,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 27, 15, 102491),
'item_scraped_count': 1044,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46456832,
'memusage/startup': 46456832,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 26, 54, 879857)}
2018-11-29 10:27:15 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:27:15 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:27:15 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:27:15 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:27:15 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:27:15 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:27:15 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:27:15 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:27:15 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:27:15 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:27:18 [scrapy.extensions.logstats] INFO: Crawled 114 pages (at 2280 pages/min), scraped 98 items (at 1960 items/min)
2018-11-29 10:27:21 [scrapy.extensions.logstats] INFO: Crawled 282 pages (at 3360 pages/min), scraped 273 items (at 3500 items/min)
2018-11-29 10:27:24 [scrapy.extensions.logstats] INFO: Crawled 442 pages (at 3200 pages/min), scraped 433 items (at 3200 items/min)
2018-11-29 10:27:27 [scrapy.extensions.logstats] INFO: Crawled 610 pages (at 3360 pages/min), scraped 595 items (at 3240 items/min)
2018-11-29 10:27:30 [scrapy.extensions.logstats] INFO: Crawled 771 pages (at 3220 pages/min), scraped 760 items (at 3300 items/min)
2018-11-29 10:27:33 [scrapy.extensions.logstats] INFO: Crawled 938 pages (at 3340 pages/min), scraped 930 items (at 3400 items/min)
2018-11-29 10:27:34 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:27:35 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 52.0539729896 items/sec
2018-11-29 10:27:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 364301,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 4091076,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 27, 35, 768699),
'item_scraped_count': 1049,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46456832,
'memusage/startup': 46456832,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 27, 15, 658557)}
2018-11-29 10:27:35 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:27:36 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:27:36 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:27:36 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:27:36 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:27:36 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:27:36 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:27:36 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:27:36 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:27:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:27:39 [scrapy.extensions.logstats] INFO: Crawled 111 pages (at 2220 pages/min), scraped 98 items (at 1960 items/min)
2018-11-29 10:27:42 [scrapy.extensions.logstats] INFO: Crawled 285 pages (at 3480 pages/min), scraped 272 items (at 3480 items/min)
2018-11-29 10:27:45 [scrapy.extensions.logstats] INFO: Crawled 445 pages (at 3200 pages/min), scraped 430 items (at 3160 items/min)
2018-11-29 10:27:48 [scrapy.extensions.logstats] INFO: Crawled 620 pages (at 3500 pages/min), scraped 604 items (at 3480 items/min)
2018-11-29 10:27:51 [scrapy.extensions.logstats] INFO: Crawled 781 pages (at 3220 pages/min), scraped 767 items (at 3260 items/min)
2018-11-29 10:27:54 [scrapy.extensions.logstats] INFO: Crawled 949 pages (at 3360 pages/min), scraped 933 items (at 3320 items/min)
2018-11-29 10:27:55 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:27:56 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 52.2991827852 items/sec
2018-11-29 10:27:56 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363833,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 4091076,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 27, 56, 342556),
'item_scraped_count': 1049,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46456832,
'memusage/startup': 46456832,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 27, 36, 337224)}
2018-11-29 10:27:56 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:27:56 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:27:56 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:27:56 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:27:56 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:27:56 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:27:56 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:27:56 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:27:56 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:27:56 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:27:59 [scrapy.extensions.logstats] INFO: Crawled 114 pages (at 2280 pages/min), scraped 98 items (at 1960 items/min)
2018-11-29 10:28:02 [scrapy.extensions.logstats] INFO: Crawled 282 pages (at 3360 pages/min), scraped 270 items (at 3440 items/min)
2018-11-29 10:28:05 [scrapy.extensions.logstats] INFO: Crawled 442 pages (at 3200 pages/min), scraped 433 items (at 3260 items/min)
2018-11-29 10:28:08 [scrapy.extensions.logstats] INFO: Crawled 602 pages (at 3200 pages/min), scraped 594 items (at 3220 items/min)
2018-11-29 10:28:11 [scrapy.extensions.logstats] INFO: Crawled 762 pages (at 3200 pages/min), scraped 750 items (at 3120 items/min)
2018-11-29 10:28:14 [scrapy.extensions.logstats] INFO: Crawled 926 pages (at 3280 pages/min), scraped 914 items (at 3280 items/min)
2018-11-29 10:28:16 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:28:17 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 51.6355868318 items/sec
2018-11-29 10:28:17 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363596,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 4091076,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 28, 17, 193057),
'item_scraped_count': 1049,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46452736,
'memusage/startup': 46452736,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 27, 56, 892126)}
2018-11-29 10:28:17 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec):
Test = 'Book Spider' Iterations = '5'
Mean : 51.8765309986 Median : 51.9762194939 Std Dev : 0.312616214555
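The summary line above aggregates the five per-run speeds printed in this log (51.9762, 51.4177, 52.0540, 52.2992, 51.6356 items/sec). As a sanity check, the reported figures can be reproduced with Python's statistics module, assuming scrapy-bench reports the population standard deviation (which is what matches the printed value):

```python
import statistics

# Per-run speeds (items/sec) copied from the five
# "The average speed of the spider is ..." lines above.
speeds = [51.9762194939, 51.4176928926, 52.0539729896,
          52.2991827852, 51.6355868318]

mean = statistics.mean(speeds)      # matches reported 51.8765309986
median = statistics.median(speeds)  # matches reported 51.9762194939
stdev = statistics.pstdev(speeds)   # population std dev, ~0.3126

print(mean, median, stdev)
```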
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-3.6/
2018-11-29 10:14:44 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:14:44 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:14:44 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:14:44 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:14:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:14:44 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:14:44 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:14:44 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:14:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:14:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:14:47 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 85 items (at 1700 items/min)
2018-11-29 10:14:50 [scrapy.extensions.logstats] INFO: Crawled 262 pages (at 3380 pages/min), scraped 255 items (at 3400 items/min)
2018-11-29 10:14:53 [scrapy.extensions.logstats] INFO: Crawled 426 pages (at 3280 pages/min), scraped 424 items (at 3380 items/min)
2018-11-29 10:14:56 [scrapy.extensions.logstats] INFO: Crawled 602 pages (at 3520 pages/min), scraped 583 items (at 3180 items/min)
2018-11-29 10:14:59 [scrapy.extensions.logstats] INFO: Crawled 761 pages (at 3180 pages/min), scraped 753 items (at 3400 items/min)
2018-11-29 10:15:02 [scrapy.extensions.logstats] INFO: Crawled 924 pages (at 3260 pages/min), scraped 911 items (at 3160 items/min)
2018-11-29 10:15:03 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:15:04 [scrapy.extensions.feedexport] INFO: Stored csv feed (1048 items) in: items.csv
The average speed of the spider is 51.12914776632539 items/sec
2018-11-29 10:15:04 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363184,
'downloader/request_count': 1048,
'downloader/request_method_count/GET': 1048,
'downloader/response_bytes': 23064841,
'downloader/response_count': 1048,
'downloader/response_status_count/200': 1048,
'dupefilter/filtered': 14773,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 15, 4, 565306),
'item_scraped_count': 1048,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 35,
'memusage/max': 45412352,
'memusage/startup': 45412352,
'request_depth_max': 10,
'response_received_count': 1048,
'scheduler/dequeued': 1048,
'scheduler/dequeued/memory': 1048,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 14, 44, 106279)}
2018-11-29 10:15:04 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:15:05 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:15:05 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:15:05 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 10:15:05 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 10:15:05 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:15:05 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:15:05 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:15:05 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:15:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:15:05 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:15:08 [scrapy.extensions.logstats] INFO: Crawled 92 pages (at 1840 pages/min), scraped 85 items (at 1700 items/min) | |
2018-11-29 10:15:11 [scrapy.extensions.logstats] INFO: Crawled 262 pages (at 3400 pages/min), scraped 255 items (at 3400 items/min) | |
2018-11-29 10:15:14 [scrapy.extensions.logstats] INFO: Crawled 431 pages (at 3380 pages/min), scraped 424 items (at 3380 items/min) | |
2018-11-29 10:15:17 [scrapy.extensions.logstats] INFO: Crawled 610 pages (at 3580 pages/min), scraped 587 items (at 3260 items/min) | |
2018-11-29 10:15:20 [scrapy.extensions.logstats] INFO: Crawled 768 pages (at 3160 pages/min), scraped 749 items (at 3240 items/min) | |
2018-11-29 10:15:23 [scrapy.extensions.logstats] INFO: Crawled 931 pages (at 3260 pages/min), scraped 922 items (at 3460 items/min) | |
2018-11-29 10:15:25 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:15:25 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 51.56846499863366 items/sec | |
2018-11-29 10:15:25 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363582, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 15, 25, 786526), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45412352, | |
'memusage/startup': 45412352, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 15, 5, 495634)} | |
2018-11-29 10:15:25 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:15:26 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:15:26 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:15:26 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 10:15:26 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 10:15:26 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:15:26 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:15:26 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:15:26 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:15:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:15:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:15:29 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 86 items (at 1720 items/min) | |
2018-11-29 10:15:32 [scrapy.extensions.logstats] INFO: Crawled 274 pages (at 3620 pages/min), scraped 261 items (at 3500 items/min) | |
2018-11-29 10:15:35 [scrapy.extensions.logstats] INFO: Crawled 433 pages (at 3180 pages/min), scraped 419 items (at 3160 items/min) | |
2018-11-29 10:15:38 [scrapy.extensions.logstats] INFO: Crawled 595 pages (at 3240 pages/min), scraped 581 items (at 3240 items/min) | |
2018-11-29 10:15:41 [scrapy.extensions.logstats] INFO: Crawled 756 pages (at 3220 pages/min), scraped 756 items (at 3500 items/min) | |
2018-11-29 10:15:44 [scrapy.extensions.logstats] INFO: Crawled 940 pages (at 3680 pages/min), scraped 928 items (at 3440 items/min) | |
2018-11-29 10:15:46 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:15:46 [scrapy.extensions.feedexport] INFO: Stored csv feed (1028 items) in: items.csv | |
The average speed of the spider is 51.51978070283345 items/sec | |
2018-11-29 10:15:46 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 356398, | |
'downloader/request_count': 1028, | |
'downloader/request_method_count/GET': 1028, | |
'downloader/response_bytes': 22689867, | |
'downloader/response_count': 1028, | |
'downloader/response_status_count/200': 1028, | |
'dupefilter/filtered': 14613, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 15, 46, 603307), | |
'item_scraped_count': 1028, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45412352, | |
'memusage/startup': 45412352, | |
'request_depth_max': 10, | |
'response_received_count': 1028, | |
'scheduler/dequeued': 1028, | |
'scheduler/dequeued/memory': 1028, | |
'scheduler/enqueued': 1061, | |
'scheduler/enqueued/memory': 1061, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 15, 26, 686372)} | |
2018-11-29 10:15:46 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:15:47 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:15:47 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:15:47 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 10:15:47 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 10:15:47 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:15:47 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:15:47 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:15:47 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:15:47 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:15:47 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:15:50 [scrapy.extensions.logstats] INFO: Crawled 88 pages (at 1760 pages/min), scraped 81 items (at 1620 items/min) | |
2018-11-29 10:15:53 [scrapy.extensions.logstats] INFO: Crawled 273 pages (at 3700 pages/min), scraped 265 items (at 3680 items/min) | |
2018-11-29 10:15:56 [scrapy.extensions.logstats] INFO: Crawled 437 pages (at 3280 pages/min), scraped 429 items (at 3280 items/min) | |
2018-11-29 10:15:59 [scrapy.extensions.logstats] INFO: Crawled 598 pages (at 3220 pages/min), scraped 595 items (at 3320 items/min) | |
2018-11-29 10:16:02 [scrapy.extensions.logstats] INFO: Crawled 772 pages (at 3480 pages/min), scraped 756 items (at 3220 items/min) | |
2018-11-29 10:16:05 [scrapy.extensions.logstats] INFO: Crawled 955 pages (at 3660 pages/min), scraped 922 items (at 3320 items/min) | |
2018-11-29 10:16:06 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:16:07 [scrapy.extensions.feedexport] INFO: Stored csv feed (1043 items) in: items.csv | |
The average speed of the spider is 51.87628543449582 items/sec | |
2018-11-29 10:16:07 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 362309, | |
'downloader/request_count': 1043, | |
'downloader/request_method_count/GET': 1043, | |
'downloader/response_bytes': 22941621, | |
'downloader/response_count': 1043, | |
'downloader/response_status_count/200': 1043, | |
'dupefilter/filtered': 14684, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 16, 7, 595425), | |
'item_scraped_count': 1043, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 19, | |
'memusage/max': 45400064, | |
'memusage/startup': 45400064, | |
'request_depth_max': 10, | |
'response_received_count': 1043, | |
'scheduler/dequeued': 1043, | |
'scheduler/dequeued/memory': 1043, | |
'scheduler/enqueued': 1061, | |
'scheduler/enqueued/memory': 1061, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 15, 47, 506996)} | |
2018-11-29 10:16:07 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:16:08 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:16:08 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:16:08 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 10:16:08 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 10:16:08 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:16:08 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:16:08 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:16:08 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:16:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:16:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:16:11 [scrapy.extensions.logstats] INFO: Crawled 92 pages (at 1840 pages/min), scraped 92 items (at 1840 items/min) | |
2018-11-29 10:16:14 [scrapy.extensions.logstats] INFO: Crawled 279 pages (at 3740 pages/min), scraped 261 items (at 3380 items/min) | |
2018-11-29 10:16:17 [scrapy.extensions.logstats] INFO: Crawled 446 pages (at 3340 pages/min), scraped 415 items (at 3080 items/min) | |
2018-11-29 10:16:20 [scrapy.extensions.logstats] INFO: Crawled 594 pages (at 2960 pages/min), scraped 586 items (at 3420 items/min) | |
2018-11-29 10:16:23 [scrapy.extensions.logstats] INFO: Crawled 761 pages (at 3340 pages/min), scraped 749 items (at 3260 items/min) | |
2018-11-29 10:16:26 [scrapy.extensions.logstats] INFO: Crawled 948 pages (at 3740 pages/min), scraped 921 items (at 3440 items/min) | |
2018-11-29 10:16:28 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:16:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (1048 items) in: items.csv | |
The average speed of the spider is 51.690047106138145 items/sec | |
2018-11-29 10:16:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363320, | |
'downloader/request_count': 1048, | |
'downloader/request_method_count/GET': 1048, | |
'downloader/response_bytes': 23067820, | |
'downloader/response_count': 1048, | |
'downloader/response_status_count/200': 1048, | |
'dupefilter/filtered': 14773, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 16, 28, 777623), | |
'item_scraped_count': 1048, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45412352, | |
'memusage/startup': 45412352, | |
'request_depth_max': 9, | |
'response_received_count': 1048, | |
'scheduler/dequeued': 1048, | |
'scheduler/dequeued/memory': 1048, | |
'scheduler/enqueued': 1081, | |
'scheduler/enqueued/memory': 1081, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 16, 8, 516604)} | |
2018-11-29 10:16:28 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '5' | |
Mean : 51.556745201685295 Median : 51.56846499863366 Std Dev : 0.24666521954163279 | |
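The summary line above can be reproduced from the five per-run "average speed" figures printed by the runs (51.129…, 51.568…, 51.519…, 51.876…, 51.690… items/sec). A minimal sketch with Python's `statistics` module; note the reported Std Dev matches the *population* standard deviation (`pstdev`), not the sample one:

```python
import statistics

# The five "average speed" figures printed by the runs above (items/sec).
speeds = [
    51.12914776632539,
    51.56846499863366,
    51.51978070283345,
    51.87628543449582,
    51.690047106138145,
]

mean = statistics.mean(speeds)      # summary reports 51.556745201685295
median = statistics.median(speeds)  # summary reports 51.56846499863366
stdev = statistics.pstdev(speeds)   # population std dev; summary reports 0.24666521954163279
```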
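The two benchmark sections differ in the scheduler queue: some runs' overridden settings include `'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'`, while the others fall back to Scrapy's default queue. A minimal sketch of how that toggle would look in a project's `settings.py` (assuming a standard scrapy-bench `books` project layout):

```python
# settings.py sketch — toggling the scheduler priority queue between runs.
# This is the non-default value visible in the overridden settings of the
# DownloaderAware runs; commenting it out restores the default queue.
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'
```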
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-3.6/ | |
2018-11-29 09:48:53 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 09:48:53 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:48:53 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 09:48:53 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 09:48:53 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:48:53 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:48:53 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:48:53 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:48:53 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:48:53 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 09:48:57 [scrapy.extensions.logstats] INFO: Crawled 98 pages (at 1960 pages/min), scraped 96 items (at 1920 items/min) | |
2018-11-29 09:49:00 [scrapy.extensions.logstats] INFO: Crawled 280 pages (at 3640 pages/min), scraped 259 items (at 3260 items/min) | |
2018-11-29 09:49:02 [scrapy.extensions.logstats] INFO: Crawled 448 pages (at 3360 pages/min), scraped 421 items (at 3240 items/min) | |
2018-11-29 09:49:06 [scrapy.extensions.logstats] INFO: Crawled 613 pages (at 3300 pages/min), scraped 586 items (at 3300 items/min) | |
2018-11-29 09:49:08 [scrapy.extensions.logstats] INFO: Crawled 757 pages (at 2880 pages/min), scraped 755 items (at 3380 items/min) | |
2018-11-29 09:49:12 [scrapy.extensions.logstats] INFO: Crawled 944 pages (at 3740 pages/min), scraped 932 items (at 3540 items/min) | |
2018-11-29 09:49:13 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:49:13 [scrapy.extensions.feedexport] INFO: Stored csv feed (1047 items) in: items.csv | |
The average speed of the spider is 52.103400842770014 items/sec | |
2018-11-29 09:49:13 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 362941, | |
'downloader/request_count': 1047, | |
'downloader/request_method_count/GET': 1047, | |
'downloader/response_bytes': 23016374, | |
'downloader/response_count': 1047, | |
'downloader/response_status_count/200': 1047, | |
'dupefilter/filtered': 14700, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 49, 13, 934364), | |
'item_scraped_count': 1047, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45408256, | |
'memusage/startup': 45408256, | |
'request_depth_max': 9, | |
'response_received_count': 1047, | |
'scheduler/dequeued': 1047, | |
'scheduler/dequeued/memory': 1047, | |
'scheduler/enqueued': 1080, | |
'scheduler/enqueued/memory': 1080, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 48, 53, 845726)} | |
2018-11-29 09:49:13 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:49:14 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 09:49:14 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:49:14 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 09:49:14 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 09:49:14 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:49:14 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:49:14 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:49:14 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:49:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:49:14 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 09:49:17 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 85 items (at 1700 items/min) | |
2018-11-29 09:49:20 [scrapy.extensions.logstats] INFO: Crawled 275 pages (at 3640 pages/min), scraped 268 items (at 3660 items/min) | |
2018-11-29 09:49:24 [scrapy.extensions.logstats] INFO: Crawled 446 pages (at 3420 pages/min), scraped 434 items (at 3320 items/min) | |
2018-11-29 09:49:26 [scrapy.extensions.logstats] INFO: Crawled 611 pages (at 3300 pages/min), scraped 596 items (at 3240 items/min) | |
2018-11-29 09:49:29 [scrapy.extensions.logstats] INFO: Crawled 769 pages (at 3160 pages/min), scraped 761 items (at 3300 items/min) | |
2018-11-29 09:49:32 [scrapy.extensions.logstats] INFO: Crawled 952 pages (at 3660 pages/min), scraped 944 items (at 3660 items/min) | |
2018-11-29 09:49:33 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:49:34 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 52.091360755197876 items/sec | |
2018-11-29 09:49:34 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363552, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 49, 34, 941327), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45416448, | |
'memusage/startup': 45416448, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 49, 14, 851541)} | |
2018-11-29 09:49:34 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:49:35 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 09:49:35 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:49:35 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 09:49:35 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 09:49:35 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:49:35 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:49:35 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:49:35 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:49:35 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:49:35 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 09:49:38 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 85 items (at 1700 items/min) | |
2018-11-29 09:49:41 [scrapy.extensions.logstats] INFO: Crawled 264 pages (at 3420 pages/min), scraped 257 items (at 3440 items/min) | |
2018-11-29 09:49:44 [scrapy.extensions.logstats] INFO: Crawled 429 pages (at 3300 pages/min), scraped 424 items (at 3340 items/min) | |
2018-11-29 09:49:47 [scrapy.extensions.logstats] INFO: Crawled 602 pages (at 3460 pages/min), scraped 583 items (at 3180 items/min) | |
2018-11-29 09:49:51 [scrapy.extensions.logstats] INFO: Crawled 769 pages (at 3340 pages/min), scraped 748 items (at 3300 items/min) | |
2018-11-29 09:49:54 [scrapy.extensions.logstats] INFO: Crawled 929 pages (at 3200 pages/min), scraped 921 items (at 3460 items/min) | |
2018-11-29 09:49:55 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:49:56 [scrapy.extensions.feedexport] INFO: Stored csv feed (1048 items) in: items.csv | |
The average speed of the spider is 51.49821654192627 items/sec | |
2018-11-29 09:49:56 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363177, | |
'downloader/request_count': 1048, | |
'downloader/request_method_count/GET': 1048, | |
'downloader/response_bytes': 23064841, | |
'downloader/response_count': 1048, | |
'downloader/response_status_count/200': 1048, | |
'dupefilter/filtered': 14773, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 49, 56, 192262), | |
'item_scraped_count': 1048, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 35, | |
'memusage/max': 45408256, | |
'memusage/startup': 45408256, | |
'request_depth_max': 10, | |
'response_received_count': 1048, | |
'scheduler/dequeued': 1048, | |
'scheduler/dequeued/memory': 1048, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 49, 35, 858083)} | |
2018-11-29 09:49:56 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:49:57 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 09:49:57 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:49:57 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 09:49:57 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 09:49:57 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:49:57 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:49:57 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:49:57 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:49:57 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:49:57 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 09:50:00 [scrapy.extensions.logstats] INFO: Crawled 84 pages (at 1680 pages/min), scraped 84 items (at 1680 items/min) | |
2018-11-29 09:50:03 [scrapy.extensions.logstats] INFO: Crawled 267 pages (at 3660 pages/min), scraped 259 items (at 3500 items/min) | |
2018-11-29 09:50:06 [scrapy.extensions.logstats] INFO: Crawled 428 pages (at 3220 pages/min), scraped 428 items (at 3380 items/min) | |
2018-11-29 09:50:09 [scrapy.extensions.logstats] INFO: Crawled 603 pages (at 3500 pages/min), scraped 582 items (at 3080 items/min) | |
2018-11-29 09:50:12 [scrapy.extensions.logstats] INFO: Crawled 759 pages (at 3120 pages/min), scraped 747 items (at 3300 items/min) | |
2018-11-29 09:50:15 [scrapy.extensions.logstats] INFO: Crawled 948 pages (at 3780 pages/min), scraped 917 items (at 3400 items/min) | |
2018-11-29 09:50:16 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:50:17 [scrapy.extensions.feedexport] INFO: Stored csv feed (1047 items) in: items.csv | |
The average speed of the spider is 51.51026875692216 items/sec | |
2018-11-29 09:50:17 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363350, | |
'downloader/request_count': 1047, | |
'downloader/request_method_count/GET': 1047, | |
'downloader/response_bytes': 23016374, | |
'downloader/response_count': 1047, | |
'downloader/response_status_count/200': 1047, | |
'dupefilter/filtered': 14700, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 50, 17, 441551), | |
'item_scraped_count': 1047, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45404160, | |
'memusage/startup': 45404160, | |
'request_depth_max': 9, | |
'response_received_count': 1047, | |
'scheduler/dequeued': 1047, | |
'scheduler/dequeued/memory': 1047, | |
'scheduler/enqueued': 1080, | |
'scheduler/enqueued/memory': 1080, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 49, 57, 103363)} | |
2018-11-29 09:50:17 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:50:18 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 09:50:18 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:50:18 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 09:50:18 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 09:50:18 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:50:18 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:50:18 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:50:18 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:50:18 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:50:18 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 09:50:21 [scrapy.extensions.logstats] INFO: Crawled 98 pages (at 1960 pages/min), scraped 90 items (at 1800 items/min) | |
2018-11-29 09:50:24 [scrapy.extensions.logstats] INFO: Crawled 269 pages (at 3420 pages/min), scraped 261 items (at 3420 items/min) | |
2018-11-29 09:50:27 [scrapy.extensions.logstats] INFO: Crawled 443 pages (at 3480 pages/min), scraped 428 items (at 3340 items/min) | |
2018-11-29 09:50:30 [scrapy.extensions.logstats] INFO: Crawled 621 pages (at 3560 pages/min), scraped 590 items (at 3240 items/min) | |
2018-11-29 09:50:33 [scrapy.extensions.logstats] INFO: Crawled 779 pages (at 3160 pages/min), scraped 761 items (at 3420 items/min) | |
2018-11-29 09:50:36 [scrapy.extensions.logstats] INFO: Crawled 938 pages (at 3180 pages/min), scraped 932 items (at 3420 items/min) | |
2018-11-29 09:50:37 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:50:38 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 52.10664560334505 items/sec | |
2018-11-29 09:50:38 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363515, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 50, 38, 380067), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45412352, | |
'memusage/startup': 45412352, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 50, 18, 346932)} | |
2018-11-29 09:50:38 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '5' | |
Mean : 51.86197850003227 Median : 52.091360755197876 Std Dev : 0.29215936184384017 | |
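The summary line above aggregates the per-run speeds into a mean, median, and standard deviation. A minimal sketch of that aggregation with Python's `statistics` module (the speeds below are illustrative placeholders, not the exact values from this log, and the use of the sample standard deviation rather than the population one is an assumption):

```python
# Sketch of how per-run speeds could be aggregated into a summary line
# like "Mean : ... Median : ... Std Dev : ...".
import statistics

run_speeds = [52.09, 51.50, 51.51, 52.11, 52.10]  # items/sec, one per run (placeholders)

mean = statistics.mean(run_speeds)
median = statistics.median(run_speeds)
stdev = statistics.stdev(run_speeds)  # sample std dev (n-1 denominator); assumption

print("Mean : {} Median : {} Std Dev : {}".format(mean, median, stdev))
```

With five runs this close together, the standard deviation stays well under 1 item/sec, which is why the two queue implementations compared in this gist need the summary statistics rather than a single run to tell apart.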
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-3.6/ | |
2018-11-29 10:12:07 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:12:07 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:12:07 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 10:12:07 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 10:12:07 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:12:07 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:12:07 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:12:07 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:12:07 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:12:07 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:12:10 [scrapy.extensions.logstats] INFO: Crawled 98 pages (at 1960 pages/min), scraped 91 items (at 1820 items/min) | |
2018-11-29 10:12:13 [scrapy.extensions.logstats] INFO: Crawled 281 pages (at 3660 pages/min), scraped 256 items (at 3300 items/min) | |
2018-11-29 10:12:16 [scrapy.extensions.logstats] INFO: Crawled 448 pages (at 3340 pages/min), scraped 422 items (at 3320 items/min) | |
2018-11-29 10:12:19 [scrapy.extensions.logstats] INFO: Crawled 606 pages (at 3160 pages/min), scraped 587 items (at 3300 items/min) | |
2018-11-29 10:12:22 [scrapy.extensions.logstats] INFO: Crawled 779 pages (at 3460 pages/min), scraped 760 items (at 3460 items/min) | |
2018-11-29 10:12:25 [scrapy.extensions.logstats] INFO: Crawled 950 pages (at 3420 pages/min), scraped 930 items (at 3400 items/min) | |
2018-11-29 10:12:26 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:12:27 [scrapy.extensions.feedexport] INFO: Stored csv feed (1047 items) in: items.csv | |
The average speed of the spider is 52.264764796054884 items/sec | |
2018-11-29 10:12:27 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 362913, | |
'downloader/request_count': 1047, | |
'downloader/request_method_count/GET': 1047, | |
'downloader/response_bytes': 23016374, | |
'downloader/response_count': 1047, | |
'downloader/response_status_count/200': 1047, | |
'dupefilter/filtered': 14700, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 12, 27, 217546), | |
'item_scraped_count': 1047, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45436928, | |
'memusage/startup': 45436928, | |
'request_depth_max': 9, | |
'response_received_count': 1047, | |
'scheduler/dequeued': 1047, | |
'scheduler/dequeued/memory': 1047, | |
'scheduler/enqueued': 1080, | |
'scheduler/enqueued/memory': 1080, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 12, 7, 214518)} | |
2018-11-29 10:12:27 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:12:28 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:12:28 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:12:28 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 10:12:28 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 10:12:28 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:12:28 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:12:28 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:12:28 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:12:28 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:12:28 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:12:31 [scrapy.extensions.logstats] INFO: Crawled 90 pages (at 1800 pages/min), scraped 82 items (at 1640 items/min) | |
2018-11-29 10:12:34 [scrapy.extensions.logstats] INFO: Crawled 267 pages (at 3540 pages/min), scraped 260 items (at 3560 items/min) | |
2018-11-29 10:12:37 [scrapy.extensions.logstats] INFO: Crawled 431 pages (at 3280 pages/min), scraped 425 items (at 3300 items/min) | |
2018-11-29 10:12:40 [scrapy.extensions.logstats] INFO: Crawled 613 pages (at 3640 pages/min), scraped 584 items (at 3180 items/min) | |
2018-11-29 10:12:43 [scrapy.extensions.logstats] INFO: Crawled 764 pages (at 3020 pages/min), scraped 762 items (at 3560 items/min) | |
2018-11-29 10:12:46 [scrapy.extensions.logstats] INFO: Crawled 948 pages (at 3680 pages/min), scraped 924 items (at 3240 items/min) | |
2018-11-29 10:12:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:12:48 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 52.10157002214863 items/sec | |
2018-11-29 10:12:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363571, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 12, 48, 244966), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45408256, | |
'memusage/startup': 45408256, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 12, 28, 125770)} | |
2018-11-29 10:12:48 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:12:49 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:12:49 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:12:49 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 10:12:49 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 10:12:49 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:12:49 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:12:49 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:12:49 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:12:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:12:49 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:12:52 [scrapy.extensions.logstats] INFO: Crawled 86 pages (at 1720 pages/min), scraped 86 items (at 1720 items/min) | |
2018-11-29 10:12:55 [scrapy.extensions.logstats] INFO: Crawled 282 pages (at 3920 pages/min), scraped 255 items (at 3380 items/min) | |
2018-11-29 10:12:58 [scrapy.extensions.logstats] INFO: Crawled 440 pages (at 3160 pages/min), scraped 416 items (at 3220 items/min) | |
2018-11-29 10:13:01 [scrapy.extensions.logstats] INFO: Crawled 596 pages (at 3120 pages/min), scraped 588 items (at 3440 items/min) | |
2018-11-29 10:13:04 [scrapy.extensions.logstats] INFO: Crawled 776 pages (at 3600 pages/min), scraped 749 items (at 3220 items/min) | |
2018-11-29 10:13:07 [scrapy.extensions.logstats] INFO: Crawled 942 pages (at 3320 pages/min), scraped 922 items (at 3460 items/min) | |
2018-11-29 10:13:08 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:13:09 [scrapy.extensions.feedexport] INFO: Stored csv feed (1059 items) in: items.csv | |
The average speed of the spider is 51.930715786382954 items/sec | |
2018-11-29 10:13:09 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 368205, | |
'downloader/request_count': 1059, | |
'downloader/request_method_count/GET': 1059, | |
'downloader/response_bytes': 23312675, | |
'downloader/response_count': 1059, | |
'downloader/response_status_count/200': 1059, | |
'dupefilter/filtered': 14936, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 13, 9, 522094), | |
'item_scraped_count': 1059, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 45408256, | |
'memusage/startup': 45408256, | |
'request_depth_max': 10, | |
'response_received_count': 1059, | |
'scheduler/dequeued': 1059, | |
'scheduler/dequeued/memory': 1059, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 12, 49, 143973)} | |
2018-11-29 10:13:09 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:13:10 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:13:10 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:13:10 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 10:13:10 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 10:13:10 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:13:10 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:13:10 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:13:10 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:13:10 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:13:10 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:13:13 [scrapy.extensions.logstats] INFO: Crawled 95 pages (at 1900 pages/min), scraped 88 items (at 1760 items/min) | |
2018-11-29 10:13:16 [scrapy.extensions.logstats] INFO: Crawled 269 pages (at 3480 pages/min), scraped 257 items (at 3380 items/min) | |
2018-11-29 10:13:19 [scrapy.extensions.logstats] INFO: Crawled 422 pages (at 3060 pages/min), scraped 416 items (at 3180 items/min) | |
2018-11-29 10:13:22 [scrapy.extensions.logstats] INFO: Crawled 595 pages (at 3460 pages/min), scraped 581 items (at 3300 items/min) | |
2018-11-29 10:13:25 [scrapy.extensions.logstats] INFO: Crawled 761 pages (at 3320 pages/min), scraped 755 items (at 3480 items/min) | |
2018-11-29 10:13:28 [scrapy.extensions.logstats] INFO: Crawled 926 pages (at 3300 pages/min), scraped 920 items (at 3300 items/min) | |
2018-11-29 10:13:29 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:13:30 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 51.69871550439042 items/sec | |
2018-11-29 10:13:30 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363583, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 13, 30, 695211), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45408256, | |
'memusage/startup': 45408256, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 13, 10, 425861)} | |
2018-11-29 10:13:30 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 10:13:31 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-11-29 10:13:31 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 10:13:31 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 10:13:31 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 10:13:31 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 10:13:31 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 10:13:31 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 10:13:31 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 10:13:31 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 10:13:31 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023 | |
2018-11-29 10:13:34 [scrapy.extensions.logstats] INFO: Crawled 85 pages (at 1700 pages/min), scraped 85 items (at 1700 items/min) | |
2018-11-29 10:13:37 [scrapy.extensions.logstats] INFO: Crawled 277 pages (at 3840 pages/min), scraped 254 items (at 3380 items/min) | |
2018-11-29 10:13:40 [scrapy.extensions.logstats] INFO: Crawled 431 pages (at 3080 pages/min), scraped 417 items (at 3260 items/min) | |
2018-11-29 10:13:43 [scrapy.extensions.logstats] INFO: Crawled 611 pages (at 3600 pages/min), scraped 586 items (at 3380 items/min) | |
2018-11-29 10:13:46 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 3280 pages/min), scraped 757 items (at 3420 items/min) | |
2018-11-29 10:13:49 [scrapy.extensions.logstats] INFO: Crawled 932 pages (at 3140 pages/min), scraped 930 items (at 3460 items/min) | |
2018-11-29 10:13:51 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 10:13:51 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv | |
The average speed of the spider is 51.745365191545254 items/sec | |
2018-11-29 10:13:51 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 362646, | |
'downloader/request_count': 1044, | |
'downloader/request_method_count/GET': 1044, | |
'downloader/response_bytes': 22995909, | |
'downloader/response_count': 1044, | |
'downloader/response_status_count/200': 1044, | |
'dupefilter/filtered': 14757, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 10, 13, 51, 714065), | |
'item_scraped_count': 1044, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 18, | |
'memusage/max': 45400064, | |
'memusage/startup': 45400064, | |
'request_depth_max': 10, | |
'response_received_count': 1044, | |
'scheduler/dequeued': 1044, | |
'scheduler/dequeued/memory': 1044, | |
'scheduler/enqueued': 1061, | |
'scheduler/enqueued/memory': 1061, | |
'start_time': datetime.datetime(2018, 11, 29, 10, 13, 31, 585460)} | |
2018-11-29 10:13:51 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '5' | |
Mean : 51.94822626010443 Median : 51.930715786382954 Std Dev : 0.21327250227979302 | |
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-3.6/ | |
2018-11-29 09:44:12 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books) | |
2018-11-29 09:44:12 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:44:12 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 09:44:12 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 09:44:12 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:44:12 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:44:12 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:44:12 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:44:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:44:15 [scrapy.extensions.logstats] INFO: Crawled 97 pages (at 1940 pages/min), scraped 90 items (at 1800 items/min) | |
2018-11-29 09:44:18 [scrapy.extensions.logstats] INFO: Crawled 286 pages (at 3780 pages/min), scraped 254 items (at 3280 items/min) | |
2018-11-29 09:44:21 [scrapy.extensions.logstats] INFO: Crawled 438 pages (at 3040 pages/min), scraped 430 items (at 3520 items/min) | |
2018-11-29 09:44:24 [scrapy.extensions.logstats] INFO: Crawled 597 pages (at 3180 pages/min), scraped 590 items (at 3200 items/min) | |
2018-11-29 09:44:27 [scrapy.extensions.logstats] INFO: Crawled 777 pages (at 3600 pages/min), scraped 755 items (at 3300 items/min) | |
2018-11-29 09:44:30 [scrapy.extensions.logstats] INFO: Crawled 955 pages (at 3560 pages/min), scraped 924 items (at 3380 items/min) | |
2018-11-29 09:44:31 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:44:32 [scrapy.extensions.feedexport] INFO: Stored csv feed (1059 items) in: items.csv | |
The average speed of the spider is 52.12280606504324 items/sec | |
2018-11-29 09:44:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 367102, | |
'downloader/request_count': 1059, | |
'downloader/request_method_count/GET': 1059, | |
'downloader/response_bytes': 23312675, | |
'downloader/response_count': 1059, | |
'downloader/response_status_count/200': 1059, | |
'dupefilter/filtered': 14936, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 44, 32, 369966), | |
'item_scraped_count': 1059, | |
'log_count/INFO': 14, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 45654016, | |
'memusage/startup': 45654016, | |
'request_depth_max': 10, | |
'response_received_count': 1059, | |
'scheduler/dequeued': 1059, | |
'scheduler/dequeued/memory': 1059, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 44, 12, 82335)} | |
2018-11-29 09:44:32 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:44:33 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books) | |
2018-11-29 09:44:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:44:33 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 09:44:33 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 09:44:33 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:44:33 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:44:33 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:44:33 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:44:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:44:36 [scrapy.extensions.logstats] INFO: Crawled 92 pages (at 1840 pages/min), scraped 92 items (at 1840 items/min) | |
2018-11-29 09:44:39 [scrapy.extensions.logstats] INFO: Crawled 273 pages (at 3620 pages/min), scraped 265 items (at 3460 items/min) | |
2018-11-29 09:44:42 [scrapy.extensions.logstats] INFO: Crawled 448 pages (at 3500 pages/min), scraped 435 items (at 3400 items/min) | |
2018-11-29 09:44:45 [scrapy.extensions.logstats] INFO: Crawled 611 pages (at 3260 pages/min), scraped 599 items (at 3280 items/min) | |
2018-11-29 09:44:48 [scrapy.extensions.logstats] INFO: Crawled 779 pages (at 3360 pages/min), scraped 764 items (at 3300 items/min) | |
2018-11-29 09:44:51 [scrapy.extensions.logstats] INFO: Crawled 954 pages (at 3500 pages/min), scraped 942 items (at 3560 items/min) | |
2018-11-29 09:44:52 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:44:53 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 52.22712560289813 items/sec | |
2018-11-29 09:44:53 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363560, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 44, 53, 410047), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 14, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45649920, | |
'memusage/startup': 45649920, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 44, 33, 338861)} | |
2018-11-29 09:44:53 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:44:54 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books) | |
2018-11-29 09:44:54 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:44:54 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 09:44:54 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 09:44:54 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:44:54 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:44:54 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:44:54 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:44:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:44:57 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 85 items (at 1700 items/min) | |
2018-11-29 09:45:00 [scrapy.extensions.logstats] INFO: Crawled 272 pages (at 3580 pages/min), scraped 263 items (at 3560 items/min) | |
2018-11-29 09:45:03 [scrapy.extensions.logstats] INFO: Crawled 433 pages (at 3220 pages/min), scraped 414 items (at 3020 items/min) | |
2018-11-29 09:45:06 [scrapy.extensions.logstats] INFO: Crawled 596 pages (at 3260 pages/min), scraped 581 items (at 3340 items/min) | |
2018-11-29 09:45:09 [scrapy.extensions.logstats] INFO: Crawled 724 pages (at 2560 pages/min), scraped 722 items (at 2820 items/min) | |
2018-11-29 09:45:12 [scrapy.extensions.logstats] INFO: Crawled 888 pages (at 3280 pages/min), scraped 886 items (at 3280 items/min) | |
2018-11-29 09:45:14 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:45:15 [scrapy.extensions.logstats] INFO: Crawled 1057 pages (at 3380 pages/min), scraped 1055 items (at 3380 items/min) | |
2018-11-29 09:45:15 [scrapy.extensions.feedexport] INFO: Stored csv feed (1057 items) in: items.csv | |
The average speed of the spider is 50.04157957218292 items/sec | |
2018-11-29 09:45:15 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 366382, | |
'downloader/request_count': 1057, | |
'downloader/request_method_count/GET': 1057, | |
'downloader/response_bytes': 23206941, | |
'downloader/response_count': 1057, | |
'downloader/response_status_count/200': 1057, | |
'dupefilter/filtered': 14790, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 45, 15, 468617), | |
'item_scraped_count': 1057, | |
'log_count/INFO': 15, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 25, | |
'memusage/max': 45649920, | |
'memusage/startup': 45649920, | |
'request_depth_max': 10, | |
'response_received_count': 1057, | |
'scheduler/dequeued': 1057, | |
'scheduler/dequeued/memory': 1057, | |
'scheduler/enqueued': 1081, | |
'scheduler/enqueued/memory': 1081, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 44, 54, 376711)} | |
2018-11-29 09:45:15 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:45:16 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books) | |
2018-11-29 09:45:16 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:45:16 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 09:45:16 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 09:45:16 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:45:16 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:45:16 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:45:16 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:45:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:45:19 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 86 items (at 1720 items/min) | |
2018-11-29 09:45:22 [scrapy.extensions.logstats] INFO: Crawled 280 pages (at 3740 pages/min), scraped 261 items (at 3500 items/min) | |
2018-11-29 09:45:25 [scrapy.extensions.logstats] INFO: Crawled 440 pages (at 3200 pages/min), scraped 422 items (at 3220 items/min) | |
2018-11-29 09:45:28 [scrapy.extensions.logstats] INFO: Crawled 615 pages (at 3500 pages/min), scraped 587 items (at 3300 items/min) | |
2018-11-29 09:45:31 [scrapy.extensions.logstats] INFO: Crawled 776 pages (at 3220 pages/min), scraped 766 items (at 3580 items/min) | |
2018-11-29 09:45:34 [scrapy.extensions.logstats] INFO: Crawled 948 pages (at 3440 pages/min), scraped 933 items (at 3340 items/min) | |
2018-11-29 09:45:35 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:45:36 [scrapy.extensions.feedexport] INFO: Stored csv feed (1048 items) in: items.csv | |
The average speed of the spider is 52.2544657729506 items/sec | |
2018-11-29 09:45:36 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363602, | |
'downloader/request_count': 1048, | |
'downloader/request_method_count/GET': 1048, | |
'downloader/response_bytes': 23067683, | |
'downloader/response_count': 1048, | |
'downloader/response_status_count/200': 1048, | |
'dupefilter/filtered': 14773, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 45, 36, 491510), | |
'item_scraped_count': 1048, | |
'log_count/INFO': 14, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45649920, | |
'memusage/startup': 45649920, | |
'request_depth_max': 10, | |
'response_received_count': 1048, | |
'scheduler/dequeued': 1048, | |
'scheduler/dequeued/memory': 1048, | |
'scheduler/enqueued': 1081, | |
'scheduler/enqueued/memory': 1081, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 45, 16, 450202)} | |
2018-11-29 09:45:36 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-11-29 09:45:37 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books) | |
2018-11-29 09:45:37 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo | |
2018-11-29 09:45:37 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-11-29 09:45:37 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-11-29 09:45:37 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-11-29 09:45:37 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-11-29 09:45:37 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-11-29 09:45:37 [scrapy.core.engine] INFO: Spider opened | |
2018-11-29 09:45:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-11-29 09:45:40 [scrapy.extensions.logstats] INFO: Crawled 97 pages (at 1940 pages/min), scraped 84 items (at 1680 items/min) | |
2018-11-29 09:45:43 [scrapy.extensions.logstats] INFO: Crawled 282 pages (at 3700 pages/min), scraped 264 items (at 3600 items/min) | |
2018-11-29 09:45:46 [scrapy.extensions.logstats] INFO: Crawled 442 pages (at 3200 pages/min), scraped 428 items (at 3280 items/min) | |
2018-11-29 09:45:49 [scrapy.extensions.logstats] INFO: Crawled 622 pages (at 3600 pages/min), scraped 591 items (at 3260 items/min) | |
2018-11-29 09:45:52 [scrapy.extensions.logstats] INFO: Crawled 774 pages (at 3040 pages/min), scraped 766 items (at 3500 items/min) | |
2018-11-29 09:45:55 [scrapy.extensions.logstats] INFO: Crawled 963 pages (at 3780 pages/min), scraped 946 items (at 3600 items/min) | |
2018-11-29 09:45:56 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-11-29 09:45:57 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv | |
The average speed of the spider is 52.416718620295846 items/sec | |
2018-11-29 09:45:57 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 363635, | |
'downloader/request_count': 1049, | |
'downloader/request_method_count/GET': 1049, | |
'downloader/response_bytes': 23119129, | |
'downloader/response_count': 1049, | |
'downloader/response_status_count/200': 1049, | |
'dupefilter/filtered': 14846, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 11, 29, 9, 45, 57, 448373), | |
'item_scraped_count': 1049, | |
'log_count/INFO': 14, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 45649920, | |
'memusage/startup': 45649920, | |
'request_depth_max': 10, | |
'response_received_count': 1049, | |
'scheduler/dequeued': 1049, | |
'scheduler/dequeued/memory': 1049, | |
'scheduler/enqueued': 1082, | |
'scheduler/enqueued/memory': 1082, | |
'start_time': datetime.datetime(2018, 11, 29, 9, 45, 37, 465467)} | |
2018-11-29 09:45:57 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '5' | |
Mean : 51.812539126674146 Median : 52.22712560289813 Std Dev : 0.8904804640371003 | |
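The summary statistics above can be reproduced from the five per-run speeds printed in this log (52.1228, 52.2271, 50.0416, 52.2545, 52.4167 items/sec). A minimal sketch with Python's `statistics` module, assuming the benchmark tool computes these values the straightforward way; note that the reported Std Dev matches the population standard deviation (`pstdev`), not the sample one:

```python
import statistics

# Per-run speeds (items/sec) taken from the five runs logged above
speeds = [
    52.12280606504324,
    52.22712560289813,
    50.04157957218292,
    52.2544657729506,
    52.416718620295846,
]

mean = statistics.mean(speeds)      # reported as 51.812539126674146
median = statistics.median(speeds)  # reported as 52.22712560289813
stdev = statistics.pstdev(speeds)   # population std dev, reported as 0.8904804640371003

print(mean, median, stdev)
```

The third run (50.04 items/sec) is the outlier pulling the mean below the median; it is also the run where LogStats briefly dipped to 2560 pages/min.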