Last active: December 5, 2018 10:19
Results of scrapy-bench for new priority queues
Executing scrapy-bench --n-runs 1 broadworm in /home/nikita/ves/scrapy-bench-2.7/
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
  utils.DeprecatedIn23,
2018-12-04 16:16:22 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: broadspider)
2018-12-04 16:16:22 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-04 16:16:22 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'broad.spiders', 'CLOSESPIDER_ITEMCOUNT': 800, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['broad.spiders'], 'REACTOR_THREADPOOL_MAXSIZE': 20, 'BOT_NAME': 'broadspider', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'AUTOTHROTTLE_ENABLED': True}
2018-12-04 16:16:22 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.memdebug.MemoryDebugger',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.throttle.AutoThrottle']
2018-12-04 16:16:22 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-04 16:16:22 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-04 16:16:22 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-04 16:16:22 [scrapy.core.engine] INFO: Spider opened
2018-12-04 16:16:22 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-04 16:16:22 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-04 16:16:25 [scrapy.extensions.logstats] INFO: Crawled 53 pages (at 1060 pages/min), scraped 51 items (at 1020 items/min)
2018-12-04 16:16:28 [scrapy.extensions.logstats] INFO: Crawled 73 pages (at 400 pages/min), scraped 72 items (at 420 items/min)
2018-12-04 16:16:31 [scrapy.extensions.logstats] INFO: Crawled 133 pages (at 1200 pages/min), scraped 129 items (at 1140 items/min)
2018-12-04 16:16:34 [scrapy.extensions.logstats] INFO: Crawled 299 pages (at 3320 pages/min), scraped 280 items (at 3020 items/min)
2018-12-04 16:16:38 [scrapy.extensions.logstats] INFO: Crawled 447 pages (at 2960 pages/min), scraped 416 items (at 2720 items/min)
2018-12-04 16:16:41 [scrapy.extensions.logstats] INFO: Crawled 566 pages (at 2380 pages/min), scraped 534 items (at 2360 items/min)
2018-12-04 16:16:44 [scrapy.extensions.logstats] INFO: Crawled 701 pages (at 2700 pages/min), scraped 674 items (at 2800 items/min)
2018-12-04 16:16:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-04 16:16:47 [scrapy.extensions.logstats] INFO: Crawled 850 pages (at 2980 pages/min), scraped 809 items (at 2700 items/min)
2018-12-04 16:16:49 [scrapy.extensions.logstats] INFO: Crawled 887 pages (at 740 pages/min), scraped 885 items (at 1520 items/min)
2018-12-04 16:16:52 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 580 pages/min), scraped 915 items (at 600 items/min)
2018-12-04 16:16:55 [scrapy.extensions.logstats] INFO: Crawled 932 pages (at 320 pages/min), scraped 932 items (at 340 items/min)
2018-12-04 16:16:58 [scrapy.extensions.logstats] INFO: Crawled 937 pages (at 100 pages/min), scraped 937 items (at 100 items/min)
2018-12-04 16:17:01 [scrapy.extensions.logstats] INFO: Crawled 938 pages (at 20 pages/min), scraped 938 items (at 20 items/min)
2018-12-04 16:17:04 [scrapy.extensions.logstats] INFO: Crawled 938 pages (at 0 pages/min), scraped 938 items (at 0 items/min)
2018-12-04 16:17:06 [scrapy.extensions.feedexport] INFO: Stored csv feed (939 items) in: items.csv
The average speed of the spider is 21.50302623 items/sec
2018-12-04 16:17:06 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 290153,
 'downloader/request_count': 939,
 'downloader/request_method_count/GET': 939,
 'downloader/response_bytes': 31931235,
 'downloader/response_count': 939,
 'downloader/response_status_count/200': 939,
 'dupefilter/filtered': 24822,
 'finish_reason': 'closespider_itemcount',
 'finish_time': datetime.datetime(2018, 12, 4, 16, 17, 6, 540783),
 'item_scraped_count': 939,
 'log_count/INFO': 23,
 'memdebug/gc_garbage_count': 0,
 'memdebug/live_refs/BroadBenchSpider': 1,
 'memdebug/live_refs/Request': 11515,
 'memusage/max': 52441088,
 'memusage/startup': 52441088,
 'request_depth_max': 18,
 'response_received_count': 939,
 'scheduler/dequeued': 939,
 'scheduler/dequeued/memory': 939,
 'scheduler/enqueued': 12453,
 'scheduler/enqueued/memory': 12453,
 'start_time': datetime.datetime(2018, 12, 4, 16, 16, 22, 840560)}
2018-12-04 16:17:06 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Broad Crawl' Iterations = '1'
Mean : 21.50302623 Median : 21.50302623 Std Dev : 0.0
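The broad-crawl run above switches the scheduler to the new queue through the SCHEDULER_PRIORITY_QUEUE setting visible in its "Overridden settings" line. A minimal settings.py sketch reproducing that configuration (all values copied from the logged settings; this is an illustration, not the benchmark's actual file):

```python
# settings.py sketch, values taken from the "Overridden settings"
# line of the broad-crawl run above.
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'
CONCURRENT_REQUESTS = 120
REACTOR_THREADPOOL_MAXSIZE = 20
RETRY_ENABLED = False
AUTOTHROTTLE_ENABLED = True
CLOSESPIDER_ITEMCOUNT = 800
LOGSTATS_INTERVAL = 3
```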
Executing scrapy-bench --n-runs 5 --book_url=http://172.17.0.6:8880 bookworm in /home/nikita/ves/scrapy-bench-2.7/ | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-03 16:56:26 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:56:26 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-03 16:56:26 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'} | |
2018-12-03 16:56:26 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-12-03 16:56:26 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:56:26 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:56:26 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:56:26 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:56:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:56:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:56:29 [scrapy.extensions.logstats] INFO: Crawled 271 pages (at 5420 pages/min), scraped 261 items (at 5220 items/min) | |
2018-12-03 16:56:32 [scrapy.extensions.logstats] INFO: Crawled 586 pages (at 6300 pages/min), scraped 578 items (at 6340 items/min) | |
2018-12-03 16:56:35 [scrapy.extensions.logstats] INFO: Crawled 930 pages (at 6880 pages/min), scraped 919 items (at 6820 items/min) | |
2018-12-03 16:56:36 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:56:36 [scrapy.extensions.feedexport] INFO: Stored csv feed (1069 items) in: items.csv | |
The average speed of the spider is 99.6834756462 items/sec | |
2018-12-03 16:56:36 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 372557, | |
'downloader/request_count': 1069, | |
'downloader/request_method_count/GET': 1069, | |
'downloader/response_bytes': 23532410, | |
'downloader/response_count': 1069, | |
'downloader/response_status_count/200': 1069, | |
'dupefilter/filtered': 15006, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 56, 36, 735176), | |
'item_scraped_count': 1069, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 52449280, | |
'memusage/startup': 52449280, | |
'request_depth_max': 9, | |
'response_received_count': 1069, | |
'scheduler/dequeued': 1069, | |
'scheduler/dequeued/memory': 1069, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 56, 26, 355559)} | |
2018-12-03 16:56:36 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-03 16:56:37 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:56:37 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-03 16:56:37 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'} | |
2018-12-03 16:56:37 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-12-03 16:56:37 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:56:37 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:56:37 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:56:37 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:56:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:56:37 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:56:40 [scrapy.extensions.logstats] INFO: Crawled 271 pages (at 5420 pages/min), scraped 263 items (at 5260 items/min) | |
2018-12-03 16:56:43 [scrapy.extensions.logstats] INFO: Crawled 605 pages (at 6680 pages/min), scraped 595 items (at 6640 items/min) | |
2018-12-03 16:56:46 [scrapy.extensions.logstats] INFO: Crawled 949 pages (at 6880 pages/min), scraped 939 items (at 6880 items/min) | |
2018-12-03 16:56:46 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:56:47 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv | |
The average speed of the spider is 102.718869713 items/sec | |
2018-12-03 16:56:47 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 368839, | |
'downloader/request_count': 1058, | |
'downloader/request_method_count/GET': 1058, | |
'downloader/response_bytes': 23293106, | |
'downloader/response_count': 1058, | |
'downloader/response_status_count/200': 1058, | |
'dupefilter/filtered': 14863, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 56, 47, 285942), | |
'item_scraped_count': 1058, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52391936, | |
'memusage/startup': 52391936, | |
'request_depth_max': 9, | |
'response_received_count': 1058, | |
'scheduler/dequeued': 1058, | |
'scheduler/dequeued/memory': 1058, | |
'scheduler/enqueued': 1081, | |
'scheduler/enqueued/memory': 1081, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 56, 37, 53618)} | |
2018-12-03 16:56:47 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-03 16:56:47 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:56:47 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-03 16:56:47 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'} | |
2018-12-03 16:56:47 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-12-03 16:56:47 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:56:47 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:56:47 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:56:47 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:56:47 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:56:47 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:56:50 [scrapy.extensions.logstats] INFO: Crawled 267 pages (at 5340 pages/min), scraped 256 items (at 5120 items/min) | |
2018-12-03 16:56:53 [scrapy.extensions.logstats] INFO: Crawled 596 pages (at 6580 pages/min), scraped 585 items (at 6580 items/min) | |
2018-12-03 16:56:56 [scrapy.extensions.logstats] INFO: Crawled 957 pages (at 7220 pages/min), scraped 918 items (at 6660 items/min) | |
2018-12-03 16:56:57 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:56:57 [scrapy.extensions.feedexport] INFO: Stored csv feed (1068 items) in: items.csv | |
The average speed of the spider is 102.542592385 items/sec | |
2018-12-03 16:56:57 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 372248, | |
'downloader/request_count': 1068, | |
'downloader/request_method_count/GET': 1068, | |
'downloader/response_bytes': 23480934, | |
'downloader/response_count': 1068, | |
'downloader/response_status_count/200': 1068, | |
'dupefilter/filtered': 14933, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 56, 57, 997549), | |
'item_scraped_count': 1068, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 52162560, | |
'memusage/startup': 52162560, | |
'request_depth_max': 9, | |
'response_received_count': 1068, | |
'scheduler/dequeued': 1068, | |
'scheduler/dequeued/memory': 1068, | |
'scheduler/enqueued': 1101, | |
'scheduler/enqueued/memory': 1101, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 56, 47, 604597)} | |
2018-12-03 16:56:57 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-03 16:56:58 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:56:58 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-03 16:56:58 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'} | |
2018-12-03 16:56:58 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-12-03 16:56:58 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:56:58 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:56:58 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:56:58 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:56:58 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:56:58 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:57:01 [scrapy.extensions.logstats] INFO: Crawled 271 pages (at 5420 pages/min), scraped 259 items (at 5180 items/min) | |
2018-12-03 16:57:04 [scrapy.extensions.logstats] INFO: Crawled 606 pages (at 6700 pages/min), scraped 585 items (at 6520 items/min) | |
2018-12-03 16:57:07 [scrapy.extensions.logstats] INFO: Crawled 951 pages (at 6900 pages/min), scraped 941 items (at 7120 items/min) | |
2018-12-03 16:57:08 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:57:08 [scrapy.extensions.feedexport] INFO: Stored csv feed (1057 items) in: items.csv | |
The average speed of the spider is 102.587522703 items/sec | |
2018-12-03 16:57:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 368496, | |
'downloader/request_count': 1057, | |
'downloader/request_method_count/GET': 1057, | |
'downloader/response_bytes': 23242113, | |
'downloader/response_count': 1057, | |
'downloader/response_status_count/200': 1057, | |
'dupefilter/filtered': 14790, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 57, 8, 491389), | |
'item_scraped_count': 1057, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52387840, | |
'memusage/startup': 52387840, | |
'request_depth_max': 9, | |
'response_received_count': 1057, | |
'scheduler/dequeued': 1057, | |
'scheduler/dequeued/memory': 1057, | |
'scheduler/enqueued': 1080, | |
'scheduler/enqueued/memory': 1080, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 56, 58, 319427)} | |
2018-12-03 16:57:08 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-03 16:57:08 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:57:08 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-03 16:57:08 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'} | |
2018-12-03 16:57:08 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-12-03 16:57:08 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:57:08 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:57:08 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:57:08 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:57:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:57:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:57:11 [scrapy.extensions.logstats] INFO: Crawled 270 pages (at 5400 pages/min), scraped 256 items (at 5120 items/min) | |
2018-12-03 16:57:15 [scrapy.extensions.logstats] INFO: Crawled 636 pages (at 7320 pages/min), scraped 584 items (at 6560 items/min) | |
2018-12-03 16:57:18 [scrapy.extensions.logstats] INFO: Crawled 990 pages (at 7080 pages/min), scraped 919 items (at 6700 items/min) | |
2018-12-03 16:57:18 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:57:19 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 99.4316869922 items/sec | |
2018-12-03 16:57:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 375976, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23726256, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 57, 19, 170607), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52449280, | |
'memusage/startup': 52449280, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 57, 8, 810973)} | |
2018-12-03 16:57:19 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '5' | |
Mean : 101.392829488 Median : 102.542592385 Std Dev : 1.50170567828 | |
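The summary line above reports mean, median, and standard deviation over the per-run speeds. A minimal sketch of how such a summary can be computed from per-run numbers (the speed values below are hypothetical, not taken from this log):

```python
import statistics

def speeds_summary(speeds):
    """Summarize per-run spider speeds (items/sec) the way a benchmark
    harness might: mean, median, and sample standard deviation."""
    dev = statistics.stdev(speeds) if len(speeds) > 1 else 0.0
    return statistics.mean(speeds), statistics.median(speeds), dev

# Hypothetical per-run speeds for a 5-iteration benchmark:
runs = [100.1, 102.5, 101.0, 103.0, 100.4]
mean, median, std = speeds_summary(runs)
```

Note that a single-iteration run has no spread to measure, which is why one-run benchmarks report a standard deviation of 0.0.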
Executing scrapy-bench --n-runs 1 broadworm in /home/nikita/ves/scrapy-bench-2.7/ | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-04 16:06:36 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: broadspider) | |
2018-12-04 16:06:36 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-04 16:06:36 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'broad.spiders', 'CLOSESPIDER_ITEMCOUNT': 800, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['broad.spiders'], 'REACTOR_THREADPOOL_MAXSIZE': 20, 'BOT_NAME': 'broadspider', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'AUTOTHROTTLE_ENABLED': True} | |
2018-12-04 16:06:36 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.throttle.AutoThrottle'] | |
2018-12-04 16:06:36 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-04 16:06:36 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-04 16:06:36 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-04 16:06:36 [scrapy.core.engine] INFO: Spider opened | |
2018-12-04 16:06:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-04 16:06:36 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6026 | |
2018-12-04 16:06:39 [scrapy.extensions.logstats] INFO: Crawled 63 pages (at 1260 pages/min), scraped 61 items (at 1220 items/min) | |
2018-12-04 16:06:42 [scrapy.extensions.logstats] INFO: Crawled 71 pages (at 160 pages/min), scraped 70 items (at 180 items/min) | |
2018-12-04 16:06:45 [scrapy.extensions.logstats] INFO: Crawled 96 pages (at 500 pages/min), scraped 93 items (at 460 items/min) | |
2018-12-04 16:06:48 [scrapy.extensions.logstats] INFO: Crawled 168 pages (at 1440 pages/min), scraped 163 items (at 1400 items/min) | |
2018-12-04 16:06:51 [scrapy.extensions.logstats] INFO: Crawled 264 pages (at 1920 pages/min), scraped 261 items (at 1960 items/min) | |
2018-12-04 16:06:54 [scrapy.extensions.logstats] INFO: Crawled 328 pages (at 1280 pages/min), scraped 328 items (at 1340 items/min) | |
2018-12-04 16:06:57 [scrapy.extensions.logstats] INFO: Crawled 356 pages (at 560 pages/min), scraped 356 items (at 560 items/min) | |
2018-12-04 16:07:00 [scrapy.extensions.logstats] INFO: Crawled 385 pages (at 580 pages/min), scraped 383 items (at 540 items/min) | |
2018-12-04 16:07:03 [scrapy.extensions.logstats] INFO: Crawled 416 pages (at 620 pages/min), scraped 416 items (at 660 items/min) | |
2018-12-04 16:07:06 [scrapy.extensions.logstats] INFO: Crawled 446 pages (at 600 pages/min), scraped 445 items (at 580 items/min) | |
2018-12-04 16:07:09 [scrapy.extensions.logstats] INFO: Crawled 475 pages (at 580 pages/min), scraped 474 items (at 580 items/min) | |
2018-12-04 16:07:12 [scrapy.extensions.logstats] INFO: Crawled 505 pages (at 600 pages/min), scraped 505 items (at 620 items/min) | |
2018-12-04 16:07:15 [scrapy.extensions.logstats] INFO: Crawled 533 pages (at 560 pages/min), scraped 531 items (at 520 items/min) | |
2018-12-04 16:07:18 [scrapy.extensions.logstats] INFO: Crawled 561 pages (at 560 pages/min), scraped 561 items (at 600 items/min) | |
2018-12-04 16:07:21 [scrapy.extensions.logstats] INFO: Crawled 589 pages (at 560 pages/min), scraped 588 items (at 540 items/min) | |
2018-12-04 16:07:24 [scrapy.extensions.logstats] INFO: Crawled 617 pages (at 560 pages/min), scraped 614 items (at 520 items/min) | |
2018-12-04 16:07:27 [scrapy.extensions.logstats] INFO: Crawled 648 pages (at 620 pages/min), scraped 647 items (at 660 items/min) | |
2018-12-04 16:07:30 [scrapy.extensions.logstats] INFO: Crawled 675 pages (at 540 pages/min), scraped 674 items (at 540 items/min) | |
2018-12-04 16:07:33 [scrapy.extensions.logstats] INFO: Crawled 688 pages (at 260 pages/min), scraped 688 items (at 280 items/min) | |
2018-12-04 16:07:36 [scrapy.extensions.logstats] INFO: Crawled 696 pages (at 160 pages/min), scraped 696 items (at 160 items/min) | |
2018-12-04 16:07:39 [scrapy.extensions.logstats] INFO: Crawled 702 pages (at 120 pages/min), scraped 702 items (at 120 items/min) | |
2018-12-04 16:07:42 [scrapy.extensions.logstats] INFO: Crawled 707 pages (at 100 pages/min), scraped 707 items (at 100 items/min) | |
2018-12-04 16:07:45 [scrapy.extensions.logstats] INFO: Crawled 712 pages (at 100 pages/min), scraped 712 items (at 100 items/min) | |
2018-12-04 16:07:48 [scrapy.extensions.logstats] INFO: Crawled 718 pages (at 120 pages/min), scraped 717 items (at 100 items/min) | |
2018-12-04 16:07:51 [scrapy.extensions.logstats] INFO: Crawled 721 pages (at 60 pages/min), scraped 721 items (at 80 items/min) | |
2018-12-04 16:07:54 [scrapy.extensions.logstats] INFO: Crawled 728 pages (at 140 pages/min), scraped 728 items (at 140 items/min) | |
2018-12-04 16:07:57 [scrapy.extensions.logstats] INFO: Crawled 734 pages (at 120 pages/min), scraped 734 items (at 120 items/min) | |
2018-12-04 16:08:00 [scrapy.extensions.logstats] INFO: Crawled 739 pages (at 100 pages/min), scraped 739 items (at 100 items/min) | |
2018-12-04 16:08:03 [scrapy.extensions.logstats] INFO: Crawled 745 pages (at 120 pages/min), scraped 745 items (at 120 items/min) | |
2018-12-04 16:08:06 [scrapy.extensions.logstats] INFO: Crawled 751 pages (at 120 pages/min), scraped 751 items (at 120 items/min) | |
2018-12-04 16:08:09 [scrapy.extensions.logstats] INFO: Crawled 758 pages (at 140 pages/min), scraped 758 items (at 140 items/min) | |
2018-12-04 16:08:12 [scrapy.extensions.logstats] INFO: Crawled 762 pages (at 80 pages/min), scraped 762 items (at 80 items/min) | |
2018-12-04 16:08:15 [scrapy.extensions.logstats] INFO: Crawled 772 pages (at 200 pages/min), scraped 772 items (at 200 items/min) | |
2018-12-04 16:08:18 [scrapy.extensions.logstats] INFO: Crawled 774 pages (at 40 pages/min), scraped 774 items (at 40 items/min) | |
2018-12-04 16:08:21 [scrapy.extensions.logstats] INFO: Crawled 781 pages (at 140 pages/min), scraped 780 items (at 120 items/min) | |
2018-12-04 16:08:24 [scrapy.extensions.logstats] INFO: Crawled 788 pages (at 140 pages/min), scraped 788 items (at 160 items/min) | |
2018-12-04 16:08:27 [scrapy.extensions.logstats] INFO: Crawled 790 pages (at 40 pages/min), scraped 790 items (at 40 items/min) | |
2018-12-04 16:08:30 [scrapy.extensions.logstats] INFO: Crawled 796 pages (at 120 pages/min), scraped 796 items (at 120 items/min) | |
2018-12-04 16:08:32 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-04 16:08:33 [scrapy.extensions.logstats] INFO: Crawled 800 pages (at 80 pages/min), scraped 800 items (at 80 items/min) | |
2018-12-04 16:08:36 [scrapy.extensions.logstats] INFO: Crawled 807 pages (at 140 pages/min), scraped 807 items (at 140 items/min) | |
2018-12-04 16:08:39 [scrapy.extensions.logstats] INFO: Crawled 815 pages (at 160 pages/min), scraped 814 items (at 140 items/min) | |
2018-12-04 16:08:42 [scrapy.extensions.logstats] INFO: Crawled 819 pages (at 80 pages/min), scraped 819 items (at 100 items/min) | |
2018-12-04 16:08:45 [scrapy.extensions.logstats] INFO: Crawled 823 pages (at 80 pages/min), scraped 823 items (at 80 items/min) | |
2018-12-04 16:08:48 [scrapy.extensions.logstats] INFO: Crawled 826 pages (at 60 pages/min), scraped 826 items (at 60 items/min) | |
2018-12-04 16:08:51 [scrapy.extensions.logstats] INFO: Crawled 829 pages (at 60 pages/min), scraped 829 items (at 60 items/min) | |
2018-12-04 16:08:54 [scrapy.extensions.logstats] INFO: Crawled 835 pages (at 120 pages/min), scraped 834 items (at 100 items/min) | |
2018-12-04 16:08:57 [scrapy.extensions.logstats] INFO: Crawled 839 pages (at 80 pages/min), scraped 839 items (at 100 items/min) | |
2018-12-04 16:09:00 [scrapy.extensions.logstats] INFO: Crawled 842 pages (at 60 pages/min), scraped 842 items (at 60 items/min) | |
2018-12-04 16:09:03 [scrapy.extensions.logstats] INFO: Crawled 846 pages (at 80 pages/min), scraped 846 items (at 80 items/min) | |
2018-12-04 16:09:06 [scrapy.extensions.logstats] INFO: Crawled 852 pages (at 120 pages/min), scraped 852 items (at 120 items/min) | |
2018-12-04 16:09:09 [scrapy.extensions.logstats] INFO: Crawled 855 pages (at 60 pages/min), scraped 855 items (at 60 items/min) | |
2018-12-04 16:09:12 [scrapy.extensions.logstats] INFO: Crawled 858 pages (at 60 pages/min), scraped 857 items (at 40 items/min) | |
2018-12-04 16:09:15 [scrapy.extensions.logstats] INFO: Crawled 861 pages (at 60 pages/min), scraped 861 items (at 80 items/min) | |
2018-12-04 16:09:18 [scrapy.extensions.logstats] INFO: Crawled 862 pages (at 20 pages/min), scraped 862 items (at 20 items/min) | |
2018-12-04 16:09:21 [scrapy.extensions.logstats] INFO: Crawled 866 pages (at 80 pages/min), scraped 865 items (at 60 items/min) | |
2018-12-04 16:09:24 [scrapy.extensions.logstats] INFO: Crawled 868 pages (at 40 pages/min), scraped 868 items (at 60 items/min) | |
2018-12-04 16:09:27 [scrapy.extensions.logstats] INFO: Crawled 871 pages (at 60 pages/min), scraped 871 items (at 60 items/min) | |
2018-12-04 16:09:30 [scrapy.extensions.logstats] INFO: Crawled 873 pages (at 40 pages/min), scraped 873 items (at 40 items/min) | |
2018-12-04 16:09:33 [scrapy.extensions.logstats] INFO: Crawled 877 pages (at 80 pages/min), scraped 877 items (at 80 items/min) | |
2018-12-04 16:09:36 [scrapy.extensions.logstats] INFO: Crawled 880 pages (at 60 pages/min), scraped 879 items (at 40 items/min) | |
2018-12-04 16:09:39 [scrapy.extensions.logstats] INFO: Crawled 882 pages (at 40 pages/min), scraped 882 items (at 60 items/min) | |
2018-12-04 16:09:42 [scrapy.extensions.logstats] INFO: Crawled 885 pages (at 60 pages/min), scraped 885 items (at 60 items/min) | |
2018-12-04 16:09:45 [scrapy.extensions.logstats] INFO: Crawled 886 pages (at 20 pages/min), scraped 886 items (at 20 items/min) | |
2018-12-04 16:09:48 [scrapy.extensions.logstats] INFO: Crawled 889 pages (at 60 pages/min), scraped 889 items (at 60 items/min) | |
2018-12-04 16:09:51 [scrapy.extensions.logstats] INFO: Crawled 889 pages (at 0 pages/min), scraped 889 items (at 0 items/min) | |
2018-12-04 16:09:54 [scrapy.extensions.logstats] INFO: Crawled 891 pages (at 40 pages/min), scraped 891 items (at 40 items/min) | |
2018-12-04 16:09:57 [scrapy.extensions.logstats] INFO: Crawled 893 pages (at 40 pages/min), scraped 893 items (at 40 items/min) | |
2018-12-04 16:10:00 [scrapy.extensions.logstats] INFO: Crawled 893 pages (at 0 pages/min), scraped 893 items (at 0 items/min) | |
2018-12-04 16:10:03 [scrapy.extensions.logstats] INFO: Crawled 896 pages (at 60 pages/min), scraped 896 items (at 60 items/min) | |
2018-12-04 16:10:06 [scrapy.extensions.logstats] INFO: Crawled 897 pages (at 20 pages/min), scraped 897 items (at 20 items/min) | |
2018-12-04 16:10:09 [scrapy.extensions.logstats] INFO: Crawled 898 pages (at 20 pages/min), scraped 898 items (at 20 items/min) | |
2018-12-04 16:10:12 [scrapy.extensions.logstats] INFO: Crawled 900 pages (at 40 pages/min), scraped 900 items (at 40 items/min) | |
2018-12-04 16:10:15 [scrapy.extensions.logstats] INFO: Crawled 900 pages (at 0 pages/min), scraped 900 items (at 0 items/min) | |
2018-12-04 16:10:18 [scrapy.extensions.logstats] INFO: Crawled 901 pages (at 20 pages/min), scraped 901 items (at 20 items/min) | |
2018-12-04 16:10:21 [scrapy.extensions.logstats] INFO: Crawled 901 pages (at 0 pages/min), scraped 901 items (at 0 items/min) | |
2018-12-04 16:10:24 [scrapy.extensions.logstats] INFO: Crawled 903 pages (at 40 pages/min), scraped 903 items (at 40 items/min) | |
2018-12-04 16:10:27 [scrapy.extensions.logstats] INFO: Crawled 903 pages (at 0 pages/min), scraped 903 items (at 0 items/min) | |
2018-12-04 16:10:30 [scrapy.extensions.logstats] INFO: Crawled 906 pages (at 60 pages/min), scraped 906 items (at 60 items/min) | |
2018-12-04 16:10:33 [scrapy.extensions.logstats] INFO: Crawled 906 pages (at 0 pages/min), scraped 906 items (at 0 items/min) | |
2018-12-04 16:10:36 [scrapy.extensions.logstats] INFO: Crawled 906 pages (at 0 pages/min), scraped 906 items (at 0 items/min) | |
2018-12-04 16:10:39 [scrapy.extensions.logstats] INFO: Crawled 906 pages (at 0 pages/min), scraped 906 items (at 0 items/min) | |
2018-12-04 16:10:42 [scrapy.extensions.logstats] INFO: Crawled 907 pages (at 20 pages/min), scraped 907 items (at 20 items/min) | |
2018-12-04 16:10:45 [scrapy.extensions.logstats] INFO: Crawled 907 pages (at 0 pages/min), scraped 907 items (at 0 items/min) | |
2018-12-04 16:10:48 [scrapy.extensions.logstats] INFO: Crawled 907 pages (at 0 pages/min), scraped 907 items (at 0 items/min) | |
2018-12-04 16:10:51 [scrapy.extensions.logstats] INFO: Crawled 907 pages (at 0 pages/min), scraped 907 items (at 0 items/min) | |
2018-12-04 16:10:54 [scrapy.extensions.logstats] INFO: Crawled 908 pages (at 20 pages/min), scraped 908 items (at 20 items/min) | |
2018-12-04 16:10:57 [scrapy.extensions.logstats] INFO: Crawled 908 pages (at 0 pages/min), scraped 908 items (at 0 items/min) | |
2018-12-04 16:11:00 [scrapy.extensions.logstats] INFO: Crawled 909 pages (at 20 pages/min), scraped 909 items (at 20 items/min) | |
2018-12-04 16:11:03 [scrapy.extensions.logstats] INFO: Crawled 909 pages (at 0 pages/min), scraped 909 items (at 0 items/min) | |
2018-12-04 16:11:06 [scrapy.extensions.logstats] INFO: Crawled 911 pages (at 40 pages/min), scraped 911 items (at 40 items/min) | |
2018-12-04 16:11:09 [scrapy.extensions.logstats] INFO: Crawled 911 pages (at 0 pages/min), scraped 911 items (at 0 items/min) | |
2018-12-04 16:11:12 [scrapy.extensions.logstats] INFO: Crawled 912 pages (at 20 pages/min), scraped 912 items (at 20 items/min) | |
2018-12-04 16:11:15 [scrapy.extensions.logstats] INFO: Crawled 912 pages (at 0 pages/min), scraped 912 items (at 0 items/min) | |
2018-12-04 16:11:18 [scrapy.extensions.logstats] INFO: Crawled 912 pages (at 0 pages/min), scraped 912 items (at 0 items/min) | |
2018-12-04 16:11:21 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 20 pages/min), scraped 913 items (at 20 items/min) | |
2018-12-04 16:11:24 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 0 pages/min), scraped 913 items (at 0 items/min) | |
2018-12-04 16:11:27 [scrapy.extensions.logstats] INFO: Crawled 914 pages (at 20 pages/min), scraped 914 items (at 20 items/min) | |
2018-12-04 16:11:30 [scrapy.extensions.logstats] INFO: Crawled 914 pages (at 0 pages/min), scraped 914 items (at 0 items/min) | |
2018-12-04 16:11:33 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 20 pages/min), scraped 915 items (at 20 items/min) | |
2018-12-04 16:11:36 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 0 pages/min), scraped 915 items (at 0 items/min) | |
2018-12-04 16:11:39 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 0 pages/min), scraped 915 items (at 0 items/min) | |
2018-12-04 16:11:42 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 20 pages/min), scraped 916 items (at 20 items/min) | |
2018-12-04 16:11:45 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 0 pages/min), scraped 916 items (at 0 items/min) | |
2018-12-04 16:11:48 [scrapy.extensions.logstats] INFO: Crawled 917 pages (at 20 pages/min), scraped 917 items (at 20 items/min) | |
2018-12-04 16:11:51 [scrapy.extensions.logstats] INFO: Crawled 917 pages (at 0 pages/min), scraped 917 items (at 0 items/min) | |
2018-12-04 16:11:54 [scrapy.extensions.logstats] INFO: Crawled 918 pages (at 20 pages/min), scraped 918 items (at 20 items/min) | |
2018-12-04 16:11:57 [scrapy.extensions.logstats] INFO: Crawled 919 pages (at 20 pages/min), scraped 918 items (at 0 items/min) | |
2018-12-04 16:12:00 [scrapy.extensions.logstats] INFO: Crawled 919 pages (at 0 pages/min), scraped 919 items (at 20 items/min) | |
2018-12-04 16:12:03 [scrapy.extensions.feedexport] INFO: Stored csv feed (920 items) in: items.csv | |
The average speed of the spider is 2.81044919015 items/sec | |
2018-12-04 16:12:03 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 304229, | |
'downloader/request_count': 920, | |
'downloader/request_method_count/GET': 920, | |
'downloader/response_bytes': 25138362, | |
'downloader/response_count': 920, | |
'downloader/response_status_count/200': 920, | |
'dupefilter/filtered': 15014, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 4, 16, 12, 3, 643755), | |
'item_scraped_count': 920, | |
'log_count/INFO': 117, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/BroadBenchSpider': 1, | |
'memdebug/live_refs/Request': 7889, | |
'memusage/max': 120696832, | |
'memusage/startup': 52322304, | |
'request_depth_max': 13, | |
'response_received_count': 920, | |
'scheduler/dequeued': 920, | |
'scheduler/dequeued/memory': 920, | |
'scheduler/enqueued': 8808, | |
'scheduler/enqueued/memory': 8808, | |
'start_time': datetime.datetime(2018, 12, 4, 16, 6, 36, 587721)} | |
2018-12-04 16:12:03 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Broad Crawl' Iterations = '1' | |
Mean : 2.81044919015 Median : 2.81044919015 Std Dev : 0.0 | |
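This broad-crawl run uses Scrapy's default priority queue (its overridden-settings dump above contains no SCHEDULER_PRIORITY_QUEUE entry), while the broadworm run at the top of this gist enables the new downloader-aware queue. A sketch of the one-setting difference, as it might appear in a benchmark project's settings (the module and dict names here are illustrative, not from scrapy-bench):

```python
# Illustrative settings sketch: the only scheduling-relevant difference
# between the two broadworm runs in this gist is this single setting.

# Baseline run: SCHEDULER_PRIORITY_QUEUE left at Scrapy's default.
BASELINE_SETTINGS = {}

# New-queue run: schedule requests through the downloader-aware queue,
# which prioritizes domains with fewer in-flight downloads.
DOWNLOADER_AWARE_SETTINGS = {
    'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue',
}
```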
Executing scrapy-bench --n-runs 5 --book_url=http://172.17.0.6:8880 bookworm in /home/nikita/ves/scrapy-bench-2.7/ | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-03 16:54:01 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:54:01 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-03 16:54:01 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-12-03 16:54:01 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-12-03 16:54:01 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:54:01 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:54:01 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:54:01 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:54:01 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:54:01 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:54:04 [scrapy.extensions.logstats] INFO: Crawled 276 pages (at 5520 pages/min), scraped 265 items (at 5300 items/min) | |
2018-12-03 16:54:08 [scrapy.extensions.logstats] INFO: Crawled 633 pages (at 7140 pages/min), scraped 574 items (at 6180 items/min) | |
2018-12-03 16:54:11 [scrapy.extensions.logstats] INFO: Crawled 989 pages (at 7120 pages/min), scraped 917 items (at 6860 items/min) | |
2018-12-03 16:54:11 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:54:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1067 items) in: items.csv | |
The average speed of the spider is 99.6846940525 items/sec | |
2018-12-03 16:54:12 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 371951, | |
'downloader/request_count': 1067, | |
'downloader/request_method_count/GET': 1067, | |
'downloader/response_bytes': 23429941, | |
'downloader/response_count': 1067, | |
'downloader/response_status_count/200': 1067, | |
'dupefilter/filtered': 14860, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 54, 12, 19157), | |
'item_scraped_count': 1067, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 52396032, | |
'memusage/startup': 52396032, | |
'request_depth_max': 8, | |
'response_received_count': 1067, | |
'scheduler/dequeued': 1067, | |
'scheduler/dequeued/memory': 1067, | |
'scheduler/enqueued': 1100, | |
'scheduler/enqueued/memory': 1100, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 54, 1, 728204)} | |
2018-12-03 16:54:12 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-03 16:54:12 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:54:12 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-03 16:54:12 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-12-03 16:54:12 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-12-03 16:54:12 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:54:12 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:54:12 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:54:12 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:54:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:54:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:54:15 [scrapy.extensions.logstats] INFO: Crawled 278 pages (at 5560 pages/min), scraped 270 items (at 5400 items/min) | |
2018-12-03 16:54:18 [scrapy.extensions.logstats] INFO: Crawled 609 pages (at 6620 pages/min), scraped 597 items (at 6540 items/min) | |
2018-12-03 16:54:21 [scrapy.extensions.logstats] INFO: Crawled 956 pages (at 6940 pages/min), scraped 936 items (at 6780 items/min) | |
2018-12-03 16:54:22 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:54:22 [scrapy.extensions.feedexport] INFO: Stored csv feed (1056 items) in: items.csv | |
The average speed of the spider is 103.880934494 items/sec | |
2018-12-03 16:54:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 368183, | |
'downloader/request_count': 1056, | |
'downloader/request_method_count/GET': 1056, | |
'downloader/response_bytes': 23190637, | |
'downloader/response_count': 1056, | |
'downloader/response_status_count/200': 1056, | |
'dupefilter/filtered': 14717, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 54, 22, 512185), | |
'item_scraped_count': 1056, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52391936, | |
'memusage/startup': 52391936, | |
'request_depth_max': 8, | |
'response_received_count': 1056, | |
'scheduler/dequeued': 1056, | |
'scheduler/dequeued/memory': 1056, | |
'scheduler/enqueued': 1079, | |
'scheduler/enqueued/memory': 1079, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 54, 12, 340425)} | |
2018-12-03 16:54:22 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-03 16:54:22 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:54:22 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-03 16:54:22 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-12-03 16:54:22 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-12-03 16:54:22 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:54:22 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:54:22 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:54:22 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:54:22 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:54:22 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:54:25 [scrapy.extensions.logstats] INFO: Crawled 278 pages (at 5560 pages/min), scraped 267 items (at 5340 items/min) | |
2018-12-03 16:54:29 [scrapy.extensions.logstats] INFO: Crawled 609 pages (at 6620 pages/min), scraped 597 items (at 6600 items/min) | |
2018-12-03 16:54:32 [scrapy.extensions.logstats] INFO: Crawled 961 pages (at 7040 pages/min), scraped 935 items (at 6760 items/min) | |
2018-12-03 16:54:32 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:54:33 [scrapy.extensions.feedexport] INFO: Stored csv feed (1077 items) in: items.csv | |
The average speed of the spider is 102.582743682 items/sec | |
2018-12-03 16:54:33 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 375405, | |
'downloader/request_count': 1077, | |
'downloader/request_method_count/GET': 1077, | |
'downloader/response_bytes': 23623787, | |
'downloader/response_count': 1077, | |
'downloader/response_status_count/200': 1077, | |
'dupefilter/filtered': 14950, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 54, 33, 295471), | |
'item_scraped_count': 1077, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52355072, | |
'memusage/startup': 52355072, | |
'request_depth_max': 8, | |
'response_received_count': 1077, | |
'scheduler/dequeued': 1077, | |
'scheduler/dequeued/memory': 1077, | |
'scheduler/enqueued': 1100, | |
'scheduler/enqueued/memory': 1100, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 54, 22, 822215)} | |
2018-12-03 16:54:33 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-03 16:54:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:54:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-03 16:54:33 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-12-03 16:54:33 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-12-03 16:54:33 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:54:33 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:54:33 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:54:33 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:54:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:54:33 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:54:36 [scrapy.extensions.logstats] INFO: Crawled 271 pages (at 5420 pages/min), scraped 258 items (at 5160 items/min) | |
2018-12-03 16:54:39 [scrapy.extensions.logstats] INFO: Crawled 624 pages (at 7060 pages/min), scraped 591 items (at 6660 items/min) | |
2018-12-03 16:54:42 [scrapy.extensions.logstats] INFO: Crawled 945 pages (at 6420 pages/min), scraped 937 items (at 6920 items/min) | |
2018-12-03 16:54:43 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:54:43 [scrapy.extensions.feedexport] INFO: Stored csv feed (1077 items) in: items.csv | |
The average speed of the spider is 102.784142089 items/sec | |
2018-12-03 16:54:43 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 375459, | |
'downloader/request_count': 1077, | |
'downloader/request_method_count/GET': 1077, | |
'downloader/response_bytes': 23623787, | |
'downloader/response_count': 1077, | |
'downloader/response_status_count/200': 1077, | |
'dupefilter/filtered': 14950, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 54, 43, 888686), | |
'item_scraped_count': 1077, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52465664, | |
'memusage/startup': 52465664, | |
'request_depth_max': 8, | |
'response_received_count': 1077, | |
'scheduler/dequeued': 1077, | |
'scheduler/dequeued/memory': 1077, | |
'scheduler/enqueued': 1100, | |
'scheduler/enqueued/memory': 1100, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 54, 33, 612214)} | |
2018-12-03 16:54:43 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible. | |
utils.DeprecatedIn23, | |
2018-12-03 16:54:44 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:54:44 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty | |
2018-12-03 16:54:44 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'} | |
2018-12-03 16:54:44 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.corestats.CoreStats'] | |
2018-12-03 16:54:44 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:54:44 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:54:44 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:54:44 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:54:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:54:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:54:47 [scrapy.extensions.logstats] INFO: Crawled 271 pages (at 5420 pages/min), scraped 260 items (at 5200 items/min) | |
2018-12-03 16:54:50 [scrapy.extensions.logstats] INFO: Crawled 641 pages (at 7400 pages/min), scraped 582 items (at 6440 items/min) | |
2018-12-03 16:54:53 [scrapy.extensions.logstats] INFO: Crawled 988 pages (at 6940 pages/min), scraped 927 items (at 6900 items/min) | |
2018-12-03 16:54:54 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:54:54 [scrapy.extensions.feedexport] INFO: Stored csv feed (1077 items) in: items.csv | |
The average speed of the spider is 100.624918156 items/sec | |
2018-12-03 16:54:54 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 375491, | |
'downloader/request_count': 1077, | |
'downloader/request_method_count/GET': 1077, | |
'downloader/response_bytes': 23656308, | |
'downloader/response_count': 1077, | |
'downloader/response_status_count/200': 1077, | |
'dupefilter/filtered': 15014, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 54, 54, 523037), | |
'item_scraped_count': 1077, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 25, | |
'memusage/max': 52346880, | |
'memusage/startup': 52346880, | |
'request_depth_max': 9, | |
'response_received_count': 1077, | |
'scheduler/dequeued': 1077, | |
'scheduler/dequeued/memory': 1077, | |
'scheduler/enqueued': 1101, | |
'scheduler/enqueued/memory': 1101, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 54, 44, 216313)} | |
2018-12-03 16:54:54 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '5' | |
Mean : 101.911486495 Median : 102.582743682 Std Dev : 1.53001320847 | |
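The summary line above aggregates the per-run speeds (only the last three of the five runs appear in this excerpt). As a minimal sketch, the mean/median/std-dev summary could be reproduced with Python's `statistics` module; the `speeds` list below is a hypothetical placeholder, not the actual five measurements, and whether scrapy-bench uses the sample or population standard deviation is an assumption here:

```python
import statistics

def summarize(speeds):
    """Return (mean, median, std dev) for a list of per-run speeds in items/sec."""
    return (
        statistics.mean(speeds),
        statistics.median(speeds),
        statistics.stdev(speeds),  # sample std dev -- an assumption about scrapy-bench
    )

# Hypothetical per-run speeds (placeholders, not the real benchmark data)
speeds = [102.58, 102.78, 100.62, 101.30, 102.20]
mean, median, std = summarize(speeds)
print("Mean : %s Median : %s Std Dev : %s" % (mean, median, std))
```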
Executing scrapy-bench --n-runs 1 broadworm in /home/nikita/ves/scrapy-bench-3.6/ | |
2018-12-04 16:20:04 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: broadspider) | |
2018-12-04 16:20:04 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-04 16:20:04 [scrapy.crawler] INFO: Overridden settings: {'AUTOTHROTTLE_ENABLED': True, 'BOT_NAME': 'broadspider', 'CLOSESPIDER_ITEMCOUNT': 800, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'broad.spiders', 'REACTOR_THREADPOOL_MAXSIZE': 20, 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['broad.spiders']} | |
2018-12-04 16:20:05 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.throttle.AutoThrottle'] | |
2018-12-04 16:20:05 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-04 16:20:05 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-04 16:20:05 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-04 16:20:05 [scrapy.core.engine] INFO: Spider opened | |
2018-12-04 16:20:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-04 16:20:06 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-04 16:20:09 [scrapy.extensions.logstats] INFO: Crawled 51 pages (at 1020 pages/min), scraped 51 items (at 1020 items/min) | |
2018-12-04 16:20:12 [scrapy.extensions.logstats] INFO: Crawled 84 pages (at 660 pages/min), scraped 82 items (at 620 items/min) | |
2018-12-04 16:20:15 [scrapy.extensions.logstats] INFO: Crawled 165 pages (at 1620 pages/min), scraped 159 items (at 1540 items/min) | |
2018-12-04 16:20:18 [scrapy.extensions.logstats] INFO: Crawled 318 pages (at 3060 pages/min), scraped 289 items (at 2600 items/min) | |
2018-12-04 16:20:21 [scrapy.extensions.logstats] INFO: Crawled 443 pages (at 2500 pages/min), scraped 411 items (at 2440 items/min) | |
2018-12-04 16:20:24 [scrapy.extensions.logstats] INFO: Crawled 581 pages (at 2760 pages/min), scraped 543 items (at 2640 items/min) | |
2018-12-04 16:20:27 [scrapy.extensions.logstats] INFO: Crawled 714 pages (at 2660 pages/min), scraped 687 items (at 2880 items/min) | |
2018-12-04 16:20:29 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-04 16:20:30 [scrapy.extensions.logstats] INFO: Crawled 866 pages (at 3040 pages/min), scraped 832 items (at 2900 items/min) | |
2018-12-04 16:20:33 [scrapy.extensions.logstats] INFO: Crawled 896 pages (at 600 pages/min), scraped 896 items (at 1280 items/min) | |
2018-12-04 16:20:36 [scrapy.extensions.logstats] INFO: Crawled 925 pages (at 580 pages/min), scraped 924 items (at 560 items/min) | |
2018-12-04 16:20:39 [scrapy.extensions.logstats] INFO: Crawled 942 pages (at 340 pages/min), scraped 942 items (at 360 items/min) | |
2018-12-04 16:20:42 [scrapy.extensions.logstats] INFO: Crawled 943 pages (at 20 pages/min), scraped 943 items (at 20 items/min) | |
2018-12-04 16:20:45 [scrapy.extensions.logstats] INFO: Crawled 943 pages (at 0 pages/min), scraped 943 items (at 0 items/min) | |
2018-12-04 16:20:47 [scrapy.extensions.feedexport] INFO: Stored csv feed (944 items) in: items.csv | |
The average speed of the spider is 23.12196554851653 items/sec | |
2018-12-04 16:20:47 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 287368, | |
'downloader/request_count': 944, | |
'downloader/request_method_count/GET': 944, | |
'downloader/response_bytes': 34229023, | |
'downloader/response_count': 944, | |
'downloader/response_status_count/200': 944, | |
'dupefilter/filtered': 28023, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 4, 16, 20, 46, 938954), | |
'item_scraped_count': 944, | |
'log_count/INFO': 22, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/BroadBenchSpider': 1, | |
'memdebug/live_refs/Request': 12725, | |
'memusage/max': 52166656, | |
'memusage/startup': 52166656, | |
'request_depth_max': 15, | |
'response_received_count': 944, | |
'scheduler/dequeued': 944, | |
'scheduler/dequeued/memory': 944, | |
'scheduler/enqueued': 13668, | |
'scheduler/enqueued/memory': 13668, | |
'start_time': datetime.datetime(2018, 12, 4, 16, 20, 6, 150168)} | |
2018-12-04 16:20:47 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Broad Crawl' Iterations = '1' | |
Mean : 23.12196554851653 Median : 23.12196554851653 Std Dev : 0.0 | |
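The runs above select the new queue via the `SCHEDULER_PRIORITY_QUEUE` setting visible in the "Overridden settings" log lines. As a sketch of how those runs were configured, the relevant fragment of a Scrapy project's `settings.py` would look like this (values copied from the logged settings; the last two apply to the broad-crawl runs only):

```python
# settings.py (fragment) -- values taken from the "Overridden settings" lines above
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'
CONCURRENT_REQUESTS = 120
RETRY_ENABLED = False
AUTOTHROTTLE_ENABLED = True          # broad-crawl runs only
REACTOR_THREADPOOL_MAXSIZE = 20      # broad-crawl runs only
```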
Executing scrapy-bench --n-runs 5 --book_url=http://172.17.0.6:8880 bookworm in /home/nikita/ves/scrapy-bench-3.6/ | |
2018-12-03 16:57:36 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:57:36 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-03 16:57:36 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['books.spiders']} | |
2018-12-03 16:57:36 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-12-03 16:57:36 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:57:36 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:57:36 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:57:36 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:57:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:57:36 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:57:39 [scrapy.extensions.logstats] INFO: Crawled 254 pages (at 5080 pages/min), scraped 197 items (at 3940 items/min) | |
2018-12-03 16:57:42 [scrapy.extensions.logstats] INFO: Crawled 493 pages (at 4780 pages/min), scraped 484 items (at 5740 items/min) | |
2018-12-03 16:57:45 [scrapy.extensions.logstats] INFO: Crawled 801 pages (at 6160 pages/min), scraped 779 items (at 5900 items/min) | |
2018-12-03 16:57:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:57:48 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 91.14893914218703 items/sec | |
2018-12-03 16:57:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 376212, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23726256, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 57, 48, 343718), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 51806208, | |
'memusage/startup': 51806208, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 57, 36, 506294)} | |
2018-12-03 16:57:48 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-12-03 16:57:48 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:57:48 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-03 16:57:48 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['books.spiders']} | |
2018-12-03 16:57:48 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-12-03 16:57:48 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:57:48 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:57:48 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:57:48 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:57:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:57:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:57:52 [scrapy.extensions.logstats] INFO: Crawled 252 pages (at 5040 pages/min), scraped 205 items (at 4100 items/min) | |
2018-12-03 16:57:54 [scrapy.extensions.logstats] INFO: Crawled 523 pages (at 5420 pages/min), scraped 492 items (at 5740 items/min) | |
2018-12-03 16:57:58 [scrapy.extensions.logstats] INFO: Crawled 849 pages (at 6520 pages/min), scraped 784 items (at 5840 items/min) | |
2018-12-03 16:58:00 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:58:00 [scrapy.extensions.feedexport] INFO: Stored csv feed (1068 items) in: items.csv | |
The average speed of the spider is 90.72894556647711 items/sec | |
2018-12-03 16:58:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 372334, | |
'downloader/request_count': 1068, | |
'downloader/request_method_count/GET': 1068, | |
'downloader/response_bytes': 23480934, | |
'downloader/response_count': 1068, | |
'downloader/response_status_count/200': 1068, | |
'dupefilter/filtered': 14933, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 58, 0, 642647), | |
'item_scraped_count': 1068, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 51941376, | |
'memusage/startup': 51941376, | |
'request_depth_max': 9, | |
'response_received_count': 1068, | |
'scheduler/dequeued': 1068, | |
'scheduler/dequeued/memory': 1068, | |
'scheduler/enqueued': 1101, | |
'scheduler/enqueued/memory': 1101, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 57, 48, 914985)} | |
2018-12-03 16:58:00 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-12-03 16:58:01 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:58:01 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-03 16:58:01 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['books.spiders']} | |
2018-12-03 16:58:01 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-12-03 16:58:01 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:58:01 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:58:01 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:58:01 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:58:01 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:58:01 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:58:04 [scrapy.extensions.logstats] INFO: Crawled 252 pages (at 5040 pages/min), scraped 202 items (at 4040 items/min) | |
2018-12-03 16:58:07 [scrapy.extensions.logstats] INFO: Crawled 500 pages (at 4960 pages/min), scraped 492 items (at 5800 items/min) | |
2018-12-03 16:58:10 [scrapy.extensions.logstats] INFO: Crawled 801 pages (at 6020 pages/min), scraped 789 items (at 5940 items/min) | |
2018-12-03 16:58:12 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:58:12 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv | |
The average speed of the spider is 88.62741130753513 items/sec | |
2018-12-03 16:58:12 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 368803, | |
'downloader/request_count': 1058, | |
'downloader/request_method_count/GET': 1058, | |
'downloader/response_bytes': 23293106, | |
'downloader/response_count': 1058, | |
'downloader/response_status_count/200': 1058, | |
'dupefilter/filtered': 14863, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 58, 12, 940998), | |
'item_scraped_count': 1058, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52318208, | |
'memusage/startup': 52318208, | |
'request_depth_max': 9, | |
'response_received_count': 1058, | |
'scheduler/dequeued': 1058, | |
'scheduler/dequeued/memory': 1058, | |
'scheduler/enqueued': 1081, | |
'scheduler/enqueued/memory': 1081, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 58, 1, 215937)} | |
2018-12-03 16:58:12 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-12-03 16:58:13 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:58:13 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-03 16:58:13 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['books.spiders']} | |
2018-12-03 16:58:13 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-12-03 16:58:13 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:58:13 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:58:13 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:58:13 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:58:13 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:58:13 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:58:16 [scrapy.extensions.logstats] INFO: Crawled 253 pages (at 5060 pages/min), scraped 200 items (at 4000 items/min) | |
2018-12-03 16:58:19 [scrapy.extensions.logstats] INFO: Crawled 514 pages (at 5220 pages/min), scraped 492 items (at 5840 items/min) | |
2018-12-03 16:58:22 [scrapy.extensions.logstats] INFO: Crawled 842 pages (at 6560 pages/min), scraped 795 items (at 6060 items/min) | |
2018-12-03 16:58:24 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:58:25 [scrapy.extensions.feedexport] INFO: Stored csv feed (1057 items) in: items.csv | |
The average speed of the spider is 88.41912912771723 items/sec | |
2018-12-03 16:58:25 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 368726, | |
'downloader/request_count': 1057, | |
'downloader/request_method_count/GET': 1057, | |
'downloader/response_bytes': 23242113, | |
'downloader/response_count': 1057, | |
'downloader/response_status_count/200': 1057, | |
'dupefilter/filtered': 14790, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 58, 25, 40154), | |
'item_scraped_count': 1057, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 52146176, | |
'memusage/startup': 52146176, | |
'request_depth_max': 9, | |
'response_received_count': 1057, | |
'scheduler/dequeued': 1057, | |
'scheduler/dequeued/memory': 1057, | |
'scheduler/enqueued': 1080, | |
'scheduler/enqueued/memory': 1080, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 58, 13, 505983)} | |
2018-12-03 16:58:25 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-12-03 16:58:25 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 16:58:25 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-03 16:58:25 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['books.spiders']} | |
2018-12-03 16:58:25 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-12-03 16:58:25 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 16:58:25 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 16:58:25 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 16:58:25 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 16:58:25 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 16:58:25 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 16:58:28 [scrapy.extensions.logstats] INFO: Crawled 251 pages (at 5020 pages/min), scraped 200 items (at 4000 items/min) | |
2018-12-03 16:58:31 [scrapy.extensions.logstats] INFO: Crawled 544 pages (at 5860 pages/min), scraped 478 items (at 5560 items/min) | |
2018-12-03 16:58:34 [scrapy.extensions.logstats] INFO: Crawled 791 pages (at 4940 pages/min), scraped 783 items (at 6100 items/min) | |
2018-12-03 16:58:36 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 16:58:37 [scrapy.extensions.feedexport] INFO: Stored csv feed (1077 items) in: items.csv | |
The average speed of the spider is 89.2150649171547 items/sec | |
2018-12-03 16:58:37 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 375586, | |
'downloader/request_count': 1077, | |
'downloader/request_method_count/GET': 1077, | |
'downloader/response_bytes': 23623787, | |
'downloader/response_count': 1077, | |
'downloader/response_status_count/200': 1077, | |
'dupefilter/filtered': 14950, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 16, 58, 37, 473215), | |
'item_scraped_count': 1077, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 51838976, | |
'memusage/startup': 51838976, | |
'request_depth_max': 8, | |
'response_received_count': 1077, | |
'scheduler/dequeued': 1077, | |
'scheduler/dequeued/memory': 1077, | |
'scheduler/enqueued': 1100, | |
'scheduler/enqueued/memory': 1100, | |
'start_time': datetime.datetime(2018, 12, 3, 16, 58, 25, 606346)} | |
2018-12-03 16:58:37 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '5' | |
Mean : 89.62789801221425 Median : 89.2150649171547 Std Dev : 1.1098106921958728 | |
Executing scrapy-bench --n-runs 1 broadworm in /home/nikita/ves/scrapy-bench-3.6/ | |
2018-12-04 16:23:18 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: broadspider) | |
2018-12-04 16:23:18 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-04 16:23:18 [scrapy.crawler] INFO: Overridden settings: {'AUTOTHROTTLE_ENABLED': True, 'BOT_NAME': 'broadspider', 'CLOSESPIDER_ITEMCOUNT': 800, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'broad.spiders', 'REACTOR_THREADPOOL_MAXSIZE': 20, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['broad.spiders']} | |
2018-12-04 16:23:18 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats', | |
'scrapy.extensions.throttle.AutoThrottle'] | |
2018-12-04 16:23:18 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-04 16:23:18 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-04 16:23:18 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-04 16:23:18 [scrapy.core.engine] INFO: Spider opened | |
2018-12-04 16:23:18 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-04 16:23:18 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-04 16:23:21 [scrapy.extensions.logstats] INFO: Crawled 49 pages (at 980 pages/min), scraped 48 items (at 960 items/min) | |
2018-12-04 16:23:24 [scrapy.extensions.logstats] INFO: Crawled 58 pages (at 180 pages/min), scraped 58 items (at 200 items/min) | |
2018-12-04 16:23:27 [scrapy.extensions.logstats] INFO: Crawled 77 pages (at 380 pages/min), scraped 75 items (at 340 items/min) | |
2018-12-04 16:23:30 [scrapy.extensions.logstats] INFO: Crawled 116 pages (at 780 pages/min), scraped 115 items (at 800 items/min) | |
2018-12-04 16:23:33 [scrapy.extensions.logstats] INFO: Crawled 194 pages (at 1560 pages/min), scraped 192 items (at 1540 items/min) | |
2018-12-04 16:23:36 [scrapy.extensions.logstats] INFO: Crawled 260 pages (at 1320 pages/min), scraped 258 items (at 1320 items/min) | |
2018-12-04 16:23:39 [scrapy.extensions.logstats] INFO: Crawled 310 pages (at 1000 pages/min), scraped 308 items (at 1000 items/min) | |
2018-12-04 16:23:42 [scrapy.extensions.logstats] INFO: Crawled 352 pages (at 840 pages/min), scraped 351 items (at 860 items/min) | |
2018-12-04 16:23:45 [scrapy.extensions.logstats] INFO: Crawled 384 pages (at 640 pages/min), scraped 384 items (at 660 items/min) | |
2018-12-04 16:23:48 [scrapy.extensions.logstats] INFO: Crawled 407 pages (at 460 pages/min), scraped 406 items (at 440 items/min) | |
2018-12-04 16:23:51 [scrapy.extensions.logstats] INFO: Crawled 428 pages (at 420 pages/min), scraped 427 items (at 420 items/min) | |
2018-12-04 16:23:54 [scrapy.extensions.logstats] INFO: Crawled 457 pages (at 580 pages/min), scraped 455 items (at 560 items/min) | |
2018-12-04 16:23:57 [scrapy.extensions.logstats] INFO: Crawled 489 pages (at 640 pages/min), scraped 487 items (at 640 items/min) | |
2018-12-04 16:24:00 [scrapy.extensions.logstats] INFO: Crawled 515 pages (at 520 pages/min), scraped 514 items (at 540 items/min) | |
2018-12-04 16:24:03 [scrapy.extensions.logstats] INFO: Crawled 536 pages (at 420 pages/min), scraped 535 items (at 420 items/min) | |
2018-12-04 16:24:06 [scrapy.extensions.logstats] INFO: Crawled 560 pages (at 480 pages/min), scraped 559 items (at 480 items/min) | |
2018-12-04 16:24:09 [scrapy.extensions.logstats] INFO: Crawled 588 pages (at 560 pages/min), scraped 586 items (at 540 items/min) | |
2018-12-04 16:24:12 [scrapy.extensions.logstats] INFO: Crawled 611 pages (at 460 pages/min), scraped 611 items (at 500 items/min) | |
2018-12-04 16:24:15 [scrapy.extensions.logstats] INFO: Crawled 635 pages (at 480 pages/min), scraped 635 items (at 480 items/min) | |
2018-12-04 16:24:18 [scrapy.extensions.logstats] INFO: Crawled 660 pages (at 500 pages/min), scraped 659 items (at 480 items/min) | |
2018-12-04 16:24:21 [scrapy.extensions.logstats] INFO: Crawled 679 pages (at 380 pages/min), scraped 678 items (at 380 items/min) | |
2018-12-04 16:24:24 [scrapy.extensions.logstats] INFO: Crawled 684 pages (at 100 pages/min), scraped 684 items (at 120 items/min) | |
2018-12-04 16:24:27 [scrapy.extensions.logstats] INFO: Crawled 696 pages (at 240 pages/min), scraped 694 items (at 200 items/min) | |
2018-12-04 16:24:30 [scrapy.extensions.logstats] INFO: Crawled 701 pages (at 100 pages/min), scraped 700 items (at 120 items/min) | |
2018-12-04 16:24:33 [scrapy.extensions.logstats] INFO: Crawled 705 pages (at 80 pages/min), scraped 705 items (at 100 items/min) | |
2018-12-04 16:24:36 [scrapy.extensions.logstats] INFO: Crawled 720 pages (at 300 pages/min), scraped 719 items (at 280 items/min) | |
2018-12-04 16:24:39 [scrapy.extensions.logstats] INFO: Crawled 740 pages (at 400 pages/min), scraped 739 items (at 400 items/min) | |
2018-12-04 16:24:42 [scrapy.extensions.logstats] INFO: Crawled 762 pages (at 440 pages/min), scraped 762 items (at 460 items/min) | |
2018-12-04 16:24:45 [scrapy.extensions.logstats] INFO: Crawled 770 pages (at 160 pages/min), scraped 770 items (at 160 items/min) | |
2018-12-04 16:24:48 [scrapy.extensions.logstats] INFO: Crawled 771 pages (at 20 pages/min), scraped 771 items (at 20 items/min) | |
2018-12-04 16:24:51 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 80 pages/min), scraped 775 items (at 80 items/min) | |
2018-12-04 16:24:54 [scrapy.extensions.logstats] INFO: Crawled 777 pages (at 40 pages/min), scraped 777 items (at 40 items/min) | |
2018-12-04 16:24:57 [scrapy.extensions.logstats] INFO: Crawled 780 pages (at 60 pages/min), scraped 780 items (at 60 items/min) | |
2018-12-04 16:25:00 [scrapy.extensions.logstats] INFO: Crawled 786 pages (at 120 pages/min), scraped 786 items (at 120 items/min) | |
2018-12-04 16:25:03 [scrapy.extensions.logstats] INFO: Crawled 789 pages (at 60 pages/min), scraped 789 items (at 60 items/min) | |
2018-12-04 16:25:06 [scrapy.extensions.logstats] INFO: Crawled 794 pages (at 100 pages/min), scraped 794 items (at 100 items/min) | |
2018-12-04 16:25:09 [scrapy.extensions.logstats] INFO: Crawled 796 pages (at 40 pages/min), scraped 796 items (at 40 items/min) | |
2018-12-04 16:25:12 [scrapy.extensions.logstats] INFO: Crawled 798 pages (at 40 pages/min), scraped 798 items (at 40 items/min) | |
2018-12-04 16:25:13 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-04 16:25:15 [scrapy.extensions.logstats] INFO: Crawled 802 pages (at 80 pages/min), scraped 802 items (at 80 items/min) | |
2018-12-04 16:25:18 [scrapy.extensions.logstats] INFO: Crawled 806 pages (at 80 pages/min), scraped 806 items (at 80 items/min) | |
2018-12-04 16:25:21 [scrapy.extensions.logstats] INFO: Crawled 808 pages (at 40 pages/min), scraped 808 items (at 40 items/min) | |
2018-12-04 16:25:24 [scrapy.extensions.logstats] INFO: Crawled 810 pages (at 40 pages/min), scraped 810 items (at 40 items/min) | |
2018-12-04 16:25:27 [scrapy.extensions.logstats] INFO: Crawled 814 pages (at 80 pages/min), scraped 813 items (at 60 items/min) | |
2018-12-04 16:25:30 [scrapy.extensions.logstats] INFO: Crawled 816 pages (at 40 pages/min), scraped 816 items (at 60 items/min) | |
2018-12-04 16:25:33 [scrapy.extensions.logstats] INFO: Crawled 817 pages (at 20 pages/min), scraped 817 items (at 20 items/min) | |
2018-12-04 16:25:36 [scrapy.extensions.logstats] INFO: Crawled 819 pages (at 40 pages/min), scraped 819 items (at 40 items/min) | |
2018-12-04 16:25:39 [scrapy.extensions.logstats] INFO: Crawled 821 pages (at 40 pages/min), scraped 821 items (at 40 items/min) | |
2018-12-04 16:25:42 [scrapy.extensions.logstats] INFO: Crawled 824 pages (at 60 pages/min), scraped 823 items (at 40 items/min) | |
2018-12-04 16:25:45 [scrapy.extensions.logstats] INFO: Crawled 827 pages (at 60 pages/min), scraped 827 items (at 80 items/min) | |
2018-12-04 16:25:48 [scrapy.extensions.logstats] INFO: Crawled 828 pages (at 20 pages/min), scraped 828 items (at 20 items/min) | |
2018-12-04 16:25:51 [scrapy.extensions.logstats] INFO: Crawled 832 pages (at 80 pages/min), scraped 832 items (at 80 items/min) | |
2018-12-04 16:25:54 [scrapy.extensions.logstats] INFO: Crawled 834 pages (at 40 pages/min), scraped 834 items (at 40 items/min) | |
2018-12-04 16:25:57 [scrapy.extensions.logstats] INFO: Crawled 838 pages (at 80 pages/min), scraped 838 items (at 80 items/min) | |
2018-12-04 16:26:00 [scrapy.extensions.logstats] INFO: Crawled 839 pages (at 20 pages/min), scraped 839 items (at 20 items/min) | |
2018-12-04 16:26:03 [scrapy.extensions.logstats] INFO: Crawled 840 pages (at 20 pages/min), scraped 840 items (at 20 items/min) | |
2018-12-04 16:26:06 [scrapy.extensions.logstats] INFO: Crawled 844 pages (at 80 pages/min), scraped 844 items (at 80 items/min) | |
2018-12-04 16:26:09 [scrapy.extensions.logstats] INFO: Crawled 847 pages (at 60 pages/min), scraped 847 items (at 60 items/min) | |
2018-12-04 16:26:12 [scrapy.extensions.logstats] INFO: Crawled 847 pages (at 0 pages/min), scraped 847 items (at 0 items/min) | |
2018-12-04 16:26:15 [scrapy.extensions.logstats] INFO: Crawled 849 pages (at 40 pages/min), scraped 849 items (at 40 items/min) | |
2018-12-04 16:26:18 [scrapy.extensions.logstats] INFO: Crawled 850 pages (at 20 pages/min), scraped 850 items (at 20 items/min) | |
2018-12-04 16:26:21 [scrapy.extensions.logstats] INFO: Crawled 856 pages (at 120 pages/min), scraped 855 items (at 100 items/min) | |
2018-12-04 16:26:24 [scrapy.extensions.logstats] INFO: Crawled 856 pages (at 0 pages/min), scraped 856 items (at 20 items/min) | |
2018-12-04 16:26:27 [scrapy.extensions.logstats] INFO: Crawled 857 pages (at 20 pages/min), scraped 857 items (at 20 items/min) | |
2018-12-04 16:26:30 [scrapy.extensions.logstats] INFO: Crawled 858 pages (at 20 pages/min), scraped 858 items (at 20 items/min) | |
2018-12-04 16:26:33 [scrapy.extensions.logstats] INFO: Crawled 858 pages (at 0 pages/min), scraped 858 items (at 0 items/min) | |
2018-12-04 16:26:36 [scrapy.extensions.logstats] INFO: Crawled 861 pages (at 60 pages/min), scraped 861 items (at 60 items/min) | |
2018-12-04 16:26:39 [scrapy.extensions.logstats] INFO: Crawled 863 pages (at 40 pages/min), scraped 863 items (at 40 items/min) | |
2018-12-04 16:26:42 [scrapy.extensions.logstats] INFO: Crawled 865 pages (at 40 pages/min), scraped 865 items (at 40 items/min) | |
2018-12-04 16:26:45 [scrapy.extensions.logstats] INFO: Crawled 865 pages (at 0 pages/min), scraped 865 items (at 0 items/min) | |
2018-12-04 16:26:48 [scrapy.extensions.logstats] INFO: Crawled 867 pages (at 40 pages/min), scraped 867 items (at 40 items/min) | |
2018-12-04 16:26:51 [scrapy.extensions.logstats] INFO: Crawled 869 pages (at 40 pages/min), scraped 869 items (at 40 items/min) | |
2018-12-04 16:26:54 [scrapy.extensions.logstats] INFO: Crawled 870 pages (at 20 pages/min), scraped 870 items (at 20 items/min) | |
2018-12-04 16:26:57 [scrapy.extensions.logstats] INFO: Crawled 871 pages (at 20 pages/min), scraped 871 items (at 20 items/min) | |
2018-12-04 16:27:00 [scrapy.extensions.logstats] INFO: Crawled 872 pages (at 20 pages/min), scraped 872 items (at 20 items/min) | |
2018-12-04 16:27:03 [scrapy.extensions.logstats] INFO: Crawled 873 pages (at 20 pages/min), scraped 873 items (at 20 items/min) | |
2018-12-04 16:27:06 [scrapy.extensions.logstats] INFO: Crawled 873 pages (at 0 pages/min), scraped 873 items (at 0 items/min) | |
2018-12-04 16:27:09 [scrapy.extensions.logstats] INFO: Crawled 875 pages (at 40 pages/min), scraped 875 items (at 40 items/min) | |
2018-12-04 16:27:12 [scrapy.extensions.logstats] INFO: Crawled 877 pages (at 40 pages/min), scraped 877 items (at 40 items/min) | |
2018-12-04 16:27:15 [scrapy.extensions.logstats] INFO: Crawled 877 pages (at 0 pages/min), scraped 877 items (at 0 items/min) | |
2018-12-04 16:27:18 [scrapy.extensions.logstats] INFO: Crawled 878 pages (at 20 pages/min), scraped 877 items (at 0 items/min) | |
2018-12-04 16:27:21 [scrapy.extensions.logstats] INFO: Crawled 878 pages (at 0 pages/min), scraped 878 items (at 20 items/min) | |
2018-12-04 16:27:24 [scrapy.extensions.logstats] INFO: Crawled 879 pages (at 20 pages/min), scraped 879 items (at 20 items/min) | |
2018-12-04 16:27:27 [scrapy.extensions.logstats] INFO: Crawled 881 pages (at 40 pages/min), scraped 881 items (at 40 items/min) | |
2018-12-04 16:27:30 [scrapy.extensions.logstats] INFO: Crawled 881 pages (at 0 pages/min), scraped 881 items (at 0 items/min) | |
2018-12-04 16:27:33 [scrapy.extensions.logstats] INFO: Crawled 883 pages (at 40 pages/min), scraped 883 items (at 40 items/min) | |
2018-12-04 16:27:36 [scrapy.extensions.logstats] INFO: Crawled 884 pages (at 20 pages/min), scraped 884 items (at 20 items/min) | |
2018-12-04 16:27:39 [scrapy.extensions.logstats] INFO: Crawled 886 pages (at 40 pages/min), scraped 886 items (at 40 items/min) | |
2018-12-04 16:27:42 [scrapy.extensions.logstats] INFO: Crawled 887 pages (at 20 pages/min), scraped 887 items (at 20 items/min) | |
2018-12-04 16:27:45 [scrapy.extensions.logstats] INFO: Crawled 888 pages (at 20 pages/min), scraped 888 items (at 20 items/min) | |
2018-12-04 16:27:48 [scrapy.extensions.logstats] INFO: Crawled 888 pages (at 0 pages/min), scraped 888 items (at 0 items/min) | |
2018-12-04 16:27:51 [scrapy.extensions.logstats] INFO: Crawled 888 pages (at 0 pages/min), scraped 888 items (at 0 items/min) | |
2018-12-04 16:27:54 [scrapy.extensions.logstats] INFO: Crawled 891 pages (at 60 pages/min), scraped 891 items (at 60 items/min) | |
2018-12-04 16:27:57 [scrapy.extensions.logstats] INFO: Crawled 892 pages (at 20 pages/min), scraped 892 items (at 20 items/min) | |
2018-12-04 16:28:00 [scrapy.extensions.logstats] INFO: Crawled 893 pages (at 20 pages/min), scraped 893 items (at 20 items/min) | |
2018-12-04 16:28:03 [scrapy.extensions.logstats] INFO: Crawled 895 pages (at 40 pages/min), scraped 895 items (at 40 items/min) | |
2018-12-04 16:28:06 [scrapy.extensions.logstats] INFO: Crawled 896 pages (at 20 pages/min), scraped 896 items (at 20 items/min) | |
2018-12-04 16:28:09 [scrapy.extensions.logstats] INFO: Crawled 898 pages (at 40 pages/min), scraped 898 items (at 40 items/min) | |
2018-12-04 16:28:12 [scrapy.extensions.logstats] INFO: Crawled 899 pages (at 20 pages/min), scraped 899 items (at 20 items/min) | |
2018-12-04 16:28:15 [scrapy.extensions.logstats] INFO: Crawled 900 pages (at 20 pages/min), scraped 900 items (at 20 items/min) | |
2018-12-04 16:28:18 [scrapy.extensions.logstats] INFO: Crawled 900 pages (at 0 pages/min), scraped 900 items (at 0 items/min) | |
2018-12-04 16:28:21 [scrapy.extensions.logstats] INFO: Crawled 901 pages (at 20 pages/min), scraped 901 items (at 20 items/min) | |
2018-12-04 16:28:24 [scrapy.extensions.logstats] INFO: Crawled 902 pages (at 20 pages/min), scraped 902 items (at 20 items/min) | |
2018-12-04 16:28:27 [scrapy.extensions.logstats] INFO: Crawled 902 pages (at 0 pages/min), scraped 902 items (at 0 items/min) | |
2018-12-04 16:28:30 [scrapy.extensions.logstats] INFO: Crawled 902 pages (at 0 pages/min), scraped 902 items (at 0 items/min) | |
2018-12-04 16:28:33 [scrapy.extensions.logstats] INFO: Crawled 905 pages (at 60 pages/min), scraped 905 items (at 60 items/min) | |
2018-12-04 16:28:36 [scrapy.extensions.logstats] INFO: Crawled 905 pages (at 0 pages/min), scraped 905 items (at 0 items/min) | |
2018-12-04 16:28:39 [scrapy.extensions.logstats] INFO: Crawled 906 pages (at 20 pages/min), scraped 906 items (at 20 items/min) | |
2018-12-04 16:28:42 [scrapy.extensions.logstats] INFO: Crawled 907 pages (at 20 pages/min), scraped 907 items (at 20 items/min) | |
2018-12-04 16:28:45 [scrapy.extensions.logstats] INFO: Crawled 908 pages (at 20 pages/min), scraped 908 items (at 20 items/min) | |
2018-12-04 16:28:48 [scrapy.extensions.logstats] INFO: Crawled 910 pages (at 40 pages/min), scraped 910 items (at 40 items/min) | |
2018-12-04 16:28:51 [scrapy.extensions.logstats] INFO: Crawled 910 pages (at 0 pages/min), scraped 910 items (at 0 items/min) | |
2018-12-04 16:28:54 [scrapy.extensions.logstats] INFO: Crawled 910 pages (at 0 pages/min), scraped 910 items (at 0 items/min) | |
2018-12-04 16:28:57 [scrapy.extensions.logstats] INFO: Crawled 911 pages (at 20 pages/min), scraped 911 items (at 20 items/min) | |
2018-12-04 16:29:00 [scrapy.extensions.logstats] INFO: Crawled 911 pages (at 0 pages/min), scraped 911 items (at 0 items/min) | |
2018-12-04 16:29:03 [scrapy.extensions.logstats] INFO: Crawled 912 pages (at 20 pages/min), scraped 912 items (at 20 items/min) | |
2018-12-04 16:29:06 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 20 pages/min), scraped 913 items (at 20 items/min) | |
2018-12-04 16:29:09 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 0 pages/min), scraped 913 items (at 0 items/min) | |
2018-12-04 16:29:12 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 0 pages/min), scraped 913 items (at 0 items/min) | |
2018-12-04 16:29:15 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 0 pages/min), scraped 913 items (at 0 items/min) | |
2018-12-04 16:29:18 [scrapy.extensions.logstats] INFO: Crawled 914 pages (at 20 pages/min), scraped 914 items (at 20 items/min) | |
2018-12-04 16:29:21 [scrapy.extensions.logstats] INFO: Crawled 914 pages (at 0 pages/min), scraped 914 items (at 0 items/min) | |
2018-12-04 16:29:24 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 20 pages/min), scraped 915 items (at 20 items/min) | |
2018-12-04 16:29:27 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 0 pages/min), scraped 915 items (at 0 items/min) | |
2018-12-04 16:29:30 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 0 pages/min), scraped 915 items (at 0 items/min) | |
2018-12-04 16:29:33 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 20 pages/min), scraped 916 items (at 20 items/min) | |
2018-12-04 16:29:36 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 0 pages/min), scraped 916 items (at 0 items/min) | |
2018-12-04 16:29:39 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 0 pages/min), scraped 916 items (at 0 items/min) | |
2018-12-04 16:29:42 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 0 pages/min), scraped 916 items (at 0 items/min) | |
2018-12-04 16:29:45 [scrapy.extensions.logstats] INFO: Crawled 918 pages (at 40 pages/min), scraped 918 items (at 40 items/min) | |
2018-12-04 16:29:48 [scrapy.extensions.logstats] INFO: Crawled 918 pages (at 0 pages/min), scraped 918 items (at 0 items/min) | |
2018-12-04 16:29:50 [scrapy.extensions.feedexport] INFO: Stored csv feed (920 items) in: items.csv | |
The average speed of the spider is 2.341064071191682 items/sec | |
2018-12-04 16:29:50 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 299263, | |
'downloader/request_count': 920, | |
'downloader/request_method_count/GET': 920, | |
'downloader/response_bytes': 26395440, | |
'downloader/response_count': 920, | |
'downloader/response_status_count/200': 920, | |
'dupefilter/filtered': 18812, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 4, 16, 29, 50, 879085), | |
'item_scraped_count': 920, | |
'log_count/INFO': 139, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/BroadBenchSpider': 1, | |
'memdebug/live_refs/Request': 6882, | |
'memusage/max': 100548608, | |
'memusage/startup': 52174848, | |
'request_depth_max': 51, | |
'response_received_count': 920, | |
'scheduler/dequeued': 920, | |
'scheduler/dequeued/memory': 920, | |
'scheduler/enqueued': 7801, | |
'scheduler/enqueued/memory': 7801, | |
'start_time': datetime.datetime(2018, 12, 4, 16, 23, 18, 317620)} | |
2018-12-04 16:29:50 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Broad Crawl' Iterations = '1' | |
Mean : 2.341064071191682 Median : 2.341064071191682 Std Dev : 0.0 | |
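The summary line above can be reproduced from the per-run speeds with Python's `statistics` module. This is a sketch, not scrapy-bench's actual aggregation code; it assumes a population standard deviation (which yields 0.0 for a single run, matching the output above) and uses the one Broad Crawl speed reported in this log.

```python
import statistics

# One speed per --n-runs iteration; this run used --n-runs 1,
# so the list holds only the single Broad Crawl speed from the log.
speeds = [2.341064071191682]

mean = statistics.mean(speeds)
median = statistics.median(speeds)
# Population std dev: defined (and 0.0) even for a single run,
# unlike statistics.stdev, which requires at least two samples.
std_dev = statistics.pstdev(speeds)

print(f"Mean : {mean} Median : {median} Std Dev : {std_dev}")
```

With a single sample, mean and median trivially equal the sample itself and the population standard deviation is 0.0, which is why the one-run summary is less informative than the five-run Book Spider summary earlier in the log.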
Executing scrapy-bench --n-runs 5 --book_url=http://172.17.0.6:8880 bookworm in /home/nikita/ves/scrapy-bench-3.6/ | |
2018-12-03 17:00:23 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 17:00:23 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-03 17:00:23 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-12-03 17:00:23 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-12-03 17:00:23 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 17:00:23 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 17:00:23 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 17:00:23 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 17:00:23 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 17:00:23 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 17:00:26 [scrapy.extensions.logstats] INFO: Crawled 260 pages (at 5200 pages/min), scraped 201 items (at 4020 items/min) | |
2018-12-03 17:00:29 [scrapy.extensions.logstats] INFO: Crawled 554 pages (at 5880 pages/min), scraped 492 items (at 5820 items/min) | |
2018-12-03 17:00:32 [scrapy.extensions.logstats] INFO: Crawled 806 pages (at 5040 pages/min), scraped 801 items (at 6180 items/min) | |
2018-12-03 17:00:34 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 17:00:35 [scrapy.extensions.feedexport] INFO: Stored csv feed (1067 items) in: items.csv | |
The average speed of the spider is 90.25091217578672 items/sec | |
2018-12-03 17:00:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 371877, | |
'downloader/request_count': 1067, | |
'downloader/request_method_count/GET': 1067, | |
'downloader/response_bytes': 23429941, | |
'downloader/response_count': 1067, | |
'downloader/response_status_count/200': 1067, | |
'dupefilter/filtered': 14860, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 17, 0, 35, 160678), | |
'item_scraped_count': 1067, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 51843072, | |
'memusage/startup': 51843072, | |
'request_depth_max': 8, | |
'response_received_count': 1067, | |
'scheduler/dequeued': 1067, | |
'scheduler/dequeued/memory': 1067, | |
'scheduler/enqueued': 1100, | |
'scheduler/enqueued/memory': 1100, | |
'start_time': datetime.datetime(2018, 12, 3, 17, 0, 23, 505147)} | |
2018-12-03 17:00:35 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-12-03 17:00:35 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 17:00:35 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-03 17:00:35 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-12-03 17:00:35 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-12-03 17:00:35 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 17:00:35 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 17:00:35 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 17:00:35 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 17:00:35 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 17:00:35 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 17:00:39 [scrapy.extensions.logstats] INFO: Crawled 244 pages (at 4880 pages/min), scraped 206 items (at 4120 items/min) | |
2018-12-03 17:00:41 [scrapy.extensions.logstats] INFO: Crawled 545 pages (at 6020 pages/min), scraped 482 items (at 5520 items/min) | |
2018-12-03 17:00:44 [scrapy.extensions.logstats] INFO: Crawled 804 pages (at 5180 pages/min), scraped 795 items (at 6260 items/min) | |
2018-12-03 17:00:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 17:00:47 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 89.73449994942929 items/sec | |
2018-12-03 17:00:47 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 376063, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23726256, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 17, 0, 47, 623049), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 51810304, | |
'memusage/startup': 51810304, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 12, 3, 17, 0, 35, 728579)} | |
2018-12-03 17:00:47 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-12-03 17:00:48 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 17:00:48 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-03 17:00:48 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-12-03 17:00:48 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-12-03 17:00:48 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 17:00:48 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 17:00:48 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 17:00:48 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 17:00:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 17:00:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 17:00:51 [scrapy.extensions.logstats] INFO: Crawled 256 pages (at 5120 pages/min), scraped 204 items (at 4080 items/min) | |
2018-12-03 17:00:54 [scrapy.extensions.logstats] INFO: Crawled 547 pages (at 5820 pages/min), scraped 489 items (at 5700 items/min) | |
2018-12-03 17:00:57 [scrapy.extensions.logstats] INFO: Crawled 823 pages (at 5520 pages/min), scraped 794 items (at 6100 items/min) | |
2018-12-03 17:00:59 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 17:00:59 [scrapy.extensions.feedexport] INFO: Stored csv feed (1069 items) in: items.csv | |
The average speed of the spider is 90.40642489479762 items/sec | |
2018-12-03 17:00:59 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 372636, | |
'downloader/request_count': 1069, | |
'downloader/request_method_count/GET': 1069, | |
'downloader/response_bytes': 23532410, | |
'downloader/response_count': 1069, | |
'downloader/response_status_count/200': 1069, | |
'dupefilter/filtered': 15006, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 17, 0, 59, 844157), | |
'item_scraped_count': 1069, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 34, | |
'memusage/max': 51777536, | |
'memusage/startup': 51777536, | |
'request_depth_max': 9, | |
'response_received_count': 1069, | |
'scheduler/dequeued': 1069, | |
'scheduler/dequeued/memory': 1069, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 12, 3, 17, 0, 48, 202209)} | |
2018-12-03 17:00:59 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-12-03 17:01:00 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 17:01:00 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-03 17:01:00 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-12-03 17:01:00 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-12-03 17:01:00 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 17:01:00 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 17:01:00 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 17:01:00 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 17:01:00 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 17:01:00 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 17:01:03 [scrapy.extensions.logstats] INFO: Crawled 251 pages (at 5020 pages/min), scraped 194 items (at 3880 items/min) | |
2018-12-03 17:01:06 [scrapy.extensions.logstats] INFO: Crawled 532 pages (at 5620 pages/min), scraped 485 items (at 5820 items/min) | |
2018-12-03 17:01:09 [scrapy.extensions.logstats] INFO: Crawled 842 pages (at 6200 pages/min), scraped 784 items (at 5980 items/min) | |
2018-12-03 17:01:11 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 17:01:12 [scrapy.extensions.feedexport] INFO: Stored csv feed (1078 items) in: items.csv | |
The average speed of the spider is 88.46023391045921 items/sec | |
2018-12-03 17:01:12 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 375744, | |
'downloader/request_count': 1078, | |
'downloader/request_method_count/GET': 1078, | |
'downloader/response_bytes': 23674780, | |
'downloader/response_count': 1078, | |
'downloader/response_status_count/200': 1078, | |
'dupefilter/filtered': 15023, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 17, 1, 12, 204779), | |
'item_scraped_count': 1078, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 51875840, | |
'memusage/startup': 51875840, | |
'request_depth_max': 9, | |
'response_received_count': 1078, | |
'scheduler/dequeued': 1078, | |
'scheduler/dequeued/memory': 1078, | |
'scheduler/enqueued': 1101, | |
'scheduler/enqueued/memory': 1101, | |
'start_time': datetime.datetime(2018, 12, 3, 17, 1, 0, 413734)} | |
2018-12-03 17:01:12 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
2018-12-03 17:01:12 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books) | |
2018-12-03 17:01:12 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid | |
2018-12-03 17:01:12 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']} | |
2018-12-03 17:01:12 [scrapy.middleware] INFO: Enabled extensions: | |
['scrapy.extensions.corestats.CoreStats', | |
'scrapy.extensions.telnet.TelnetConsole', | |
'scrapy.extensions.memusage.MemoryUsage', | |
'scrapy.extensions.memdebug.MemoryDebugger', | |
'scrapy.extensions.closespider.CloseSpider', | |
'scrapy.extensions.feedexport.FeedExporter', | |
'scrapy.extensions.logstats.LogStats'] | |
2018-12-03 17:01:12 [scrapy.middleware] INFO: Enabled downloader middlewares: | |
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', | |
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', | |
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', | |
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', | |
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', | |
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', | |
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', | |
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', | |
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', | |
'scrapy.downloadermiddlewares.stats.DownloaderStats'] | |
2018-12-03 17:01:12 [scrapy.middleware] INFO: Enabled spider middlewares: | |
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', | |
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', | |
'scrapy.spidermiddlewares.referer.RefererMiddleware', | |
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', | |
'scrapy.spidermiddlewares.depth.DepthMiddleware'] | |
2018-12-03 17:01:12 [scrapy.middleware] INFO: Enabled item pipelines: | |
[] | |
2018-12-03 17:01:12 [scrapy.core.engine] INFO: Spider opened | |
2018-12-03 17:01:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) | |
2018-12-03 17:01:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024 | |
2018-12-03 17:01:16 [scrapy.extensions.logstats] INFO: Crawled 257 pages (at 5140 pages/min), scraped 204 items (at 4080 items/min) | |
2018-12-03 17:01:18 [scrapy.extensions.logstats] INFO: Crawled 493 pages (at 4720 pages/min), scraped 485 items (at 5620 items/min) | |
2018-12-03 17:01:21 [scrapy.extensions.logstats] INFO: Crawled 835 pages (at 6840 pages/min), scraped 781 items (at 5920 items/min) | |
2018-12-03 17:01:24 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount) | |
2018-12-03 17:01:24 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv | |
The average speed of the spider is 91.11756327055262 items/sec | |
2018-12-03 17:01:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats: | |
{'downloader/request_bytes': 376063, | |
'downloader/request_count': 1079, | |
'downloader/request_method_count/GET': 1079, | |
'downloader/response_bytes': 23726256, | |
'downloader/response_count': 1079, | |
'downloader/response_status_count/200': 1079, | |
'dupefilter/filtered': 15096, | |
'finish_reason': 'closespider_itemcount', | |
'finish_time': datetime.datetime(2018, 12, 3, 17, 1, 24, 581383), | |
'item_scraped_count': 1079, | |
'log_count/INFO': 12, | |
'memdebug/gc_garbage_count': 0, | |
'memdebug/live_refs/FollowAllSpider': 1, | |
'memdebug/live_refs/Request': 24, | |
'memusage/max': 51843072, | |
'memusage/startup': 51843072, | |
'request_depth_max': 9, | |
'response_received_count': 1079, | |
'scheduler/dequeued': 1079, | |
'scheduler/dequeued/memory': 1079, | |
'scheduler/enqueued': 1102, | |
'scheduler/enqueued/memory': 1102, | |
'start_time': datetime.datetime(2018, 12, 3, 17, 1, 12, 773580)} | |
2018-12-03 17:01:24 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount) | |
The results of the benchmark are (all speeds in items/sec) : | |
Test = 'Book Spider' Iterations = '5' | |
Mean : 89.99392684020509 Median : 90.25091217578672 Std Dev : 0.8852424808786434 | |
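The summary line above can be reproduced from the five per-run speeds reported earlier. Note that the reported "Std Dev" is the population standard deviation (ddof=0), not the sample one; a minimal sketch:

```python
import statistics

# Per-run speeds (items/sec) from the five Book Spider iterations logged above
speeds = [
    90.25091217578672,
    89.73449994942929,
    90.40642489479762,
    88.46023391045921,
    91.11756327055262,
]

mean = statistics.mean(speeds)
median = statistics.median(speeds)
std_dev = statistics.pstdev(speeds)  # population std dev matches the report; statistics.stdev (sample) would not

print(f"Mean : {mean} Median : {median} Std Dev : {std_dev}")
```

Running this reproduces the reported aggregates exactly: mean 89.99392684020509, median 90.25091217578672, std dev 0.8852424808786434.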