@whalebot-helmsman
Last active December 5, 2018 10:19
Results of scrapy-bench for new priority queues
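The runs below exercise Scrapy's `SCHEDULER_PRIORITY_QUEUE` setting. As a minimal sketch, a project would opt into the downloader-aware queue being benchmarked via its `settings.py`; the values here simply mirror the "Overridden settings" lines in the log (this is not the full benchmark configuration):

```python
# settings.py (sketch) -- opt in to the downloader-aware priority queue.
# Values mirror the "Overridden settings" reported in the log below.
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'
CONCURRENT_REQUESTS = 120
RETRY_ENABLED = False
LOG_LEVEL = 'INFO'
LOGSTATS_INTERVAL = 3
```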
Executing scrapy-bench --n-runs 1 broadworm in /home/nikita/ves/scrapy-bench-2.7/
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
utils.DeprecatedIn23,
2018-12-04 16:16:22 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: broadspider)
2018-12-04 16:16:22 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-04 16:16:22 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'broad.spiders', 'CLOSESPIDER_ITEMCOUNT': 800, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['broad.spiders'], 'REACTOR_THREADPOOL_MAXSIZE': 20, 'BOT_NAME': 'broadspider', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'AUTOTHROTTLE_ENABLED': True}
2018-12-04 16:16:22 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.throttle.AutoThrottle']
2018-12-04 16:16:22 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-04 16:16:22 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-04 16:16:22 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-04 16:16:22 [scrapy.core.engine] INFO: Spider opened
2018-12-04 16:16:22 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-04 16:16:22 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-04 16:16:25 [scrapy.extensions.logstats] INFO: Crawled 53 pages (at 1060 pages/min), scraped 51 items (at 1020 items/min)
2018-12-04 16:16:28 [scrapy.extensions.logstats] INFO: Crawled 73 pages (at 400 pages/min), scraped 72 items (at 420 items/min)
2018-12-04 16:16:31 [scrapy.extensions.logstats] INFO: Crawled 133 pages (at 1200 pages/min), scraped 129 items (at 1140 items/min)
2018-12-04 16:16:34 [scrapy.extensions.logstats] INFO: Crawled 299 pages (at 3320 pages/min), scraped 280 items (at 3020 items/min)
2018-12-04 16:16:38 [scrapy.extensions.logstats] INFO: Crawled 447 pages (at 2960 pages/min), scraped 416 items (at 2720 items/min)
2018-12-04 16:16:41 [scrapy.extensions.logstats] INFO: Crawled 566 pages (at 2380 pages/min), scraped 534 items (at 2360 items/min)
2018-12-04 16:16:44 [scrapy.extensions.logstats] INFO: Crawled 701 pages (at 2700 pages/min), scraped 674 items (at 2800 items/min)
2018-12-04 16:16:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-04 16:16:47 [scrapy.extensions.logstats] INFO: Crawled 850 pages (at 2980 pages/min), scraped 809 items (at 2700 items/min)
2018-12-04 16:16:49 [scrapy.extensions.logstats] INFO: Crawled 887 pages (at 740 pages/min), scraped 885 items (at 1520 items/min)
2018-12-04 16:16:52 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 580 pages/min), scraped 915 items (at 600 items/min)
2018-12-04 16:16:55 [scrapy.extensions.logstats] INFO: Crawled 932 pages (at 320 pages/min), scraped 932 items (at 340 items/min)
2018-12-04 16:16:58 [scrapy.extensions.logstats] INFO: Crawled 937 pages (at 100 pages/min), scraped 937 items (at 100 items/min)
2018-12-04 16:17:01 [scrapy.extensions.logstats] INFO: Crawled 938 pages (at 20 pages/min), scraped 938 items (at 20 items/min)
2018-12-04 16:17:04 [scrapy.extensions.logstats] INFO: Crawled 938 pages (at 0 pages/min), scraped 938 items (at 0 items/min)
2018-12-04 16:17:06 [scrapy.extensions.feedexport] INFO: Stored csv feed (939 items) in: items.csv
The average speed of the spider is 21.50302623 items/sec
2018-12-04 16:17:06 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 290153,
'downloader/request_count': 939,
'downloader/request_method_count/GET': 939,
'downloader/response_bytes': 31931235,
'downloader/response_count': 939,
'downloader/response_status_count/200': 939,
'dupefilter/filtered': 24822,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 4, 16, 17, 6, 540783),
'item_scraped_count': 939,
'log_count/INFO': 23,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/BroadBenchSpider': 1,
'memdebug/live_refs/Request': 11515,
'memusage/max': 52441088,
'memusage/startup': 52441088,
'request_depth_max': 18,
'response_received_count': 939,
'scheduler/dequeued': 939,
'scheduler/dequeued/memory': 939,
'scheduler/enqueued': 12453,
'scheduler/enqueued/memory': 12453,
'start_time': datetime.datetime(2018, 12, 4, 16, 16, 22, 840560)}
2018-12-04 16:17:06 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Broad Crawl' Iterations = '1'
Mean : 21.50302623 Median : 21.50302623 Std Dev : 0.0
Executing scrapy-bench --n-runs 5 --book_url=http://172.17.0.6:8880 bookworm in /home/nikita/ves/scrapy-bench-2.7/
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
utils.DeprecatedIn23,
2018-12-03 16:56:26 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:56:26 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-03 16:56:26 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-12-03 16:56:26 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-03 16:56:26 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:56:26 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:56:26 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:56:26 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:56:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:56:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:56:29 [scrapy.extensions.logstats] INFO: Crawled 271 pages (at 5420 pages/min), scraped 261 items (at 5220 items/min)
2018-12-03 16:56:32 [scrapy.extensions.logstats] INFO: Crawled 586 pages (at 6300 pages/min), scraped 578 items (at 6340 items/min)
2018-12-03 16:56:35 [scrapy.extensions.logstats] INFO: Crawled 930 pages (at 6880 pages/min), scraped 919 items (at 6820 items/min)
2018-12-03 16:56:36 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:56:36 [scrapy.extensions.feedexport] INFO: Stored csv feed (1069 items) in: items.csv
The average speed of the spider is 99.6834756462 items/sec
2018-12-03 16:56:36 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 372557,
'downloader/request_count': 1069,
'downloader/request_method_count/GET': 1069,
'downloader/response_bytes': 23532410,
'downloader/response_count': 1069,
'downloader/response_status_count/200': 1069,
'dupefilter/filtered': 15006,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 56, 36, 735176),
'item_scraped_count': 1069,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 52449280,
'memusage/startup': 52449280,
'request_depth_max': 9,
'response_received_count': 1069,
'scheduler/dequeued': 1069,
'scheduler/dequeued/memory': 1069,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 12, 3, 16, 56, 26, 355559)}
2018-12-03 16:56:36 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
utils.DeprecatedIn23,
2018-12-03 16:56:37 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:56:37 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-03 16:56:37 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-12-03 16:56:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-03 16:56:37 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:56:37 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:56:37 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:56:37 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:56:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:56:37 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:56:40 [scrapy.extensions.logstats] INFO: Crawled 271 pages (at 5420 pages/min), scraped 263 items (at 5260 items/min)
2018-12-03 16:56:43 [scrapy.extensions.logstats] INFO: Crawled 605 pages (at 6680 pages/min), scraped 595 items (at 6640 items/min)
2018-12-03 16:56:46 [scrapy.extensions.logstats] INFO: Crawled 949 pages (at 6880 pages/min), scraped 939 items (at 6880 items/min)
2018-12-03 16:56:46 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:56:47 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv
The average speed of the spider is 102.718869713 items/sec
2018-12-03 16:56:47 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 368839,
'downloader/request_count': 1058,
'downloader/request_method_count/GET': 1058,
'downloader/response_bytes': 23293106,
'downloader/response_count': 1058,
'downloader/response_status_count/200': 1058,
'dupefilter/filtered': 14863,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 56, 47, 285942),
'item_scraped_count': 1058,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52391936,
'memusage/startup': 52391936,
'request_depth_max': 9,
'response_received_count': 1058,
'scheduler/dequeued': 1058,
'scheduler/dequeued/memory': 1058,
'scheduler/enqueued': 1081,
'scheduler/enqueued/memory': 1081,
'start_time': datetime.datetime(2018, 12, 3, 16, 56, 37, 53618)}
2018-12-03 16:56:47 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
utils.DeprecatedIn23,
2018-12-03 16:56:47 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:56:47 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-03 16:56:47 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-12-03 16:56:47 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-03 16:56:47 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:56:47 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:56:47 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:56:47 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:56:47 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:56:47 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:56:50 [scrapy.extensions.logstats] INFO: Crawled 267 pages (at 5340 pages/min), scraped 256 items (at 5120 items/min)
2018-12-03 16:56:53 [scrapy.extensions.logstats] INFO: Crawled 596 pages (at 6580 pages/min), scraped 585 items (at 6580 items/min)
2018-12-03 16:56:56 [scrapy.extensions.logstats] INFO: Crawled 957 pages (at 7220 pages/min), scraped 918 items (at 6660 items/min)
2018-12-03 16:56:57 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:56:57 [scrapy.extensions.feedexport] INFO: Stored csv feed (1068 items) in: items.csv
The average speed of the spider is 102.542592385 items/sec
2018-12-03 16:56:57 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 372248,
'downloader/request_count': 1068,
'downloader/request_method_count/GET': 1068,
'downloader/response_bytes': 23480934,
'downloader/response_count': 1068,
'downloader/response_status_count/200': 1068,
'dupefilter/filtered': 14933,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 56, 57, 997549),
'item_scraped_count': 1068,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 52162560,
'memusage/startup': 52162560,
'request_depth_max': 9,
'response_received_count': 1068,
'scheduler/dequeued': 1068,
'scheduler/dequeued/memory': 1068,
'scheduler/enqueued': 1101,
'scheduler/enqueued/memory': 1101,
'start_time': datetime.datetime(2018, 12, 3, 16, 56, 47, 604597)}
2018-12-03 16:56:57 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
utils.DeprecatedIn23,
2018-12-03 16:56:58 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:56:58 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-03 16:56:58 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-12-03 16:56:58 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-03 16:56:58 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:56:58 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:56:58 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:56:58 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:56:58 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:56:58 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:57:01 [scrapy.extensions.logstats] INFO: Crawled 271 pages (at 5420 pages/min), scraped 259 items (at 5180 items/min)
2018-12-03 16:57:04 [scrapy.extensions.logstats] INFO: Crawled 606 pages (at 6700 pages/min), scraped 585 items (at 6520 items/min)
2018-12-03 16:57:07 [scrapy.extensions.logstats] INFO: Crawled 951 pages (at 6900 pages/min), scraped 941 items (at 7120 items/min)
2018-12-03 16:57:08 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:57:08 [scrapy.extensions.feedexport] INFO: Stored csv feed (1057 items) in: items.csv
The average speed of the spider is 102.587522703 items/sec
2018-12-03 16:57:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 368496,
'downloader/request_count': 1057,
'downloader/request_method_count/GET': 1057,
'downloader/response_bytes': 23242113,
'downloader/response_count': 1057,
'downloader/response_status_count/200': 1057,
'dupefilter/filtered': 14790,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 57, 8, 491389),
'item_scraped_count': 1057,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52387840,
'memusage/startup': 52387840,
'request_depth_max': 9,
'response_received_count': 1057,
'scheduler/dequeued': 1057,
'scheduler/dequeued/memory': 1057,
'scheduler/enqueued': 1080,
'scheduler/enqueued/memory': 1080,
'start_time': datetime.datetime(2018, 12, 3, 16, 56, 58, 319427)}
2018-12-03 16:57:08 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
utils.DeprecatedIn23,
2018-12-03 16:57:08 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:57:08 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-03 16:57:08 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-12-03 16:57:08 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-03 16:57:08 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:57:08 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:57:08 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:57:08 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:57:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:57:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:57:11 [scrapy.extensions.logstats] INFO: Crawled 270 pages (at 5400 pages/min), scraped 256 items (at 5120 items/min)
2018-12-03 16:57:15 [scrapy.extensions.logstats] INFO: Crawled 636 pages (at 7320 pages/min), scraped 584 items (at 6560 items/min)
2018-12-03 16:57:18 [scrapy.extensions.logstats] INFO: Crawled 990 pages (at 7080 pages/min), scraped 919 items (at 6700 items/min)
2018-12-03 16:57:18 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:57:19 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 99.4316869922 items/sec
2018-12-03 16:57:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 375976,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23726256,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 57, 19, 170607),
'item_scraped_count': 1079,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52449280,
'memusage/startup': 52449280,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 12, 3, 16, 57, 8, 810973)}
2018-12-03 16:57:19 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '5'
Mean : 101.392829488 Median : 102.542592385 Std Dev : 1.50170567828
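The summary line above can be reproduced from the five per-run "average speed" values with the standard library; note the reported Std Dev is the population standard deviation (`statistics.pstdev`), not the sample one:

```python
import statistics

# Per-run speeds (items/sec) taken from the five
# "The average speed of the spider is ..." lines above.
speeds = [99.6834756462, 102.718869713, 102.542592385,
          102.587522703, 99.4316869922]

mean = statistics.mean(speeds)      # 101.392829488
median = statistics.median(speeds)  # 102.542592385
stddev = statistics.pstdev(speeds)  # population std dev: 1.50170567828
```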
Executing scrapy-bench --n-runs 1 broadworm in /home/nikita/ves/scrapy-bench-2.7/
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
utils.DeprecatedIn23,
2018-12-04 16:06:36 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: broadspider)
2018-12-04 16:06:36 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-04 16:06:36 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'broad.spiders', 'CLOSESPIDER_ITEMCOUNT': 800, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['broad.spiders'], 'REACTOR_THREADPOOL_MAXSIZE': 20, 'BOT_NAME': 'broadspider', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'AUTOTHROTTLE_ENABLED': True}
2018-12-04 16:06:36 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.throttle.AutoThrottle']
2018-12-04 16:06:36 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-04 16:06:36 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-04 16:06:36 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-04 16:06:36 [scrapy.core.engine] INFO: Spider opened
2018-12-04 16:06:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-04 16:06:36 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6026
2018-12-04 16:06:39 [scrapy.extensions.logstats] INFO: Crawled 63 pages (at 1260 pages/min), scraped 61 items (at 1220 items/min)
2018-12-04 16:06:42 [scrapy.extensions.logstats] INFO: Crawled 71 pages (at 160 pages/min), scraped 70 items (at 180 items/min)
2018-12-04 16:06:45 [scrapy.extensions.logstats] INFO: Crawled 96 pages (at 500 pages/min), scraped 93 items (at 460 items/min)
2018-12-04 16:06:48 [scrapy.extensions.logstats] INFO: Crawled 168 pages (at 1440 pages/min), scraped 163 items (at 1400 items/min)
2018-12-04 16:06:51 [scrapy.extensions.logstats] INFO: Crawled 264 pages (at 1920 pages/min), scraped 261 items (at 1960 items/min)
2018-12-04 16:06:54 [scrapy.extensions.logstats] INFO: Crawled 328 pages (at 1280 pages/min), scraped 328 items (at 1340 items/min)
2018-12-04 16:06:57 [scrapy.extensions.logstats] INFO: Crawled 356 pages (at 560 pages/min), scraped 356 items (at 560 items/min)
2018-12-04 16:07:00 [scrapy.extensions.logstats] INFO: Crawled 385 pages (at 580 pages/min), scraped 383 items (at 540 items/min)
2018-12-04 16:07:03 [scrapy.extensions.logstats] INFO: Crawled 416 pages (at 620 pages/min), scraped 416 items (at 660 items/min)
2018-12-04 16:07:06 [scrapy.extensions.logstats] INFO: Crawled 446 pages (at 600 pages/min), scraped 445 items (at 580 items/min)
2018-12-04 16:07:09 [scrapy.extensions.logstats] INFO: Crawled 475 pages (at 580 pages/min), scraped 474 items (at 580 items/min)
2018-12-04 16:07:12 [scrapy.extensions.logstats] INFO: Crawled 505 pages (at 600 pages/min), scraped 505 items (at 620 items/min)
2018-12-04 16:07:15 [scrapy.extensions.logstats] INFO: Crawled 533 pages (at 560 pages/min), scraped 531 items (at 520 items/min)
2018-12-04 16:07:18 [scrapy.extensions.logstats] INFO: Crawled 561 pages (at 560 pages/min), scraped 561 items (at 600 items/min)
2018-12-04 16:07:21 [scrapy.extensions.logstats] INFO: Crawled 589 pages (at 560 pages/min), scraped 588 items (at 540 items/min)
2018-12-04 16:07:24 [scrapy.extensions.logstats] INFO: Crawled 617 pages (at 560 pages/min), scraped 614 items (at 520 items/min)
2018-12-04 16:07:27 [scrapy.extensions.logstats] INFO: Crawled 648 pages (at 620 pages/min), scraped 647 items (at 660 items/min)
2018-12-04 16:07:30 [scrapy.extensions.logstats] INFO: Crawled 675 pages (at 540 pages/min), scraped 674 items (at 540 items/min)
2018-12-04 16:07:33 [scrapy.extensions.logstats] INFO: Crawled 688 pages (at 260 pages/min), scraped 688 items (at 280 items/min)
2018-12-04 16:07:36 [scrapy.extensions.logstats] INFO: Crawled 696 pages (at 160 pages/min), scraped 696 items (at 160 items/min)
2018-12-04 16:07:39 [scrapy.extensions.logstats] INFO: Crawled 702 pages (at 120 pages/min), scraped 702 items (at 120 items/min)
2018-12-04 16:07:42 [scrapy.extensions.logstats] INFO: Crawled 707 pages (at 100 pages/min), scraped 707 items (at 100 items/min)
2018-12-04 16:07:45 [scrapy.extensions.logstats] INFO: Crawled 712 pages (at 100 pages/min), scraped 712 items (at 100 items/min)
2018-12-04 16:07:48 [scrapy.extensions.logstats] INFO: Crawled 718 pages (at 120 pages/min), scraped 717 items (at 100 items/min)
2018-12-04 16:07:51 [scrapy.extensions.logstats] INFO: Crawled 721 pages (at 60 pages/min), scraped 721 items (at 80 items/min)
2018-12-04 16:07:54 [scrapy.extensions.logstats] INFO: Crawled 728 pages (at 140 pages/min), scraped 728 items (at 140 items/min)
2018-12-04 16:07:57 [scrapy.extensions.logstats] INFO: Crawled 734 pages (at 120 pages/min), scraped 734 items (at 120 items/min)
2018-12-04 16:08:00 [scrapy.extensions.logstats] INFO: Crawled 739 pages (at 100 pages/min), scraped 739 items (at 100 items/min)
2018-12-04 16:08:03 [scrapy.extensions.logstats] INFO: Crawled 745 pages (at 120 pages/min), scraped 745 items (at 120 items/min)
2018-12-04 16:08:06 [scrapy.extensions.logstats] INFO: Crawled 751 pages (at 120 pages/min), scraped 751 items (at 120 items/min)
2018-12-04 16:08:09 [scrapy.extensions.logstats] INFO: Crawled 758 pages (at 140 pages/min), scraped 758 items (at 140 items/min)
2018-12-04 16:08:12 [scrapy.extensions.logstats] INFO: Crawled 762 pages (at 80 pages/min), scraped 762 items (at 80 items/min)
2018-12-04 16:08:15 [scrapy.extensions.logstats] INFO: Crawled 772 pages (at 200 pages/min), scraped 772 items (at 200 items/min)
2018-12-04 16:08:18 [scrapy.extensions.logstats] INFO: Crawled 774 pages (at 40 pages/min), scraped 774 items (at 40 items/min)
2018-12-04 16:08:21 [scrapy.extensions.logstats] INFO: Crawled 781 pages (at 140 pages/min), scraped 780 items (at 120 items/min)
2018-12-04 16:08:24 [scrapy.extensions.logstats] INFO: Crawled 788 pages (at 140 pages/min), scraped 788 items (at 160 items/min)
2018-12-04 16:08:27 [scrapy.extensions.logstats] INFO: Crawled 790 pages (at 40 pages/min), scraped 790 items (at 40 items/min)
2018-12-04 16:08:30 [scrapy.extensions.logstats] INFO: Crawled 796 pages (at 120 pages/min), scraped 796 items (at 120 items/min)
2018-12-04 16:08:32 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-04 16:08:33 [scrapy.extensions.logstats] INFO: Crawled 800 pages (at 80 pages/min), scraped 800 items (at 80 items/min)
2018-12-04 16:08:36 [scrapy.extensions.logstats] INFO: Crawled 807 pages (at 140 pages/min), scraped 807 items (at 140 items/min)
2018-12-04 16:08:39 [scrapy.extensions.logstats] INFO: Crawled 815 pages (at 160 pages/min), scraped 814 items (at 140 items/min)
2018-12-04 16:08:42 [scrapy.extensions.logstats] INFO: Crawled 819 pages (at 80 pages/min), scraped 819 items (at 100 items/min)
2018-12-04 16:08:45 [scrapy.extensions.logstats] INFO: Crawled 823 pages (at 80 pages/min), scraped 823 items (at 80 items/min)
2018-12-04 16:08:48 [scrapy.extensions.logstats] INFO: Crawled 826 pages (at 60 pages/min), scraped 826 items (at 60 items/min)
2018-12-04 16:08:51 [scrapy.extensions.logstats] INFO: Crawled 829 pages (at 60 pages/min), scraped 829 items (at 60 items/min)
2018-12-04 16:08:54 [scrapy.extensions.logstats] INFO: Crawled 835 pages (at 120 pages/min), scraped 834 items (at 100 items/min)
2018-12-04 16:08:57 [scrapy.extensions.logstats] INFO: Crawled 839 pages (at 80 pages/min), scraped 839 items (at 100 items/min)
2018-12-04 16:09:00 [scrapy.extensions.logstats] INFO: Crawled 842 pages (at 60 pages/min), scraped 842 items (at 60 items/min)
2018-12-04 16:09:03 [scrapy.extensions.logstats] INFO: Crawled 846 pages (at 80 pages/min), scraped 846 items (at 80 items/min)
2018-12-04 16:09:06 [scrapy.extensions.logstats] INFO: Crawled 852 pages (at 120 pages/min), scraped 852 items (at 120 items/min)
2018-12-04 16:09:09 [scrapy.extensions.logstats] INFO: Crawled 855 pages (at 60 pages/min), scraped 855 items (at 60 items/min)
2018-12-04 16:09:12 [scrapy.extensions.logstats] INFO: Crawled 858 pages (at 60 pages/min), scraped 857 items (at 40 items/min)
2018-12-04 16:09:15 [scrapy.extensions.logstats] INFO: Crawled 861 pages (at 60 pages/min), scraped 861 items (at 80 items/min)
2018-12-04 16:09:18 [scrapy.extensions.logstats] INFO: Crawled 862 pages (at 20 pages/min), scraped 862 items (at 20 items/min)
2018-12-04 16:09:21 [scrapy.extensions.logstats] INFO: Crawled 866 pages (at 80 pages/min), scraped 865 items (at 60 items/min)
2018-12-04 16:09:24 [scrapy.extensions.logstats] INFO: Crawled 868 pages (at 40 pages/min), scraped 868 items (at 60 items/min)
2018-12-04 16:09:27 [scrapy.extensions.logstats] INFO: Crawled 871 pages (at 60 pages/min), scraped 871 items (at 60 items/min)
2018-12-04 16:09:30 [scrapy.extensions.logstats] INFO: Crawled 873 pages (at 40 pages/min), scraped 873 items (at 40 items/min)
2018-12-04 16:09:33 [scrapy.extensions.logstats] INFO: Crawled 877 pages (at 80 pages/min), scraped 877 items (at 80 items/min)
2018-12-04 16:09:36 [scrapy.extensions.logstats] INFO: Crawled 880 pages (at 60 pages/min), scraped 879 items (at 40 items/min)
2018-12-04 16:09:39 [scrapy.extensions.logstats] INFO: Crawled 882 pages (at 40 pages/min), scraped 882 items (at 60 items/min)
2018-12-04 16:09:42 [scrapy.extensions.logstats] INFO: Crawled 885 pages (at 60 pages/min), scraped 885 items (at 60 items/min)
2018-12-04 16:09:45 [scrapy.extensions.logstats] INFO: Crawled 886 pages (at 20 pages/min), scraped 886 items (at 20 items/min)
2018-12-04 16:09:48 [scrapy.extensions.logstats] INFO: Crawled 889 pages (at 60 pages/min), scraped 889 items (at 60 items/min)
2018-12-04 16:09:51 [scrapy.extensions.logstats] INFO: Crawled 889 pages (at 0 pages/min), scraped 889 items (at 0 items/min)
2018-12-04 16:09:54 [scrapy.extensions.logstats] INFO: Crawled 891 pages (at 40 pages/min), scraped 891 items (at 40 items/min)
2018-12-04 16:09:57 [scrapy.extensions.logstats] INFO: Crawled 893 pages (at 40 pages/min), scraped 893 items (at 40 items/min)
2018-12-04 16:10:00 [scrapy.extensions.logstats] INFO: Crawled 893 pages (at 0 pages/min), scraped 893 items (at 0 items/min)
2018-12-04 16:10:03 [scrapy.extensions.logstats] INFO: Crawled 896 pages (at 60 pages/min), scraped 896 items (at 60 items/min)
2018-12-04 16:10:06 [scrapy.extensions.logstats] INFO: Crawled 897 pages (at 20 pages/min), scraped 897 items (at 20 items/min)
2018-12-04 16:10:09 [scrapy.extensions.logstats] INFO: Crawled 898 pages (at 20 pages/min), scraped 898 items (at 20 items/min)
2018-12-04 16:10:12 [scrapy.extensions.logstats] INFO: Crawled 900 pages (at 40 pages/min), scraped 900 items (at 40 items/min)
2018-12-04 16:10:15 [scrapy.extensions.logstats] INFO: Crawled 900 pages (at 0 pages/min), scraped 900 items (at 0 items/min)
2018-12-04 16:10:18 [scrapy.extensions.logstats] INFO: Crawled 901 pages (at 20 pages/min), scraped 901 items (at 20 items/min)
2018-12-04 16:10:21 [scrapy.extensions.logstats] INFO: Crawled 901 pages (at 0 pages/min), scraped 901 items (at 0 items/min)
2018-12-04 16:10:24 [scrapy.extensions.logstats] INFO: Crawled 903 pages (at 40 pages/min), scraped 903 items (at 40 items/min)
2018-12-04 16:10:27 [scrapy.extensions.logstats] INFO: Crawled 903 pages (at 0 pages/min), scraped 903 items (at 0 items/min)
2018-12-04 16:10:30 [scrapy.extensions.logstats] INFO: Crawled 906 pages (at 60 pages/min), scraped 906 items (at 60 items/min)
2018-12-04 16:10:33 [scrapy.extensions.logstats] INFO: Crawled 906 pages (at 0 pages/min), scraped 906 items (at 0 items/min)
2018-12-04 16:10:36 [scrapy.extensions.logstats] INFO: Crawled 906 pages (at 0 pages/min), scraped 906 items (at 0 items/min)
2018-12-04 16:10:39 [scrapy.extensions.logstats] INFO: Crawled 906 pages (at 0 pages/min), scraped 906 items (at 0 items/min)
2018-12-04 16:10:42 [scrapy.extensions.logstats] INFO: Crawled 907 pages (at 20 pages/min), scraped 907 items (at 20 items/min)
2018-12-04 16:10:45 [scrapy.extensions.logstats] INFO: Crawled 907 pages (at 0 pages/min), scraped 907 items (at 0 items/min)
2018-12-04 16:10:48 [scrapy.extensions.logstats] INFO: Crawled 907 pages (at 0 pages/min), scraped 907 items (at 0 items/min)
2018-12-04 16:10:51 [scrapy.extensions.logstats] INFO: Crawled 907 pages (at 0 pages/min), scraped 907 items (at 0 items/min)
2018-12-04 16:10:54 [scrapy.extensions.logstats] INFO: Crawled 908 pages (at 20 pages/min), scraped 908 items (at 20 items/min)
2018-12-04 16:10:57 [scrapy.extensions.logstats] INFO: Crawled 908 pages (at 0 pages/min), scraped 908 items (at 0 items/min)
2018-12-04 16:11:00 [scrapy.extensions.logstats] INFO: Crawled 909 pages (at 20 pages/min), scraped 909 items (at 20 items/min)
2018-12-04 16:11:03 [scrapy.extensions.logstats] INFO: Crawled 909 pages (at 0 pages/min), scraped 909 items (at 0 items/min)
2018-12-04 16:11:06 [scrapy.extensions.logstats] INFO: Crawled 911 pages (at 40 pages/min), scraped 911 items (at 40 items/min)
2018-12-04 16:11:09 [scrapy.extensions.logstats] INFO: Crawled 911 pages (at 0 pages/min), scraped 911 items (at 0 items/min)
2018-12-04 16:11:12 [scrapy.extensions.logstats] INFO: Crawled 912 pages (at 20 pages/min), scraped 912 items (at 20 items/min)
2018-12-04 16:11:15 [scrapy.extensions.logstats] INFO: Crawled 912 pages (at 0 pages/min), scraped 912 items (at 0 items/min)
2018-12-04 16:11:18 [scrapy.extensions.logstats] INFO: Crawled 912 pages (at 0 pages/min), scraped 912 items (at 0 items/min)
2018-12-04 16:11:21 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 20 pages/min), scraped 913 items (at 20 items/min)
2018-12-04 16:11:24 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 0 pages/min), scraped 913 items (at 0 items/min)
2018-12-04 16:11:27 [scrapy.extensions.logstats] INFO: Crawled 914 pages (at 20 pages/min), scraped 914 items (at 20 items/min)
2018-12-04 16:11:30 [scrapy.extensions.logstats] INFO: Crawled 914 pages (at 0 pages/min), scraped 914 items (at 0 items/min)
2018-12-04 16:11:33 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 20 pages/min), scraped 915 items (at 20 items/min)
2018-12-04 16:11:36 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 0 pages/min), scraped 915 items (at 0 items/min)
2018-12-04 16:11:39 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 0 pages/min), scraped 915 items (at 0 items/min)
2018-12-04 16:11:42 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 20 pages/min), scraped 916 items (at 20 items/min)
2018-12-04 16:11:45 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 0 pages/min), scraped 916 items (at 0 items/min)
2018-12-04 16:11:48 [scrapy.extensions.logstats] INFO: Crawled 917 pages (at 20 pages/min), scraped 917 items (at 20 items/min)
2018-12-04 16:11:51 [scrapy.extensions.logstats] INFO: Crawled 917 pages (at 0 pages/min), scraped 917 items (at 0 items/min)
2018-12-04 16:11:54 [scrapy.extensions.logstats] INFO: Crawled 918 pages (at 20 pages/min), scraped 918 items (at 20 items/min)
2018-12-04 16:11:57 [scrapy.extensions.logstats] INFO: Crawled 919 pages (at 20 pages/min), scraped 918 items (at 0 items/min)
2018-12-04 16:12:00 [scrapy.extensions.logstats] INFO: Crawled 919 pages (at 0 pages/min), scraped 919 items (at 20 items/min)
2018-12-04 16:12:03 [scrapy.extensions.feedexport] INFO: Stored csv feed (920 items) in: items.csv
The average speed of the spider is 2.81044919015 items/sec
2018-12-04 16:12:03 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 304229,
'downloader/request_count': 920,
'downloader/request_method_count/GET': 920,
'downloader/response_bytes': 25138362,
'downloader/response_count': 920,
'downloader/response_status_count/200': 920,
'dupefilter/filtered': 15014,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 4, 16, 12, 3, 643755),
'item_scraped_count': 920,
'log_count/INFO': 117,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/BroadBenchSpider': 1,
'memdebug/live_refs/Request': 7889,
'memusage/max': 120696832,
'memusage/startup': 52322304,
'request_depth_max': 13,
'response_received_count': 920,
'scheduler/dequeued': 920,
'scheduler/dequeued/memory': 920,
'scheduler/enqueued': 8808,
'scheduler/enqueued/memory': 8808,
'start_time': datetime.datetime(2018, 12, 4, 16, 6, 36, 587721)}
2018-12-04 16:12:03 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Broad Crawl' Iterations = '1'
Mean : 2.81044919015 Median : 2.81044919015 Std Dev : 0.0
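This gist compares Scrapy's default priority queue against the new DownloaderAwarePriorityQueue; per the "Overridden settings" log lines elsewhere in the gist, the queue under test is selected through the SCHEDULER_PRIORITY_QUEUE setting. A minimal settings fragment, assuming Scrapy 1.5 with the `scrapy.pqueues` module exactly as logged in the DownloaderAware runs:

```python
# settings.py fragment: selects the downloader-aware priority queue
# (setting name and class path taken from the overridden-settings log lines)
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'
```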
Executing scrapy-bench --n-runs 5 --book_url=http://172.17.0.6:8880 bookworm in /home/nikita/ves/scrapy-bench-2.7/
2018-12-03 16:54:01 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:54:01 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-03 16:54:01 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-12-03 16:54:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-03 16:54:01 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:54:01 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:54:01 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:54:01 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:54:01 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:54:01 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:54:04 [scrapy.extensions.logstats] INFO: Crawled 276 pages (at 5520 pages/min), scraped 265 items (at 5300 items/min)
2018-12-03 16:54:08 [scrapy.extensions.logstats] INFO: Crawled 633 pages (at 7140 pages/min), scraped 574 items (at 6180 items/min)
2018-12-03 16:54:11 [scrapy.extensions.logstats] INFO: Crawled 989 pages (at 7120 pages/min), scraped 917 items (at 6860 items/min)
2018-12-03 16:54:11 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:54:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1067 items) in: items.csv
The average speed of the spider is 99.6846940525 items/sec
2018-12-03 16:54:12 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 371951,
'downloader/request_count': 1067,
'downloader/request_method_count/GET': 1067,
'downloader/response_bytes': 23429941,
'downloader/response_count': 1067,
'downloader/response_status_count/200': 1067,
'dupefilter/filtered': 14860,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 54, 12, 19157),
'item_scraped_count': 1067,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 52396032,
'memusage/startup': 52396032,
'request_depth_max': 8,
'response_received_count': 1067,
'scheduler/dequeued': 1067,
'scheduler/dequeued/memory': 1067,
'scheduler/enqueued': 1100,
'scheduler/enqueued/memory': 1100,
'start_time': datetime.datetime(2018, 12, 3, 16, 54, 1, 728204)}
2018-12-03 16:54:12 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 16:54:12 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:54:12 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-03 16:54:12 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-12-03 16:54:12 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-03 16:54:12 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:54:12 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:54:12 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:54:12 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:54:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:54:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:54:15 [scrapy.extensions.logstats] INFO: Crawled 278 pages (at 5560 pages/min), scraped 270 items (at 5400 items/min)
2018-12-03 16:54:18 [scrapy.extensions.logstats] INFO: Crawled 609 pages (at 6620 pages/min), scraped 597 items (at 6540 items/min)
2018-12-03 16:54:21 [scrapy.extensions.logstats] INFO: Crawled 956 pages (at 6940 pages/min), scraped 936 items (at 6780 items/min)
2018-12-03 16:54:22 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:54:22 [scrapy.extensions.feedexport] INFO: Stored csv feed (1056 items) in: items.csv
The average speed of the spider is 103.880934494 items/sec
2018-12-03 16:54:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 368183,
'downloader/request_count': 1056,
'downloader/request_method_count/GET': 1056,
'downloader/response_bytes': 23190637,
'downloader/response_count': 1056,
'downloader/response_status_count/200': 1056,
'dupefilter/filtered': 14717,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 54, 22, 512185),
'item_scraped_count': 1056,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52391936,
'memusage/startup': 52391936,
'request_depth_max': 8,
'response_received_count': 1056,
'scheduler/dequeued': 1056,
'scheduler/dequeued/memory': 1056,
'scheduler/enqueued': 1079,
'scheduler/enqueued/memory': 1079,
'start_time': datetime.datetime(2018, 12, 3, 16, 54, 12, 340425)}
2018-12-03 16:54:22 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 16:54:22 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:54:22 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-03 16:54:22 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-12-03 16:54:22 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-03 16:54:22 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:54:22 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:54:22 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:54:22 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:54:22 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:54:22 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:54:25 [scrapy.extensions.logstats] INFO: Crawled 278 pages (at 5560 pages/min), scraped 267 items (at 5340 items/min)
2018-12-03 16:54:29 [scrapy.extensions.logstats] INFO: Crawled 609 pages (at 6620 pages/min), scraped 597 items (at 6600 items/min)
2018-12-03 16:54:32 [scrapy.extensions.logstats] INFO: Crawled 961 pages (at 7040 pages/min), scraped 935 items (at 6760 items/min)
2018-12-03 16:54:32 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:54:33 [scrapy.extensions.feedexport] INFO: Stored csv feed (1077 items) in: items.csv
The average speed of the spider is 102.582743682 items/sec
2018-12-03 16:54:33 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 375405,
'downloader/request_count': 1077,
'downloader/request_method_count/GET': 1077,
'downloader/response_bytes': 23623787,
'downloader/response_count': 1077,
'downloader/response_status_count/200': 1077,
'dupefilter/filtered': 14950,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 54, 33, 295471),
'item_scraped_count': 1077,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52355072,
'memusage/startup': 52355072,
'request_depth_max': 8,
'response_received_count': 1077,
'scheduler/dequeued': 1077,
'scheduler/dequeued/memory': 1077,
'scheduler/enqueued': 1100,
'scheduler/enqueued/memory': 1100,
'start_time': datetime.datetime(2018, 12, 3, 16, 54, 22, 822215)}
2018-12-03 16:54:33 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 16:54:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:54:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-03 16:54:33 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-12-03 16:54:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-03 16:54:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:54:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:54:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:54:33 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:54:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:54:33 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:54:36 [scrapy.extensions.logstats] INFO: Crawled 271 pages (at 5420 pages/min), scraped 258 items (at 5160 items/min)
2018-12-03 16:54:39 [scrapy.extensions.logstats] INFO: Crawled 624 pages (at 7060 pages/min), scraped 591 items (at 6660 items/min)
2018-12-03 16:54:42 [scrapy.extensions.logstats] INFO: Crawled 945 pages (at 6420 pages/min), scraped 937 items (at 6920 items/min)
2018-12-03 16:54:43 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:54:43 [scrapy.extensions.feedexport] INFO: Stored csv feed (1077 items) in: items.csv
The average speed of the spider is 102.784142089 items/sec
2018-12-03 16:54:43 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 375459,
'downloader/request_count': 1077,
'downloader/request_method_count/GET': 1077,
'downloader/response_bytes': 23623787,
'downloader/response_count': 1077,
'downloader/response_status_count/200': 1077,
'dupefilter/filtered': 14950,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 54, 43, 888686),
'item_scraped_count': 1077,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52465664,
'memusage/startup': 52465664,
'request_depth_max': 8,
'response_received_count': 1077,
'scheduler/dequeued': 1077,
'scheduler/dequeued/memory': 1077,
'scheduler/enqueued': 1100,
'scheduler/enqueued/memory': 1100,
'start_time': datetime.datetime(2018, 12, 3, 16, 54, 33, 612214)}
2018-12-03 16:54:43 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
/home/nikita/ves/scrapy-bench-2.7/local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
utils.DeprecatedIn23,
2018-12-03 16:54:44 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:54:44 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.6 (default, Nov 23 2017, 15:49:48) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-Ubuntu-14.04-trusty
2018-12-03 16:54:44 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-12-03 16:54:44 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-12-03 16:54:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:54:44 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:54:44 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:54:44 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:54:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:54:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:54:47 [scrapy.extensions.logstats] INFO: Crawled 271 pages (at 5420 pages/min), scraped 260 items (at 5200 items/min)
2018-12-03 16:54:50 [scrapy.extensions.logstats] INFO: Crawled 641 pages (at 7400 pages/min), scraped 582 items (at 6440 items/min)
2018-12-03 16:54:53 [scrapy.extensions.logstats] INFO: Crawled 988 pages (at 6940 pages/min), scraped 927 items (at 6900 items/min)
2018-12-03 16:54:54 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:54:54 [scrapy.extensions.feedexport] INFO: Stored csv feed (1077 items) in: items.csv
The average speed of the spider is 100.624918156 items/sec
2018-12-03 16:54:54 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 375491,
'downloader/request_count': 1077,
'downloader/request_method_count/GET': 1077,
'downloader/response_bytes': 23656308,
'downloader/response_count': 1077,
'downloader/response_status_count/200': 1077,
'dupefilter/filtered': 15014,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 54, 54, 523037),
'item_scraped_count': 1077,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 25,
'memusage/max': 52346880,
'memusage/startup': 52346880,
'request_depth_max': 9,
'response_received_count': 1077,
'scheduler/dequeued': 1077,
'scheduler/dequeued/memory': 1077,
'scheduler/enqueued': 1101,
'scheduler/enqueued/memory': 1101,
'start_time': datetime.datetime(2018, 12, 3, 16, 54, 44, 216313)}
2018-12-03 16:54:54 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '5'
Mean : 101.911486495 Median : 102.582743682 Std Dev : 1.53001320847
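The Mean/Median/Std Dev summary above is an aggregate over the per-run speeds reported in each run's "average speed of the spider" line. A minimal sketch of that aggregation, using Python's `statistics` module and hypothetical per-run speeds (the five real values are spread across the full log; only two of them appear in this excerpt). Whether scrapy-bench uses the population or sample standard deviation is an assumption here; `pstdev` is shown for illustration.

```python
import statistics

# Hypothetical per-run speeds in items/sec -- stand-ins for the values
# printed on each run's "average speed of the spider" line.
speeds = [102.78, 100.62, 101.90, 103.10, 101.20]

mean = statistics.mean(speeds)
median = statistics.median(speeds)
# Population standard deviation; scrapy-bench may use the sample variant
# (statistics.stdev) instead -- this choice is an assumption.
std_dev = statistics.pstdev(speeds)

print("Mean : %s Median : %s Std Dev : %s" % (mean, median, std_dev))
```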
Executing scrapy-bench --n-runs 1 broadworm in /home/nikita/ves/scrapy-bench-3.6/
2018-12-04 16:20:04 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: broadspider)
2018-12-04 16:20:04 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-04 16:20:04 [scrapy.crawler] INFO: Overridden settings: {'AUTOTHROTTLE_ENABLED': True, 'BOT_NAME': 'broadspider', 'CLOSESPIDER_ITEMCOUNT': 800, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'broad.spiders', 'REACTOR_THREADPOOL_MAXSIZE': 20, 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['broad.spiders']}
2018-12-04 16:20:05 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.throttle.AutoThrottle']
2018-12-04 16:20:05 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-04 16:20:05 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-04 16:20:05 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-04 16:20:05 [scrapy.core.engine] INFO: Spider opened
2018-12-04 16:20:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-04 16:20:06 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-04 16:20:09 [scrapy.extensions.logstats] INFO: Crawled 51 pages (at 1020 pages/min), scraped 51 items (at 1020 items/min)
2018-12-04 16:20:12 [scrapy.extensions.logstats] INFO: Crawled 84 pages (at 660 pages/min), scraped 82 items (at 620 items/min)
2018-12-04 16:20:15 [scrapy.extensions.logstats] INFO: Crawled 165 pages (at 1620 pages/min), scraped 159 items (at 1540 items/min)
2018-12-04 16:20:18 [scrapy.extensions.logstats] INFO: Crawled 318 pages (at 3060 pages/min), scraped 289 items (at 2600 items/min)
2018-12-04 16:20:21 [scrapy.extensions.logstats] INFO: Crawled 443 pages (at 2500 pages/min), scraped 411 items (at 2440 items/min)
2018-12-04 16:20:24 [scrapy.extensions.logstats] INFO: Crawled 581 pages (at 2760 pages/min), scraped 543 items (at 2640 items/min)
2018-12-04 16:20:27 [scrapy.extensions.logstats] INFO: Crawled 714 pages (at 2660 pages/min), scraped 687 items (at 2880 items/min)
2018-12-04 16:20:29 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-04 16:20:30 [scrapy.extensions.logstats] INFO: Crawled 866 pages (at 3040 pages/min), scraped 832 items (at 2900 items/min)
2018-12-04 16:20:33 [scrapy.extensions.logstats] INFO: Crawled 896 pages (at 600 pages/min), scraped 896 items (at 1280 items/min)
2018-12-04 16:20:36 [scrapy.extensions.logstats] INFO: Crawled 925 pages (at 580 pages/min), scraped 924 items (at 560 items/min)
2018-12-04 16:20:39 [scrapy.extensions.logstats] INFO: Crawled 942 pages (at 340 pages/min), scraped 942 items (at 360 items/min)
2018-12-04 16:20:42 [scrapy.extensions.logstats] INFO: Crawled 943 pages (at 20 pages/min), scraped 943 items (at 20 items/min)
2018-12-04 16:20:45 [scrapy.extensions.logstats] INFO: Crawled 943 pages (at 0 pages/min), scraped 943 items (at 0 items/min)
2018-12-04 16:20:47 [scrapy.extensions.feedexport] INFO: Stored csv feed (944 items) in: items.csv
The average speed of the spider is 23.12196554851653 items/sec
2018-12-04 16:20:47 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 287368,
'downloader/request_count': 944,
'downloader/request_method_count/GET': 944,
'downloader/response_bytes': 34229023,
'downloader/response_count': 944,
'downloader/response_status_count/200': 944,
'dupefilter/filtered': 28023,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 4, 16, 20, 46, 938954),
'item_scraped_count': 944,
'log_count/INFO': 22,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/BroadBenchSpider': 1,
'memdebug/live_refs/Request': 12725,
'memusage/max': 52166656,
'memusage/startup': 52166656,
'request_depth_max': 15,
'response_received_count': 944,
'scheduler/dequeued': 944,
'scheduler/dequeued/memory': 944,
'scheduler/enqueued': 13668,
'scheduler/enqueued/memory': 13668,
'start_time': datetime.datetime(2018, 12, 4, 16, 20, 6, 150168)}
2018-12-04 16:20:47 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Broad Crawl' Iterations = '1'
Mean : 23.12196554851653 Median : 23.12196554851653 Std Dev : 0.0
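The runs above exercise the new `DownloaderAwarePriorityQueue`, which is enabled through the `SCHEDULER_PRIORITY_QUEUE` setting visible in each run's "Overridden settings" line. A minimal `settings.py` sketch of that configuration, with values copied from the log; the rest of the project settings (spider modules, feed options, and so on) are omitted for brevity.

```python
# settings.py -- sketch of the benchmark configuration shown in the
# "Overridden settings" log lines above.
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'

CONCURRENT_REQUESTS = 120
REACTOR_THREADPOOL_MAXSIZE = 20
RETRY_ENABLED = False
AUTOTHROTTLE_ENABLED = True  # enabled for the broad-crawl runs only
```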
Executing scrapy-bench --n-runs 5 --book_url=http://172.17.0.6:8880 bookworm in /home/nikita/ves/scrapy-bench-3.6/
2018-12-03 16:57:36 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:57:36 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-03 16:57:36 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['books.spiders']}
2018-12-03 16:57:36 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-12-03 16:57:36 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:57:36 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:57:36 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:57:36 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:57:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:57:36 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:57:39 [scrapy.extensions.logstats] INFO: Crawled 254 pages (at 5080 pages/min), scraped 197 items (at 3940 items/min)
2018-12-03 16:57:42 [scrapy.extensions.logstats] INFO: Crawled 493 pages (at 4780 pages/min), scraped 484 items (at 5740 items/min)
2018-12-03 16:57:45 [scrapy.extensions.logstats] INFO: Crawled 801 pages (at 6160 pages/min), scraped 779 items (at 5900 items/min)
2018-12-03 16:57:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:57:48 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 91.14893914218703 items/sec
2018-12-03 16:57:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 376212,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23726256,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 57, 48, 343718),
'item_scraped_count': 1079,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 51806208,
'memusage/startup': 51806208,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 12, 3, 16, 57, 36, 506294)}
2018-12-03 16:57:48 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 16:57:48 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:57:48 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-03 16:57:48 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['books.spiders']}
2018-12-03 16:57:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-12-03 16:57:48 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:57:48 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:57:48 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:57:48 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:57:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:57:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:57:52 [scrapy.extensions.logstats] INFO: Crawled 252 pages (at 5040 pages/min), scraped 205 items (at 4100 items/min)
2018-12-03 16:57:54 [scrapy.extensions.logstats] INFO: Crawled 523 pages (at 5420 pages/min), scraped 492 items (at 5740 items/min)
2018-12-03 16:57:58 [scrapy.extensions.logstats] INFO: Crawled 849 pages (at 6520 pages/min), scraped 784 items (at 5840 items/min)
2018-12-03 16:58:00 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:58:00 [scrapy.extensions.feedexport] INFO: Stored csv feed (1068 items) in: items.csv
The average speed of the spider is 90.72894556647711 items/sec
2018-12-03 16:58:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 372334,
'downloader/request_count': 1068,
'downloader/request_method_count/GET': 1068,
'downloader/response_bytes': 23480934,
'downloader/response_count': 1068,
'downloader/response_status_count/200': 1068,
'dupefilter/filtered': 14933,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 58, 0, 642647),
'item_scraped_count': 1068,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 51941376,
'memusage/startup': 51941376,
'request_depth_max': 9,
'response_received_count': 1068,
'scheduler/dequeued': 1068,
'scheduler/dequeued/memory': 1068,
'scheduler/enqueued': 1101,
'scheduler/enqueued/memory': 1101,
'start_time': datetime.datetime(2018, 12, 3, 16, 57, 48, 914985)}
2018-12-03 16:58:00 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 16:58:01 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:58:01 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-03 16:58:01 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['books.spiders']}
2018-12-03 16:58:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-12-03 16:58:01 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:58:01 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:58:01 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:58:01 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:58:01 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:58:01 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:58:04 [scrapy.extensions.logstats] INFO: Crawled 252 pages (at 5040 pages/min), scraped 202 items (at 4040 items/min)
2018-12-03 16:58:07 [scrapy.extensions.logstats] INFO: Crawled 500 pages (at 4960 pages/min), scraped 492 items (at 5800 items/min)
2018-12-03 16:58:10 [scrapy.extensions.logstats] INFO: Crawled 801 pages (at 6020 pages/min), scraped 789 items (at 5940 items/min)
2018-12-03 16:58:12 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:58:12 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv
The average speed of the spider is 88.62741130753513 items/sec
2018-12-03 16:58:12 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 368803,
'downloader/request_count': 1058,
'downloader/request_method_count/GET': 1058,
'downloader/response_bytes': 23293106,
'downloader/response_count': 1058,
'downloader/response_status_count/200': 1058,
'dupefilter/filtered': 14863,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 58, 12, 940998),
'item_scraped_count': 1058,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52318208,
'memusage/startup': 52318208,
'request_depth_max': 9,
'response_received_count': 1058,
'scheduler/dequeued': 1058,
'scheduler/dequeued/memory': 1058,
'scheduler/enqueued': 1081,
'scheduler/enqueued/memory': 1081,
'start_time': datetime.datetime(2018, 12, 3, 16, 58, 1, 215937)}
2018-12-03 16:58:12 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 16:58:13 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:58:13 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-03 16:58:13 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['books.spiders']}
2018-12-03 16:58:13 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-12-03 16:58:13 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:58:13 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:58:13 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:58:13 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:58:13 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:58:13 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:58:16 [scrapy.extensions.logstats] INFO: Crawled 253 pages (at 5060 pages/min), scraped 200 items (at 4000 items/min)
2018-12-03 16:58:19 [scrapy.extensions.logstats] INFO: Crawled 514 pages (at 5220 pages/min), scraped 492 items (at 5840 items/min)
2018-12-03 16:58:22 [scrapy.extensions.logstats] INFO: Crawled 842 pages (at 6560 pages/min), scraped 795 items (at 6060 items/min)
2018-12-03 16:58:24 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:58:25 [scrapy.extensions.feedexport] INFO: Stored csv feed (1057 items) in: items.csv
The average speed of the spider is 88.41912912771723 items/sec
2018-12-03 16:58:25 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 368726,
'downloader/request_count': 1057,
'downloader/request_method_count/GET': 1057,
'downloader/response_bytes': 23242113,
'downloader/response_count': 1057,
'downloader/response_status_count/200': 1057,
'dupefilter/filtered': 14790,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 58, 25, 40154),
'item_scraped_count': 1057,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52146176,
'memusage/startup': 52146176,
'request_depth_max': 9,
'response_received_count': 1057,
'scheduler/dequeued': 1057,
'scheduler/dequeued/memory': 1057,
'scheduler/enqueued': 1080,
'scheduler/enqueued/memory': 1080,
'start_time': datetime.datetime(2018, 12, 3, 16, 58, 13, 505983)}
2018-12-03 16:58:25 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 16:58:25 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 16:58:25 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-03 16:58:25 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue', 'SPIDER_MODULES': ['books.spiders']}
2018-12-03 16:58:25 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-12-03 16:58:25 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 16:58:25 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 16:58:25 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 16:58:25 [scrapy.core.engine] INFO: Spider opened
2018-12-03 16:58:25 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 16:58:25 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 16:58:28 [scrapy.extensions.logstats] INFO: Crawled 251 pages (at 5020 pages/min), scraped 200 items (at 4000 items/min)
2018-12-03 16:58:31 [scrapy.extensions.logstats] INFO: Crawled 544 pages (at 5860 pages/min), scraped 478 items (at 5560 items/min)
2018-12-03 16:58:34 [scrapy.extensions.logstats] INFO: Crawled 791 pages (at 4940 pages/min), scraped 783 items (at 6100 items/min)
2018-12-03 16:58:36 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 16:58:37 [scrapy.extensions.feedexport] INFO: Stored csv feed (1077 items) in: items.csv
The average speed of the spider is 89.2150649171547 items/sec
2018-12-03 16:58:37 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 375586,
'downloader/request_count': 1077,
'downloader/request_method_count/GET': 1077,
'downloader/response_bytes': 23623787,
'downloader/response_count': 1077,
'downloader/response_status_count/200': 1077,
'dupefilter/filtered': 14950,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 16, 58, 37, 473215),
'item_scraped_count': 1077,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 51838976,
'memusage/startup': 51838976,
'request_depth_max': 8,
'response_received_count': 1077,
'scheduler/dequeued': 1077,
'scheduler/dequeued/memory': 1077,
'scheduler/enqueued': 1100,
'scheduler/enqueued/memory': 1100,
'start_time': datetime.datetime(2018, 12, 3, 16, 58, 25, 606346)}
2018-12-03 16:58:37 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '5'
Mean : 89.62789801221425 Median : 89.2150649171547 Std Dev : 1.1098106921958728
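The runs above exercise the new queue through the `SCHEDULER_PRIORITY_QUEUE` setting visible in the "Overridden settings" log lines. A minimal settings fragment mirroring those logged values (a sketch for reproduction, not the benchmark's actual settings file):

```python
# settings.py -- fragment mirroring the overridden settings logged above.
# Values are copied from the log; only SCHEDULER_PRIORITY_QUEUE is specific
# to the new priority-queue work being benchmarked.
BOT_NAME = 'books'
CONCURRENT_REQUESTS = 120
RETRY_ENABLED = False
CLOSESPIDER_ITEMCOUNT = 1000
LOG_LEVEL = 'INFO'
LOGSTATS_INTERVAL = 3

# The queue under test: prioritizes requests with awareness of
# per-downloader-slot load instead of a single global priority order.
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.DownloaderAwarePriorityQueue'
```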
Executing scrapy-bench --n-runs 1 broadworm in /home/nikita/ves/scrapy-bench-3.6/
2018-12-04 16:23:18 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: broadspider)
2018-12-04 16:23:18 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-04 16:23:18 [scrapy.crawler] INFO: Overridden settings: {'AUTOTHROTTLE_ENABLED': True, 'BOT_NAME': 'broadspider', 'CLOSESPIDER_ITEMCOUNT': 800, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'broad.spiders', 'REACTOR_THREADPOOL_MAXSIZE': 20, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['broad.spiders']}
2018-12-04 16:23:18 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.throttle.AutoThrottle']
2018-12-04 16:23:18 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-04 16:23:18 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-04 16:23:18 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-04 16:23:18 [scrapy.core.engine] INFO: Spider opened
2018-12-04 16:23:18 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-04 16:23:18 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-04 16:23:21 [scrapy.extensions.logstats] INFO: Crawled 49 pages (at 980 pages/min), scraped 48 items (at 960 items/min)
2018-12-04 16:23:24 [scrapy.extensions.logstats] INFO: Crawled 58 pages (at 180 pages/min), scraped 58 items (at 200 items/min)
2018-12-04 16:23:27 [scrapy.extensions.logstats] INFO: Crawled 77 pages (at 380 pages/min), scraped 75 items (at 340 items/min)
2018-12-04 16:23:30 [scrapy.extensions.logstats] INFO: Crawled 116 pages (at 780 pages/min), scraped 115 items (at 800 items/min)
2018-12-04 16:23:33 [scrapy.extensions.logstats] INFO: Crawled 194 pages (at 1560 pages/min), scraped 192 items (at 1540 items/min)
2018-12-04 16:23:36 [scrapy.extensions.logstats] INFO: Crawled 260 pages (at 1320 pages/min), scraped 258 items (at 1320 items/min)
2018-12-04 16:23:39 [scrapy.extensions.logstats] INFO: Crawled 310 pages (at 1000 pages/min), scraped 308 items (at 1000 items/min)
2018-12-04 16:23:42 [scrapy.extensions.logstats] INFO: Crawled 352 pages (at 840 pages/min), scraped 351 items (at 860 items/min)
2018-12-04 16:23:45 [scrapy.extensions.logstats] INFO: Crawled 384 pages (at 640 pages/min), scraped 384 items (at 660 items/min)
2018-12-04 16:23:48 [scrapy.extensions.logstats] INFO: Crawled 407 pages (at 460 pages/min), scraped 406 items (at 440 items/min)
2018-12-04 16:23:51 [scrapy.extensions.logstats] INFO: Crawled 428 pages (at 420 pages/min), scraped 427 items (at 420 items/min)
2018-12-04 16:23:54 [scrapy.extensions.logstats] INFO: Crawled 457 pages (at 580 pages/min), scraped 455 items (at 560 items/min)
2018-12-04 16:23:57 [scrapy.extensions.logstats] INFO: Crawled 489 pages (at 640 pages/min), scraped 487 items (at 640 items/min)
2018-12-04 16:24:00 [scrapy.extensions.logstats] INFO: Crawled 515 pages (at 520 pages/min), scraped 514 items (at 540 items/min)
2018-12-04 16:24:03 [scrapy.extensions.logstats] INFO: Crawled 536 pages (at 420 pages/min), scraped 535 items (at 420 items/min)
2018-12-04 16:24:06 [scrapy.extensions.logstats] INFO: Crawled 560 pages (at 480 pages/min), scraped 559 items (at 480 items/min)
2018-12-04 16:24:09 [scrapy.extensions.logstats] INFO: Crawled 588 pages (at 560 pages/min), scraped 586 items (at 540 items/min)
2018-12-04 16:24:12 [scrapy.extensions.logstats] INFO: Crawled 611 pages (at 460 pages/min), scraped 611 items (at 500 items/min)
2018-12-04 16:24:15 [scrapy.extensions.logstats] INFO: Crawled 635 pages (at 480 pages/min), scraped 635 items (at 480 items/min)
2018-12-04 16:24:18 [scrapy.extensions.logstats] INFO: Crawled 660 pages (at 500 pages/min), scraped 659 items (at 480 items/min)
2018-12-04 16:24:21 [scrapy.extensions.logstats] INFO: Crawled 679 pages (at 380 pages/min), scraped 678 items (at 380 items/min)
2018-12-04 16:24:24 [scrapy.extensions.logstats] INFO: Crawled 684 pages (at 100 pages/min), scraped 684 items (at 120 items/min)
2018-12-04 16:24:27 [scrapy.extensions.logstats] INFO: Crawled 696 pages (at 240 pages/min), scraped 694 items (at 200 items/min)
2018-12-04 16:24:30 [scrapy.extensions.logstats] INFO: Crawled 701 pages (at 100 pages/min), scraped 700 items (at 120 items/min)
2018-12-04 16:24:33 [scrapy.extensions.logstats] INFO: Crawled 705 pages (at 80 pages/min), scraped 705 items (at 100 items/min)
2018-12-04 16:24:36 [scrapy.extensions.logstats] INFO: Crawled 720 pages (at 300 pages/min), scraped 719 items (at 280 items/min)
2018-12-04 16:24:39 [scrapy.extensions.logstats] INFO: Crawled 740 pages (at 400 pages/min), scraped 739 items (at 400 items/min)
2018-12-04 16:24:42 [scrapy.extensions.logstats] INFO: Crawled 762 pages (at 440 pages/min), scraped 762 items (at 460 items/min)
2018-12-04 16:24:45 [scrapy.extensions.logstats] INFO: Crawled 770 pages (at 160 pages/min), scraped 770 items (at 160 items/min)
2018-12-04 16:24:48 [scrapy.extensions.logstats] INFO: Crawled 771 pages (at 20 pages/min), scraped 771 items (at 20 items/min)
2018-12-04 16:24:51 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 80 pages/min), scraped 775 items (at 80 items/min)
2018-12-04 16:24:54 [scrapy.extensions.logstats] INFO: Crawled 777 pages (at 40 pages/min), scraped 777 items (at 40 items/min)
2018-12-04 16:24:57 [scrapy.extensions.logstats] INFO: Crawled 780 pages (at 60 pages/min), scraped 780 items (at 60 items/min)
2018-12-04 16:25:00 [scrapy.extensions.logstats] INFO: Crawled 786 pages (at 120 pages/min), scraped 786 items (at 120 items/min)
2018-12-04 16:25:03 [scrapy.extensions.logstats] INFO: Crawled 789 pages (at 60 pages/min), scraped 789 items (at 60 items/min)
2018-12-04 16:25:06 [scrapy.extensions.logstats] INFO: Crawled 794 pages (at 100 pages/min), scraped 794 items (at 100 items/min)
2018-12-04 16:25:09 [scrapy.extensions.logstats] INFO: Crawled 796 pages (at 40 pages/min), scraped 796 items (at 40 items/min)
2018-12-04 16:25:12 [scrapy.extensions.logstats] INFO: Crawled 798 pages (at 40 pages/min), scraped 798 items (at 40 items/min)
2018-12-04 16:25:13 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-04 16:25:15 [scrapy.extensions.logstats] INFO: Crawled 802 pages (at 80 pages/min), scraped 802 items (at 80 items/min)
2018-12-04 16:25:18 [scrapy.extensions.logstats] INFO: Crawled 806 pages (at 80 pages/min), scraped 806 items (at 80 items/min)
2018-12-04 16:25:21 [scrapy.extensions.logstats] INFO: Crawled 808 pages (at 40 pages/min), scraped 808 items (at 40 items/min)
2018-12-04 16:25:24 [scrapy.extensions.logstats] INFO: Crawled 810 pages (at 40 pages/min), scraped 810 items (at 40 items/min)
2018-12-04 16:25:27 [scrapy.extensions.logstats] INFO: Crawled 814 pages (at 80 pages/min), scraped 813 items (at 60 items/min)
2018-12-04 16:25:30 [scrapy.extensions.logstats] INFO: Crawled 816 pages (at 40 pages/min), scraped 816 items (at 60 items/min)
2018-12-04 16:25:33 [scrapy.extensions.logstats] INFO: Crawled 817 pages (at 20 pages/min), scraped 817 items (at 20 items/min)
2018-12-04 16:25:36 [scrapy.extensions.logstats] INFO: Crawled 819 pages (at 40 pages/min), scraped 819 items (at 40 items/min)
2018-12-04 16:25:39 [scrapy.extensions.logstats] INFO: Crawled 821 pages (at 40 pages/min), scraped 821 items (at 40 items/min)
2018-12-04 16:25:42 [scrapy.extensions.logstats] INFO: Crawled 824 pages (at 60 pages/min), scraped 823 items (at 40 items/min)
2018-12-04 16:25:45 [scrapy.extensions.logstats] INFO: Crawled 827 pages (at 60 pages/min), scraped 827 items (at 80 items/min)
2018-12-04 16:25:48 [scrapy.extensions.logstats] INFO: Crawled 828 pages (at 20 pages/min), scraped 828 items (at 20 items/min)
2018-12-04 16:25:51 [scrapy.extensions.logstats] INFO: Crawled 832 pages (at 80 pages/min), scraped 832 items (at 80 items/min)
2018-12-04 16:25:54 [scrapy.extensions.logstats] INFO: Crawled 834 pages (at 40 pages/min), scraped 834 items (at 40 items/min)
2018-12-04 16:25:57 [scrapy.extensions.logstats] INFO: Crawled 838 pages (at 80 pages/min), scraped 838 items (at 80 items/min)
2018-12-04 16:26:00 [scrapy.extensions.logstats] INFO: Crawled 839 pages (at 20 pages/min), scraped 839 items (at 20 items/min)
2018-12-04 16:26:03 [scrapy.extensions.logstats] INFO: Crawled 840 pages (at 20 pages/min), scraped 840 items (at 20 items/min)
2018-12-04 16:26:06 [scrapy.extensions.logstats] INFO: Crawled 844 pages (at 80 pages/min), scraped 844 items (at 80 items/min)
2018-12-04 16:26:09 [scrapy.extensions.logstats] INFO: Crawled 847 pages (at 60 pages/min), scraped 847 items (at 60 items/min)
2018-12-04 16:26:12 [scrapy.extensions.logstats] INFO: Crawled 847 pages (at 0 pages/min), scraped 847 items (at 0 items/min)
2018-12-04 16:26:15 [scrapy.extensions.logstats] INFO: Crawled 849 pages (at 40 pages/min), scraped 849 items (at 40 items/min)
2018-12-04 16:26:18 [scrapy.extensions.logstats] INFO: Crawled 850 pages (at 20 pages/min), scraped 850 items (at 20 items/min)
2018-12-04 16:26:21 [scrapy.extensions.logstats] INFO: Crawled 856 pages (at 120 pages/min), scraped 855 items (at 100 items/min)
2018-12-04 16:26:24 [scrapy.extensions.logstats] INFO: Crawled 856 pages (at 0 pages/min), scraped 856 items (at 20 items/min)
2018-12-04 16:26:27 [scrapy.extensions.logstats] INFO: Crawled 857 pages (at 20 pages/min), scraped 857 items (at 20 items/min)
2018-12-04 16:26:30 [scrapy.extensions.logstats] INFO: Crawled 858 pages (at 20 pages/min), scraped 858 items (at 20 items/min)
2018-12-04 16:26:33 [scrapy.extensions.logstats] INFO: Crawled 858 pages (at 0 pages/min), scraped 858 items (at 0 items/min)
2018-12-04 16:26:36 [scrapy.extensions.logstats] INFO: Crawled 861 pages (at 60 pages/min), scraped 861 items (at 60 items/min)
2018-12-04 16:26:39 [scrapy.extensions.logstats] INFO: Crawled 863 pages (at 40 pages/min), scraped 863 items (at 40 items/min)
2018-12-04 16:26:42 [scrapy.extensions.logstats] INFO: Crawled 865 pages (at 40 pages/min), scraped 865 items (at 40 items/min)
2018-12-04 16:26:45 [scrapy.extensions.logstats] INFO: Crawled 865 pages (at 0 pages/min), scraped 865 items (at 0 items/min)
2018-12-04 16:26:48 [scrapy.extensions.logstats] INFO: Crawled 867 pages (at 40 pages/min), scraped 867 items (at 40 items/min)
2018-12-04 16:26:51 [scrapy.extensions.logstats] INFO: Crawled 869 pages (at 40 pages/min), scraped 869 items (at 40 items/min)
2018-12-04 16:26:54 [scrapy.extensions.logstats] INFO: Crawled 870 pages (at 20 pages/min), scraped 870 items (at 20 items/min)
2018-12-04 16:26:57 [scrapy.extensions.logstats] INFO: Crawled 871 pages (at 20 pages/min), scraped 871 items (at 20 items/min)
2018-12-04 16:27:00 [scrapy.extensions.logstats] INFO: Crawled 872 pages (at 20 pages/min), scraped 872 items (at 20 items/min)
2018-12-04 16:27:03 [scrapy.extensions.logstats] INFO: Crawled 873 pages (at 20 pages/min), scraped 873 items (at 20 items/min)
2018-12-04 16:27:06 [scrapy.extensions.logstats] INFO: Crawled 873 pages (at 0 pages/min), scraped 873 items (at 0 items/min)
2018-12-04 16:27:09 [scrapy.extensions.logstats] INFO: Crawled 875 pages (at 40 pages/min), scraped 875 items (at 40 items/min)
2018-12-04 16:27:12 [scrapy.extensions.logstats] INFO: Crawled 877 pages (at 40 pages/min), scraped 877 items (at 40 items/min)
2018-12-04 16:27:15 [scrapy.extensions.logstats] INFO: Crawled 877 pages (at 0 pages/min), scraped 877 items (at 0 items/min)
2018-12-04 16:27:18 [scrapy.extensions.logstats] INFO: Crawled 878 pages (at 20 pages/min), scraped 877 items (at 0 items/min)
2018-12-04 16:27:21 [scrapy.extensions.logstats] INFO: Crawled 878 pages (at 0 pages/min), scraped 878 items (at 20 items/min)
2018-12-04 16:27:24 [scrapy.extensions.logstats] INFO: Crawled 879 pages (at 20 pages/min), scraped 879 items (at 20 items/min)
2018-12-04 16:27:27 [scrapy.extensions.logstats] INFO: Crawled 881 pages (at 40 pages/min), scraped 881 items (at 40 items/min)
2018-12-04 16:27:30 [scrapy.extensions.logstats] INFO: Crawled 881 pages (at 0 pages/min), scraped 881 items (at 0 items/min)
2018-12-04 16:27:33 [scrapy.extensions.logstats] INFO: Crawled 883 pages (at 40 pages/min), scraped 883 items (at 40 items/min)
2018-12-04 16:27:36 [scrapy.extensions.logstats] INFO: Crawled 884 pages (at 20 pages/min), scraped 884 items (at 20 items/min)
2018-12-04 16:27:39 [scrapy.extensions.logstats] INFO: Crawled 886 pages (at 40 pages/min), scraped 886 items (at 40 items/min)
2018-12-04 16:27:42 [scrapy.extensions.logstats] INFO: Crawled 887 pages (at 20 pages/min), scraped 887 items (at 20 items/min)
2018-12-04 16:27:45 [scrapy.extensions.logstats] INFO: Crawled 888 pages (at 20 pages/min), scraped 888 items (at 20 items/min)
2018-12-04 16:27:48 [scrapy.extensions.logstats] INFO: Crawled 888 pages (at 0 pages/min), scraped 888 items (at 0 items/min)
2018-12-04 16:27:51 [scrapy.extensions.logstats] INFO: Crawled 888 pages (at 0 pages/min), scraped 888 items (at 0 items/min)
2018-12-04 16:27:54 [scrapy.extensions.logstats] INFO: Crawled 891 pages (at 60 pages/min), scraped 891 items (at 60 items/min)
2018-12-04 16:27:57 [scrapy.extensions.logstats] INFO: Crawled 892 pages (at 20 pages/min), scraped 892 items (at 20 items/min)
2018-12-04 16:28:00 [scrapy.extensions.logstats] INFO: Crawled 893 pages (at 20 pages/min), scraped 893 items (at 20 items/min)
2018-12-04 16:28:03 [scrapy.extensions.logstats] INFO: Crawled 895 pages (at 40 pages/min), scraped 895 items (at 40 items/min)
2018-12-04 16:28:06 [scrapy.extensions.logstats] INFO: Crawled 896 pages (at 20 pages/min), scraped 896 items (at 20 items/min)
2018-12-04 16:28:09 [scrapy.extensions.logstats] INFO: Crawled 898 pages (at 40 pages/min), scraped 898 items (at 40 items/min)
2018-12-04 16:28:12 [scrapy.extensions.logstats] INFO: Crawled 899 pages (at 20 pages/min), scraped 899 items (at 20 items/min)
2018-12-04 16:28:15 [scrapy.extensions.logstats] INFO: Crawled 900 pages (at 20 pages/min), scraped 900 items (at 20 items/min)
2018-12-04 16:28:18 [scrapy.extensions.logstats] INFO: Crawled 900 pages (at 0 pages/min), scraped 900 items (at 0 items/min)
2018-12-04 16:28:21 [scrapy.extensions.logstats] INFO: Crawled 901 pages (at 20 pages/min), scraped 901 items (at 20 items/min)
2018-12-04 16:28:24 [scrapy.extensions.logstats] INFO: Crawled 902 pages (at 20 pages/min), scraped 902 items (at 20 items/min)
2018-12-04 16:28:27 [scrapy.extensions.logstats] INFO: Crawled 902 pages (at 0 pages/min), scraped 902 items (at 0 items/min)
2018-12-04 16:28:30 [scrapy.extensions.logstats] INFO: Crawled 902 pages (at 0 pages/min), scraped 902 items (at 0 items/min)
2018-12-04 16:28:33 [scrapy.extensions.logstats] INFO: Crawled 905 pages (at 60 pages/min), scraped 905 items (at 60 items/min)
2018-12-04 16:28:36 [scrapy.extensions.logstats] INFO: Crawled 905 pages (at 0 pages/min), scraped 905 items (at 0 items/min)
2018-12-04 16:28:39 [scrapy.extensions.logstats] INFO: Crawled 906 pages (at 20 pages/min), scraped 906 items (at 20 items/min)
2018-12-04 16:28:42 [scrapy.extensions.logstats] INFO: Crawled 907 pages (at 20 pages/min), scraped 907 items (at 20 items/min)
2018-12-04 16:28:45 [scrapy.extensions.logstats] INFO: Crawled 908 pages (at 20 pages/min), scraped 908 items (at 20 items/min)
2018-12-04 16:28:48 [scrapy.extensions.logstats] INFO: Crawled 910 pages (at 40 pages/min), scraped 910 items (at 40 items/min)
2018-12-04 16:28:51 [scrapy.extensions.logstats] INFO: Crawled 910 pages (at 0 pages/min), scraped 910 items (at 0 items/min)
2018-12-04 16:28:54 [scrapy.extensions.logstats] INFO: Crawled 910 pages (at 0 pages/min), scraped 910 items (at 0 items/min)
2018-12-04 16:28:57 [scrapy.extensions.logstats] INFO: Crawled 911 pages (at 20 pages/min), scraped 911 items (at 20 items/min)
2018-12-04 16:29:00 [scrapy.extensions.logstats] INFO: Crawled 911 pages (at 0 pages/min), scraped 911 items (at 0 items/min)
2018-12-04 16:29:03 [scrapy.extensions.logstats] INFO: Crawled 912 pages (at 20 pages/min), scraped 912 items (at 20 items/min)
2018-12-04 16:29:06 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 20 pages/min), scraped 913 items (at 20 items/min)
2018-12-04 16:29:09 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 0 pages/min), scraped 913 items (at 0 items/min)
2018-12-04 16:29:12 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 0 pages/min), scraped 913 items (at 0 items/min)
2018-12-04 16:29:15 [scrapy.extensions.logstats] INFO: Crawled 913 pages (at 0 pages/min), scraped 913 items (at 0 items/min)
2018-12-04 16:29:18 [scrapy.extensions.logstats] INFO: Crawled 914 pages (at 20 pages/min), scraped 914 items (at 20 items/min)
2018-12-04 16:29:21 [scrapy.extensions.logstats] INFO: Crawled 914 pages (at 0 pages/min), scraped 914 items (at 0 items/min)
2018-12-04 16:29:24 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 20 pages/min), scraped 915 items (at 20 items/min)
2018-12-04 16:29:27 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 0 pages/min), scraped 915 items (at 0 items/min)
2018-12-04 16:29:30 [scrapy.extensions.logstats] INFO: Crawled 915 pages (at 0 pages/min), scraped 915 items (at 0 items/min)
2018-12-04 16:29:33 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 20 pages/min), scraped 916 items (at 20 items/min)
2018-12-04 16:29:36 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 0 pages/min), scraped 916 items (at 0 items/min)
2018-12-04 16:29:39 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 0 pages/min), scraped 916 items (at 0 items/min)
2018-12-04 16:29:42 [scrapy.extensions.logstats] INFO: Crawled 916 pages (at 0 pages/min), scraped 916 items (at 0 items/min)
2018-12-04 16:29:45 [scrapy.extensions.logstats] INFO: Crawled 918 pages (at 40 pages/min), scraped 918 items (at 40 items/min)
2018-12-04 16:29:48 [scrapy.extensions.logstats] INFO: Crawled 918 pages (at 0 pages/min), scraped 918 items (at 0 items/min)
2018-12-04 16:29:50 [scrapy.extensions.feedexport] INFO: Stored csv feed (920 items) in: items.csv
The average speed of the spider is 2.341064071191682 items/sec
2018-12-04 16:29:50 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 299263,
'downloader/request_count': 920,
'downloader/request_method_count/GET': 920,
'downloader/response_bytes': 26395440,
'downloader/response_count': 920,
'downloader/response_status_count/200': 920,
'dupefilter/filtered': 18812,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 4, 16, 29, 50, 879085),
'item_scraped_count': 920,
'log_count/INFO': 139,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/BroadBenchSpider': 1,
'memdebug/live_refs/Request': 6882,
'memusage/max': 100548608,
'memusage/startup': 52174848,
'request_depth_max': 51,
'response_received_count': 920,
'scheduler/dequeued': 920,
'scheduler/dequeued/memory': 920,
'scheduler/enqueued': 7801,
'scheduler/enqueued/memory': 7801,
'start_time': datetime.datetime(2018, 12, 4, 16, 23, 18, 317620)}
2018-12-04 16:29:50 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Broad Crawl' Iterations = '1'
Mean : 2.341064071191682 Median : 2.341064071191682 Std Dev : 0.0
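Both the pages/min series in the `logstats` lines and the Mean/Median/Std Dev summary can be recovered from a raw log like the one above. A sketch, using sample lines copied from this gist (this is not scrapy-bench's own reporting code; note that with a single run, population standard deviation reproduces the `Std Dev : 0.0` printed above, though whether scrapy-bench uses sample or population std dev is not shown in the log):

```python
import re
import statistics

# Matches the throughput figure in scrapy.extensions.logstats lines, e.g.
# "... Crawled 194 pages (at 1560 pages/min), scraped 192 items (at 1540 items/min)"
LOGSTATS_RE = re.compile(r'Crawled (\d+) pages \(at (\d+) pages/min\)')

sample_lines = [
    '2018-12-04 16:23:33 [scrapy.extensions.logstats] INFO: Crawled 194 pages '
    '(at 1560 pages/min), scraped 192 items (at 1540 items/min)',
    '2018-12-04 16:23:36 [scrapy.extensions.logstats] INFO: Crawled 260 pages '
    '(at 1320 pages/min), scraped 258 items (at 1320 items/min)',
]

rates = []
for line in sample_lines:
    m = LOGSTATS_RE.search(line)
    if m:
        rates.append(int(m.group(2)))
print(rates)  # [1560, 1320]

# Summary statistics over per-run speeds (illustrative single-run input).
speeds = [2.341064071191682]
print(statistics.mean(speeds), statistics.median(speeds), statistics.pstdev(speeds))
```

Plotting `rates` against the timestamps makes the throughput decay of the broad crawl above (roughly 1500 pages/min early on, down to 0-40 pages/min near the end) easy to see at a glance.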
Executing scrapy-bench --n-runs 5 --book_url=http://172.17.0.6:8880 bookworm in /home/nikita/ves/scrapy-bench-3.6/
2018-12-03 17:00:23 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 17:00:23 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-03 17:00:23 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-12-03 17:00:23 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-12-03 17:00:23 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 17:00:23 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 17:00:23 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 17:00:23 [scrapy.core.engine] INFO: Spider opened
2018-12-03 17:00:23 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 17:00:23 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 17:00:26 [scrapy.extensions.logstats] INFO: Crawled 260 pages (at 5200 pages/min), scraped 201 items (at 4020 items/min)
2018-12-03 17:00:29 [scrapy.extensions.logstats] INFO: Crawled 554 pages (at 5880 pages/min), scraped 492 items (at 5820 items/min)
2018-12-03 17:00:32 [scrapy.extensions.logstats] INFO: Crawled 806 pages (at 5040 pages/min), scraped 801 items (at 6180 items/min)
2018-12-03 17:00:34 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 17:00:35 [scrapy.extensions.feedexport] INFO: Stored csv feed (1067 items) in: items.csv
The average speed of the spider is 90.25091217578672 items/sec
2018-12-03 17:00:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 371877,
'downloader/request_count': 1067,
'downloader/request_method_count/GET': 1067,
'downloader/response_bytes': 23429941,
'downloader/response_count': 1067,
'downloader/response_status_count/200': 1067,
'dupefilter/filtered': 14860,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 17, 0, 35, 160678),
'item_scraped_count': 1067,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 51843072,
'memusage/startup': 51843072,
'request_depth_max': 8,
'response_received_count': 1067,
'scheduler/dequeued': 1067,
'scheduler/dequeued/memory': 1067,
'scheduler/enqueued': 1100,
'scheduler/enqueued/memory': 1100,
'start_time': datetime.datetime(2018, 12, 3, 17, 0, 23, 505147)}
2018-12-03 17:00:35 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 17:00:35 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 17:00:35 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-03 17:00:35 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-12-03 17:00:35 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-12-03 17:00:35 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 17:00:35 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 17:00:35 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 17:00:35 [scrapy.core.engine] INFO: Spider opened
2018-12-03 17:00:35 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 17:00:35 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 17:00:39 [scrapy.extensions.logstats] INFO: Crawled 244 pages (at 4880 pages/min), scraped 206 items (at 4120 items/min)
2018-12-03 17:00:41 [scrapy.extensions.logstats] INFO: Crawled 545 pages (at 6020 pages/min), scraped 482 items (at 5520 items/min)
2018-12-03 17:00:44 [scrapy.extensions.logstats] INFO: Crawled 804 pages (at 5180 pages/min), scraped 795 items (at 6260 items/min)
2018-12-03 17:00:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 17:00:47 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 89.73449994942929 items/sec
2018-12-03 17:00:47 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 376063,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23726256,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 17, 0, 47, 623049),
'item_scraped_count': 1079,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 51810304,
'memusage/startup': 51810304,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 12, 3, 17, 0, 35, 728579)}
2018-12-03 17:00:47 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 17:00:48 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 17:00:48 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-03 17:00:48 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-12-03 17:00:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-12-03 17:00:48 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 17:00:48 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 17:00:48 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 17:00:48 [scrapy.core.engine] INFO: Spider opened
2018-12-03 17:00:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 17:00:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 17:00:51 [scrapy.extensions.logstats] INFO: Crawled 256 pages (at 5120 pages/min), scraped 204 items (at 4080 items/min)
2018-12-03 17:00:54 [scrapy.extensions.logstats] INFO: Crawled 547 pages (at 5820 pages/min), scraped 489 items (at 5700 items/min)
2018-12-03 17:00:57 [scrapy.extensions.logstats] INFO: Crawled 823 pages (at 5520 pages/min), scraped 794 items (at 6100 items/min)
2018-12-03 17:00:59 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 17:00:59 [scrapy.extensions.feedexport] INFO: Stored csv feed (1069 items) in: items.csv
The average speed of the spider is 90.40642489479762 items/sec
2018-12-03 17:00:59 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 372636,
'downloader/request_count': 1069,
'downloader/request_method_count/GET': 1069,
'downloader/response_bytes': 23532410,
'downloader/response_count': 1069,
'downloader/response_status_count/200': 1069,
'dupefilter/filtered': 15006,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 17, 0, 59, 844157),
'item_scraped_count': 1069,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 51777536,
'memusage/startup': 51777536,
'request_depth_max': 9,
'response_received_count': 1069,
'scheduler/dequeued': 1069,
'scheduler/dequeued/memory': 1069,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 12, 3, 17, 0, 48, 202209)}
2018-12-03 17:00:59 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 17:01:00 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 17:01:00 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-03 17:01:00 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-12-03 17:01:00 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-12-03 17:01:00 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 17:01:00 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 17:01:00 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 17:01:00 [scrapy.core.engine] INFO: Spider opened
2018-12-03 17:01:00 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 17:01:00 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 17:01:03 [scrapy.extensions.logstats] INFO: Crawled 251 pages (at 5020 pages/min), scraped 194 items (at 3880 items/min)
2018-12-03 17:01:06 [scrapy.extensions.logstats] INFO: Crawled 532 pages (at 5620 pages/min), scraped 485 items (at 5820 items/min)
2018-12-03 17:01:09 [scrapy.extensions.logstats] INFO: Crawled 842 pages (at 6200 pages/min), scraped 784 items (at 5980 items/min)
2018-12-03 17:01:11 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 17:01:12 [scrapy.extensions.feedexport] INFO: Stored csv feed (1078 items) in: items.csv
The average speed of the spider is 88.46023391045921 items/sec
2018-12-03 17:01:12 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 375744,
'downloader/request_count': 1078,
'downloader/request_method_count/GET': 1078,
'downloader/response_bytes': 23674780,
'downloader/response_count': 1078,
'downloader/response_status_count/200': 1078,
'dupefilter/filtered': 15023,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 17, 1, 12, 204779),
'item_scraped_count': 1078,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 51875840,
'memusage/startup': 51875840,
'request_depth_max': 9,
'response_received_count': 1078,
'scheduler/dequeued': 1078,
'scheduler/dequeued/memory': 1078,
'scheduler/enqueued': 1101,
'scheduler/enqueued/memory': 1101,
'start_time': datetime.datetime(2018, 12, 3, 17, 1, 0, 413734)}
2018-12-03 17:01:12 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-12-03 17:01:12 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-12-03 17:01:12 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Sep 27 2018, 15:23:50) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-4.4.0-134-generic-x86_64-with-debian-jessie-sid
2018-12-03 17:01:12 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-12-03 17:01:12 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-12-03 17:01:12 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-12-03 17:01:12 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-12-03 17:01:12 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-12-03 17:01:12 [scrapy.core.engine] INFO: Spider opened
2018-12-03 17:01:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-12-03 17:01:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2018-12-03 17:01:16 [scrapy.extensions.logstats] INFO: Crawled 257 pages (at 5140 pages/min), scraped 204 items (at 4080 items/min)
2018-12-03 17:01:18 [scrapy.extensions.logstats] INFO: Crawled 493 pages (at 4720 pages/min), scraped 485 items (at 5620 items/min)
2018-12-03 17:01:21 [scrapy.extensions.logstats] INFO: Crawled 835 pages (at 6840 pages/min), scraped 781 items (at 5920 items/min)
2018-12-03 17:01:24 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-12-03 17:01:24 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 91.11756327055262 items/sec
2018-12-03 17:01:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 376063,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23726256,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 12, 3, 17, 1, 24, 581383),
'item_scraped_count': 1079,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 51843072,
'memusage/startup': 51843072,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 12, 3, 17, 1, 12, 773580)}
2018-12-03 17:01:24 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec):
Test = 'Book Spider' Iterations = '5'
Mean : 89.99392684020509 Median : 90.25091217578672 Std Dev : 0.8852424808786434
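The summary line above aggregates the per-run average speeds printed after each crawl. A minimal sketch of that aggregation, using Python's `statistics` module (the speed values below are illustrative, taken or rounded from the runs in this log; whether scrapy-bench uses the population or sample standard deviation is an assumption here):

```python
import statistics

# Per-iteration average speeds in items/sec (illustrative values,
# rounded from the run logs above; the first iteration's speed is a guess).
speeds = [89.73, 90.41, 88.46, 91.12, 90.25]

mean = statistics.mean(speeds)
median = statistics.median(speeds)
# Assumption: population std dev; swap in statistics.stdev for the sample form.
std_dev = statistics.pstdev(speeds)

print(f"Mean : {mean} Median : {median} Std Dev : {std_dev}")
```

With these rounded inputs the mean and std dev land close to the reported 89.99 and 0.885, which suggests the aggregation is this straightforward.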