@whalebot-helmsman
Created November 29, 2018 10:31
scrapy-bench results for round-robin vs. DownloaderAwarePriorityQueue scheduling
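The first benchmark block below was collected with the downloader-aware priority queue enabled; the second (baseline) block omits that setting. The difference is visible in each run's `Overridden settings` log line. A minimal sketch of those settings as a plain dict, with keys and values mirrored from the log (the surrounding `books` project layout is assumed, not shown here):

```python
# Settings mirrored from the "Overridden settings" line in the log below.
# The books/settings.py project structure is an assumption for illustration.
SETTINGS = {
    'BOT_NAME': 'books',
    'SPIDER_MODULES': ['books.spiders'],
    'NEWSPIDER_MODULE': 'books.spiders',
    'CONCURRENT_REQUESTS': 120,
    'CLOSESPIDER_ITEMCOUNT': 1000,
    'RETRY_ENABLED': False,
    'LOG_LEVEL': 'INFO',
    'LOGSTATS_INTERVAL': 3,
    'MEMDEBUG_ENABLED': True,
    'FEED_FORMAT': 'csv',
    'FEED_URI': 'items.csv',
    # The only setting that differs between the two benchmark blocks in
    # this gist: present for the downloader-aware runs, absent for the
    # baseline (round-robin) runs.
    'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue',
}
```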
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-2.7/
2018-11-29 10:17:09 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:17:09 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:17:09 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-11-29 10:17:09 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:17:09 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:17:09 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:17:09 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:17:09 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:17:09 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:17:09 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:17:12 [scrapy.extensions.logstats] INFO: Crawled 120 pages (at 2400 pages/min), scraped 104 items (at 2080 items/min)
2018-11-29 10:17:15 [scrapy.extensions.logstats] INFO: Crawled 296 pages (at 3520 pages/min), scraped 277 items (at 3460 items/min)
2018-11-29 10:17:18 [scrapy.extensions.logstats] INFO: Crawled 469 pages (at 3460 pages/min), scraped 445 items (at 3360 items/min)
2018-11-29 10:17:21 [scrapy.extensions.logstats] INFO: Crawled 628 pages (at 3180 pages/min), scraped 621 items (at 3520 items/min)
2018-11-29 10:17:24 [scrapy.extensions.logstats] INFO: Crawled 802 pages (at 3480 pages/min), scraped 794 items (at 3460 items/min)
2018-11-29 10:17:27 [scrapy.extensions.logstats] INFO: Crawled 986 pages (at 3680 pages/min), scraped 978 items (at 3680 items/min)
2018-11-29 10:17:27 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:17:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 54.3871510199 items/sec
2018-11-29 10:17:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362478,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 17, 28, 621201),
'item_scraped_count': 1044,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46665728,
'memusage/startup': 46665728,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 17, 9, 394821)}
2018-11-29 10:17:28 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:17:29 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:17:29 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:17:29 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-11-29 10:17:29 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:17:29 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:17:29 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:17:29 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:17:29 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:17:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:17:29 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:17:32 [scrapy.extensions.logstats] INFO: Crawled 113 pages (at 2260 pages/min), scraped 105 items (at 2100 items/min)
2018-11-29 10:17:35 [scrapy.extensions.logstats] INFO: Crawled 317 pages (at 4080 pages/min), scraped 284 items (at 3580 items/min)
2018-11-29 10:17:38 [scrapy.extensions.logstats] INFO: Crawled 479 pages (at 3240 pages/min), scraped 464 items (at 3600 items/min)
2018-11-29 10:17:41 [scrapy.extensions.logstats] INFO: Crawled 660 pages (at 3620 pages/min), scraped 637 items (at 3460 items/min)
2018-11-29 10:17:44 [scrapy.extensions.logstats] INFO: Crawled 824 pages (at 3280 pages/min), scraped 816 items (at 3580 items/min)
2018-11-29 10:17:47 [scrapy.extensions.logstats] INFO: Crawled 1012 pages (at 3760 pages/min), scraped 989 items (at 3460 items/min)
2018-11-29 10:17:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:17:48 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 54.9992442872 items/sec
2018-11-29 10:17:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363581,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 17, 48, 320235),
'item_scraped_count': 1049,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46661632,
'memusage/startup': 46661632,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 17, 29, 261736)}
2018-11-29 10:17:48 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:17:48 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:17:48 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:17:48 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-11-29 10:17:48 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:17:48 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:17:48 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:17:48 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:17:48 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:17:48 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:17:48 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:17:52 [scrapy.extensions.logstats] INFO: Crawled 120 pages (at 2400 pages/min), scraped 105 items (at 2100 items/min)
2018-11-29 10:17:54 [scrapy.extensions.logstats] INFO: Crawled 289 pages (at 3380 pages/min), scraped 282 items (at 3540 items/min)
2018-11-29 10:17:58 [scrapy.extensions.logstats] INFO: Crawled 464 pages (at 3500 pages/min), scraped 454 items (at 3440 items/min)
2018-11-29 10:18:00 [scrapy.extensions.logstats] INFO: Crawled 646 pages (at 3640 pages/min), scraped 618 items (at 3280 items/min)
2018-11-29 10:18:03 [scrapy.extensions.logstats] INFO: Crawled 822 pages (at 3520 pages/min), scraped 791 items (at 3460 items/min)
2018-11-29 10:18:07 [scrapy.extensions.logstats] INFO: Crawled 963 pages (at 2820 pages/min), scraped 963 items (at 3440 items/min)
2018-11-29 10:18:07 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:18:08 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 53.5703756542 items/sec
2018-11-29 10:18:08 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362729,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 18, 8, 455063),
'item_scraped_count': 1044,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46657536,
'memusage/startup': 46657536,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 17, 48, 946302)}
2018-11-29 10:18:08 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:18:09 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:18:09 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:18:09 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-11-29 10:18:09 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:18:09 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:18:09 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:18:09 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:18:09 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:18:09 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:18:09 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:18:12 [scrapy.extensions.logstats] INFO: Crawled 126 pages (at 2520 pages/min), scraped 103 items (at 2060 items/min)
2018-11-29 10:18:15 [scrapy.extensions.logstats] INFO: Crawled 289 pages (at 3260 pages/min), scraped 281 items (at 3560 items/min)
2018-11-29 10:18:18 [scrapy.extensions.logstats] INFO: Crawled 470 pages (at 3620 pages/min), scraped 445 items (at 3280 items/min)
2018-11-29 10:18:21 [scrapy.extensions.logstats] INFO: Crawled 635 pages (at 3300 pages/min), scraped 624 items (at 3580 items/min)
2018-11-29 10:18:24 [scrapy.extensions.logstats] INFO: Crawled 797 pages (at 3240 pages/min), scraped 790 items (at 3320 items/min)
2018-11-29 10:18:27 [scrapy.extensions.logstats] INFO: Crawled 975 pages (at 3560 pages/min), scraped 962 items (at 3440 items/min)
2018-11-29 10:18:27 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:18:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (1043 items) in: items.csv
The average speed of the spider is 53.8518536026 items/sec
2018-11-29 10:18:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362230,
'downloader/request_count': 1043,
'downloader/request_method_count/GET': 1043,
'downloader/response_bytes': 22941621,
'downloader/response_count': 1043,
'downloader/response_status_count/200': 1043,
'dupefilter/filtered': 14684,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 18, 28, 483271),
'item_scraped_count': 1043,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 19,
'memusage/max': 46653440,
'memusage/startup': 46653440,
'request_depth_max': 10,
'response_received_count': 1043,
'scheduler/dequeued': 1043,
'scheduler/dequeued/memory': 1043,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 18, 9, 113817)}
2018-11-29 10:18:28 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:18:29 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:18:29 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:18:29 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.DownloaderAwarePriorityQueue'}
2018-11-29 10:18:29 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:18:29 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:18:29 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:18:29 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:18:29 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:18:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:18:29 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:18:32 [scrapy.extensions.logstats] INFO: Crawled 118 pages (at 2360 pages/min), scraped 104 items (at 2080 items/min)
2018-11-29 10:18:35 [scrapy.extensions.logstats] INFO: Crawled 307 pages (at 3780 pages/min), scraped 287 items (at 3660 items/min)
2018-11-29 10:18:38 [scrapy.extensions.logstats] INFO: Crawled 464 pages (at 3140 pages/min), scraped 462 items (at 3500 items/min)
2018-11-29 10:18:41 [scrapy.extensions.logstats] INFO: Crawled 648 pages (at 3680 pages/min), scraped 641 items (at 3580 items/min)
2018-11-29 10:18:44 [scrapy.extensions.logstats] INFO: Crawled 827 pages (at 3580 pages/min), scraped 817 items (at 3520 items/min)
2018-11-29 10:18:47 [scrapy.extensions.logstats] INFO: Crawled 1014 pages (at 3740 pages/min), scraped 989 items (at 3440 items/min)
2018-11-29 10:18:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:18:48 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 55.2377428746 items/sec
2018-11-29 10:18:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362630,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 18, 48, 44454),
'item_scraped_count': 1044,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46661632,
'memusage/startup': 46661632,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 18, 29, 121805)}
2018-11-29 10:18:48 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec):
Test = 'Book Spider' Iterations = '5'
Mean : 54.4092734877 Median : 54.3871510199 Std Dev : 0.640216053568
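The summary line above can be reproduced from the five "average speed" values printed after each run; the reported Std Dev matches the population standard deviation (ddof=0) of those speeds:

```python
import statistics

# Per-run speeds (items/sec) copied from the five "average speed of the
# spider" lines in the downloader-aware block above.
speeds = [
    54.3871510199,
    54.9992442872,
    53.5703756542,
    53.8518536026,
    55.2377428746,
]

mean = statistics.mean(speeds)       # matches the reported Mean
median = statistics.median(speeds)   # matches the reported Median
std_dev = statistics.pstdev(speeds)  # population std dev, matches Std Dev

print(mean, median, std_dev)
```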
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-2.7/
2018-11-29 09:51:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:51:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:51:33 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 09:51:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 09:51:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:51:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:51:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:51:33 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:51:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:51:33 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:51:36 [scrapy.extensions.logstats] INFO: Crawled 130 pages (at 2600 pages/min), scraped 111 items (at 2220 items/min)
2018-11-29 09:51:39 [scrapy.extensions.logstats] INFO: Crawled 315 pages (at 3700 pages/min), scraped 298 items (at 3740 items/min)
2018-11-29 09:51:42 [scrapy.extensions.logstats] INFO: Crawled 487 pages (at 3440 pages/min), scraped 467 items (at 3380 items/min)
2018-11-29 09:51:45 [scrapy.extensions.logstats] INFO: Crawled 668 pages (at 3620 pages/min), scraped 639 items (at 3440 items/min)
2018-11-29 09:51:48 [scrapy.extensions.logstats] INFO: Crawled 843 pages (at 3500 pages/min), scraped 816 items (at 3540 items/min)
2018-11-29 09:51:51 [scrapy.extensions.logstats] INFO: Crawled 1007 pages (at 3280 pages/min), scraped 982 items (at 3320 items/min)
2018-11-29 09:51:52 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:51:53 [scrapy.extensions.feedexport] INFO: Stored csv feed (1089 items) in: items.csv
The average speed of the spider is 55.14306299 items/sec
2018-11-29 09:51:53 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 378764,
'downloader/request_count': 1089,
'downloader/request_method_count/GET': 1089,
'downloader/response_bytes': 23896282,
'downloader/response_count': 1089,
'downloader/response_status_count/200': 1089,
'dupefilter/filtered': 15230,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 51, 53, 523689),
'item_scraped_count': 1089,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 27,
'memusage/max': 46616576,
'memusage/startup': 46616576,
'request_depth_max': 10,
'response_received_count': 1089,
'scheduler/dequeued': 1089,
'scheduler/dequeued/memory': 1089,
'scheduler/enqueued': 1115,
'scheduler/enqueued/memory': 1115,
'start_time': datetime.datetime(2018, 11, 29, 9, 51, 33, 770465)}
2018-11-29 09:51:53 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:51:54 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:51:54 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:51:54 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 09:51:54 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 09:51:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:51:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:51:54 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:51:54 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:51:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:51:54 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:51:57 [scrapy.extensions.logstats] INFO: Crawled 113 pages (at 2260 pages/min), scraped 106 items (at 2120 items/min)
2018-11-29 09:52:00 [scrapy.extensions.logstats] INFO: Crawled 293 pages (at 3600 pages/min), scraped 289 items (at 3660 items/min)
2018-11-29 09:52:03 [scrapy.extensions.logstats] INFO: Crawled 466 pages (at 3460 pages/min), scraped 452 items (at 3260 items/min)
2018-11-29 09:52:06 [scrapy.extensions.logstats] INFO: Crawled 633 pages (at 3340 pages/min), scraped 603 items (at 3020 items/min)
2018-11-29 09:52:09 [scrapy.extensions.logstats] INFO: Crawled 792 pages (at 3180 pages/min), scraped 785 items (at 3640 items/min)
2018-11-29 09:52:12 [scrapy.extensions.logstats] INFO: Crawled 958 pages (at 3320 pages/min), scraped 956 items (at 3420 items/min)
2018-11-29 09:52:12 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:52:13 [scrapy.extensions.feedexport] INFO: Stored csv feed (1059 items) in: items.csv
The average speed of the spider is 53.5057043813 items/sec
2018-11-29 09:52:13 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 367329,
'downloader/request_count': 1059,
'downloader/request_method_count/GET': 1059,
'downloader/response_bytes': 23312675,
'downloader/response_count': 1059,
'downloader/response_status_count/200': 1059,
'dupefilter/filtered': 14936,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 52, 13, 969656),
'item_scraped_count': 1059,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 46620672,
'memusage/startup': 46620672,
'request_depth_max': 10,
'response_received_count': 1059,
'scheduler/dequeued': 1059,
'scheduler/dequeued/memory': 1059,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 9, 51, 54, 166172)}
2018-11-29 09:52:13 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:52:14 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:52:14 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:52:14 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 09:52:14 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 09:52:14 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:52:14 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:52:14 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:52:14 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:52:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:52:14 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:52:17 [scrapy.extensions.logstats] INFO: Crawled 115 pages (at 2300 pages/min), scraped 108 items (at 2160 items/min)
2018-11-29 09:52:20 [scrapy.extensions.logstats] INFO: Crawled 306 pages (at 3820 pages/min), scraped 287 items (at 3580 items/min)
2018-11-29 09:52:23 [scrapy.extensions.logstats] INFO: Crawled 452 pages (at 2920 pages/min), scraped 452 items (at 3300 items/min)
2018-11-29 09:52:26 [scrapy.extensions.logstats] INFO: Crawled 646 pages (at 3880 pages/min), scraped 615 items (at 3260 items/min)
2018-11-29 09:52:29 [scrapy.extensions.logstats] INFO: Crawled 801 pages (at 3100 pages/min), scraped 799 items (at 3680 items/min)
2018-11-29 09:52:32 [scrapy.extensions.logstats] INFO: Crawled 972 pages (at 3420 pages/min), scraped 965 items (at 3320 items/min)
2018-11-29 09:52:33 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:52:34 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 53.599917227 items/sec
2018-11-29 09:52:34 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363375,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 52, 34, 214603),
'item_scraped_count': 1049,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46620672,
'memusage/startup': 46620672,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 9, 52, 14, 622975)}
2018-11-29 09:52:34 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:52:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:52:34 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:52:34 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 09:52:34 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 09:52:34 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:52:34 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:52:34 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:52:34 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:52:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:52:34 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:52:37 [scrapy.extensions.logstats] INFO: Crawled 122 pages (at 2440 pages/min), scraped 110 items (at 2200 items/min)
2018-11-29 09:52:40 [scrapy.extensions.logstats] INFO: Crawled 308 pages (at 3720 pages/min), scraped 292 items (at 3640 items/min)
2018-11-29 09:52:43 [scrapy.extensions.logstats] INFO: Crawled 474 pages (at 3320 pages/min), scraped 460 items (at 3360 items/min)
2018-11-29 09:52:46 [scrapy.extensions.logstats] INFO: Crawled 651 pages (at 3540 pages/min), scraped 643 items (at 3660 items/min)
2018-11-29 09:52:49 [scrapy.extensions.logstats] INFO: Crawled 817 pages (at 3320 pages/min), scraped 811 items (at 3360 items/min)
2018-11-29 09:52:52 [scrapy.extensions.logstats] INFO: Crawled 1018 pages (at 4020 pages/min), scraped 991 items (at 3600 items/min)
2018-11-29 09:52:53 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:52:53 [scrapy.extensions.feedexport] INFO: Stored csv feed (1028 items) in: items.csv
The average speed of the spider is 55.1098586806 items/sec
2018-11-29 09:52:53 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 356943,
'downloader/request_count': 1028,
'downloader/request_method_count/GET': 1028,
'downloader/response_bytes': 22689867,
'downloader/response_count': 1028,
'downloader/response_status_count/200': 1028,
'dupefilter/filtered': 14613,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 52, 53, 510264),
'item_scraped_count': 1028,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46624768,
'memusage/startup': 46624768,
'request_depth_max': 10,
'response_received_count': 1028,
'scheduler/dequeued': 1028,
'scheduler/dequeued/memory': 1028,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 9, 52, 34, 846166)}
2018-11-29 09:52:53 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:52:54 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:52:54 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:52:54 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 09:52:54 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 09:52:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:52:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:52:54 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:52:54 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:52:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:52:54 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:52:57 [scrapy.extensions.logstats] INFO: Crawled 115 pages (at 2300 pages/min), scraped 106 items (at 2120 items/min)
2018-11-29 09:53:00 [scrapy.extensions.logstats] INFO: Crawled 300 pages (at 3700 pages/min), scraped 281 items (at 3500 items/min)
2018-11-29 09:53:03 [scrapy.extensions.logstats] INFO: Crawled 460 pages (at 3200 pages/min), scraped 454 items (at 3460 items/min)
2018-11-29 09:53:06 [scrapy.extensions.logstats] INFO: Crawled 640 pages (at 3600 pages/min), scraped 620 items (at 3320 items/min)
2018-11-29 09:53:09 [scrapy.extensions.logstats] INFO: Crawled 823 pages (at 3660 pages/min), scraped 787 items (at 3340 items/min)
2018-11-29 09:53:12 [scrapy.extensions.logstats] INFO: Crawled 976 pages (at 3060 pages/min), scraped 970 items (at 3660 items/min)
2018-11-29 09:53:12 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:53:13 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv
The average speed of the spider is 54.0053977703 items/sec
2018-11-29 09:53:13 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 367314,
'downloader/request_count': 1058,
'downloader/request_method_count/GET': 1058,
'downloader/response_bytes': 23258387,
'downloader/response_count': 1058,
'downloader/response_status_count/200': 1058,
'dupefilter/filtered': 14863,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 53, 13, 733749),
'item_scraped_count': 1058,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 25,
'memusage/max': 46620672,
'memusage/startup': 46620672,
'request_depth_max': 10,
'response_received_count': 1058,
'scheduler/dequeued': 1058,
'scheduler/dequeued/memory': 1058,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 9, 52, 54, 162092)}
2018-11-29 09:53:13 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '5'
Mean : 54.2727882098 Median : 54.0053977703 Std Dev : 0.717040552928
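For orientation between the two run groups: the five runs above leave `SCHEDULER_PRIORITY_QUEUE` at its Scrapy default (no override appears in their "Overridden settings" lines), while the five runs that follow override it to `scrapy.pqueues.RoundRobinPriorityQueue`. A minimal sketch of that settings delta, with values copied from the log; the variable names are illustrative and not part of scrapy-bench itself:

```python
# Values copied from the "Overridden settings" lines in this log;
# variable names are illustrative, not part of the scrapy-bench harness.
baseline = {
    'BOT_NAME': 'books',
    'CONCURRENT_REQUESTS': 120,
    'CLOSESPIDER_ITEMCOUNT': 1000,
    'RETRY_ENABLED': False,
    'LOGSTATS_INTERVAL': 3,
}

# The round-robin runs below differ only in the scheduler priority queue:
round_robin = dict(
    baseline,
    SCHEDULER_PRIORITY_QUEUE='scrapy.pqueues.RoundRobinPriorityQueue',
)
```

Everything else (concurrency, item cap, logging interval) is held constant across groups, so the mean/median/stddev summaries are comparing the queue implementations alone.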
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-2.7/
2018-11-29 10:09:52 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:09:52 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:09:52 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.RoundRobinPriorityQueue'}
2018-11-29 10:09:52 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:09:52 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:09:52 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:09:52 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:09:52 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:09:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:09:52 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:09:55 [scrapy.extensions.logstats] INFO: Crawled 123 pages (at 2460 pages/min), scraped 104 items (at 2080 items/min)
2018-11-29 10:09:58 [scrapy.extensions.logstats] INFO: Crawled 316 pages (at 3860 pages/min), scraped 287 items (at 3660 items/min)
2018-11-29 10:10:01 [scrapy.extensions.logstats] INFO: Crawled 467 pages (at 3020 pages/min), scraped 456 items (at 3380 items/min)
2018-11-29 10:10:04 [scrapy.extensions.logstats] INFO: Crawled 630 pages (at 3260 pages/min), scraped 630 items (at 3480 items/min)
2018-11-29 10:10:07 [scrapy.extensions.logstats] INFO: Crawled 822 pages (at 3840 pages/min), scraped 803 items (at 3460 items/min)
2018-11-29 10:10:10 [scrapy.extensions.logstats] INFO: Crawled 989 pages (at 3340 pages/min), scraped 972 items (at 3380 items/min)
2018-11-29 10:10:10 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:10:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 54.0142018659 items/sec
2018-11-29 10:10:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363934,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 10, 11, 505091),
'item_scraped_count': 1049,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46661632,
'memusage/startup': 46661632,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 9, 52, 76579)}
2018-11-29 10:10:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:10:12 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:10:12 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:10:12 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.RoundRobinPriorityQueue'}
2018-11-29 10:10:12 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:10:12 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:10:12 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:10:12 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:10:12 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:10:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:10:12 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:10:15 [scrapy.extensions.logstats] INFO: Crawled 123 pages (at 2460 pages/min), scraped 101 items (at 2020 items/min)
2018-11-29 10:10:18 [scrapy.extensions.logstats] INFO: Crawled 298 pages (at 3500 pages/min), scraped 290 items (at 3780 items/min)
2018-11-29 10:10:21 [scrapy.extensions.logstats] INFO: Crawled 455 pages (at 3140 pages/min), scraped 447 items (at 3140 items/min)
2018-11-29 10:10:24 [scrapy.extensions.logstats] INFO: Crawled 641 pages (at 3720 pages/min), scraped 627 items (at 3600 items/min)
2018-11-29 10:10:27 [scrapy.extensions.logstats] INFO: Crawled 826 pages (at 3700 pages/min), scraped 802 items (at 3500 items/min)
2018-11-29 10:10:30 [scrapy.extensions.logstats] INFO: Crawled 1000 pages (at 3480 pages/min), scraped 975 items (at 3460 items/min)
2018-11-29 10:10:30 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:10:31 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 54.6088343637 items/sec
2018-11-29 10:10:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363475,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 10, 31, 374879),
'item_scraped_count': 1049,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46657536,
'memusage/startup': 46657536,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 10, 12, 160928)}
2018-11-29 10:10:31 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:10:31 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:10:31 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:10:31 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.RoundRobinPriorityQueue'}
2018-11-29 10:10:31 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:10:32 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:10:32 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:10:32 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:10:32 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:10:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:10:32 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:10:35 [scrapy.extensions.logstats] INFO: Crawled 117 pages (at 2340 pages/min), scraped 103 items (at 2060 items/min)
2018-11-29 10:10:38 [scrapy.extensions.logstats] INFO: Crawled 307 pages (at 3800 pages/min), scraped 281 items (at 3560 items/min)
2018-11-29 10:10:41 [scrapy.extensions.logstats] INFO: Crawled 480 pages (at 3460 pages/min), scraped 454 items (at 3460 items/min)
2018-11-29 10:10:44 [scrapy.extensions.logstats] INFO: Crawled 646 pages (at 3320 pages/min), scraped 626 items (at 3440 items/min)
2018-11-29 10:10:47 [scrapy.extensions.logstats] INFO: Crawled 805 pages (at 3180 pages/min), scraped 798 items (at 3440 items/min)
2018-11-29 10:10:50 [scrapy.extensions.logstats] INFO: Crawled 1013 pages (at 4160 pages/min), scraped 975 items (at 3540 items/min)
2018-11-29 10:10:50 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:10:51 [scrapy.extensions.feedexport] INFO: Stored csv feed (1046 items) in: items.csv
The average speed of the spider is 54.728090246 items/sec
2018-11-29 10:10:51 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362920,
'downloader/request_count': 1046,
'downloader/request_method_count/GET': 1046,
'downloader/response_bytes': 22962086,
'downloader/response_count': 1046,
'downloader/response_status_count/200': 1046,
'dupefilter/filtered': 14627,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 10, 51, 145362),
'item_scraped_count': 1046,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 35,
'memusage/max': 46657536,
'memusage/startup': 46657536,
'request_depth_max': 9,
'response_received_count': 1046,
'scheduler/dequeued': 1046,
'scheduler/dequeued/memory': 1046,
'scheduler/enqueued': 1080,
'scheduler/enqueued/memory': 1080,
'start_time': datetime.datetime(2018, 11, 29, 10, 10, 32, 21111)}
2018-11-29 10:10:51 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:10:51 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:10:51 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:10:51 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.RoundRobinPriorityQueue'}
2018-11-29 10:10:51 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:10:51 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:10:51 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:10:51 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:10:51 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:10:51 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:10:51 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:10:54 [scrapy.extensions.logstats] INFO: Crawled 135 pages (at 2700 pages/min), scraped 104 items (at 2080 items/min)
2018-11-29 10:10:57 [scrapy.extensions.logstats] INFO: Crawled 301 pages (at 3320 pages/min), scraped 295 items (at 3820 items/min)
2018-11-29 10:11:00 [scrapy.extensions.logstats] INFO: Crawled 477 pages (at 3520 pages/min), scraped 446 items (at 3020 items/min)
2018-11-29 10:11:03 [scrapy.extensions.logstats] INFO: Crawled 648 pages (at 3420 pages/min), scraped 631 items (at 3700 items/min)
2018-11-29 10:11:06 [scrapy.extensions.logstats] INFO: Crawled 803 pages (at 3100 pages/min), scraped 801 items (at 3400 items/min)
2018-11-29 10:11:10 [scrapy.extensions.logstats] INFO: Crawled 996 pages (at 3860 pages/min), scraped 985 items (at 3680 items/min)
2018-11-29 10:11:10 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:11:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv
The average speed of the spider is 54.5775567775 items/sec
2018-11-29 10:11:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 367892,
'downloader/request_count': 1058,
'downloader/request_method_count/GET': 1058,
'downloader/response_bytes': 23261366,
'downloader/response_count': 1058,
'downloader/response_status_count/200': 1058,
'dupefilter/filtered': 14863,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 11, 11, 185001),
'item_scraped_count': 1058,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 46657536,
'memusage/startup': 46657536,
'request_depth_max': 9,
'response_received_count': 1058,
'scheduler/dequeued': 1058,
'scheduler/dequeued/memory': 1058,
'scheduler/enqueued': 1081,
'scheduler/enqueued/memory': 1081,
'start_time': datetime.datetime(2018, 11, 29, 10, 10, 51, 787689)}
2018-11-29 10:11:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:11:11 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:11:11 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:11:11 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv', 'SCHEDULER_PRIORITY_QUEUE': 'scrapy.pqueues.RoundRobinPriorityQueue'}
2018-11-29 10:11:11 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:11:11 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:11:11 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:11:11 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:11:11 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:11:11 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:11:11 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:11:14 [scrapy.extensions.logstats] INFO: Crawled 116 pages (at 2320 pages/min), scraped 108 items (at 2160 items/min)
2018-11-29 10:11:17 [scrapy.extensions.logstats] INFO: Crawled 309 pages (at 3860 pages/min), scraped 283 items (at 3500 items/min)
2018-11-29 10:11:20 [scrapy.extensions.logstats] INFO: Crawled 456 pages (at 2940 pages/min), scraped 456 items (at 3460 items/min)
2018-11-29 10:11:23 [scrapy.extensions.logstats] INFO: Crawled 648 pages (at 3840 pages/min), scraped 635 items (at 3580 items/min)
2018-11-29 10:11:27 [scrapy.extensions.logstats] INFO: Crawled 817 pages (at 3380 pages/min), scraped 805 items (at 3400 items/min)
2018-11-29 10:11:29 [scrapy.extensions.logstats] INFO: Crawled 986 pages (at 3380 pages/min), scraped 979 items (at 3480 items/min)
2018-11-29 10:11:30 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:11:31 [scrapy.extensions.feedexport] INFO: Stored csv feed (1059 items) in: items.csv
The average speed of the spider is 54.2897166857 items/sec
2018-11-29 10:11:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 367769,
'downloader/request_count': 1059,
'downloader/request_method_count/GET': 1059,
'downloader/response_bytes': 23312675,
'downloader/response_count': 1059,
'downloader/response_status_count/200': 1059,
'dupefilter/filtered': 14936,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 11, 31, 244085),
'item_scraped_count': 1059,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 46657536,
'memusage/startup': 46657536,
'request_depth_max': 10,
'response_received_count': 1059,
'scheduler/dequeued': 1059,
'scheduler/dequeued/memory': 1059,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 11, 11, 823783)}
2018-11-29 10:11:31 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '5'
Mean : 54.4436799878 Median : 54.5775567775 Std Dev : 0.258551470022
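The five runs summarized above were executed with the scheduler priority queue overridden, as shown in each run's "Overridden settings" line. A minimal sketch of the project setting involved (the `RoundRobinPriorityQueue` class path is taken from this log and belongs to the development branch under test; released Scrapy versions may use a different name):

```python
# settings.py -- sketch of the override benchmarked above.
# The class path is copied from the log's "Overridden settings" line
# and is specific to the branch being benchmarked.
SCHEDULER_PRIORITY_QUEUE = 'scrapy.pqueues.RoundRobinPriorityQueue'
```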
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-2.7/
2018-11-29 10:20:13 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:20:13 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:20:13 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:20:13 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:20:13 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:20:13 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:20:13 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:20:13 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:20:13 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:20:16 [scrapy.extensions.logstats] INFO: Crawled 117 pages (at 2340 pages/min), scraped 106 items (at 2120 items/min)
2018-11-29 10:20:19 [scrapy.extensions.logstats] INFO: Crawled 292 pages (at 3500 pages/min), scraped 284 items (at 3560 items/min)
2018-11-29 10:20:22 [scrapy.extensions.logstats] INFO: Crawled 477 pages (at 3700 pages/min), scraped 441 items (at 3140 items/min)
2018-11-29 10:20:25 [scrapy.extensions.logstats] INFO: Crawled 629 pages (at 3040 pages/min), scraped 623 items (at 3640 items/min)
2018-11-29 10:20:28 [scrapy.extensions.logstats] INFO: Crawled 816 pages (at 3740 pages/min), scraped 800 items (at 3540 items/min)
2018-11-29 10:20:31 [scrapy.extensions.logstats] INFO: Crawled 998 pages (at 3640 pages/min), scraped 980 items (at 3600 items/min)
2018-11-29 10:20:31 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:20:32 [scrapy.extensions.feedexport] INFO: Stored csv feed (1028 items) in: items.csv
The average speed of the spider is 54.0763760286 items/sec
2018-11-29 10:20:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 356434,
'downloader/request_count': 1028,
'downloader/request_method_count/GET': 1028,
'downloader/response_bytes': 22689867,
'downloader/response_count': 1028,
'downloader/response_status_count/200': 1028,
'dupefilter/filtered': 14613,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 20, 32, 40008),
'item_scraped_count': 1028,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46452736,
'memusage/startup': 46452736,
'request_depth_max': 10,
'response_received_count': 1028,
'scheduler/dequeued': 1028,
'scheduler/dequeued/memory': 1028,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 20, 13, 120222)}
2018-11-29 10:20:32 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:20:32 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:20:32 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:20:32 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:20:32 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:20:32 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:20:32 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:20:32 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:20:32 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:20:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:20:35 [scrapy.extensions.logstats] INFO: Crawled 123 pages (at 2460 pages/min), scraped 103 items (at 2060 items/min)
2018-11-29 10:20:38 [scrapy.extensions.logstats] INFO: Crawled 290 pages (at 3340 pages/min), scraped 288 items (at 3700 items/min)
2018-11-29 10:20:41 [scrapy.extensions.logstats] INFO: Crawled 470 pages (at 3600 pages/min), scraped 457 items (at 3380 items/min)
2018-11-29 10:20:44 [scrapy.extensions.logstats] INFO: Crawled 658 pages (at 3760 pages/min), scraped 626 items (at 3380 items/min)
2018-11-29 10:20:47 [scrapy.extensions.logstats] INFO: Crawled 826 pages (at 3360 pages/min), scraped 811 items (at 3700 items/min)
2018-11-29 10:20:50 [scrapy.extensions.logstats] INFO: Crawled 994 pages (at 3360 pages/min), scraped 987 items (at 3520 items/min)
2018-11-29 10:20:51 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:20:51 [scrapy.extensions.feedexport] INFO: Stored csv feed (1052 items) in: items.csv
The average speed of the spider is 54.9769549264 items/sec
2018-11-29 10:20:51 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 365067,
'downloader/request_count': 1052,
'downloader/request_method_count/GET': 1052,
'downloader/response_bytes': 23177070,
'downloader/response_count': 1052,
'downloader/response_status_count/200': 1052,
'dupefilter/filtered': 14873,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 20, 51, 778587),
'item_scraped_count': 1052,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 31,
'memusage/max': 46456832,
'memusage/startup': 46456832,
'request_depth_max': 10,
'response_received_count': 1052,
'scheduler/dequeued': 1052,
'scheduler/dequeued/memory': 1052,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 20, 32, 620265)}
2018-11-29 10:20:51 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:20:52 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:20:52 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:20:52 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:20:52 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:20:52 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:20:52 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:20:52 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:20:52 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:20:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:20:55 [scrapy.extensions.logstats] INFO: Crawled 121 pages (at 2420 pages/min), scraped 107 items (at 2140 items/min)
2018-11-29 10:20:58 [scrapy.extensions.logstats] INFO: Crawled 300 pages (at 3580 pages/min), scraped 287 items (at 3600 items/min)
2018-11-29 10:21:01 [scrapy.extensions.logstats] INFO: Crawled 461 pages (at 3220 pages/min), scraped 455 items (at 3360 items/min)
2018-11-29 10:21:04 [scrapy.extensions.logstats] INFO: Crawled 634 pages (at 3460 pages/min), scraped 626 items (at 3420 items/min)
2018-11-29 10:21:07 [scrapy.extensions.logstats] INFO: Crawled 801 pages (at 3340 pages/min), scraped 788 items (at 3240 items/min)
2018-11-29 10:21:10 [scrapy.extensions.logstats] INFO: Crawled 994 pages (at 3860 pages/min), scraped 960 items (at 3440 items/min)
2018-11-29 10:21:11 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:21:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 53.7050977679 items/sec
2018-11-29 10:21:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 364014,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 21, 11, 885192),
'item_scraped_count': 1049,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46456832,
'memusage/startup': 46456832,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 20, 52, 342563)}
2018-11-29 10:21:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:21:12 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:21:12 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:21:12 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:21:12 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:21:12 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:21:12 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:21:12 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:21:12 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:21:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:21:15 [scrapy.extensions.logstats] INFO: Crawled 108 pages (at 2160 pages/min), scraped 106 items (at 2120 items/min)
2018-11-29 10:21:18 [scrapy.extensions.logstats] INFO: Crawled 293 pages (at 3700 pages/min), scraped 285 items (at 3580 items/min)
2018-11-29 10:21:21 [scrapy.extensions.logstats] INFO: Crawled 470 pages (at 3540 pages/min), scraped 451 items (at 3320 items/min)
2018-11-29 10:21:24 [scrapy.extensions.logstats] INFO: Crawled 630 pages (at 3200 pages/min), scraped 624 items (at 3460 items/min)
2018-11-29 10:21:27 [scrapy.extensions.logstats] INFO: Crawled 808 pages (at 3560 pages/min), scraped 797 items (at 3460 items/min)
2018-11-29 10:21:30 [scrapy.extensions.logstats] INFO: Crawled 981 pages (at 3460 pages/min), scraped 968 items (at 3420 items/min)
2018-11-29 10:21:31 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:21:31 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 53.8195090969 items/sec
2018-11-29 10:21:31 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362732,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 21, 31, 862245),
'item_scraped_count': 1044,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46452736,
'memusage/startup': 46452736,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 21, 12, 461530)}
2018-11-29 10:21:31 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:21:32 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:21:32 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:21:32 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:21:32 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:21:32 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:21:32 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:21:32 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:21:32 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:21:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:21:35 [scrapy.extensions.logstats] INFO: Crawled 130 pages (at 2600 pages/min), scraped 103 items (at 2060 items/min)
2018-11-29 10:21:38 [scrapy.extensions.logstats] INFO: Crawled 291 pages (at 3220 pages/min), scraped 290 items (at 3740 items/min)
2018-11-29 10:21:41 [scrapy.extensions.logstats] INFO: Crawled 480 pages (at 3780 pages/min), scraped 452 items (at 3240 items/min)
2018-11-29 10:21:44 [scrapy.extensions.logstats] INFO: Crawled 645 pages (at 3300 pages/min), scraped 635 items (at 3660 items/min)
2018-11-29 10:21:47 [scrapy.extensions.logstats] INFO: Crawled 813 pages (at 3360 pages/min), scraped 811 items (at 3520 items/min)
2018-11-29 10:21:50 [scrapy.extensions.logstats] INFO: Crawled 998 pages (at 3700 pages/min), scraped 990 items (at 3580 items/min)
2018-11-29 10:21:50 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:21:51 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 54.5206981948 items/sec
2018-11-29 10:21:51 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362562,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 21, 51, 540612),
'item_scraped_count': 1044,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46452736,
'memusage/startup': 46452736,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 21, 32, 413729)}
2018-11-29 10:21:51 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '5'
Mean : 54.2197272029 Median : 54.0763760286 Std Dev : 0.471072162224
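The summary line above can be reproduced from the five per-run speeds reported by the Scrapy 1.5.1 runs. Note that the reported "Std Dev" matches the population standard deviation (`statistics.pstdev`, ddof=0), not the sample standard deviation:

```python
import statistics

# Per-run speeds (items/sec) from the five runs above
speeds = [54.0763760286, 54.9769549264, 53.7050977679,
          53.8195090969, 54.5206981948]

mean = statistics.mean(speeds)      # 54.2197272029
median = statistics.median(speeds)  # 54.0763760286
stdev = statistics.pstdev(speeds)   # population std dev: 0.471072162224

print(mean, median, stdev)
```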
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-2.7/
2018-11-29 10:26:33 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:26:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:26:33 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:26:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:26:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:26:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:26:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:26:33 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:26:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:26:36 [scrapy.extensions.logstats] INFO: Crawled 112 pages (at 2240 pages/min), scraped 96 items (at 1920 items/min)
2018-11-29 10:26:39 [scrapy.extensions.logstats] INFO: Crawled 289 pages (at 3540 pages/min), scraped 273 items (at 3540 items/min)
2018-11-29 10:26:42 [scrapy.extensions.logstats] INFO: Crawled 442 pages (at 3060 pages/min), scraped 427 items (at 3080 items/min)
2018-11-29 10:26:45 [scrapy.extensions.logstats] INFO: Crawled 609 pages (at 3340 pages/min), scraped 594 items (at 3340 items/min)
2018-11-29 10:26:48 [scrapy.extensions.logstats] INFO: Crawled 773 pages (at 3280 pages/min), scraped 758 items (at 3280 items/min)
2018-11-29 10:26:51 [scrapy.extensions.logstats] INFO: Crawled 933 pages (at 3200 pages/min), scraped 920 items (at 3240 items/min)
2018-11-29 10:26:53 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:26:54 [scrapy.extensions.feedexport] INFO: Stored csv feed (1070 items) in: items.csv
The average speed of the spider is 51.9762194939 items/sec
2018-11-29 10:26:54 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 371166,
'downloader/request_count': 1070,
'downloader/request_method_count/GET': 1070,
'downloader/response_bytes': 4169684,
'downloader/response_count': 1070,
'downloader/response_status_count/200': 1070,
'dupefilter/filtered': 15079,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 26, 54, 312540),
'item_scraped_count': 1070,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46456832,
'memusage/startup': 46456832,
'request_depth_max': 10,
'response_received_count': 1070,
'scheduler/dequeued': 1070,
'scheduler/dequeued/memory': 1070,
'scheduler/enqueued': 1103,
'scheduler/enqueued/memory': 1103,
'start_time': datetime.datetime(2018, 11, 29, 10, 26, 33, 708329)}
2018-11-29 10:26:54 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:26:54 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:26:54 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:26:54 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:26:54 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:26:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:26:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:26:54 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:26:54 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:26:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:26:57 [scrapy.extensions.logstats] INFO: Crawled 109 pages (at 2180 pages/min), scraped 90 items (at 1800 items/min)
2018-11-29 10:27:00 [scrapy.extensions.logstats] INFO: Crawled 277 pages (at 3360 pages/min), scraped 262 items (at 3440 items/min)
2018-11-29 10:27:03 [scrapy.extensions.logstats] INFO: Crawled 433 pages (at 3120 pages/min), scraped 421 items (at 3180 items/min)
2018-11-29 10:27:06 [scrapy.extensions.logstats] INFO: Crawled 601 pages (at 3360 pages/min), scraped 585 items (at 3280 items/min)
2018-11-29 10:27:09 [scrapy.extensions.logstats] INFO: Crawled 765 pages (at 3280 pages/min), scraped 750 items (at 3300 items/min)
2018-11-29 10:27:12 [scrapy.extensions.logstats] INFO: Crawled 929 pages (at 3280 pages/min), scraped 915 items (at 3300 items/min)
2018-11-29 10:27:14 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:27:15 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 51.4176928926 items/sec
2018-11-29 10:27:15 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362970,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 4072056,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 27, 15, 102491),
'item_scraped_count': 1044,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 46456832,
'memusage/startup': 46456832,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 26, 54, 879857)}
2018-11-29 10:27:15 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:27:15 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:27:15 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:27:15 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:27:15 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:27:15 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:27:15 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:27:15 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:27:15 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:27:15 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:27:18 [scrapy.extensions.logstats] INFO: Crawled 114 pages (at 2280 pages/min), scraped 98 items (at 1960 items/min)
2018-11-29 10:27:21 [scrapy.extensions.logstats] INFO: Crawled 282 pages (at 3360 pages/min), scraped 273 items (at 3500 items/min)
2018-11-29 10:27:24 [scrapy.extensions.logstats] INFO: Crawled 442 pages (at 3200 pages/min), scraped 433 items (at 3200 items/min)
2018-11-29 10:27:27 [scrapy.extensions.logstats] INFO: Crawled 610 pages (at 3360 pages/min), scraped 595 items (at 3240 items/min)
2018-11-29 10:27:30 [scrapy.extensions.logstats] INFO: Crawled 771 pages (at 3220 pages/min), scraped 760 items (at 3300 items/min)
2018-11-29 10:27:33 [scrapy.extensions.logstats] INFO: Crawled 938 pages (at 3340 pages/min), scraped 930 items (at 3400 items/min)
2018-11-29 10:27:34 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:27:35 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 52.0539729896 items/sec
2018-11-29 10:27:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 364301,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 4091076,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 27, 35, 768699),
'item_scraped_count': 1049,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46456832,
'memusage/startup': 46456832,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 27, 15, 658557)}
2018-11-29 10:27:35 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:27:36 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:27:36 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:27:36 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:27:36 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:27:36 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:27:36 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:27:36 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:27:36 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:27:36 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:27:39 [scrapy.extensions.logstats] INFO: Crawled 111 pages (at 2220 pages/min), scraped 98 items (at 1960 items/min)
2018-11-29 10:27:42 [scrapy.extensions.logstats] INFO: Crawled 285 pages (at 3480 pages/min), scraped 272 items (at 3480 items/min)
2018-11-29 10:27:45 [scrapy.extensions.logstats] INFO: Crawled 445 pages (at 3200 pages/min), scraped 430 items (at 3160 items/min)
2018-11-29 10:27:48 [scrapy.extensions.logstats] INFO: Crawled 620 pages (at 3500 pages/min), scraped 604 items (at 3480 items/min)
2018-11-29 10:27:51 [scrapy.extensions.logstats] INFO: Crawled 781 pages (at 3220 pages/min), scraped 767 items (at 3260 items/min)
2018-11-29 10:27:54 [scrapy.extensions.logstats] INFO: Crawled 949 pages (at 3360 pages/min), scraped 933 items (at 3320 items/min)
2018-11-29 10:27:55 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:27:56 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 52.2991827852 items/sec
2018-11-29 10:27:56 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363833,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 4091076,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 27, 56, 342556),
'item_scraped_count': 1049,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46456832,
'memusage/startup': 46456832,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 27, 36, 337224)}
2018-11-29 10:27:56 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:27:56 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 10:27:56 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 2.7.5 (default, Sep 12 2018, 05:31:16) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:27:56 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'books.spiders', 'CLOSESPIDER_ITEMCOUNT': 1000, 'FEED_URI': 'items.csv', 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'CONCURRENT_REQUESTS': 120, 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders'], 'BOT_NAME': 'books', 'LOGSTATS_INTERVAL': 3, 'FEED_FORMAT': 'csv'}
2018-11-29 10:27:56 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-11-29 10:27:56 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:27:56 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:27:56 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:27:56 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:27:56 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:27:59 [scrapy.extensions.logstats] INFO: Crawled 114 pages (at 2280 pages/min), scraped 98 items (at 1960 items/min)
2018-11-29 10:28:02 [scrapy.extensions.logstats] INFO: Crawled 282 pages (at 3360 pages/min), scraped 270 items (at 3440 items/min)
2018-11-29 10:28:05 [scrapy.extensions.logstats] INFO: Crawled 442 pages (at 3200 pages/min), scraped 433 items (at 3260 items/min)
2018-11-29 10:28:08 [scrapy.extensions.logstats] INFO: Crawled 602 pages (at 3200 pages/min), scraped 594 items (at 3220 items/min)
2018-11-29 10:28:11 [scrapy.extensions.logstats] INFO: Crawled 762 pages (at 3200 pages/min), scraped 750 items (at 3120 items/min)
2018-11-29 10:28:14 [scrapy.extensions.logstats] INFO: Crawled 926 pages (at 3280 pages/min), scraped 914 items (at 3280 items/min)
2018-11-29 10:28:16 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:28:17 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 51.6355868318 items/sec
2018-11-29 10:28:17 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363596,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 4091076,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 28, 17, 193057),
'item_scraped_count': 1049,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 46452736,
'memusage/startup': 46452736,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 27, 56, 892126)}
2018-11-29 10:28:17 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec):
Test = 'Book Spider' Iterations = '5'
Mean : 51.8765309986 Median : 51.9762194939 Std Dev : 0.312616214555
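The summary line above can be reproduced from the five per-run speeds printed by this benchmark. A small sketch using Python's stdlib `statistics` module; note that the reported Std Dev corresponds to `statistics.pstdev` (population standard deviation, dividing by n rather than n - 1):

```python
# Recompute Mean / Median / Std Dev from the five per-run speeds
# (items/sec) logged in the Python 2.7 round-robin benchmark above.
import statistics

speeds = [
    51.9762194939,
    51.4176928926,
    52.0539729896,
    52.2991827852,
    51.6355868318,
]

mean = statistics.mean(speeds)       # 51.8765309986...
median = statistics.median(speeds)   # 51.9762194939
std_dev = statistics.pstdev(speeds)  # 0.3126162145..., i.e. population std dev

print("Mean : %s Median : %s Std Dev : %s" % (mean, median, std_dev))
```

These agree with the summary block printed by scrapy-bench, confirming it averages the individual run speeds and uses the population (not sample) standard deviation.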
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-3.6/
2018-11-29 10:14:44 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:14:44 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:14:44 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:14:44 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:14:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:14:44 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:14:44 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:14:44 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:14:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:14:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:14:47 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 85 items (at 1700 items/min)
2018-11-29 10:14:50 [scrapy.extensions.logstats] INFO: Crawled 262 pages (at 3380 pages/min), scraped 255 items (at 3400 items/min)
2018-11-29 10:14:53 [scrapy.extensions.logstats] INFO: Crawled 426 pages (at 3280 pages/min), scraped 424 items (at 3380 items/min)
2018-11-29 10:14:56 [scrapy.extensions.logstats] INFO: Crawled 602 pages (at 3520 pages/min), scraped 583 items (at 3180 items/min)
2018-11-29 10:14:59 [scrapy.extensions.logstats] INFO: Crawled 761 pages (at 3180 pages/min), scraped 753 items (at 3400 items/min)
2018-11-29 10:15:02 [scrapy.extensions.logstats] INFO: Crawled 924 pages (at 3260 pages/min), scraped 911 items (at 3160 items/min)
2018-11-29 10:15:03 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:15:04 [scrapy.extensions.feedexport] INFO: Stored csv feed (1048 items) in: items.csv
The average speed of the spider is 51.12914776632539 items/sec
2018-11-29 10:15:04 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363184,
'downloader/request_count': 1048,
'downloader/request_method_count/GET': 1048,
'downloader/response_bytes': 23064841,
'downloader/response_count': 1048,
'downloader/response_status_count/200': 1048,
'dupefilter/filtered': 14773,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 15, 4, 565306),
'item_scraped_count': 1048,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 35,
'memusage/max': 45412352,
'memusage/startup': 45412352,
'request_depth_max': 10,
'response_received_count': 1048,
'scheduler/dequeued': 1048,
'scheduler/dequeued/memory': 1048,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 14, 44, 106279)}
2018-11-29 10:15:04 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:15:05 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:15:05 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:15:05 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:15:05 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:15:05 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:15:05 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:15:05 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:15:05 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:15:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:15:05 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:15:08 [scrapy.extensions.logstats] INFO: Crawled 92 pages (at 1840 pages/min), scraped 85 items (at 1700 items/min)
2018-11-29 10:15:11 [scrapy.extensions.logstats] INFO: Crawled 262 pages (at 3400 pages/min), scraped 255 items (at 3400 items/min)
2018-11-29 10:15:14 [scrapy.extensions.logstats] INFO: Crawled 431 pages (at 3380 pages/min), scraped 424 items (at 3380 items/min)
2018-11-29 10:15:17 [scrapy.extensions.logstats] INFO: Crawled 610 pages (at 3580 pages/min), scraped 587 items (at 3260 items/min)
2018-11-29 10:15:20 [scrapy.extensions.logstats] INFO: Crawled 768 pages (at 3160 pages/min), scraped 749 items (at 3240 items/min)
2018-11-29 10:15:23 [scrapy.extensions.logstats] INFO: Crawled 931 pages (at 3260 pages/min), scraped 922 items (at 3460 items/min)
2018-11-29 10:15:25 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:15:25 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 51.56846499863366 items/sec
2018-11-29 10:15:25 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363582,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 15, 25, 786526),
'item_scraped_count': 1049,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45412352,
'memusage/startup': 45412352,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 15, 5, 495634)}
2018-11-29 10:15:25 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:15:26 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:15:26 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:15:26 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:15:26 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:15:26 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:15:26 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:15:26 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:15:26 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:15:26 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:15:26 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:15:29 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 86 items (at 1720 items/min)
2018-11-29 10:15:32 [scrapy.extensions.logstats] INFO: Crawled 274 pages (at 3620 pages/min), scraped 261 items (at 3500 items/min)
2018-11-29 10:15:35 [scrapy.extensions.logstats] INFO: Crawled 433 pages (at 3180 pages/min), scraped 419 items (at 3160 items/min)
2018-11-29 10:15:38 [scrapy.extensions.logstats] INFO: Crawled 595 pages (at 3240 pages/min), scraped 581 items (at 3240 items/min)
2018-11-29 10:15:41 [scrapy.extensions.logstats] INFO: Crawled 756 pages (at 3220 pages/min), scraped 756 items (at 3500 items/min)
2018-11-29 10:15:44 [scrapy.extensions.logstats] INFO: Crawled 940 pages (at 3680 pages/min), scraped 928 items (at 3440 items/min)
2018-11-29 10:15:46 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:15:46 [scrapy.extensions.feedexport] INFO: Stored csv feed (1028 items) in: items.csv
The average speed of the spider is 51.51978070283345 items/sec
2018-11-29 10:15:46 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 356398,
'downloader/request_count': 1028,
'downloader/request_method_count/GET': 1028,
'downloader/response_bytes': 22689867,
'downloader/response_count': 1028,
'downloader/response_status_count/200': 1028,
'dupefilter/filtered': 14613,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 15, 46, 603307),
'item_scraped_count': 1028,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45412352,
'memusage/startup': 45412352,
'request_depth_max': 10,
'response_received_count': 1028,
'scheduler/dequeued': 1028,
'scheduler/dequeued/memory': 1028,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 15, 26, 686372)}
2018-11-29 10:15:46 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:15:47 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:15:47 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:15:47 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:15:47 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:15:47 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:15:47 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:15:47 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:15:47 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:15:47 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:15:47 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:15:50 [scrapy.extensions.logstats] INFO: Crawled 88 pages (at 1760 pages/min), scraped 81 items (at 1620 items/min)
2018-11-29 10:15:53 [scrapy.extensions.logstats] INFO: Crawled 273 pages (at 3700 pages/min), scraped 265 items (at 3680 items/min)
2018-11-29 10:15:56 [scrapy.extensions.logstats] INFO: Crawled 437 pages (at 3280 pages/min), scraped 429 items (at 3280 items/min)
2018-11-29 10:15:59 [scrapy.extensions.logstats] INFO: Crawled 598 pages (at 3220 pages/min), scraped 595 items (at 3320 items/min)
2018-11-29 10:16:02 [scrapy.extensions.logstats] INFO: Crawled 772 pages (at 3480 pages/min), scraped 756 items (at 3220 items/min)
2018-11-29 10:16:05 [scrapy.extensions.logstats] INFO: Crawled 955 pages (at 3660 pages/min), scraped 922 items (at 3320 items/min)
2018-11-29 10:16:06 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:16:07 [scrapy.extensions.feedexport] INFO: Stored csv feed (1043 items) in: items.csv
The average speed of the spider is 51.87628543449582 items/sec
2018-11-29 10:16:07 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362309,
'downloader/request_count': 1043,
'downloader/request_method_count/GET': 1043,
'downloader/response_bytes': 22941621,
'downloader/response_count': 1043,
'downloader/response_status_count/200': 1043,
'dupefilter/filtered': 14684,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 16, 7, 595425),
'item_scraped_count': 1043,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 19,
'memusage/max': 45400064,
'memusage/startup': 45400064,
'request_depth_max': 10,
'response_received_count': 1043,
'scheduler/dequeued': 1043,
'scheduler/dequeued/memory': 1043,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 15, 47, 506996)}
2018-11-29 10:16:07 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:16:08 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:16:08 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:16:08 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:16:08 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:16:08 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:16:08 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:16:08 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:16:08 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:16:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:16:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:16:11 [scrapy.extensions.logstats] INFO: Crawled 92 pages (at 1840 pages/min), scraped 92 items (at 1840 items/min)
2018-11-29 10:16:14 [scrapy.extensions.logstats] INFO: Crawled 279 pages (at 3740 pages/min), scraped 261 items (at 3380 items/min)
2018-11-29 10:16:17 [scrapy.extensions.logstats] INFO: Crawled 446 pages (at 3340 pages/min), scraped 415 items (at 3080 items/min)
2018-11-29 10:16:20 [scrapy.extensions.logstats] INFO: Crawled 594 pages (at 2960 pages/min), scraped 586 items (at 3420 items/min)
2018-11-29 10:16:23 [scrapy.extensions.logstats] INFO: Crawled 761 pages (at 3340 pages/min), scraped 749 items (at 3260 items/min)
2018-11-29 10:16:26 [scrapy.extensions.logstats] INFO: Crawled 948 pages (at 3740 pages/min), scraped 921 items (at 3440 items/min)
2018-11-29 10:16:28 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:16:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (1048 items) in: items.csv
The average speed of the spider is 51.690047106138145 items/sec
2018-11-29 10:16:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363320,
'downloader/request_count': 1048,
'downloader/request_method_count/GET': 1048,
'downloader/response_bytes': 23067820,
'downloader/response_count': 1048,
'downloader/response_status_count/200': 1048,
'dupefilter/filtered': 14773,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 16, 28, 777623),
'item_scraped_count': 1048,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45412352,
'memusage/startup': 45412352,
'request_depth_max': 9,
'response_received_count': 1048,
'scheduler/dequeued': 1048,
'scheduler/dequeued/memory': 1048,
'scheduler/enqueued': 1081,
'scheduler/enqueued/memory': 1081,
'start_time': datetime.datetime(2018, 11, 29, 10, 16, 8, 516604)}
2018-11-29 10:16:28 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec):
Test = 'Book Spider' Iterations = '5'
Mean : 51.556745201685295 Median : 51.56846499863366 Std Dev : 0.24666521954163279
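The Mean / Median / Std Dev line above aggregates the five per-run speeds reported by the individual runs. A minimal sketch of how such aggregates can be reproduced with Python's standard library — the `speeds` values below are illustrative placeholders (only three of the five runs appear in this excerpt), and whether scrapy-bench uses the sample or population standard deviation is an assumption here:

```python
import statistics

# Illustrative per-run speeds (items/sec); NOT the actual five benchmark runs.
speeds = [51.52, 51.88, 51.69, 51.30, 51.40]

mean = statistics.mean(speeds)
median = statistics.median(speeds)
stdev = statistics.stdev(speeds)  # sample std dev (n-1 denominator), assumed

print("Mean : %s Median : %s Std Dev : %s" % (mean, median, stdev))
```

With the real five per-run values substituted in, this prints a summary line in the same shape as the one reported by the benchmark harness.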
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-3.6/
2018-11-29 09:48:53 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:48:53 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:48:53 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 09:48:53 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 09:48:53 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:48:53 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:48:53 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:48:53 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:48:53 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:48:53 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:48:57 [scrapy.extensions.logstats] INFO: Crawled 98 pages (at 1960 pages/min), scraped 96 items (at 1920 items/min)
2018-11-29 09:49:00 [scrapy.extensions.logstats] INFO: Crawled 280 pages (at 3640 pages/min), scraped 259 items (at 3260 items/min)
2018-11-29 09:49:02 [scrapy.extensions.logstats] INFO: Crawled 448 pages (at 3360 pages/min), scraped 421 items (at 3240 items/min)
2018-11-29 09:49:06 [scrapy.extensions.logstats] INFO: Crawled 613 pages (at 3300 pages/min), scraped 586 items (at 3300 items/min)
2018-11-29 09:49:08 [scrapy.extensions.logstats] INFO: Crawled 757 pages (at 2880 pages/min), scraped 755 items (at 3380 items/min)
2018-11-29 09:49:12 [scrapy.extensions.logstats] INFO: Crawled 944 pages (at 3740 pages/min), scraped 932 items (at 3540 items/min)
2018-11-29 09:49:13 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:49:13 [scrapy.extensions.feedexport] INFO: Stored csv feed (1047 items) in: items.csv
The average speed of the spider is 52.103400842770014 items/sec
2018-11-29 09:49:13 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362941,
'downloader/request_count': 1047,
'downloader/request_method_count/GET': 1047,
'downloader/response_bytes': 23016374,
'downloader/response_count': 1047,
'downloader/response_status_count/200': 1047,
'dupefilter/filtered': 14700,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 49, 13, 934364),
'item_scraped_count': 1047,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45408256,
'memusage/startup': 45408256,
'request_depth_max': 9,
'response_received_count': 1047,
'scheduler/dequeued': 1047,
'scheduler/dequeued/memory': 1047,
'scheduler/enqueued': 1080,
'scheduler/enqueued/memory': 1080,
'start_time': datetime.datetime(2018, 11, 29, 9, 48, 53, 845726)}
2018-11-29 09:49:13 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:49:14 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:49:14 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:49:14 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 09:49:14 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 09:49:14 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:49:14 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:49:14 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:49:14 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:49:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:49:14 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:49:17 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 85 items (at 1700 items/min)
2018-11-29 09:49:20 [scrapy.extensions.logstats] INFO: Crawled 275 pages (at 3640 pages/min), scraped 268 items (at 3660 items/min)
2018-11-29 09:49:24 [scrapy.extensions.logstats] INFO: Crawled 446 pages (at 3420 pages/min), scraped 434 items (at 3320 items/min)
2018-11-29 09:49:26 [scrapy.extensions.logstats] INFO: Crawled 611 pages (at 3300 pages/min), scraped 596 items (at 3240 items/min)
2018-11-29 09:49:29 [scrapy.extensions.logstats] INFO: Crawled 769 pages (at 3160 pages/min), scraped 761 items (at 3300 items/min)
2018-11-29 09:49:32 [scrapy.extensions.logstats] INFO: Crawled 952 pages (at 3660 pages/min), scraped 944 items (at 3660 items/min)
2018-11-29 09:49:33 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:49:34 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 52.091360755197876 items/sec
2018-11-29 09:49:34 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363552,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 49, 34, 941327),
'item_scraped_count': 1049,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45416448,
'memusage/startup': 45416448,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 9, 49, 14, 851541)}
2018-11-29 09:49:34 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:49:35 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:49:35 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:49:35 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 09:49:35 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 09:49:35 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:49:35 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:49:35 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:49:35 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:49:35 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:49:35 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:49:38 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 85 items (at 1700 items/min)
2018-11-29 09:49:41 [scrapy.extensions.logstats] INFO: Crawled 264 pages (at 3420 pages/min), scraped 257 items (at 3440 items/min)
2018-11-29 09:49:44 [scrapy.extensions.logstats] INFO: Crawled 429 pages (at 3300 pages/min), scraped 424 items (at 3340 items/min)
2018-11-29 09:49:47 [scrapy.extensions.logstats] INFO: Crawled 602 pages (at 3460 pages/min), scraped 583 items (at 3180 items/min)
2018-11-29 09:49:51 [scrapy.extensions.logstats] INFO: Crawled 769 pages (at 3340 pages/min), scraped 748 items (at 3300 items/min)
2018-11-29 09:49:54 [scrapy.extensions.logstats] INFO: Crawled 929 pages (at 3200 pages/min), scraped 921 items (at 3460 items/min)
2018-11-29 09:49:55 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:49:56 [scrapy.extensions.feedexport] INFO: Stored csv feed (1048 items) in: items.csv
The average speed of the spider is 51.49821654192627 items/sec
2018-11-29 09:49:56 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363177,
'downloader/request_count': 1048,
'downloader/request_method_count/GET': 1048,
'downloader/response_bytes': 23064841,
'downloader/response_count': 1048,
'downloader/response_status_count/200': 1048,
'dupefilter/filtered': 14773,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 49, 56, 192262),
'item_scraped_count': 1048,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 35,
'memusage/max': 45408256,
'memusage/startup': 45408256,
'request_depth_max': 10,
'response_received_count': 1048,
'scheduler/dequeued': 1048,
'scheduler/dequeued/memory': 1048,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 9, 49, 35, 858083)}
2018-11-29 09:49:56 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:49:57 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:49:57 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:49:57 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 09:49:57 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 09:49:57 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:49:57 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:49:57 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:49:57 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:49:57 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:49:57 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:50:00 [scrapy.extensions.logstats] INFO: Crawled 84 pages (at 1680 pages/min), scraped 84 items (at 1680 items/min)
2018-11-29 09:50:03 [scrapy.extensions.logstats] INFO: Crawled 267 pages (at 3660 pages/min), scraped 259 items (at 3500 items/min)
2018-11-29 09:50:06 [scrapy.extensions.logstats] INFO: Crawled 428 pages (at 3220 pages/min), scraped 428 items (at 3380 items/min)
2018-11-29 09:50:09 [scrapy.extensions.logstats] INFO: Crawled 603 pages (at 3500 pages/min), scraped 582 items (at 3080 items/min)
2018-11-29 09:50:12 [scrapy.extensions.logstats] INFO: Crawled 759 pages (at 3120 pages/min), scraped 747 items (at 3300 items/min)
2018-11-29 09:50:15 [scrapy.extensions.logstats] INFO: Crawled 948 pages (at 3780 pages/min), scraped 917 items (at 3400 items/min)
2018-11-29 09:50:16 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:50:17 [scrapy.extensions.feedexport] INFO: Stored csv feed (1047 items) in: items.csv
The average speed of the spider is 51.51026875692216 items/sec
2018-11-29 09:50:17 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363350,
'downloader/request_count': 1047,
'downloader/request_method_count/GET': 1047,
'downloader/response_bytes': 23016374,
'downloader/response_count': 1047,
'downloader/response_status_count/200': 1047,
'dupefilter/filtered': 14700,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 50, 17, 441551),
'item_scraped_count': 1047,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45404160,
'memusage/startup': 45404160,
'request_depth_max': 9,
'response_received_count': 1047,
'scheduler/dequeued': 1047,
'scheduler/dequeued/memory': 1047,
'scheduler/enqueued': 1080,
'scheduler/enqueued/memory': 1080,
'start_time': datetime.datetime(2018, 11, 29, 9, 49, 57, 103363)}
2018-11-29 09:50:17 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:50:18 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 09:50:18 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:50:18 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 09:50:18 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 09:50:18 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:50:18 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:50:18 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:50:18 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:50:18 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:50:18 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 09:50:21 [scrapy.extensions.logstats] INFO: Crawled 98 pages (at 1960 pages/min), scraped 90 items (at 1800 items/min)
2018-11-29 09:50:24 [scrapy.extensions.logstats] INFO: Crawled 269 pages (at 3420 pages/min), scraped 261 items (at 3420 items/min)
2018-11-29 09:50:27 [scrapy.extensions.logstats] INFO: Crawled 443 pages (at 3480 pages/min), scraped 428 items (at 3340 items/min)
2018-11-29 09:50:30 [scrapy.extensions.logstats] INFO: Crawled 621 pages (at 3560 pages/min), scraped 590 items (at 3240 items/min)
2018-11-29 09:50:33 [scrapy.extensions.logstats] INFO: Crawled 779 pages (at 3160 pages/min), scraped 761 items (at 3420 items/min)
2018-11-29 09:50:36 [scrapy.extensions.logstats] INFO: Crawled 938 pages (at 3180 pages/min), scraped 932 items (at 3420 items/min)
2018-11-29 09:50:37 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:50:38 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 52.10664560334505 items/sec
2018-11-29 09:50:38 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363515,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 50, 38, 380067),
'item_scraped_count': 1049,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45412352,
'memusage/startup': 45412352,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 9, 50, 18, 346932)}
2018-11-29 09:50:38 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '5'
Mean : 51.86197850003227 Median : 52.091360755197876 Std Dev : 0.29215936184384017
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-3.6/
2018-11-29 10:12:07 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:12:07 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:12:07 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:12:07 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:12:07 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:12:07 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:12:07 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:12:07 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:12:07 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:12:07 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:12:10 [scrapy.extensions.logstats] INFO: Crawled 98 pages (at 1960 pages/min), scraped 91 items (at 1820 items/min)
2018-11-29 10:12:13 [scrapy.extensions.logstats] INFO: Crawled 281 pages (at 3660 pages/min), scraped 256 items (at 3300 items/min)
2018-11-29 10:12:16 [scrapy.extensions.logstats] INFO: Crawled 448 pages (at 3340 pages/min), scraped 422 items (at 3320 items/min)
2018-11-29 10:12:19 [scrapy.extensions.logstats] INFO: Crawled 606 pages (at 3160 pages/min), scraped 587 items (at 3300 items/min)
2018-11-29 10:12:22 [scrapy.extensions.logstats] INFO: Crawled 779 pages (at 3460 pages/min), scraped 760 items (at 3460 items/min)
2018-11-29 10:12:25 [scrapy.extensions.logstats] INFO: Crawled 950 pages (at 3420 pages/min), scraped 930 items (at 3400 items/min)
2018-11-29 10:12:26 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:12:27 [scrapy.extensions.feedexport] INFO: Stored csv feed (1047 items) in: items.csv
The average speed of the spider is 52.264764796054884 items/sec
2018-11-29 10:12:27 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362913,
'downloader/request_count': 1047,
'downloader/request_method_count/GET': 1047,
'downloader/response_bytes': 23016374,
'downloader/response_count': 1047,
'downloader/response_status_count/200': 1047,
'dupefilter/filtered': 14700,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 12, 27, 217546),
'item_scraped_count': 1047,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45436928,
'memusage/startup': 45436928,
'request_depth_max': 9,
'response_received_count': 1047,
'scheduler/dequeued': 1047,
'scheduler/dequeued/memory': 1047,
'scheduler/enqueued': 1080,
'scheduler/enqueued/memory': 1080,
'start_time': datetime.datetime(2018, 11, 29, 10, 12, 7, 214518)}
2018-11-29 10:12:27 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:12:28 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:12:28 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:12:28 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:12:28 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:12:28 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:12:28 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:12:28 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:12:28 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:12:28 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:12:28 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:12:31 [scrapy.extensions.logstats] INFO: Crawled 90 pages (at 1800 pages/min), scraped 82 items (at 1640 items/min)
2018-11-29 10:12:34 [scrapy.extensions.logstats] INFO: Crawled 267 pages (at 3540 pages/min), scraped 260 items (at 3560 items/min)
2018-11-29 10:12:37 [scrapy.extensions.logstats] INFO: Crawled 431 pages (at 3280 pages/min), scraped 425 items (at 3300 items/min)
2018-11-29 10:12:40 [scrapy.extensions.logstats] INFO: Crawled 613 pages (at 3640 pages/min), scraped 584 items (at 3180 items/min)
2018-11-29 10:12:43 [scrapy.extensions.logstats] INFO: Crawled 764 pages (at 3020 pages/min), scraped 762 items (at 3560 items/min)
2018-11-29 10:12:46 [scrapy.extensions.logstats] INFO: Crawled 948 pages (at 3680 pages/min), scraped 924 items (at 3240 items/min)
2018-11-29 10:12:47 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:12:48 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 52.10157002214863 items/sec
2018-11-29 10:12:48 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363571,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 12, 48, 244966),
'item_scraped_count': 1049,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45408256,
'memusage/startup': 45408256,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 12, 28, 125770)}
2018-11-29 10:12:48 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:12:49 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:12:49 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:12:49 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:12:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:12:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:12:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:12:49 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:12:49 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:12:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:12:49 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:12:52 [scrapy.extensions.logstats] INFO: Crawled 86 pages (at 1720 pages/min), scraped 86 items (at 1720 items/min)
2018-11-29 10:12:55 [scrapy.extensions.logstats] INFO: Crawled 282 pages (at 3920 pages/min), scraped 255 items (at 3380 items/min)
2018-11-29 10:12:58 [scrapy.extensions.logstats] INFO: Crawled 440 pages (at 3160 pages/min), scraped 416 items (at 3220 items/min)
2018-11-29 10:13:01 [scrapy.extensions.logstats] INFO: Crawled 596 pages (at 3120 pages/min), scraped 588 items (at 3440 items/min)
2018-11-29 10:13:04 [scrapy.extensions.logstats] INFO: Crawled 776 pages (at 3600 pages/min), scraped 749 items (at 3220 items/min)
2018-11-29 10:13:07 [scrapy.extensions.logstats] INFO: Crawled 942 pages (at 3320 pages/min), scraped 922 items (at 3460 items/min)
2018-11-29 10:13:08 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:13:09 [scrapy.extensions.feedexport] INFO: Stored csv feed (1059 items) in: items.csv
The average speed of the spider is 51.930715786382954 items/sec
2018-11-29 10:13:09 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 368205,
'downloader/request_count': 1059,
'downloader/request_method_count/GET': 1059,
'downloader/response_bytes': 23312675,
'downloader/response_count': 1059,
'downloader/response_status_count/200': 1059,
'dupefilter/filtered': 14936,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 13, 9, 522094),
'item_scraped_count': 1059,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 45408256,
'memusage/startup': 45408256,
'request_depth_max': 10,
'response_received_count': 1059,
'scheduler/dequeued': 1059,
'scheduler/dequeued/memory': 1059,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 12, 49, 143973)}
2018-11-29 10:13:09 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:13:10 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:13:10 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:13:10 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:13:10 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:13:10 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:13:10 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:13:10 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:13:10 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:13:10 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:13:10 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:13:13 [scrapy.extensions.logstats] INFO: Crawled 95 pages (at 1900 pages/min), scraped 88 items (at 1760 items/min)
2018-11-29 10:13:16 [scrapy.extensions.logstats] INFO: Crawled 269 pages (at 3480 pages/min), scraped 257 items (at 3380 items/min)
2018-11-29 10:13:19 [scrapy.extensions.logstats] INFO: Crawled 422 pages (at 3060 pages/min), scraped 416 items (at 3180 items/min)
2018-11-29 10:13:22 [scrapy.extensions.logstats] INFO: Crawled 595 pages (at 3460 pages/min), scraped 581 items (at 3300 items/min)
2018-11-29 10:13:25 [scrapy.extensions.logstats] INFO: Crawled 761 pages (at 3320 pages/min), scraped 755 items (at 3480 items/min)
2018-11-29 10:13:28 [scrapy.extensions.logstats] INFO: Crawled 926 pages (at 3300 pages/min), scraped 920 items (at 3300 items/min)
2018-11-29 10:13:29 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:13:30 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 51.69871550439042 items/sec
2018-11-29 10:13:30 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363583,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 13, 30, 695211),
'item_scraped_count': 1049,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45408256,
'memusage/startup': 45408256,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 10, 13, 10, 425861)}
2018-11-29 10:13:30 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 10:13:31 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-11-29 10:13:31 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 10:13:31 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 10:13:31 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 10:13:31 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 10:13:31 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 10:13:31 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 10:13:31 [scrapy.core.engine] INFO: Spider opened
2018-11-29 10:13:31 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 10:13:31 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-11-29 10:13:34 [scrapy.extensions.logstats] INFO: Crawled 85 pages (at 1700 pages/min), scraped 85 items (at 1700 items/min)
2018-11-29 10:13:37 [scrapy.extensions.logstats] INFO: Crawled 277 pages (at 3840 pages/min), scraped 254 items (at 3380 items/min)
2018-11-29 10:13:40 [scrapy.extensions.logstats] INFO: Crawled 431 pages (at 3080 pages/min), scraped 417 items (at 3260 items/min)
2018-11-29 10:13:43 [scrapy.extensions.logstats] INFO: Crawled 611 pages (at 3600 pages/min), scraped 586 items (at 3380 items/min)
2018-11-29 10:13:46 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 3280 pages/min), scraped 757 items (at 3420 items/min)
2018-11-29 10:13:49 [scrapy.extensions.logstats] INFO: Crawled 932 pages (at 3140 pages/min), scraped 930 items (at 3460 items/min)
2018-11-29 10:13:51 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 10:13:51 [scrapy.extensions.feedexport] INFO: Stored csv feed (1044 items) in: items.csv
The average speed of the spider is 51.745365191545254 items/sec
2018-11-29 10:13:51 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 362646,
'downloader/request_count': 1044,
'downloader/request_method_count/GET': 1044,
'downloader/response_bytes': 22995909,
'downloader/response_count': 1044,
'downloader/response_status_count/200': 1044,
'dupefilter/filtered': 14757,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 10, 13, 51, 714065),
'item_scraped_count': 1044,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 18,
'memusage/max': 45400064,
'memusage/startup': 45400064,
'request_depth_max': 10,
'response_received_count': 1044,
'scheduler/dequeued': 1044,
'scheduler/dequeued/memory': 1044,
'scheduler/enqueued': 1061,
'scheduler/enqueued/memory': 1061,
'start_time': datetime.datetime(2018, 11, 29, 10, 13, 31, 585460)}
2018-11-29 10:13:51 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '5'
Mean : 51.94822626010443 Median : 51.930715786382954 Std Dev : 0.21327250227979302
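The summary line above can be reproduced from the five per-run speeds printed in this batch (52.264…, 52.101…, 51.930…, 51.698…, 51.745… items/sec). A minimal sketch using Python's `statistics` module; note scrapy-bench evidently reports the population standard deviation (`pstdev`, dividing by n rather than n-1), since that is what matches the logged value:

```python
import statistics

# Per-run speeds (items/sec) copied from the five runs in this batch
speeds = [
    52.264764796054884,
    52.10157002214863,
    51.930715786382954,
    51.69871550439042,
    51.745365191545254,
]

mean = statistics.mean(speeds)      # matches the logged Mean
median = statistics.median(speeds)  # matches the logged Median
stdev = statistics.pstdev(speeds)   # population std dev; matches the logged Std Dev

print(mean, median, stdev)
```

Using `statistics.stdev` (the sample standard deviation, n-1 denominator) instead would give a slightly larger value than the one logged.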
Executing scrapy-bench --n-runs 5 --book_url http://localhost:8000/ bookworm in /home/ec2-user/ves/scrapy-bench-3.6/
2018-11-29 09:44:12 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 09:44:12 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:44:12 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 09:44:12 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 09:44:12 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:44:12 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:44:12 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:44:12 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:44:12 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:44:15 [scrapy.extensions.logstats] INFO: Crawled 97 pages (at 1940 pages/min), scraped 90 items (at 1800 items/min)
2018-11-29 09:44:18 [scrapy.extensions.logstats] INFO: Crawled 286 pages (at 3780 pages/min), scraped 254 items (at 3280 items/min)
2018-11-29 09:44:21 [scrapy.extensions.logstats] INFO: Crawled 438 pages (at 3040 pages/min), scraped 430 items (at 3520 items/min)
2018-11-29 09:44:24 [scrapy.extensions.logstats] INFO: Crawled 597 pages (at 3180 pages/min), scraped 590 items (at 3200 items/min)
2018-11-29 09:44:27 [scrapy.extensions.logstats] INFO: Crawled 777 pages (at 3600 pages/min), scraped 755 items (at 3300 items/min)
2018-11-29 09:44:30 [scrapy.extensions.logstats] INFO: Crawled 955 pages (at 3560 pages/min), scraped 924 items (at 3380 items/min)
2018-11-29 09:44:31 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:44:32 [scrapy.extensions.feedexport] INFO: Stored csv feed (1059 items) in: items.csv
The average speed of the spider is 52.12280606504324 items/sec
2018-11-29 09:44:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 367102,
'downloader/request_count': 1059,
'downloader/request_method_count/GET': 1059,
'downloader/response_bytes': 23312675,
'downloader/response_count': 1059,
'downloader/response_status_count/200': 1059,
'dupefilter/filtered': 14936,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 44, 32, 369966),
'item_scraped_count': 1059,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 45654016,
'memusage/startup': 45654016,
'request_depth_max': 10,
'response_received_count': 1059,
'scheduler/dequeued': 1059,
'scheduler/dequeued/memory': 1059,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 9, 44, 12, 82335)}
2018-11-29 09:44:32 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:44:33 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 09:44:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:44:33 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 09:44:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 09:44:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:44:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:44:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:44:33 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:44:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:44:36 [scrapy.extensions.logstats] INFO: Crawled 92 pages (at 1840 pages/min), scraped 92 items (at 1840 items/min)
2018-11-29 09:44:39 [scrapy.extensions.logstats] INFO: Crawled 273 pages (at 3620 pages/min), scraped 265 items (at 3460 items/min)
2018-11-29 09:44:42 [scrapy.extensions.logstats] INFO: Crawled 448 pages (at 3500 pages/min), scraped 435 items (at 3400 items/min)
2018-11-29 09:44:45 [scrapy.extensions.logstats] INFO: Crawled 611 pages (at 3260 pages/min), scraped 599 items (at 3280 items/min)
2018-11-29 09:44:48 [scrapy.extensions.logstats] INFO: Crawled 779 pages (at 3360 pages/min), scraped 764 items (at 3300 items/min)
2018-11-29 09:44:51 [scrapy.extensions.logstats] INFO: Crawled 954 pages (at 3500 pages/min), scraped 942 items (at 3560 items/min)
2018-11-29 09:44:52 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:44:53 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 52.22712560289813 items/sec
2018-11-29 09:44:53 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363560,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 44, 53, 410047),
'item_scraped_count': 1049,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45649920,
'memusage/startup': 45649920,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 9, 44, 33, 338861)}
2018-11-29 09:44:53 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:44:54 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 09:44:54 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:44:54 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 09:44:54 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 09:44:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:44:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:44:54 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:44:54 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:44:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:44:57 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 85 items (at 1700 items/min)
2018-11-29 09:45:00 [scrapy.extensions.logstats] INFO: Crawled 272 pages (at 3580 pages/min), scraped 263 items (at 3560 items/min)
2018-11-29 09:45:03 [scrapy.extensions.logstats] INFO: Crawled 433 pages (at 3220 pages/min), scraped 414 items (at 3020 items/min)
2018-11-29 09:45:06 [scrapy.extensions.logstats] INFO: Crawled 596 pages (at 3260 pages/min), scraped 581 items (at 3340 items/min)
2018-11-29 09:45:09 [scrapy.extensions.logstats] INFO: Crawled 724 pages (at 2560 pages/min), scraped 722 items (at 2820 items/min)
2018-11-29 09:45:12 [scrapy.extensions.logstats] INFO: Crawled 888 pages (at 3280 pages/min), scraped 886 items (at 3280 items/min)
2018-11-29 09:45:14 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:45:15 [scrapy.extensions.logstats] INFO: Crawled 1057 pages (at 3380 pages/min), scraped 1055 items (at 3380 items/min)
2018-11-29 09:45:15 [scrapy.extensions.feedexport] INFO: Stored csv feed (1057 items) in: items.csv
The average speed of the spider is 50.04157957218292 items/sec
2018-11-29 09:45:15 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 366382,
'downloader/request_count': 1057,
'downloader/request_method_count/GET': 1057,
'downloader/response_bytes': 23206941,
'downloader/response_count': 1057,
'downloader/response_status_count/200': 1057,
'dupefilter/filtered': 14790,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 45, 15, 468617),
'item_scraped_count': 1057,
'log_count/INFO': 15,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 25,
'memusage/max': 45649920,
'memusage/startup': 45649920,
'request_depth_max': 10,
'response_received_count': 1057,
'scheduler/dequeued': 1057,
'scheduler/dequeued/memory': 1057,
'scheduler/enqueued': 1081,
'scheduler/enqueued/memory': 1081,
'start_time': datetime.datetime(2018, 11, 29, 9, 44, 54, 376711)}
2018-11-29 09:45:15 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:45:16 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 09:45:16 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:45:16 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 09:45:16 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 09:45:16 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:45:16 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:45:16 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:45:16 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:45:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:45:19 [scrapy.extensions.logstats] INFO: Crawled 93 pages (at 1860 pages/min), scraped 86 items (at 1720 items/min)
2018-11-29 09:45:22 [scrapy.extensions.logstats] INFO: Crawled 280 pages (at 3740 pages/min), scraped 261 items (at 3500 items/min)
2018-11-29 09:45:25 [scrapy.extensions.logstats] INFO: Crawled 440 pages (at 3200 pages/min), scraped 422 items (at 3220 items/min)
2018-11-29 09:45:28 [scrapy.extensions.logstats] INFO: Crawled 615 pages (at 3500 pages/min), scraped 587 items (at 3300 items/min)
2018-11-29 09:45:31 [scrapy.extensions.logstats] INFO: Crawled 776 pages (at 3220 pages/min), scraped 766 items (at 3580 items/min)
2018-11-29 09:45:34 [scrapy.extensions.logstats] INFO: Crawled 948 pages (at 3440 pages/min), scraped 933 items (at 3340 items/min)
2018-11-29 09:45:35 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:45:36 [scrapy.extensions.feedexport] INFO: Stored csv feed (1048 items) in: items.csv
The average speed of the spider is 52.2544657729506 items/sec
2018-11-29 09:45:36 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363602,
'downloader/request_count': 1048,
'downloader/request_method_count/GET': 1048,
'downloader/response_bytes': 23067683,
'downloader/response_count': 1048,
'downloader/response_status_count/200': 1048,
'dupefilter/filtered': 14773,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 45, 36, 491510),
'item_scraped_count': 1048,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45649920,
'memusage/startup': 45649920,
'request_depth_max': 10,
'response_received_count': 1048,
'scheduler/dequeued': 1048,
'scheduler/dequeued/memory': 1048,
'scheduler/enqueued': 1081,
'scheduler/enqueued/memory': 1081,
'start_time': datetime.datetime(2018, 11, 29, 9, 45, 16, 450202)}
2018-11-29 09:45:36 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-11-29 09:45:37 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: books)
2018-11-29 09:45:37 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.1, w3lib 1.19.0, Twisted 18.9.0, Python 3.6.6 (default, Aug 13 2018, 18:24:23) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Linux-3.10.0-693.el7.x86_64-x86_64-with-redhat-7.4-Maipo
2018-11-29 09:45:37 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-11-29 09:45:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-11-29 09:45:37 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-11-29 09:45:37 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-11-29 09:45:37 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-11-29 09:45:37 [scrapy.core.engine] INFO: Spider opened
2018-11-29 09:45:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-11-29 09:45:40 [scrapy.extensions.logstats] INFO: Crawled 97 pages (at 1940 pages/min), scraped 84 items (at 1680 items/min)
2018-11-29 09:45:43 [scrapy.extensions.logstats] INFO: Crawled 282 pages (at 3700 pages/min), scraped 264 items (at 3600 items/min)
2018-11-29 09:45:46 [scrapy.extensions.logstats] INFO: Crawled 442 pages (at 3200 pages/min), scraped 428 items (at 3280 items/min)
2018-11-29 09:45:49 [scrapy.extensions.logstats] INFO: Crawled 622 pages (at 3600 pages/min), scraped 591 items (at 3260 items/min)
2018-11-29 09:45:52 [scrapy.extensions.logstats] INFO: Crawled 774 pages (at 3040 pages/min), scraped 766 items (at 3500 items/min)
2018-11-29 09:45:55 [scrapy.extensions.logstats] INFO: Crawled 963 pages (at 3780 pages/min), scraped 946 items (at 3600 items/min)
2018-11-29 09:45:56 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-11-29 09:45:57 [scrapy.extensions.feedexport] INFO: Stored csv feed (1049 items) in: items.csv
The average speed of the spider is 52.416718620295846 items/sec
2018-11-29 09:45:57 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 363635,
'downloader/request_count': 1049,
'downloader/request_method_count/GET': 1049,
'downloader/response_bytes': 23119129,
'downloader/response_count': 1049,
'downloader/response_status_count/200': 1049,
'dupefilter/filtered': 14846,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 11, 29, 9, 45, 57, 448373),
'item_scraped_count': 1049,
'log_count/INFO': 14,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 45649920,
'memusage/startup': 45649920,
'request_depth_max': 10,
'response_received_count': 1049,
'scheduler/dequeued': 1049,
'scheduler/dequeued/memory': 1049,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 11, 29, 9, 45, 37, 465467)}
2018-11-29 09:45:57 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '5'
Mean : 51.812539126674146 Median : 52.22712560289813 Std Dev : 0.8904804640371003
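The summary line above can be reproduced from the five per-run "average speed" figures printed earlier in the log. A quick sketch, assuming those five lines are exactly the inputs scrapy-bench aggregates; note the reported "Std Dev" matches the population standard deviation (`statistics.pstdev`), not the sample one:

```python
# Recompute the benchmark summary from the five per-run speeds (items/sec)
# reported in the "The average speed of the spider is ..." lines above.
import statistics

speeds = [
    52.12280606504324,
    52.22712560289813,
    50.04157957218292,
    52.2544657729506,
    52.416718620295846,
]

mean = statistics.mean(speeds)      # ~51.8125
median = statistics.median(speeds)  # 52.22712560289813
std_dev = statistics.pstdev(speeds)  # ~0.8905 (population std dev)

print("Mean :", mean, "Median :", median, "Std Dev :", std_dev)
```

Using `statistics.stdev` (the sample standard deviation, n-1 denominator) instead would give a noticeably larger value, which is how one can tell the tool uses the population form.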