@whalebot-helmsman
Created August 28, 2018 09:56
Performance comparison
Executing scrapy-bench --n-runs 10 --book_url http://localhost:8080/books.toscrape.com/ bookworm in /home/nikita/ves/scrapy-bench
2018-08-28 09:07:03 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:07:03 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:07:03 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:07:03 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:07:03 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:07:03 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:07:03 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:07:03 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:07:03 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:07:03 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:07:07 [scrapy.extensions.logstats] INFO: Crawled 224 pages (at 4480 pages/min), scraped 202 items (at 4040 items/min)
2018-08-28 09:07:10 [scrapy.extensions.logstats] INFO: Crawled 492 pages (at 5360 pages/min), scraped 472 items (at 5400 items/min)
2018-08-28 09:07:13 [scrapy.extensions.logstats] INFO: Crawled 772 pages (at 5600 pages/min), scraped 743 items (at 5420 items/min)
2018-08-28 09:07:15 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:07:16 [scrapy.extensions.logstats] INFO: Crawled 1078 pages (at 6120 pages/min), scraped 1042 items (at 5980 items/min)
2018-08-28 09:07:16 [scrapy.extensions.feedexport] INFO: Stored csv feed (1078 items) in: items.csv
The average speed of the spider is 85.19939933195813 items/sec
2018-08-28 09:07:16 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 414549,
'downloader/request_count': 1078,
'downloader/request_method_count/GET': 1078,
'downloader/response_bytes': 23642440,
'downloader/response_count': 1078,
'downloader/response_status_count/200': 1078,
'dupefilter/filtered': 15023,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 7, 16, 239236),
'item_scraped_count': 1078,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52228096,
'memusage/startup': 52228096,
'request_depth_max': 9,
'response_received_count': 1078,
'scheduler/dequeued': 1078,
'scheduler/dequeued/memory': 1078,
'scheduler/enqueued': 1101,
'scheduler/enqueued/memory': 1101,
'start_time': datetime.datetime(2018, 8, 28, 9, 7, 3, 917687)}
2018-08-28 09:07:16 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:07:16 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:07:16 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:07:16 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:07:16 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:07:16 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:07:16 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:07:16 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:07:16 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:07:16 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:07:16 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:07:19 [scrapy.extensions.logstats] INFO: Crawled 212 pages (at 4240 pages/min), scraped 205 items (at 4100 items/min)
2018-08-28 09:07:22 [scrapy.extensions.logstats] INFO: Crawled 511 pages (at 5980 pages/min), scraped 469 items (at 5280 items/min)
2018-08-28 09:07:25 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 5280 pages/min), scraped 767 items (at 5960 items/min)
2018-08-28 09:07:28 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:07:28 [scrapy.extensions.logstats] INFO: Crawled 1069 pages (at 5880 pages/min), scraped 1065 items (at 5960 items/min)
2018-08-28 09:07:28 [scrapy.extensions.feedexport] INFO: Stored csv feed (1069 items) in: items.csv
The average speed of the spider is 88.36135238416237 items/sec
2018-08-28 09:07:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 411082,
'downloader/request_count': 1069,
'downloader/request_method_count/GET': 1069,
'downloader/response_bytes': 23500203,
'downloader/response_count': 1069,
'downloader/response_status_count/200': 1069,
'dupefilter/filtered': 15006,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 7, 28, 906855),
'item_scraped_count': 1069,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 52129792,
'memusage/startup': 52129792,
'request_depth_max': 10,
'response_received_count': 1069,
'scheduler/dequeued': 1069,
'scheduler/dequeued/memory': 1069,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 7, 16, 818949)}
2018-08-28 09:07:28 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:07:29 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:07:29 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:07:29 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:07:29 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:07:29 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:07:29 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:07:29 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:07:29 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:07:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:07:29 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:07:32 [scrapy.extensions.logstats] INFO: Crawled 209 pages (at 4180 pages/min), scraped 203 items (at 4060 items/min)
2018-08-28 09:07:35 [scrapy.extensions.logstats] INFO: Crawled 475 pages (at 5320 pages/min), scraped 470 items (at 5340 items/min)
2018-08-28 09:07:38 [scrapy.extensions.logstats] INFO: Crawled 803 pages (at 6560 pages/min), scraped 753 items (at 5660 items/min)
2018-08-28 09:07:41 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:07:41 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 5520 pages/min), scraped 1055 items (at 6040 items/min)
2018-08-28 09:07:41 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 86.64585359347458 items/sec
2018-08-28 09:07:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 415124,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23693886,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 7, 41, 762999),
'item_scraped_count': 1079,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52117504,
'memusage/startup': 52117504,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 7, 29, 505044)}
2018-08-28 09:07:41 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:07:42 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:07:42 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:07:42 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:07:42 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:07:42 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:07:42 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:07:42 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:07:42 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:07:42 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:07:42 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:07:45 [scrapy.extensions.logstats] INFO: Crawled 225 pages (at 4500 pages/min), scraped 202 items (at 4040 items/min)
2018-08-28 09:07:48 [scrapy.extensions.logstats] INFO: Crawled 493 pages (at 5360 pages/min), scraped 475 items (at 5460 items/min)
2018-08-28 09:07:51 [scrapy.extensions.logstats] INFO: Crawled 777 pages (at 5680 pages/min), scraped 746 items (at 5420 items/min)
2018-08-28 09:07:53 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:07:54 [scrapy.extensions.logstats] INFO: Crawled 1074 pages (at 5940 pages/min), scraped 1056 items (at 6200 items/min)
2018-08-28 09:07:54 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 86.9510758781473 items/sec
2018-08-28 09:07:54 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 414850,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23693886,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 7, 54, 748334),
'item_scraped_count': 1079,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52105216,
'memusage/startup': 52105216,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 7, 42, 358562)}
2018-08-28 09:07:54 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:07:55 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:07:55 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:07:55 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:07:55 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:07:55 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:07:55 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:07:55 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:07:55 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:07:55 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:07:55 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:07:58 [scrapy.extensions.logstats] INFO: Crawled 211 pages (at 4220 pages/min), scraped 205 items (at 4100 items/min)
2018-08-28 09:08:01 [scrapy.extensions.logstats] INFO: Crawled 519 pages (at 6160 pages/min), scraped 467 items (at 5240 items/min)
2018-08-28 09:08:04 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 5120 pages/min), scraped 767 items (at 6000 items/min)
2018-08-28 09:08:07 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:08:07 [scrapy.extensions.logstats] INFO: Crawled 1069 pages (at 5880 pages/min), scraped 1053 items (at 5720 items/min)
2018-08-28 09:08:07 [scrapy.extensions.feedexport] INFO: Stored csv feed (1069 items) in: items.csv
The average speed of the spider is 86.69472099064264 items/sec
2018-08-28 09:08:07 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 411007,
'downloader/request_count': 1069,
'downloader/request_method_count/GET': 1069,
'downloader/response_bytes': 23500340,
'downloader/response_count': 1069,
'downloader/response_status_count/200': 1069,
'dupefilter/filtered': 15006,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 8, 7, 544255),
'item_scraped_count': 1069,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 34,
'memusage/max': 52109312,
'memusage/startup': 52109312,
'request_depth_max': 9,
'response_received_count': 1069,
'scheduler/dequeued': 1069,
'scheduler/dequeued/memory': 1069,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 7, 55, 346436)}
2018-08-28 09:08:07 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:08:08 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:08:08 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:08:08 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:08:08 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:08:08 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:08:08 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:08:08 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:08:08 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:08:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:08:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:08:11 [scrapy.extensions.logstats] INFO: Crawled 212 pages (at 4240 pages/min), scraped 205 items (at 4100 items/min)
2018-08-28 09:08:14 [scrapy.extensions.logstats] INFO: Crawled 502 pages (at 5800 pages/min), scraped 483 items (at 5560 items/min)
2018-08-28 09:08:17 [scrapy.extensions.logstats] INFO: Crawled 809 pages (at 6140 pages/min), scraped 753 items (at 5400 items/min)
2018-08-28 09:08:19 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:08:20 [scrapy.extensions.logstats] INFO: Crawled 1056 pages (at 4940 pages/min), scraped 1049 items (at 5920 items/min)
2018-08-28 09:08:20 [scrapy.extensions.feedexport] INFO: Stored csv feed (1057 items) in: items.csv
The average speed of the spider is 86.72703503427279 items/sec
2018-08-28 09:08:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 406518,
'downloader/request_count': 1057,
'downloader/request_method_count/GET': 1057,
'downloader/response_bytes': 23209920,
'downloader/response_count': 1057,
'downloader/response_status_count/200': 1057,
'dupefilter/filtered': 14790,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 8, 20, 307838),
'item_scraped_count': 1057,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52531200,
'memusage/startup': 52531200,
'request_depth_max': 9,
'response_received_count': 1057,
'scheduler/dequeued': 1057,
'scheduler/dequeued/memory': 1057,
'scheduler/enqueued': 1080,
'scheduler/enqueued/memory': 1080,
'start_time': datetime.datetime(2018, 8, 28, 9, 8, 8, 128974)}
2018-08-28 09:08:20 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:08:20 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:08:20 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:08:20 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:08:20 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:08:20 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:08:20 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:08:20 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:08:20 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:08:20 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:08:20 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:08:24 [scrapy.extensions.logstats] INFO: Crawled 226 pages (at 4520 pages/min), scraped 196 items (at 3920 items/min)
2018-08-28 09:08:27 [scrapy.extensions.logstats] INFO: Crawled 486 pages (at 5200 pages/min), scraped 470 items (at 5480 items/min)
2018-08-28 09:08:30 [scrapy.extensions.logstats] INFO: Crawled 762 pages (at 5520 pages/min), scraped 748 items (at 5560 items/min)
2018-08-28 09:08:32 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:08:32 [scrapy.extensions.logstats] INFO: Crawled 1053 pages (at 5820 pages/min), scraped 1049 items (at 6020 items/min)
2018-08-28 09:08:33 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv
The average speed of the spider is 86.12669129960925 items/sec
2018-08-28 09:08:33 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 406997,
'downloader/request_count': 1058,
'downloader/request_method_count/GET': 1058,
'downloader/response_bytes': 23261366,
'downloader/response_count': 1058,
'downloader/response_status_count/200': 1058,
'dupefilter/filtered': 14863,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 8, 33, 164731),
'item_scraped_count': 1058,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52207616,
'memusage/startup': 52207616,
'request_depth_max': 9,
'response_received_count': 1058,
'scheduler/dequeued': 1058,
'scheduler/dequeued/memory': 1058,
'scheduler/enqueued': 1081,
'scheduler/enqueued/memory': 1081,
'start_time': datetime.datetime(2018, 8, 28, 9, 8, 20, 902393)}
2018-08-28 09:08:33 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:08:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:08:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:08:33 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:08:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:08:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:08:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:08:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:08:33 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:08:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:08:33 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:08:36 [scrapy.extensions.logstats] INFO: Crawled 209 pages (at 4180 pages/min), scraped 204 items (at 4080 items/min)
2018-08-28 09:08:39 [scrapy.extensions.logstats] INFO: Crawled 481 pages (at 5440 pages/min), scraped 477 items (at 5460 items/min)
2018-08-28 09:08:42 [scrapy.extensions.logstats] INFO: Crawled 765 pages (at 5680 pages/min), scraped 754 items (at 5540 items/min)
2018-08-28 09:08:45 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:08:45 [scrapy.extensions.logstats] INFO: Crawled 1070 pages (at 6100 pages/min), scraped 1048 items (at 5880 items/min)
2018-08-28 09:08:46 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 86.4397849642877 items/sec
2018-08-28 09:08:46 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 414962,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23693886,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 8, 46, 159024),
'item_scraped_count': 1079,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52101120,
'memusage/startup': 52101120,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 8, 33, 751394)}
2018-08-28 09:08:46 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:08:46 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:08:46 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:08:46 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:08:46 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:08:46 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:08:46 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:08:46 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:08:46 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:08:46 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:08:46 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:08:49 [scrapy.extensions.logstats] INFO: Crawled 207 pages (at 4140 pages/min), scraped 205 items (at 4100 items/min)
2018-08-28 09:08:52 [scrapy.extensions.logstats] INFO: Crawled 529 pages (at 6440 pages/min), scraped 469 items (at 5280 items/min)
2018-08-28 09:08:56 [scrapy.extensions.logstats] INFO: Crawled 797 pages (at 5360 pages/min), scraped 762 items (at 5860 items/min)
2018-08-28 09:08:58 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:08:58 [scrapy.extensions.logstats] INFO: Crawled 1059 pages (at 5240 pages/min), scraped 1057 items (at 5900 items/min)
2018-08-28 09:08:58 [scrapy.extensions.feedexport] INFO: Stored csv feed (1059 items) in: items.csv
The average speed of the spider is 87.85250088725208 items/sec
2018-08-28 09:08:58 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 407320,
'downloader/request_count': 1059,
'downloader/request_method_count/GET': 1059,
'downloader/response_bytes': 23312675,
'downloader/response_count': 1059,
'downloader/response_status_count/200': 1059,
'dupefilter/filtered': 14936,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 8, 58, 801872),
'item_scraped_count': 1059,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52125696,
'memusage/startup': 52125696,
'request_depth_max': 10,
'response_received_count': 1059,
'scheduler/dequeued': 1059,
'scheduler/dequeued/memory': 1059,
'scheduler/enqueued': 1082,
'scheduler/enqueued/memory': 1082,
'start_time': datetime.datetime(2018, 8, 28, 9, 8, 46, 749978)}
2018-08-28 09:08:58 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:08:59 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:08:59 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:08:59 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:08:59 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:08:59 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:08:59 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:08:59 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:08:59 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:08:59 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:08:59 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:09:02 [scrapy.extensions.logstats] INFO: Crawled 207 pages (at 4140 pages/min), scraped 205 items (at 4100 items/min)
2018-08-28 09:09:05 [scrapy.extensions.logstats] INFO: Crawled 492 pages (at 5700 pages/min), scraped 479 items (at 5480 items/min)
2018-08-28 09:09:08 [scrapy.extensions.logstats] INFO: Crawled 773 pages (at 5620 pages/min), scraped 763 items (at 5680 items/min)
2018-08-28 09:09:10 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:09:11 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 6120 pages/min), scraped 1070 items (at 6140 items/min)
2018-08-28 09:09:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 88.45882549677393 items/sec
2018-08-28 09:09:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 414962,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23693886,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 9, 11, 569799),
'item_scraped_count': 1079,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52412416,
'memusage/startup': 52412416,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 8, 59, 389989)}
2018-08-28 09:09:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '10'
Mean : 86.94572398605807 Median : 86.71087801245771 Std Dev : 0.9639990239084966
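The summary above aggregates the per-run "The average speed of the spider is ..." lines. A minimal sketch of recomputing such a summary from a saved log follows. The helper names `run_speeds` and `summarize` are hypothetical, and the log does not show whether scrapy-bench reports a sample or a population standard deviation, so the sample form used here is an assumption. The three speeds in `sample_log` are copied from the runs above; the full benchmark has ten, so these numbers will not match the reported mean.

```python
import re
import statistics

# Matches the per-run speed line printed after each spider run.
SPEED_RE = re.compile(r"The average speed of the spider is ([0-9.]+) items/sec")


def run_speeds(log_text):
    """Return the per-run speeds (items/sec) found in a benchmark log."""
    return [float(value) for value in SPEED_RE.findall(log_text)]


def summarize(speeds):
    """Aggregate per-run speeds; sample standard deviation is an assumption."""
    return {
        "mean": statistics.mean(speeds),
        "median": statistics.median(speeds),
        "stdev": statistics.stdev(speeds),
    }


# Three of the ten runs from the log above, used only as sample input.
sample_log = """\
The average speed of the spider is 86.4397849642877 items/sec
The average speed of the spider is 87.85250088725208 items/sec
The average speed of the spider is 88.45882549677393 items/sec
"""

speeds = run_speeds(sample_log)
summary = summarize(speeds)
print(summary)
```

Running the same helpers over each benchmark's log section separately gives two summaries that can be compared side by side.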
Executing scrapy-bench --n-runs 10 --book_url http://localhost:8080/books.toscrape.com/ bookworm in /home/nikita/ves/scrapy-bench
2018-08-28 09:09:29 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:09:29 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:09:29 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:09:29 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:09:29 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:09:29 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:09:29 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:09:29 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:09:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:09:29 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:09:32 [scrapy.extensions.logstats] INFO: Crawled 211 pages (at 4220 pages/min), scraped 204 items (at 4080 items/min)
2018-08-28 09:09:35 [scrapy.extensions.logstats] INFO: Crawled 482 pages (at 5420 pages/min), scraped 480 items (at 5520 items/min)
2018-08-28 09:09:38 [scrapy.extensions.logstats] INFO: Crawled 800 pages (at 6360 pages/min), scraped 767 items (at 5740 items/min)
2018-08-28 09:09:41 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:09:41 [scrapy.extensions.logstats] INFO: Crawled 1072 pages (at 5440 pages/min), scraped 1067 items (at 6000 items/min)
2018-08-28 09:09:41 [scrapy.extensions.feedexport] INFO: Stored csv feed (1080 items) in: items.csv
The average speed of the spider is 87.72160455346388 items/sec
2018-08-28 09:09:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 415333,
'downloader/request_count': 1080,
'downloader/request_method_count/GET': 1080,
'downloader/response_bytes': 23745195,
'downloader/response_count': 1080,
'downloader/response_status_count/200': 1080,
'dupefilter/filtered': 15169,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 9, 41, 974612),
'item_scraped_count': 1080,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52178944,
'memusage/startup': 52178944,
'request_depth_max': 10,
'response_received_count': 1080,
'scheduler/dequeued': 1080,
'scheduler/dequeued/memory': 1080,
'scheduler/enqueued': 1103,
'scheduler/enqueued/memory': 1103,
'start_time': datetime.datetime(2018, 8, 28, 9, 9, 29, 722831)}
2018-08-28 09:09:41 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:09:42 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:09:42 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:09:42 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:09:42 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:09:42 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:09:42 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:09:42 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:09:42 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:09:42 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:09:42 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:09:45 [scrapy.extensions.logstats] INFO: Crawled 206 pages (at 4120 pages/min), scraped 201 items (at 4020 items/min)
2018-08-28 09:09:48 [scrapy.extensions.logstats] INFO: Crawled 472 pages (at 5320 pages/min), scraped 465 items (at 5280 items/min)
2018-08-28 09:09:51 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 6060 pages/min), scraped 765 items (at 6000 items/min)
2018-08-28 09:09:54 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:09:54 [scrapy.extensions.logstats] INFO: Crawled 1058 pages (at 5660 pages/min), scraped 1055 items (at 5800 items/min)
2018-08-28 09:09:54 [scrapy.extensions.feedexport] INFO: Stored csv feed (1058 items) in: items.csv
The average speed of the spider is 87.18518182155935 items/sec
2018-08-28 09:09:54 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 407139,
'downloader/request_count': 1058,
'downloader/request_method_count/GET': 1058,
'downloader/response_bytes': 23261366,
'downloader/response_count': 1058,
'downloader/response_status_count/200': 1058,
'dupefilter/filtered': 14863,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 9, 54, 679296),
'item_scraped_count': 1058,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52494336,
'memusage/startup': 52494336,
'request_depth_max': 9,
'response_received_count': 1058,
'scheduler/dequeued': 1058,
'scheduler/dequeued/memory': 1058,
'scheduler/enqueued': 1081,
'scheduler/enqueued/memory': 1081,
'start_time': datetime.datetime(2018, 8, 28, 9, 9, 42, 554629)}
2018-08-28 09:09:54 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:09:55 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:09:55 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:09:55 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:09:55 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:09:55 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:09:55 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:09:55 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:09:55 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:09:55 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:09:55 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:09:58 [scrapy.extensions.logstats] INFO: Crawled 203 pages (at 4060 pages/min), scraped 197 items (at 3940 items/min)
2018-08-28 09:10:01 [scrapy.extensions.logstats] INFO: Crawled 501 pages (at 5960 pages/min), scraped 480 items (at 5660 items/min)
2018-08-28 09:10:04 [scrapy.extensions.logstats] INFO: Crawled 799 pages (at 5960 pages/min), scraped 747 items (at 5340 items/min)
2018-08-28 09:10:07 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:10:07 [scrapy.extensions.logstats] INFO: Crawled 1055 pages (at 5120 pages/min), scraped 1051 items (at 6080 items/min)
2018-08-28 09:10:07 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 85.799653992805 items/sec
2018-08-28 09:10:07 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 414973,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23693886,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 10, 7, 652824),
'item_scraped_count': 1079,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52105216,
'memusage/startup': 52105216,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 9, 55, 270759)}
2018-08-28 09:10:07 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:10:08 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:10:08 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:10:08 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:10:08 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:10:08 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:10:08 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:10:08 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:10:08 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:10:08 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:10:08 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:10:11 [scrapy.extensions.logstats] INFO: Crawled 211 pages (at 4220 pages/min), scraped 205 items (at 4100 items/min)
2018-08-28 09:10:14 [scrapy.extensions.logstats] INFO: Crawled 507 pages (at 5920 pages/min), scraped 471 items (at 5320 items/min)
2018-08-28 09:10:17 [scrapy.extensions.logstats] INFO: Crawled 777 pages (at 5400 pages/min), scraped 762 items (at 5820 items/min)
2018-08-28 09:10:19 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:10:20 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 6040 pages/min), scraped 1064 items (at 6040 items/min)
2018-08-28 09:10:20 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 86.19604560587408 items/sec
2018-08-28 09:10:20 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 414860,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23693886,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 10, 20, 463788),
'item_scraped_count': 1079,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52125696,
'memusage/startup': 52125696,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 10, 8, 244426)}
2018-08-28 09:10:20 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:10:21 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:10:21 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:10:21 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:10:21 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:10:21 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:10:21 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:10:21 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:10:21 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:10:21 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:10:21 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:10:24 [scrapy.extensions.logstats] INFO: Crawled 211 pages (at 4220 pages/min), scraped 207 items (at 4140 items/min)
2018-08-28 09:10:27 [scrapy.extensions.logstats] INFO: Crawled 487 pages (at 5520 pages/min), scraped 480 items (at 5460 items/min)
2018-08-28 09:10:30 [scrapy.extensions.logstats] INFO: Crawled 772 pages (at 5700 pages/min), scraped 766 items (at 5720 items/min)
2018-08-28 09:10:32 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:10:33 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 6140 pages/min), scraped 1078 items (at 6240 items/min)
2018-08-28 09:10:33 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 87.88084122833638 items/sec
2018-08-28 09:10:33 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 414975,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23693886,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 10, 33, 139077),
'item_scraped_count': 1079,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52408320,
'memusage/startup': 52408320,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 10, 21, 48833)}
2018-08-28 09:10:33 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:10:33 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:10:33 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:10:33 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:10:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:10:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:10:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:10:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:10:33 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:10:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:10:33 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:10:36 [scrapy.extensions.logstats] INFO: Crawled 223 pages (at 4460 pages/min), scraped 203 items (at 4060 items/min)
2018-08-28 09:10:39 [scrapy.extensions.logstats] INFO: Crawled 495 pages (at 5440 pages/min), scraped 473 items (at 5400 items/min)
2018-08-28 09:10:42 [scrapy.extensions.logstats] INFO: Crawled 783 pages (at 5760 pages/min), scraped 747 items (at 5480 items/min)
2018-08-28 09:10:45 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:10:45 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 5920 pages/min), scraped 1059 items (at 6240 items/min)
2018-08-28 09:10:45 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 86.54747083527113 items/sec
2018-08-28 09:10:45 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 414864,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23693886,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 10, 45, 959290),
'item_scraped_count': 1079,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52027392,
'memusage/startup': 52027392,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 10, 33, 722323)}
2018-08-28 09:10:45 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:10:46 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:10:46 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:10:46 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:10:46 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:10:46 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:10:46 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:10:46 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:10:46 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:10:46 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:10:46 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:10:49 [scrapy.extensions.logstats] INFO: Crawled 226 pages (at 4520 pages/min), scraped 199 items (at 3980 items/min)
2018-08-28 09:10:52 [scrapy.extensions.logstats] INFO: Crawled 484 pages (at 5160 pages/min), scraped 471 items (at 5440 items/min)
2018-08-28 09:10:55 [scrapy.extensions.logstats] INFO: Crawled 770 pages (at 5720 pages/min), scraped 764 items (at 5860 items/min)
2018-08-28 09:10:58 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:10:58 [scrapy.extensions.logstats] INFO: Crawled 1070 pages (at 6000 pages/min), scraped 1065 items (at 6020 items/min)
2018-08-28 09:10:58 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 87.01701995983392 items/sec
2018-08-28 09:10:58 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 415045,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23693886,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 10, 58, 863905),
'item_scraped_count': 1079,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52129792,
'memusage/startup': 52129792,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 10, 46, 547978)}
2018-08-28 09:10:58 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:10:59 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:10:59 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:10:59 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:10:59 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:10:59 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:10:59 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:10:59 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:10:59 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:10:59 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:10:59 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:11:02 [scrapy.extensions.logstats] INFO: Crawled 209 pages (at 4180 pages/min), scraped 204 items (at 4080 items/min)
2018-08-28 09:11:05 [scrapy.extensions.logstats] INFO: Crawled 495 pages (at 5720 pages/min), scraped 486 items (at 5640 items/min)
2018-08-28 09:11:08 [scrapy.extensions.logstats] INFO: Crawled 771 pages (at 5520 pages/min), scraped 767 items (at 5620 items/min)
2018-08-28 09:11:10 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:11:11 [scrapy.extensions.feedexport] INFO: Stored csv feed (1057 items) in: items.csv
The average speed of the spider is 86.25596305414507 items/sec
2018-08-28 09:11:11 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 406642,
'downloader/request_count': 1057,
'downloader/request_method_count/GET': 1057,
'downloader/response_bytes': 23209920,
'downloader/response_count': 1057,
'downloader/response_status_count/200': 1057,
'dupefilter/filtered': 14790,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 11, 11, 411467),
'item_scraped_count': 1057,
'log_count/INFO': 12,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52207616,
'memusage/startup': 52207616,
'request_depth_max': 9,
'response_received_count': 1057,
'scheduler/dequeued': 1057,
'scheduler/dequeued/memory': 1057,
'scheduler/enqueued': 1080,
'scheduler/enqueued/memory': 1080,
'start_time': datetime.datetime(2018, 8, 28, 9, 10, 59, 450315)}
2018-08-28 09:11:11 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:11:11 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:11:11 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:11:11 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:11:11 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:11:11 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:11:11 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:11:11 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:11:11 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:11:11 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:11:11 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:11:15 [scrapy.extensions.logstats] INFO: Crawled 211 pages (at 4220 pages/min), scraped 205 items (at 4100 items/min)
2018-08-28 09:11:18 [scrapy.extensions.logstats] INFO: Crawled 512 pages (at 6020 pages/min), scraped 465 items (at 5200 items/min)
2018-08-28 09:11:21 [scrapy.extensions.logstats] INFO: Crawled 764 pages (at 5040 pages/min), scraped 746 items (at 5620 items/min)
2018-08-28 09:11:23 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:11:24 [scrapy.extensions.logstats] INFO: Crawled 1054 pages (at 5800 pages/min), scraped 1048 items (at 6040 items/min)
2018-08-28 09:11:24 [scrapy.extensions.feedexport] INFO: Stored csv feed (1078 items) in: items.csv
The average speed of the spider is 85.46989085226573 items/sec
2018-08-28 09:11:24 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 414607,
'downloader/request_count': 1078,
'downloader/request_method_count/GET': 1078,
'downloader/response_bytes': 23642440,
'downloader/response_count': 1078,
'downloader/response_status_count/200': 1078,
'dupefilter/filtered': 15023,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 11, 24, 362757),
'item_scraped_count': 1078,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52387840,
'memusage/startup': 52387840,
'request_depth_max': 9,
'response_received_count': 1078,
'scheduler/dequeued': 1078,
'scheduler/dequeued/memory': 1078,
'scheduler/enqueued': 1101,
'scheduler/enqueued/memory': 1101,
'start_time': datetime.datetime(2018, 8, 28, 9, 11, 11, 992710)}
2018-08-28 09:11:24 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
2018-08-28 09:11:24 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: books)
2018-08-28 09:11:24 [scrapy.utils.log] INFO: Versions: lxml 4.2.4.0, libxml2 2.9.8, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.3 (default, Jun 4 2018, 10:24:41) - [GCC 4.8.4], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018), cryptography 2.3.1, Platform Linux-4.4.0-96-generic-x86_64-with-debian-jessie-sid
2018-08-28 09:11:24 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'books', 'CLOSESPIDER_ITEMCOUNT': 1000, 'CONCURRENT_REQUESTS': 120, 'FEED_FORMAT': 'csv', 'FEED_URI': 'items.csv', 'LOGSTATS_INTERVAL': 3, 'LOG_LEVEL': 'INFO', 'MEMDEBUG_ENABLED': True, 'NEWSPIDER_MODULE': 'books.spiders', 'RETRY_ENABLED': False, 'SPIDER_MODULES': ['books.spiders']}
2018-08-28 09:11:24 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.memdebug.MemoryDebugger',
'scrapy.extensions.closespider.CloseSpider',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2018-08-28 09:11:24 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-08-28 09:11:24 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-08-28 09:11:24 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2018-08-28 09:11:24 [scrapy.core.engine] INFO: Spider opened
2018-08-28 09:11:24 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-08-28 09:11:24 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2018-08-28 09:11:27 [scrapy.extensions.logstats] INFO: Crawled 207 pages (at 4140 pages/min), scraped 201 items (at 4020 items/min)
2018-08-28 09:11:31 [scrapy.extensions.logstats] INFO: Crawled 491 pages (at 5680 pages/min), scraped 477 items (at 5520 items/min)
2018-08-28 09:11:34 [scrapy.extensions.logstats] INFO: Crawled 775 pages (at 5680 pages/min), scraped 768 items (at 5820 items/min)
2018-08-28 09:11:36 [scrapy.core.engine] INFO: Closing spider (closespider_itemcount)
2018-08-28 09:11:37 [scrapy.extensions.logstats] INFO: Crawled 1079 pages (at 6080 pages/min), scraped 1071 items (at 6060 items/min)
2018-08-28 09:11:37 [scrapy.extensions.feedexport] INFO: Stored csv feed (1079 items) in: items.csv
The average speed of the spider is 85.69234060623224 items/sec
2018-08-28 09:11:37 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 414962,
'downloader/request_count': 1079,
'downloader/request_method_count/GET': 1079,
'downloader/response_bytes': 23693886,
'downloader/response_count': 1079,
'downloader/response_status_count/200': 1079,
'dupefilter/filtered': 15096,
'finish_reason': 'closespider_itemcount',
'finish_time': datetime.datetime(2018, 8, 28, 9, 11, 37, 132935),
'item_scraped_count': 1079,
'log_count/INFO': 13,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FollowAllSpider': 1,
'memdebug/live_refs/Request': 24,
'memusage/max': 52129792,
'memusage/startup': 52129792,
'request_depth_max': 9,
'response_received_count': 1079,
'scheduler/dequeued': 1079,
'scheduler/dequeued/memory': 1079,
'scheduler/enqueued': 1102,
'scheduler/enqueued/memory': 1102,
'start_time': datetime.datetime(2018, 8, 28, 9, 11, 24, 957884)}
2018-08-28 09:11:37 [scrapy.core.engine] INFO: Spider closed (closespider_itemcount)
The results of the benchmark are (all speeds in items/sec) :
Test = 'Book Spider' Iterations = '10'
Mean : 86.57660125097868 Median : 86.4017169447081 Std Dev : 0.8022010534209844
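
For readers who want to re-derive summary figures like the ones above, here is a minimal sketch using Python's standard `statistics` module. It uses only the six per-run speeds visible in this part of the log (the full benchmark covers 10 iterations), so its mean and standard deviation will not reproduce the 10-run numbers. The helper name `summarize` is hypothetical, and the choice of the population standard deviation (`pstdev`) is an assumption, since the log does not show which estimator the benchmark tool uses.

```python
import statistics

# Per-run "average speed" values (items/sec) copied from the log lines above.
# Only six of the ten iterations appear in this part of the output.
speeds = [
    87.88084122833638,
    86.54747083527113,
    87.01701995983392,
    86.25596305414507,
    85.46989085226573,
    85.69234060623224,
]


def summarize(values):
    """Return mean, median and standard deviation for a list of speeds.

    Hypothetical helper: pstdev (population std dev) is an assumption,
    the benchmark tool may use a different estimator.
    """
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),
    }


summary = summarize(speeds)
print("Mean : {mean} Median : {median} Std Dev : {stdev}".format(**summary))
```

To compare against the reported 10-run figures, append the remaining four per-run speeds from the earlier part of the log to `speeds`.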