@jheasly
Created March 16, 2021 21:02
Scrapy Oregon health inspection output
(open-health-inspection-scraper) bash-3.2$ ./scrapeHealthData.py
/Users/jpheasly/Development/open-health-inspection-scraper/scraper/spiders/healthspace_spider.py:206: SyntaxWarning: "is" with a literal. Did you mean "=="?
'critical': critical is "critical",
2021-03-16 15:49:34 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot)
2021-03-16 15:49:34 [scrapy.utils.log] INFO: Overridden settings: {'DOWNLOAD_DELAY': 10, 'SPIDER_MODULES': ['scraper.spiders']}
2021-03-16 15:49:34 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.spiderstate.SpiderState']
2021-03-16 15:49:34 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-03-16 15:49:34 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-03-16 15:49:34 [py.warnings] WARNING: /Users/jpheasly/Development/open-health-inspection-scraper/scraper/pipelines.py:87: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if result['n'] is not 1:
2021-03-16 15:49:34 [py.warnings] WARNING: /Users/jpheasly/Development/open-health-inspection-scraper/scraper/pipelines.py:101: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if result['n'] is not 1:
2021-03-16 15:49:34 [scrapy.middleware] INFO: Enabled item pipelines:
['scraper.pipelines.MongoDBPipeline']
2021-03-16 15:49:34 [scrapy.core.engine] INFO: Spider opened
2021-03-16 15:49:34 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-03-16 15:49:34 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2021-03-16 15:49:35 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/clients/oregon/state/statewebportal.nsf/module_healthRegions.xsp?showview=region> from <GET https://healthspace.com/clients/oregon/state/statewebportal.nsf/module_healthRegions.xsp?showview=region>
2021-03-16 15:49:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://healthspace.com/clients/oregon/state/statewebportal.nsf/module_healthRegions.xsp?showview=region> (referer: None)
2021-03-16 15:49:49 [scrapy.spidermiddlewares.offsite] DEBUG: Filtered offsite request to 'www.co.washington.or.us': <GET http://www.co.washington.or.us/HHS/EnvironmentalHealth/FoodSafety/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:50:00 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Benton/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Benton/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:50:00 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://healthspace.com/Clients/Oregon/Benton/web.nsf/module_facilities.xsp?module=Food> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2021-03-16 15:50:15 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Baker/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Baker/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:50:26 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/HoodRiver/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/HoodRiver/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:50:34 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2021-03-16 15:50:40 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Harney/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Harney/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:50:54 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Grant/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Grant/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:51:07 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Douglas/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Douglas/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:51:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Deschutes/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Deschutes/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:51:31 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Curry/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Curry/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:51:34 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-03-16 15:51:47 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Crook/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Crook/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:51:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Coos/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Coos/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:52:07 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Columbia/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Columbia/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:52:17 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Clatsop/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Clatsop/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:52:28 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Clackamas/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Clackamas/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:52:34 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-03-16 15:52:44 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/NorthCentral/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/NorthCentral/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:52:56 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Multnomah/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Multnomah/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:53:08 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://healthspace.com/Clients/Oregon/Morrow/web.nsf/module_facilities.xsp?module=Food> (referer: https://healthspace.com/clients/oregon/state/statewebportal.nsf/module_healthRegions.xsp?showview=region)
2021-03-16 15:53:08 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://healthspace.com/Clients/Oregon/Morrow/web.nsf/module_facilities.xsp?module=Food>: HTTP status code is not handled or not allowed
2021-03-16 15:53:21 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Yamhill/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Yamhill/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:53:33 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Wheeler/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Wheeler/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:53:34 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2021-03-16 15:53:47 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Wallowa/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Wallowa/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:54:01 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Union/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Union/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:54:13 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Umatilla/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Umatilla/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:54:26 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Tillamook/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Tillamook/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:54:34 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-03-16 15:54:38 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Polk/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Polk/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:54:50 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Marion/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Marion/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:54:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Malheur/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Malheur/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:55:11 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Linn/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Linn/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:55:25 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Lincoln/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Lincoln/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:55:34 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-03-16 15:55:36 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Lane/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Lane/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:55:52 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Lake/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Lake/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:56:03 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Klamath/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Klamath/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:56:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Josephine/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Josephine/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:56:34 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-03-16 15:56:35 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Jefferson/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Jefferson/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:56:46 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://healthspace.com/Clients/Oregon/Jackson/web.nsf/module_facilities.xsp?module=Food> from <GET https://healthspace.com/Clients/Oregon/Jackson/web.nsf/module_facilities.xsp?module=Food>
2021-03-16 15:56:46 [scrapy.core.engine] INFO: Closing spider (finished)
2021-03-16 15:56:46 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 13574,
'downloader/request_count': 35,
'downloader/request_method_count/GET': 35,
'downloader/response_bytes': 17678,
'downloader/response_count': 35,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/302': 33,
'downloader/response_status_count/404': 1,
'dupefilter/filtered': 32,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 3, 16, 20, 56, 46, 437621),
'log_count/DEBUG': 38,
'log_count/INFO': 15,
'log_count/WARNING': 2,
'offsite/domains': 1,
'offsite/filtered': 1,
'request_depth_max': 1,
'response_received_count': 2,
'scheduler/dequeued': 35,
'scheduler/dequeued/disk': 35,
'scheduler/enqueued': 35,
'scheduler/enqueued/disk': 35,
'start_time': datetime.datetime(2021, 3, 16, 20, 49, 34, 336531)}
2021-03-16 15:56:46 [scrapy.core.engine] INFO: Spider closed (finished)
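
The three SyntaxWarnings near the top of the log (healthspace_spider.py line 206, pipelines.py lines 87 and 101) come from comparing against literals with `is` / `is not`. Those operators test object identity, not equality, and Python 3.8 and later flag this at compile time. Only the flagged expressions appear in the log, so the function names in the sketch are hypothetical wrappers. `result['n']` is presumably a count from a MongoDB write result, which is an assumption.

```python
# Hypothetical wrappers around the three expressions flagged in the log;
# the surrounding code in healthspace_spider.py and pipelines.py is not shown.


def build_violation_flags(critical):
    # healthspace_spider.py:206 had `critical is "critical"`, an identity test
    # that only succeeds if both strings happen to be the same object.
    # `==` compares the string values instead.
    return {'critical': critical == "critical"}


def write_failed(result):
    # pipelines.py:87 and :101 had `result['n'] is not 1`.
    # `!=` compares the integer value: True unless result['n'] is exactly 1.
    return result['n'] != 1
```

A string built at runtime, such as text extracted from a scraped page, is generally a different object from the literal, so the original `is` form can evaluate to False even when the text matches.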
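
The stats suggest why no items were scraped. 33 of the 35 responses were 302 redirects, `dupefilter/filtered` is 32, and every redirect in the log points back to the URL that was requested. The state portal redirect was followed. This fits Scrapy's default `start_requests`, which creates start requests with `dont_filter=True`, and the redirect copy keeps that flag (this assumes the spider does not override `start_requests`). Each of the 32 county redirects produced a request identical to one already scheduled, so the duplicate filter dropped it. The sketch is a simplified pure-Python model of that interaction, not Scrapy's code. The real `RFPDupeFilter` compares request fingerprints built from method, canonicalized URL, and body, not raw `(method, URL)` pairs.

```python
# Simplified model of Scrapy's scheduler/duplicate-filter interaction.
# Assumption: requests are keyed by (method, URL); Scrapy's RFPDupeFilter
# actually hashes method, canonicalized URL and body into a fingerprint.


class SimpleDupeFilter:
    def __init__(self):
        self.seen = set()

    def request_seen(self, method, url):
        """Return True if (method, url) was seen before; record it otherwise."""
        key = (method, url)
        if key in self.seen:
            return True
        self.seen.add(key)
        return False


def enqueue(dupefilter, method, url, dont_filter=False):
    """Return True if the request is scheduled, False if it is dropped.

    A request with dont_filter=True skips the filter entirely and is not
    recorded in it.
    """
    if not dont_filter and dupefilter.request_seen(method, url):
        return False
    return True


def self_redirect(url, dont_filter):
    """Schedule a request, then the copy created by a 302 back to the same URL.

    The redirect copy inherits dont_filter from the original request.
    Returns (original_scheduled, redirect_scheduled).
    """
    df = SimpleDupeFilter()
    original = enqueue(df, "GET", url, dont_filter)
    redirected = enqueue(df, "GET", url, dont_filter)
    return original, redirected


county = "https://healthspace.com/Clients/Oregon/Benton/web.nsf/module_facilities.xsp?module=Food"
print(self_redirect(county, dont_filter=False))  # (True, False): redirect dropped
print(self_redirect(county, dont_filter=True))   # (True, True): redirect followed
```

If this reading is right, one possible fix is to pass `dont_filter=True` when the spider yields each county request, for example `scrapy.Request(url, callback=..., dont_filter=True)`. Scrapy's `REDIRECT_MAX_TIMES` setting (default 20) still bounds the loop if the server keeps redirecting to itself. The log does not show why the server redirects to the same URL. One possibility is that it sets a cookie on the first response, which the enabled `CookiesMiddleware` would send on the repeated request. Separately, the Washington County link was dropped by `OffsiteMiddleware` because `www.co.washington.or.us` is outside the spider's allowed domains. If that county should be included, it would need to be added to `allowed_domains`, and its pages may also need their own parsing.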