
@redapple
Created July 26, 2016 09:04
StackOverflow #38577374
$ scrapy runspider sitemapspider.py
2016-07-26 10:41:29 [scrapy] INFO: Scrapy 1.1.0 started (bot: scrapybot)
2016-07-26 10:41:29 [scrapy] INFO: Overridden settings: {}
2016-07-26 10:41:32 [scrapy] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.logstats.LogStats']
2016-07-26 10:41:34 [scrapy] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.chunked.ChunkedTransferMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2016-07-26 10:41:34 [scrapy] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2016-07-26 10:41:34 [scrapy] INFO: Enabled item pipelines:
[]
2016-07-26 10:41:34 [scrapy] INFO: Spider opened
2016-07-26 10:41:35 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-07-26 10:41:36 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/robots.txt> (referer: None)
2016-07-26 10:41:37 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/sitemap.aspx> (referer: http://www.officerstore.com/robots.txt)
2016-07-26 10:41:37 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/default.aspx> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:37 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/default.aspx>
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/94/Careers/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/94/Careers/>
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/93/Our-Return-Policy/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/111/Contact-a-Salesman-MA/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/93/Our-Return-Policy/>
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/109/Contact-a-Salesperson-PA/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/103/Account-Type/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/111/Contact-a-Salesman-MA/>
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/92/Shipping-Costs-and-Terms/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/109/Contact-a-Salesperson-PA/>
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/103/Account-Type/>
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/91/Privacy-Policy/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/92/Shipping-Costs-and-Terms/>
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/89/Request-Bulk-Pricing/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/91/Privacy-Policy/>
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/88/Testimonials/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/90/Terms-and-Conditions/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/89/Request-Bulk-Pricing/>
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/87/Contact/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/88/Testimonials/>
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/90/Terms-and-Conditions/>
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/87/Contact/>
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/110/Contact-a-Salesman-ME/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/store/product.aspx/productId/13535/Port-Authority-Competitor-Jacket/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/108/Contact-a-Salesperson/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/page.aspx/contentId/95/Customer-Service/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/store/product.aspx/productId/17845/Gould-Goodrich-Duty-Leather-Speedloader-Case/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:38 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/110/Contact-a-Salesman-ME/>
2016-07-26 10:41:39 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/store/product.aspx/productId/13535/Port-Authority-Competitor-Jacket/>
2016-07-26 10:41:39 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/108/Contact-a-Salesperson/>
2016-07-26 10:41:39 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/page.aspx/contentId/95/Customer-Service/>
2016-07-26 10:41:39 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/store/product.aspx/productId/17845/Gould-Goodrich-Duty-Leather-Speedloader-Case/>
2016-07-26 10:41:39 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/store/product.aspx/productId/17463/BlackHawk-Duty-Gear-Molded-Belt-Keepers/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:39 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/store/product.aspx/productId/28327/Defense-Technology-First-Defense-360-Inert-MK-4-Stream-OC-Aerosol/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:39 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/store/product.aspx/productId/18278/Blue-Force-Gear-Helium-Whisper-Single-Frag-Grenade-Pouch-with-Flap/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:39 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/store/product.aspx/productId/17463/BlackHawk-Duty-Gear-Molded-Belt-Keepers/>
2016-07-26 10:41:39 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/store/product.aspx/productId/28327/Defense-Technology-First-Defense-360-Inert-MK-4-Stream-OC-Aerosol/>
2016-07-26 10:41:39 [SiteMap] DEBUG: In Loop: processing <200 http://www.officerstore.com/store/product.aspx/productId/18278/Blue-Force-Gear-Helium-Whisper-Single-Frag-Grenade-Pouch-with-Flap/>
2016-07-26 10:41:39 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/store/product.aspx/productId/13793/Streamlight-UltraStinger-Xenon-Halogen-Replacement-Bulb/> (referer: http://www.officerstore.com/sitemap.aspx)
2016-07-26 10:41:39 [scrapy] DEBUG: Crawled (200) <GET http://www.officerstore.com/store/product.aspx/productId/24352/Zico-4095-Hi-Tensile-Center-Cut-Bolt-Cutter-Head-Only-Red/> (referer: http://www.officerstore.com/sitemap.aspx)
...
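In the log above, Scrapy first fetches `robots.txt`, then `sitemap.aspx` (with `robots.txt` as referer), and then every page listed in that sitemap (each with `sitemap.aspx` as referer), and each of those pages reaches `parse()`. The sketch below reproduces those two lookup steps with the standard library only. It is an illustration, not Scrapy's own implementation. It assumes that `robots.txt` advertises the sitemap through a `Sitemap:` line and that the sitemap is a standard sitemaps.org `<urlset>` document. The log shows neither body, so the `example.com` inputs are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace used by sitemaps.org <urlset> documents
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls_from_robots_txt(robots_text):
    """Yield the URL of every 'Sitemap:' line in a robots.txt body."""
    for line in robots_text.splitlines():
        # Split on the first colon only, since the URL itself contains colons
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap":
            yield value.strip()


def locs_from_sitemap(xml_text):
    """Return the <loc> URLs listed in a sitemaps.org <urlset> document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


if __name__ == "__main__":
    # Hypothetical inputs; the real officerstore.com bodies are not in the log
    robots = "User-agent: *\nSitemap: http://www.example.com/sitemap.xml\n"
    print(list(sitemap_urls_from_robots_txt(robots)))
    # ['http://www.example.com/sitemap.xml']
    urlset = ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
              '<url><loc>http://www.example.com/a</loc></url></urlset>')
    print(locs_from_sitemap(urlset))
    # ['http://www.example.com/a']
```

Each URL returned by the second step corresponds to one "Crawled (200)" line followed by one "In Loop: processing" line in the log.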
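#!/usr/bin/python
from scrapy.crawler import CrawlerProcess
from scrapy.spiders import SitemapSpider

class MySpider(SitemapSpider):
    name = "SiteMap"
    # SitemapSpider follows the "Sitemap:" entries listed in robots.txt
    sitemap_urls = ['http://www.officerstore.com/robots.txt']

    def parse(self, response):
        # Default callback for every URL found in the sitemap(s)
        self.logger.debug("In Loop: processing %r", response)

if __name__ == "__main__":
    # Creating MySpider() alone crawls nothing; run it through a CrawlerProcess
    process = CrawlerProcess()
    process.crawl(MySpider)
    process.start()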
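By default, `SitemapSpider` sends every sitemap URL to `parse`. That is why both content pages (`/page.aspx/...`) and product pages (`/store/product.aspx/...`) show up as "In Loop" lines in the log. Its `sitemap_rules` attribute, a list of `(regex, callback)` pairs that defaults to `[('', 'parse')]`, lets you route them separately. Scrapy uses the first rule whose regex is found in the URL and skips URLs that match no rule. The plain-Python sketch below mimics that matching so it can be tried without a crawl. The names `route`, `RULES`, `parse_product`, and `parse_page` are hypothetical, and the patterns are only guesses derived from the URLs in the log.

```python
import re

# Hypothetical rules modelled on the two URL families seen in the log.
# Each entry is (regex, callback name), the same shape as sitemap_rules.
RULES = [
    (r"/store/product\.aspx/", "parse_product"),
    (r"/page\.aspx/", "parse_page"),
]


def route(urls, rules):
    """Pair each URL with the callback of the first rule whose regex matches.

    URLs that match no rule are dropped, mirroring how SitemapSpider
    ignores sitemap entries not covered by sitemap_rules.
    """
    compiled = [(re.compile(pattern), callback) for pattern, callback in rules]
    routed = []
    for url in urls:
        for regex, callback in compiled:
            if regex.search(url):  # search, not match: pattern may sit anywhere
                routed.append((url, callback))
                break  # first matching rule wins
    return routed


if __name__ == "__main__":
    urls = [
        "http://www.officerstore.com/default.aspx",
        "http://www.officerstore.com/page.aspx/contentId/94/Careers/",
        "http://www.officerstore.com/store/product.aspx/productId/13535/Port-Authority-Competitor-Jacket/",
    ]
    for url, callback in route(urls, RULES):
        print(callback, url)
```

In the spider itself, the same idea would mean setting `sitemap_rules` to these pairs on `MySpider` and defining the matching callback methods. Be aware that once the catch-all `''` rule is gone, URLs such as `default.aspx` that match no rule are no longer requested.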