@jenya
Created July 26, 2017 16:18
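# -*- coding: utf-8 -*-
from scrapy.spiders import SitemapSpider


class HbiczSpider(SitemapSpider):
    name = 'hbicz'
    # Entry point: the site's root sitemap.
    sitemap_urls = ['http://hbi.cz/sitemap.xml']
    # Send company pages (".../en/firmy/<slug>-C<id>.html") to parse().
    sitemap_rules = [(r"http://www\.hbi\.cz/en/firmy/[^./]+-C\d+\.html", 'parse')]
    # From a sitemap index, only follow this sub-sitemap.
    sitemap_follow = [r"http://www\.hbi\.cz/CoSiteMap_C184\.xml"]
    # Also follow alternate-language links listed inside each <url> entry.
    sitemap_alternate_links = True

    def parse(self, response):
        # Placeholder callback: no fields are extracted yet.
        pass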
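The URL filtering can be checked without running a crawl. Scrapy's SitemapSpider tests each `<loc>` of a sitemap index against the `sitemap_follow` regexes, and each page URL against `sitemap_rules`, using `re.search` semantics and stopping at the first rule that matches. The sketch below mimics that filtering with only the standard library; the helper names `callback_for` and `should_follow` and all test URLs are hypothetical, and the patterns escape the literal dots before `html` and `xml`.

```python
import re

# Patterns taken from the spider, with literal dots escaped.
SITEMAP_RULES = [(r"http://www\.hbi\.cz/en/firmy/[^./]+-C\d+\.html", "parse")]
SITEMAP_FOLLOW = [r"http://www\.hbi\.cz/CoSiteMap_C184\.xml"]


def callback_for(url):
    """Hypothetical helper: callback name of the first matching rule, else None."""
    for pattern, callback in SITEMAP_RULES:
        if re.search(pattern, url):
            return callback
    return None


def should_follow(sitemap_url):
    """Hypothetical helper: True if a sub-sitemap from an index would be followed."""
    return any(re.search(p, sitemap_url) for p in SITEMAP_FOLLOW)


# Made-up URLs used only to exercise the patterns.
print(callback_for("http://www.hbi.cz/en/firmy/example-company-C12345.html"))  # parse
print(callback_for("http://www.hbi.cz/en/osoby/someone-O1.html"))  # None
print(should_follow("http://www.hbi.cz/CoSiteMap_C184.xml"))  # True
print(should_follow("http://www.hbi.cz/PeSiteMap_C1.xml"))  # False
```

Because `re.search` is unanchored and an unescaped `.` matches any character, a pattern ending in `C\d+.html` would also accept a URL such as `.../a-C1xhtml`, while `C\d+\.html` rejects it.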