@BetterProgramming
Created May 14, 2019 12:41
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class ElectronicsSpider(CrawlSpider):
    name = "electronics"
    allowed_domains = ["www.olx.com.pk"]
    start_urls = [
        'https://www.olx.com.pk/computers-accessories/',
        'https://www.olx.com.pk/tv-video-audio/',
        'https://www.olx.com.pk/games-entertainment/'
    ]

    # Follow only links found inside the pagination element, and pass every
    # page reached that way to parse_item.
    rules = (
        Rule(LinkExtractor(allow=(), restrict_css=('.pageNextPrev',)),
             callback="parse_item",
             follow=True),
    )

    def parse_item(self, response):
        # Placeholder callback: just report which page is being handled.
        print('Processing..' + response.url)
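
To get a feel for which links the rule above would follow, here is a rough standard-library sketch of what `restrict_css=('.pageNextPrev',)` amounts to: only anchors located inside an element carrying the `pageNextPrev` class are collected. The `PaginationLinkParser` class and the sample HTML are hypothetical, made up for illustration. This is not Scrapy's implementation, and the real OLX markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PaginationLinkParser(HTMLParser):
    """Hypothetical helper: collects hrefs of <a> tags inside .pageNextPrev elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.depth = 0  # > 0 while inside an element with class pageNextPrev
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        inside = self.depth > 0 or "pageNextPrev" in classes
        if inside and tag == "a" and attrs.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        if inside and tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


# Assumed sample markup (well-formed; this sketch does not handle unclosed tags).
SAMPLE_HTML = """
<div class="listing"><a href="/item/1">Laptop</a></div>
<div class="pager">
  <span class="pageNextPrev"><a href="?page=2">Next</a></span>
</div>
<a href="/about">About</a>
"""

parser = PaginationLinkParser("https://www.olx.com.pk/computers-accessories/")
parser.feed(SAMPLE_HTML)
print(parser.links)
```

With this sample, only the "Next" link is collected; the listing and "About" links sit outside the pagination element and are skipped, which is the same filtering the spider relies on to move from page to page.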