@eparikh
Created February 19, 2017 14:28
A sample of my Wikipedia TV show scraper
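import scrapy
from scrapy import Spider


class TVSpider(Spider):
    name = "tv_spider"
    # Scrapy reads allowed_domains (bare domain names, no trailing slash);
    # an attribute named allowed_urls is silently ignored.
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ["https://en.wikipedia.org/wiki/List_of_American_television_series"]

    def parse(self, response):
        # TV show URLs scraped from the list of TV shows
        # (the XPath selects links nested inside <i> elements)
        urls = response.xpath("//i//a/@href").extract()

        # follow each URL to get the show details;
        # parse_tv_show is the detail-page callback and is not included in this sample
        for url in urls:
            yield scrapy.Request(response.urljoin(url), callback=self.parse_tv_show)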
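The `response.urljoin(url)` call in `parse` matters because the hrefs pulled from a Wikipedia page are usually relative paths, and a request needs an absolute URL. As a rough sketch of that step using only the standard library (the sample hrefs and the helper name `absolute_show_urls` are assumptions for illustration, not part of the scraper), the resolution looks roughly like this:

```python
from urllib.parse import urljoin

# Page the spider starts from (taken from start_urls in the spider).
LIST_PAGE = "https://en.wikipedia.org/wiki/List_of_American_television_series"


def absolute_show_urls(base_url, hrefs):
    """Resolve each href against base_url, much as response.urljoin does per link."""
    return [urljoin(base_url, href) for href in hrefs]


# Hypothetical hrefs of the kind the XPath might return.
sample_hrefs = ["/wiki/Example_Show", "/wiki/Another_Show_(TV_series)"]
show_urls = absolute_show_urls(LIST_PAGE, sample_hrefs)
print(show_urls)
```

Scrapy's own `response.urljoin` also honors a `<base>` tag when the page has one, so this is an approximation of its behavior rather than a drop-in replacement.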