
eparikh/scrape_wiki.py (secret gist by @eparikh, created Feb 19, 2017)
A sample of my Wikipedia TV show scraper
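import scrapy
from scrapy import Spider


class TVSpider(Spider):
    name = "tv_spider"
    # Scrapy reads allowed_domains (bare domain names); an attribute named allowed_urls is ignored
    allowed_domains = ["en.wikipedia.org"]
    start_urls = ["https://en.wikipedia.org/wiki/List_of_American_television_series"]

    def parse(self, response):
        # TV show URLs scraped from the list of TV shows (links nested inside <i> tags)
        urls = response.xpath("//i//a/@href").extract()
        # Follow each (possibly relative) URL to get show details
        for url in urls:
            yield scrapy.Request(response.urljoin(url),
                                 callback=self.parse_tv_show)

    def parse_tv_show(self, response):
        # Placeholder: the show-detail parsing is not part of this sample
        pass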
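To check what `parse` would yield without running a crawl, here is a minimal standard-library sketch of the same per-page logic, under stated assumptions. It collects the `href` of every `<a>` nested inside an `<i>` (what `//i//a/@href` selects), resolves each against the page URL (the idea behind `response.urljoin`), and keeps only links whose host is `en.wikipedia.org` or a subdomain of it, roughly mirroring Scrapy's offsite filtering via `allowed_domains`. The helper names `ItalicLinkParser` and `show_links` and the sample HTML are hypothetical; real Wikipedia markup is much messier, and Scrapy's own offsite middleware differs in detail.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ItalicLinkParser(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags nested inside <i> (like //i//a/@href)."""

    def __init__(self):
        super().__init__()
        self.italic_depth = 0  # number of <i> elements currently open
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            self.italic_depth += 1
        elif tag == "a" and self.italic_depth > 0:
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "i" and self.italic_depth > 0:
            self.italic_depth -= 1


def show_links(html, page_url, allowed_domain="en.wikipedia.org"):
    """Hypothetical helper: extract italic links, make them absolute, keep on-site ones."""
    parser = ItalicLinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)  # same idea as response.urljoin(url)
        host = urlparse(absolute).hostname or ""
        # Rough stand-in for allowed_domains: exact host or a subdomain of it
        if host == allowed_domain or host.endswith("." + allowed_domain):
            links.append(absolute)
    return links


# Hypothetical sample markup, loosely shaped like a list of italicized show titles
sample = """
<ul>
  <li><i><a href="/wiki/Breaking_Bad">Breaking Bad</a></i> (2008-2013)</li>
  <li><i><a href="/wiki/The_Wire">The Wire</a></i></li>
  <li><a href="/wiki/HBO">HBO</a></li>
  <li><i><a href="https://example.com/show">External</a></i></li>
</ul>
"""

page = "https://en.wikipedia.org/wiki/List_of_American_television_series"
for link in show_links(sample, page):
    print(link)
```

With this sample, only the two italicized on-site links survive: the `HBO` link is dropped because it is not inside `<i>`, and the `example.com` link is dropped by the domain check. In the spider itself, Scrapy handles fetching and offsite filtering; a sketch like this is only useful for checking the intent of the XPath offline.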