@premit
premit / imdb_details_page_spider.py
Created July 23, 2014 14:16
Scrapy reference: Crawling scraped links & next pagination
'''
Spider for IMDb
- Retrieve the most popular movies & TV series rated 8.0 or above that have at least 5 award nominations
- Crawl next pages recursively
- Follow the details page of each scraped film to retrieve more information about it
'''
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
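
The preview above stops at the imports, so the spider body is not shown. As a rough illustration of the "follow the details page" step from the docstring, here is a minimal standard-library sketch. It is not the gist's Scrapy code. The HTML snippets, the `nominations` class name, and the names `LISTING`, `DETAILS`, and `parse_listing` are all hypothetical stand-ins for real IMDb responses and spider callbacks.

```python
import re

# Hypothetical listing page and details pages (stand-ins for real HTTP responses).
LISTING = (
    '<a class="title" href="/title/tt001/">Film A</a>'
    '<a class="title" href="/title/tt002/">Film B</a>'
)
DETAILS = {
    "/title/tt001/": '<span class="nominations">7</span>',
    "/title/tt002/": '<span class="nominations">3</span>',
}

TITLE_LINK = re.compile(r'<a class="title" href="([^"]+)">([^<]+)</a>')
NOMINATIONS = re.compile(r'<span class="nominations">(\d+)</span>')


def parse_listing(html, fetch_detail=DETAILS.get, min_nominations=5):
    """Yield one item per film whose details page meets the nomination threshold."""
    for href, title in TITLE_LINK.findall(html):
        # Follow the scraped link to the film's details page.
        detail_html = fetch_detail(href) or ""
        match = NOMINATIONS.search(detail_html)
        nominations = int(match.group(1)) if match else 0
        # Keep only films with enough award nominations.
        if nominations >= min_nominations:
            yield {"title": title, "url": href, "nominations": nominations}


films = list(parse_listing(LISTING))
```

With the sample data, only "Film A" passes the threshold, because its details page reports 7 nominations while "Film B" reports 3.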
premit / imdb_next_page_spider.py
Created July 23, 2014 14:14
Scrapy reference: Crawling next pagination
'''
Spider for IMDb
- Retrieve the most popular movies & TV series rated 8.0 or above
- Crawl next pages recursively
'''
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
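
The preview above is also cut off after the imports. The "crawl next pages recursively" idea from the docstring can be sketched without Scrapy as follows. This is only an assumption-laden illustration, not the gist's implementation: the `PAGES` dictionary, the `next` and `title` class names, and the `crawl` function are hypothetical, and the in-memory lookup replaces real HTTP requests.

```python
import re

# Hypothetical in-memory "site": URL -> HTML (stands in for real HTTP responses).
PAGES = {
    "/search?page=1": '<a class="title" href="/title/tt001/">A</a>'
                      '<a class="next" href="/search?page=2">Next</a>',
    "/search?page=2": '<a class="title" href="/title/tt002/">B</a>'
                      '<a class="next" href="/search?page=3">Next</a>',
    "/search?page=3": '<a class="title" href="/title/tt003/">C</a>',
}

NEXT_LINK = re.compile(r'<a class="next" href="([^"]+)"')
TITLE_LINK = re.compile(r'<a class="title" href="([^"]+)">([^<]+)</a>')


def crawl(start_url, fetch=PAGES.get):
    """Yield items from every result page, following the "next" link until none is left."""
    seen = set()
    url = start_url
    while url and url not in seen:
        seen.add(url)  # guard against pagination loops
        html = fetch(url) or ""
        for href, title in TITLE_LINK.findall(html):
            yield {"title": title, "url": href}
        match = NEXT_LINK.search(html)
        url = match.group(1) if match else None


items = list(crawl("/search?page=1"))
```

The `seen` set is a design choice of this sketch: it stops the loop if a "next" link ever points back to a page that was already visited.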