@jluczak
Created July 31, 2017 19:30
Write a Scrapy spider that parses all Romance books and extracts the title, price, and rating of each one; it should output JSON with 35 elements.
import scrapy


class BooksSpider(scrapy.Spider):
    name = "books"
    start_urls = [
        'http://books.toscrape.com/catalogue/category/books/romance_8/index.html',
    ]

    def parse(self, response):
        # Each book is one <article class="product_pod">. Iterating over these
        # yields one item per book instead of one item per page.
        for book in response.css('article.product_pod'):
            yield {
                'title': book.css('h3 a::attr(title)').extract_first(),
                'price': book.css('p.price_color::text').extract_first(),
                # The class attribute looks like "star-rating Three".
                'rating': book.css('p.star-rating::attr(class)').re_first(r'star-rating (\w+)'),
            }

        # Follow the "next" link until the last page of the category.
        next_page = response.css('li.next a::attr(href)').extract_first()
        if next_page is not None:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)
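
The rating on this site is encoded as a word in the CSS class of the star-rating paragraph (for example "star-rating Three"), so it may be handy to turn it into a number before writing the JSON. The following is only a sketch: `rating_to_int` and `WORD_TO_STARS` are hypothetical names that are not part of the gist, and it assumes the site uses the words One through Five.

```python
# Hypothetical helper, not part of the original gist.
# Assumes ratings are spelled out as the words "One" through "Five".
WORD_TO_STARS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def rating_to_int(rating):
    """Map 'star-rating Three' (or just 'Three') to 3.

    Returns None when the value is missing or not a recognized word.
    """
    if not rating:
        return None
    tokens = rating.split()
    if not tokens:
        return None
    # The rating word is the last whitespace-separated token.
    return WORD_TO_STARS.get(tokens[-1])
```

Inside `parse`, the extracted class string could be passed through this helper before the item is yielded. To check the 35-element requirement, the spider can be run with Scrapy's `-o` option (for example `scrapy runspider <spider file> -o books.json`) and the length of the resulting JSON array inspected.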