@ajitmp
Created July 27, 2024 10:19
A basic Scrapy spider that scrapes quotes from https://quotes.toscrape.com/
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "Quotes"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Each quote is within <div class="quote" ...>
        quotes = response.css("div.quote")
        for quote in quotes:
            # Each quote text is within <span class="text" ...>
            title = quote.css("span.text::text").get()
            # Each author name is within <small class="author" ...>
            author = quote.css("small.author::text").get()
            yield {
                "title": title,
                "author": author,
            }