@stummjr
Last active June 6, 2018 16:22
Scrapy + Splash example
import scrapy


# This example needs the scrapyjs package (since renamed scrapy-splash): pip install scrapy-splash
# It also needs a Splash instance running in your env or on Scrapy Cloud (https://github.com/scrapinghub/splash)
class SplashSpider(scrapy.Spider):
    name = 'splash-spider'
    download_delay = 3

    def start_requests(self):
        # The 'splash' meta key asks the Splash middleware to render the page
        # (including its JavaScript) before the response reaches parse().
        yield scrapy.Request(
            'http://quotes.toscrape.com/js', self.parse,
            meta={
                'splash': {
                    'endpoint': 'render.html',
                }
            }
        )

    def parse(self, response):
        # Dump the rendered HTML for debugging.
        print(response.body)
        for quote in response.css('.quote'):
            yield {
                'text': quote.css('span::text').extract_first(),
                'author': quote.css('small::text').extract_first(),
                'tags': quote.css('.tags a::text').extract(),
            }
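
The spider only sets the `splash` meta key. The request is routed through Splash only if the Splash middleware is enabled in the project settings. Below is a minimal sketch of such a `settings.py`, assuming the renamed `scrapy-splash` package (module `scrapy_splash`) and a Splash instance listening on `http://localhost:8050`. The URL is an assumption, so adjust it to wherever your Splash instance actually runs.

```python
# settings.py (sketch): enable the Splash middlewares from scrapy-splash.
# Assumption: Splash is reachable locally on its default port 8050.
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Make request deduplication aware of Splash arguments.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```

If you are still on the old `scrapyjs` package, the module paths differ (for example `scrapyjs.SplashMiddleware`), so check the README of the version you have installed.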