@bjpcjp
Created December 27, 2015 17:07
import scrapy


class StackOverflowSpider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['http://stackoverflow.com/questions?sort=votes']

    def parse(self, response):
        # Follow the link of every question listed on the start page.
        for href in response.css('.question-summary h3 a::attr(href)'):
            # hrefs on the page may be relative; make them absolute.
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        # Emit one item (a plain dict) per question page.
        yield {
            'title': response.css('h1 a::text').extract()[0],
            'votes': response.css('.question .vote-count-post::text').extract()[0],
            'body': response.css('.question .post-text').extract()[0],
            'tags': response.css('.question .post-tag::text').extract(),
            'link': response.url,
        }
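
The `response.urljoin(...)` call in `parse` turns each extracted `href` into an absolute URL before it is requested. As a rough illustration of that step only, the sketch below uses the standard library's `urllib.parse.urljoin` in place of Scrapy's method. This is an approximation, not Scrapy's implementation, and Scrapy's version may handle cases this one does not. `PAGE_URL`, `HREFS`, and `resolve_links` are hypothetical names made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration; not taken from a real crawl.
PAGE_URL = 'http://stackoverflow.com/questions?sort=votes'
HREFS = [
    '/questions/1/example-question',  # root-relative link
    'http://example.com/elsewhere',   # already absolute
]


def resolve_links(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


if __name__ == '__main__':
    for url in resolve_links(PAGE_URL, HREFS):
        print(url)
```

A root-relative link replaces the path and query string of the page URL, while a link that is already absolute is returned unchanged.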

bjpcjp commented Dec 27, 2015

This spider comes from the intro to Scrapy v1.0.3.
