@cargan
Last active September 4, 2018 13:37
# -*- coding: utf-8 -*-
# 1. create virtual environment: virtualenv venv
# 2. install scrapy: ./venv/bin/pip install scrapy
# 3. run spider: ./venv/bin/scrapy runspider spider.py -o saras_delfi.json
import scrapy


class SarasDelfiSpider(scrapy.Spider):
    name = "saras"
    start_urls = [
        'https://www.delfi.lt/temos/sarunas-jasikevicius',
    ]

    def parse(self, response):
        # Each article teaser on the topic page sits in a div.headline block.
        for headline in response.css('div.headline'):
            yield {
                'title': headline.css('h3.headline-title').xpath('a/text()').extract_first(),
                'excerpt': headline.css('p.headline-lead').xpath('text()').extract_first(),
            }

        # Follow the "next" pagination link unless its class marks it as hidden.
        next_page = response.css('a.next:not([class^="next hidden"])::attr("href")').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
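
The pagination step passes the raw `href` to `response.follow`, which accepts relative links and resolves them against the URL of the page being parsed. The sketch below imitates that resolution with the standard library only, so it can be tried without Scrapy. The helper name `resolve_next_page` and the `?page=2` link are made up for illustration; the real hrefs on delfi.lt may look different.

```python
from urllib.parse import urljoin


def resolve_next_page(current_url, href):
    """Return the absolute URL of a pagination link, or None when there is no link.

    Hypothetical helper: it mirrors the `if next_page is not None` guard in the
    spider and resolves relative hrefs against the current page URL.
    """
    if href is None:
        return None
    return urljoin(current_url, href)


if __name__ == "__main__":
    page = 'https://www.delfi.lt/temos/sarunas-jasikevicius'
    # '?page=2' is an assumed example href, not taken from the site.
    print(resolve_next_page(page, '?page=2'))
    print(resolve_next_page(page, None))
```

Returning `None` for a missing link keeps the caller's check the same as in the spider: the crawl stops on the last page because there is nothing left to follow.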