@LittleYenMin
Created July 8, 2019 14:15
Liberty Times (自由時報) news crawler (pagination only)
import scrapy


class LtnSearchCrawler(scrapy.Spider):
    """Walks the Liberty Times (LTN) search result pages for one keyword."""
    name = 'ltn_search_page'
    start_urls = ['https://news.ltn.com.tw/search/?keyword=反紅媒']

    def parse(self, response):
        # Print the article link of every entry on the current result page.
        for block in response.xpath('//ul[@id="newslistul"]//li'):
            href = block.xpath('.//a[contains(@class, "tit")]/@href').extract_first()
            print(href)
        # Follow the "next page" link, if there is one, and parse it the same way.
        a_next = response.xpath('//a[contains(@class, "p_next")]/@href').extract_first()
        if a_next:
            yield response.follow(a_next, callback=self.parse)
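
The start URL above hard-codes one search keyword. To search for something else, the query string could be built with the standard library, as in the minimal sketch below. It assumes the search page takes a single `keyword` query parameter, which is all the URL in the spider shows. `build_search_url` and `SEARCH_ENDPOINT` are hypothetical names and not part of the original gist.

```python
from urllib.parse import urlencode

# Base of the search page, taken from the start URL used by the spider.
SEARCH_ENDPOINT = 'https://news.ltn.com.tw/search/'


def build_search_url(keyword):
    """Return the search URL for `keyword` (hypothetical helper).

    urlencode percent-encodes non-ASCII text as UTF-8, so a Chinese
    keyword yields a plain-ASCII URL.
    """
    return SEARCH_ENDPOINT + '?' + urlencode({'keyword': keyword})
```

The spider could then set `start_urls = [build_search_url('反紅媒')]` in place of the literal URL.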