@LittleYenMin
Last active July 13, 2019 15:16
Rewritten parse method from Scrapy, Chapter 6
def parse(self, response):
    # Each <li> under the news list holds one article link
    for block in response.xpath('//ul[@id="newslistul"]//li'):
        href = block.xpath('.//a[contains(@class, "tit")]/@href').extract_first()
        if href:
            # Crawl the news article body (skip items without a title link)
            yield response.follow(url=href, callback=self.parse_content)
    a_next = response.xpath('//a[contains(@class, "p_next")]/@href').extract_first()
    if a_next:
        # Crawl the next page
        yield response.follow(a_next, callback=self.parse)
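
Both yields rely on response.follow accepting relative hrefs and resolving them against the current page URL. The sketch below is not Scrapy's implementation; it only approximates that resolution step with the standard library's urljoin, and it shows why a missing href should be skipped. The helper name resolve_link and the example.com URLs are hypothetical.

```python
from urllib.parse import urljoin


def resolve_link(page_url, href):
    """Resolve an href found on page_url into an absolute URL.

    Returns None when href is missing, mirroring the case where
    extract_first() finds no matching node.
    """
    if not href:
        return None
    return urljoin(page_url, href)


# Hypothetical URLs, used only for illustration.
page = "https://example.com/news/list?page=1"
print(resolve_link(page, "/news/12345.html"))  # site-absolute href
print(resolve_link(page, "list?page=2"))       # href relative to the page
print(resolve_link(page, None))                # no link found in the block
```

In a real spider the resolution is handled by response.follow itself, so the only check the callback needs is whether the selector returned a value.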