@LittleYenMin
Created July 13, 2019 14:56
Parse code for Chapter 5 of the Scrapy crawler
def parse(self, response):
    # Each news item is an <li> inside the list with id "newslistul".
    for block in response.xpath('//ul[@id="newslistul"]//li'):
        href = block.xpath('.//a[contains(@class, "tit")]/@href').extract_first()
        print(href)
    # Go to the next page
    a_next = response.xpath('//a[contains(@class, "p_next")]/@href').extract_first()
    if a_next:
        yield response.follow(a_next, callback=self.parse)
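The hrefs extracted above may be relative. `response.follow` accepts a relative URL and resolves it against the URL of the current response, which is why `a_next` can be passed to it directly. As a rough sketch of that resolution step, using only the standard library and ignoring any `<base>` tag in the page, the same result can be reproduced with `urllib.parse.urljoin`. The page URL, the href values, and the `resolve` helper below are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page currently being parsed (illustration only).
page_url = "https://example.com/news/list?page=1"


def resolve(base_url, href):
    """Resolve an href against the page URL, the way a browser would."""
    return urljoin(base_url, href)


# A root-relative href, such as a "next page" link might contain.
relative_next = resolve(page_url, "/news/list?page=2")
# An href relative to the current directory.
sibling_next = resolve(page_url, "list?page=2")
# An already absolute href is returned unchanged.
absolute_next = resolve(page_url, "https://example.com/other")

print(relative_next)
print(sibling_next)
print(absolute_next)
```

Inside a spider, `response.urljoin(href)` could be applied to the printed `href` values in the same way if absolute URLs are needed.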