@civic
Created November 14, 2017 02:24
Scraping Hatena Bookmark with requests and BeautifulSoup
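from urllib.parse import urljoin
import time

import requests
from bs4 import BeautifulSoup

url = 'http://b.hatena.ne.jp/search/text?safe=on&q=Python&users=50'

for _ in range(3):  # stop after 3 pages
    res = requests.get(url, timeout=10)
    res.raise_for_status()
    soup = BeautifulSoup(res.content, features='lxml')  # requires the lxml package

    # Print the title, date, and bookmark count of every result on this page
    for post in soup.select('.search-result'):
        title = post.select_one('h3').text.strip()
        date = post.select_one('.created').text.strip()
        bookmarks = post.select_one('.users span').text.strip()
        print(title, date, bookmarks)

    # Follow the "next page" link; stop only after the last page has been printed
    next_link = soup.select_one('a.pager-next')
    if next_link is None:
        break
    # Resolve the relative href against the search URL
    url = urljoin('http://b.hatena.ne.jp/search/text', next_link['href'])
    time.sleep(3)  # pause between requests to avoid hammering the server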
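The pagination logic in the script above (fetch a page, emit its results, follow the "next" link, sleep, stop after N pages) can be pulled out of the network code so it can be tried without hitting the site. The following is a minimal sketch under assumptions: `crawl_pages`, `PageFetcher`, and `fetch_page` are hypothetical names introduced here, and in real use `fetch_page` would wrap the `requests.get` + BeautifulSoup parsing from the script and return the parsed rows plus the raw `href` of `a.pager-next` (or `None`). The `example.com` URLs and rows in the demo are made-up sample data.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical interface for this sketch (not part of requests or bs4):
# given a URL, return the results on that page and the raw href of the
# "next page" link, or None when there is no next page.
PageFetcher = Callable[[str], Tuple[List[tuple], Optional[str]]]


def crawl_pages(fetch_page: PageFetcher, start_url: str,
                max_pages: int = 3, delay: float = 3.0) -> Iterator[tuple]:
    """Yield results page by page, following next links for at most max_pages."""
    url = start_url
    for page in range(max_pages):
        results, next_href = fetch_page(url)
        yield from results              # the last page's results are yielded too
        if next_href is None:           # no "next" link: this was the last page
            return
        url = urljoin(url, next_href)   # e.g. '?q=py&page=2' keeps the path, replaces the query
        if page < max_pages - 1:
            time.sleep(delay)           # wait only if another request will follow


if __name__ == '__main__':
    # Fake two-page "site" so the loop can run offline.
    pages = {
        'http://example.com/search?q=py': ([('A', '2017/11/01', '120')], '?q=py&page=2'),
        'http://example.com/search?q=py&page=2': ([('B', '2017/11/02', '80')], None),
    }
    for row in crawl_pages(pages.__getitem__, 'http://example.com/search?q=py', delay=0):
        print(*row)
```

Keeping the sleep and the page limit inside the generator means the caller only deals with rows, and passing `delay=0` with a dictionary-backed fetcher makes the stop conditions easy to check.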