@ingyunson
Created November 3, 2018 15:14
Yahoo News crawler
import requests
from bs4 import BeautifulSoup as bs

# Fetch the Yahoo! Japan news top page and parse it.
url = 'https://news.yahoo.co.jp/'
news = requests.get(url)
html = news.text
soup = bs(html, 'lxml')

headline_title = []
headline_url = []

# The first topic is marked up differently (li.topTpi > div > h1).
news_head = soup.select('#epTabTop > ul.topics > li.topTpi > div > h1 > a')
# Strip the "写真" ("photo") label from the link text.
headline_title.append(news_head[0].text.replace("写真",""))
headline_url.append(news_head[0].get('href'))

# Topics 2 through 8 sit in the remaining list items (div > p).
for i in range(2,9):
    news_info = soup.select('#epTabTop > ul.topics > li:nth-of-type({0}) > div > p > a'.format(i))
    headline_title.append(news_info[0].text.replace("写真",""))
    headline_url.append(news_info[0].get('href'))

print(headline_title, headline_url)
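
The script above prints two parallel lists, which are awkward to read or reuse. As a minimal sketch, and not part of the original gist, the titles and URLs could be paired into one record per headline. The helper name pair_headlines, the BASE_URL constant, and the sample values are assumptions made for illustration; only the standard library is used.

```python
from urllib.parse import urljoin

# Assumption: the same page URL the script fetches, used to resolve relative links.
BASE_URL = 'https://news.yahoo.co.jp/'


def pair_headlines(titles, urls, base_url=BASE_URL):
    """Zip parallel title/URL lists into a list of dicts (hypothetical helper)."""
    paired = []
    for title, href in zip(titles, urls):
        paired.append({
            # Remove the "写真" ("photo") label and surrounding whitespace.
            'title': title.replace('写真', '').strip(),
            # urljoin keeps absolute URLs as they are and resolves relative ones.
            'url': urljoin(base_url, href) if href else None,
        })
    return paired


if __name__ == '__main__':
    # Made-up placeholder values, not real scraped data.
    sample_titles = ['Sample headline 写真', 'Another headline']
    sample_urls = ['/sample/1', 'https://news.yahoo.co.jp/sample/2']
    for item in pair_headlines(sample_titles, sample_urls):
        print(item['title'], item['url'])
```

With the script's own lists, the call would be pair_headlines(headline_title, headline_url).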