@paperlefthand
Created May 27, 2018 03:30
Fetch lists of Japanese actor and actress names from Wikipedia
import csv
import time

import requests
from bs4 import BeautifulSoup

base_url = 'https://en.wikipedia.org/wiki/'
items = ["actors", "actresses"]

for i in items:
    target_url = base_url + "List_of_Japanese_" + i
    target_html = requests.get(target_url).text
    # The 'lxml' parser requires the lxml package to be installed
    soup = BeautifulSoup(target_html, 'lxml')
    # print(target_url)
    # Links in the bulleted list that directly follows each section heading
    names = soup.select('div.mw-parser-output > h2 + ul > li > a')

    # Collect the link texts
    acts = []
    print(' getting names of %s ...' % i)
    for name in names:
        acts.append(name.get_text())
        # print(name.string)
    # Pause between page requests
    time.sleep(1)

    # To File
    with open('%s.csv' % i, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow(['name'])
        # The last entry is the "List of Japanese ..." link under "See also", so drop it
        for name in acts[:-1]:
            writer.writerow([name])
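
The CSV-writing step can be tried without any network access. The following is a minimal sketch, assuming a hypothetical helper `names_to_csv` (not part of the original script) and made-up sample names; it writes to an in-memory buffer instead of a file and drops the trailing "See also" entry in the same way.

```python
import csv
import io


def names_to_csv(names, drop_trailing=1):
    """Return CSV text with a 'name' header, skipping the last `drop_trailing` entries.

    Hypothetical helper for illustration; the original script writes directly to a file.
    """
    kept = names[:-drop_trailing] if drop_trailing else list(names)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['name'])
    for name in kept:
        writer.writerow([name])
    return buffer.getvalue()


# Made-up sample input; the last entry mimics the "See also" link.
sample = ['Actor A', 'Actor B', 'List of Japanese actresses']
print(names_to_csv(sample))
```

Dropping a fixed number of trailing entries is fragile if the page layout changes, so filtering on the link text (for example, names starting with "List of") may be a safer choice.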