@obafgkm44
Last active March 28, 2019 09:16
Scraping practice: extracting the required tags
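import time

import requests
from bs4 import BeautifulSoup

response = requests.get('https://toiguru.jp/toeic-vocabulary-list#smoothplay1')
response.raise_for_status()  # stop early if the page could not be fetched
soup = BeautifulSoup(response.text, 'lxml')
cells = soup.find_all('td')

# Open the output file once so that earlier entries are not overwritten.
with open('english_words.txt', 'w', encoding='UTF-8') as f:
    for cell in cells:
        # Strip the unneeded <td> tags and turn each <br/> into a ':' separator.
        word = str(cell).replace('<td>', '').replace('</td>', '').replace('<br/>', ':')
        f.write(word + '\n')
        print(word)

# Scraping etiquette: pause before sending any further request.
time.sleep(1)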
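
Instead of removing tags by string replacement, the same cleanup can probably be done with BeautifulSoup's own `get_text`. The following is a minimal sketch, not the author's method. It assumes each `<td>` holds a word and its meaning separated by `<br/>`; the sample markup and the helper name `extract_cells` are hypothetical, and the real page's structure has not been checked. It uses the built-in `html.parser` so it runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page may be structured differently.
SAMPLE_HTML = """
<table>
  <tr><td>abandon<br/>to give up</td></tr>
  <tr><td>benefit<br/>an advantage</td></tr>
</table>
"""


def extract_cells(html, separator=':'):
    """Return the text of every <td>, joining pieces split by <br/> with separator."""
    soup = BeautifulSoup(html, 'html.parser')
    # get_text drops all tags; strip=True trims whitespace around each piece.
    return [td.get_text(separator=separator, strip=True)
            for td in soup.find_all('td')]


cells = extract_cells(SAMPLE_HTML)
print(cells)
```

Working on the parsed cells rather than on their string form also handles cells that carry attributes (for example `<td class="...">`), which a literal `replace('<td>', '')` would miss.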