@jangsoopark
Created November 28, 2017 05:54
from bs4 import BeautifulSoup
import requests

# Naver Finance news article to scrape
url = 'http://finance.naver.com/news/news_read.nhn?article_id=0003970730&office_id=008&mode=search&query=%BB%EF%BC%BA&page=1'
response = requests.get(url)
web_data = response.content

# Pass the raw bytes so BeautifulSoup can detect the page encoding itself
soup = BeautifulSoup(web_data, 'html5lib')

# The title and date sit inside the article_info block
article_info = soup.find('div', {'class': 'article_info'})
article_title = article_info.find('h3').text.strip()
article_date = article_info.find('span', {'class': 'article_date'}).text.strip()

# Body text: remove the related-news boxes and every <a> element
# (including its link text) before extracting the remaining text
article_content = soup.find('div', {'class': 'articleCont'})
for element in article_content.find_all('div', {'class': 'link_news'}):
    element.decompose()
for element in article_content.find_all('a'):
    element.decompose()
article_content = article_content.text.strip()

print(article_title)
print(article_date)
print(article_content)
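
The script hard-codes the article URL, including a percent-encoded `query` value. As a minimal sketch, the same URL can be assembled from its parts with the standard library. The helper name `build_article_url` is hypothetical, and the sketch assumes the `query` value is the EUC-KR encoding of the search term "삼성"; the gist itself does not state the encoding.

```python
from urllib.parse import urlencode, unquote

# Base endpoint taken from the URL used in the script above
BASE_URL = 'http://finance.naver.com/news/news_read.nhn'


def build_article_url(article_id, office_id, query, page=1):
    """Hypothetical helper: rebuild the article URL from its parameters.

    Assumption: the site expects the search term percent-encoded as EUC-KR.
    """
    params = {
        'article_id': article_id,
        'office_id': office_id,
        'mode': 'search',
        'query': query,
        'page': page,
    }
    # urlencode percent-encodes each value using the given character encoding
    return BASE_URL + '?' + urlencode(params, encoding='euc-kr')


if __name__ == '__main__':
    # Decode the query string from the original URL under the EUC-KR assumption
    print(unquote('%BB%EF%BC%BA', encoding='euc-kr'))
    print(build_article_url('0003970730', '008', '삼성'))
```

Passing `article_id` and `office_id` as strings keeps their leading zeros intact, which an integer would drop.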