@bdewilde
Last active December 15, 2015 22:29
Bare-bones example of scraping article text from a website
import bs4
import requests
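# GET the HTML from the NYT server; raise if the server returns an HTTP error status
response = requests.get('http://www.nytimes.com/2013/04/07/opinion/sunday/friedman-weve-wasted-our-timeout.html', timeout=10)
response.raise_for_status()
# parse the HTML; naming the parser explicitly keeps results consistent across machines
soup = bs4.BeautifulSoup(response.text, 'html.parser')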
article = ''
# select all tags containing article text, then extract the text from each
paragraphs = soup.find_all('p', itemprop='articleBody')
for paragraph in paragraphs:
    article += paragraph.get_text()
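
The same selection step can be tried offline by wrapping it in a function and feeding it a small HTML string. This is only a sketch: the names extract_article_text and SAMPLE_HTML are hypothetical, the sample markup is invented for illustration, and it assumes the target page still marks body paragraphs with itemprop="articleBody", which may no longer hold for the live site. Unlike the loop above, it joins paragraphs with newlines so they do not run together.

```python
import bs4


def extract_article_text(html, tag='p', itemprop='articleBody'):
    """Return the text of every matching tag, one paragraph per line.

    Hypothetical helper: the default tag and itemprop mirror the
    selector used in the gist, but other sites will need other values.
    """
    soup = bs4.BeautifulSoup(html, 'html.parser')
    # keep only tags whose itemprop attribute matches, e.g. <p itemprop="articleBody">
    paragraphs = soup.find_all(tag, itemprop=itemprop)
    # get_text() flattens nested markup such as <em> into plain text
    return '\n'.join(p.get_text().strip() for p in paragraphs)


# Invented sample markup (an assumption, not real NYT HTML)
SAMPLE_HTML = """
<html><body>
<p itemprop="articleBody">First paragraph.</p>
<p class="caption">A photo caption.</p>
<p itemprop="articleBody">Second <em>paragraph</em>.</p>
</body></html>
"""

article = extract_article_text(SAMPLE_HTML)
print(article)
```

The caption paragraph is skipped because it has no itemprop attribute; only the two body paragraphs are returned.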