Skip to content

Instantly share code, notes, and snippets.

@pvanallen
Last active March 2, 2018 19:22
Show Gist options
  • Save pvanallen/5b9dc8de738ef9e1c278f6fff596cce6 to your computer and use it in GitHub Desktop.
Save pvanallen/5b9dc8de738ef9e1c278f6fff596cce6 to your computer and use it in GitHub Desktop.
from bs4 import BeautifulSoup
import requests
# get the webpage
r = requests.get("http://www.nytimes.com")
# get the HTML source from that page
html_doc = r.text
# turn the source into a bs4 "soup" object
soup = BeautifulSoup(html_doc, 'lxml')
# narrow down to the div on the page that contains our content
section = soup.find("div", class_="a-column")
# get the first h2, and the link text within that h2
firstHeading = (section.h2.a).get_text()
# remove the line breaks
firstHeading = firstHeading.replace('\n',' ')
# display the final text
print(firstHeading)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment