Last active
March 2, 2018 19:22
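from bs4 import BeautifulSoup
import requests

# get the webpage (time out instead of hanging, and stop on HTTP errors)
r = requests.get("http://www.nytimes.com", timeout=10)
r.raise_for_status()

# get the HTML source from that page
html_doc = r.text

# turn the source into a bs4 "soup" object (the 'lxml' parser must be installed)
soup = BeautifulSoup(html_doc, 'lxml')

# narrow down to the div on the page that contains our content
section = soup.find("div", class_="a-column")

# get the first h2 in that div, and the link within that h2;
# each lookup gives None when nothing matches, so check before going deeper
heading = section.h2 if section is not None else None
link = heading.a if heading is not None else None

if link is None:
    print("No headline found; the page layout may have changed")
else:
    # get the link text and remove the line breaks
    firstHeading = link.get_text().replace('\n', ' ')
    # display the final text
    print(firstHeading)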
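
The extraction steps above can also be tried offline, which helps when the live page layout changes. What follows is a minimal sketch, not the author's code. The `first_heading` helper and the `SAMPLE_HTML` markup are hypothetical names made up for illustration, and it assumes the built-in `html.parser` backend rather than `lxml`, so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration; the real page may differ.
SAMPLE_HTML = """
<div class="a-column">
  <h2><a href="/story">First
headline</a></h2>
  <h2><a href="/other">Second headline</a></h2>
</div>
"""


def first_heading(html_doc, column_class="a-column"):
    """Return the link text of the first h2 in the matching div, or None."""
    # Assumption: html.parser is used here instead of lxml.
    soup = BeautifulSoup(html_doc, "html.parser")
    section = soup.find("div", class_=column_class)
    # find() and tag-name attribute access return None when nothing matches
    if section is None or section.h2 is None or section.h2.a is None:
        return None
    # remove the line breaks, as in the script above
    return section.h2.a.get_text().replace("\n", " ")


print(first_heading(SAMPLE_HTML))
print(first_heading("<p>no headlines here</p>"))
```

Returning `None` for a missing element lets the caller decide how to report a changed layout, rather than failing with an `AttributeError`.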