@mprat
Last active April 14, 2020 10:57
# import the requests Python library for programmatically making HTTP requests
# after installing it according to these instructions:
# http://docs.python-requests.org/en/latest/user/install/#install
import requests
# import the BeautifulSoup Python library according to these instructions:
# http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-beautiful-soup
# use this syntax as described on the documentation page:
# http://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
from bs4 import BeautifulSoup
# the URL of the NY Times website we want to parse
base_url = 'http://www.nytimes.com'
# the syntax (according to the documentation) for how to
# "load" a webpage through Python
r = requests.get(base_url)
# parse the HTML text of the NY Times homepage; r comes from the
# requests call above. Naming the parser explicitly avoids the
# "No parser was explicitly specified" warning from BeautifulSoup
soup = BeautifulSoup(r.text, "html.parser")
# find and loop through all elements on the page with the
# class name "story-heading"
for story_heading in soup.find_all(class_="story-heading"):
    # for the story headings that are links, print out the text
    # and format it nicely
    # for the others, take the contents out and format it nicely
    if story_heading.a:
        print(story_heading.a.text.replace("\n", " ").strip())
    else:
        print(story_heading.contents[0].strip())
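
The loop above can also be tried without touching the network. The following is only a sketch: `SAMPLE_HTML` and `extract_headings` are hypothetical names introduced for illustration, and the markup is invented rather than taken from the NY Times page. It swaps `contents[0]` for `get_text()` so that a heading whose first child is not a plain string does not raise an error.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<h2 class="story-heading"><a href="/a">First
headline</a></h2>
<h3 class="story-heading">  Second headline  </h3>
<p class="summary">Not a heading</p>
"""


def extract_headings(html):
    """Return the cleaned text of every element with class "story-heading"."""
    soup = BeautifulSoup(html, "html.parser")
    headings = []
    for story_heading in soup.find_all(class_="story-heading"):
        if story_heading.a:
            # heading wraps a link: use the link text
            text = story_heading.a.text
        else:
            # plain heading: use all of its text
            text = story_heading.get_text()
        # collapse newlines and repeated spaces into single spaces
        headings.append(" ".join(text.split()))
    return headings


print(extract_headings(SAMPLE_HTML))
```

Returning a list instead of printing inside the loop makes the result easy to check or reuse; the same function could be called with `r.text` from the request above.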
kristaps-m commented Apr 14, 2020

On 14.04.2020 (dd.mm.yyyy) I get an error and do not know how to fix it!

TEST.py:21: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 21 of the file TEST.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.

soup = BeautifulSoup(r.text)

Python 3.8.1 // Text Editor: Geany
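
The message quoted above is a warning rather than an error, and it names its own fix: pass the parser as the second argument to the `BeautifulSoup` constructor. A minimal sketch of that change follows. The `html` string is a made-up stand-in for `r.text`, and the `warnings` filter is only there to show that no warning is raised once the parser is named.

```python
import warnings

from bs4 import BeautifulSoup

# Made-up markup standing in for r.text from the script.
html = "<h2 class='story-heading'><a href='#'>Example headline</a></h2>"

with warnings.catch_warnings():
    # turn any warning into an exception so a missing parser would fail loudly
    warnings.simplefilter("error")
    soup = BeautifulSoup(html, "html.parser")

print(soup.h2.a.text)
```

`"html.parser"` ships with Python, so this needs no extra install. Other parsers such as `lxml` can be named the same way if they are installed.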
