Web scraping with Python | Introduction
from bs4 import BeautifulSoup as bs
import urllib.request as ureq
# Website url
freblogg_url = 'http://freblogg.com'
# Fetch the website (the context manager closes the connection when done)
with ureq.urlopen(freblogg_url) as response:
    website = response.read()
# Parse the html of the site with soup
soup = bs(website, "html.parser")
# Get all the headers
headers = soup.find_all('h2', {'class':'post-title entry-title'})
# Extract title text from each header
titles = [h.text.strip() for h in headers]
# Extract the url and the title of each article, skipping headers without a link
titles_and_links = {h.text.strip(): h.find('a')['href'] for h in headers if h.find('a')}
print(titles_and_links)
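
The script above stores each href exactly as it appears in the page. If some of those links turn out to be relative (an assumption; the site may well emit absolute URLs only), they can be resolved against the site URL with the standard library's urljoin. A minimal sketch follows; the helper name absolutize_links and the sample dictionary are hypothetical stand-ins for the scraped data:

```python
from urllib.parse import urljoin


def absolutize_links(titles_and_links, base_url):
    """Return a copy of the mapping with every href resolved against base_url."""
    return {title: urljoin(base_url, href) for title, href in titles_and_links.items()}


# Hypothetical sample data standing in for the scraped dictionary
sample = {
    "First post": "/2018/01/first-post.html",
    "Second post": "http://freblogg.com/2018/02/second-post.html",
}

resolved = absolutize_links(sample, "http://freblogg.com")
print(resolved)
```

Links that are already absolute pass through urljoin unchanged, so the helper should be safe to apply to the whole dictionary.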