@benekastah
Created February 17, 2020 18:51
Basic web scraping using BeautifulSoup
from bs4 import BeautifulSoup
import requests


def scrape_page(url):
    # Fetch the page; the url must already include its scheme (e.g. "http://").
    r = requests.get(url)
    data = r.text
    # Name the parser explicitly so BeautifulSoup does not have to guess one.
    soup = BeautifulSoup(data, "html.parser")
    for link in soup.find_all('a'):
        print(link.get('href'))
        # do the next line for pages you want to scrape:
        # scrape_page(link.get('href'))
        # DON'T DO THIS FOR ALL LINKS. It could take you anywhere on the internet.
        # Check if the url is a page you are interested in first.


scrape_page("http://example.com")
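
The comments above say to check whether a URL is a page you are interested in before scraping it. One possible way to make that check is sketched below, assuming "interested" means "an http(s) page on the same host that has not been visited yet". That rule, the helper name resolve_internal_link, and the visited set are illustrative assumptions, not part of the original gist.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def resolve_internal_link(page_url, href, visited):
    """Return an absolute URL worth following, or None to skip the link.

    Hypothetical helper: the same-host rule is an assumption about what
    counts as an interesting page.
    """
    if not href:
        # <a> tags without an href make link.get('href') return None.
        return None
    # Turn relative links such as "/about" into absolute URLs.
    absolute = urljoin(page_url, href)
    # Drop any "#fragment" so the same page is not counted twice.
    absolute = urldefrag(absolute).url
    parsed = urlparse(absolute)
    if parsed.scheme not in ("http", "https"):
        # Skips mailto:, javascript:, and similar links.
        return None
    if parsed.netloc != urlparse(page_url).netloc:
        # Skips links that leave the starting host.
        return None
    if absolute in visited:
        return None
    return absolute
```

Inside scrape_page, the result of resolve_internal_link(url, link.get('href'), visited) could be added to visited and passed to a recursive scrape_page call only when it is not None, which keeps the crawl on one site and stops it from looping.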