@gnufs
Created March 31, 2012 21:22
A rudimentary web scraper
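import sys

import requests
from bs4 import BeautifulSoup


def scrape(url='http://example.com'):
    """Fetch a page and print every text node found inside its <body>."""
    try:
        html = requests.get(url).content
    except requests.RequestException:
        print("URL doesn't load")
        sys.exit(1)
    # Name the parser explicitly so bs4 does not have to guess one.
    page = BeautifulSoup(html, 'html.parser')
    try:
        for s in page.body.find_all(string=True):
            print(s)
    except AttributeError:
        # page.body is None when the document has no <body> element.
        print("Page could not be read.")


if len(sys.argv) != 2:
    url = input("Please enter a URL: ")
else:
    url = sys.argv[1]
# Only URLs starting with "http" are fetched; anything else is ignored.
if url.lower().startswith('http'):
    scrape(url)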
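Collecting every text node under the body also picks up the contents of inline script and style elements, which is rarely what a reader wants to see. The following is a minimal sketch of one way to filter those out, using only the standard library's `html.parser` so it runs without extra dependencies. The names `VisibleTextParser` and `visible_text` are hypothetical and not part of the original script, and the choice to skip only `script` and `style` is an assumption.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    # Assumption: these are the only tags whose text should be hidden.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []      # non-empty, stripped text fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the stripped text fragments of an HTML string, in order."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.chunks
```

It could be used in place of the printing loop by decoding the downloaded bytes first, for example `for s in visible_text(html.decode('utf-8', errors='replace')): print(s)`. Unlike the original loop, this sketch does not restrict itself to the body, so text in the head (such as the page title) is included as well.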