@decause
Created March 22, 2012 01:33
Silly Scraper
```python
"""
Silly Scraper: plan

User inputs URL
User inputs email
User inputs keyword
User provides frequency?
Scrape URL, all of it
Find keywords
Save v1
When freq = x
Scrape URL
Find keywords
Save v2
Diff v1 v2
Send message with the diff
"""
# Python 2 script: urllib2 and BeautifulSoup 3 are Python 2-only.
import re
import urllib2

from BeautifulSoup import BeautifulSoup

# Fetch the page (URL hard-coded for now; the plan calls for user input)
page = urllib2.urlopen("http://labor.ny.gov/app/warn/")
# Scrape URL, all of it: parse the whole HTML document
soup = BeautifulSoup(page)
# Find keywords: every text node that contains "New York City"
warns_nyc = soup.findAll(text=re.compile("New York City"))
for warn in warns_nyc:
    print warn
```
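The plan in the docstring goes further than the script: save the matches as v1, scrape again later as v2, diff the two and email the diff. None of that is implemented in the gist. What follows is a minimal sketch of those last steps, assuming Python 3 and only the standard library, so it is not a drop-in addition to the Python 2 script above. It assumes each run's matches are stored as a JSON list of strings. The helper names (`save_snapshot`, `load_snapshot`, `diff_snapshots`, `build_alert`), the snapshot file name, the sample entries and the example.com addresses are all hypothetical. It builds the message with `email.message.EmailMessage` but does not send it; passing it to `smtplib.SMTP.send_message` would be a separate step. Scheduling the rescan ("When freq = x") is also left out.

```python
import difflib
import json
import tempfile
from email.message import EmailMessage
from pathlib import Path


def save_snapshot(matches, path):
    """Save one run's keyword matches (v1, v2, ...) as a JSON list of strings."""
    cleaned = [m.strip() for m in matches]
    Path(path).write_text(json.dumps(cleaned), encoding="utf-8")


def load_snapshot(path):
    """Load a saved snapshot; a missing file counts as an empty snapshot."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text(encoding="utf-8"))


def diff_snapshots(old, new):
    """Return a unified diff of two snapshots, or "" if nothing changed."""
    lines = difflib.unified_diff(old, new, fromfile="v1", tofile="v2", lineterm="")
    return "\n".join(lines)


def build_alert(diff_text, to_addr, keyword):
    """Compose (but do not send) an email carrying the diff."""
    msg = EmailMessage()
    msg["Subject"] = f"Changes for {keyword!r}"
    msg["From"] = "scraper@example.com"  # hypothetical sender address
    msg["To"] = to_addr
    msg.set_content(diff_text)
    return msg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "warn_nyc.json"  # hypothetical snapshot file
        # Hypothetical sample entries standing in for scraped matches
        save_snapshot(["Example Co A, New York City"], path)
        v1 = load_snapshot(path)
        v2 = ["Example Co A, New York City", "Example Co B, New York City"]
        diff_text = diff_snapshots(v1, v2)
        if diff_text:  # only alert when something changed
            print(build_alert(diff_text, "you@example.com", "New York City"))
```

Diffing one match per line with `difflib.unified_diff` means a reworded entry shows up as one removed and one added line, and an unchanged page produces an empty diff, so no message is built.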