@michaelfward
Created March 23, 2015 21:35
Yelp Review Scraper
# Note: pass the raw HTML file as the only argument. I just used wget on the business's URL.
#yelp data scraper
#programmed by michael ward
#programmed for tater & joes cafe
#h3xc0ntr0l@gmail.com
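# Review text inside <p itemprop="description" ...>...</p>.
# [^>]* stops at the end of the opening tag, so inline tags such as <br> stay in the capture.
reg = /<p itemprop="description"[^>]*>(.*)<\/p>/
# Reviewer name in <meta itemprop="author" content="...">
user = /<meta itemprop="author" content="([^"]*)">/
abort("usage: ruby #{$0} FILE.html") unless ARGV.length == 1
found = 0 # must be initialised before `found += 1`, otherwise Ruby raises NoMethodError
# Append to "output"; the block form closes the file automatically.
File.open("output", "a") do |fd_out|
  File.foreach(ARGV[0]) do |line|
    if (data = user.match(line))
      fd_out.write("#{data[1]}: ")
    end
    if (d = reg.match(line))
      found += 1
      fd_out.write("#{d[1]}\n\n")
    end
  end
end
puts "#{found} review(s) appended to output"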
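
The matching logic above can also be tried on an in-memory string, without reading or writing files. What follows is only a minimal sketch: the method name `extract_reviews` and the sample markup are hypothetical, and it assumes, as the script does, that each author `<meta>` tag and each description `<p>` sits on a single line, with the author appearing before the review.

```ruby
# Hypothetical helper (not part of the original script): returns an array of
# [author, review_text] pairs found in an HTML string.
def extract_reviews(html)
  author_re = /<meta itemprop="author" content="([^"]*)">/
  review_re = /<p itemprop="description"[^>]*>(.*)<\/p>/
  reviews = []
  author = nil
  html.each_line do |line|
    if (m = author_re.match(line))
      author = m[1]               # remember the most recent author
    end
    if (m = review_re.match(line))
      reviews << [author, m[1]]   # pair the review with that author
      author = nil                # do not reuse the name for a later review
    end
  end
  reviews
end

# Made-up sample input, only for illustration.
sample = <<~HTML
  <meta itemprop="author" content="Jane D.">
  <p itemprop="description" lang="en">Great hash browns.</p>
  <meta itemprop="author" content="Sam K.">
  <p itemprop="description" lang="en">Slow service, good coffee.</p>
HTML

extract_reviews(sample).each do |name, text|
  puts "#{name}: #{text}"
end
```

Returning pairs instead of writing straight to a file keeps the parsing separate from the output step. A review with no preceding author line gets `nil` as its name rather than inheriting the previous reviewer's.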