@dvmonroe
Created October 26, 2015 04:47
Chicago Upcoming Events web crawler
require 'json'
require 'simple-rss'
require 'open-uri'
require 'pry' # debugging only; not used below
require 'mechanize'
require 'elasticsearch'

client = Elasticsearch::Client.new log: true
mechanize = Mechanize.new

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
feed_url = 'http://www.choosechicago.com/includes/cfcs/syndication/RSS/rssManager.cfc?method=showFeed&feedType=events&e_catID=0&regionid=0&e_sdate=10-20-2015&e_edate=10-26-2015'
rss = SimpleRSS.parse URI.open(feed_url)

rss.items.each do |item|
  # could get proxy list
  # mechanize.set_proxy '78.186.178.153', 8080
  page = mechanize.get(item.link)

  # The detail block's text appears to contain "settings = {...};".
  # Keep what follows the last " =" and parse it as JSON.
  new_string = page.at('.detail').text.strip.split("settings")[1].split(" =")
  event_object = JSON.parse(new_string.last.gsub(';', ''))
  symbolized = event_object['markers'][0].transform_keys(&:to_sym)
  final_hash = item.merge(symbolized)

  # page.search returns a NodeSet, which is truthy even when empty, so no
  # conditional is needed: :prices is simply [] when nothing matches.
  final_hash[:prices] = []
  page.search('.rightPrice').each do |price|
    # String#split with no argument splits on any run of whitespace.
    price_array = price.text.split
    next if price_array.empty?

    # First token is the label, last token is the amount.
    final_hash[:prices].push({ price_array.first.downcase.to_sym => price_array.last })
  end

  # Index the zip code as a string.
  final_hash[:zip] = final_hash[:zip].to_s
  client.create index: 'chicago', type: 'choose-chicago', body: final_hash
end
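
The two parsing steps in the loop (pulling the embedded `settings` JSON out of the page text, and turning each price node's text into a label/amount pair) can be tried without touching the network. The following is a minimal sketch, not the gist author's code: `extract_settings` and `parse_price` are hypothetical helper names, the sample strings are made up, and the sketch assumes the page text has the form `settings = {...};`. Unlike the script above, it keeps everything after the first `=` and strips only a trailing semicolon.

```ruby
require 'json'

# Hypothetical helper: return the parsed JSON object that follows
# "settings =" in a page's text, or nil if it is missing or malformed.
def extract_settings(text)
  after_marker = text.split('settings', 2)[1]
  return nil if after_marker.nil?

  json = after_marker.split('=', 2)[1]
  return nil if json.nil?

  JSON.parse(json.strip.chomp(';'))
rescue JSON::ParserError
  nil
end

# Hypothetical helper: build a one-entry hash from price text, using the
# first token as the label and the last token as the amount.
def parse_price(text)
  tokens = text.split
  return nil if tokens.empty?

  { tokens.first.downcase.to_sym => tokens.last }
end

# Made-up sample input, shaped like the text the script expects.
sample = 'var settings = {"markers":[{"zip":60601,"title":"Demo"}]};'
marker = extract_settings(sample)['markers'][0].transform_keys(&:to_sym)
price  = parse_price("Adult\r\n\t\t\t\t$10.00")
```

Returning `nil` instead of raising lets a crawler skip an event whose page layout has changed rather than abort the whole run.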