@kujenga
Created June 19, 2014 04:20
Middlebury RSS feed scraper
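require 'nokogiri'
require 'feedjira'

# Feed URL: the first command-line argument if given, else the all-campus events feed.
url = ARGV[0] || "http://25livepub.collegenet.com/calendars/all-campus-events.rss"
# Feedjira::Feed.fetch_and_parse is the Feedjira API from when this was written (2014);
# newer releases dropped built-in fetching in favor of Feedjira.parse(xml).
feed = Feedjira::Feed.fetch_and_parse url

feed.entries.each do |entry|
  entry_h = {}
  entry_h[:title] = entry.title
  entry_h[:url] = entry.url
  # The summary is an HTML fragment whose field labels are wrapped in <b> tags.
  html = Nokogiri::HTML(entry.summary)
  entry_h[:summary] = {}
  html.xpath("//b").each do |t|
    # The value is the node right after the label; if that node has no
    # alphanumeric text (e.g. a bare separator), use the node after it.
    val = t.next
    val = val.next if val && val.text !~ /[[:alnum:]]/
    # Skip labels with no following node instead of raising NoMethodError on nil.
    entry_h[:summary][t.text] = val.text if val
  end
  puts ''
  puts entry_h
end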