@mikker
Created December 28, 2008 12:01
Fetch the Queen's new-years speeches into a SQLite3 database... WHY WOULDN'T YOU?
# Fetch the Queen's new-years speeches into a SQLite3 database... WHY WOULD YOU?
require "rubygems"
require "open-uri"
require "hpricot"
require "dm-core"
class Speech
  include DataMapper::Resource
  property :id, Serial
  property :year, String
  property :body, Text
end
DataMapper.setup(:default, "sqlite3:///#{Dir.pwd}/data.db")
DataMapper.auto_migrate!
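class String
  # Convert from the given source encoding (ISO-8859-1 by default) to UTF-8.
  # Iconv shipped with Ruby's standard library through 1.9; it was removed in 2.0.
  def utfy(from='ISO-8859-1')
    require "iconv"
    Iconv.conv('utf-8', from, self)
  end
end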
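# Fetch the index page and keep only links whose text mentions "tale" (Danish for "speech").
index = Hpricot(open("http://kongehuset.dk/publish.php?dogtag=k_dk_aktuelt_taler"))
links = (index / "div#content a").select { |a| a.html =~ /tale/i }
links.each do |tale|
  h = Hpricot(open(tale.get_attribute("href")))
  year = tale.html.utfy
  # Strip tags with non-destructive gsub: gsub! returns nil when nothing matches.
  # The /s flag selects an encoding in Ruby, not dot-all, so it is dropped.
  body = (h / "div#content").to_html.chop.utfy.gsub(/<[^>]*>/, "")
  Speech.new(:year => year, :body => body).save
end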
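
The two text-handling steps in the script (re-encoding ISO-8859-1 to UTF-8 and stripping HTML tags) can be tried without any gems or network access. The following is a minimal sketch, assuming a Ruby version with `String#encode` (1.9 or later); `to_utf8` and `strip_tags` are hypothetical helper names that do not appear in the gist, and `String#encode` stands in for the `Iconv` call used there.

```ruby
# Hypothetical helpers for illustration; not part of the original gist.

# Reinterpret the bytes of +str+ as +from+ (ISO-8859-1 by default) and
# transcode them to UTF-8. Uses String#encode in place of Iconv.
def to_utf8(str, from = "ISO-8859-1")
  str.dup.force_encoding(from).encode("UTF-8")
end

# Remove anything that looks like an HTML tag. Non-destructive gsub always
# returns a String; gsub! would return nil when no tag is present.
def strip_tags(html)
  html.gsub(/<[^>]*>/, "")
end

# "\xE5" is the Latin-1 byte for "å"; .b marks the literal as raw bytes.
raw = "<p>Nyt\xE5rstale</p>".b
puts strip_tags(to_utf8(raw))
```

Keeping the two steps as separate methods makes the nil-returning behaviour of `gsub!` easy to check in isolation before wiring the result into `Speech.new`.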