@brianleroux
Created May 9, 2009 07:40
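require 'rubygems'      # loads gems on Ruby 1.8; harmless on later versions
require 'open-uri'      # lets open() fetch http:// URLs (Ruby 3.0+ needs URI.open)
require 'hpricot'
require 'dm-core'
require 'dm-timestamps'

# One cached WikiTravel page: its URL and the raw HTML fetched from it.
class Page
  include DataMapper::Resource
  property :id,   Serial
  # :nullable => false is the 2009-era dm-core option; later releases spell it :required => true.
  property :url,  String, :nullable => false, :lazy => false
  property :html, Text,   :nullable => false, :lazy => false
  timestamps :at  # adds created_at and updated_at
end

# Downloads every page found via WikiTravel's Special:Allpages index and
# stores a copy of each in wikitravel.sqlite3 in the current directory.
class WikiTravelOffline
  def initialize
    DataMapper.setup(:default, "sqlite3://#{Dir.pwd}/wikitravel.sqlite3")
    DataMapper.auto_upgrade!  # creates or extends the pages table as needed
    pages.each do |page|
      # Save the raw HTML string as fetched; there is no need to parse it with Hpricot first.
      Page.create(:url => page, :html => open(page) { |f| f.read })
    end
  end

  # Collects page URLs in two passes:
  #   1. Special:Allpages links to sub-index pages (table.allpageslist).
  #   2. Each sub-index page is scanned for links inside div.allpagesredirect.
  def pages
    base_uri = 'http://wikitravel.org'
    all_pages_listing = "#{ base_uri }/en/Special:Allpages"
    root_urls = []
    final_urls = []
    Hpricot(open(all_pages_listing)).search("//table[@class='allpageslist']/tr/td/a").each do |element|
      # hrefs are assumed to be root-relative, e.g. /en/...
      root_urls << "#{ base_uri }#{ element.attributes['href'] }"
    end
    root_urls.uniq.each do |url|
      Hpricot(open(url)).search("//div[@class='allpagesredirect']/a").each do |element|
        final_urls << "#{ base_uri }#{ element.attributes['href'] }"
      end
    end
    final_urls.uniq  # avoid downloading and storing the same page twice
  end
end

WikiTravelOffline.new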
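In pages, absolute URLs are built by appending each href to base_uri, which only works when every href is root-relative (begins with /). A sketch of a more forgiving alternative, assuming nothing beyond Ruby's standard uri library: resolve each href with URI.join, skip blank hrefs and any that are not valid URI references, and drop duplicates. The helper name absolute_urls is hypothetical and not part of the original script.

```ruby
require 'uri'

# Hypothetical helper: resolve each href against base_uri and drop duplicates.
# URI.join copes with root-relative ("/en/Paris"), path-relative and already
# absolute hrefs, whereas "#{base_uri}#{href}" only suits root-relative ones.
def absolute_urls(base_uri, hrefs)
  hrefs.each_with_object([]) do |href, urls|
    next if href.nil? || href.empty?  # an empty href would resolve to base_uri itself
    begin
      urls << URI.join(base_uri, href).to_s
    rescue URI::InvalidURIError
      # Skip hrefs that are not valid URI references (e.g. unescaped spaces).
      next
    end
  end.uniq
end

# Usage with made-up hrefs; prints http://wikitravel.org/en/Paris
# and http://wikitravel.org/en/Rome, one per line.
puts absolute_urls('http://wikitravel.org',
                   ['/en/Paris', '/en/Paris', 'http://wikitravel.org/en/Rome'])
```

Inside pages, each loop could collect element.attributes['href'] into an array and pass it through absolute_urls(base_uri, hrefs) in place of the string interpolation.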