@todesking
Created July 2, 2010 15:11
Scraper for the 行旅死亡人データベース (a database of unidentified persons who died while traveling)
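require 'rubygems'
require 'uri'
require 'net/http'
require 'nokogiri'

# Scrapes the records listed on the 行旅死亡人データベース index page.
class Corpse
  def initialize(uri = URI.parse('http://theoria.s284.xrea.com/corpse/index.html'))
    @uri = uri
    @document = nil
    @contents = nil
  end

  # Fetches and parses the page on first use, then returns the cached document.
  def document
    @document = Nokogiri(Net::HTTP.get(@uri)) if @document.nil?
    @document
  end

  # Returns a flat array of record hashes collected from every "content" block.
  def contents
    if @contents.nil?
      entries = document / "*[@class='content']"
      @contents = entries.map { |e| parse_contents(e) }.flatten(1)
    end
    @contents
  end

  # Splits one "content" block into records. An <hr> closes the current record;
  # <p> text goes to :description and <dl> pairs go to :data (dt => [dd, ...]).
  def parse_contents(entry)
    # The footer link is shared by every record in this block (nil if absent).
    footer_link = (entry / "*[@class='footer']//a").first
    permalink = footer_link && footer_link['href']
    entry_contents = []
    current_content = { :description => [], :data => {}, :permalink => permalink }
    entry.children.each do |node|
      case node.name
      when 'dl'
        current_data_name = ''
        node.children.each do |dlnode|
          case dlnode.name
          when 'dt'
            current_data_name = dlnode.inner_text.strip
            current_content[:data][current_data_name] ||= []
          when 'dd'
            # ||= guards against a <dd> that appears before any <dt>.
            (current_content[:data][current_data_name] ||= []).push dlnode.inner_text.strip
          end
        end
      when 'p'
        current_content[:description].push node.inner_text
      when 'hr'
        entry_contents.push current_content
        current_content = { :description => [], :data => {}, :permalink => permalink }
      end
    end
    # Keep the trailing record only if it collected any data.
    entry_contents.push current_content unless current_content[:data].empty?
    # Return the array explicitly; the modifier `unless` above evaluates to nil
    # when the trailing record is skipped.
    entry_contents
  end
end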
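
The record-splitting step in `parse_contents` can be tried without network access or Nokogiri. The following is a minimal sketch of that idea only, not the scraper itself: `Node` and `split_records` are hypothetical names introduced for illustration, and `Node` is a stand-in for a parsed HTML node that carries just a tag name and its text.

```ruby
# Hypothetical stand-in for a parsed HTML node: a tag name plus its text.
Node = Struct.new(:name, :text)

# Groups sibling nodes into records, closing the current record at each 'hr'.
# A trailing record is kept only if it collected something.
def split_records(nodes)
  records = []
  current = []
  nodes.each do |node|
    if node.name == 'hr'
      records << current
      current = []
    else
      current << node.text
    end
  end
  records << current unless current.empty?
  records
end

sample = [
  Node.new('p', 'first'),
  Node.new('hr', ''),
  Node.new('p', 'second'),
  Node.new('p', 'third')
]
p split_records(sample)
```

As in `parse_contents`, a separator always closes the record before it, even an empty one, while the final record is dropped when nothing followed the last separator.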