@nebuta
Created November 8, 2011 07:42
Get aozora bunko
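# aozoraget.rb
# Downloads the HTML text of each work listed on the Aozora Bunko index pages.
require 'open-uri'   # URI.open needs Ruby 2.5 or later
require 'fileutils'

# Make sure the output directory exists before writing into it.
FileUtils.mkdir_p('./database')

(1..13).each do |i|
  toc = "http://www.aozora.gr.jp/index_pages/sakuhin_a#{i}.html"
  puts "Opening: " + toc
  # IO.read cannot fetch URLs, so go through open-uri instead.
  html = URI.open(toc).read
  # Each index entry links to a card page as ../cards/...
  html.scan(/<a href="\.\.\/(cards\/.+?)"/) do |str|
    card = "http://www.aozora.gr.jp/" + str[0]
    # The card page links to the work's HTML file as ./files/....html
    URI.open(card).read.scan(/<a href="\.(\/files\/.+?\.html)"/) do |s|
      # Resolve the relative link against the card's directory.
      content_url = card[/(.+)\/.+?\.html/, 1] + s[0]
      outfile = content_url[/.+\/(.+?\.html)/, 1]
      puts outfile
      File.open("./database/" + outfile, 'w') do |out|
        out.write URI.open(content_url).read
      end
      break # keep only the first matching file link per card
    end
  end
end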
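
The script builds each content URL by cutting the card URL with a regular expression and appending the relative `./files/...` link. As a possible alternative, the sketch below does the same resolution with the standard `uri` library. The helper names `resolve_content_url` and `output_name` are hypothetical and not part of the original script, and the card URL in the usage lines is only an illustrative value.

```ruby
require 'uri'

# Hypothetical helper: resolve a relative href found on a card page
# against that card page's URL.
def resolve_content_url(card_url, href)
  URI.join(card_url, href).to_s
end

# Hypothetical helper: derive the local file name from the last
# path segment of the content URL.
def output_name(content_url)
  File.basename(URI.parse(content_url).path)
end

# Illustrative values only; the card number is an assumption.
card = "http://www.aozora.gr.jp/cards/000879/card127.html"
url  = resolve_content_url(card, "./files/127_15260.html")
puts url
puts output_name(url)
```

Using `URI.join` would also handle hrefs that are absolute or contain `../`, which the regex-based concatenation does not.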