@ksoda
Created February 1, 2018 14:03
Web scraping sample: 青空文庫 (Aozora Bunko)
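require 'bundler'
Bundler.require # loads the gems listed in the Gemfile (this script needs `gem 'oga'`)
require 'open-uri'

# 青空文庫 (Aozora Bunko) page for the work to scrape
aozora_uri = 'http://www.aozora.gr.jp/cards/000148/files/789_14547.html'
file = 'sample.txt'

# Download and extract only once; later runs reuse the cached file
unless File.exist?(file)
  # The page is served as Shift_JIS: read it as such, convert to UTF-8, normalize newlines to LF.
  # URI.open is required for URLs: open-uri no longer overrides Kernel#open as of Ruby 3.0.
  html = URI.open(aozora_uri, 'r:Shift_JIS').read.encode('utf-8', universal_newline: true)
  # Keep only the body text and drop ruby readings, which .text renders as full-width （...）
  text = Oga.parse_html(html).css('.main_text').text.gsub(/（.*?）/, '')
  File.write(file, text)
  puts "#{file} created"
end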
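The reading-stripping step can be checked without touching the network. This is a minimal sketch on a hand-written sample string, assuming that once markup is removed, readings appear as full-width parentheses right after the base text (e.g. 吾輩（わがはい）). `strip_readings` and `RUBY_READING` are hypothetical names used only for illustration. It also shows why the ASCII pattern `/(.*?)/` is a no-op: it is a capture group that lazily matches the empty string.

```ruby
# Hypothetical helper: remove readings rendered as full-width （...） after the base text.
RUBY_READING = /（.*?）/

def strip_readings(text)
  text.gsub(RUBY_READING, '')
end

sample = '吾輩（わがはい）は猫（ねこ）である。' # made-up sample, not taken from the page
puts strip_readings(sample) # => 吾輩は猫である。

# The ASCII pattern only matches "" at each position, so replacing those
# empty matches with "" leaves the text unchanged.
puts sample.gsub(/(.*?)/, '') == sample # => true
```

Because the pattern is lazy it stops at the first closing parenthesis, but it cannot tell a reading from an ordinary full-width parenthetical remark, so both are removed. Dropping the ruby annotation elements from the parsed document before calling `.text` would avoid that.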
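The download line both decodes Shift_JIS and normalizes line endings. A minimal offline sketch of that conversion, assuming the response body arrives as Shift_JIS bytes with CRLF newlines; the sample text is made up, and `force_encoding` stands in for the tagging that the `'r:Shift_JIS'` mode performs when reading.

```ruby
# Simulate a Shift_JIS response body with CRLF line endings (made-up sample text).
raw = "吾輩は猫である。\r\n名前はまだ無い。\r\n".encode('Shift_JIS').b

# Tag the raw bytes with their real encoding, as reading with 'r:Shift_JIS' does.
html = raw.force_encoding('Shift_JIS')

# Transcode to UTF-8; universal_newline: true turns CRLF and lone CR into LF.
utf8 = html.encode('utf-8', universal_newline: true)

puts utf8.encoding     # => UTF-8
puts utf8.count("\r")  # => 0
```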