@kkosuge
Created December 29, 2015 04:20
A snippet that keeps the encoding of fetched HTML from getting garbled (mojibake)
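require 'open-uri'
require 'nkf'
require 'nokogiri'

# Assumes `url` already holds the address of the page to fetch.
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.10586'
# Kernel#open no longer opens URLs as of Ruby 3.0, so call URI.open explicitly.
html = URI.open(url, 'User-Agent' => user_agent).read
# open-uri tags the body with the charset from the Content-Type header.
# When that is not UTF-8, guess the actual encoding with NKF and transcode,
# replacing any bytes that cannot be converted.
unless html.encoding.name == 'UTF-8'
  html.encode!('UTF-8', NKF.guess(html).name, invalid: :replace, undef: :replace)
end
doc = Nokogiri::HTML(html)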
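
The transcoding step can be tried without a network request. The following is a minimal sketch, not part of the original gist: `to_utf8` is a hypothetical helper name, and the Shift_JIS sample string is made up to stand in for a page whose bytes arrived without a usable charset. It assumes the `nkf` library is installed (it ships as a bundled gem rather than a default gem in recent Ruby versions).

```ruby
require 'nkf'

# Hypothetical helper (not in the original gist): returns a UTF-8 version of
# +html+, guessing the source encoding with NKF when the string is not
# already valid UTF-8.
def to_utf8(html)
  return html if html.encoding == Encoding::UTF_8 && html.valid_encoding?

  html.encode('UTF-8', NKF.guess(html).name, invalid: :replace, undef: :replace)
end

# Simulate a Shift_JIS page body that was read as raw bytes (assumed sample).
sjis_bytes = '<p>こんにちは、世界</p>'.encode('Shift_JIS').force_encoding('ASCII-8BIT')

utf8 = to_utf8(sjis_bytes)
puts utf8.encoding
puts utf8
```

Unlike the snippet above, this variant returns a new string instead of mutating its argument with `encode!`, and it also checks `valid_encoding?` so that a body mislabeled as UTF-8 still goes through the NKF guess.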