@radiospiel
Created March 12, 2013 23:28
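require "nokogiri"

class Nokogiri::HTML::Document
  # Returns the charset declared in a <meta http-equiv="content-type"> tag,
  # or nil if there is none.
  # Note: this overrides any meta_encoding that Nokogiri itself defines.
  def meta_encoding
    css("meta[http-equiv]").each do |meta|
      # The attribute value is usually written "Content-Type", so compare
      # without regard to case.
      next unless meta["http-equiv"].to_s.casecmp("content-type").zero?

      # Accept "text/html;charset=utf-8" as well as "text/html; charset=utf-8".
      meta["content"].to_s.split(";").each do |part|
        return $1 if part.strip =~ /\Acharset=(.+)/i
      end
    end
    nil
  end
end
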
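module Nokogiri::HTML
  # Parses data, then parses it again with the encoding declared in its
  # meta tag if that differs from the encoding of the first parse.
  def self.with_meta_encoding(data)
    doc = Nokogiri.HTML(data)
    meta_encoding = doc.meta_encoding
    return doc unless meta_encoding
    # Encoding names are case-insensitive ("utf-8" and "UTF-8" are the same).
    return doc if doc.encoding.to_s.casecmp(meta_encoding).zero?

    # Try to reread with the encoding from the meta tag.
    doc2 = Nokogiri.HTML(data, nil, meta_encoding)
    return doc2 if doc2.encoding.to_s.casecmp(meta_encoding).zero?

    # Rereading failed; return the original document.
    doc
  end
end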
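
# A minimal, stdlib-only sketch of the charset lookup that meta_encoding
# performs on the meta tag's content attribute. It can be used to check the
# parsing rule without loading Nokogiri. The name charset_from_content_type
# is hypothetical: it is not part of Nokogiri or of the code above.
# Assumptions: parameters are separated by ";", the value may be quoted,
# and the parameter name may appear in any letter case.
def charset_from_content_type(content_type)
  return nil unless content_type

  content_type.split(";").each do |part|
    # Match "charset=<value>", optionally quoted, ignoring case.
    match = part.strip.match(/\Acharset\s*=\s*["']?([^"'\s]+)/i)
    return match[1] if match
  end
  nil
end

# Usage:
#   charset_from_content_type("text/html; charset=ISO-8859-1") # => "ISO-8859-1"
#   charset_from_content_type("text/html")                     # => nil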