@tekei
Created August 22, 2012 20:41
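Conversion to HTML character entity references
# Converts characters to character entity references rather than numeric character references.
# Note: when actually using this, please rename the method "charref".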
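require 'cgi'

CGI.class_eval do
  class << self
    # Maps "&name;" to the character it represents.
    attr_accessor :char_ref
    alias :orig_unescapeHTML :unescapeHTML
    def unescapeHTML(str)
      result = orig_unescapeHTML(str)
      # Entities missing from the table are kept as-is instead of being deleted.
      result.gsub(/&\w+;/) { |ent| @char_ref.fetch(ent, ent) }
    end
  end
end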
String.class_eval do
  class << self
    # Maps a codepoint to its entity name.
    attr_accessor :char_ref
  end
  def charref
    result = ""
    each_codepoint do |c|
      if String.char_ref.include?(c)
        # The character has a named entity in the table.
        result << "&#{String.char_ref[c]};"
      elsif c <= 0x1F || c == 0x7F
        # Control characters fall back to numeric character references.
        result << "&##{c};"
      else
        result << c
      end
    end
    result
  end
end
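String.char_ref = {}
CGI.char_ref = {}
# Each line of html_charref is expected to hold a codepoint and an entity name separated by whitespace.
IO.foreach("#{File.dirname(__FILE__)}/html_charref") do |line|
  s = line.split(' ')
  next if s.size < 2 # skip blank or malformed lines
  code = s[0].to_i
  String.char_ref[code] = s[1]
  CGI.char_ref["&#{s[1]};"] = [code].pack("U")
end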
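
The conversion above can be tried without patching String or loading the table file. The following is a minimal sketch under stated assumptions: `SAMPLE_CHAR_REF` and `to_entity_refs` are hypothetical names introduced for illustration, and the five-entry table stands in for the real html_charref data.

```ruby
# Hypothetical miniature table (codepoint => entity name); the real data
# comes from the html_charref file loaded by the gist.
SAMPLE_CHAR_REF = {
  38 => "amp", 60 => "lt", 62 => "gt", 169 => "copy", 233 => "eacute"
}.freeze

# Standalone equivalent of String#charref from the gist.
def to_entity_refs(str, table = SAMPLE_CHAR_REF)
  str.each_codepoint.map do |c|
    if table.key?(c)
      "&#{table[c]};"          # named entity from the table
    elsif c <= 0x1F || c == 0x7F
      "&##{c};"                # control character: numeric reference
    else
      [c].pack("U")            # everything else passes through unchanged
    end
  end.join
end

puts to_entity_refs("café © <b>")  # prints: caf&eacute; &copy; &lt;b&gt;
```

Passing the table as an argument keeps the sketch free of global state, which is one way to avoid the class-level accessors used in the gist.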

tekei commented Sep 1, 2012

Revised because the HTML 4 character entity references were also needed.
Uses the table file from gist: 3561627.
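
The loading code reads each table line as a decimal codepoint followed by an entity name. The sketch below shows that parsing step on an in-memory sample. The layout is inferred from the loader only, so the actual file in gist 3561627 may differ; `SAMPLE_TABLE` and `parse_char_ref_table` are hypothetical names.

```ruby
# Hypothetical sample in the layout the loader appears to expect:
# "<decimal codepoint> <entity name>" per line.
SAMPLE_TABLE = <<~TABLE
  160 nbsp
  169 copy
  8364 euro
TABLE

# Builds both lookup directions, as the gist does for String and CGI.
def parse_char_ref_table(text)
  by_codepoint = {}
  by_entity = {}
  text.each_line do |line|
    code, name = line.split(' ')
    next if name.nil?                       # skip blank or malformed lines
    by_codepoint[code.to_i] = name
    by_entity["&#{name};"] = [code.to_i].pack("U")
  end
  [by_codepoint, by_entity]
end

by_codepoint, by_entity = parse_char_ref_table(SAMPLE_TABLE)
puts by_codepoint[8364]      # prints: euro
puts by_entity["&copy;"]     # prints: ©
```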


tekei commented Sep 19, 2012

Also added support for character entity references in CGI::unescapeHTML.
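
The same two-pass decoding can be sketched without redefining `CGI.unescapeHTML`. This is an illustration under stated assumptions, not the gist's implementation: `SAMPLE_ENTITIES` and `unescape_with_entities` are hypothetical names, and the two-entry table stands in for the full one.

```ruby
require 'cgi'

# Hypothetical miniature table ("&name;" => character).
SAMPLE_ENTITIES = { "&copy;" => "©", "&euro;" => "€" }.freeze

# First let CGI handle the entities it already knows, then resolve the
# remaining named entities from the table. Unknown entities are left as-is.
def unescape_with_entities(str, table = SAMPLE_ENTITIES)
  CGI.unescapeHTML(str).gsub(/&\w+;/) { |ent| table.fetch(ent, ent) }
end

puts unescape_with_entities("&lt;p&gt; &copy; 2012 &unknown;")
# prints: <p> © 2012 &unknown;
```

One caveat of running two passes: an escaped ampersand such as `&amp;copy;` becomes `&copy;` after the first pass and is then decoded again to `©`.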
