@norman
Last active September 9, 2021 20:48
HTML entities? We don't need no stinkin' HTML entities.
# coding: utf-8
#
# Encode every codepoint outside the printable ASCII range (0x20..0x7E) as a
# hexadecimal numeric character reference
# (https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Character_reference_overview).
# Printable ASCII, including & and <, passes through unchanged.
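def encode(string)
  # String#<< appends an Integer as the character with that codepoint, so
  # printable ASCII is copied as-is and everything else becomes "&#x...;".
  string.each_codepoint.inject("") do |buffer, cp|
    cp = "&#x#{cp.to_s(16)};" unless cp >= 0x20 && cp <= 0x7E
    buffer << cp
  end
end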
puts encode("Japan")
# prints: Japan
puts encode("日本")
# prints: &#x65e5;&#x672c;
puts encode("Japón")
# prints: Jap&#xf3;n

norman commented May 2, 2012

Note that this assumes the input string is in a Unicode encoding such as UTF-8. If you want to do this with Latin-1 or some other encoding, you'd have to transcode it to Unicode first.
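
For what it's worth, here is a rough sketch of that recoding step for Latin-1 input. The helper name `encode_from` and its `source_encoding` parameter are made up for illustration and are not part of the gist; it assumes the caller knows the real encoding of the bytes, and it only uses the standard `String#force_encoding` and `String#encode`.

```ruby
# Hypothetical helper (not in the original gist): label the bytes with their
# real encoding, transcode to UTF-8, then apply the same escaping rule.
def encode_from(string, source_encoding)
  # dup so the caller's string is not relabeled in place
  utf8 = string.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
  utf8.each_codepoint.inject(+"") do |buffer, cp|
    cp = "&#x#{cp.to_s(16)};" unless cp >= 0x20 && cp <= 0x7E
    buffer << cp
  end
end

# "Japón" as raw ISO-8859-1 bytes: ó is the single byte 0xF3.
latin1 = "Jap\xF3n".b
puts encode_from(latin1, Encoding::ISO_8859_1)
# prints: Jap&#xf3;n
```

Without the transcoding step, the lone byte 0xF3 is not valid UTF-8, so `each_codepoint` would raise an `ArgumentError` instead of producing a reference.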