@caleywoods
Forked from xijo/encoding_repairer.rb
Created April 20, 2021 20:40
Repair UTF-8 strings that contain UTF-8 characters mis-decoded as ISO-8859-1 (Windows-1252)
class EncodingRepairer
  # Mojibake (UTF-8 bytes read as Windows-1252) => intended character.
  REPLACEMENTS = {
    "â‚¬" => "€", "â€š" => "‚", "â€ž" => "„", "â€¦" => "…", "Ë†" => "ˆ",
    "â€¹" => "‹", "â€˜" => "‘", "â€™" => "’", "â€œ" => "“", "â€" => "”",
    "â€¢" => "•", "â€“" => "–", "â€”" => "—", "Ëœ" => "˜", "â„¢" => "™",
    "â€º" => "›", "Å“" => "œ", "Å’" => "Œ", "Å¾" => "ž", "Å¸" => "Ÿ",
    "Å¡" => "š", "Å½" => "Ž", "Â¡" => "¡", "Â¢" => "¢", "Â£" => "£",
    "Â¤" => "¤", "Â¥" => "¥", "Â¦" => "¦", "Â§" => "§", "Â¨" => "¨",
    "Â©" => "©", "Âª" => "ª", "Â«" => "«", "Â¬" => "¬", "Â®" => "®",
    "Â¯" => "¯", "Â°" => "°", "Â±" => "±", "Â²" => "²", "Â³" => "³",
    "Â´" => "´", "Âµ" => "µ", "Â¶" => "¶", "Â·" => "·", "Â¸" => "¸",
    "Â¹" => "¹", "Âº" => "º", "Â»" => "»", "Â¼" => "¼", "Â½" => "½",
    "Â¾" => "¾", "Â¿" => "¿", "Ã€" => "À", "Ã‚" => "Â", "Ãƒ" => "Ã",
    "Ã„" => "Ä", "Ã…" => "Å", "Ã†" => "Æ", "Ã‡" => "Ç", "Ãˆ" => "È",
    "Ã‰" => "É", "ÃŠ" => "Ê", "Ã‹" => "Ë", "ÃŒ" => "Ì", "ÃŽ" => "Î",
    "Ã‘" => "Ñ", "Ã’" => "Ò", "Ã“" => "Ó", "Ã”" => "Ô", "Ã•" => "Õ",
    "Ã–" => "Ö", "Ã—" => "×", "Ã˜" => "Ø", "Ã™" => "Ù", "Ãš" => "Ú",
    "Ã›" => "Û", "Ãœ" => "Ü", "Ãž" => "Þ", "ÃŸ" => "ß", "Ã¡" => "á",
    "Ã¢" => "â", "Ã£" => "ã", "Ã¤" => "ä", "Ã¥" => "å", "Ã¦" => "æ",
    "Ã§" => "ç", "Ã¨" => "è", "Ã©" => "é", "Ãª" => "ê", "Ã«" => "ë",
    "Ã¬" => "ì", "Ã\u00AD" => "í", "Ã®" => "î", "Ã¯" => "ï", "Ã°" => "ð",
    "Ã±" => "ñ", "Ã²" => "ò", "Ã³" => "ó", "Ã´" => "ô", "Ãµ" => "õ",
    "Ã¶" => "ö", "Ã·" => "÷", "Ã¸" => "ø", "Ã¹" => "ù", "Ãº" => "ú",
    "Ã»" => "û", "Ã¼" => "ü", "Ã½" => "ý", "Ã¾" => "þ", "Ã¿" => "ÿ"
  }
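
  # Try longer keys first so a key that is a prefix of another cannot shadow it;
  # Regexp.union also escapes any regex metacharacters in the keys.
  PATTERN = Regexp.union(REPLACEMENTS.keys.sort_by { |key| -key.length })

  # Repairs value in place and returns it; nil is passed through unchanged.
  def repair(value)
    return if value.nil?
    value.gsub!(PATTERN, REPLACEMENTS) # gsub! returns nil when nothing matched
    value
  end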
end
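
The lookup table above pairs each garbled sequence with the character it was meant to be. As a rough illustration of where those sequences come from, the sketch below reproduces the damage by reading UTF-8 bytes as Windows-1252, then undoes it with an encoding round trip instead of a table. Both `mojibake` and `repair_by_round_trip` are hypothetical helper names, not part of the gist, and the round trip is an alternative technique under the assumption that the whole string was garbled exactly once.

```ruby
# Hypothetical helper: reproduce the damage by relabelling UTF-8 bytes as
# Windows-1252 and transcoding the result back to UTF-8.
def mojibake(text)
  text.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Hypothetical helper: undo the damage without a lookup table.
# Assumes the whole string was garbled exactly once; anything that does not
# survive the round trip is returned unchanged.
def repair_by_round_trip(text)
  candidate = text.encode("Windows-1252").force_encoding("UTF-8")
  candidate.valid_encoding? ? candidate : text
rescue Encoding::UndefinedConversionError
  # The text contains characters that Windows-1252 cannot represent.
  text
end

puts mojibake("é")                        # prints Ã©
puts repair_by_round_trip("CafÃ© â‚¬5")   # prints Café €5
puts repair_by_round_trip("Café")         # prints Café (already clean)
```

The round trip cannot handle strings that mix clean and garbled text, and it fails for characters whose UTF-8 bytes include values Windows-1252 leaves undefined, which is one reason a substitution table can still be the more forgiving choice.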