@mitio
Created December 20, 2013 17:43
Fix broken Cyrillic text that was originally written in Windows-1251 (cp1251) but was wrongly read as a single-byte Latin encoding (such as ISO-8859-1) and then encoded as UTF-8.
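# Repairs Cyrillic text that was written in Windows-1251 (cp1251), misread as
# ISO-8859-1 (Latin-1), and then re-encoded as UTF-8.
def decode_broken_cyrillic(s)
  # cp1251 bytes 0xC0..0xFF hold А..я (U+0410..U+044F). Misread as Latin-1 they
  # become U+00C0..U+00FF, so adding 0x0410 - 0xC0 = 848 restores them.
  offset = 848

  s.each_char.map do |char|
    # Only the 0xC0..0xFF range maps with a constant offset. Everything else
    # (ASCII, and letters such as Ё/ё at 0xA8/0xB8) is passed through unrepaired.
    if (0xC0..0xFF).cover?(char.ord)
      [char.ord + offset].pack('U')
    else
      char
    end
  end.join
end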
# Example - should print out the following:
# В такъв дух ви искаме! Вт акъв дух ви иск аме?
puts decode_broken_cyrillic 'Â òàêúâ äóõ âè èñêàìå! Âò àêúâ äóõ âè èñê àìå?'
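
The fixed offset above only covers the А..я block. A possible alternative, sketched here under the assumption that the text was misread as ISO-8859-1 specifically, is to let Ruby's built-in transcoding undo the mistake: encode the string back to the bytes it was read from, relabel those bytes as Windows-1251, and convert to UTF-8. The method name `decode_broken_cyrillic_via_transcoding` is hypothetical and not part of the original gist.

```ruby
# Hypothetical helper, not part of the original gist.
# Assumption: the cp1251 bytes were misread as ISO-8859-1 before being
# stored as UTF-8. If they were misread as Windows-1252 instead, the first
# encoding name would have to change.
def decode_broken_cyrillic_via_transcoding(s)
  s.encode('ISO-8859-1')            # recover the original single bytes
   .force_encoding('Windows-1251')  # relabel them with the intended encoding
   .encode('UTF-8')                 # transcode to proper UTF-8
end

puts decode_broken_cyrillic_via_transcoding('Â òàêúâ äóõ âè èñêàìå!')
# prints: В такъв дух ви искаме!
```

Because this goes through the full cp1251 table, letters outside the А..я range (for example Ё at 0xA8) are repaired as well. The first `encode` raises `Encoding::UndefinedConversionError` if the input contains characters that do not exist in ISO-8859-1, which is a useful signal that the string was not broken in the assumed way.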