@abinoam
Last active August 29, 2015 14:02
https://www.ruby-forum.com/topic/4980931 - Gist of an irb session messing around with encodings
file = File.open "acao_e_reacao_utf16_le.txt"
=> #<File:acao_e_reacao_utf16_le.txt>
file.methods.grep /enc/
=> [:external_encoding, :internal_encoding, :set_encoding]
file.external_encoding
=> #<Encoding:UTF-8>
file.internal_encoding
=> nil
str = file.read
=> "A\u0000\xE7\u0000\xE3\u0000o\u0000 \u0000e\u0000 \u0000R\u0000e\u0000a\u0000\xE7\u0000\xE3\u0000o\u0000"
str.encoding
=> #<Encoding:UTF-8>
str.encode(Encoding::UTF_8)
=> "A\u0000\xE7\u0000\xE3\u0000o\u0000 \u0000e\u0000 \u0000R\u0000e\u0000a\u0000\xE7\u0000\xE3\u0000o\u0000"
str.encode(Encoding::UTF_8, Encoding::UTF_16)
Encoding::InvalidByteSequenceError: "A\x00" on UTF-16
from (irb):8:in `encode'
from (irb):8
from /home/abinoam/.rvm/rubies/ruby-2.1.1/bin/irb:11:in `<main>'
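The error above is consistent with Ruby treating `Encoding::UTF_16` (no endianness suffix) as an encoding that relies on a byte order mark to pick the byte order. A minimal sketch, assuming the bytes are built in memory rather than read from the file used in this session (all variable names are illustrative):

```ruby
text = "A\u00E7\u00E3o"

# UTF-16LE bytes without a BOM, held as a binary string
no_bom = text.encode(Encoding::UTF_16LE).b

# The same bytes with a little-endian BOM (FF FE) prepended
with_bom = "\xFF\xFE".b + no_bom

# Without a BOM, decoding as plain UTF-16 fails, as in the session above
begin
  no_bom.encode(Encoding::UTF_8, Encoding::UTF_16)
  no_bom_result = :ok
rescue Encoding::InvalidByteSequenceError
  no_bom_result = :invalid
end

# With a BOM, the byte order can be detected
from_bom = with_bom.encode(Encoding::UTF_8, Encoding::UTF_16)
```

When the file has no BOM, naming the endianness explicitly (`UTF_16LE` or `UTF_16BE`) is the way out, which is what the next steps try.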
str.encode(Encoding::UTF_8, Encoding::UTF_16BE)
=> "䄀漀 攀 刀攀愀漀"
str.encode(Encoding::UTF_8, Encoding::UTF_16LE)
=> "Ação e Reação" # THE RIGHT SOURCE ENCODING
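The steps above can be condensed into a sketch that separates relabeling (`force_encoding`, which leaves the bytes untouched) from transcoding (`encode`, which rewrites them). It assumes the UTF-16LE bytes are produced in memory instead of read from `acao_e_reacao_utf16_le.txt`; the variable names are hypothetical:

```ruby
expected = "A\u00E7\u00E3o e Rea\u00E7\u00E3o"   # "Ação e Reação" in UTF-8

# Raw UTF-16LE bytes, as they would sit on disk
raw = expected.encode(Encoding::UTF_16LE).b

# Mislabel the bytes as UTF-8, mimicking the default File.open above
mislabeled = raw.dup.force_encoding(Encoding::UTF_8)
looks_valid = mislabeled.valid_encoding?   # the bytes are not well-formed UTF-8

# Option 1: tell encode what the source encoding really is
fixed_a = mislabeled.encode(Encoding::UTF_8, Encoding::UTF_16LE)

# Option 2: correct the label first, then transcode
fixed_b = mislabeled.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
```

Both options should yield the same UTF-8 string. Calling `encode(Encoding::UTF_8)` on the mislabeled string does nothing useful because Ruby already believes it is UTF-8.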
# Another approach
# Set the encoding when opening the file
file = File.open "acao_e_reacao_utf16_le.txt", "rb:utf-16le"
=> #<File:acao_e_reacao_utf16_le.txt>
file.external_encoding
=> #<Encoding:UTF-16LE>
file.internal_encoding
=> nil
str = file.read
=> "A\u00E7\u00E3o e Rea\u00E7\u00E3o"
str.encoding
=> #<Encoding:UTF-16LE>
str.encode(Encoding::UTF_8)
=> "Ação e Reação"
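A possible variation on this second approach is to give the mode string an internal encoding as well, so that `read` returns UTF-8 directly and the separate `encode` call is not needed. This is a sketch under assumptions: it writes its own temporary UTF-16LE file instead of using the one from this session, and the names are illustrative:

```ruby
require "tempfile"

text = "A\u00E7\u00E3o e Rea\u00E7\u00E3o"   # "Ação e Reação" in UTF-8
decoded = nil
external = nil
internal = nil

Tempfile.create("utf16le_demo") do |tmp|
  # Write the raw UTF-16LE bytes without any conversion
  tmp.binmode
  tmp.write(text.encode(Encoding::UTF_16LE))
  tmp.flush

  # "external:internal" asks Ruby to transcode while reading
  File.open(tmp.path, "rb:UTF-16LE:UTF-8") do |file|
    external = file.external_encoding
    internal = file.internal_encoding
    decoded = file.read
  end
end
```

Here `decoded` should already be a UTF-8 string, with `external` reporting UTF-16LE and `internal` reporting UTF-8.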