@yfyf
Last active December 18, 2015 14:19
Unicode handling
%% If your shell is set up correctly, then inputting Unicode chars should work like this:
1> A = "žžžūvis".
[382,382,382,363,118,105,115]
%% Contains integers > 255,
%% hence this is a list of codepoints, good!
%% Neither of these worked in this shell session:
17> <<"žžžūvis">>.
** exception error: bad argument
18> <<"žžžūvis"/utf8>>.
** exception error: bad argument
%% Do this instead:
6> B = unicode:characters_to_binary(A).
<<"žžžūvis">>
%% This might look wrong, but it is only how the Erlang shell displays the bytes;
%% the binary itself is correct! To get it printed correctly, use ~ts:
15> io:format("~ts~n", [B]).
žžžūvis
ok
%% To inspect the actual bytes, use the `~w` directive,
%% which prints the "raw" data instead of trying to produce printable chars:
8> io:format("~w~n", [B]).
<<197,190,197,190,197,190,197,171,118,105,115>>
%% Yay, a UTF-8 encoded binary (no integers > 255 present).
%% The usual way to mess things up:
9> BAD = binary_to_list(B).
"žžžūvis"
10> io:format("~w~n", [BAD]).
[197,190,197,190,197,190,197,171,118,105,115]
%% Oh no, no longer a list of codepoints, but a list of UTF-8 bytes!
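%% A sketch of the safe way back to a list (not part of the original session,
%% the prompt number is illustrative): decode the UTF-8 binary with
%% unicode:characters_to_list/1 instead of binary_to_list/1.
19> io:format("~w~n", [unicode:characters_to_list(B)]).
[382,382,382,363,118,105,115]
ok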
%% The following fails explicitly rather than misleading you:
11> BAD2 = list_to_binary(A).
** exception error: bad argument
     in function  list_to_binary/1
        called as list_to_binary([382,382,382,363,118,105,115])
%% Same with format:
16> io:format("~s~n", [A]).
** exception error: bad argument
     in function  io:format/3
        called as io:format(<0.25.0>,"~s~n",[[382,382,382,363,118,105,115]])
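%% A sketch of the fix (not part of the original session, the prompt number is
%% illustrative): the `t` modifier lets ~s accept codepoints > 255, so ~ts
%% works for the list too, assuming the terminal can display Unicode.
20> io:format("~ts~n", [A]).
žžžūvis
ok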
%% Another great way to mess up is to UTF-8 encode a list of UTF-8 bytes
%% instead of a list of Unicode codepoints:
12> BeyondHorrible = unicode:characters_to_binary(BAD).
<<195,133,194,190,195,133,194,190,195,133,194,190,195,133,
194,171,118,105,115>>
13> io:format("~s~n", [BeyondHorrible]).
žžžūvis
ok
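%% A sketch of undoing the double encoding (not part of the original session,
%% the prompt number is illustrative): decoding once yields the UTF-8 bytes
%% as a list of small integers, which list_to_binary/1 packs back into
%% the original binary.
21> list_to_binary(unicode:characters_to_list(BeyondHorrible)) =:= B.
true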