user=> [0xE2 0x80 0x99]
[226 128 153]
user=> (String. (into-array Integer/TYPE [0xE2 0x80 0x99]) 0 3)
"???"
user=>

blakesmith commented Jan 3, 2012

This was a problem with building UTF-8 strings from arrays of bytes in Clojure. The JVM's byte primitive is signed (range -128 to 127), so the bytes of a multibyte UTF-8 sequence, which have unsigned values from 128 to 255, don't fit without wrapping around to negative numbers. The fix ended up requiring unchecked byte coercion, something like this:

blake@Blake-Smiths-MacBook-Pro:~/projects » JAVA_TOOL_OPTIONS=-Dfile.encoding=utf-8 lein repl
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf-8
REPL started; server listening on localhost:44445.
user=> (String. (into-array Byte/TYPE (map #(.byteValue %) [0xE2 0x80 0x99])) "UTF-8")
"’"
user=> 
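
The same coercion can be sketched with clojure.core's unchecked-byte, which keeps only the low 8 bits of an integer, in place of the .byteValue interop call. This is a minimal sketch, assuming Clojure 1.3 or later; the helper name bytes->utf8-string is hypothetical and not part of the original gist.

```clojure
;; Hypothetical helper (not from the original gist): build a String from
;; unsigned byte values (0-255) written as ordinary Clojure integers.
(defn bytes->utf8-string
  [unsigned-values]
  ;; unchecked-byte truncates to the low 8 bits, so 0xE2 (226) wraps to -30
  ;; rather than being rejected as out of range for a signed byte.
  (String. (byte-array (map unchecked-byte unsigned-values)) "UTF-8"))

;; 0xE2 0x80 0x99 is the UTF-8 encoding of U+2019 (right single quotation mark).
(println (bytes->utf8-string [0xE2 0x80 0x99]))
```

Passing the charset name explicitly means the decoding step does not depend on the JVM's file.encoding setting, although the REPL still needs a UTF-8 capable terminal to display the character.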