@blakesmith
Created December 17, 2011 20:28
user=> [0xE2 0x80 0x99]
[226 128 153]
user=> (String. (into-array Integer/TYPE [0xE2 0x80 0x99]) 0 3)
"???"
user=>
@blakesmith

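This was a problem with making UTF-8 strings from arrays of bytes in Clojure. The JVM's byte primitive is signed (range -128 to 127), but every byte of a multibyte UTF-8 sequence has a value of 128 or more, so those values don't fit in a byte without wrapping around to negative numbers. The first attempt above also passes an int array, which makes the String constructor treat each value as a Unicode code point rather than as a UTF-8 byte. The solution to this ended up requiring unchecked byte coercion, something like this: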

blake@Blake-Smiths-MacBook-Pro:~/projects » JAVA_TOOL_OPTIONS=-Dfile.encoding=utf-8 lein repl
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=utf-8
REPL started; server listening on localhost:44445.
user=> (String. (into-array Byte/TYPE (map #(.byteValue %) [0xE2 0x80 0x99])) "UTF-8")
"’"
user=> 
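
The same idea can be sketched with `unchecked-byte`, assuming Clojure 1.3 or later. The helper name `utf8-bytes->string` is hypothetical and not part of the original gist; it only wraps the coercion step shown in the REPL session above.

```clojure
;; Sketch only: `utf8-bytes->string` is a hypothetical helper name.
;; Takes a sequence of raw UTF-8 byte values (0-255) and decodes them.
(defn utf8-bytes->string
  [byte-values]
  ;; unchecked-byte wraps values above 127 into the signed byte range
  ;; (0xE2, i.e. 226, becomes -30) instead of throwing as (byte 0xE2) does.
  (String. (byte-array (map unchecked-byte byte-values)) "UTF-8"))

;; 0xE2 0x80 0x99 is the UTF-8 encoding of U+2019 (right single quotation mark).
(utf8-bytes->string [0xE2 0x80 0x99])
```

Whether the result prints as "’" or as "?" still depends on the REPL's output encoding, which is presumably why the session above sets `-Dfile.encoding=utf-8`.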
