@plugwash
Created June 3, 2017 11:56
As exposed through its API, a Java String is a sequence of 16-bit "char" values, each representing one UTF-16 code unit.
String.getBytes converts the string to a sequence of bytes according to a specific "charset" (the platform default charset if none is given).
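A minimal sketch of what "code units" means in practice, assuming a standard JDK. The class and method names are made up for illustration. A character outside the Basic Multilingual Plane takes two chars (a surrogate pair), so the char count and the code point count can differ.

```java
public class CodeUnits {
    // Number of 16-bit UTF-16 code units; this is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1F600, written as its surrogate pair D83D DE00.
        String s = "a\uD83D\uDE00";
        System.out.println(codeUnits(s));  // 3
        System.out.println(codePoints(s)); // 2
    }
}
```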
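The "UTF-16" charset encodes each UTF-16 code unit as a pair of bytes, which may in general be either big endian or little endian. To mark which byte order is in use, the output starts with a "byte order mark". Java's "UTF-16" encoder always writes big-endian bytes with the mark; when decoding, it reads the mark to pick the byte order and assumes big endian if there is none.
The byte order mark is the Unicode code point U+FEFF. Encoded as big-endian bytes this comes out as 0xFE, 0xFF, which when interpreted as signed two's-complement bytes (as Java's "byte" type is) display as "-2" "-1". In little endian the order would be 0xFF, 0xFE, displayed as "-1" "-2".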
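The following sketch shows where the "-2" "-1" comes from, assuming a standard JDK, whose "UTF-16" charset writes a big-endian byte order mark (bytes 0xFE 0xFF) when encoding. The class and helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16BomDemo {
    // Encode with the BOM-writing "UTF-16" charset.
    static byte[] encodeUtf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // Render bytes as unsigned hex, since Java's byte type is signed.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeUtf16("A");
        System.out.println(Arrays.toString(bytes)); // [-2, -1, 0, 65]
        System.out.println(toHex(bytes));           // FE FF 00 41
    }
}
```

Masking with `& 0xFF` is one common way to see the unsigned value of a byte; the underlying bits are the same either way.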
The "UTF-16LE" charset always uses little-endian byte order and does not write a byte order mark. The same holds for "UTF-16BE" with big-endian order.
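For comparison, a small sketch of the charsets with an explicit byte order, again assuming a standard JDK and using made-up class and method names. Neither output starts with a byte order mark.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16NoBomDemo {
    // Little-endian UTF-16, no byte order mark.
    static byte[] encodeLE(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    // Big-endian UTF-16, no byte order mark.
    static byte[] encodeBE(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeLE("A"))); // [65, 0]
        System.out.println(Arrays.toString(encodeBE("A"))); // [0, 65]
    }
}
```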