Created
June 3, 2017 11:56
In the internal Java representation a String is a sequence of 16-bit "char"s representing UTF-16 code units.
.getBytes converts the string to a sequence of bytes according to a specific "charset".
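The "UTF-16" charset encodes each UTF-16 code unit as a pair of bytes, which in general may be in either big-endian or little-endian order. To mark which byte order is in use it prepends a "byte order mark". Java's "UTF-16" encoder always writes big-endian bytes with a big-endian byte order mark, regardless of the platform.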
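The byte order mark is the Unicode code point U+FEFF. When encoded as big-endian bytes this comes out to "0xFE","0xFF", which when interpreted as signed two's-complement numbers (Java's byte type is signed) display as "-2" "-1". In little-endian order the bytes would instead be "0xFF","0xFE", displaying as "-1" "-2".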
The "UTF-16LE" charset does not use a byte-order mark.
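
The difference between the two charsets can be seen with a small sketch like the following. The class name Utf16BytesDemo and the helper names withBom and withoutBom are made up for illustration; only String.getBytes and the StandardCharsets constants come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16BytesDemo {
    // Hypothetical helper: encodes with the "UTF-16" charset,
    // which prepends a byte order mark before the code units.
    static byte[] withBom(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // Hypothetical helper: encodes with "UTF-16LE",
    // which writes the code units little-endian and no byte order mark.
    static byte[] withoutBom(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // 'A' is U+0041, a single UTF-16 code unit.
        System.out.println(Arrays.toString(withBom("A")));    // [-2, -1, 0, 65]
        System.out.println(Arrays.toString(withoutBom("A"))); // [65, 0]
    }
}
```

The leading -2, -1 in the first result are the byte order mark bytes 0xFE, 0xFF printed as signed bytes. If the mark is not wanted, choosing "UTF-16LE" (or "UTF-16BE") explicitly avoids it.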