paulyc/utf16dumbestthingever.txt

## utf16dumbestthingever.txt
UTF-16 is the dumbest thing ever. It's the kind of thing only a committee could love.
All strings should be stored as UTF-8.

Supposedly UTF-16 encodes all characters as 2 bytes, so that unlike UTF-8, a string can be
easily indexed without having to read the whole string.

Except for those pesky supplementary (astral) plane characters, which you can't possibly hope to
avoid, especially considering that most EMOJI are astral plane characters, requiring FOUR BYTES
to store in UTF-16. OR IN UTF-8! So you still have to parse the whole string to find a character
index due to those pesky surrogate pairs. Advantage nullified.
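
To make the surrogate pair problem concrete, here is a rough Python sketch. The helper names
utf16_units and code_point_at are made up for illustration, and it assumes little-endian input
with no BOM. It shows one emoji costing four bytes either way, and naive indexing going wrong.

```python
# Sketch: one astral-plane character in both encodings, plus why indexing needs a scan.
s = "a\U0001F600b"  # 'a', U+1F600 GRINNING FACE, 'b'

emoji_utf8 = "\U0001F600".encode("utf-8")       # 4 bytes
emoji_utf16 = "\U0001F600".encode("utf-16-le")  # 4 bytes: one surrogate pair


def utf16_units(data):
    """Split little-endian UTF-16 bytes into 16-bit code units (hypothetical helper)."""
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_point_at(units, index):
    """Return the code point at character position `index` by scanning from the start.

    A lone (unpaired) surrogate is simply treated as one character here.
    """
    i = 0
    count = 0
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if is_pair:
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = u
            step = 1
        if count == index:
            return cp
        count += 1
        i += step
    raise IndexError(index)


units = utf16_units(s.encode("utf-16-le"))
print(len(emoji_utf8), len(emoji_utf16))  # 4 4
print([hex(u) for u in units])            # ['0x61', '0xd83d', '0xde00', '0x62']
print(hex(units[2]))                      # 0xde00: half an emoji, not 'b'
print(chr(code_point_at(units, 2)))       # b
```

The third character is 'b', but units[2] is the back half of the emoji. Getting the right
answer means walking the string from the start, exactly like you would with UTF-8.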

Furthermore, UTF-16 is UTF-16, except when it isn't. Because UTF-16 strings are stored
differently on little-endian and big-endian machines! Everyone loves to ignore big-endian
architectures these days, but still, who knows what you're going to get? You don't want your
program to crash and burn just because someone fed it a big-endian UTF-16 string,
and someone most definitely will try.
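
Here is a small Python sketch of the byte order mess. The decode_utf16 helper and its
default of little-endian when there is no BOM are assumptions for illustration, not a
recommendation. It shows the same two-letter string as two different byte sequences, and
what you get when you guess wrong.

```python
import codecs

text = "hi"
le = text.encode("utf-16-le")  # b'h\x00i\x00'
be = text.encode("utf-16-be")  # b'\x00h\x00i'

# Same text, different bytes. Decoding with the wrong byte order does not fail here;
# it quietly produces different characters.
wrong = be.decode("utf-16-le")  # '\u6800\u6900'


def decode_utf16(data, default="utf-16-le"):
    """Decode UTF-16 bytes using the BOM if present (hypothetical helper).

    Falling back to `default` when there is no BOM is an assumption; the bytes
    alone cannot tell you which order was intended.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    return data.decode(default)


print(le, be)
print(decode_utf16(codecs.BOM_UTF16_BE + be))  # hi
print(decode_utf16(be) == "hi")                # False: no BOM, wrong guess
```

Note that the wrong guess doesn't even have the decency to raise an error for this input.
You just get two CJK characters where "hi" used to be.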

And UTF-16 strings can't be handled in code the way the plain ASCII text of this file can,
because every ASCII-range character drags a null byte along with it, so we have to rewrite all
our string processing code that normally handles ASCII or UTF-8 just to handle UTF-16!
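
A quick Python sketch of the null byte problem. The c_strlen helper is a made-up stand-in
for what C's strlen does (count bytes up to the first NUL), not a real library call.

```python
ascii_text = "hello"
as_utf8 = ascii_text.encode("utf-8")        # b'hello', identical to the ASCII bytes
as_utf16 = ascii_text.encode("utf-16-le")   # b'h\x00e\x00l\x00l\x00o\x00'


def c_strlen(data):
    """Mimic C's strlen: number of bytes before the first NUL (hypothetical helper)."""
    n = data.find(b"\x00")
    return len(data) if n == -1 else n


print(as_utf8 == ascii_text.encode("ascii"))  # True
print(c_strlen(as_utf8))                      # 5
print(c_strlen(as_utf16))                     # 1
print(as_utf16.count(b"\x00"))                # 5
```

Byte-oriented code that works fine on ASCII keeps working on UTF-8, but sees a UTF-16
"hello" as a one-character string.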

So in summary, to the responsible programmer, UTF-16 has all the disadvantages of UTF-8,
a couple of extra disadvantages of its own, and none of the advantages.