This article has moved to the official .NET Docs site.
See https://docs.microsoft.com/dotnet/standard/base-types/character-encoding-introduction.
I think this is a very good write-up. There's one aspect I disagree with, however: the recommendation to use char instead when you're sure that the character is representable as a single UTF-16 code unit. I think this unnecessarily complicates the mental model, and it also makes it harder to switch the backing encoding of a string (say, to a Utf8String) without breaking code. Going forward, I think it makes more sense to avoid treating char as an entire character, even when it is known to be one. When searching for a character in a string, users shouldn't have to look up whether that character is in the BMP when it is simpler to just use Rune.
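To make the point about searching concrete, here is a minimal sketch of a Rune-based search that works the same way whether or not the character is in the BMP. It assumes .NET Core 3.0 or later, where System.Text.Rune and string.EnumerateRunes are available. The helper name IndexOfRune is hypothetical and is not part of the framework.

```csharp
using System;
using System.Text;

static class RuneSearch
{
    // Hypothetical helper (not a framework API): returns the UTF-16
    // code unit index of the first occurrence of target, or -1.
    public static int IndexOfRune(string text, Rune target)
    {
        int index = 0;
        foreach (Rune rune in text.EnumerateRunes())
        {
            if (rune == target)
            {
                return index;
            }
            // A rune occupies one code unit (BMP) or two (supplementary).
            index += rune.Utf16SequenceLength;
        }
        return -1;
    }

    public static void Main()
    {
        // 'a', then U+1F600 (a surrogate pair in UTF-16), then 'b'.
        string text = "a\U0001F600b";

        Console.WriteLine(IndexOfRune(text, new Rune('b')));     // 3
        Console.WriteLine(IndexOfRune(text, new Rune(0x1F600))); // 1
        Console.WriteLine(IndexOfRune(text, new Rune('z')));     // -1
    }
}
```

The caller never has to know whether the target fits in a single char. The same call handles 'b' and U+1F600, which is the simplification argued for above.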
Three years ago, I wrote an article in Japanese about Unicode history (Unicode itself and .NET characters). The diagrams and illustrations in the article were drawn in PowerPoint. I hope this pptx helps you.