@acevif
Last active June 10, 2023 09:25

Character Encoding and Unicode Notes



Unicode

  • Grapheme Cluster
    • There are two kinds: legacy grapheme clusters and extended grapheme clusters.
    • Legacy grapheme clusters exist for compatibility, so extended grapheme clusters are normally the ones to use.

Combining Character Sequence

Code points

  • UTF-8
  • UTF-16
  • UTF-32

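Current understanding: a sequence of UTF-8 or UTF-16 code units → (UTF-32 == a sequence of code points) → a sequence of (combining character sequences ≈ grapheme clusters). The last equivalence is only approximate: an extended grapheme cluster can also cover things that are not combining character sequences, such as CR LF or Hangul jamo sequences.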
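The layering in this note can be checked with a small Python sketch. The sample string and the `combining_sequences` helper are made up for illustration. The helper only attaches combining marks to the preceding character, so it approximates combining character sequences and is not a full extended grapheme cluster segmenter.

```python
import unicodedata

# 'e' + COMBINING ACUTE ACCENT (U+0301), then an emoji outside the BMP (U+1F600)
text = "e\u0301\U0001F600"

# Layer 1: code units of each encoding form
utf8_units = list(text.encode("utf-8"))
utf16_bytes = text.encode("utf-16-le")
utf16_units = [
    int.from_bytes(utf16_bytes[i:i + 2], "little")
    for i in range(0, len(utf16_bytes), 2)
]

# Layer 2: code points (a Python str is already a sequence of code points)
code_points = [ord(ch) for ch in text]


def combining_sequences(s):
    """Group each character with the combining marks that follow it.

    Hypothetical helper: an approximation only, not the full
    grapheme cluster boundary rules.
    """
    groups = []
    for ch in s:
        if groups and unicodedata.combining(ch) != 0:
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


# Layer 3: user-perceived characters (approximated)
clusters = combining_sequences(text)

print(len(utf8_units), len(utf16_units), len(code_points), len(clusters))
```

The same text has a different length at each layer, which is why "string length" is ambiguous unless the unit is stated.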

BOM (byte order mark)
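As a rough illustration of what a BOM is used for, the sketch below guesses an encoding from the leading bytes of a buffer. `sniff_bom` is a hypothetical helper written for this note. The BOM constants come from Python's standard `codecs` module.

```python
import codecs


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length) guessed from a leading BOM, or (None, 0)."""
    # UTF-32 must be checked before UTF-16: the UTF-32-LE BOM (FF FE 00 00)
    # begins with the UTF-16-LE BOM (FF FE).
    candidates = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in candidates:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# Python's "utf-8-sig" codec writes a UTF-8 BOM in front of the payload.
data = "A".encode("utf-8-sig")
encoding, skip = sniff_bom(data)
print(encoding, data[skip:].decode(encoding))
```

A buffer without a BOM yields `(None, 0)`, in which case the encoding has to be known some other way.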

Surrogate pairs: in UTF-16, two code units together represent a single code point (used for code points above U+FFFF).
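The arithmetic behind a surrogate pair is short enough to write out. This is a minimal sketch, and `to_surrogate_pair` is a name invented here. The result is compared against Python's own UTF-16 encoder as a sanity check.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into (high, low) UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))

# Cross-check with the built-in encoder (big-endian, no BOM).
encoded = "\U0001F600".encode("utf-16-be")
assert encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```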

Cocoa string objects (NSString and related classes) are based on UTF-16: character = “16-bit platform-endian UTF-16”, so what these APIs call a character is a UTF-16 code unit.

https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/Strings/Articles/stringsClusters.html


Shift JIS, CP932, Windows-31J(MS932), JIS X 0208

https://qiita.com/kasei-san/items/cfb993786153231e5413
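These names are often used interchangeably, but the encodings are not identical. The sketch below uses Python's `shift_jis` and `cp932` codecs to show two differences. It assumes Python's codec tables and is not a full comparison, and `can_encode` is a hypothetical helper.

```python
# The same two bytes decode to different code points depending on the codec.
WAVE_BYTES = b"\x81\x60"
as_shift_jis = WAVE_BYTES.decode("shift_jis")  # WAVE DASH
as_cp932 = WAVE_BYTES.decode("cp932")          # FULLWIDTH TILDE


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# CIRCLED DIGIT ONE is a vendor extension: present in CP932, absent from JIS X 0208.
circled_one = "\u2460"

print(hex(ord(as_shift_jis)), hex(ord(as_cp932)))
print(can_encode(circled_one, "cp932"), can_encode(circled_one, "shift_jis"))
```

When text round-trips through both codecs, mismatches like these are a common source of mojibake or encode errors.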

