@poppycocker
Last active October 5, 2015 07:17
UTF-8 hex value (of 1 character) to Unicode code point value
/*-----------------------------------------------------------------------------------------
 [UCS-2 (UCS-4)]       [code point bit pattern]              [1st byte] [2nd byte] [3rd byte] [4th byte]
 U+0000  .. U+007F     00000000-0xxxxxxx                     0xxxxxxx
 U+0080  .. U+07FF     00000xxx-xxyyyyyy                     110xxxxx   10yyyyyy
 U+0800  .. U+FFFF     xxxxyyyy-yyzzzzzz                     1110xxxx   10yyyyyy   10zzzzzz
 U+10000 .. U+1FFFFF   00000000-000wwwxx-xxxxyyyy-yyzzzzzz   11110www   10xxxxxx   10yyyyyy   10zzzzzz
 (The 4-byte form has room for 21 bits; Unicode itself ends at U+10FFFF.)
------------------------------------------------------------------------------------------*/
// e.g.
// [in]  0xC6A9 ('Ʃ', U+01A9)
// [out] 425 (= 0x1A9)
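function getCodePoint(hex) {
    var bytes, n, shift1st, codePoint;
    // accepts at most 4 bytes: an integer in 0..0xFFFFFFFF
    if (typeof hex !== 'number' || hex < 0 || hex > 0xFFFFFFFF || Math.floor(hex) !== hex) {
        return -1;
    }
    // count the bytes with integer arithmetic
    // (Math.log(hex) / Math.log(0xFF) used base 255 instead of 256 and broke for hex === 0)
    bytes = 1;
    for (n = hex; n > 0xFF; n = Math.floor(n / 0x100)) {
        bytes++;
    }
    // 1st byte: mask off the length prefix (0, 110, 1110 or 11110)
    shift1st = (bytes === 1) ? 0 : (bytes + 1);
    // >>> keeps 4-byte inputs unsigned; the sequence itself is not validated
    codePoint = (hex >>> ((bytes - 1) * 8)) & (0xFF >> shift1st);
    for (n = 1; n < bytes; n++) {
        codePoint <<= 6;
        // 2nd-4th byte: mask 0b00111111 (0x3F) keeps the low 6 bits
        codePoint += (hex >>> ((bytes - 1 - n) * 8)) & 0x3F;
    }
    return codePoint;
}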
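
The function above takes a number and infers the byte count from its magnitude, so it cannot tell a well-formed sequence from a malformed one. As a rough sketch only, the variant below takes an actual hex string and reads the length from the lead byte, following the table at the top. The name `utf8HexToCodePoint` and the choice to return -1 on malformed input are assumptions made for illustration, not part of the original gist. It checks the continuation-byte prefix but does not reject overlong encodings, surrogates, or values above U+10FFFF.

```javascript
// Hypothetical helper (not from the original gist): decode the UTF-8 bytes of
// one character, given as a hex string such as "C6A9" or "0xC6A9".
function utf8HexToCodePoint(hexString) {
    // Allow an optional "0x" prefix, then require 1 to 4 whole bytes of hex digits.
    var digits = String(hexString).replace(/^0x/i, '');
    if (!/^([0-9a-f]{2}){1,4}$/i.test(digits)) {
        return -1;
    }
    var bytes = [];
    for (var i = 0; i < digits.length; i += 2) {
        bytes.push(parseInt(digits.substring(i, i + 2), 16));
    }

    // The lead byte's prefix gives the sequence length and the payload mask.
    var lead = bytes[0], length, codePoint;
    if (lead < 0x80) {                    // 0xxxxxxx
        length = 1; codePoint = lead;
    } else if ((lead & 0xE0) === 0xC0) {  // 110xxxxx
        length = 2; codePoint = lead & 0x1F;
    } else if ((lead & 0xF0) === 0xE0) {  // 1110xxxx
        length = 3; codePoint = lead & 0x0F;
    } else if ((lead & 0xF8) === 0xF0) {  // 11110www
        length = 4; codePoint = lead & 0x07;
    } else {
        return -1;                        // continuation byte or invalid lead byte
    }
    if (bytes.length !== length) {
        return -1;
    }

    // Each continuation byte must look like 10xxxxxx and contributes 6 bits.
    for (var n = 1; n < length; n++) {
        if ((bytes[n] & 0xC0) !== 0x80) {
            return -1;
        }
        codePoint = (codePoint << 6) | (bytes[n] & 0x3F);
    }
    return codePoint;
}
```

For example, `utf8HexToCodePoint('C6A9')` yields 425 (0x1A9), the same value as the numeric example in the comment above, while a lone continuation byte such as `'A9'` yields -1.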