@VenkataRaju
Last active August 11, 2020 12:34
Java Unicode Notes
U+FFFF (Unicode representation) is the same value as 0xFFFF (hexadecimal)
To convert "U+FFFF" to a code point:
int codepoint = Integer.parseInt(str.substring(2), 16);
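A small round-trip sketch around the line above: parse "U+XXXX" into an int and format an int back into "U+XXXX". The names parseUPlus and formatUPlus are hypothetical helpers for this note, not JDK methods; the range check uses Character.isValidCodePoint.

```java
// Round trip between "U+XXXX" notation and an int code point.
// parseUPlus / formatUPlus are hypothetical helper names for this sketch.
class UPlusNotation {

    // "U+1F600" -> 0x1F600; rejects input without the "U+" prefix
    // and values outside U+0000..U+10FFFF
    static int parseUPlus(String str) {
        if (!str.startsWith("U+")) {
            throw new IllegalArgumentException("Expected U+XXXX notation: " + str);
        }
        int codePoint = Integer.parseInt(str.substring(2), 16);
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Outside U+0000..U+10FFFF: " + str);
        }
        return codePoint;
    }

    // 0x1F600 -> "U+1F600"; pads to at least four hex digits (0x41 -> "U+0041")
    static String formatUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        int cp = parseUPlus("U+FFFF");
        System.out.println(cp);                // 65535
        System.out.println(formatUPlus(cp));   // U+FFFF
        System.out.println(formatUPlus(0x41)); // U+0041
    }
}
```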
Basic Multilingual Plane (BMP) = U+0000 (0) to U+FFFF (65535) (1111111111111111, i.e. 16 ones, in binary)
Supplementary characters = U+10000 (65536) to U+10FFFF (1114111)
All Unicode code points = U+0000 (0) to U+10FFFF (1114111)
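The same ranges are available as constants and checks on java.lang.Character. A minimal sketch; the classify helper is hypothetical, the Character members are standard JDK (isBmpCodePoint needs Java 7+).

```java
// Classifies an int code point using the ranges listed above.
class CodePointRanges {

    static String classify(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return "invalid";        // below 0 or above U+10FFFF
        }
        if (Character.isBmpCodePoint(codePoint)) {
            return "BMP";            // U+0000..U+FFFF
        }
        return "supplementary";      // U+10000..U+10FFFF
    }

    public static void main(String[] args) {
        System.out.println(Character.MIN_CODE_POINT);               // 0
        System.out.println(Character.MIN_SUPPLEMENTARY_CODE_POINT); // 65536
        System.out.println(Character.MAX_CODE_POINT);               // 1114111
        System.out.println(classify(0xFFFF));   // BMP
        System.out.println(classify(0x10000));  // supplementary
        System.out.println(classify(0x110000)); // invalid
    }
}
```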
A supplementary character is represented in Java by two UTF-16 code units (two char values), called a surrogate pair; a BMP character needs only one char
The first unit is called the high surrogate, the second the low surrogate
High surrogate = U+D800 (55296) to U+DBFF (56319)
Low surrogate = U+DC00 (56320) to U+DFFF (57343)
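What a surrogate pair looks like inside a Java String, using U+1F600 (grinning face) as an example. A sketch; fromCodePoint is a hypothetical helper built on StringBuilder.appendCodePoint.

```java
// Shows that one supplementary code point occupies two chars in a String.
class SurrogatePairs {

    // Builds a String holding a single code point; supplementary code points
    // are stored as a high/low surrogate pair
    static String fromCodePoint(int codePoint) {
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x1F600);

        System.out.println(s.length());                      // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)

        char high = s.charAt(0);
        char low = s.charAt(1);
        System.out.printf("%X %X%n", (int) high, (int) low); // D83D DE00
        System.out.println(Character.isHighSurrogate(high)); // true
        System.out.println(Character.isLowSurrogate(low));   // true

        // codePointAt joins the pair back into one code point
        System.out.printf("%X%n", s.codePointAt(0));         // 1F600
    }
}
```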
A small set of code points is guaranteed never to be used for encoding characters,
although applications may use these code points internally if they wish.
There are 66 of these noncharacters:
U+FDD0 (64976) – U+FDEF (65007), 32 code points
and any code point ending in FFFE or FFFF (i.e., U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, … U+10FFFE, U+10FFFF), 2 in each of the 17 planes, 34 in total
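A sketch of a noncharacter check following the two rules above. isNoncharacter is a hypothetical helper, not a java.lang.Character method; the mask (cp & 0xFFFE) == 0xFFFE matches any code point whose low 16 bits are FFFE or FFFF. Counting over the whole code space confirms the total of 66.

```java
// Noncharacter test based on the rules above (hypothetical helper).
class Noncharacters {

    static boolean isNoncharacter(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return false;
        }
        boolean inFdd0Block = codePoint >= 0xFDD0 && codePoint <= 0xFDEF;
        // low 16 bits are FFFE or FFFF, in any of the 17 planes
        boolean endsInFffeOrFfff = (codePoint & 0xFFFE) == 0xFFFE;
        return inFdd0Block || endsInFffeOrFfff;
    }

    // Scans U+0000..U+10FFFF and counts the noncharacters
    static int countNoncharacters() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (isNoncharacter(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isNoncharacter(0xFDD0));  // true
        System.out.println(isNoncharacter(0x1FFFE)); // true
        System.out.println(isNoncharacter(0xFFFD));  // false
        System.out.println(countNoncharacters());    // 66
    }
}
```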
// Only BMP characters can be represented as char values in Java
int zeroWidthNonBreakingSpace = '\uFEFF'; // U+FEFF fits in a single char
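Since a char holds only BMP values, a supplementary character has to be handled as an int code point and turned into chars with Character.toChars. A sketch; codePointsOf is a hypothetical wrapper around String.codePoints() (Java 8+), which pairs surrogates back up.

```java
import java.util.Arrays;

// char for BMP values, int for anything that may be supplementary.
class CharVersusCodePoint {

    // Returns the code points of a string, joining surrogate pairs
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        char bmp = '\uFEFF';         // a BMP character fits in one char
        int supplementary = 0x1F600; // no single-char literal exists for U+1F600

        char[] units = Character.toChars(supplementary); // the surrogate pair
        String s = "A" + bmp + new String(units);

        System.out.println(s.length());                       // 4 chars
        System.out.println(Arrays.toString(codePointsOf(s))); // [65, 65279, 128512]
    }
}
```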
--------------------------------
// Split an astral (supplementary) code point into its UTF-16 surrogate pair
function getSurrogatePair(astralCodePoint) {
  const highSurrogate =
    Math.floor((astralCodePoint - 0x10000) / 0x400) + 0xD800;
  const lowSurrogate = (astralCodePoint - 0x10000) % 0x400 + 0xDC00;
  return [highSurrogate, lowSurrogate];
}
getSurrogatePair(0x1F600); // => [0xD83D, 0xDE00]

// Combine a surrogate pair back into the astral code point
function getAstralCodePoint(highSurrogate, lowSurrogate) {
  return (highSurrogate - 0xD800) * 0x400
    + lowSurrogate - 0xDC00 + 0x10000;
}
getAstralCodePoint(0xD83D, 0xDE00); // => 0x1F600
https://dmitripavlutin.com/what-every-javascript-developer-should-know-about-unicode/
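A Java port of the two JavaScript functions, checked against the JDK's built-in Character.highSurrogate, Character.lowSurrogate and Character.toCodePoint. A sketch: the method names mirror the JavaScript, and the shift and mask stand in for the division by 0x400 and the % 0x400, which give the same result for the non-negative offsets used here.

```java
// Surrogate-pair arithmetic, ported from the JavaScript above.
class SurrogateMath {

    // Splits a supplementary code point into {high, low}
    static char[] getSurrogatePair(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not in U+10000..U+10FFFF: " + codePoint);
        }
        int offset = codePoint - 0x10000;              // 20-bit value, 0..0xFFFFF
        char high = (char) ((offset >>> 10) + 0xD800); // top 10 bits
        char low = (char) ((offset & 0x3FF) + 0xDC00); // bottom 10 bits
        return new char[] { high, low };
    }

    // Joins a surrogate pair back into the code point
    static int getAstralCodePoint(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        char[] pair = getSurrogatePair(0x1F600);
        System.out.printf("%X %X%n", (int) pair[0], (int) pair[1]);      // D83D DE00
        System.out.printf("%X%n", getAstralCodePoint(pair[0], pair[1])); // 1F600

        // The JDK performs the same arithmetic
        System.out.println(pair[0] == Character.highSurrogate(0x1F600));        // true
        System.out.println(pair[1] == Character.lowSurrogate(0x1F600));         // true
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == 0x1F600); // true
    }
}
```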
-------------------------------