Last active
August 11, 2020 12:34
Java Unicode Notes
U+FFFF (Unicode notation) is the same value as 0xFFFF (hexadecimal)
To convert "U+FFFF" to a code point:
int codePoint = Integer.parseInt(str.substring(2), 16);
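The one-liner above assumes the input is well formed. A minimal sketch with validation is given here; the class name CodePointNotation and its methods are hypothetical, and accepting 4 to 6 hex digits is an assumption about what counts as valid input.

```java
final class CodePointNotation {
    /** Parses "U+XXXX" notation (4 to 6 hex digits) into a code point. */
    static int parse(String notation) {
        if (!notation.matches("U\\+[0-9A-Fa-f]{4,6}")) {
            throw new IllegalArgumentException("Not U+XXXX notation: " + notation);
        }
        int codePoint = Integer.parseInt(notation.substring(2), 16);
        // Six hex digits can exceed U+10FFFF, so check the range explicitly.
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Out of Unicode range: " + notation);
        }
        return codePoint;
    }

    /** Formats a code point as "U+XXXX", padded to at least 4 hex digits. */
    static String format(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(parse("U+1F600")); // 128512
        System.out.println(format(0x1F600));  // U+1F600
        System.out.println(format(0x41));     // U+0041
    }
}
```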
Basic Multilingual Plane (BMP) = U+0000 (0) to U+FFFF (65535) (1111111111111111, sixteen 1s, in binary)
Supplementary chars = U+10000 (65536) to U+10FFFF (1114111)
All Unicode characters = U+0000 (0) to U+10FFFF (1114111)
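These ranges can be checked with methods on java.lang.Character. The sketch below is illustrative only; the class name CodePointRanges and the describe helper are hypothetical.

```java
final class CodePointRanges {
    /** Classifies a code point by the ranges listed above. */
    static String describe(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return "invalid";            // outside U+0000..U+10FFFF
        }
        if (Character.isBmpCodePoint(codePoint)) {
            return "BMP";                // U+0000..U+FFFF
        }
        return "supplementary";          // U+10000..U+10FFFF
    }

    public static void main(String[] args) {
        System.out.println(describe(0x41));       // BMP
        System.out.println(describe(0x1F600));    // supplementary
        System.out.println(describe(0x110000));   // invalid
        // Number of char values needed to hold the code point in UTF-16.
        System.out.println(Character.charCount(0x41));    // 1
        System.out.println(Character.charCount(0x1F600)); // 2
    }
}
```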
A supplementary character in Java is represented by two UTF-16 code units (char values), called a surrogate pair
The first char is called the high surrogate, the second the low surrogate
High Surrogate = U+D800 (55296) to U+DBFF (56319)
Low Surrogate = U+DC00 (56320) to U+DFFF (57343)
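One practical consequence is that String.length() counts char values, not characters. The sketch below shows the difference for a string holding one supplementary character; the class name SurrogateDemo and the fromCodePoints helper are hypothetical.

```java
final class SurrogateDemo {
    /** Builds a String from code points; supplementary ones become surrogate pairs. */
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String s = fromCodePoints(0x41, 0x1F600); // "A" followed by U+1F600

        System.out.println(s.length());                      // 3 (char values)
        System.out.println(s.codePointCount(0, s.length())); // 2 (code points)

        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true

        // codePointAt combines the pair starting at index 1 into one code point.
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1f600
    }
}
```

Iterating with s.codePoints() rather than charAt avoids splitting a pair in the middle.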
A small set of code points are guaranteed never to be used for encoding characters,
although applications may make use of these code points internally if they wish.
There are 66 of these noncharacters:
U+FDD0 (64976) – U+FDEF (65007)
and any code point ending in the value FFFE or FFFF (i.e., U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, … U+10FFFE, U+10FFFF)
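The rule above can be written as a small predicate. This is a sketch, not a JDK API: the class name Noncharacters and both methods are hypothetical. Counting over the whole code space confirms the figure of 66 (32 in the U+FDD0 block plus 2 in each of the 17 planes).

```java
final class Noncharacters {
    /** True for U+FDD0..U+FDEF and for any code point whose low 16 bits are FFFE or FFFF. */
    static boolean isNoncharacter(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return false;
        }
        boolean inFdBlock = codePoint >= 0xFDD0 && codePoint <= 0xFDEF;
        // Masking with 0xFFFE ignores the lowest bit, so FFFE and FFFF both match.
        boolean endsInFffeOrFfff = (codePoint & 0xFFFE) == 0xFFFE;
        return inFdBlock || endsInFffeOrFfff;
    }

    /** Counts noncharacters across U+0000..U+10FFFF. */
    static int count() {
        int total = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (isNoncharacter(cp)) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(isNoncharacter(0xFFFE));  // true
        System.out.println(isNoncharacter(0x1FFFF)); // true
        System.out.println(isNoncharacter(0xFEFF));  // false
        System.out.println(count());                 // 66
    }
}
```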
// Only BMP characters can be represented as single char values in Java
// U+FEFF (ZERO WIDTH NO-BREAK SPACE) is invisible, so it is written as an escape here
int zeroWidthNonBreakingSpace = '\uFEFF';
--------------------------------
function getSurrogatePair(astralCodePoint) {
  let highSurrogate =
    Math.floor((astralCodePoint - 0x10000) / 0x400) + 0xD800;
  let lowSurrogate = (astralCodePoint - 0x10000) % 0x400 + 0xDC00;
  return [highSurrogate, lowSurrogate];
}
getSurrogatePair(0x1F600); // => [0xD83D, 0xDE00]
function getAstralCodePoint(highSurrogate, lowSurrogate) {
  return (highSurrogate - 0xD800) * 0x400
    + lowSurrogate - 0xDC00 + 0x10000;
}
getAstralCodePoint(0xD83D, 0xDE00); // => 0x1F600
https://dmitripavlutin.com/what-every-javascript-developer-should-know-about-unicode/
-------------------------------
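The JavaScript functions above translate directly to Java. The port below is a sketch: the class name SurrogateMath and its methods are hypothetical, and it uses shifts and masks in place of the division and modulo by 0x400, which are equivalent for non-negative values. The JDK already provides the same conversions as Character.highSurrogate, Character.lowSurrogate and Character.toCodePoint, so the hand-written version is mainly useful for seeing the arithmetic.

```java
final class SurrogateMath {
    /** Splits a supplementary code point into {high, low} surrogates. */
    static char[] toSurrogatePair(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException(
                String.format("Not a supplementary code point: U+%04X", codePoint));
        }
        int offset = codePoint - 0x10000;                 // 20-bit value
        char high = (char) ((offset >>> 10) + 0xD800);    // top 10 bits
        char low = (char) ((offset & 0x3FF) + 0xDC00);    // bottom 10 bits
        return new char[] {high, low};
    }

    /** Combines a surrogate pair back into a code point (inputs are not validated). */
    static int toCodePoint(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        char[] pair = toSurrogatePair(0x1F600);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DE00
        System.out.printf("%X%n", toCodePoint(pair[0], pair[1]));       // 1F600

        // The built-in methods agree with the manual arithmetic.
        System.out.println(pair[0] == Character.highSurrogate(0x1F600)); // true
        System.out.println(pair[1] == Character.lowSurrogate(0x1F600));  // true
    }
}
```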