This gist maps the "not verifiable" GCCS characters documented in HKSCS-2004 Annex IV to Unihan or U-source records. The mapping for U-source characters should be mostly correct, because their Adobe-CNS1 sources fall within the range of Adobe-CNS1-1, a supplement created for GCCS support.
This voluntary enrichment activity is followed by cake rewards. The generated
mapping is released into the public domain. For a more authoritative description
of these characters, written in IDS, please refer to
TR45, which was found accidentally by
googling for the IDS sequence for 9FE6
on this page.
You can find a font with support for such code points at GlyphWiki.
GCCS | EUDC PUA | Unicode 9.0 |
---|---|---|
9EAC | U+ED2B | U+8C9B |
9EC4 | U+ED43 | UTC-00877 |
9EF4 | U+ED73 | UTC-00879 |
9F4E | U+ED8C | UTC-00880 |
9FAD | U+EDC9 | UTC-00882 |
9FB1 | U+EDCD | U+2B473 |
9FC0 | U+EDDC | UTC-00883 |
9FC8 | U+EDE4 | UTC-00884 |
9FDA | U+EDF6 | UTC-00886 |
9FE6 | U+EE02 | UTC-00887 |
9FEA | U+EE06 | UTC-00888 |
9FEF | U+EE0B | UTC-00889 |
A054 | U+EE2F | UTC-00890 |
A057 | U+EE32 | U+2AE67 |
A05A | U+EE35 | UTC-00891 |
A062 | U+EE3D | UTC-00892 |
A072 | U+EE4D | UTC-00893 |
A0A5 | U+EE5E | UTC-00894 |
A0AD | U+EE66 | UTC-00895 |
A0AF | U+EE68 | UTC-00896 |
A0D3 | U+EE8C | UTC-00897 |
A0E1 | U+EE9A | UTC-00898 |
See http://kanji-database.sourceforge.net/charcode/big5.html.

    def big5_eudc_pua(byteseq: str):
        """Map a Big5 EUDC code given as four hex digits (e.g. "9EAC") to its PUA code point."""
        H = int(byteseq[0:2], 16)  # lead byte
        L = int(byteseq[2:4], 16)  # trail byte
        if L < 0x40 or (L > 0x7e and L < 0xa1) or L == 0xff:
            raise ValueError(byteseq)  # Not valid Big5
        # Each lead byte has 157 cells: trail bytes 0x40-0x7E (63), then 0xA1-0xFE (94).
        _eudc_row = lambda L: (L - 0x40) if (L < 0x80) else (L - 0x62)
        if H >= 0x81 and H <= 0x8D:
            return 0xeeb8 + (157 * (H - 0x81)) + _eudc_row(L)
        elif H >= 0x8E and H <= 0xA0:
            return 0xe311 + (157 * (H - 0x8e)) + _eudc_row(L)
        elif (H >= 0xC7 or (H == 0xC6 and L >= 0xA1)) and H <= 0xC8:
            return 0xf672 + (157 * (H - 0xc6)) + _eudc_row(L)
        elif H >= 0xFA and H <= 0xFE:
            return 0xe000 + (157 * (H - 0xfa)) + _eudc_row(L)
        else:
            return None  # Not in any EUDC range
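
As a sanity check, the "EUDC PUA" column of the table can be recomputed from the GCCS column. The following is only a sketch: `EUDC_RANGES` and `eudc_pua_label` are hypothetical names that are not part of the gist, and the base values are copied from the function above rather than taken from an independent source.

```python
from typing import Optional

# (first lead byte, last lead byte, PUA code point of the range's first cell).
# Copied from big5_eudc_pua above. The 0xC6 row is anchored at trail byte 0x40,
# although only 0xC6A1 onwards is treated as user-defined.
EUDC_RANGES = [
    (0x81, 0x8D, 0xEEB8),
    (0x8E, 0xA0, 0xE311),
    (0xC6, 0xC8, 0xF672),
    (0xFA, 0xFE, 0xE000),
]


def eudc_pua_label(big5_hex: str) -> Optional[str]:
    """Return the PUA code point as 'U+XXXX', or None outside the EUDC ranges."""
    lead = int(big5_hex[0:2], 16)
    trail = int(big5_hex[2:4], 16)
    # 157 cells per lead byte: 63 for trail 0x40-0x7E, then 94 for 0xA1-0xFE.
    if 0x40 <= trail <= 0x7E:
        cell = trail - 0x40
    elif 0xA1 <= trail <= 0xFE:
        cell = trail - 0x62
    else:
        raise ValueError(big5_hex)  # not a valid Big5 trail byte
    for first, last, base in EUDC_RANGES:
        if first <= lead <= last:
            if lead == 0xC6 and trail < 0xA1:
                return None  # first half of row 0xC6 is not user-defined
            return "U+%04X" % (base + 157 * (lead - first) + cell)
    return None


# A few rows taken from the table above.
ROWS = {"9EAC": "U+ED2B", "9FE6": "U+EE02", "A054": "U+EE2F", "A0E1": "U+EE9A"}
for gccs, expected in ROWS.items():
    assert eudc_pua_label(gccs) == expected, gccs
```

For example, 9EAC has lead byte 0x9E and trail byte 0xAC, so its offset is 157 × (0x9E − 0x8E) + (0xAC − 0x62) = 2512 + 74 = 2586 = 0xA1A, and 0xE311 + 0xA1A = 0xED2B, which matches the first table row.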