@Artoria2e5
Last active October 3, 2016 16:18
Unicode mappings for GCCS "not verifiable" characters

This gist maps the "not verifiable" GCCS characters documented in HKSCS-2004 Annex IV to Unihan or U-source records. The mappings to U-source characters should be mostly correct, since their Adobe-CNS1 sources fall within the range of Adobe-CNS1-1, a supplement created for GCCS support.

This voluntary enrichment activity is followed by cake rewards. The generated mapping is released into the public domain. For a more authoritative description of the characters, written in IDS, please refer to TR45, which was found by accident while googling the IDS sequence for 9FE6 on this page.

You can find a font with support for such code points at GlyphWiki.

GCCS  EUDC PUA  Unicode 9.0
9EAC  U+ED2B    U+8C9B
9EC4  U+ED43    UTC-00877
9EF4  U+ED73    UTC-00879
9F4E  U+ED8C    UTC-00880
9FAD  U+EDC9    UTC-00882
9FB1  U+EDCD    U+2B473
9FC0  U+EDDC    UTC-00883
9FC8  U+EDE4    UTC-00884
9FDA  U+EDF6    UTC-00886
9FE6  U+EE02    UTC-00887
9FEA  U+EE06    UTC-00888
9FEF  U+EE0B    UTC-00889
A054  U+EE2F    UTC-00890
A057  U+EE32    U+2AE67
A05A  U+EE35    UTC-00891
A062  U+EE3D    UTC-00892
A072  U+EE4D    UTC-00893
A0A5  U+EE5E    UTC-00894
A0AD  U+EE66    UTC-00895
A0AF  U+EE68    UTC-00896
A0D3  U+EE8C    UTC-00897
A0E1  U+EE9A    UTC-00898
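
The last column of the table mixes two kinds of targets: encoded code points (U+XXXX) and U-source identifiers (UTC-NNNNN). A minimal sketch of how a script might tell them apart is given here. The `ROWS` dict and the `resolve` helper are hypothetical names made up for illustration, and only a few rows from the table are copied in.

```python
# A few rows copied verbatim from the table (GCCS code -> Unicode 9.0 column).
ROWS = {
    "9EAC": "U+8C9B",
    "9EC4": "UTC-00877",
    "9FB1": "U+2B473",
    "A057": "U+2AE67",
}


def resolve(target: str):
    """Return the character for a 'U+XXXX' target, or None for a U-source id."""
    if target.startswith("U+"):
        return chr(int(target[2:], 16))
    # The table only gives a U-source identifier for this row,
    # so there is no code point to turn into a character here.
    return None


for gccs, target in ROWS.items():
    ch = resolve(target)
    print(gccs, target, ch if ch is not None else "(U-source id only)")
```

Rows that resolve to None would need a lookup in the U-source data (see TR45) rather than a plain `chr()` call.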

Useful GlyphWiki Links

Big5-EUDC to PUA

See http://kanji-database.sourceforge.net/charcode/big5.html.

def big5_eudc_pua(byteseq: str):
    """Map a Big5 EUDC code given as four hex digits (e.g. "9EAC") to a PUA code point.

    Returns None if the lead byte is outside the EUDC ranges.
    """
    H = int(byteseq[0:2], 16)  # lead byte
    L = int(byteseq[2:4], 16)  # trail byte
    if L < 0x40 or (L > 0x7e and L < 0xa1) or L == 0xff:
        raise ValueError(byteseq)  # Not valid Big5

    # Each row holds 157 cells: trail bytes 0x40..0x7E are cells 0..62,
    # trail bytes 0xA1..0xFE are cells 63..156.
    cell = (L - 0x40) if (L < 0x80) else (L - 0x62)
    if H >= 0x81 and H <= 0x8D:
        return 0xeeb8 + (157 * (H - 0x81)) + cell
    elif H >= 0x8E and H <= 0xA0:
        return 0xe311 + (157 * (H - 0x8e)) + cell
    elif (H >= 0xC7 or (H == 0xC6 and L >= 0xA1)) and H <= 0xC8:
        return 0xf672 + (157 * (H - 0xc6)) + cell
    elif H >= 0xFA and H <= 0xFE:
        return 0xe000 + (157 * (H - 0xfa)) + cell
    else:
        return None  # DummyVal: not an EUDC lead byte
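
As a sanity check, the EUDC PUA column of the table can be recomputed from the GCCS codes. The sketch below is self-contained, so it repeats only the 0x8E..0xA0 branch of the function above (every GCCS code in the table falls in that range). `gccs_to_pua` and `SAMPLES` are hypothetical names used for illustration, not part of the original snippet.

```python
def gccs_to_pua(code: str) -> int:
    """Map a GCCS code with lead byte 0x8E..0xA0 to its EUDC PUA code point."""
    lead, trail = int(code[:2], 16), int(code[2:], 16)
    if not 0x8E <= lead <= 0xA0:
        raise ValueError(f"lead byte out of range: {code}")
    if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
        raise ValueError(f"invalid trail byte: {code}")
    # 63 cells for trail bytes 0x40..0x7E, then 94 for 0xA1..0xFE: 157 per row.
    cell = trail - 0x40 if trail < 0x80 else trail - 0x62
    return 0xE311 + 157 * (lead - 0x8E) + cell


# (GCCS code, EUDC PUA) pairs copied from the table.
SAMPLES = [("9EAC", 0xED2B), ("9F4E", 0xED8C), ("A054", 0xEE2F), ("A0E1", 0xEE9A)]

for code, expected in SAMPLES:
    assert gccs_to_pua(code) == expected, code
    print(f"{code} -> U+{gccs_to_pua(code):04X}")
```

Taking 9EAC as an example: the lead byte is 16 rows past 0x8E, so the row starts at 0xE311 + 157 * 16 = 0xECE1, and the trail byte 0xAC is cell 0xAC - 0x62 = 0x4A, which gives 0xED2B as listed.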