Perhaps the most prominent, if not the most important, property of any Unicode character is its name. These immutable labels are much more descriptive than their hexadecimal codes.
But these names aren’t used often in most programming languages. Why is that? Perhaps the biggest reason is the [sheer size of the list of names][UnicodeData.txt], which weighs 1.769 MB; embedding such a file in every language runtime would impose a large burden on resource-constrained environments such as mobile devices. But can this burden be mitigated with data compression? How compressible is this list? Can we create a [succinct data structure][succinct] from this list?
Precedent for this project comes from GNU Uniname, a GNU Project utility, written by Bill Poser of British Columbia, which can look up the characters of names and the names of characters. Its source may be read in [uniname.c][] and [test-uninames.c][]. According to its documentation, the size of the compiled program