@schierlm
Last active August 31, 2020 19:53
Text document containing all characters of the Multilingual European Subsets of Unicode and some other common Unicode subsets (and a small Java program to verify the file has not been garbled)
Common Unicode Subsets
======================
ASCII
~~~~~
ASCII is not usually described as a Unicode subset. However, the Unicode
character set starts with ASCII, which makes the printable ASCII range the
smallest widely used subset of Unicode.
|Latin uppercase letters |0041-5A(26)|ABCDEFGHIJKLMNOPQRSTUVWXYZ|-----|
|Latin lowercase letters |0061-7A(26)|abcdefghijklmnopqrstuvwxyz|-----|
|Decimal digits |0030-39(10)|0123456789|---------------------|
|Symbols and special characters |0020-2F(16)| !"#$%&'()*+,-./|---------------|
|'-> |003A-40,5B-60,7B-7E(17)|:;<=>?@[\]^_`{|}~|--------------|
ASCII is defined as:
>00 20-7E
>#95
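The definition lines can be read as follows (this reading is taken from the
verifier program that accompanies this file): the first two hex digits are the
high byte of the code point, and every following entry is either a single low
byte or a range of low bytes. The closing count line gives the total number of
characters; a "*" after the "#" means the count is checked against the table
rows not marked with (*), and a "?" means no table rows exist for the subset.
A minimal Java sketch of that expansion, where the class and method names are
illustrative assumptions and not part of the verifier:

```java
import java.util.BitSet;

/** Illustrative helper (hypothetical name): expands ">HH LL LL-LL ..." definition lines. */
final class SubsetDefinition {

    /** Builds the set of code points described by the given definition lines. */
    static BitSet parse(String... lines) {
        BitSet set = new BitSet(0x10000);
        for (String line : lines) {
            // Drop the leading '>' and split into the high byte and its entries.
            String[] parts = line.substring(1).split(" ");
            int base = Integer.parseInt(parts[0], 16) << 8;
            for (int i = 1; i < parts.length; i++) {
                // An entry is either "LL" (single low byte) or "LL-LL" (inclusive range).
                String[] bounds = parts[i].split("-");
                int from = Integer.parseInt(bounds[0], 16);
                int to = Integer.parseInt(bounds[bounds.length - 1], 16);
                set.set(base + from, base + to + 1); // BitSet ranges exclude the upper bound
            }
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet ascii = parse(">00 20-7E");
        System.out.println(ascii.cardinality()); // 95, matching the count line above
    }
}
```

The same method accepts several lines at once, so a multi-line definition such
as the MES-1 one further below can be passed in as it stands.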
Multilingual European Subset 1 (MES-1)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On top of ASCII, this charset contains common Latin letters and symbols used
in Europe (or by European character sets):
|Latin-1 symbols |00A0-BF(32)| ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿|
|Latin-1 uppercase letters |00C0-DF(32)|ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß|
|Latin-1 lowercase letters |00E0-FF(32)|àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ|
|Latin extended |0100-13(20)|ĀāĂăĄąĆćĈĉĊċČčĎďĐđĒē|-----------|
|'-> |0116-2B(22)|ĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪī|---------|
|'-> |012E-4D(32)|ĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌō|
|'-> |0150-67(24)|ŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧ|-------|
|'-> |0168-7E(23)|ŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽž|--------|
|Accents |02C7-C7(01)|ˇ|------------------------------|
|'-> |02D8-DB,DD-DD(05)|˘˙˚˛˝|--------------------------|
|Typographic special characters |2015-15(01)|―|------------------------------|
|'-> |2018-19,1C-1D(04)|‘’“”|---------------------------|
|Euro symbol |20AC-AC(01)|€|------------------------------|
|Trademark symbol |2122-22(01)|™|------------------------------|
|Ohm symbol |2126-26(01)|Ω|------------------------------|
|Vulgar fractions |215B-5E(04)|⅛⅜⅝⅞|---------------------------|
|Arrow symbols |2190-93(04)|←↑→↓|---------------------------|
|Musical note symbol |266A-6A(01)|♪|------------------------------|
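Each table row carries a rule such as 2018-19,1C-1D(04): a four-digit starting
code point, the low byte of the end of the range, optional further low-byte
ranges within the same high byte, and the expected character count in
parentheses. The sketch below expands such a rule into its characters. It is
only an illustration of the format as the accompanying verifier reads it; the
class and method names are assumptions.

```java
/** Illustrative helper (hypothetical name): expands a table rule into its characters. */
final class RuleExpander {

    /** Expands e.g. "2018-19,1C-1D(04)" and checks the count given in parentheses. */
    static String expand(String rule) {
        int paren = rule.indexOf('(');
        String ranges = rule.substring(0, paren);
        int expected = Integer.parseInt(rule.substring(paren + 1, rule.length() - 1));
        // The first two hex digits are the high byte shared by all ranges of the rule.
        int base = Integer.parseInt(ranges.substring(0, 2), 16) << 8;
        StringBuilder chars = new StringBuilder();
        for (String range : ranges.substring(2).split(",")) {
            String[] bounds = range.split("-");
            int from = Integer.parseInt(bounds[0], 16);
            int to = Integer.parseInt(bounds[1], 16);
            for (int low = from; low <= to; low++) {
                chars.append((char) (base + low));
            }
        }
        if (chars.length() != expected) {
            throw new IllegalArgumentException(
                "Rule " + rule + " yields " + chars.length() + " characters");
        }
        return chars.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("0041-5A(26)")); // ABCDEFGHIJKLMNOPQRSTUVWXYZ
    }
}
```

Comparing the expanded string with the character column of a row is a quick
way to spot a row that was altered by an editor or a transfer.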
MES-1 is defined as:
>00 20-7E A0-FF
>01 00-13 16-2B 2E-4D 50-7E
>02 C7 D8-DB DD
>20 15 18-19 1C-1D AC
>21 22 26 5B-5E 90-93
>26 6A
>#335
Multilingual European Subset 2 (MES-2)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On top of MES-1, this contains more "exotic" Latin European characters
as well as Greek and Cyrillic ones, and more symbols:
|Latin extended |0114-15(02)|Ĕĕ|-----------------------------|
|'-> |012C-2D,4E-4F(04)|ĬĭŎŏ|---------------------------|
|'-> |0192-92,FA-FF(07)|ƒǺǻǼǽǾǿ|------------------------|
|'-> |1E80-85,F2-F3(08)|ẀẁẂẃẄẅỲỳ|-----------------------|
|'-> (*) |01DE-EF(18)|ǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯ|-------------|
|'-> (*) |0218-1B,1E-1F(06)|ȘșȚțȞȟ|-------------------------|
|'-> (*) |1E02-03,0A-0B,1E-1F,40-41(08)|ḂḃḊḋḞḟṀṁ|-----------------------|
|'-> (*) |1E56-57,60-61,6A-6B(06)|ṖṗṠṡṪṫ|-------------------------|
|More exotic Latin letters |017F-7F(01)|ſ|------------------------------|
|'-> (*) |018F-8F,B7-B7(02)|ƏƷ|-----------------------------|
|'-> (*) |0259-59,7C-7C,92-92(03)|əɼʒ|----------------------------|
|'-> (*) |1E9B-9B(01)|ẛ|------------------------------|
|Latin Modifier letters |02C6-C6(01)|ˆ|------------------------------|
|'-> |02C9-C9,DC-DC(02)|ˉ˜|-----------------------------|
|'-> (*) |02BB-BD,EE-EE(04)|ʻʼʽˮ|---------------------------|
|Greek uppercase letters |0391-A1(17)|ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡ|--------------|
|'-> |03A3-A9(07)|ΣΤΥΦΧΨΩ|------------------------|
|Greek lowercase letters |03B1-C9(25)|αβγδεζηθικλμνξοπρςστυφχψω|------|
|Greek extended |0384-8A(07)|΄΅Ά·ΈΉΊ|------------------------|
|'-> |038C-8C,8E-90(04)|ΌΎΏΐ|---------------------------|
|'-> |03AA-B0,CA-CE(12)|ΪΫάέήίΰϊϋόύώ|-------------------|
|'-> (*) |0374-75,7A-7A,7E-7E(04)|ʹ͵ͺ;|---------------------------|
|'-> (*) |03D7-D7,DA-E1(09)|ϗϚϛϜϝϞϟϠϡ|----------------------|
|'-> (*) |1F00-15,18-1D(28)|ἀἁἂἃἄἅἆἇἈἉἊἋἌἍἎἏἐἑἒἓἔἕἘἙἚἛἜἝ|---|
|'-> (*) |1F20-3F(32)|ἠἡἢἣἤἥἦἧἨἩἪἫἬἭἮἯἰἱἲἳἴἵἶἷἸἹἺἻἼἽἾἿ|
|'-> (*) |1F40-45,48-4D,50-57(20)|ὀὁὂὃὄὅὈὉὊὋὌὍὐὑὒὓὔὕὖὗ|-----------|
|'-> (*) |1F59-59,5B-5B,5D-5D(03)|ὙὛὝ|----------------------------|
|'-> (*) |1F5F-7D(31)|ὟὠὡὢὣὤὥὦὧὨὩὪὫὬὭὮὯὰάὲέὴήὶίὸόὺύὼώ||
|'-> (*) |1F80-9F(32)|ᾀᾁᾂᾃᾄᾅᾆᾇᾈᾉᾊᾋᾌᾍᾎᾏᾐᾑᾒᾓᾔᾕᾖᾗᾘᾙᾚᾛᾜᾝᾞᾟ|
|'-> (*) |1FA0-B4(21)|ᾠᾡᾢᾣᾤᾥᾦᾧᾨᾩᾪᾫᾬᾭᾮᾯᾰᾱᾲᾳᾴ|----------|
|'-> (*) |1FB6-C4,C6-D3(29)|ᾶᾷᾸᾹᾺΆᾼ᾽ι᾿῀῁ῂῃῄῆῇῈΈῊΉῌ῍῎῏ῐῑῒΐ|--|
|'-> (*) |1FD6-DB,DD-EF(25)|ῖῗῘῙῚΊ῝῞῟ῠῡῢΰῤῥῦῧῨῩῪΎῬ῭΅`|------|
|'-> (*) |1FF2-F4,F6-FE(12)|ῲῳῴῶῷῸΌῺΏῼ´῾|-------------------|
|Cyrillic |0400-1F(32)|ЀЁЂЃЄЅІЇЈЉЊЋЌЍЎЏАБВГДЕЖЗИЙКЛМНОП|
|'-> |0420-3F(32)|РСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмноп|
|'-> |0440-5F(32)|рстуфхцчшщъыьэюяѐёђѓєѕіїјљњћќѝўџ|
|'-> |0490-91(02)|Ґґ|-----------------------------|
|'-> (*) |0492-B1(32)|ҒғҔҕҖҗҘҙҚқҜҝҞҟҠҡҢңҤҥҦҧҨҩҪҫҬҭҮүҰұ|
|'-> (*) |04B2-C4,C7-C8(21)|ҲҳҴҵҶҷҸҹҺһҼҽҾҿӀӁӂӃӄӇӈ|----------|
|'-> (*) |04CB-CC,D0-EB(30)|ӋӌӐӑӒӓӔӕӖӗӘәӚӛӜӝӞӟӠӡӢӣӤӥӦӧӨөӪӫ|-|
|'-> (*) |04EE-F5,F8-F9(10)|ӮӯӰӱӲӳӴӵӸӹ|---------------------|
|Typographic symbols |2013-14(02)|–—|-----------------------------|
|'-> |2017-17,1A-1B,1E-1E,20-22(07)|‗‚‛„†‡•|------------------------|
|'-> |2026-26,30-30,32-33,39-3A(06)|…‰′″‹›|-------------------------|
|'-> |203C-3C,3E-3E,44-44,7F-7F(04)|‼‾⁄ⁿ|---------------------------|
|'-> (*) |204A-4A,82-82(02)|⁊₂|-----------------------------|
|Currency symbols |20A3-A4(02)|₣₤|-----------------------------|
|'-> |20A7-A7(01)|₧|------------------------------|
|'-> (*) |20AF-AF(01)|₯|------------------------------|
|Business symbols |2105-05(01)|℅|------------------------------|
|'-> |2116-16(01)|№|------------------------------|
|Arrow symbols |2194-95(02)|↔↕|-----------------------------|
|'-> |21A8-A8(01)|↨|------------------------------|
|Mathematical symbols |2202-02(01)|∂|------------------------------|
|'-> |2206-06,0F-0F,11-12,19-1A(06)|∆∏∑−∙√|-------------------------|
|'-> |221E-1F,29-29,2B-2B(04)|∞∟∩∫|---------------------------|
|'-> |2248-48,60-61,64-65(05)|≈≠≡≤≥|--------------------------|
|'-> |2302-02,10-10,20-21(04)|⌂⌐⌠⌡|---------------------------|
|'-> (*) |2200-00,03-03,08-09(04)|∀∃∈∉|---------------------------|
|'-> (*) |2227-28,2A-2A,59-59(04)|∧∨∪≙|---------------------------|
|'-> (*) |2282-83,95-95,97-97(04)|⊂⊃⊕⊗|---------------------------|
|'-> (*) |2329-2A(02)|〈〉|-----------------------------|
|Box drawing characters |2500-00(01)|─|------------------------------|
|'-> |2502-02,0C-0C,10-10(03)|│┌┐|----------------------------|
|'-> |2514-14,18-18,1C-1C(03)|└┘├|----------------------------|
|'-> |2524-24,2C-2C,34-34,3C-3C(04)|┤┬┴┼|---------------------------|
|'-> |2550-6C(29)|═║╒╓╔╕╖╗╘╙╚╛╜╝╞╟╠╡╢╣╤╥╦╧╨╩╪╫╬|--|
|Block graphic characters |2580-80(01)|▀|------------------------------|
|'-> |2584-84,88-88,8C-8C(03)|▄█▌|----------------------------|
|'-> |2590-93(04)|▐░▒▓|---------------------------|
|Shapes |25A0-A0(01)|■|------------------------------|
|'-> |25AC-AC(01)|▬|------------------------------|
|'-> |25B2-B2,BA-BA,BC-BC,C4-C4(04)|▲►▼◄|---------------------------|
|'-> |25CA-CB,D8-D9(04)|◊○◘◙|---------------------------|
|Miscellaneous symbols |263A-3C(03)|☺☻☼|----------------------------|
|'-> |2640-40,42-42(02)|♀♂|-----------------------------|
|'-> |2660-60,63-63,65-66(04)|♠♣♥♦|---------------------------|
|'-> |266B-6B(01)|♫|------------------------------|
|Ligatures |FB01-02(02)|ﬁﬂ|-----------------------------|
|Replacement character (*) |FFFD-FD(01)|�|------------------------------|
MES-2 is defined as:
>00 20-7E A0-FF
>01 00-7F 8F 92 B7 DE-EF FA-FF
>02 18-1B 1E-1F 59 7C 92 BB-BD C6-C7 C9 D8-DD EE
>03 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D7 DA-E1
>04 00-5F 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
>1E 02-03 0A-0B 1E-1F 40-41 56-57 60-61 6A-6B 80-85 9B F2-F3
>1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB
>1F DD-EF F2-F4 F6-FE
>20 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 4A 7F 82 A3-A4 A7 AC AF
>21 05 16 22 26 5B-5E 90-95 A8
>22 00 02-03 06 08-09 0F 11-12 19-1A 1E-1F 27-2B 48 59 60-61 64-65 82-83 95 97
>23 02 10 20-21 29-2A
>25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0 AC B2 BA BC C4
>25 CA-CB D8-D9
>26 3A-3C 40 42 60 63 65-66 6A-6B
>FB 01-02
>FF FD
>#1052
Windows Glyph List 4 (WGL-4)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
WGL-4 is a superset of MES-1 and mostly a subset of MES-2 (everything not
marked with (*) above), plus the few additional characters listed below.
Microsoft defined it as the character set that is supposed to be displayable
on all major Windows versions without installing additional fonts.
|Special letters |2113-13(01)|ℓ|------------------------------|
|'-> |212E-2E(01)|℮|------------------------------|
|Special symbols |2215-15(01)|∕|------------------------------|
|'-> |25A1-A1,AA-AB,CF-CF,E6-E6(05)|□▪▫●◦|--------------------------|
WGL-4 is defined as:
>00 20-7E A0-FF
>01 00-7F 92 FA-FF
>02 C6-C7 C9 D8-DD
>03 84-8A 8C 8E-A1 A3-CE
>04 00-5F 90-91
>1E 80-85 F2-F3
>20 13-15 17-1E 20-22 26 30 32-33 39-3A 3C 3E 44 7F A3-A4 A7 AC
>21 05 13 16 22 26 2E 5B-5E 90-95 A8
>22 02 06 0F 11-12 15 19-1A 1E-1F 29 2B 48 60-61 64-65
>23 02 10 20-21
>25 00 02 0C 10 14 18 1C 24 2C 34 3C 50-6C 80 84 88 8C 90-93 A0-A1 AA-AC B2 BA
>25 BC C4 CA-CB CF D8-D9 E6
>26 3A-3C 40 42 60 63 65-66 6A-6B
>FB 01-02
>#*655
Multilingual European Subset 3 (MES-3) and its variants
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MES-3 contains even more characters. There are several versions of this
subset. MES-3A is an open subset (it may receive more characters if they are
added to the respective code ranges), so it is not included in this file.
MES-3B and MES-3KS are two fixed subsets. The latter omits some characters
that are not used by languages of European origin, and is therefore shown
first here (as the difference to MES-2 and WGL-4):
|MES-3KS |0180-81(02)|ƀƁ|-----------------------------|
|'-> |018B-8C(02)|Ƌƌ|-----------------------------|
|'-> |0195-95(01)|ƕ|------------------------------|
|'-> |019A-9B(02)|ƚƛ|-----------------------------|
|'-> |019E-9F(02)|ƞƟ|-----------------------------|
|'-> |01A2-A3(02)|Ƣƣ|-----------------------------|
|'-> |01A6-A6(01)|Ʀ|------------------------------|
|'-> |01AA-AB(02)|ƪƫ|-----------------------------|
|'-> |01B5-B6(02)|Ƶƶ|-----------------------------|
|'-> |01B8-BB(04)|Ƹƹƺƻ|---------------------------|
|'-> |01BE-CC(15)|ƾƿǀǁǂǃǄǅǆǇǈǉǊǋǌ|----------------|
|'-> |01D5-D6(02)|Ǖǖ|-----------------------------|
|'-> |01F0-F7(08)|ǰǱǲǳǴǵǶǷ|-----------------------|
|'-> |0200-17(24)|ȀȁȂȃȄȅȆȇȈȉȊȋȌȍȎȏȐȑȒȓȔȕȖȗ|-------|
|'-> |021C-1D(02)|Ȝȝ|-----------------------------|
|'-> |0224-27(04)|ȤȥȦȧ|---------------------------|
|'-> |022A-33(10)|ȪȫȬȭȮȯȰȱȲȳ|---------------------|
|'-> |0250-58(09)|ɐɑɒɓɔɕɖɗɘ|----------------------|
|'-> |025A-79(32)|ɚɛɜɝɞɟɠɡɢɣɤɥɦɧɨɩɪɫɬɭɮɯɰɱɲɳɴɵɶɷɸɹ|
|'-> |027A-7B(02)|ɺɻ|-----------------------------|
|'-> |027D-91(21)|ɽɾɿʀʁʂʃʄʅʆʇʈʉʊʋʌʍʎʏʐʑ|----------|
|'-> |0293-AD(27)|ʓʔʕʖʗʘʙʚʛʜʝʞʟʠʡʢʣʤʥʦʧʨʩʪʫʬʭ|----|
|'-> |02B0-BA(11)|ʰʱʲʳʴʵʶʷʸʹʺ|--------------------|
|'-> |02BE-C5(08)|ʾʿˀˁ˂˃˄˅|-----------------------|
|'-> |02C8-C8(01)|ˈ|------------------------------|
|'-> |02CA-D7(14)|ˊˋˌˍˎˏːˑ˒˓˔˕˖˗|-----------------|
|'-> |02DE-ED(16)|˞˟ˠˡˢˣˤ˥˦˧˨˩˪˫ˬ˭|---------------|
|'-> |0300-1F(32)|̛̖̗̘̙̜̝̞̟̀́̂̃̄̅̆̇̈̉̊̋̌̍̎̏̐̑̒̓̔̕̚|
|'-> |0320-3F(32)|̴̵̶̷̸̡̢̧̨̠̣̤̥̦̩̪̫̬̭̮̯̰̱̲̳̹̺̻̼̽̾̿|
|'-> |0340-4E(15)|͇͈͉͍͎̀́͂̓̈́͆͊͋͌ͅ|----------------|
|'-> |0360-62(03)|͢͠͡|----------------------------|
|'-> |03D0-D6(07)|ϐϑϒϓϔϕϖ|------------------------|
|'-> |03E2-F3(18)|ϢϣϤϥϦϧϨϩϪϫϬϭϮϯϰϱϲϳ|-------------|
|'-> |0460-7F(32)|ѠѡѢѣѤѥѦѧѨѩѪѫѬѭѮѯѰѱѲѳѴѵѶѷѸѹѺѻѼѽѾѿ|
|'-> |0480-86(07)|Ҁҁ҂҃҄҅҆|------------------------|
|'-> |0488-89(02)|҈҉|-----------------------------|
|'-> |048C-8F(04)|ҌҍҎҏ|---------------------------|
|'-> |04EC-ED(02)|Ӭӭ|-----------------------------|
|'-> |0531-50(32)|ԱԲԳԴԵԶԷԸԹԺԻԼԽԾԿՀՁՂՃՄՅՆՇՈՉՊՋՌՍՎՏՐ|
|'-> |0551-56(06)|ՑՒՓՔՕՖ|-------------------------|
|'-> |0559-5F(07)|ՙ՚՛՜՝՞՟|------------------------|
|'-> |0561-80(32)|աբգդեզէըթժիլխծկհձղճմյնշոչպջռսվտր|
|'-> |0581-87(07)|ցւփքօֆև|------------------------|
|'-> |0589-8A(02)|։֊|-----------------------------|
|'-> |10D0-EF(32)|აბგდევზთიკლმნოპჟრსტუფქღყშჩცძწჭხჯ|
|'-> |10F0-F6(07)|ჰჱჲჳჴჵჶ|------------------------|
|'-> |10FB-FB(01)|჻|------------------------------|
|'-> |1E00-01(02)|Ḁḁ|-----------------------------|
|'-> |1E04-09(06)|ḄḅḆḇḈḉ|-------------------------|
|'-> |1E0C-1D(18)|ḌḍḎḏḐḑḒḓḔḕḖḗḘḙḚḛḜḝ|-------------|
|'-> |1E20-3F(32)|ḠḡḢḣḤḥḦḧḨḩḪḫḬḭḮḯḰḱḲḳḴḵḶḷḸḹḺḻḼḽḾḿ|
|'-> |1E42-55(20)|ṂṃṄṅṆṇṈṉṊṋṌṍṎṏṐṑṒṓṔṕ|-----------|
|'-> |1E58-5F(08)|ṘṙṚṛṜṝṞṟ|-----------------------|
|'-> |1E62-69(08)|ṢṣṤṥṦṧṨṩ|-----------------------|
|'-> |1E6C-7F(20)|ṬṭṮṯṰṱṲṳṴṵṶṷṸṹṺṻṼṽṾṿ|-----------|
|'-> |1E86-9A(21)|ẆẇẈẉẊẋẌẍẎẏẐẑẒẓẔẕẖẗẘẙẚ|----------|
|'-> |2000-12(19)|           ​‌‍‎‏‐‑‒|------------|
|'-> |2016-16(01)|‖|------------------------------|
|'-> |201F-1F(01)|‟|------------------------------|
|'-> |2023-25(03)|‣․‥|----------------------------|
|'-> |2027-2F(09)|‧

‪‫‬‭‮ |----------------------|
‭The previous line contains right-to-left separators and may look strange.
|'-> |2031-31(01)|‱|------------------------------|
|'-> |2034-38(05)|‴‵‶‷‸|--------------------------|
|'-> |203B-3B(01)|※|------------------------------|
|'-> |203D-3D(01)|‽|------------------------------|
|'-> |203F-43(05)|‿⁀⁁⁂⁃|--------------------------|
|'-> |2045-46(02)|⁅⁆|-----------------------------|
|'-> |2048-49(02)|⁈⁉|-----------------------------|
|'-> |204B-4D(03)|⁋⁌⁍|----------------------------|
|'-> |206A-70(07)|⁰|------------------------|
|'-> |2074-7E(11)|⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾|--------------------|
|'-> |2080-81(02)|₀₁|-----------------------------|
|'-> |2083-8E(12)|₃₄₅₆₇₈₉₊₋₌₍₎|-------------------|
|'-> |20A0-A2(03)|₠₡₢|----------------------------|
|'-> |20A5-A6(02)|₥₦|-----------------------------|
|'-> |20A8-AB(04)|₨₩₪₫|---------------------------|
|'-> |20AD-AE(02)|₭₮|-----------------------------|
|'-> |20D0-E3(20)|⃒⃓⃘⃙⃚⃐⃑⃔⃕⃖⃗⃛⃜⃝⃞⃟⃠⃡⃢⃣|-----------|
|'-> |2100-04(05)|℀℁ℂ℃℄|--------------------------|
|'-> |2106-12(13)|℆ℇ℈℉ℊℋℌℍℎℏℐℑℒ|------------------|
|'-> |2114-15(02)|℔ℕ|-----------------------------|
|'-> |2117-21(11)|℗℘ℙℚℛℜℝ℞℟℠℡|--------------------|
|'-> |2123-25(03)|℣ℤ℥|----------------------------|
|'-> |2127-2D(07)|℧ℨ℩KÅℬℭ|------------------------|
|'-> |212F-3A(12)|ℯℰℱℲℳℴℵℶℷℸℹ℺|-------------------|
|'-> |2153-5A(08)|⅓⅔⅕⅖⅗⅘⅙⅚|-----------------------|
|'-> |215F-7E(32)|⅟ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫⅬⅭⅮⅯⅰⅱⅲⅳⅴⅵⅶⅷⅸⅹⅺⅻⅼⅽⅾ|
|'-> |217F-83(05)|ⅿↀↁↂↃ|--------------------------|
|'-> |2196-A7(18)|↖↗↘↙↚↛↜↝↞↟↠↡↢↣↤↥↦↧|-------------|
|'-> |21A9-C8(32)|↩↪↫↬↭↮↯↰↱↲↳↴↵↶↷↸↹↺↻↼↽↾↿⇀⇁⇂⇃⇄⇅⇆⇇⇈|
|'-> |21C9-E8(32)|⇉⇊⇋⇌⇍⇎⇏⇐⇑⇒⇓⇔⇕⇖⇗⇘⇙⇚⇛⇜⇝⇞⇟⇠⇡⇢⇣⇤⇥⇦⇧⇨|
|'-> |21E9-F3(11)|⇩⇪⇫⇬⇭⇮⇯⇰⇱⇲⇳|--------------------|
|'-> |2201-01(01)|∁|------------------------------|
|'-> |2204-05(02)|∄∅|-----------------------------|
|'-> |2207-07(01)|∇|------------------------------|
|'-> |220A-0E(05)|∊∋∌∍∎|--------------------------|
|'-> |2210-10(01)|∐|------------------------------|
|'-> |2213-14(02)|∓∔|-----------------------------|
|'-> |2216-18(03)|∖∗∘|----------------------------|
|'-> |221B-1D(03)|∛∜∝|----------------------------|
|'-> |2220-26(07)|∠∡∢∣∤∥∦|------------------------|
|'-> |222C-47(28)|∬∭∮∯∰∱∲∳∴∵∶∷∸∹∺∻∼∽∾∿≀≁≂≃≄≅≆≇|---|
|'-> |2249-58(16)|≉≊≋≌≍≎≏≐≑≒≓≔≕≖≗≘|---------------|
|'-> |225A-5F(06)|≚≛≜≝≞≟|-------------------------|
|'-> |2262-63(02)|≢≣|-----------------------------|
|'-> |2266-81(28)|≦≧≨≩≪≫≬≭≮≯≰≱≲≳≴≵≶≷≸≹≺≻≼≽≾≿⊀⊁|---|
|'-> |2284-94(17)|⊄⊅⊆⊇⊈⊉⊊⊋⊌⊍⊎⊏⊐⊑⊒⊓⊔|--------------|
|'-> |2296-96(01)|⊖|------------------------------|
|'-> |2298-B7(32)|⊘⊙⊚⊛⊜⊝⊞⊟⊠⊡⊢⊣⊤⊥⊦⊧⊨⊩⊪⊫⊬⊭⊮⊯⊰⊱⊲⊳⊴⊵⊶⊷|
|'-> |22B8-D7(32)|⊸⊹⊺⊻⊼⊽⊾⊿⋀⋁⋂⋃⋄⋅⋆⋇⋈⋉⋊⋋⋌⋍⋎⋏⋐⋑⋒⋓⋔⋕⋖⋗|
|'-> |22D8-F1(26)|⋘⋙⋚⋛⋜⋝⋞⋟⋠⋡⋢⋣⋤⋥⋦⋧⋨⋩⋪⋫⋬⋭⋮⋯⋰⋱|-----|
|'-> |2300-01(02)|⌀⌁|-----------------------------|
|'-> |2303-0F(13)|⌃⌄⌅⌆⌇⌈⌉⌊⌋⌌⌍⌎⌏|------------------|
|'-> |2311-1F(15)|⌑⌒⌓⌔⌕⌖⌗⌘⌙⌚⌛⌜⌝⌞⌟|----------------|
|'-> |2322-28(07)|⌢⌣⌤⌥⌦⌧⌨|------------------------|
|'-> |232B-4A(32)|⌫⌬⌭⌮⌯⌰⌱⌲⌳⌴⌵⌶⌷⌸⌹⌺⌻⌼⌽⌾⌿⍀⍁⍂⍃⍄⍅⍆⍇⍈⍉⍊|
|'-> |234B-6A(32)|⍋⍌⍍⍎⍏⍐⍑⍒⍓⍔⍕⍖⍗⍘⍙⍚⍛⍜⍝⍞⍟⍠⍡⍢⍣⍤⍥⍦⍧⍨⍩⍪|
|'-> |236B-7B(17)|⍫⍬⍭⍮⍯⍰⍱⍲⍳⍴⍵⍶⍷⍸⍹⍺⍻|--------------|
|'-> |237D-9A(30)|⍽⍾⍿⎀⎁⎂⎃⎄⎅⎆⎇⎈⎉⎊⎋⎌⎍⎎⎏⎐⎑⎒⎓⎔⎕⎖⎗⎘⎙⎚|-|
|'-> |2440-4A(11)|⑀⑁⑂⑃⑄⑅⑆⑇⑈⑉⑊|--------------------|
|'-> |2501-01(01)|━|------------------------------|
|'-> |2503-0B(09)|┃┄┅┆┇┈┉┊┋|----------------------|
|'-> |250D-0F(03)|┍┎┏|----------------------------|
|'-> |2511-13(03)|┑┒┓|----------------------------|
|'-> |2515-17(03)|┕┖┗|----------------------------|
|'-> |2519-1B(03)|┙┚┛|----------------------------|
|'-> |251D-23(07)|┝┞┟┠┡┢┣|------------------------|
|'-> |2525-2B(07)|┥┦┧┨┩┪┫|------------------------|
|'-> |252D-33(07)|┭┮┯┰┱┲┳|------------------------|
|'-> |2535-3B(07)|┵┶┷┸┹┺┻|------------------------|
|'-> |253D-4F(19)|┽┾┿╀╁╂╃╄╅╆╇╈╉╊╋╌╍╎╏|------------|
|'-> |256D-7F(19)|╭╮╯╰╱╲╳╴╵╶╷╸╹╺╻╼╽╾╿|------------|
|'-> |2581-83(03)|▁▂▃|----------------------------|
|'-> |2585-87(03)|▅▆▇|----------------------------|
|'-> |2589-8B(03)|▉▊▋|----------------------------|
|'-> |258D-8F(03)|▍▎▏|----------------------------|
|'-> |2594-95(02)|▔▕|-----------------------------|
|'-> |25A2-A9(08)|▢▣▤▥▦▧▨▩|-----------------------|
|'-> |25AD-B1(05)|▭▮▯▰▱|--------------------------|
|'-> |25B3-B9(07)|△▴▵▶▷▸▹|------------------------|
|'-> |25BB-BB(01)|▻|------------------------------|
|'-> |25BD-C3(07)|▽▾▿◀◁◂◃|------------------------|
|'-> |25C5-C9(05)|◅◆◇◈◉|--------------------------|
|'-> |25CC-CE(03)|◌◍◎|----------------------------|
|'-> |25D0-D7(08)|◐◑◒◓◔◕◖◗|-----------------------|
|'-> |25DA-E5(12)|◚◛◜◝◞◟◠◡◢◣◤◥|-------------------|
|'-> |25E7-F7(17)|◧◨◩◪◫◬◭◮◯◰◱◲◳◴◵◶◷|--------------|
|'-> |2600-13(20)|☀☁☂☃☄★☆☇☈☉☊☋☌☍☎☏☐☑☒☓|-----------|
|'-> |2619-38(32)|☙☚☛☜☝☞☟☠☡☢☣☤☥☦☧☨☩☪☫☬☭☮☯☰☱☲☳☴☵☶☷☸|
|'-> |2639-39(01)|☹|------------------------------|
|'-> |263D-3F(03)|☽☾☿|----------------------------|
|'-> |2641-41(01)|♁|------------------------------|
|'-> |2643-5F(29)|♃♄♅♆♇♈♉♊♋♌♍♎♏♐♑♒♓♔♕♖♗♘♙♚♛♜♝♞♟|--|
|'-> |2661-62(02)|♡♢|-----------------------------|
|'-> |2664-64(01)|♤|------------------------------|
|'-> |2667-69(03)|♧♨♩|----------------------------|
|'-> |266C-71(06)|♬♭♮♯♰♱|-------------------------|
|'-> |FB00-00(01)|ﬀ|------------------------------|
|'-> |FB03-06(04)|ﬃﬄﬅﬆ|---------------------------|
|'-> |FB13-17(05)|ﬓﬔﬕﬖﬗ|--------------------------|
|'-> |FE20-23(04)|︠︡︢︣|---------------------------|
|'-> |FFF9-FC(04)||---------------------------|
MES-3KS is defined as:
>00 20-7E A0-FF
>01 00-81 8B-8C 8F 92 95 9A-9B 9E-9F A2-A3 A6 AA-AB B5-BB BE-CC D5-D6 DE-F7
>01 FA-FF
>02 00-1F 24-27 2A-33 50-AD B0-EE
>03 00-4E 60-62 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D0-D7 DA-F3
>04 00-86 88-89 8C-8F 90-C4 C7-C8 CB-CC D0-ED EE-F5 F8-F9
>05 31-56 59-5F 61-87 89-8A
>10 D0-F6 FB
>1E 00-9B F2-F3
>1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB DD-EF
>1F F2-F4 F6-FE
>20 00-46 48-4D 6A-70 74-8E A0-AF D0-E3
>21 00-3A 53-83 90-F3
>22 00-F1
>23 00-7B 7D-9A
>24 40-4A
>25 00-95 A0-F7
>26 00-13 19-6F 70-71
>FB 00-06 13-17
>FE 20-23
>FF F9-FD
>#2671
Here are the characters that are missing from MES-3KS but are included in
MES-3B:
|MES-3B |0182-8A(09)|ƂƃƄƅƆƇƈƉƊ|----------------------|
|'-> |018D-8E(02)|ƍƎ|-----------------------------|
|'-> |0190-91(02)|ƐƑ|-----------------------------|
|'-> |0193-94(02)|ƓƔ|-----------------------------|
|'-> |0196-99(04)|ƖƗƘƙ|---------------------------|
|'-> |019C-9D(02)|ƜƝ|-----------------------------|
|'-> |01A0-A1(02)|Ơơ|-----------------------------|
|'-> |01A4-A5(02)|Ƥƥ|-----------------------------|
|'-> |01A7-A9(03)|ƧƨƩ|----------------------------|
|'-> |01AC-B4(09)|ƬƭƮƯưƱƲƳƴ|----------------------|
|'-> |01BC-BD(02)|Ƽƽ|-----------------------------|
|'-> |01CD-D4(08)|ǍǎǏǐǑǒǓǔ|-----------------------|
|'-> |01D7-DD(07)|ǗǘǙǚǛǜǝ|------------------------|
|'-> |01F8-F9(02)|Ǹǹ|-----------------------------|
|'-> |0222-23(02)|Ȣȣ|-----------------------------|
|'-> |0228-29(02)|Ȩȩ|-----------------------------|
|'-> |1EA0-BF(32)|ẠạẢảẤấẦầẨẩẪẫẬậẮắẰằẲẳẴẵẶặẸẹẺẻẼẽẾế|
|'-> |1EC0-DF(32)|ỀềỂểỄễỆệỈỉỊịỌọỎỏỐốỒồỔổỖỗỘộỚớỜờỞở|
|'-> |1EE0-F1(18)|ỠỡỢợỤụỦủỨứỪừỬửỮữỰự|-------------|
|'-> |1EF4-F9(06)|ỴỵỶỷỸỹ|-------------------------|
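The rows above are the set difference between the two definitions. As a rough
cross-check, the sketch below computes that difference for the U+01xx lines
only, copied from the MES-3KS and MES-3B definitions in this file. The class
and method names are illustrative assumptions; java.util.BitSet is the only
library type used.

```java
import java.util.BitSet;

/** Illustrative sketch (hypothetical names): difference of two subset definitions. */
final class SubsetDifference {

    /** Expands definition lines of the form ">HH LL LL-LL ..." into a set of code points. */
    static BitSet parse(String... lines) {
        BitSet set = new BitSet(0x10000);
        for (String line : lines) {
            String[] parts = line.substring(1).split(" ");
            int base = Integer.parseInt(parts[0], 16) << 8; // high byte of the code point
            for (int i = 1; i < parts.length; i++) {
                String[] bounds = parts[i].split("-");
                int from = Integer.parseInt(bounds[0], 16);
                int to = Integer.parseInt(bounds[bounds.length - 1], 16);
                set.set(base + from, base + to + 1); // upper bound is exclusive
            }
        }
        return set;
    }

    /** Returns the code points that are in 'larger' but not in 'smaller'. */
    static BitSet difference(BitSet larger, BitSet smaller) {
        BitSet result = (BitSet) larger.clone();
        result.andNot(smaller);
        return result;
    }

    public static void main(String[] args) {
        // Only the U+01xx lines of both definitions, copied from this file.
        BitSet mes3ks = parse(
            ">01 00-81 8B-8C 8F 92 95 9A-9B 9E-9F A2-A3 A6 AA-AB B5-BB BE-CC D5-D6 DE-F7",
            ">01 FA-FF");
        BitSet mes3b = parse(">01 00-FF");
        BitSet extra = difference(mes3b, mes3ks);
        System.out.println(extra.cardinality());            // 56
        System.out.printf("U+%04X%n", extra.nextSetBit(0)); // U+0182
    }
}
```

The 56 code points found this way equal the sum of the counts of the U+01xx
rows in the table above.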
MES-3B is defined as:
>00 20-7E A0-FF
>01 00-FF
>02 00-1F 22-33 50-AD B0-EE
>03 00-4E 60-62 74-75 7A 7E 84-8A 8C 8E-A1 A3-CE D0-D7 DA-F3
>04 00-86 88-89 8C-C4 C7-C8 CB-CC D0-F5 F8-F9
>05 31-56 59-5F 61-87 89-8A
>10 D0-F6 FB
>1E 00-9B A0-F9
>1F 00-15 18-1D 20-45 48-4D 50-57 59 5B 5D 5F-7D 80-B4 B6-C4 C6-D3 D6-DB DD-EF
>1F F2-F4 F6-FE
>20 00-46 48-4D 6A-70 74-8E A0-AF D0-E3
>21 00-3A 53-83 90-F3
>22 00-F1
>23 00-7B 7D-9A
>24 40-4A
>25 00-95 A0-F7
>26 00-13 19-71
>FB 00-06 13-17
>FE 20-23
>FF F9-FD
>#2819
Some unrelated subsets:
~~~~~~~~~~~~~~~~~~~~~~~
Adobe uses two further subsets to define glyph names: AGL (Adobe Glyph List)
and AGLFN (Adobe Glyph List For New Fonts). Their characters are not listed
here (yet); only their definitions are provided for reference:
AGLFN:
>00 20-7E A1-AC AE-B1 B4-B4 B6-B8 BA-FF
>01 00-7F 92-92 A0-A1 AF-B0 E6-E7 FA-FF
>02 18-19 BC-BD C6-C7 D8-DD
>03 00-01 03-03 09-09 23-23 84-8A 8C-8C 8E-A1 A3-CE D1-D2 D5-D6
>04 01-0C 0E-4F 51-5C 5E-5F 62-63 72-75 90-91 D9-D9
>05 B0-B9 BB-C3 D0-EA F0-F2
>06 0C-0C 1B-1B 1F-1F 21-3A 40-52 60-6A 6D-6D 79-79 7E-7E 86-86 88-88 91-91
>06 98-98 A4-A4 AF-AF BA-BA D2-D2 D5-D5
>1E 80-85 F2-F3
>20 0C-0F 12-15 17-1E 20-22 24-26 2C-2E 30-30 32-33 39-3A 3C-3C 44-44 A1-A1
>20 A3-A4 A7-A7 AA-AC
>21 05-05 11-11 13-13 16-16 18-18 1C-1C 1E-1E 22-22 2E-2E 35-35 53-54 5B-5E
>21 90-95 A8-A8 B5-B5 D0-D4
>22 00-00 02-03 05-05 07-09 0B-0B 0F-0F 11-12 17-17 1A-1A 1D-20 27-2B 34-34
>22 3C-3C 45-45 48-48 60-61 64-65 82-84 86-87 95-95 97-97 A5-A5 C5-C5
>23 02-02 10-10 20-21 29-2A
>25 00-00 02-02 0C-0C 10-10 14-14 18-18 1C-1C 24-24 2C-2C 34-34 3C-3C 50-6C
>25 80-80 84-84 88-88 8C-8C 90-93 A0-A1 AA-AC B2-B2 BA-BA BC-BC C4-C4 CA-CB
>25 CF-CF D8-D9 E6-E6
>26 3A-3C 40-40 42-42 60-60 63-63 65-66 6A-6B
>#?835
AGL:
>00 01-7F A0-FF
>01 00-F5 FA-FF
>02 00-19 50-61 63-69 6B-73 75-75 77-7F 81-8E 90-98 9A-9B 9D-9E A0-A8 B0-B2
>02 B4-DE E0-E0 E3-E9
>03 00-25 27-45 60-61 74-75 7A-7A 7E-7E 84-8A 8C-8C 8E-A1 A3-CE D0-D6 DA-DA
>03 DC-DC DE-DE E0-E0 E2-F3
>04 01-0C 0E-4F
>05 31-87 89-89 90-C4 C7-C8 CB-CC D0-EB EE-F5 F8-F9
>06 0C-0C 1B-1B 1F-1F 21-3A 40-52 60-6D 79-79 7E-7E 86-86 88-88 91-91 98-98
>06 A4-A4 AF-AF BA-BA C1-C1 D1-D2 D5-D5 F0-F9
>09 01-03 05-39 3C-4D 50-54 58-70 81-83 85-8C 8F-90 93-A8 AA-B0 B2-B2 B6-B9
>09 BC-BC BE-C4 C7-C8 CB-CD D7-D7 DC-DD DF-E3 E6-FA
>0A 02-02 05-0A 0F-10 13-28 2A-30 32-32 35-36 38-39 3C-3C 3E-42 47-48 4B-4D
>0A 59-5C 5E-5E 66-74 81-83 85-8B 8D-8D 8F-91 93-A8 AA-B0 B2-B3 B5-B9 BC-BC
>0A BE-C5 C7-C9 CB-CD D0-D0 E0-E0 E6-EF
>0E 01-3A 3F-5B
>1E 00-9B A0-F9
>20 02-02 0B-10 12-1E 20-22 24-26 2C-2E 30-30 32-33 35-35 39-3C 3E-3E 42-42
>20 44-44 70-70 74-7A 7C-89 8D-8E A1-A4 A7-A7 A9-AC
>21 03-03 05-05 09-09 11-11 13-13 16-16 18-18 1C-1C 1E-1E 21-22 26-26 2B-2B
>21 2E-2E 35-35 53-54 5B-5E 60-6B 70-7B 90-99 A8-A8 B5-B5 BC-BC C0-C0 C4-C6
>21 CD-CD CF-D4 DE-EA
>22 00-00 02-03 05-09 0B-0C 0F-0F 11-13 15-15 17-17 19-1A 1D-20 23-23 25-2C
>22 2E-2E 34-37 3C-3D 43-43 45-45 48-48 4C-4C 50-53 60-62 64-67 6A-6B 6E-73
>22 76-77 79-7B 80-87 8A-8B 95-97 99-99 A3-A5 BF-BF C5-C5 CE-CF DA-DB EE-EE
>23 02-03 05-05 10-10 12-12 18-18 20-21 25-27 29-2B
>24 23-23 60-E9
>25 00-00 02-02 0C-0C 10-10 14-14 18-18 1C-1C 24-24 2C-2C 34-34 3C-3C 50-6C
>25 80-80 84-84 88-88 8C-8C 90-93 A0-A1 A3-AC B2-B7 B9-BA BC-BD BF-C1 C3-C4
>25 C6-CC CE-D1 D8-D9 E2-E6 EF-EF
>26 05-06 0E-0F 1C-1F 2F-2F 3A-3C 40-42 60-6D 6F-6F
>27 13-13 8A-92 9E-9E
>30 00-19 1C-1E 20-29 36-36 41-94 9B-9E A1-FE
>31 05-29 31-8E
>32 00-1C 20-40 42-43 60-7B 7F-7F 8A-90 94-94 96-96 98-99 9D-9E A3-A9
>33 00-00 03-03 05-05 0D-0D 14-16 18-18 1E-1E 22-23 26-27 2A-2B 31-31 33-33
>33 36-36 39-39 3B-3B 42-42 47-47 49-4A 4D-4E 51-51 57-57 7B-CB CD-D6 D8-D8
>33 DB-DD
>53 44-44
>F6 BE-C0 C3-FF
>F7 21-21 24-24 26-26 30-39 3F-3F 60-7A A1-A2 A8-A8 AF-AF B4-B4 B8-B8 BF-BF
>F7 E0-F6 F8-FF
>F8 84-99 E5-FF
>FB 00-04 1F-20 2A-36 38-3C 3E-3E 40-41 43-44 46-4F 57-59 67-69 6B-6D 7B-7D
>FB 89-89 8B-8B 8D-8D 93-95 9F-9F A4-A5 A7-A9 AF-AF
>FC 08-08 0B-0C 0E-0E 48-48 4B-4B 4E-4E 58-58 5E-62 6D-6D 73-73 8D-8D 94-94
>FC 9F-9F A1-A2 A4-A4 C9-CC D1-D2 D5-D5 DD-DD
>FD 3E-3F 88-88 F2-F2 FA-FA
>FE 30-44 49-50 52-52 54-55 59-5F 61-66 69-6B 82-82 84-84 86-86 88-88 8A-8C
>FE 8E-8E 90-92 94-94 96-98 9A-9C 9E-A0 A2-A4 A6-A8 AA-AA AC-AC AE-AE B0-B0
>FE B2-B4 B6-B8 BA-BC BE-C0 C2-C4 C6-C8 CA-CC CE-D0 D2-D4 D6-D8 DA-DC DE-E0
>FE E2-E4 E6-E8 EA-EC EE-EE F0-F0 F2-FC FF-FF
>FF 01-5E 61-9F E0-E1 E3-E3 E5-E6
>#?3548
// SubsetTextVerifier.java
// Verifies that the subsets text file has not been garbled: every table row
// must contain exactly the characters its rule describes, and every subset
// definition must agree with the table rows.
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubsetTextVerifier {

    static String line;
    static int lineNumber;

    public static void main(String[] args) throws IOException {
        // Path of the text file: first argument, or subsets.txt in the working directory.
        String path = args.length > 0 ? args[0] : "subsets.txt";
        // seen: characters of all table rows so far; nostars: rows without (*);
        // current: characters of the definition lines since the last count line.
        boolean[][] seen = new boolean[256][], nostars = new boolean[256][], current = new boolean[256][];
        Pattern rulePattern = Pattern.compile("([0-9A-F]{4}-[0-9A-F]{2}(?:,[0-9A-F]{2}-[0-9A-F]{2})*)\\(([0-9]{2})\\)");
        Pattern definitionPattern = Pattern.compile(">[0-9A-F]{2}( [0-9A-F]{2}(-[0-9A-F]{2})?)+");
        // Decode explicitly as UTF-8 (the assumed file encoding) instead of the
        // platform default charset; malformed input is reported as an IOException.
        try (BufferedReader br = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            while ((line = br.readLine()) != null) {
                lineNumber++;
                if (line.startsWith("|")) {
                    // Table row: |label|rule(count)|characters, padded with '-' to 32 columns|
                    if (line.length() != 78)
                        fail("Invalid line length " + line.length());
                    String[] parts = line.split("\\|", 4);
                    boolean star = parts[1].contains("(*)");
                    Matcher m = rulePattern.matcher(parts[2]);
                    if (!m.matches())
                        fail("Invalid rule: '" + parts[2] + "'");
                    int count = Integer.parseInt(m.group(2));
                    StringBuilder chars = new StringBuilder();
                    String ranges = m.group(1);
                    // The first two hex digits are the high byte shared by all ranges of the rule.
                    int base = Integer.parseInt(ranges.substring(0, 2), 16) * 0x100;
                    for (int i = 2; i < ranges.length(); i += 6) {
                        int from = Integer.parseInt(ranges.substring(i, i + 2), 16);
                        int to = Integer.parseInt(ranges.substring(i + 3, i + 5), 16);
                        for (int j = from; j <= to; j++) {
                            char ch = (char) (base + j);
                            chars.append(ch);
                            addChar(seen, ch);
                            if (!star)
                                addChar(nostars, ch);
                        }
                    }
                    if (chars.length() != count)
                        fail("Invalid count: " + count + " (should be " + chars.length() + ")");
                    // Rebuild the expected character column and compare it with the file.
                    chars.append('|');
                    while (chars.length() < 32)
                        chars.append('-');
                    if (chars.length() < 33)
                        chars.append('|');
                    if (!chars.toString().equals(parts[3]))
                        fail("Invalid character list '" + parts[3] + "' should be '" + chars + "'");
                } else if (line.startsWith(">#")) {
                    // Count line: ">#N" checks against all rows, ">#*N" against rows
                    // without (*), ">#?N" only against the definition itself.
                    boolean[][] check = seen;
                    int countStart = 2;
                    if (line.charAt(2) == '*') {
                        check = nostars;
                        countStart = 3;
                    } else if (line.charAt(2) == '?') {
                        check = current;
                        countStart = 3;
                    }
                    int count = Integer.parseInt(line.substring(countStart));
                    for (int i = 0; i < check.length; i++) {
                        // A block present on only one side is missing from the other side.
                        if (check[i] == null ^ current[i] == null) {
                            fail("U+" + Integer.toHexString(i) + "xx is missing from " + (check[i] != null ? "definitions" : "rules"));
                        }
                        if (check[i] == null)
                            continue;
                        for (int j = 0; j < check[i].length; j++) {
                            if (check[i][j] != current[i][j])
                                fail("U+" + Integer.toHexString(i * 0x100 + j) + " is missing from " + (check[i][j] ? "definitions" : "rules"));
                            if (check[i][j])
                                count--;
                        }
                    }
                    if (count != 0)
                        fail("Count off by " + count);
                    current = new boolean[256][];
                } else if (line.startsWith(">")) {
                    // Definition line: ">HH" (high byte) followed by low bytes or low-byte ranges.
                    if (!definitionPattern.matcher(line).matches())
                        fail("Invalid definition");
                    int base = Integer.parseInt(line.substring(1, 3), 16) * 0x100;
                    for (int i = 4; i < line.length(); i += 3) {
                        int from = Integer.parseInt(line.substring(i, i + 2), 16), to = from;
                        if (i + 2 < line.length() && line.charAt(i + 2) == '-') {
                            i += 3;
                            to = Integer.parseInt(line.substring(i, i + 2), 16);
                        }
                        for (int j = from; j <= to; j++) {
                            addChar(current, (char) (base + j));
                        }
                    }
                }
            }
        }
    }

    /** Marks a character as present and fails if it was already marked. */
    private static void addChar(boolean[][] flags, char ch) throws IOException {
        if (flags[ch >> 8] == null)
            flags[ch >> 8] = new boolean[256];
        if (flags[ch >> 8][ch & 0xFF])
            fail("Add char twice: U+" + Integer.toHexString(ch));
        flags[ch >> 8][ch & 0xFF] = true;
    }

    /** Aborts verification, reporting the current line number and content. */
    private static void fail(String message) throws IOException {
        throw new IOException("In line " + lineNumber + ": " + line + System.lineSeparator() + message);
    }
}