@Qman11010101
Created January 18, 2023 18:01
Unicode XML Converter
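import json
import xml.etree.ElementTree as ET

# Parse the flat UCD XML file from the Unicode Character Database.
tree = ET.parse("ucd.all.flat.xml")
root = tree.getroot()
# root[1] is expected to be the <repertoire> element that holds the character entries.
repertoire = root[1]

finaldict = {}
for c in repertoire:
    attr = c.attrib
    codepoint = attr.get("cp")
    # Skip entries that carry no single code point (e.g. range entries).
    if not codepoint:
        continue
    name = attr.get("na")
    if not name:
        # No name given (e.g. control characters): fall back to the "control" aliases.
        aliases = []
        for a in c:
            if a.attrib.get("type") == "control":
                aliases.append(a.attrib.get("alias"))
        name = ", ".join(aliases)
    print("U+" + codepoint, name)
    finaldict[codepoint] = name

# Write the code point -> name mapping as JSON.
with open("unicodedict.json", "w", encoding="utf-8") as f:
    json.dump(finaldict, f, indent=2)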
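
A minimal sketch of how the generated `unicodedict.json` might be consumed. The helper names `load_names` and `lookup_name` are hypothetical and not part of the original script. It assumes the keys are the uppercase hex `cp` values as written by the converter, and that a `#` inside a name (as the UCD XML uses for names such as `CJK UNIFIED IDEOGRAPH-#`) stands for the code point.

```python
import json


def load_names(path="unicodedict.json"):
    """Load the code point -> name mapping written by the converter (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def lookup_name(names, char):
    """Return the stored name for a single character, or None if it is absent.

    Assumes keys are uppercase hex code points padded to at least four digits,
    and that '#' in a name is a placeholder for the code point.
    """
    cp = format(ord(char), "04X")
    name = names.get(cp)
    if name is None:
        return None
    return name.replace("#", cp)
```

For example, `lookup_name(load_names(), "A")` looks up the key `"0041"` in the mapping; keeping the lookup separate from the file loading lets it be tried on a small in-memory dict first.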