XSLT to convert http://www.w3.org/Math/characters/unicode.xml into Python Dictionary
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
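version="2.0">
<!-- replace() used below is an XPath 2.0 function, so an XSLT 2.0 processor is required -->
<!-- emit plain text so the result is Python source, not escaped XML -->
<xsl:output method="text"/>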
<xsl:template match="/charlist">
<xsl:text>
unicode_to_latex = {
</xsl:text>
<xsl:for-each select="character">
<xsl:variable name="codepoint" select="./@id"/>
<xsl:if test="string-length(latex)&gt;1">
<xsl:text> u"\u</xsl:text><xsl:value-of select="substring($codepoint, 3)" /><xsl:text>": "</xsl:text><xsl:value-of select="replace(replace(latex, '\\', '\\\\'), '&quot;', '\\&quot;')"/><xsl:text>",
</xsl:text>
</xsl:if>
</xsl:for-each>
<xsl:text>}
</xsl:text>
</xsl:template>
</xsl:stylesheet>
gely commented Dec 21, 2016

FYI, this gives incorrect Unicode characters for code points >= 0x10000, because substring($codepoint, 3) drops the first hex digit of the id. That digit is only a leading zero for code points below 0x10000. For example:

u"\uD504": "\\mathfrak{A}",

should instead be:

u"\U0001D504": "\\mathfrak{A}",

It also fails for composed characters with more than one codepoint. I don't know enough XSLT to fix it, unfortunately. I may just have to use xml.etree.ElementTree to do this.

-Geoff
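
A rough sketch of the xml.etree.ElementTree route mentioned above, not a tested replacement for the stylesheet. It assumes Python 3, and it assumes every id is a "U" followed by hex code points joined with "-" (as in "U1D504" or "U0003D-020E5"). The names id_to_text, build_table and SAMPLE are made up for illustration, and the sample XML is hand-written rather than copied from unicode.xml.

```python
import xml.etree.ElementTree as ET


def id_to_text(char_id):
    """Convert an id such as 'U1D504' or 'U0003D-020E5' to the string it names."""
    # Assumption: ids are 'U' followed by hex code points joined by '-'.
    return "".join(chr(int(part, 16)) for part in char_id[1:].split("-"))


def build_table(xml_text):
    """Map each character to its LaTeX, skipping one-character LaTeX values."""
    root = ET.fromstring(xml_text)  # root element is expected to be <charlist>
    table = {}
    for character in root.findall("character"):
        latex = character.findtext("latex")
        # Same filter as the stylesheet: string-length(latex) > 1.
        if latex is not None and len(latex) > 1:
            table[id_to_text(character.get("id"))] = latex
    return table


# Hand-made sample for illustration; the last LaTeX value is a placeholder.
SAMPLE = r"""<charlist>
  <character id="U00041"><latex>A</latex></character>
  <character id="U1D504"><latex>\mathfrak{A}</latex></character>
  <character id="U0003D-020E5"><latex>\placeholder</latex></character>
</charlist>"""

unicode_to_latex = build_table(SAMPLE)
# ascii() escapes non-ASCII keys, using \U for code points above 0xFFFF.
print(ascii(unicode_to_latex))
```

Building the keys with chr() sidesteps the choice between \u and \U escapes entirely, and splitting the id on "-" covers the multi-code-point case. For the real file, the text passed to build_table would be the contents of unicode.xml.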
