@ssp
Last active June 29, 2022 02:15
XSL to determine all tag paths in an XML file.
<?xml version="1.0"?>
<!--
Stylesheet to list all tag-name paths in an XML file, including attributes.
2013 Sven-S. Porst <porst@sub.uni-goettingen.de>
-->
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" />
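<!-- Elements: print the path from the root, then recurse into child elements and attributes.
     Matching * rather than node() avoids empty lines for top-level comments and processing instructions. -->
<xsl:template match="*">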
<xsl:for-each select="ancestor-or-self::*">
<xsl:value-of select="concat('/',name(.))"/>
</xsl:for-each>
<xsl:text>&#xA;</xsl:text>
<xsl:apply-templates select="*|@*"/>
</xsl:template>
<xsl:template match="@*">
<xsl:for-each select="ancestor::*">
<xsl:value-of select="concat('/',name(.))"/>
</xsl:for-each>
<xsl:text>/@</xsl:text>
<xsl:value-of select="name(.)"/>
<xsl:text>&#xA;</xsl:text>
</xsl:template>
</xsl:transform>

Create a sorted list of occurring tag paths:

xsltproc xmlpaths.xsl input.xml | sort --unique

The same list with a count of how often each path occurs:

xsltproc xmlpaths.xsl input.xml | sort | uniq -c
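
The counting step depends on the lines being sorted first, because uniq only collapses duplicates that are adjacent. A minimal sketch of just that step, using a hypothetical sample of stylesheet output in place of a real xsltproc run (the paths and the `paths`/`counts` variable names are made up for illustration):

```shell
#!/bin/sh
# Hypothetical stylesheet output: one tag path per line, with repeats.
paths='/catalog
/catalog/book
/catalog/book/@id
/catalog/book/title
/catalog/book
/catalog/book/@id
/catalog/book/title'

# uniq only collapses adjacent duplicates, so sort first.
# LC_ALL=C keeps the sort order byte-wise and reproducible.
counts=$(printf '%s\n' "$paths" | LC_ALL=C sort | uniq -c)
printf '%s\n' "$counts"
```

Each output line is a count followed by a path. The width of the leading padding before the count varies between uniq implementations, so scripts that parse it should split on whitespace rather than rely on fixed columns.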

Sorted list of the tag paths of all XML files below the current folder (slow):

find . -name "*.xml" | xargs xsltproc ~/Desktop/xmlpaths.xsl | sort --unique

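Alternative strategy: extract and store the tag paths file by file with parallel processing, de-duplicate each file's paths first, then concatenate the results and de-duplicate again. The output path reuses each file's relative path, so as written this most likely only works for XML files directly in the current folder; files in subfolders would need matching directories under /tmp/xslttmp first.

mkdir /tmp/xslttmp
find . -name "*.xml" | xargs -P 8 -I FILE xsltproc --output /tmp/xslttmp/FILE.out xmlpaths.xsl FILE
find /tmp/xslttmp/ -name "*.out" | xargs -P 8 -I FILE sort --unique --output=FILE.uniq FILE
find /tmp/xslttmp/ -name "*.out.uniq" | xargs cat > cat.all
sort --unique cat.all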
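
Because each per-file list written by sort --unique is already sorted, the final step could merge the lists instead of concatenating and re-sorting them. The following is only a sketch of that idea, not part of the original recipe: the two input files are hypothetical stand-ins for real per-file results, and it assumes they were sorted under the same locale that the merge uses.

```shell
#!/bin/sh
# Hypothetical stand-ins for two per-file results that are already sorted
# and de-duplicated (as the .out.uniq files would be).
workdir=$(mktemp -d)
printf '%s\n' '/catalog' '/catalog/book' '/catalog/book/@id' > "$workdir/a.out.uniq"
printf '%s\n' '/catalog' '/catalog/book' '/catalog/book/title' > "$workdir/b.out.uniq"

# -m merges inputs that are already sorted; -u drops duplicate lines.
# The inputs must have been sorted under the same locale as the merge.
merged=$(LC_ALL=C sort -m -u "$workdir/a.out.uniq" "$workdir/b.out.uniq")
printf '%s\n' "$merged"

rm -r "$workdir"
```

With a very large number of files the argument list may exceed the system limit, in which case the concatenate-then-sort approach above remains the simpler option.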

@rivierasolutions
Excellent work! You've made my day a lot easier :)
