@bencomp
Last active January 4, 2016 21:59
XSLT conversion and test strategy
#!/bin/bash
# Generalised conversion and validation script for XSLT conversions:
# 1. Convert all XMLs in an input directory to new files in an output directory
# 2. Validate all output XMLs against an XML Schema
# 3. Summarise the validation results
# (4. Fix XSLT and re-run)
# Input XML files are in `./inputxml`
# Output XML files go in `./outputxml`
# stderr output of the conversion goes in `./conversionlogs` (only non-empty files are kept)
# stderr output of the validation goes in `./validationlogs`
# Filtered validation logs go in `./filteredvalidationlogs`
# Summary of validation errors is written to `./validation-errors.txt`
# Make sure the output and log directories exist
mkdir -p outputxml conversionlogs validationlogs filteredvalidationlogs
# Loop with a glob instead of parsing `ls`, so file names containing spaces survive
for path in inputxml/*; do
    [[ -f "$path" ]] || continue
    filename=$(basename "$path")
    # Convert XML
    echo "Converting $filename..."
    java -jar ~/Applications/saxon9he.jar -s:"inputxml/$filename" \
        -xsl:"in-out.xsl" \
        -o:"outputxml/$filename" 2> "conversionlogs/$filename.log"
    # Keep only non-empty conversion logs
    [[ ! -s "conversionlogs/$filename.log" ]] && rm "conversionlogs/$filename.log"
    # Validate XML
    echo "Checking $filename..."
    xmllint --noout --postvalid --schema "schema.xsd" \
        "outputxml/$filename" 2> "validationlogs/$filename.log"
    # Filter validation results (if you want to ignore certain validation errors)
    sed -E -f "filtervalidation.txt" "validationlogs/$filename.log" > "filteredvalidationlogs/$filename.log"
done
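# Summarise the filtered validation logs: strip the line numbers and keep one line per error type, per file
# Note: the pattern only matches output files with numeric names, such as `123.xml`
sed -nE -e "s/^.*outputxml\/([0-9]+\.xml:)[0-9]+: /\1 /p" filteredvalidationlogs/*.xml.log | sort -u > validation-errors.txt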
# filtervalidation.txt: example set of sed commands to filter validation logs.
# Ignore the 'no DTD found' message; we're validating against an XML Schema
/no DTD found/d
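
To see what the filter and summary steps do without running a full conversion, here is a minimal sketch that feeds them a hand-written log in a temporary directory. The file name `42.xml` and the log messages are made up for illustration and only roughly follow what xmllint prints; the two `sed` commands are the ones from the script above, used unchanged.

```shell
#!/bin/bash
# Sketch only: exercises the filter and summary steps on made-up sample data.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p validationlogs filteredvalidationlogs

# Same filter as in filtervalidation.txt: drop the 'no DTD found' message
printf '%s\n' '/no DTD found/d' > filtervalidation.txt

# Hypothetical validation log for an output file called 42.xml
cat > validationlogs/42.xml.log <<'EOF'
outputxml/42.xml:1: validity error : no DTD found !
outputxml/42.xml:7: element title: Schemas validity error : Element 'title': This element is not expected.
outputxml/42.xml:19: element title: Schemas validity error : Element 'title': This element is not expected.
outputxml/42.xml fails to validate
EOF

# Step 1: per-file filtering, as in the loop
sed -E -f filtervalidation.txt validationlogs/42.xml.log > filteredvalidationlogs/42.xml.log

# Step 2: summary, which strips line numbers and collapses duplicate errors
sed -nE -e "s/^.*outputxml\/([0-9]+\.xml:)[0-9]+: /\1 /p" filteredvalidationlogs/*.xml.log | sort | uniq > validation-errors.txt

cat validation-errors.txt
```

With this sample input, `validation-errors.txt` ends up with a single line, `42.xml: element title: Schemas validity error : Element 'title': This element is not expected.`: the two occurrences of the same error on lines 7 and 19 collapse into one, the DTD message is filtered out, and the "fails to validate" line is skipped because it has no line number to match.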