@markpbaggett
Created March 7, 2024 19:21
My Saxon Helper
#!/bin/bash
# Check that the correct number of arguments is provided
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 input_xml output_file type" >&2
    echo "  type: altotohocr | hocrtotxt | altototxt" >&2
    exit 1
fi
input_xml="$1"
output_hocr="$2"
type="$3"
# Check that the input XML file exists
if [ ! -f "$input_xml" ]; then
    echo "Input XML file does not exist: $input_xml" >&2
    exit 1
fi
# Select the XSL stylesheet that matches the requested conversion type
case "$type" in
    altotohocr) xsl_file="/Users/markbaggett/code/hOCR-to-ALTO/alto__hocr.xsl" ;;
    hocrtotxt)  xsl_file="/Users/markbaggett/code/hOCR-to-ALTO/hocr__text.xsl" ;;
    altototxt)  xsl_file="/Users/markbaggett/code/hOCR-to-ALTO/alto__text.xsl" ;;
    *)
        echo "Invalid type: $type" >&2
        exit 1
        ;;
esac
# Check that the XSL file exists
if [ ! -f "$xsl_file" ]; then
    echo "XSL file does not exist: $xsl_file" >&2
    exit 1
fi
# Run the transformation using Saxon, naming the source (-s:) and stylesheet (-xsl:) explicitly
java -jar ~/bin/saxon9he.jar -s:"$input_xml" -xsl:"$xsl_file" > "$output_hocr"