@dhbradshaw
Created November 10, 2021 12:39
Validate all the xml from an Excel xlsx file

Validate Excel .xlsx file XML on Linux

Unzip

An .xlsx file is essentially a ZIP archive of XML files, so the standard unzip tool can extract it.

Here we send the unzipped contents of myfile.xlsx to the myfile_unzipped directory, which will be created on the fly.

unzip myfile.xlsx -d myfile_unzipped

Validate

Go to the directory into which the file was unzipped.

cd myfile_unzipped

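Now use find to feed every XML file to xmllint. With --noout, xmllint prints nothing for a well-formed file, so only errors appear. This checks that each file is well-formed XML; it does not validate against a schema.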

find . -type f -name "*.xml" -exec xmllint --noout {} \;

Any xml errors will be printed out with a message like this:

./broken_example.xml:2: parser error : EndTag: '</' not found
ntType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
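
The find command above prints errors but does not say at the end whether anything failed, which matters if the check runs from a script. The following is a minimal sketch of one way to wrap it, not a definitive implementation. The function name check_xml_dir is hypothetical. It also assumes you want to check parts ending in .rels, which are XML in a typical .xlsx package but are not matched by "*.xml".

```shell
#!/bin/sh
# check_xml_dir is a hypothetical helper name, not part of xmllint or find.
# Usage: check_xml_dir DIRECTORY
# Returns 0 if every part is well-formed, 1 if any part fails, 2 on bad input.
check_xml_dir() {
    if [ ! -d "$1" ]; then
        printf 'not a directory: %s\n' "$1" >&2
        return 2
    fi
    # xmllint writes its parser errors to stderr, so they still reach the
    # terminal. Only the paths of failing files are captured in $bad.
    # Assumption: .rels parts should be checked along with .xml parts.
    bad=$(find "$1" -type f \( -name '*.xml' -o -name '*.rels' \) \
        -exec sh -c 'xmllint --noout "$1" || printf "%s\n" "$1"' sh {} \;)
    if [ -n "$bad" ]; then
        printf 'Malformed XML parts:\n%s\n' "$bad"
        return 1
    fi
    return 0
}
```

After defining the function, run check_xml_dir myfile_unzipped. A non-zero exit status means at least one part failed, and the failing paths are listed after xmllint's own messages.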