@dleicht
Created September 24, 2020 21:37
Pandoc docx2html batch script with media extraction
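#!/usr/bin/env bash
# pandoc docx2html script v0.1
# ----------------------------
# Uses pandoc to convert every .docx file in the current folder, including its
# embedded media, to HTML.
# Pandoc puts a docx file's media in a "media" subfolder of the directory given
# to --extract-media; that follows from how docx files store their media.
# When several files are converted into the same directory, pandoc silently
# overwrites media files that share a name (e.g. image1.png), so each file needs
# its own folder. That's what the counter is for: media go to ./1, ./2, and so on.
shopt -s nullglob   # if there are no .docx files, skip the loop instead of using the literal "*.docx"
COUNTER=1
for f in *.docx     # loop over the glob directly so file names with spaces stay intact
do
    filename="${f%.*}"   # strip the extension
    echo "CONVERTING: $f to $filename.html"
    # Call pandoc directly; wrapping it in backticks would also try to run
    # whatever pandoc prints as a command.
    if pandoc "$f" -t html -o "$filename.html" --extract-media "./$COUNTER"; then
        echo "DONE: $filename.html"
    else
        echo "FAILED: $f" >&2
    fi
    ((COUNTER=COUNTER+1))
done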
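The counter keeps media apart, but numbered folders like ./1 and ./2 don't say which document they came from. Below is a minimal sketch of one alternative, assuming you'd rather name each media folder after its source file. The `_media` suffix, the `convert_all` function and the `DRY_RUN` switch are hypothetical choices, not part of the original script. With `DRY_RUN=1` it only prints the pandoc commands it would run, so you can check the file-to-folder mapping before anything is written.

```shell
#!/usr/bin/env bash
# Variant of the batch script: one media folder per document, named after it.
# Hypothetical choices: the "_media" suffix, convert_all, and the DRY_RUN switch.
shopt -s nullglob   # no .docx files -> the loop simply doesn't run

convert_all() {
    local f base media_dir
    for f in *.docx; do
        base="${f%.*}"                 # "My Report.docx" -> "My Report"
        media_dir="./${base}_media"    # e.g. "./My Report_media"
        if [[ "${DRY_RUN:-0}" == 1 ]]; then
            # Print a shell-quoted version of the command instead of running it.
            printf 'pandoc %q -t html -o %q --extract-media %q\n' \
                "$f" "$base.html" "$media_dir"
        else
            if pandoc "$f" -t html -o "$base.html" --extract-media "$media_dir"; then
                echo "DONE: $base.html"
            else
                echo "FAILED: $f" >&2
            fi
        fi
    done
}

convert_all
```

Usage: `DRY_RUN=1 ./docx2html.sh` previews the commands; running it without `DRY_RUN` converts for real. Because the folder name is derived from the document name, rerunning the script overwrites only that document's own media rather than whichever number it happened to get last time.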