Created September 24, 2020 21:37
Pandoc docx2html batch script with media extraction
#!/usr/bin/env bash
# pandoc docx2html script v0.1
# ----------------------------
# Uses pandoc to convert every docx file (including its media) in the current folder to html.
# Pandoc places extracted media in a "media" subfolder, mirroring the layout inside the docx file.
# When several files are converted in a loop, pandoc overwrites same-named media files without warning.
# Each document therefore gets its own numbered extraction folder; that is what the counter is for.
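COUNTER=1
for f in *.docx
do
  # Skip the literal pattern when no docx files match.
  [ -e "$f" ] || continue
  # Strip the extension to build the output name.
  filename="${f%.*}"
  echo "CONVERTING: $f to $filename.html"
  # Run pandoc directly (no backticks), with a separate media folder per document.
  pandoc "$f" -t html -o "$filename.html" --extract-media "./$COUNTER"
  echo "DONE: $filename.html"
  ((COUNTER=COUNTER+1))
done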
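
The numbered folders keep media files from colliding, but they do not show which folder belongs to which document. A possible variant, sketched here as an assumption rather than as part of the original script, derives the folder name from the document name instead. `media_dir_for` and `convert_all` are hypothetical helper names, and the `_media` suffix is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: name each media folder after its document instead of using a counter.
# media_dir_for and convert_all are hypothetical helpers, not part of pandoc.

media_dir_for() {
  # Strip the extension ("report.docx" -> "report") and append a suffix.
  printf '%s_media' "${1%.*}"
}

convert_all() {
  for f in *.docx; do
    # With no docx files the pattern stays literal, so skip it.
    [ -e "$f" ] || continue
    pandoc "$f" -t html -o "${f%.*}.html" \
      --extract-media "./$(media_dir_for "$f")"
  done
}

# Does nothing when the current folder contains no docx files.
convert_all
```

With this naming, a file such as `report.docx` would get its media extracted under `./report_media`, so the html file and its media folder sort next to each other.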