Last active April 5, 2021 12:04
Gist: arthurattwell/ea6fa1764f989398f659ab619b654e1f
Batch file to convert HTML files to Word docx with Pandoc
:: This batch file converts HTML files in a folder to docx.
:: It requires Pandoc and a list of the files to convert,
:: saved as file-list (with no file extension) in the same folder.
:: Put each filename on a separate line, with no spaces in any filename.
:: Don't show these commands to the user
@ECHO off
:: Set the title of the window
TITLE Convert html to docx
:: Delayed expansion lets !File! be re-read on each pass of the rename loop.
Setlocal enabledelayedexpansion
:: What're we doing?
ECHO Converting to .docx...
:: Loop through the list of files in file-list
:: and convert them each from .html to .docx.
:: We end up with the same filenames,
:: with .docx extensions appended.
FOR /F "tokens=*" %%F IN (file-list) DO (
    pandoc "%%F" -f html -t docx -s -o "%%F.docx"
)
:: What are we doing next?
ECHO Fixing file extensions...
:: What are we finding and replacing?
SET find=.html
SET replace=
:: Loop through all .docx files and remove the .html
:: from those filenames pandoc created.
FOR %%# in (.\*.docx) DO (
    Set "File=%%~nx#"
    Ren "%%#" "!File:%find%=%replace%!"
)
:: Whassup?
ECHO Done.
:: Let the user exit deliberately
SET exit=
SET /p exit=Hit return to exit...
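A shorter variant of the conversion loop, offered as a sketch rather than the gist's own method, assuming every .html file in the current folder should be converted (so no file-list is needed). In cmd, %%~nF expands to the filename without its extension, so pandoc can write page.docx directly and the renaming pass becomes unnecessary.

```batch
@ECHO off
:: Sketch: convert every .html file in the current folder to .docx.
:: %%~nF is the filename without its extension, so "page.html" is
:: written straight to "page.docx" and no renaming pass is needed.
:: Quoting "%%F" and "%%~nF.docx" lets filenames contain spaces.
ECHO Converting to .docx...
FOR %%F IN (*.html) DO (
    pandoc "%%F" -f html -t docx -s -o "%%~nF.docx"
)
ECHO Done.
```

The trade-off is that every .html file in the folder gets converted; keep the file-list approach if you only want some of them.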

@Pooja5757 Hmm, not sure, sorry. Is it possible that your file-list has a file extension, like file-list.txt, or that it's not in the same folder as the files you're converting? The script assumes no file extension.

Hey @arthurattwell, yes, it has the file extension .txt. It is in the same folder as the files to be converted. Not sure why the system is unable to find the file.

@Pooja5757 Ah, the file must not have any file extension. Alternatively, you can change file-list to file-list.txt in the script.
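If the .html files are all in the current folder, file-list can be generated instead of typed by hand. A minimal sketch, assuming Windows cmd: DIR /B prints bare filenames one per line, and the redirect creates file-list with no extension.

```batch
:: Sketch: write the name of every .html file in the current folder,
:: one per line, to a file called file-list (no extension).
DIR /B *.html > file-list
```

Run it from the folder that holds the HTML files. Because file-list has no extension, it is not itself matched by *.html.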

@arthurattwell I did try with file-list.txt: "pandoc: first: openBinaryFile: does not exist (No such file or directory)" :(

Hi arthurattwell, thank you for the code! Now I'm able to convert HTML to Word. I'm not sure what the error was, but I reinstalled Pandoc, tried again, and it worked.

But I faced one more issue: my HTML file was ANSI-encoded, not UTF-8, and when I changed it to UTF-8 it worked. I have many HTML files that are ANSI rather than UTF-8 encoded. Any idea how to convert ANSI-encoded HTML to Word?
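Pandoc reads its input as UTF-8, so one workaround is to re-save the ANSI files as UTF-8 before converting them. The sketch below assumes Windows PowerShell 5.1 is available as powershell.exe (where -Encoding Default means the system ANSI code page) and writes the converted copies into a utf8 subfolder; that folder name is a hypothetical choice for illustration, and the originals are left untouched.

```batch
@ECHO off
:: Sketch: re-save each file named in file-list as UTF-8 in a "utf8" subfolder.
:: Assumes Windows PowerShell 5.1 (powershell.exe): -Encoding Default reads
:: the system ANSI code page, and -Encoding UTF8 writes UTF-8 with a BOM.
:: A filename containing a single quote (') would break the quoted path.
IF NOT EXIST utf8 MKDIR utf8
FOR /F "tokens=*" %%F IN (file-list) DO (
    powershell -NoProfile -Command "Get-Content -Encoding Default -LiteralPath '%%F' | Set-Content -Encoding UTF8 -LiteralPath 'utf8\%%~nxF'"
)
:: Copy file-list next to the converted files so the conversion script can run there.
COPY /Y file-list utf8\file-list >NUL
ECHO Done.
```

Afterwards, copy the conversion batch file into the utf8 folder and run it from there. Do not point the sketch at files that are already UTF-8, since reading them as ANSI would garble non-ASCII characters.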