Batch file to convert HTML files to Word docx with Pandoc
:: This batch file converts the HTML files in a folder to .docx.
:: It requires Pandoc and a plain-text list of the files to convert,
:: named file-list (with no file extension), with one filename per line
:: and no spaces in any filename.
::
:: Don't show these commands to the user
@ECHO off
:: Set the title of the window
TITLE Convert html to docx
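:: Enable delayed expansion so that !File! is evaluated on each pass
:: of the rename loop below, not once when the loop is parsed.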
Setlocal enabledelayedexpansion
:: What're we doing?
ECHO Converting to .docx...
:: Loop through the list of files in file-list
:: and convert them each from .html to .docx.
:: We end up with the same filenames,
:: with .docx extensions appended.
FOR /F "tokens=*" %%F IN (file-list) DO (
pandoc %%F -f html -t docx -s -o %%F.docx
)
:: What are we doing next?
ECHO Fixing file extensions...
:: What are we finding and replacing?
SET find=.html
SET replace=
:: Loop through all .docx files and remove the .html
:: from those filenames pandoc created.
FOR %%# in (.\*.docx) DO (
Set "File=%%~nx#"
Ren "%%#" "!File:%find%=%replace%!"
)
:: Whassup?
ECHO Done.
:: Let the user exit deliberately
:exit
SET exit=
SET /p exit=Hit return to exit...
IF "%exit%"=="" GOTO:eof
GOTO exit
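
The two-pass approach above (convert to name.html.docx, then strip .html) can be collapsed into one pass. The following is only a sketch of that idea, not the author's script: it assumes the same file-list in the current folder and uses the %%~nF modifier, which expands to the filename without its extension.

```bat
@ECHO off
:: Sketch: write page.docx directly instead of page.html.docx,
:: so no rename pass is needed afterwards.
:: Assumes file-list is in the current folder, one filename per line.
FOR /F "usebackq tokens=*" %%F IN ("file-list") DO (
    pandoc "%%F" -f html -t docx -s -o "%%~nF.docx"
)
```

Because %%~nF drops any folder part, the output lands in the current folder; use %%~dpnF instead if the list contains paths and the .docx should sit next to its source. This version also leaves unrelated .docx files in the folder untouched, whereas the rename loop above processes every .docx it finds.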

arthurattwell commented Mar 30, 2021

@Pooja5757 Ah, the file-list file must not have any file extension. Alternatively, you can change file-list to file-list.txt in the script.
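
For illustration, the change would look roughly like this. It is a sketch that assumes the list is saved as file-list.txt in the same folder as the HTML files; the IF NOT EXIST guard is an addition, not part of the original script.

```bat
:: Sketch: the conversion loop reading from file-list.txt instead of file-list.
IF NOT EXIST "file-list.txt" (
    ECHO Could not find file-list.txt in the current folder.
    GOTO :eof
)
FOR /F "usebackq tokens=*" %%F IN ("file-list.txt") DO (
    pandoc "%%F" -f html -t docx -s -o "%%F.docx"
)
```

With usebackq, the double-quoted name is treated as a file to read rather than as a literal string.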


Pooja5757 commented Mar 30, 2021

@arthurattwell I did try with file-list.txt: "pandoc: first: openBinaryFile: does not exist (No such file or directory)" :(
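
That message means pandoc was asked to open a file called first and could not find it in the current folder. One way to narrow this kind of error down is to check each listed name before calling pandoc. This is a sketch only, assuming the list is named file-list.txt:

```bat
:: Sketch: skip, and report, any listed file that does not exist.
FOR /F "usebackq tokens=*" %%F IN ("file-list.txt") DO (
    IF EXIST "%%F" (
        pandoc "%%F" -f html -t docx -s -o "%%F.docx"
    ) ELSE (
        ECHO Skipping, not found: "%%F"
    )
)
```

Quoting "%%F" also keeps a name containing a space as a single argument, which the unquoted original does not.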


Pooja5757 commented Apr 5, 2021

Hi arthurattwell, thank you for the code. Now I'm able to convert HTML to docx. I'm not sure what the error was, but I reinstalled Pandoc, tried again, and it worked.


Pooja5757 commented Apr 5, 2021

But I faced one more issue: my HTML file was ANSI-encoded, not UTF-8, and it only worked once I changed the encoding to UTF-8. I have many HTML files that are ANSI-encoded. Any idea how to export HTML to Word from ANSI-encoded files?
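
Pandoc expects UTF-8 input, and its manual suggests piping other encodings through iconv. A possible way to do that for a whole list is sketched below. The assumptions are loud ones: iconv is installed and on the PATH (it does not ship with Windows), and "ANSI" here means Windows-1252. Swap CP1252 for your actual code page if it differs.

```bat
@ECHO off
:: Sketch: transcode each ANSI file to UTF-8 on the fly and pipe it to pandoc.
:: Assumes iconv is on the PATH and the source files are Windows-1252.
FOR /F "usebackq tokens=*" %%F IN ("file-list") DO (
    iconv -f CP1252 -t UTF-8 "%%F" | pandoc -f html -t docx -s -o "%%~nF.docx"
)
```

Pandoc reads from standard input when no input file is given, so the original HTML files are left unchanged on disk.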
