@stanrb
Last active April 30, 2018 14:53
This Windows batch script converts every DOCX file in the current directory to GitHub-flavored Markdown and extracts its images. It uses Pandoc. Line wrapping is preserved, and file names containing spaces are handled.
REM Windows Convert DOCX to MD by Stan Bogdanov - stanrb.com - @StanRB
REM This Windows batch script converts every DOCX file in the current directory to GitHub-flavored Markdown and extracts its images. Line wrapping is preserved, and file names containing spaces are handled.
REM The folder the images are extracted to is always called "media". That name comes from the DOCX container itself and can't be changed.
REM Conversion runs in the current directory, so the pandoc executable should be in there (or on your PATH).
REM Double-click the bat file to run it. It ends with a command that keeps the command window open so you can inspect the output.
REM The script creates a new directory called "converted" with one sub-directory per file, so each sub-directory holds both the Markdown file and its media folder with the images. To change this layout, remove the second mkdir command and adjust the --extract-media path.
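REM Illustration only, assuming a hypothetical input file named "My Report.docx". The expected layout after a run would be roughly:
REM   converted\My Report\My Report.md
REM   converted\My Report\media\   (extracted images; their file names depend on the DOCX)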
REM %%~ni expands to the file name of %%i without its extension.
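REM A short sketch of the related for-variable modifiers, again assuming a hypothetical file "My Report.docx" (only %%~ni is used by this script):
REM   %%i    expands to  My Report.docx
REM   %%~ni  expands to  My Report
REM   %%~xi  expands to  .docx
REM   %%~nxi expands to  My Report.docx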
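REM Create the output root only if it is missing, so re-running the script doesn't print an error.
if not exist converted mkdir converted
REM mkdir fails on an existing folder, and chaining it with && would then skip pandoc on a second run, so check for the folder first.
for %%i in (*.docx) do (
    if not exist "converted\%%~ni" mkdir "converted\%%~ni"
    pandoc --wrap=preserve --extract-media="converted/%%~ni" -f docx -t gfm "%%i" > "converted/%%~ni/%%~ni.md"
)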
cmd /k