@aembleton
Forked from vdavez/docx2md.md
Last active May 17, 2023 07:04
Convert a Word Document into MD

Converting a Word Document to Markdown in One Move

The Problem

A lot of important government documents are created and saved in Microsoft Word (*.docx). But Word is a proprietary application, and .docx files are not really useful for presenting documents on the web. So, I wanted to find a way to convert a .docx file into Markdown.

Installing Pandoc

On a Mac, you can install pandoc with Homebrew by running the command brew install pandoc.

The Solution

As it turns out, there are several open-source tools that allow for conversion between file types. Pandoc is one of them, and it's powerful. In fact, pandoc's website says "If you need to convert files from one markup format into another, pandoc is your swiss-army knife." Pandoc can convert from markdown into .docx, and it also works in the other direction.

Example

Say you have the Council Rules in a Word document named "test.docx". (For a real-life example, visit http://github.com/vzvenyach/Council_Rules/.) Now, run the following at the command line:

pandoc -f docx -t markdown -o test.md test.docx

Out comes a beautiful Markdown file. Admittedly, there's a bit of junk at the top from the Table of Contents. I deleted this when I rendered it nicely with strapdown.js. In the end, here's my nicely rendered version of the Rules.
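
If you have a whole folder of Word documents rather than a single test.docx, the same command can be wrapped in a small loop. This is only a rough sketch: md_name and convert_all are hypothetical helper names made up for illustration (they are not part of pandoc), and it assumes the .docx files sit in the current directory.

```shell
#!/bin/sh
# Sketch: convert every .docx file in the current directory to Markdown.
# md_name and convert_all are hypothetical helpers, not pandoc features.

# Derive the output name from the input name: test.docx -> test.md
md_name() {
    printf '%s\n' "${1%.docx}.md"
}

convert_all() {
    for f in *.docx; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # Same pandoc invocation as the single-file example above.
        pandoc -f docx -t markdown -o "$(md_name "$f")" "$f"
    done
}

# Only attempt the conversion when pandoc is actually installed.
if command -v pandoc >/dev/null 2>&1; then
    convert_all
fi
```

Quoting "$f" keeps file names with spaces intact, and the existence check means the loop does nothing in a folder that has no Word documents.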

@TinasheMzondiwa

Thank you for this!

@tyfyyhs

tyfyyhs commented Dec 20, 2019

Thank you!!!!!!

@mateja82

Doesn't work too well with tables; all the other stuff is great!

@dor2000

dor2000 commented Jun 23, 2020

I am new to Markdown and am trying to import a Word document into an outliner web app (RemNote) that imports Markdown files.
A question: if the Word document has hierarchical headings (H1, H2, H3), does the Markdown document come with headings too? Or does it come indented (the text between two H1s indented once, the text between two H2s indented twice, etc.)?
Thanks
