@mbohun
Forked from davecoutts/word_doc_to_docx.py
Last active April 20, 2023 05:48
Convert Word 'doc' files to 'docx', 'pdf', other format using win32com to automate Microsoft Word
# Convert Microsoft Word 'doc' files to 'docx' format by opening and
# saving Word files using win32com to automate Microsoft Word.
#
# The script walks a directory structure and converts all '.doc' files found.
# Original 'doc' and new 'docx' files are saved in the same directory.
#
# This Word automation method has been found to work where OFC.exe and
# wordconv.exe do not.
#
# Tested using Windows 7, Word 2013, python 2.7.10, pywin32-219.win-amd64-py2.7
# We need to test this with 64bit arch, and Office 365
import os
import win32com.client

baseDir = r'E:\Docs'  # Starting directory for directory walk (raw string keeps the backslash literal)

word = win32com.client.Dispatch("Word.Application")
try:
    for dir_path, dirs, files in os.walk(baseDir):
        for file_name in files:
            file_path = os.path.join(dir_path, file_name)
            file_extension = os.path.splitext(file_path)[1]
            if file_extension.lower() == '.doc':
                docx_file = '{0}{1}'.format(file_path, 'x')
                if not os.path.isfile(docx_file):  # Skip conversion where docx file already exists
                    print('Converting: {0}'.format(file_path))
                    try:
                        # Open(FileName, ConfirmConversions, ReadOnly, AddToRecentFiles)
                        wordDoc = word.Documents.Open(file_path, False, False, False)
                        wordDoc.SaveAs2(docx_file, FileFormat=16)  # 16 = wdFormatDocumentDefault (.docx)
                        wordDoc.Close()
                    except Exception:
                        print('Failed to Convert: {0}'.format(file_path))
finally:
    word.Quit()  # Always shut Word down, even if the walk is interrupted
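
The gist description also mentions 'pdf' and other formats, while the script itself only writes 'docx'. A minimal sketch of how the save step could be generalised is below. The names `WD_FORMATS`, `target_path` and `convert` are hypothetical helpers, not part of the original script. The numeric values are the Word `WdSaveFormat` constants for the default document format (16) and PDF (17). The `word` argument is assumed to be the object returned by `win32com.client.Dispatch("Word.Application")`.

```python
import os

# Word WdSaveFormat values, keyed by target extension (hypothetical lookup table).
WD_FORMATS = {
    '.docx': 16,  # wdFormatDocumentDefault
    '.pdf': 17,   # wdFormatPDF
}


def target_path(doc_path, ext):
    """Return doc_path with its extension replaced by ext (hypothetical helper)."""
    root, _ = os.path.splitext(doc_path)
    return root + ext


def convert(word, doc_path, ext='.pdf'):
    """Open doc_path in an already running Word instance and save it as ext.

    Assumes `word` came from win32com.client.Dispatch("Word.Application").
    """
    out_path = target_path(doc_path, ext)
    # Open(FileName, ConfirmConversions, ReadOnly, AddToRecentFiles)
    doc = word.Documents.Open(doc_path, False, False, False)
    try:
        doc.SaveAs2(out_path, FileFormat=WD_FORMATS[ext])
    finally:
        doc.Close(False)  # close without saving changes back to the source file
    return out_path
```

Inside the directory walk, the body of the inner `try` could then become a single call such as `convert(word, file_path, '.pdf')`. Passing an extension that is missing from `WD_FORMATS` raises a `KeyError` after the document has been opened, and the `finally` clause still closes it.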
@Abhisheknarsing

Thanks, man!

@mbohun
Author

mbohun commented Apr 14, 2020

@Abhisheknarsing Depending on what you are trying to do, you might prefer the "pure"/"native" Windows PowerShell version:
https://gist.github.com/mbohun/bb2688e2f67fbda9b48703c516330e5c#test-script

You would only need to change the output file format from PDF (17) to the DOCX constant (16, wdFormatDocumentDefault) on the line:

$document.SaveAs([ref] $pdf_filename, [ref] 17)

@Anonymous6598

When converting from py to exe, does it require a terminal?

@Anonymous6598

Because docx2pdf requires a terminal to work.