# Convert Microsoft Word 'doc' files to 'docx' format by opening and
# saving Word files using win32com to automate Microsoft Word.
#
# The script walks a directory structure and converts all '.doc' files found.
# Original 'doc' and new 'docx' files are saved in the same directory.
#
# This Word automation method has been found to work where OFC.exe and
# wordconv.exe do not.
#
# Tested using Windows 7, Word 2013, python 2.7.10, pywin32-219.win-amd64-py2.7

import os.path
import win32com.client

baseDir = r'E:\Docs'  # Starting directory for directory walk (raw string keeps the backslash literal)

word = win32com.client.Dispatch("Word.application")

for dir_path, dirs, files in os.walk(baseDir):
    for file_name in files:
        file_path = os.path.join(dir_path, file_name)
        file_name, file_extension = os.path.splitext(file_path)
        if file_extension.lower() == '.doc':  # Only legacy '.doc' files are converted
            docx_file = '{0}{1}'.format(file_path, 'x')
            if not os.path.isfile(docx_file):  # Skip conversion where docx file already exists
                print('Converting: {0}'.format(file_path))
                try:
                    wordDoc = word.Documents.Open(file_path, False, False, False)
                    wordDoc.SaveAs2(docx_file, FileFormat = 16)
                    wordDoc.Close()
                except Exception:
                    print('Failed to Convert: {0}'.format(file_path))

word.Quit()
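The directory walk and the "skip if a docx already exists" check can be tried out without Word installed. The following is a minimal sketch that separates that logic from the COM automation; the function name `pending_conversions` is hypothetical and is not part of the original script.

```python
import os


def pending_conversions(base_dir):
    """Yield (doc_path, docx_path) pairs for '.doc' files that have no '.docx' sibling yet.

    Hypothetical helper: it mirrors the walk and skip logic of the script
    above but does not touch Word at all.
    """
    for dir_path, _dirs, files in os.walk(base_dir):
        for file_name in files:
            _root, extension = os.path.splitext(file_name)
            if extension.lower() != '.doc':
                continue  # Ignore everything that is not a legacy Word file
            doc_path = os.path.join(dir_path, file_name)
            docx_path = doc_path + 'x'  # Same naming rule as the script: append 'x'
            if not os.path.isfile(docx_path):
                yield doc_path, docx_path
```

Iterating over `pending_conversions(r'E:\Docs')` and printing each pair gives a dry run of what the script would convert.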
Hi everyone,
What value should I put for FileFormat to convert to ".doc" rather than ".docx" in the line below?
wordDoc.SaveAs2(docx_file, FileFormat = 16)
Assuming I make the proper changes to the code in other lines. Thank you.
Hi Asalim11
I have not tried it myself but the Microsoft reference below suggests file format '0'.
https://docs.microsoft.com/en-us/office/vba/api/word.wdsaveformat
I hope this helps.
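As a small sketch of how the two values discussed here could be selected from the target file name: the constants below are taken from the WdSaveFormat reference linked above (0 is wdFormatDocument, 16 is wdFormatDocumentDefault), while the mapping and the function name `save_format_for` are only illustrative assumptions.

```python
import os

# Values from the WdSaveFormat enumeration in the Microsoft reference.
WD_FORMAT_DOCUMENT = 0           # wdFormatDocument: legacy '.doc'
WD_FORMAT_DOCUMENT_DEFAULT = 16  # wdFormatDocumentDefault: '.docx' in recent Word versions

# Hypothetical lookup table, limited to the two formats discussed in this thread.
SAVE_FORMATS = {
    '.doc': WD_FORMAT_DOCUMENT,
    '.docx': WD_FORMAT_DOCUMENT_DEFAULT,
}


def save_format_for(target_path):
    """Return the FileFormat value matching the extension of target_path."""
    extension = os.path.splitext(target_path)[1].lower()
    if extension not in SAVE_FORMATS:
        raise ValueError('No save format known for: {0}'.format(target_path))
    return SAVE_FORMATS[extension]
```

The result would then be passed along the lines of `wordDoc.SaveAs2(target, FileFormat = save_format_for(target))`.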
Worked like a charm. Thank you.
Hi Asalim11
You are welcome.
After looking at the script in relation to your question I thought it needed a bit of an update.
If you look for 'microsoft_doc_converter.py' on https://gist.github.com/davecoutts you will now find an updated Python 3 version that handles Word and Excel conversions.
https://gist.github.com/davecoutts/0e981c3b5f765320561aa6ca78ddebd2
Thank you Dave. Highly appreciated.
I used this method to convert doc files to docx format, but the table border lines from the doc file are missing in the converted docx. Can you help?
I have found the error.
You need to insert
file_path = os.path.abspath(file_path)
and also
docx_file = os.path.abspath(docx_file)
in line 26.
The full code would look like this:
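A minimal sketch of the change described above, not the poster's exact listing. It assumes `word` is an already dispatched `Word.Application` object, and the function name `convert_to_docx` is hypothetical. The absolute paths probably matter because Word resolves relative paths against its own working directory rather than the script's.

```python
import os


def convert_to_docx(word, file_path, file_format=16):
    """Open file_path in Word and save a '.docx' copy next to it.

    `word` is assumed to be a Word.Application COM object, for example
    from win32com.client.Dispatch("Word.application").
    """
    # Make both paths absolute before handing them to Word.
    file_path = os.path.abspath(file_path)
    docx_file = os.path.abspath(file_path + 'x')

    word_doc = word.Documents.Open(file_path, False, False, False)
    try:
        word_doc.SaveAs2(docx_file, FileFormat=file_format)
    finally:
        word_doc.Close()  # Close the document even if saving fails
    return docx_file
```

Inside the directory walk, the body of the `try:` block would then reduce to a single call such as `convert_to_docx(word, file_path)`.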