Skip to content

Instantly share code, notes, and snippets.

@csbailey5t
Created July 23, 2019 20:36
Show Gist options
  • Save csbailey5t/6a87c1e6ecb215b1bd96a3ec43784dd4 to your computer and use it in GitHub Desktop.
Save csbailey5t/6a87c1e6ecb215b1bd96a3ec43784dd4 to your computer and use it in GitHub Desktop.
Convert docx files in a directory to plain text
import glob
import docx
def get_text(fn):
doc = docx.Document(fn)
fulltext = []
for para in doc.paragraphs:
fulltext.append(para.text)
return "\n".join(fulltext)
fns = glob.glob("*.docx")
for fn in fns:
text = get_text(fn)
fn_core = fn.split(".")[0]
with open('plain_text/{}.txt'.format(fn_core), 'w') as g:
g.write(text)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment