@jalbertbowden
Created February 12, 2021 21:59
Scrape document.xml with Python's BeautifulSoup
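from bs4 import BeautifulSoup

# document.xml from an unzipped .docx archive
docx_form = 'form_example/word/document.xml'

# Word normally writes document.xml as UTF-8; the with-block closes the file
with open(docx_form, 'r', encoding='utf-8') as infile:
    contents = infile.read()

# The 'xml' parser needs the lxml package installed
soup = BeautifulSoup(contents, 'xml')

# Print the text of each text box element
xps = soup.find_all('wps:txbx')
for xp in xps:
    print(xp.get_text())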
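
The script above reads document.xml from a folder, which assumes the .docx was unzipped by hand first. A .docx file is a ZIP archive, so the same XML can probably be read straight from it. The following is a minimal sketch using only the standard library; the helper name read_document_xml is hypothetical and not part of the original gist.

```python
import zipfile


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx archive.

    docx_path may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_path) as archive:
        # word/document.xml holds the main body of a Word document
        return archive.read('word/document.xml')
```

The returned bytes could be passed to BeautifulSoup(contents, 'xml') in place of the contents string, for example contents = read_document_xml('form_example.docx'), where form_example.docx is an assumed file name. Passing bytes also lets the parser pick up the encoding from the XML declaration.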