@oliver-batey
Last active January 20, 2021 23:27
How to use the common interface to parse different file types
import parse_file as dp

# Define paths to the test files
txt_path = 'test_txt.txt'
docx_path = 'test_docx.docx'
pdf_path = 'test_pdf.pdf'
html_path = 'test_html.html'
pptx_path = 'test_pptx.pptx'
file_paths = [txt_path, docx_path, pdf_path, html_path, pptx_path]

# Instantiate a DocParser object
parser = dp.DocParser()

for path in file_paths:
    # Call the parse method on each file path and print the result
    text = parser.parse(path)
    print(f'Parsing {path}:\n{text}\n')
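
The parse_file module itself is not shown here, so the following is only a minimal sketch of how a single parse() method might dispatch on file extension. SketchParser, _TextExtractor, and the handler names are hypothetical and are not the gist's DocParser; only .txt and .html are covered, using the standard library alone. Formats such as .docx, .pdf, and .pptx would need third-party libraries and are left out.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects non-empty text nodes (including any script/style text)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


class SketchParser:
    """Hypothetical stand-in for DocParser: one parse() method, many formats."""

    def __init__(self):
        # Map each supported extension to the method that handles it.
        self._handlers = {
            '.txt': self._parse_txt,
            '.html': self._parse_html,
        }

    def parse(self, path):
        # Pick the handler from the file extension, case-insensitively.
        suffix = Path(path).suffix.lower()
        try:
            handler = self._handlers[suffix]
        except KeyError:
            raise ValueError(f'Unsupported file type: {suffix!r}') from None
        return handler(path)

    def _parse_txt(self, path):
        # Assumes UTF-8 encoded text files.
        return Path(path).read_text(encoding='utf-8')

    def _parse_html(self, path):
        extractor = _TextExtractor()
        extractor.feed(Path(path).read_text(encoding='utf-8'))
        extractor.close()
        # Join the text nodes with single spaces.
        return ' '.join(extractor.chunks)
```

With a design like this, supporting another format means adding one handler method and one dictionary entry; callers keep using parse(path) unchanged.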