How to use pdfminer.six, PaddleOCR and OpenAI's GPT to OCR and extract text from PDFs and save them into a CSV (or Excel) file for later analysis.
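The overall shape of such a pipeline can be sketched as follows. This is a minimal illustration, not the gist's actual code: the helper names (`pdfminer_text`, `extract_with_fallback`, `write_csv`), the `min_chars` threshold, and the choice to fall back to OCR only when the embedded text layer is nearly empty are all assumptions. The OCR step (for example PaddleOCR, possibly followed by a GPT cleanup pass) is left as a caller-supplied function so that no library API has to be guessed here.

```python
import csv
from pathlib import Path


def pdfminer_text(pdf_path):
    """Read a PDF's embedded text layer with pdfminer.six (pip install pdfminer.six)."""
    # Imported lazily so the rest of the sketch works without pdfminer installed.
    from pdfminer.high_level import extract_text
    return extract_text(pdf_path)


def extract_with_fallback(pdf_path, text_layer, ocr, min_chars=20):
    """Return (text, method). Try the text layer first, then fall back to OCR.

    `text_layer` and `ocr` are callables that take a path and return a string.
    `min_chars` is an arbitrary, hypothetical threshold for "this PDF is a scan".
    """
    text = text_layer(pdf_path).strip()
    if len(text) >= min_chars:
        return text, "text_layer"
    return ocr(pdf_path).strip(), "ocr"


def write_csv(pdf_paths, out_path, text_layer, ocr):
    """Write one row per PDF: file name, extraction method, extracted text."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["filename", "method", "text"])
        writer.writeheader()
        for pdf_path in pdf_paths:
            text, method = extract_with_fallback(pdf_path, text_layer, ocr)
            writer.writerow(
                {"filename": Path(pdf_path).name, "method": method, "text": text}
            )
```

A call might look like `write_csv(paths, "output.csv", pdfminer_text, my_ocr)`, where `my_ocr` is a hypothetical function wrapping whichever OCR engine is in use. The resulting CSV opens directly in Excel or loads with `pandas.read_csv` for later analysis.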