@hebrides
Last active December 11, 2022 02:47
PDF to CSV
#!/bin/bash
# A bit of a hack... useful for old bank docs and PDFs as a
# first step to converting to CSV before importing to a
# spreadsheet or accounting application
# Requires ImageMagick and Tesseract:
# brew install imagemagick
# brew install tesseract
# PDF -> PNG -> TXT
convert -density 150 THE_PDF_FILE.pdf 2021-IMAGE_FILE-%04d.png
ls 2021-IMAGE_FILE-*.png | xargs -I FILE tesseract FILE - -l eng --psm 6 >> output.txt
rm -v 2021-IMAGE_FILE-*.png
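
The script above stops at plain text. As a rough sketch of the remaining TXT -> CSV step, the function below assumes each transaction line in the OCR output starts with an MM/DD/YYYY date and ends with an amount such as 1,234.56. That layout is an assumption, not something the script guarantees, and `ocr_to_csv` is a hypothetical helper name. Lines that do not fit the pattern are dropped.

```bash
#!/bin/bash
# Hypothetical helper: reads OCR text on stdin, writes date,"description",amount
# rows on stdout. Assumes lines look like: 01/02/2021 SOME DESCRIPTION 1,234.56
ocr_to_csv() {
  awk '
    # keep only lines that start with a date and end with an amount
    NF >= 3 &&
    $1 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]$/ &&
    $NF ~ /^-?[0-9,]+\.[0-9][0-9]$/ {
      amount = $NF
      gsub(/,/, "", amount)            # drop thousands separators
      desc = $2
      for (i = 3; i < NF; i++) desc = desc " " $i
      gsub(/"/, "\"\"", desc)          # double any quotes for CSV
      printf "%s,\"%s\",%s\n", $1, desc, amount
    }
  '
}
```

It could be run as `ocr_to_csv < output.txt > output.csv`. OCR errors in dates or amounts will cause rows to be skipped silently, so the result is worth checking against the original PDF.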