Last active
December 11, 2022 02:47
PDF to CSV
#!/bin/bash
# A bit of a hack... useful for old bank statements and other PDFs as a
# first step toward CSV before importing into a
# spreadsheet or accounting application.
# Requires ImageMagick and Tesseract:
#   brew install imagemagick
#   brew install tesseract
# PDF -> PNG -> TXT
# Render each PDF page to a numbered PNG at 150 DPI.
convert -density 150 THE_PDF_FILE.pdf 2021-IMAGE_FILE-%04d.png
# OCR each page in order and append the recognized text to output.txt.
for f in 2021-IMAGE_FILE-*.png; do tesseract "$f" - -l eng --psm 6; done >> output.txt
# Delete the intermediate page images.
rm -v 2021-IMAGE_FILE-*.png
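
The script above stops at output.txt, so the TXT -> CSV step is still manual. Below is a rough sketch of one way to do it, not part of the original gist. It assumes each transaction line in the OCR text looks like `MM/DD/YYYY description amount`; that layout, the function name `txt_to_csv`, and the three-column output are all hypothetical, and real statements will likely need a different pattern.

```shell
#!/bin/bash
# Hypothetical follow-up: TXT -> CSV (date,"description",amount).
# ASSUMPTION: a transaction line starts with a MM/DD/YYYY date and ends
# with an amount; every other line (headers, totals, OCR noise) is dropped.
txt_to_csv() {
  awk '
    $1 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]$/ && NF >= 3 {
      amount = $NF
      gsub(/[$,]/, "", amount)            # strip currency sign and thousands separators
      desc = $2
      for (i = 3; i < NF; i++) desc = desc " " $i
      gsub(/"/, "\"\"", desc)             # escape embedded quotes for CSV
      printf "%s,\"%s\",%s\n", $1, desc, amount
    }
  '
}
```

The function reads from stdin and writes to stdout, so after the OCR step it could be run as `txt_to_csv < output.txt > output.csv`. Lines that do not match the assumed layout are silently skipped, so it is worth comparing row counts against the original statement.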