@JamesSaxon
Created December 12, 2020 17:37
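#!/bin/bash
# OCR a scanned PDF: extract its page images with pdfimages, then run
# tesseract on them to produce a searchable PDF and a plain-text file.

if [[ $# -ne 1 ]]; then
  echo "Provide the PDF file as a single argument." >&2
  exit 1
fi

f=$1
echo "$f"

# Output base name: the input name with a trailing .pdf removed.
o=${f%.pdf}

# Extract the embedded images to images/<base>-NNN.ppm.
mkdir -p images/
pdfimages "$f" "images/$o"

# tesseract accepts a text file that lists one image path per line.
ls images/"$o"-*ppm > tmp

# Send tesseract's debug output to /dev/null.
echo "debug_file /dev/null" > quiet.conf

# Writes $o.ocr.pdf and $o.ocr.txt.
tesseract tmp "$o.ocr" -l eng --dpi 200 pdf txt quiet.conf
rm tmp images/"$o"-*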
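
The script derives its output name from the input by dropping the .pdf extension and writes the extracted images under images/ using that name. An argument that contains a directory (for example scans/report.pdf) would therefore point pdfimages at images/scans/, which the script does not create. A minimal sketch of one way to handle this, assuming a hypothetical helper named ocr_basename that is not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): derive the output base
# name from a PDF path by dropping the directory and a trailing ".pdf".
ocr_basename() {
  local name=${1##*/}           # remove everything up to the last "/"
  printf '%s\n' "${name%.pdf}"  # strip ".pdf" only at the end of the name
}

ocr_basename "scans/report.pdf"   # prints: report
ocr_basename "my.pdf.notes.pdf"   # prints: my.pdf.notes
```

With a helper like this, the script could set o=$(ocr_basename "$f"), so the images and OCR output are named after the file alone and land in the current directory regardless of where the PDF lives.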