@nongio
Created December 10, 2018 07:08
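# Abort early if any setting the script relies on is unset or empty.
: "${tmpdir:?}" "${density:?}" "${depth:?}" "${tessdatadir:?}" "${language:?}"
# Split the input into one PDF per page: page_001.pdf, page_002.pdf, ...
pdftk infile.pdf burst output "$tmpdir/page_%03d.pdf"
page=0
imagetype="png"
for file in "$tmpdir"/page_*.pdf
do
    # Rasterize the page to an image, then drop the single-page PDF.
    image="$file.$imagetype"
    convert -density "$density" -depth "$depth" "$file" "$image"
    rm "$file"
    # Page counter (not used further in this snippet).
    page=$((page + 1))
    # OCR the image; tesseract appends .pdf to the output base,
    # so the result is page_NNN.pdf.png.pdf.
    tesseract "$image" "$image" --tessdata-dir "$tessdatadir" -l "$language" pdf
    rm "$image"
done
# Merge the searchable pages in order. Matching page_* keeps an existing
# tmp.pdf from an earlier run out of the input list.
pdftk "$tmpdir"/page_*.pdf cat output "$tmpdir/tmp.pdf"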
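
The script above uses `tmpdir`, `density`, `depth`, `tessdatadir`, and `language` without defining them, so they have to be set before it runs. The following is a minimal setup sketch, not part of the original gist: every default value here (300 DPI, 8 bits, `eng`, and the `/usr/share/tessdata` fallback path) is an assumption and may need adjusting for a given machine.

```shell
# Hypothetical defaults; none of these values come from the original script.
tmpdir=$(mktemp -d) || exit 1      # private scratch directory for the page files
trap 'rm -rf "$tmpdir"' EXIT       # remove the scratch directory when the shell exits
density=${density:-300}            # assumed rasterization resolution (DPI)
depth=${depth:-8}                  # assumed bits per channel
language=${language:-eng}          # assumed tesseract language code
# Assumed location of the tesseract language data; TESSDATA_PREFIX wins if set.
tessdatadir=${tessdatadir:-${TESSDATA_PREFIX:-/usr/share/tessdata}}
```

Because the `trap` deletes `$tmpdir` on exit, the merged `$tmpdir/tmp.pdf` would need to be copied somewhere permanent before the shell finishes.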