@isoboroff
Created June 28, 2023 20:39
Convert PostScript files to PDF with embedded OCR
#!/bin/bash
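psfile=$1
[ -n "$psfile" ] || { echo "usage: $(basename "$0") file.ps" >&2; exit 1; }
tmpfoo=$(basename "$0")
TMPDIR=$(mktemp -d "/tmp/${tmpfoo}.XXXXXX") || exit 1
echo "$TMPDIR"
# Render each page of the PostScript file to a 300 dpi PNG.
gs -o "$TMPDIR/%05d.png" -sDEVICE=png16m -r300 -dPDFFitPage=true "$psfile"
# OCR the page images in parallel (GNU parallel); tesseract writes NNNNN.pdf next to each PNG.
parallel 'tesseract -l eng {} {.} pdf' ::: "$TMPDIR"/*.png
# Merge the per-page PDFs into one letter-size PDF; the zero-padded names keep the glob in page order.
gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dFIXEDMEDIA -sPAPERSIZE=letter -dPDFFitPage -sOutputFile="${psfile%.ps}.pdf" "$TMPDIR"/*.pdf
rm -rf "$TMPDIR"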