Last active February 14, 2024 08:27
OS X bash script that turns a collection of images into an OCR'd PDF
#!/bin/bash
# img2pdf
# OS X bash script that turns a collection of images into an OCR'd PDF
# Adapted from,
# where it was in turn adapted from
# from
# bash tut:
# Linux PDF,OCR:
# Dealing w/ alpha:
# Install
# brew install tesseract --HEAD
# brew install imagemagick
# brew install ghostscript
# chmod +x img2pdf
# Usage
# ./img2pdf *.gif
# If you have a mix of extensions:
# ./img2pdf *.{gif,jpeg}
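# Preflight check: a minimal sketch, not part of the original script, that
# warns when one of the tools listed under Install is not on PATH.
# The function name missing_tools is hypothetical.
missing_tools() {
  # Print each named command that cannot be found on PATH, one per line.
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
missing=$(missing_tools convert tesseract gs)
# $missing is left unquoted on purpose so the names are joined by spaces.
[ -z "$missing" ] || echo "Warning: not found on PATH:" $missing >&2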
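# Name the output after the current directory (an assumption; adjust as needed).
name=$(basename "$PWD")
echo "Creating a searchable PDF named $name.searchable.pdf"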
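# process each page
for f in "$@"; do
  echo "$f"
  printf '\tConverting to TIFF...\n'
  # Flatten any transparency onto a white background before OCR.
  convert "$f" -background white -flatten +matte "$f.tiff"
  printf '\tTesseract OCR...\n'
  # echo "tesseract -l eng -psm 3 $f.tiff ${f%.*} pdf"
  tesseract -l eng -psm 3 "$f.tiff" "${f%.*}" pdf 1>/dev/null 2>&1
  printf '\tCleanup...\n'
  rm "$f.tiff"
  # -f: do not complain if tesseract wrote no .txt alongside the PDF.
  rm -f "${f%.*}.txt"
  mv "${f%.*}.pdf" "$f.tmp.pdf"
done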
echo "Combining all pages into a single PDF..."
gs -dCompatibilityLevel=1.4 -dNOPAUSE -dQUIET -dBATCH -q -sDEVICE=pdfwrite -sOutputFile="${name}.searchable.pdf" *.tmp.pdf
rm *.tmp.pdf
echo "Created $name.searchable.pdf"