@mrampton
Created March 7, 2015 19:34
bash script to convert a PDF of images into a searchable PDF
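#!/bin/bash
# ocrify: convert an image-only PDF into a searchable PDF.
# Requires ImageMagick (convert), tesseract, and pdftk on the PATH.
if [ "$#" -ne 1 ]; then
    echo "usage: ocrify <filename.pdf>" >&2
    exit 1
fi
FILENAME=$1
# Strip only the trailing .pdf so other dots in the path are left alone.
OCRFILE="${FILENAME%.pdf}_OCR.pdf"
# Render each page as a 300 dpi grayscale TIFF named page_NNNNN.tif.
convert -density 300 "$FILENAME" -type Grayscale -compress lzw -background white +matte -depth 32 page_%05d.tif
# OCR each page; tesseract writes a matching page_NNNNN.pdf with a text layer.
for i in page_*.tif; do
    echo "$i"
    tesseract "$i" "$(basename "$i" .tif)" pdf
done
# Merge the per-page PDFs in order, then remove the intermediate files.
pdftk page_*.pdf cat output "$OCRFILE"
rm -f page_*.tif page_*.pdf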
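
The output filename is the one step that depends purely on bash parameter expansion, so it is easy to check in isolation. The sketch below uses a hypothetical helper, `ocr_name` (not part of the original gist), and assumes the input name ends in a lowercase `.pdf`. It strips only that suffix with `${input%.pdf}`. By contrast, an all-matches substitution such as `${input//./_OCR.}` rewrites every dot, so `report.v2.pdf` would become `report_OCR.v2_OCR.pdf`.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: derive the OCR output name
# by removing a trailing ".pdf" (if present) and appending "_OCR.pdf".
ocr_name() {
    local input=$1
    printf '%s\n' "${input%.pdf}_OCR.pdf"
}

# Dots elsewhere in the name or path are left untouched.
ocr_name "scan.pdf"
ocr_name "report.v2.pdf"
ocr_name "./scans/scan.pdf"
```

If the input might carry an uppercase `.PDF` extension, the suffix would not be stripped and the helper would need a case-insensitive match instead.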