@davemenninger
Created July 17, 2013 18:12
Uses tesseract, hocr2pdf, and pdfconcat to build an OCR'ed (searchable) PDF from a directory full of .tif files. Intended for use with the output of a diybookscanner and ScanTailor.
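#!/bin/bash
# OCR each TIFF to hOCR. Older tesseract writes $img.html; newer releases may write $img.hocr instead.
for img in *.tif; do tesseract "$img" "$img" hocr; done
# Overlay the hOCR text layer on each page image, producing one searchable PDF per page.
for img in *.tif; do hocr2pdf -i "$img" -o "$img.pdf" < "$img.html"; done
# Concatenate the per-page PDFs (in shell glob order) into a single file.
pdfconcat -o merged.pdf ./*.pdf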
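The second loop assumes tesseract wrote its hOCR output to `$img.html`. Depending on the tesseract version, the file may instead be named `$img.hocr`. A minimal sketch of a more tolerant second step could look like the following. The function names `find_hocr` and `build_page_pdfs` are hypothetical and not part of the original script, and the `.hocr` fallback is an assumption about your tesseract version.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print the hOCR file
# that exists for the given image, trying both common extensions.
find_hocr() {
  local img=$1
  if [ -f "$img.html" ]; then
    printf '%s\n' "$img.html"
  elif [ -f "$img.hocr" ]; then
    printf '%s\n' "$img.hocr"
  else
    return 1
  fi
}

# Hypothetical replacement for the hocr2pdf loop: build one searchable PDF
# per page and skip, with a warning, any page that has no hOCR output.
build_page_pdfs() {
  local img hocr
  for img in *.tif; do
    [ -e "$img" ] || continue   # no .tif files: the glob stays literal
    hocr=$(find_hocr "$img") || { echo "no hOCR for $img" >&2; continue; }
    hocr2pdf -i "$img" -o "$img.pdf" < "$hocr"
  done
}
```

To use it, call `build_page_pdfs` after the tesseract loop and before the `pdfconcat` step.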