Uses tesseract, hocr2pdf, and pdfconcat to build an OCR'ed (searchable) PDF from a directory full of .tif files. Intended for use with the output of a diybookscanner and ScanTailor.
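#!/bin/bash
# OCR each page to hOCR. Older tesseract releases write $img.html; newer ones write $img.hocr, so adjust the next loop if needed.
for img in *.tif; do tesseract "$img" "$img" hocr; done
# Combine each page image with its hOCR text layer into a searchable one-page PDF.
for img in *.tif; do hocr2pdf -i "$img" -o "$img.pdf" < "$img.html"; done
# Merge only the per-page PDFs (in glob order), so a leftover merged.pdf is not fed back in on a re-run.
pdfconcat -o merged.pdf ./*.tif.pdf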
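
The script above assumes tesseract writes its hOCR output to $img.html and does not stop when a page fails. A slightly more defensive variant might look like the sketch below. The function names (hocr_file_for, ocr_page, build_searchable_pdf) are hypothetical and not part of any of the tools, and the .hocr fallback rests on the assumption that your tesseract release uses that extension instead of .html.

```shell
#!/bin/bash
# Sketch only: hocr_file_for, ocr_page and build_searchable_pdf are
# hypothetical helper names, not part of tesseract, hocr2pdf or pdfconcat.

# Print the hOCR file that exists for an image, whichever extension was used.
hocr_file_for() {
  local img
  img=$1
  if [ -f "$img.hocr" ]; then
    printf '%s\n' "$img.hocr"
  elif [ -f "$img.html" ]; then
    printf '%s\n' "$img.html"
  else
    return 1
  fi
}

# OCR one page and write "$img.pdf"; returns non-zero if any step fails.
ocr_page() {
  local img hocr
  img=$1
  tesseract "$img" "$img" hocr || return 1
  hocr=$(hocr_file_for "$img") || return 1
  hocr2pdf -i "$img" -o "$img.pdf" < "$hocr"
}

# Run the whole pipeline on the .tif files in the current directory.
build_searchable_pdf() {
  local img
  set --                          # collect per-page PDFs as positional parameters
  for img in *.tif; do
    [ -e "$img" ] || continue     # skip the literal pattern when nothing matches
    ocr_page "$img" || { echo "failed: $img" >&2; return 1; }
    set -- "$@" "$img.pdf"
  done
  [ "$#" -gt 0 ] || { echo "no .tif files found" >&2; return 1; }
  pdfconcat -o merged.pdf "$@"
}
```

To use it, source the file (or paste the functions into a script), cd into the directory with the ScanTailor output, and call build_searchable_pdf. Passing the page list to pdfconcat explicitly keeps an existing merged.pdf out of the inputs.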