davemenninger/gist:6022994

Created July 17, 2013 18:12

Uses tesseract, hocr2pdf, and pdfconcat to build an OCR'ed (searchable) PDF from a directory full of .tif files. Intended for use with the output of a DIY book scanner and ScanTailor.

gistfile1.sh

	#!/bin/bash

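	# OCR each page. Older tesseract releases write the hOCR output to $img.html; newer ones may use $img.hocr instead.
	for img in *.tif; do tesseract "$img" "$img" hocr; done

	# Combine each image with its hOCR text layer into a searchable single-page PDF.
	for img in *.tif; do hocr2pdf -i "$img" -o "$img.pdf" < "$img.html"; done

	# Concatenate the per-page PDFs, in filename (glob) order, into one document.
	pdfconcat -o merged.pdf ./*.pdf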
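
The three steps above can also be wrapped in functions that stop at the first failure. This is only a sketch under some assumptions: the function names `find_hocr` and `build_searchable_pdf` are made up for illustration, tesseract is assumed to name its hOCR output either `<image>.hocr` or `<image>.html`, and `pdfconcat` is assumed to accept the input files as trailing arguments, as it does in the script.

```shell
#!/bin/bash

# Print the hOCR file that tesseract wrote for an image, or fail if there is none.
# Assumption: the output is named either <image>.hocr or <image>.html.
find_hocr() (
    for candidate in "$1.hocr" "$1.html"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            exit 0
        fi
    done
    exit 1
)

# Build <dir>/merged.pdf from every .tif in <dir> (default: current directory).
# The body runs in a subshell, so its variables do not leak into the caller.
build_searchable_pdf() (
    dir=${1:-.}
    set --                                   # reuse "$@" as the list of page PDFs
    for img in "$dir"/*.tif; do
        [ -e "$img" ] || continue            # the glob matched nothing
        tesseract "$img" "$img" hocr || exit 1
        hocr=$(find_hocr "$img") || { echo "no hOCR output for $img" >&2; exit 1; }
        hocr2pdf -i "$img" -o "$img.pdf" < "$hocr" || exit 1
        set -- "$@" "$img.pdf"
    done
    [ "$#" -gt 0 ] || { echo "no .tif files in $dir" >&2; exit 1; }
    pdfconcat -o "$dir/merged.pdf" "$@"
)
```

Calling `build_searchable_pdf /path/to/scans` after these definitions would run the whole pipeline. Passing the page PDFs explicitly, rather than `./*.pdf`, keeps a `merged.pdf` left over from an earlier run out of the input list.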
