@jarmitage
Last active October 13, 2015 09:39
#!/bin/bash
# Split and convert the PDF with ImageMagick convert
convert -density 300 input.pdf -type Grayscale -compress lzw -background white -alpha remove -alpha off -depth 8 page_%05d.tif
# OCR the pages with Tesseract
for i in page_*.tif; do
  echo "$i"
  tesseract "$i" "$(basename "$i" .tif)" pdf
done
# Join your individual PDF files into a single, searchable PDF with pdftk
pdftk page_*.pdf cat output merged.pdf
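
The three steps above could also be wrapped in a reusable function, roughly as sketched below. This is only one possible arrangement, not part of the original script: the names `require_tools`, `ocr_base`, and `ocr_pdf` are hypothetical, the temporary work directory is an assumption, and the `-alpha remove -alpha off -depth 8` flags are a guess at settings that flatten transparency onto white and produce 8-bit pages.

```shell
#!/bin/bash
# Sketch only: function names and the temp-directory layout are assumptions.

# Fail early if any required tool is not on PATH.
require_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done
}

# Derive Tesseract's output base from a page image path:
# dir/page_00001.tif -> dir/page_00001
ocr_base() {
  printf '%s\n' "${1%.tif}"
}

# Run split, OCR, and merge for one PDF, keeping intermediates in a temp dir.
# The temp dir is removed only if every step succeeds.
ocr_pdf() {
  local input=$1 output=$2 workdir page
  require_tools convert tesseract pdftk || return 1
  workdir=$(mktemp -d) || return 1
  convert -density 300 "$input" -type Grayscale -compress lzw \
    -background white -alpha remove -alpha off -depth 8 \
    "$workdir/page_%05d.tif" || return 1
  for page in "$workdir"/page_*.tif; do
    echo "$page"
    tesseract "$page" "$(ocr_base "$page")" pdf || return 1
  done
  pdftk "$workdir"/page_*.pdf cat output "$output" && rm -rf "$workdir"
}
```

After sourcing the file, a call such as `ocr_pdf input.pdf merged.pdf` would run the whole pipeline without leaving page files in the current directory.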