@m1roff
Created January 15, 2024 14:58
Creating PDF from images with OCR
#!/bin/bash
# Creating PDF from images with OCR
#
# brew install tesseract tesseract-lang pdftk-java
#
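set -e
mkdir -p pdf_output
pdfs=()
# Order RU_<number>.jpg numerically by <number> so that page 10 follows page 9.
# Reading names line by line avoids word-splitting the output of ls.
while IFS= read -r file; do
    echo "Processing $file..."
    filename=$(basename "$file" .jpg)
    # Run OCR; tesseract appends .pdf to the output base name
    tesseract "$file" "pdf_output/$filename" -l rus+eng+deu pdf
    pdfs+=("pdf_output/$filename.pdf")
done < <(printf '%s\n' RU_*.jpg | sort -t_ -k2,2n)
echo "Merging PDF files..."
# Merge in page order; a pdf_output/*.pdf glob would place RU_10 before RU_2
pdftk "${pdfs[@]}" cat output combined.pdf
echo "Finished: book created as combined.pdf"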
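
The page-ordering step can be tried in isolation before running OCR on a whole book. This is a minimal sketch, assuming file names follow the RU_<number>.jpg pattern; `sort_pages` is a hypothetical helper name and is not part of the original script. It matters because a plain glob such as *.pdf expands in lexicographic order, which puts RU_10 before RU_2.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print file names
# of the form RU_<number>.jpg in numeric page order, one per line.
sort_pages() {
    # Split on "_" and compare the second field numerically;
    # sort -n reads the leading digits and ignores the ".jpg" suffix.
    printf '%s\n' "$@" | sort -t_ -k2,2n
}

# Example: names passed in arbitrary order come back in page order.
sort_pages RU_10.jpg RU_2.jpg RU_1.jpg
```

The sketch assumes names contain no newlines and have exactly one underscore before the page number.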