@eye9poob
Created March 18, 2018 18:23
Finds text in an image and puts both the image and the text into a PDF file
#!/bin/bash
# by ..:: crazyjunkie ::..
#Finds text in an image and puts
#both the image and the text into a PDF file
#requirements (Debian/Ubuntu package names; imagemagick provides "convert")
#apt-get install imagemagick tesseract-ocr enscript ghostscript poppler-utils xpdf
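img="$1"
#abort early if no readable image was given
if [ ! -r "$img" ]; then
  echo "Usage: $0 IMAGE" >&2
  exit 1
fi
#strip only the last extension, so ./scan.v2.png gives ./scan.v2
base="${img%.*}"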
echo "Working with $base..."
#image to pdf
convert "$img" "${base}_1.pdf"
#find text in img: OCR to stdout, then drop blank lines
tesseract "$img" - | sed '/^[[:space:]]*$/d' > "${base}.txt"
#pandoc "$base.txt" -o "${base}_2.pdf"
enscript -p "${base}.ps" "${base}.txt"
ps2pdf "${base}.ps" "${base}_2.pdf"
#create final PDF
pdfunite "${base}_1.pdf" "${base}_2.pdf" "${base}.pdf"
#clean up
rm "${base}.ps" "${base}.txt" "${base}_1.pdf" "${base}_2.pdf"
#display output
xpdf "${base}.pdf"
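
The script above assumes that every tool is installed and that the input name has exactly one extension. A minimal sketch of two helpers that make those assumptions explicit follows. The function names `strip_ext` and `missing_cmds` are hypothetical and not part of the original gist; treat this as one possible approach rather than the author's method.

```shell
#!/bin/bash
# Hypothetical helpers (names are illustrative, not from the original gist).

# Print PATH without its last extension, leaving directories,
# earlier dots, and extensionless names untouched.
strip_ext() {
  local path="$1"
  local name="${path##*/}"               # file name without its directory
  case "$name" in
    ?*.*) printf '%s\n' "${path%.*}" ;;  # non-empty stem followed by an extension
    *)    printf '%s\n' "$path" ;;       # no extension, or a dotfile: keep as-is
  esac
}

# Print each command that is not on PATH, one per line.
# Returns 0 if all were found, 1 if at least one is missing.
missing_cmds() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      status=1
    fi
  done
  return "$status"
}
```

One way to use these would be to call `missing_cmds convert tesseract enscript ps2pdf pdfunite xpdf` before any processing and exit if it prints anything, and to set `base="$(strip_ext "$img")"` so that a path such as `dir.d/file` is not truncated at the directory's dot.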