@kleinernik
Created May 12, 2020 13:15
#!/bin/bash
# Require the path to the input PDF as the first argument.
if [ -z "$1" ]; then
    echo "No argument supplied" >&2
    exit 1
fi
# Resolve the input to an absolute path, because the script changes directory later.
if [[ -r $1 ]]; then
    echo "$1 exists"
    file_path=$(realpath -e "$1")
else
    echo "$1 can't be read" >&2
    exit 1
fi
tempdir=$(mktemp -d)
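# Optional safeguard, sketched here as an addition to the original flow:
# remove the temp dir even if a later step fails or the script is interrupted.
# The function name cleanup_tempdir is hypothetical; it assumes $tempdir holds
# the directory created by mktemp above.
cleanup_tempdir() {
    # Only act if tempdir is set and the directory still exists.
    if [ -n "${tempdir:-}" ] && [ -d "$tempdir" ]; then
        rm -rf -- "$tempdir"
    fi
}
trap cleanup_tempdir EXIT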
current_dir=$PWD
echo "Created tempdir to work in: $tempdir"
echo "Converting $file_path"
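cd "$tempdir" || exit 1
# Render every page to a 300 dpi PPM image named temp-<page>.ppm.
pdftoppm -r 300 "$file_path" temp
# Take the page-number width from the last file name so the patterns below match.
digits_last_out_file=$(ls -1 | awk 'END{print}' | sed -n 's/temp-\(.*\)\.ppm/\1/p')
size=${#digits_last_out_file}
# Clean up the scans; unpaper expands the printf-style pattern for each page.
unpaper -v "temp-%0${size}d.ppm" "utemp-%0${size}d.ppm"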
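# Replace a trailing .pdf (any case) with _ocr; tesseract appends .pdf to this base name.
out_filename=$(echo "$file_path" | sed 's/\.pdf$/_ocr/I')
# Pass the page list on stdin so all pages end up in one searchable PDF (German language data).
printf '%s\n' utemp-*.ppm | tesseract -l deu - "$out_filename" pdf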
cd "$current_dir" || exit 1
rm -r -- "$tempdir"
echo "done"