@ablwr

ablwr/make_ocr.sh

Last active May 14, 2020
make_ocr.sh
#!/bin/bash
# Extract text from every PDF below the current directory into a mirrored
# directory tree under $basefolder. A PDF is skipped when its .txt output
# already exists and is not older than the PDF.
basefolder="/home/ashley/Development/personal/vasulka-archive-archive/ocr"

# -print0 with read -d '' keeps paths that contain spaces or newlines intact.
find . -type f -iname '*.pdf' -print0 | while IFS= read -r -d '' i
do
    i="${i#./}"                         # drop the leading "./" that find adds
    output="$basefolder/${i%.*}.txt"    # same relative path, .txt extension
    mkdir -p "$(dirname "$output")"     # -p creates every missing parent directory
    # Convert only if the output is missing or the PDF is newer than it.
    if [ ! -f "$output" ] || [ "$i" -nt "$output" ]
    then
        echo "$i"
        pdftotext -enc ASCII7 "$i" "$output"
    fi
done
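
The path mapping the script relies on (a PDF's relative path becomes the same relative path under the output folder, with a .txt extension) can be checked on its own before running pdftotext over a large archive. The following is a minimal sketch, not part of the original gist: the function name `txt_path_for` and the `/tmp/ocr` base folder are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): map a PDF path, relative
# to the directory being scanned, to its text-output path under a base folder.
txt_path_for() {
    local base="$1" pdf="$2"
    pdf="${pdf#./}"                               # tolerate a leading "./" from find
    # ${base%/} drops one trailing slash; ${pdf%.*} drops the final extension.
    printf '%s/%s.txt\n' "${base%/}" "${pdf%.*}"
}

# Example call with an assumed base folder and a path containing a space.
txt_path_for /tmp/ocr "reports/1975/Catalog One.PDF"
```

Because the extension is stripped with `${pdf%.*}` rather than by substituting the first match of ".pdf", a directory name that happens to contain ".pdf" is left untouched, and upper-case extensions such as ".PDF" are handled the same way.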