@vicgc
Forked from Fingel/img2txt.sh
Created May 21, 2014 13:03
#!/bin/bash
# Converts every image in a directory to text using tesseract
# (packages tesseract-ocr and tesseract-ocr-eng).
function usage
{
    echo "Usage: img2txt -i <input directory> -o <output directory> [-c | --concat]"
}
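# Appends every file in the output directory to concatenated.txt.
function concat
{
    o=$1
    for f in "$o"/*
    do
        # Skip a concatenated.txt left over from an earlier run.
        [ "$f" = "$o/concatenated.txt" ] && continue
        cat "$f" >> "$o/concatenated.txt"
    done
}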
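# Runs tesseract on every regular file in the input directory.
function convert
{
    i=$1
    o=$2
    # -p keeps a rerun from failing when the output directory already exists.
    mkdir -p "$o"
    echo "Converting and placing files into $o"
    for f in "$i"/*
    do
        test -f "$f" || continue
        echo "processing file $f... $o/${f##*/}.txt"
        # tesseract appends .txt to the output base name itself.
        tesseract "$f" "$o/${f##*/}" &> /dev/null
    done
    echo "done."
}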
inputdir=
outputdir=
concat=0
while [ "$1" != "" ]; do
    case $1 in
        -i | --input )  shift
                        inputdir=$1
                        ;;
        -o | --output ) shift
                        outputdir=$1
                        ;;
        -c | --concat ) concat=1
                        ;;
        -h | --help )   usage
                        exit
                        ;;
        * )             usage
                        exit 1
                        ;;
    esac
    shift
done
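# Refuse to run without both directories; an empty input would glob "/".
if [ -z "$inputdir" ] || [ -z "$outputdir" ]; then
    usage
    exit 1
fi
convert "$inputdir" "$outputdir"
if [ "$concat" = 1 ]; then
    concat "$outputdir"
fi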