kortina/tesseract.md

## tesseract.md

      
Today, I was looking for some screenshots to use in a presentation. Rather than look through all 1478 of my uploaded camera photos (`ls -1 ~/Dropbox/Camera\ Uploads/ | wc -l`), I decided to write a quick bash script that uses the tesseract OCR tool to find them by the text they contain.
I wanted to use https://github.com/jbochi/python-tesseract, so first I installed the dependencies and cloned the repo:
sudo pip install PIL
brew install tesseract
cd ~/Dropbox/git/
git clone git@github.com:jbochi/python-tesseract.git
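
Before going further, it can help to confirm the install actually landed. The following is a minimal sketch, not part of the original setup: `require_cmd` and `check_setup` are hypothetical helper names, and the only assumption is that `brew install tesseract` puts a `tesseract` binary on the `PATH`.

```shell
#!/bin/bash
# Hypothetical helpers for sanity-checking the setup; not part of any tool above.

# Succeed if the named command is on the PATH, otherwise complain on stderr.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "missing dependency: $1" >&2
        return 1
    }
}

# Check for the tesseract binary and for an executable wrapper script at $1.
check_setup() {
    require_cmd tesseract || return 1
    [ -x "$1" ] || { echo "not executable or not found: $1" >&2; return 1; }
}
```

Calling `check_setup ~/Dropbox/git/python-tesseract/tesseract.py` returns non-zero and prints a message if either piece is missing.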

A quick test to make sure this is working:
~/Dropbox/git/python-tesseract/tesseract.py ~/Dropbox/Camera\ Uploads/2013-06-06\ 21.55.03.png

Then I wrote this script:
/Users/kortina/Desktop/find-images-with-text.sh
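#!/bin/bash
# Usage: find-images-with-text.sh QUERY DIRECTORY
# Prints each file in DIRECTORY whose OCR text contains QUERY (case-insensitive).

query=$1
directory_to_search=$2

cd "$directory_to_search" || exit 1
for f in *; do
    # OCR the file; errors (e.g. non-image files) are discarded.
    txt=$(~/Dropbox/git/python-tesseract/tesseract.py "$f" 2>/dev/null)
    # On a match, print the filename followed by the recognized text.
    printf '%s\n' "$txt" | grep -i -q -- "$query" && printf '%s\n%s\n' "$f" "$txt"
done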

Next, I made the script executable:
chmod 700 /Users/kortina/Desktop/find-images-with-text.sh

And ran it:
~/Desktop/find-images-with-text.sh ride ~/Dropbox/Camera\ Uploads
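
The same search can probably be done without the Python wrapper by calling the `tesseract` command-line tool directly. This is a sketch under assumptions, not what I ran: it assumes a tesseract version that accepts `stdout` as the output base, and `ocr_image` and `find_images_with_text` are hypothetical function names. It also only looks at `.png`, `.jpg`, and `.jpeg` files rather than everything in the directory.

```shell
#!/bin/bash
# Sketch: search images by OCR text using the tesseract CLI directly.

# Hypothetical helper: print the text recognized in image $1.
# Passing "stdout" as the output base asks tesseract to print the text
# instead of writing a .txt file (assumed to be supported by your version).
ocr_image() {
    tesseract "$1" stdout 2>/dev/null
}

# Print each image in directory $2 whose OCR text contains $1 (case-insensitive).
find_images_with_text() {
    local query=$1 dir=$2 f txt
    for f in "$dir"/*.png "$dir"/*.jpg "$dir"/*.jpeg; do
        [ -f "$f" ] || continue    # skip glob patterns that matched nothing
        txt=$(ocr_image "$f")
        if printf '%s\n' "$txt" | grep -i -q -- "$query"; then
            printf '%s\n%s\n' "$f" "$txt"
        fi
    done
}

# When run as a script with two arguments, behave like the script above.
if [ "$#" -ge 2 ]; then
    find_images_with_text "$1" "$2"
fi
```

Saved to a file, it takes the same arguments as before: the query first, then the directory. Keeping the OCR call in its own function makes it easy to swap in the python-tesseract script, or anything else that prints text for an image.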

Pretty sweet that these tools existed and I could do all of this in like 15 minutes.