Today, I was looking for some screenshots I wanted to use for a presentation, and rather than looking through all 1478 of my uploaded camera photos (counted with ls -1 ~/Dropbox/Camera\ Uploads/ | wc -l), I decided to write a quick bash script that uses the tesseract OCR tool to help me out.
I wanted to use https://github.com/jbochi/python-tesseract, so first I installed the dependencies and cloned the repo:
sudo pip install PIL
brew install tesseract
cd ~/Dropbox/git/
git clone git@github.com:jbochi/python-tesseract.git
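Before going further, it can be worth confirming that the installed tools are actually on the PATH. A minimal sketch, assuming a POSIX-style shell; require_cmd is a hypothetical helper name used only for illustration and is not part of tesseract or python-tesseract:

```shell
#!/bin/bash
# require_cmd is a hypothetical helper: it succeeds if the named
# command can be found on the PATH and complains on stderr otherwise.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing command: $1" >&2
    return 1
  }
}
```

Calling require_cmd tesseract && require_cmd git after the install steps should return quietly if both tools were installed.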
A quick test to make sure this is working:
~/Dropbox/git/python-tesseract/tesseract.py ~/Dropbox/Camera\ Uploads/2013-06-06\ 21.55.03.png
Then I wrote this script:
/Users/kortina/Desktop/find-images-with-text.sh
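#!/bin/bash
# Usage: find-images-with-text.sh QUERY DIRECTORY
query=$1
directory_to_search=$2
# Stop if the directory cannot be entered, rather than scanning the wrong place
cd "$directory_to_search" || exit 1
for f in *; do
  # OCR each file; errors (for example from non-image files) are discarded
  txt=$(~/Dropbox/git/python-tesseract/tesseract.py "$f" 2>/dev/null)
  # On a case-insensitive match, print the filename followed by the recognized text
  echo "$txt" | grep -i -q -- "$query" && printf '%s\n%s\n' "$f" "$txt"
done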
Next, I made the script executable:
chmod 700 /Users/kortina/Desktop/find-images-with-text.sh
And ran it:
~/Desktop/find-images-with-text.sh ride ~/Dropbox/Camera\ Uploads
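The loop above hands every file in the directory to the OCR wrapper, including any that are not images. One possible refinement is to filter by extension first. This is only a sketch under a few assumptions: list_images, search_images, and the OCR_CMD override are hypothetical names introduced here, the extension list is a guess at what a Camera Uploads folder contains, and filenames are assumed not to contain newlines.

```shell
#!/bin/bash
# OCR_CMD is a hypothetical override; by default it points at the
# python-tesseract wrapper cloned earlier in this post.
OCR_CMD=${OCR_CMD:-"$HOME/Dropbox/git/python-tesseract/tesseract.py"}

# List top-level files in a directory whose extension looks like an image.
list_images() {
  find "$1" -maxdepth 1 -type f \
    \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \) | sort
}

# Print the path of each image whose OCR text contains the query
# (case-insensitive). Arguments: QUERY DIRECTORY
search_images() {
  local query=$1
  local dir=$2
  list_images "$dir" | while IFS= read -r f; do
    # Run OCR on one file; errors are discarded as in the original script
    txt=$("$OCR_CMD" "$f" 2>/dev/null)
    if printf '%s\n' "$txt" | grep -i -q -- "$query"; then
      printf '%s\n' "$f"
    fi
  done
}
```

After sourcing these functions, search_images ride ~/Dropbox/Camera\ Uploads would be the equivalent of the run above, printing only the matching paths.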
Pretty sweet that these tools existed and I could do all of this in like 15 minutes.