@outofband
Created August 11, 2021 17:47

OCR'ing your screenshots on macOS so Spotlight can find embedded text (from @lemonodor)

Install the prerequisites with Homebrew:

brew install tesseract tesseract-lang xmlstarlet
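
Before going further, it can help to confirm that the tools the script relies on are actually on your PATH. The following is only a sketch: `missing_tools` is a hypothetical helper name, not part of the original gist, and the list of commands is taken from what the script below calls (Homebrew's xmlstarlet provides the `xml` binary; the rest ship with macOS).

```shell
#!/bin/bash
# Hypothetical helper: print each named command that cannot be found on $PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Commands used by the OCR script; anything printed here still needs installing.
missing_tools tesseract xml plutil xxd xattr mdimport
```

If nothing is printed, every command was found.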

Save this as a file named ocr-shot somewhere in your $PATH and chmod +x it:

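#!/bin/bash
# Usage: pass the image path as the first argument.
# OCRs the image and stores the text as its Finder comment so Spotlight can index it.

# Abort on any failure, including a failed tesseract inside a pipeline.
set -eo pipefail

# Homebrew's tessdata directory. On Apple Silicon the Homebrew prefix is
# /opt/homebrew, so set TESSDATA_DIR=/opt/homebrew/share/tessdata/ there.
TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata/}"

# Recognize the text and XML-escape it (xmlstarlet's `xml esc`) for the plist below.
CONTENTS=$(tesseract -l eng -c language_model_penalty_non_dict_word=0.8 --tessdata-dir "$TESSDATA_DIR" "$1" stdout | xml esc)

# Wrap the text in a plist, convert it to binary, and hex-encode it for `xattr -x`.
hex=$(cat <<EOF | plutil -convert binary1 - -o - | xxd -p | tr -d '\n'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<string>$CONTENTS</string>
</plist>
EOF
)

# Write the Finder comment, then ask Spotlight to re-index the file.
xattr -w -x com.apple.metadata:kMDItemFinderComment "$hex" "$1"
mdimport "$1"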
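
To see what the heredoc step hands to plutil, here is a small sketch that builds the same plist from a string. `plist_string` is a hypothetical helper name, and it uses sed as a stand-in for `xml esc`, escaping only `&`, `<` and `>`; treat it as an illustration of the payload, not as a replacement for the script.

```shell
#!/bin/bash
# Hypothetical helper: wrap text in the minimal plist the script feeds to plutil.
# Assumption: sed stands in for `xml esc` and escapes only &, < and >.
plist_string() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<string>$escaped</string>
</plist>
EOF
}

# Print the plist for a sample line of recognized text.
plist_string 'Total < 100 & paid'
```

On macOS, piping that output through `plutil -convert binary1 - -o - | xxd -p` should show the hex string that `xattr -w -x` receives. After running the real script on a file, `mdls -name kMDItemFinderComment <file>` is one way to check that the comment was picked up.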

Run it on all your screenshots (change ~/Desktop to your screenshots folder if it's different):

find ~/Desktop -name "Screen Shot*" -print0 | xargs -0 -P 4 -n 1 ocr-shot
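
The one-liner above can also be wrapped in a function that takes the folder and the command as parameters, which makes a dry run easy. This is a sketch under assumptions: `ocr_all` is a hypothetical name, and the extra `Screenshot*` pattern assumes that newer macOS releases name files "Screenshot ..." rather than "Screen Shot ...".

```shell
#!/bin/bash
# Hypothetical wrapper around the find | xargs pipeline.
# Assumption: newer macOS releases name files "Screenshot ..." instead of "Screen Shot ...".
# Usage: ocr_all [folder] [command]
ocr_all() {
  local dir="${1:-$HOME/Desktop}" cmd="${2:-ocr-shot}"
  find "$dir" -type f \( -name 'Screen Shot*' -o -name 'Screenshot*' \) -print0 |
    xargs -0 -P 4 -n 1 "$cmd"
}

# Dry run: list what would be processed without touching any file.
#   ocr_all "$HOME/Desktop" echo
# Real run with the defaults (~/Desktop and ocr-shot):
#   ocr_all
```

Once the files are indexed, a query such as `mdfind -onlyin ~/Desktop 'kMDItemFinderComment == "*invoice*"c'` should list screenshots whose recognized text contains "invoice" (the search term is only an example).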