CONTENTS=$(tesseract -c language_model_penalty_non_dict_word=0.8 --tessdata-dir /usr/local/share/tessdata/ "$1" stdout -l eng | xml esc)
hex=$( (cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<string>$CONTENTS</string>
</plist>
EOF
) | plutil -convert binary1 - -o - | xxd -p | tr -d '\n')
xattr -w -x com.apple.metadata:kMDItemFinderComment "$hex" "$1"
You can run this automatically for screenshots by enabling this Folder Action with Automator. The only tweak is to fully qualify the paths to
Workflow is here (don't forget to
If you screenshot individual windows, the alpha channel prevents Tesseract from scanning properly. Also, a lot of UI text is too small to scan accurately. To solve this I preprocessed with ImageMagick like so:
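As a rough illustration of this kind of preprocessing (not necessarily the command used above), here is a minimal sketch that flattens the alpha channel onto white and enlarges the image before OCR. It assumes ImageMagick's `convert` is on the PATH; the function name `preprocess_for_ocr` and the 300% scale factor are hypothetical choices for illustration, not tuned values.

```shell
#!/bin/bash
# Hypothetical helper: flatten transparency and upscale a screenshot before OCR.
# Assumes ImageMagick's `convert` is installed; 300% is an illustrative guess.
preprocess_for_ocr() {
  local src="$1" dst="$2"
  # -background white -alpha remove -alpha off: composite over white, drop alpha
  # -colorspace Gray: discard colour information
  # -resize 300%: enlarge small UI text
  convert "$src" -background white -alpha remove -alpha off \
    -colorspace Gray -resize 300% "$dst"
}
```

It could be called as `preprocess_for_ocr "$1" /tmp/ocr-input.png`, with Tesseract then pointed at the temporary file instead of the original screenshot.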
Testing with a screenshot of my Terminal, I got better results with
For @amake's tip you have to
This script is very helpful - thank you for sharing it.
I'm running macOS 10.13.6 and needed to modify ocr-shot.sh to specify
ocr-shot.sh is now working as designed.
macOS-native and Firefox screenshots are 72 dpi. I'll look into where I can set that explicitly.
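One place the resolution can be stated explicitly is Tesseract's own `--dpi` option, which is available in Tesseract 4 and later. The sketch below is only an assumption about how that might be wired in; the function name `ocr_with_dpi` and the default of 72 are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: run Tesseract with an explicit input resolution.
# Assumes Tesseract 4+ (which accepts --dpi); defaults to 72 dpi.
ocr_with_dpi() {
  local image="$1" dpi="${2:-72}"
  # Print recognised text to stdout, telling Tesseract the image DPI
  tesseract "$image" stdout --dpi "$dpi" -l eng
}
```

For example, `ocr_with_dpi screenshot.png` would use 72 dpi, and `ocr_with_dpi screenshot.png 144` would override it.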