@RhetTbull
Last active November 22, 2023 16:58
Use Apple's Vision framework from Python to detect text in images
""" Use Apple's Vision Framework via PyObjC to detect text in images """

import pathlib
import sys

import Quartz
import Vision
from Cocoa import NSURL
from Foundation import NSDictionary

# needed to capture system-level stderr
from wurlitzer import pipes


def image_to_text(img_path, lang="eng"):
    """Return a list of [text, confidence] pairs for the text found in img_path.

    Note: lang is accepted but is not currently used by this function.
    """
    input_url = NSURL.fileURLWithPath_(img_path)

    with pipes() as (out, err):
        # capture stdout and stderr from system calls
        # otherwise, Quartz.CIImage.imageWithContentsOfURL_
        # prints to stderr something like:
        # 2020-09-20 20:55:25.538 python[73042:5650492] Creating client/daemon connection: B8FE995E-3F27-47F4-9FA8-559C615FD774
        # 2020-09-20 20:55:25.652 python[73042:5650492] Got the query meta data reply for: com.apple.MobileAsset.RawCamera.Camera, response: 0
        input_image = Quartz.CIImage.imageWithContentsOfURL_(input_url)

    vision_options = NSDictionary.dictionaryWithDictionary_({})
    vision_handler = Vision.VNImageRequestHandler.alloc().initWithCIImage_options_(
        input_image, vision_options
    )
    results = []
    handler = make_request_handler(results)
    vision_request = Vision.VNRecognizeTextRequest.alloc().initWithCompletionHandler_(handler)
    # PyObjC returns a (success, error) tuple for methods that take an NSError** argument
    success, error = vision_handler.performRequests_error_([vision_request], None)
    if not success:
        print(f"Error! {error}")
    return results


def make_request_handler(results):
    """ results: list to store results """
    if not isinstance(results, list):
        raise ValueError("results must be a list")

    def handler(request, error):
        if error:
            print(f"Error! {error}")
        else:
            observations = request.results()
            for text_observation in observations:
                # keep only the best candidate for each detected text region
                recognized_text = text_observation.topCandidates_(1)[0]
                results.append([recognized_text.string(), recognized_text.confidence()])

    return handler


def main():
    img_path = pathlib.Path(sys.argv[1])
    if not img_path.is_file():
        sys.exit("Invalid image path")
    img_path = str(img_path.resolve())
    results = image_to_text(img_path)
    print(results)


if __name__ == "__main__":
    main()
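
The script above returns a list of [text, confidence] pairs. As a small usage sketch (not part of the original gist), the hypothetical helper below keeps only the strings whose confidence meets a threshold. The name filter_by_confidence and the default cutoff of 0.5 are assumptions chosen for illustration.

```python
def filter_by_confidence(results, min_confidence=0.5):
    """Keep the text of results whose confidence is at least min_confidence.

    results is expected to look like the list built by the gist's handler:
    [[text, confidence], ...]. The 0.5 default is an arbitrary illustration.
    """
    return [text for text, confidence in results if confidence >= min_confidence]


# Example with hand-written data shaped like the gist's output
sample = [["Hello", 1.0], ["w0rld", 0.3]]
confident_text = filter_by_confidence(sample)
```

With real input, one would pass the return value of image_to_text(img_path) instead of the hand-written sample.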
@okpatil4u

Hello, thanks for this code. Is there a way to catch the bounding boxes ?

@RhetTbull
Author

@okpatil4u It's possible, but I've not written the Python code. Take a look here to see the sample code on getting the bounding rectangle.

Also, for a more robust implementation of this example, see here
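
For readers following along, here is a minimal sketch of the coordinate handling involved; it is not the author's code. It assumes each text observation exposes boundingBox(), which in Vision is a rectangle normalized to the 0..1 range with its origin at the lower-left corner of the image. The helper name normalized_to_pixels and the tuple layout are hypothetical.

```python
def normalized_to_pixels(box, image_width, image_height):
    """Convert a normalized, lower-left-origin box to top-left-origin pixels.

    box is (x, y, width, height) with every value in 0..1, which is how
    Vision is assumed to report bounding boxes. Returns integer
    (left, top, width, height) in pixel units.
    """
    x, y, w, h = box
    left = x * image_width
    # flip the vertical axis: Vision measures y from the bottom edge
    top = (1.0 - y - h) * image_height
    return (round(left), round(top), round(w * image_width), round(h * image_height))


# Inside the gist's handler one might collect the box like this (untested,
# shown as a comment because it needs macOS and PyObjC):
#   rect = text_observation.boundingBox()
#   box = (rect.origin.x, rect.origin.y, rect.size.width, rect.size.height)
example = normalized_to_pixels((0.25, 0.5, 0.5, 0.25), 400, 200)
```

The flip matters only if the boxes are drawn with a library that places the origin at the top-left, which most Python imaging libraries do.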

@okpatil4u

Thanks @RhetTbull !
I will check.

@lakeparkXPA

Hi, nice work! I was wondering is there a way to detect other languages plus english?

@RhetTbull
Author

RhetTbull commented Jul 6, 2023

> Hi, nice work! I was wondering is there a way to detect other languages plus english?

Yes. See the implementation of this in my textinator app which shows how to get the list of supported languages and set the language.
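
As a rough sketch of the idea (not the textinator implementation), the hypothetical helper below sets the languages on a request object before it is performed. It assumes the request is a Vision.VNRecognizeTextRequest, whose recognitionLanguages property is set from PyObjC with setRecognitionLanguages_, and that Vision expects identifiers such as "en-US" rather than the "eng" default in the gist.

```python
def set_recognition_languages(request, languages):
    """Set the recognition languages on a text recognition request.

    request is assumed to be a Vision.VNRecognizeTextRequest (any object
    with a setRecognitionLanguages_ method works). languages is a list of
    language identifiers such as ["en-US", "fr-FR"], in priority order.
    Returns the cleaned list that was applied.
    """
    cleaned = [lang.strip() for lang in languages if lang and lang.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    request.setRecognitionLanguages_(cleaned)
    return cleaned
```

In the gist this would be called on vision_request just before performRequests_error_. Which identifiers are accepted depends on the macOS version, so the list should be checked against what the system reports as supported.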

@lakeparkXPA

Thank you @RhetTbull! I will check it out.

@psungho

psungho commented Jul 19, 2023

I recently found this and found it quite useful.

I was planning on OCRing about 10000 PDFs with Apple's API. Your code works well. However, I'm a bit stuck on how to multithread/parallel process it. concurrent.futures does not seem to work. Is there any suggestion you would make for this?

@RhetTbull
Author

@psungho I'm not sure how well the pyobjc stuff works with python's threads. I would try multiprocessing (spawn multiple separate python processes each running the vision framework).
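
A minimal sketch of that suggestion, assuming a module-level worker function such as the gist's image_to_text that takes a path string and returns a picklable result. The name ocr_many and its parameters are hypothetical, and whether Vision behaves well when called from several processes at once is not confirmed here.

```python
import multiprocessing as mp


def ocr_many(paths, worker, processes=4):
    """Run worker(path) for every path and return {path: result}.

    worker must be a picklable, module-level function. With processes <= 1
    the work runs inline in the current process, which is handy for
    debugging. Otherwise a pool of freshly spawned Python processes is
    used, so no PyObjC state is shared between them.
    """
    paths = [str(p) for p in paths]
    if processes <= 1:
        return dict(zip(paths, map(worker, paths)))
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return dict(zip(paths, pool.map(worker, paths)))


# Intended use, inside a script that also defines image_to_text:
#   if __name__ == "__main__":
#       texts = ocr_many(["a.png", "b.png"], image_to_text, processes=4)
```

The `if __name__ == "__main__":` guard is required with the spawn start method, because each child process re-imports the main module.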

@psungho

psungho commented Jul 19, 2023

Multiprocessing doesn't really seem to be friendly. I keep getting messages like "Object ID x,0 ref repaired", where x is a number.

@psungho

psungho commented Jul 20, 2023

I guess in theory you could use NSThreads instead? @RhetTbull

Not sure how much of a performance improvement it will bring. I'm a relatively new Obj-C coder (in fact, I'm learning it for a project I have). What I want to do is OCR a bunch of PDFs concurrently. Maybe there is some alternate solution?
