Hello again!
Let's start with a simple but dangerous mistake I made last week. It turns out that copying an `std::string` object character by character into a `char *` array isn't a good idea. The technique may seem simple and innocent enough, but it is prone to security risks like buffer overflows. Using a standard library function for these tasks is almost always safer. In my case, `strcpy()` has a safer variant, `strcpy_s()`, which takes the destination buffer's size and so can't overrun it. I switched over to that and we were ready to go!
I spent most of this week working around the restrictions of NVDA's `contentRecog` module. The module seems to have been written with just OCR in mind, so it doesn't lend itself well to other kinds of add-ons/features. For example, it is hard-coded to present recognition results in a virtual window. Another issue is that the recognition result itself isn't very accessible, so it cannot be stored for processing or any other use. For my add-on, I'd like the result to be presented in a few other ways, and even stored so the user can access it as many times as they want. The virtual window is useful in that it lets users access the result character by character or word by word, and even copy it. However, it would be nice if the result was simply read out once it was available. It would also be useful if I could draw bounding boxes around the detected objects and announce them when the user moves the mouse or their finger over a box. I tried three solutions to overcome these challenges:
- Pass a `resultHandler` function to `contentRecog.recogUI.recognizeNavigatorObject`, which can be called from a function similar to `contentRecog.recogUI._recogOnResult`. Let's call this new function `_myRecogOnResult`. `_myRecogOnResult` is then passed to the `recognizer` object's (an object of a class derived from `contentRecog.ContentRecognizer`) `recognize` method, to be called when a result is ready. This solution is simple but not very flexible.
- (My mentor Reef came up with this one.) Use a `resultHandlerClass` class instead of the `resultHandler` function. This allows for more flexibility in handling the result, though it is a little more complex. The `resultHandlerClass` is passed as an argument when initializing the `recognizer`, which also includes a `getResultHandler` method that returns an object of `resultHandlerClass` containing the result as a member. `_recogOnResult` calls `getResultHandler`, and the returned `handler` object can then be used by future developers. You can check out the code of this solution here. (Because of course you can't just read a description of a solution and understand it without looking at the code :p)
- This solution was inspired by the previous one and aimed to simplify it and give a little more control to add-on developers. Here, the call to `recognizer.recognize` is made by the user's script, and the `handler` object is returned directly. It essentially takes the extra stack layer of `contentRecog.recogUI.recognizeNavigatorObject` out of the equation, and the code to handle the result needs to be written entirely in the `resultHandlerClass`'s `__init__` method. Unfortunately, with this solution you lose the only-one-recognition-at-a-time handling that `contentRecog.recogUI._recogOnResult` provides. You can check out this solution here.
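To make the second solution's shape concrete, here is a minimal, self-contained sketch. Only the names taken from the description above (`resultHandlerClass`, `getResultHandler`, `recognize`) are real; the stand-in recognizer, the fake result string, and the synchronous flow are my assumptions, not NVDA's actual API.

```python
class ResultHandler:
    """Receives the recognition result; add-on authors can subclass this."""
    def __init__(self, result):
        self.result = result
        # Any result processing (speaking it, drawing boxes, ...)
        # would be kicked off from here.

class SimpleRecognizer:
    """Hypothetical stand-in for a contentRecog.ContentRecognizer subclass."""
    def __init__(self, resultHandlerClass=ResultHandler):
        self.resultHandlerClass = resultHandlerClass
        self._result = None

    def recognize(self, image, onResult):
        # Real recognizers run asynchronously; here we "recognize"
        # synchronously and hand the result to the callback, mirroring
        # what _recogOnResult would be told about.
        self._result = "detected: cat, dog"
        onResult(self._result)

    def getResultHandler(self):
        # Wrap the stored result in the handler class so callers
        # (and future developers) get a handler object to work with.
        return self.resultHandlerClass(self._result)

handlers = []
recognizer = SimpleRecognizer()
recognizer.recognize(None, lambda result: handlers.append(
    recognizer.getResultHandler()))
print(handlers[0].result)  # -> detected: cat, dog
```

The point of the pattern is that the class, not a bare callback, owns the result, so it can be held onto after recognition finishes.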
In the end, I decided to go with the second solution. There wasn't any direct way to store the result, so I had to get a little "hacky": by defining the `resultHandlerClass`es in a file with a global variable, I can store the result in that variable and have access to it at any time!
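The global-variable trick can be sketched like this (the class and variable names are hypothetical, not the add-on's actual code):

```python
# Imagine this living in the add-on's handler-classes file: a
# module-level variable keeps the last result reachable even after
# the handler object itself is gone.
latestResult = None  # module-level "global" holding the last result

class StoringResultHandler:
    def __init__(self, result):
        global latestResult
        self.result = result
        latestResult = result  # stash the result for later access

handler = StoringResultHandler("detected: 2 objects")
print(latestResult)  # -> detected: 2 objects
```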
Next, I moved on to the problem of drawing bounding boxes around the detected objects. I figured drawing the boxes is a lot harder than converting in-image coordinates to screen coordinates, so that's the problem I chose to tackle first. I started by seeing whether NVDA's Focus Highlight feature could be used/copied/modified for my purposes, but the code was too difficult to understand, so I explored other options. wxPython lets you draw directly on the screen or even just create GUI elements, so I started experimenting with that. The first thing I tried was using `wx.ScreenDC()` to draw a rectangle. It worked, but the rectangle disappeared within milliseconds because the `wx.ScreenDC` object went out of scope. To keep it alive, I ran the drawing on a separate thread inside an infinite loop, so the thread never ended and the variable never went out of scope. Yes, I know that's not a good design, and the results were a little glitchy, so I had to try something else. Creating a new screen-sized window and drawing rectangles on that was my next solution. It worked perfectly well, except that making the window transparent made anything drawn on it transparent too. I'm still looking into ways of making just the background of the window transparent.
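As for the easier half, the in-image to screen-coordinate conversion, it might look something like this sketch. The function names, and the assumption that we know the captured image's on-screen position and the factor it was scaled by before detection, are mine, not NVDA's actual API:

```python
def imageToScreen(x, y, screenLeft, screenTop, scaleFactor):
    """Map a pixel (x, y) inside the captured image back to absolute
    screen coordinates, given where the capture sits on screen and the
    scale it was resized by before being fed to the detector."""
    return (screenLeft + round(x / scaleFactor),
            screenTop + round(y / scaleFactor))

def boxToScreen(box, screenLeft, screenTop, scaleFactor):
    """Convert a detection box (left, top, width, height) in image
    pixels to a screen-space box for drawing or mouse hit-testing."""
    left, top = imageToScreen(box[0], box[1],
                              screenLeft, screenTop, scaleFactor)
    right, bottom = imageToScreen(box[0] + box[2], box[1] + box[3],
                                  screenLeft, screenTop, scaleFactor)
    return (left, top, right - left, bottom - top)

# Say the navigator object was captured at (100, 50) on screen and
# scaled 2x before detection:
print(boxToScreen((40, 20, 80, 60), 100, 50, 2.0))  # -> (120, 60, 40, 30)
```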