Week 9

25 July, 2020

Hello again! Let's start with a simple but dangerous mistake I made last week. Turns out, copying an std::string object character by character into a char * array isn't a good idea. The technique may seem simple and innocent enough, but it is prone to security risks like buffer-overflow attacks. Using a built-in library function to accomplish these tasks is always safer! Well, almost always. I replaced the manual copy with strcpy_s(), the bounds-checked variant of strcpy(), and we were ready to go!

I spent most of this week working around the restrictions of NVDA's contentRecog module. The module seems to have been written with just OCR in mind, so it doesn't lend itself well to any other kind of add-on or feature. For example, it is hard-coded to present recognition results in the form of a virtual window. Another issue is that the recognition result itself isn't exposed to add-on code, so it cannot be stored for further processing or any other use. For my add-on, I'd like the result to be presented in a few other ways and even stored so the user can access it as many times as they want. The virtual window is useful in that it lets users move through the result character by character or word by word and even copy it. However, it would be nice if the result was simply read out once it was available. It would also be useful if I could draw bounding boxes around the detected objects and announce them when the user moves the mouse or their finger over a box. I tried three solutions to overcome these challenges:

  1. Pass a resultHandler function to contentRecog.recogUI.recognizeNavigatorObject, which can be called from a function similar to contentRecog.recogUI._recogOnResult. Let's call this new function _myRecogOnResult. _myRecogOnResult is then passed to the recognize method of the recognizer (an object of a class derived from contentRecog.ContentRecognizer), to be called when a result is ready. This solution is simple but not very flexible.
  2. (My mentor Reef came up with this one) Use a resultHandlerClass class instead of a resultHandler function. This allows for more flexibility in handling the result, but it is a little more complex. The resultHandlerClass is passed as an argument when initializing the recognizer, and the recognizer also gets a getResultHandler method that returns an object of resultHandlerClass with the result stored as a member. _recogOnResult calls getResultHandler, and the returned handler object can then be used by future developers; there's a rough sketch of this pattern after the list. You can check out the code of this solution here. (Because of course you can't just read a description of a solution and understand it without looking at the code :p)
  3. This solution was inspired by the previous one and aimed to simplify it and give a little more control to add-on developers. Here, the call to recognizer.recognize is made by the user's script and the handler object is returned directly. It essentially takes the extra stack layer of contentRecog.recogUI.recognizeNavigatorObject out of the equation, and the code that handles the result needs to be written entirely in the resultHandlerClass's __init__ method. Unfortunately, with this solution you lose the guarantee that only one recognition runs at a time, which contentRecog.recogUI._recogOnResult provides. You can check out this solution here.
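
To make the second and third solutions a bit more concrete, here is a rough sketch of the resultHandlerClass idea, assuming a recognizer derived from contentRecog.ContentRecognizer. The names ObjectDetectionRecognizer and SpeechResultHandler are illustrative only, not the add-on's real code (that lives in the links above):

```python
# Sketch only: names other than ContentRecognizer and getResultHandler are invented.
from contentRecog import ContentRecognizer

class SpeechResultHandler:
    """Holds a recognition result and decides how to present it."""
    def __init__(self, result):
        self.result = result
        # e.g. speak the result here instead of opening a virtual window

class ObjectDetectionRecognizer(ContentRecognizer):
    def __init__(self, resultHandlerClass=SpeechResultHandler):
        super().__init__()
        self.resultHandlerClass = resultHandlerClass

    def getResultHandler(self, result):
        # Called once a result is ready (by _recogOnResult in solution 2, or
        # directly by the user's script in solution 3); the returned handler
        # object can then be reused by the add-on.
        return self.resultHandlerClass(result)
```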

In the end, I decided to go with the second solution. There wasn't any direct way to store the result, so I had to get a little "hacky": by defining the resultHandlerClasses in a module that has a global variable, I can store the result in that variable and have access to it at any time!
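
Here's a hedged sketch of that global-variable trick; the module and names (resultHandlers, latestResult, StoringResultHandler) are made up for illustration:

```python
# resultHandlers.py — hypothetical module inside the add-on
latestResult = None  # module-level store for the most recent recognition result

class StoringResultHandler:
    def __init__(self, result):
        global latestResult
        latestResult = result  # stash the result so the add-on can re-read it later
        self.result = result
        # present the result here (speak it, cache bounding boxes, etc.)
```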

Next, I moved on to the problem of drawing bounding boxes around the detected objects. I figured drawing the boxes is a lot harder than converting in-image coordinates to screen coordinates, so drawing is the problem I chose to tackle first. I started by checking whether NVDA's Focus Highlight feature could be used, copied, or modified for my purposes, but the code was too difficult to understand, so I explored other options. wxPython lets you draw directly on the screen or even just create GUI elements, so I started experimenting with that. The first thing I tried was using wx.ScreenDC() to draw a rectangle. It worked, but the rectangle disappeared within milliseconds because the wx.ScreenDC object went out of scope. So I moved the drawing to a separate thread and put it in an infinite loop so that the thread never ended and the variable never went out of scope. Yes, I know it's not a good design, and the results were a little glitchy, so I had to try something else. Creating a new screen-sized window and drawing rectangles on that was my next solution. It worked perfectly well, except that making the window transparent meant anything drawn on it became transparent too. I'm still looking into ways of making just the background of the window transparent.
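
For reference, the wx.ScreenDC experiment looked roughly like the sketch below. The coordinates and colours are placeholders, and in a real NVDA add-on the wx.App already exists:

```python
import wx

app = wx.App(False)  # only needed when running this outside NVDA

dc = wx.ScreenDC()
dc.SetPen(wx.Pen(wx.RED, 3))          # 3px red outline
dc.SetBrush(wx.TRANSPARENT_BRUSH)     # no fill, just the border
dc.DrawRectangle(100, 100, 200, 150)  # x, y, width, height in screen pixels
# The rectangle is gone as soon as the desktop repaints that region, which is
# why keeping the DC alive on another thread only produced glitchy results.
```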
