
Week 1

30 May, 2020

Hello!

This is the start of a blog that documents my experience and progress as a GSoC 2020 student with NV Access while working on my project titled Image captioning and Object recognition modules for NVDA! The purpose of these blog posts is not only to keep track of the work being done each week but also to serve as a guide for any future GSoC students! (By the way, feel free to reach out to me about it!)

The community bonding period was a blast! My mentors, Michael Curran and Reef Turner, along with the rest of the NVDA community, were incredibly warm and helpful! The community welcomed me and we discussed my project at great length. Some of the community members gave me helpful pointers (how did I not think of adding OCR alongside the object recognition module? 🤦‍♂️) and referred me to some previous work that hoped to accomplish the same goals! I spent most of

Week 2

06 June, 2020

Aaaaaand we're done with week two! At the start of the week I discovered this mind-blowing new object-detection model called DE⫶TR by the great minds at Facebook (they made PyTorch🔥 too!!). What's great is that all the pretrained models are easily available over at TorchHub, and if that wasn't enough, they went ahead and created a Google Colab notebook to demonstrate how easy it is to implement. Unfortunately, a class definition of the model architecture is only available for the demo model and not the full models on TorchHub. That meant I couldn't modify the models to include some processing steps. Most of the week was spent researching and dealing with the perils of production/distribution. All the code works like a charm on my system, but a user might not have Python and the required library dependencies installed on theirs.

Week 3

13 June, 2020

Week three is done! This week was pretty much a continuation of all the exploration and experimentation done last week. A lot of questions answered too!

Firstly, it turns out that the odd and cryptic WinML error was just because Windows doesn't support ONNX opset 10 as of yet (check out the compatibility here). I could just downgrade the opset version, right? Ah, if only it were that simple in real life 😓. Turns out the lower opsets don't support an operator called Dynamic Slice that is crucial for DE⫶TR to work. So WinML is off the table now, at least for this model. I did create an issue on DE⫶TR's GitHub repository last week but it didn't receive much traction. It does, however, bring us to two of our next points of discussion!

Always ask for help! No matter how stupid the question seems, it is always better to seek advice and/or help from those that migh

Week 4

20 June, 2020

It's already been a month!? Time does fly, huh...

This week we moved from the realm of Python to the realm of C++! TorchScript looked like an excellent candidate for the job, so I continued working with it. The first challenge was a rather silly one...reading images from the file system. This step is so easy in Python that you barely even think of it as a step. Just pip install Pillow or pip install opencv-python and you're good to go. Alas, it isn't as easy with C++. It took me quite some time to figure out just how to compile a library for a 32-bit system and then link it. In the end, I just blindly followed this old-ish blog and was finally able to do it.

OpenCV reads images in BGR format for some mysterious reason, so we first need to convert them to RGB. cv::cvtColor(image, image, cv::COLOR_BGR2RGB); does the job. Next, you normalize all the image data to the 0-1 range with `image.convertTo(img_float, CV_32FC3, 1.0f / 255.0f);` (the 1/255 scale factor maps 8-bit pixel values into 0-1).
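Putting this week's pieces together, here's a minimal sketch of the image-loading and preprocessing steps, assuming OpenCV is already compiled and linked as described above; the file path is a placeholder and the exact preprocessing constants depend on the model:

#include <opencv2/opencv.hpp>

int main() {
    // Read the image from disk (path is a placeholder).
    cv::Mat image = cv::imread("image.jpg");
    if (image.empty()) {
        return 1;  // cv::imread returns an empty Mat on failure
    }

    // OpenCV loads pixels in BGR order; the model expects RGB.
    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);

    // Scale the 8-bit values (0-255) into floats in the 0-1 range.
    cv::Mat img_float;
    image.convertTo(img_float, CV_32FC3, 1.0f / 255.0f);

    return 0;
}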

Week 5

27 June, 2020

Welp, thankfully I was wrong about the model being wrong. It was my code that was wrong😥 The ONNX model only returns a pointer to the first value of the output, which is stored contiguously in memory. To make processing the output easier, I decided to store the values in a 2D array. I declared my 2D array as float probs[92][100]; while my output dimension was 100 x 92 :P. That meant that even though the output straight from the model looked right, the results after processing were wildly different. Having fixed that, I had to reverse-engineer the postprocessing steps and implement them in C++. The first step was to apply the softmax function to each of the 100 rows. Softmax is just a function that normalizes a vector of values into probabilities: you divide the exponent of each value in the vector by the sum of exponents taken over the entire vector. This gives us the probability of each class (column) in each detection.
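Here's a minimal sketch of that first postprocessing step in C++, with the array declared in the correct 100 x 92 (detections x classes) order; the raw output is assumed to be a pointer to 100 * 92 contiguous floats, as described above:

#include <cmath>

const int NUM_DETECTIONS = 100;  // rows: one candidate detection each
const int NUM_CLASSES = 92;      // columns: one score per class

// output points to the first of 100 * 92 contiguous floats from the model,
// laid out row-major: element (i, j) lives at output[i * NUM_CLASSES + j].
void softmaxRows(const float* output, float probs[NUM_DETECTIONS][NUM_CLASSES]) {
    for (int i = 0; i < NUM_DETECTIONS; ++i) {
        float sum = 0.0f;
        for (int j = 0; j < NUM_CLASSES; ++j) {
            probs[i][j] = std::exp(output[i * NUM_CLASSES + j]);
            sum += probs[i][j];
        }
        for (int j = 0; j < NUM_CLASSES; ++j) {
            probs[i][j] /= sum;  // each row now sums to 1: class probabilities
        }
    }
}

A more numerically robust version would subtract each row's maximum before exponentiating, but this mirrors the definition above.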

Week 6

4 July, 2020

Last week, we created a DLL for the YOLOv3 darknet models and a client that could use it in Python. I started this week by using the outputs of the model, which are in the form

struct Detection {
    int classId;        // index of the predicted class label
    float probability;  // confidence score for that class
    int x1;             // bounding-box coordinates; the original snippet is
    int y1;             // truncated after x1, so y1/x2/y2 are my assumption
    int x2;
    int y2;
};
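For context, a DLL like this usually exposes a plain C interface so a Python client can call into it; the export below is purely illustrative (the real function name and signature may differ):

extern "C" __declspec(dllexport)
int doDetection(const char* imagePath, Detection* results, int maxResults);
// Hypothetical export: run the model on the image at imagePath, write up to
// maxResults Detection structs into results, and return the number found.

A Python client can then mirror the struct field-for-field (for example with ctypes) and call the export directly.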

Week 7

11 July, 2020

This was a very slow week: more thinking and decision making, less coding. There were a few issues with the file structure of the add-on. Turns out NVDA expects every Python file in the globalPlugins directory to contain an instance of globalPluginHandler.GlobalPlugin. After spending an embarrassingly long amount of time (read: two days), I was finally able to solve it by packaging all the code as a Python package. You can see the code here.

After discussions with my mentor Reef, I came to realize that I may have been focused on the wrong things. With such projects, it is quite easy to lose track of your initial goals and fly off on a tangent. I started worrying about things like the size and speed of the object detection models and lost focus on the real goal: to make an add-on that is useful and user-friendly for non-visual users. I wished to release the add-on and get feedback on which model the users thin

Week 8

18 July, 2020

Lots of coding this past week! I started out trying to fix most of the issues that the add-on release had. The biggest of these was that users were getting the “Cannot identify any objects in the image” message more often than useful results. After looking into it a little deeper, I discovered three potential problems that might be contributing to this issue.

  1. The model really couldn't identify any objects in the image. This was the most obvious one but also one over which I had no control. The release was shipped with the tiniest (lol) of the 3 models, Tiny-YOLOv3. This was by choice, since we didn't want anyone to be turned away from testing the add-on because of the download size. Of course, choosing a small model means the results won't be too good.
  2. Users were trying to run object detection on non-image elements on their screen. This seems a little unlikely but it was a case that needed to be handled anyway. Unfortunately, contentRecog.recogUi.recognizeNavigatorObject di

Week 9

25 July, 2020

Hello again! Let's start with a simple but dangerous mistake I made last week. Turns out, copying an std::string object character by character into a char * array isn't a good thing to do. The technique may seem simple and innocent enough, but it is prone to security risks like buffer overflow attacks. Using an inbuilt library function for such tasks is almost always safer! Well, almost always: the classic strcpy() has the same overflow problem, which is why it was replaced by the bounds-checked strcpy_s(). So I switched over to using that and we were ready to go!
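Here's a minimal sketch of the safer pattern; the string content and buffer size are placeholders:

#include <cstring>
#include <string>

int main() {
    std::string result = "dog;cat;person";  // placeholder recognition result

    // Buffer must be large enough for the characters plus the null terminator.
    char buffer[256];

    // Unlike a manual copy loop or strcpy(), strcpy_s() knows the destination
    // size and fails safely instead of overflowing the buffer.
    strcpy_s(buffer, sizeof(buffer), result.c_str());
    return 0;
}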

I spent most of this week working around the restrictions of NVDA's contentRecog module. The module seems to have been written with just OCR in mind, so it doesn't lend itself too well to any other kind of add-on/feature. For example, it is hard-coded to present recognition results in the form of a virtual window. Another issue is that the recognition result itself isn't easily accessible, so it cannot be stored for processing or any other use. For my add-

screenDC.py (created August 2, 2020): wxPython bounding boxes
import wx

class Frame(wx.Frame):
    def __init__(self, boxes):
        super(Frame, self).__init__(None, title="Bounding boxes")
        self.boxes = boxes
        self.boundingBoxes = []
        self.status = []