
Week 1

30 May, 2020

Hello!

This is the start of a blog that documents my experience and progress as a GSoC 2020 student with NV Access while working on my project titled Image captioning and Object recognition modules for NVDA! The purpose of these blog posts is not only to keep track of the work being done each week but also to serve as a guide for any future GSoC students! (By the way, feel free to reach out to me about it!)

The community bonding period was a blast! My mentors, Michael Curran and Reef Turner, along with the rest of the NVDA community, were incredibly warm and helpful! The community welcomed me and we discussed my project at great length. Some of the community members gave me helpful pointers (how did I not think of adding OCR alongside the object recognition module? 🤦‍♂️) and referred me to some previous work that hoped to accomplish the same goals! I spent most of

Week 2

06 June, 2020

Aaaaaand we're done with week two! At the start of the week I discovered this mind-blowing new object-detection model called DE⫶TR by the great minds at Facebook (they made PyTorch🔥 too!!). What's great is that all the pretrained models are easily available over at TorchHub, and if that wasn't enough, they went ahead and created a Google Colab notebook to demonstrate how easy it is to implement. Unfortunately, a class definition of the model architecture is only available for the demo model and not the full models on TorchHub. That meant I couldn't modify the models to include some processing steps. Most of the week was spent researching and dealing with the perils of production/distribution. All the code works like a charm on my system, but a user might not have Python and the required library dependencies installed on theirs.

Week 3

13 June, 2020

Week three is done! This week was pretty much a continuation of all the exploration and experimentation done last week. A lot of questions answered too!

Firstly, it turns out that the odd and cryptic WinML error was just because Windows doesn't support ONNX opset 10 as of yet (check out the compatibility here). I could just downgrade the opset version, right? Ah, if only it were that simple in real life 😓. Turns out the lower opsets don't support an operator called Dynamic Slice that is crucial for DE⫶TR to work. So WinML is off the table now, at least for this model. I did create an issue on DE⫶TR's GitHub repository last week but it didn't receive much traction. It does, however, bring us to two of our next points of discussion!

Always ask for help! No matter how stupid the question seems, it is always better to seek advice and/or help from those that migh

Week 4

20 June, 2020

It's already been a month!? Time does fly, huh...

This week we moved from the realm of Python to the realm of C++! TorchScript looked like an excellent candidate for the job, so I continued working with it. The first challenge was a rather silly one...reading images from the file system. This step is so easy in Python that you barely even think of it as a step. Just pip install Pillow or pip install opencv-python and you're good to go. Alas, it isn't as easy with C++. It took me quite some time to figure out just how to compile a library for a 32-bit system and then link it. In the end, I just blindly followed this old-ish blog and was finally able to do it.

OpenCV reads images in BGR format for some mysterious reason, so we first need to convert them to RGB. cv::cvtColor(image, image, cv::COLOR_BGR2RGB); does the job. Next, you normalize all the image data to the 0-1 range with `image.convertTo(img_float, CV_32FC3, 1.0f / 255.0f);` (the 1/255 scale factor maps 8-bit pixel values into 0-1).
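Putting this week's pieces together, here's a minimal sketch of the image-loading and preprocessing steps, assuming OpenCV is already compiled and linked as described above; the file path is a placeholder and the exact preprocessing constants depend on the model:

#include <opencv2/opencv.hpp>

int main() {
    // Read the image from disk (path is a placeholder).
    cv::Mat image = cv::imread("image.jpg");
    if (image.empty()) {
        return 1;  // cv::imread returns an empty Mat on failure
    }

    // OpenCV loads pixels in BGR order; the model expects RGB.
    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);

    // Scale the 8-bit values (0-255) into floats in the 0-1 range.
    cv::Mat img_float;
    image.convertTo(img_float, CV_32FC3, 1.0f / 255.0f);

    return 0;
}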

Week 5

27 June, 2020

Welp, thankfully I was wrong about the model being wrong. It was my code that was wrong😥 The ONNX model only returns a pointer to the first value of the output, which is stored contiguously in memory. To make processing the output easier, I decided to store the values in a 2D array. I declared my 2D array as float probs[92][100]; while my output dimension was 100 x 92 :P. That meant that even though the output straight from the model looked right, the results after processing were wildly different. Having fixed that, I had to reverse-engineer the postprocessing steps and implement them in C++. The first step was to apply the softmax function to each of the 100 rows. Softmax is just a function that normalizes a vector of values into probabilities: you divide the exponent of each value in the vector by the sum of exponents taken over the entire vector. This gives us the probability of each class (column) in each detection.
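Here's a minimal sketch of that first postprocessing step in C++, with the array declared in the correct 100 x 92 (detections x classes) order; the raw output is assumed to be a pointer to 100 * 92 contiguous floats, as described above:

#include <cmath>

const int NUM_DETECTIONS = 100;  // rows: one candidate detection each
const int NUM_CLASSES = 92;      // columns: one score per class

// output points to the first of 100 * 92 contiguous floats from the model,
// laid out row-major: element (i, j) lives at output[i * NUM_CLASSES + j].
void softmaxRows(const float* output, float probs[NUM_DETECTIONS][NUM_CLASSES]) {
    for (int i = 0; i < NUM_DETECTIONS; ++i) {
        float sum = 0.0f;
        for (int j = 0; j < NUM_CLASSES; ++j) {
            probs[i][j] = std::exp(output[i * NUM_CLASSES + j]);
            sum += probs[i][j];
        }
        for (int j = 0; j < NUM_CLASSES; ++j) {
            probs[i][j] /= sum;  // each row now sums to 1: class probabilities
        }
    }
}

A more numerically robust version would subtract each row's maximum before exponentiating, but this mirrors the definition above.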

Week 6

4 July, 2020

Last week, we created a DLL for the YOLOv3 darknet models and a client that could use it in Python. I started this week by using the outputs of the model, which are in the form

struct Detection {
    int classId;        // index of the predicted class label
    float probability;  // confidence score for that class
    int x1;             // bounding-box coordinates; the original snippet is
    int y1;             // truncated after x1, so y1/x2/y2 are my assumption
    int x2;
    int y2;
};
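For context, a DLL like this usually exposes a plain C interface so a Python client can call into it; the export below is purely illustrative (the real function name and signature may differ):

extern "C" __declspec(dllexport)
int doDetection(const char* imagePath, Detection* results, int maxResults);
// Hypothetical export: run the model on the image at imagePath, write up to
// maxResults Detection structs into results, and return the number found.

A Python client can then mirror the struct field-for-field (for example with ctypes) and call the export directly.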

Week 7

11 July, 2020

This was a very slow week: more thinking and decision making, less coding. There were a few issues with the file structure of the add-on. Turns out NVDA expects every Python file in the globalPlugins directory to contain an instance of globalPluginHandler.GlobalPlugin. After spending an embarrassingly long amount of time (read: two days), I was finally able to solve it by packaging all the code as a Python package. You can see the code here.

After discussions with my mentor Reef, I came to realize that I may have been focused on the wrong things. With such projects, it is quite easy to lose track of your initial goals and fly off on a tangent. I started worrying about things like the size and speed of the object detection models and lost focus on the real goal: to make an add-on that is useful and user-friendly for non-visual users. I wished to release the add-on and get feedback on which model the users thin

Week 8

18 July, 2020

Lots of coding this past week! I started out trying to fix most of the issues that the add-on release had. The biggest of these was that users were getting the “Cannot identify any objects in the image” message more often than useful results. After looking into it a little deeper, I discovered three potential problems that might be contributing to this issue.

  1. The model really couldn't identify any objects in the image. This was the most obvious one but also one over which I had no control. The release was shipped with the tiniest (lol) of the 3 models, Tiny-YOLOv3. This was by choice, since we didn't want anyone to be turned away from testing the add-on because of the download size. Of course, choosing a small model means the results won't be too good.
  2. Users were trying to run object detection on non-image elements on their screen. This seems a little unlikely but it was a case that needed to be handled anyway. Unfortunately, contentRecog.recogUi.recognizeNavigatorObject di

Week 9

25 July, 2020

Hello again! Let's start with a simple but dangerous mistake I made last week. Turns out, copying an std::string object character by character into a char * array isn't a good thing to do. The technique may seem simple and innocent enough, but it is prone to security risks like buffer overflow attacks. Using an inbuilt library function for such tasks is almost always safer! Well, almost always: the classic strcpy() has the same overflow problem, which is why it was replaced by the bounds-checked strcpy_s(). So I switched over to using that and we were ready to go!
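Here's a minimal sketch of the safer pattern; the string content and buffer size are placeholders:

#include <cstring>
#include <string>

int main() {
    std::string result = "dog;cat;person";  // placeholder recognition result

    // Buffer must be large enough for the characters plus the null terminator.
    char buffer[256];

    // Unlike a manual copy loop or strcpy(), strcpy_s() knows the destination
    // size and fails safely instead of overflowing the buffer.
    strcpy_s(buffer, sizeof(buffer), result.c_str());
    return 0;
}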

I spent most of this week working around the restrictions of NVDA's contentRecog module. The module seems to have been written with just OCR in mind, so it doesn't lend itself too well to any other kind of add-on/feature. For example, it is hard-coded to present recognition results in the form of a virtual window. Another issue is that the recognition result itself isn't easily accessible, so it cannot be stored for processing or any other use. For my add-

screenDC.py (created August 2, 2020): wxPython bounding boxes
import wx

class Frame(wx.Frame):
    def __init__(self, boxes):
        super(Frame, self).__init__(None, title="Bounding boxes")
        self.boxes = boxes
        self.boundingBoxes = []
        self.status = []