Week 5

27 June, 2020

Welp, thankfully I was wrong about the model being wrong. It was my code that was wrong😥 The ONNX model only returned a pointer to the first value of the output, which was stored contiguously in memory. To make processing the output easier, I decided to store the values in a 2D array. I declared my 2D array as `float probs[92][100];` while my output dimensions were actually 100 x 92 :P That meant that even though the output straight from the model looked right, the results after processing were wildly different. Having fixed that, I had to reverse-engineer the postprocessing steps and implement them in C++. The first step was to apply the softmax function to each of the 100 rows. Softmax is just a function that normalizes a vector of values into probabilities: you divide the exponent of each value in the vector by the sum of the exponents taken over the entire vector. This gives us the probability of each class (column) in each detection.
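
Roughly, the fixed copy-then-softmax step looks something like this (the 100 × 92 sizes are the ones mentioned above; the function and variable names are just placeholders, not the exact implementation):

```cpp
#include <cmath>

const int NUM_DETECTIONS = 100;  // rows: one per detection
const int NUM_CLASSES = 92;      // columns: one per class

// `output` points to the first of 100 * 92 floats stored contiguously by the model.
void postprocess(const float* output, float probs[NUM_DETECTIONS][NUM_CLASSES]) {
    // Copy the flat buffer into a [100][92] array (not [92][100]!).
    for (int i = 0; i < NUM_DETECTIONS; ++i) {
        for (int j = 0; j < NUM_CLASSES; ++j) {
            probs[i][j] = output[i * NUM_CLASSES + j];
        }
    }
    // Softmax over each row: exp(x) divided by the sum of exp over the whole row.
    for (int i = 0; i < NUM_DETECTIONS; ++i) {
        float sum = 0.0f;
        for (int j = 0; j < NUM_CLASSES; ++j) {
            probs[i][j] = std::exp(probs[i][j]);
            sum += probs[i][j];
        }
        for (int j = 0; j < NUM_CLASSES; ++j) {
            probs[i][j] /= sum;  // each row now sums to 1
        }
    }
}
```

(A production version would subtract each row's maximum before exponentiating to avoid overflow, but the idea is the same.)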

Week 4

20 June, 2020

It's already been a month!? Time does fly, huh...

This week we moved from the realm of Python to the realm of C++! TorchScript looked like an excellent candidate for the job, so I continued working with it. The first challenge was a rather silly one... reading images from the file system. This step is so easy in Python, you barely even think of it as a step. Just pip install Pillow or pip install opencv-python and you're good to go. Alas, it isn't as easy with C++. It took me quite some time to figure out just how to compile a library for a 32-bit system and then link it. In the end, I just blindly followed this old-ish blog and was finally able to do it.
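
For reference, loading a serialized TorchScript model from C++ follows the standard libtorch pattern, roughly like the sketch below (the model filename is only a placeholder, not the actual file from the project):

```cpp
#include <torch/script.h>  // libtorch header that exposes torch::jit::load
#include <iostream>

int main() {
    torch::jit::script::Module module;
    try {
        // Deserialize the traced/scripted model (the path here is a placeholder).
        module = torch::jit::load("detr_traced.pt");
    } catch (const c10::Error& e) {
        std::cerr << "Error loading the model\n";
        return -1;
    }
    std::cout << "Model loaded successfully!\n";
    return 0;
}
```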

OpenCV reads images in BGR format for some mysterious reason, so we first need to convert it to RGB. `cv::cvtColor(image, image, cv::COLOR_BGR2RGB);` does the job. Next, you normalize all the image data to the 0-1 range with `image.convertTo(img_float, CV_32FC3, 1.0f / 255);`.
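
Putting those steps together, the image-loading part looks roughly like this (the function name and the tensor-conversion step at the end are my own sketch of what comes next, not necessarily the exact code from the project):

```cpp
#include <opencv2/opencv.hpp>
#include <torch/script.h>
#include <string>

// Read an image from disk and turn it into a float tensor the model can take.
torch::Tensor load_image_as_tensor(const std::string& path) {
    // OpenCV gives us the image in BGR order by default.
    cv::Mat image = cv::imread(path);
    // Swap the channels to RGB, which is what the model expects.
    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);
    // Scale pixel values from 0-255 down to the 0-1 range as 32-bit floats.
    cv::Mat img_float;
    image.convertTo(img_float, CV_32FC3, 1.0f / 255);
    // Wrap the data as a [1, H, W, 3] tensor, then reorder to [1, 3, H, W].
    torch::Tensor tensor = torch::from_blob(
        img_float.data, {1, img_float.rows, img_float.cols, 3}, torch::kFloat32);
    return tensor.permute({0, 3, 1, 2}).clone();  // clone so the tensor owns its memory
}
```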

Week 3

13 June, 2020

Week three is done! This week was pretty much a continuation of all the exploration and experimentation done last week. A lot of questions answered too!

Firstly, it turns out that the odd and cryptic WinML error was just because Windows doesn't support ONNX opset 10 as of yet (check out the compatibility here). I could just downgrade the opset version, right? Ah, if only it were that simple in real life😓. Turns out the lower opsets don't support an operator called Dynamic Slice that is crucial for DE⫶TR to work. So WinML is off the table now, at least for this model. I did create an issue on DE⫶TR's GitHub repository last week but it didn't receive much traction. It does, however, bring us to two of our next points of discussion!

Always ask for help! No matter how stupid the question seems, it is always better to seek advice and/or help from those that might know more.

Week 2

06 June, 2020

Aaaaaand we're done with week two! At the start of the week I discovered this mind-blowing new object-detection model called DE⫶TR by the great minds at Facebook (They made PyTorch🔥 too!!). What's great is that all the pretrained models are easily available over at TorchHub and, if that wasn't enough, they went ahead and created a Google Colab notebook to demonstrate how easy it is to implement. Unfortunately, a class definition of the model architecture is only available for the demo model and not the full models on TorchHub. That meant I couldn't modify the models to include some processing steps. Most of the week was spent researching and dealing with the perils of production/distribution. All the code works like a charm on my system but a user might not have Python and the required library dependencies installed on their system.

Week 1

30 May, 2020

Hello!

This is the start of a blog that documents my experience and progress as a GSoC 2020 student with NV ACCESS while working on my project titled Image captioning and Object recognition modules for NVDA! The purpose of these blog posts is to not only keep track of the work being done each week but also to serve as a guide to any future GSoC students! (By the way, feel free to reach out to me about it!)

The community bonding period was a blast! My mentors, Michael Curran and Reef Turner, along with the rest of the NVDA community were incredibly warm and helpful to me! The community welcomed me and we discussed my project at great length. Some of the community members gave me helpful pointers (How did I not think of adding an OCR along with the object recognition module 🤦‍♂️) and referred me to some previous work that hoped to accomplish the same goals! I spent most of