@ShubhamJain7
Created June 20, 2020 18:07

Week 4

20 June, 2020

It's already been a month!? Time does fly, huh...

This week we moved from the realm of Python to the realm of C++! TorchScript looked like an excellent candidate for the job, so I continued working with it. The first challenge was a rather silly one: reading images from the file system. This step is so easy in Python that you barely even think of it as a step. Just `pip install Pillow` or `pip install opencv-python` and you're good to go. Alas, it isn't as easy with C++. It took me quite some time to figure out just how to compile a library for a 32-bit system and then link it. In the end, I just blindly followed this old-ish blog and was finally able to do it.
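Once OpenCV is compiled and linked, the reading step itself is only a couple of calls. A minimal sketch (the image path here is hypothetical):

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    // Read an image from disk; OpenCV loads it in BGR channel order.
    cv::Mat image = cv::imread("dog.jpg");  // hypothetical path
    if (image.empty()) {
        std::cerr << "Could not read image\n";
        return 1;
    }
    std::cout << image.cols << "x" << image.rows << "\n";
    return 0;
}
```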

OpenCV reads images in BGR format for some mysterious reason, so we first need to convert them to RGB. `cv::cvtColor(image, image, cv::COLOR_BGR2RGB);` does the job. Next, you scale all the image data to the 0–1 range with `image.convertTo(img_float, CV_32FC3, 1.0f / 255.0f);` because pretty much all ML models require your input to be normalized. Now, since DE⫶TR uses ResNet-50 as its backbone (quite literally!), we need to normalize the Red, Green, and Blue channels of the image with means [0.485, 0.456, 0.406] and standard deviations [0.229, 0.224, 0.225] respectively. God bless the LibTorch developers for including common image transforms for such tasks! Just run `torch::data::transforms::Normalize<> normalize_transform({ 0.485, 0.456, 0.406 }, { 0.229, 0.224, 0.225 });` and voilà! Your image is prepared. Well actually, not quite. Remember that ML models usually accept inputs in batches, so we need to add another dimension to our image with a simple call to `unsqueeze_(0)` on the normalized tensor before passing it through the model.

Unfortunately, the model takes far too long to process its inputs (I'm talking up to 10 minutes!) and then the process just ends without so much as a whimper. That made debugging the issue a little difficult, but that's a mystery we solve another day, because ONNX is back in the race!
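Strung together, the LibTorch preprocessing steps above look roughly like this (a sketch assuming OpenCV and LibTorch are linked; the image and model paths are hypothetical):

```cpp
#include <opencv2/opencv.hpp>
#include <torch/script.h>

int main() {
    cv::Mat image = cv::imread("dog.jpg");          // hypothetical path; loaded as BGR
    cv::cvtColor(image, image, cv::COLOR_BGR2RGB);  // BGR -> RGB

    cv::Mat img_float;
    image.convertTo(img_float, CV_32FC3, 1.0f / 255.0f);  // scale to [0, 1]

    // Wrap the OpenCV buffer in a tensor and reorder HxWxC -> CxHxW.
    torch::Tensor tensor = torch::from_blob(
        img_float.data, {img_float.rows, img_float.cols, 3}, torch::kFloat32);
    tensor = tensor.permute({2, 0, 1});

    // Normalize each channel with the ImageNet mean/std used by ResNet-50.
    torch::data::transforms::Normalize<> normalize_transform(
        {0.485, 0.456, 0.406}, {0.229, 0.224, 0.225});
    tensor = normalize_transform(tensor);
    tensor.unsqueeze_(0);  // add the batch dimension: CxHxW -> 1xCxHxW

    // Run the exported TorchScript model (hypothetical filename).
    torch::jit::script::Module model = torch::jit::load("detr.pt");
    auto output = model.forward({tensor});
    return 0;
}
```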

Turns out WinML isn't the only way to run ONNX models on Windows. You can use something called ONNX Runtime too! I assumed that they only provided a Python package and tried to just use that instead of dabbling with C++, but it just wouldn't install on Windows. I found out later that the issue was my 32-bit version of Python and not the package :P Since NVDA depends on 32-bit Python, that option was eliminated. Thankfully, my mentor Reef discovered this release of ONNX Runtime that includes a 32-bit compiled library! Setting it up was quite easy, but it didn't provide the same convenient transforms as LibTorch, so things had to be done manually. Here's how I normalized each channel:

```cpp
// Split image channels
std::vector<cv::Mat> channels(3);
cv::split(image_float, channels);

// Define mean and std-dev for each channel
std::vector<double> mean = { 0.485, 0.456, 0.406 };
std::vector<double> stddev = { 0.229, 0.224, 0.225 };
size_t i = 0;
// Normalize each channel with corresponding mean and std-dev values
for (auto& c : channels) {
    c = (c - mean[i]) / stddev[i];
    ++i;
}
```

The next few steps were quite hard to figure out owing to ONNX Runtime's non-existent documentation. Not only did they not provide any C++ documentation, the source code itself didn't contain any comments either. I had to scrape the code together by following a single code example (which they apparently felt was more than enough) and a few issues on their repo. It all went well until I saw the output and realized I had probably made a mistake while converting the model to the ONNX format. Oops!😬
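For reference, the pieces I scraped together amount to something like the following (a sketch assuming ONNX Runtime's C++ API; the model path, tensor names, and input size are hypothetical, not the real DE⫶TR export's):

```cpp
#include <onnxruntime_cxx_api.h>
#include <array>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "detr");
    Ort::SessionOptions options;
    Ort::Session session(env, L"detr.onnx", options);  // hypothetical model path

    // One 3x800x800 image, already normalized and laid out in CHW order.
    std::vector<float> input_data(1 * 3 * 800 * 800);
    std::array<int64_t, 4> shape{1, 3, 800, 800};

    Ort::MemoryInfo mem_info =
        Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem_info, input_data.data(), input_data.size(),
        shape.data(), shape.size());

    // Tensor names must match those baked into the exported model.
    const char* input_names[] = {"inputs"};                    // hypothetical
    const char* output_names[] = {"pred_logits", "pred_boxes"};  // hypothetical
    auto outputs = session.Run(Ort::RunOptions{nullptr},
                               input_names, &input, 1,
                               output_names, 2);
    return 0;
}
```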

@Raj-Dave368

Please tell me, where did you learn these things?
I also want to learn this.

@ShubhamJain7
Author

Hi @Raj-Dave368,
Could you be a little more specific about what you want to learn?

@Raj-Dave368

Raj-Dave368 commented Jan 13, 2021

Hi,

I just started preparing for GSoC, but projects like this one are a bit difficult for me to understand.
I don't know how to work with images in C++, so please suggest some resources I can learn from so that I can also work on this project.
One more doubt: it's a bit hard to understand the codebase of any project. Please give me some guidance on how I can understand a project and start contributing.
Thanks 👍

@ShubhamJain7
Author

I didn't know how to work with images in C++ before I took up this project either. You learn to solve problems only when you face them, don't worry about it. If working with images is what you're interested in, I would suggest you read the documentation for OpenCV.

It's always difficult to hop onto a codebase, sometimes even your own if it has been a while. Some "tricks" that help are: looking at the various modules of a project independently; figuring out what the "starting point" is and tracing your way through the program; and making physical notes and diagrams of how everything comes together.

If you've read my blog entries, you'll know how I struggled to integrate my work with NVDA because their codebase was so large and complicated. I could only do it with the help of a great mentor. So after you've put in some time trying to understand the code, you can always reach out to the authors/contributors with specific questions. You can also ask them how you could contribute (if there isn't a section about it in the readme and there aren't any open issues).

As for GSoC, look for an organization that is using the technologies you are comfortable with and whose mentors are willing to help you. More often than not, it's the mentors who will help you select, frame, and tackle your project. All the best!

@Raj-Dave368

Shubham sir, thanks a lot.
You explained everything and I understood.
Every sentence of yours will definitely help me improve.
Thanks for spending your time answering my confusion.
