It's already been a month!? Time does fly, huh...
This week we moved from the realm of Python to the realm of C++! TorchScript looked like an excellent candidate for the job, so I continued working with it. The first challenge was a rather silly one... reading images from the file system. This step is so easy in Python that you barely even think of it as a step. Just pip install Pillow or pip install opencv-python and you're good to go. Alas, it isn't as easy with C++. It took me quite some time to figure out how to compile a library for a 32-bit system and then link it. In the end, I just blindly followed this old-ish blog post and was finally able to do it.
OpenCV reads images in BGR format for some mysterious reason, so we first need to convert them to RGB. cv::cvtColor(image, image, cv::COLOR_BGR2RGB); does the job. Next, you normalize all the image data to the 0-1 range with image.convertTo(img_float, CV_32FC3, 1.0f / 255.0f); because pretty much all ML models require normalized input. Now, since DETR uses ResNet-50 as its backbone (quite literally!), we need to normalize the Red, Green, and Blue channels of the image with means [0.485, 0.456, 0.406] and standard deviations [0.229, 0.224, 0.225] respectively. God bless the LibTorch developers for including common image transforms for such tasks! Just run torch::data::transforms::Normalize<> normalize_transform({ 0.485, 0.456, 0.406 }, { 0.229, 0.224, 0.225 }); and voila! your image is prepared. Well, actually, not quite. Remember that ML models usually accept inputs in batches, so we need to add another dimension to our image with a simple call to unsqueeze_(0) on the normalized tensor before passing it through the model. Unfortunately, the model takes far too long to process its inputs (I'm talking up to 10 minutes!) and then the process just ends without so much as a whimper. That made debugging a little difficult, but that's a mystery we'll solve another day, because ONNX is back in the race!
Turns out WinML isn't the only way to run ONNX models on Windows. You can use something called ONNX Runtime too! I assumed that they only provided a Python package and tried to use that instead of dabbling with C++, but it just wouldn't install on Windows. I found out later that the issue was my 32-bit version of Python, not the package :P Since NVDA depends on 32-bit Python, that option was eliminated. Thankfully, my mentor Reef discovered this release of ONNX Runtime that includes a 32-bit compiled library! Setting it up was quite easy, but it didn't provide the same convenient transforms as LibTorch, so things had to be done manually. Here's how I normalized each channel:
// Split image channels
std::vector<cv::Mat> channels(3);
cv::split(image_float, channels);
// Define mean and std-dev for each channel
std::vector<double> mean = { 0.485, 0.456, 0.406 };
std::vector<double> stddev = { 0.229, 0.224, 0.225 };
size_t i = 0;
// Normalize each channel with corresponding mean and std-dev values
for (auto& c : channels) {
    c = (c - mean[i]) / stddev[i];
    ++i;
}
The next few steps were quite hard to figure out owing to ONNX Runtime's near-non-existent documentation. Not only did they not provide any C++ documentation, the source code itself doesn't contain many comments either. I had to scrape the code together by following a code example (which they apparently felt was more than enough) and a few issues on their repo. It all went well until I saw the output and realized I had probably made a mistake while converting the model to the ONNX format. Oops! 😬
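In case it saves someone else the scavenger hunt, this is roughly the shape the ONNX Runtime C++ code ends up taking. Treat it as a sketch rather than working code: the model path, the input/output names ("input"/"output"), and the 224x224 shape are all placeholders you'd swap for your own model's actual values.

```cpp
#include <onnxruntime_cxx_api.h>
#include <array>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "detr");
    Ort::SessionOptions options;
    // Windows builds of ONNX Runtime take a wide-character path.
    Ort::Session session(env, L"model.onnx", options);

    // The preprocessed image, flattened to 1x3xHxW as described earlier.
    std::vector<float> input_data(1 * 3 * 224 * 224, 0.0f);
    std::array<int64_t, 4> shape = { 1, 3, 224, 224 };

    auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator,
                                                  OrtMemTypeDefault);
    Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
        memory_info, input_data.data(), input_data.size(),
        shape.data(), shape.size());

    // These names must match the ones baked into the ONNX graph.
    const char* input_names[]  = { "input" };
    const char* output_names[] = { "output" };
    auto outputs = session.Run(Ort::RunOptions{ nullptr },
                               input_names, &input_tensor, 1,
                               output_names, 1);
    return 0;
}
```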