Week 2

06 June, 2020

Aaaaaand we're done with week two! At the start of the week I discovered this mind-blowing new object-detection model called DE⫶TR by the great minds at Facebook (they made PyTorch🔥 too!!). What's great is that all the pretrained models are easily available over at TorchHub and, if that wasn't enough, they went ahead and created a Google Colab notebook to demonstrate how easy it is to use. Unfortunately, a class definition of the model architecture is only available for the demo model and not for the full models on TorchHub. That meant I couldn't modify the models to include some extra processing steps. Most of the week was spent researching and dealing with the perils of production/distribution. All the code works like a charm on my system, but a user might not have Python and the required library dependencies installed on theirs. Thankfully, I wasn't the first to face this challenge. PyTorch provides two main ways of converting a model from what they call 'eager mode' to a production-ready version:

  1. Convert the model to ONNX
  2. Convert the model to TorchScript

Let's start with ONNX.

ONNX (Open Neural Network Exchange) is an open format built to represent machine learning models. ONNX defines a common set of operators - the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.

Awesome! ONNX supports, among others, frameworks like PyTorch and TensorFlow. In PyTorch, converting a model to ONNX is as simple as running torch.onnx.export on a model object and a sample input. (Spoiler: this is exactly how TorchScript tracing works too!) The export works by tracing a graph of the execution of the input through the model, so be sure to use scripting instead of tracing when your model's outputs depend on control flow. Another thing to note is that the export freezes most values as constants, so you must explicitly mark any input/output dimensions that are dynamic (in my case, the image shape could vary). Even though the documentation isn't very clear about how to define the input & output names and dynamic axes, this tutorial does a great job of explaining everything. The idea was to export the DE⫶TR model to an ONNX file, load that into WinML and make use of its Python API to run evaluations through the model. Easy, right? Nope. For some inexplicable reason, WinML refused to load the exported ONNX models and instead threw this cryptic and never-seen-before error message at me: RuntimeError: onecoreuap\windows\windowsai\winml\dll\learningmodel.cpp(390)\Windows.AI.MachineLearning.dll!719026C2: (caller: 71956928) Exception(2) tid(34d0) 80070057 The parameter is incorrect.😓 And even if it had worked, WinML's Python API still requires Python to be installed. Moreover, the code requires PyTorch, NumPy and Pillow to process the input image and produce meaningful results. While researching how to overcome these challenges, I came across TorchScript, which aims to solve exactly those issues!
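For reference, here is a minimal sketch of what such an export looks like. The Torch Hub entry point is the one from the DE⫶TR README; the input/output names and dynamic axes below are illustrative choices (pred_logits/pred_boxes mirror the keys DE⫶TR returns), not necessarily the exact values I used:

```python
import torch

# Load the pretrained DE⫶TR model from Torch Hub (as in the official demo).
model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)
model.eval()

# Tracing needs a concrete example input; the spatial dimensions are marked
# dynamic below so the exported graph isn't frozen to this particular size.
dummy_image = torch.randn(1, 3, 800, 1066)

torch.onnx.export(
    model,
    dummy_image,
    "detr_resnet50.onnx",
    input_names=["image"],
    output_names=["pred_logits", "pred_boxes"],
    dynamic_axes={"image": {2: "height", 3: "width"}},  # height/width can vary
    opset_version=11,
)
```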

TorchScript is an intermediate representation of a PyTorch model (subclass of nn.Module) that can then be run in a high-performance environment such as C++.

That just means you can run your model in any environment that can run C++. Perfect! It also turns out that PyTorch's ONNX export uses the same tracing machinery as TorchScript under the hood! TorchScript export is extremely easy too: just run torch.jit.trace and voilà! This time, fortunately, both the demo and full DE⫶TR models were exported without any issues. I hope to get them running first thing next week.
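A minimal tracing sketch, again with the Torch Hub model (the file name is arbitrary):

```python
import torch

# Same pretrained DE⫶TR model from Torch Hub as before.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)
model.eval()

# torch.jit.trace records the operations performed on the example input and
# produces a self-contained ScriptModule that no longer needs the Python class.
# strict=False lets tracing accept DE⫶TR's dict-of-tensors output.
example = torch.randn(1, 3, 800, 1066)
traced = torch.jit.trace(model, example, strict=False)

# The saved file can be loaded from C++ with torch::jit::load, or back in
# Python with torch.jit.load.
traced.save("detr_resnet50_traced.pt")
```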

In the meantime, getting the NVDA side of things working was extremely easy! The NVDA Developer Guide and the Add-on Developer Guide are very well written and make the whole process a breeze! It took me less than 10 minutes to understand and write a Global Plugin that binds a simple logger to a keyboard gesture. The next step is to identify image elements on the screen/browser, collect the image data and pass it to the model.
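To give an idea of how little code that takes, here is a minimal sketch of such a Global Plugin. It isn't the actual add-on code: the gesture, script name and messages are placeholders.

```python
# Lives in globalPlugins/ inside the add-on; NVDA loads it automatically.
import globalPluginHandler
import ui
from logHandler import log


class GlobalPlugin(globalPluginHandler.GlobalPlugin):

    def script_logHello(self, gesture):
        """Logs a test message (this docstring shows up as the script's description)."""
        # Write to the NVDA log and announce a short message via speech.
        log.info("Object-detection add-on: gesture received")
        ui.message("Logged a test message")

    # Map a keyboard gesture to the script above (name without the script_ prefix).
    __gestures = {
        "kb:NVDA+shift+l": "logHello",
    }
```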
