# SOFIE: Code generation for fast inference of Deep Learning models

ROOT/TMVA SOFIE (“System for Optimized Fast Inference code Emit”) is a new package, introduced in ROOT 6.26, that generates C++ functions which can easily be invoked for fast inference of trained neural network models. It takes ONNX model files as input and produces C++ header files that can be included and used in a “plug-and-go” style. This is a new development and still at an experimental stage, but SOFIE can take your trained ONNX model and generate blazingly fast C++ code from it, depending only on BLAS.
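As a sketch of the workflow (roughly following the example in the announcement linked below; the file names here are placeholders I picked), a short ROOT macro parses the ONNX file and emits the inference header:

```cpp
// ROOT macro: parse a trained ONNX model with SOFIE and emit C++ inference code.
// "Linear_2.onnx" / "Linear_2.hxx" are placeholder file names.
using namespace TMVA::Experimental;

void generate_model()
{
   SOFIE::RModelParser_ONNX parser;
   SOFIE::RModel model = parser.Parse("Linear_2.onnx"); // read the trained model
   model.Generate();                                    // generate the inference code
   model.OutputGenerated("Linear_2.hxx");               // write the header to disk
}
```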
- Announcement https://root.cern/doc/v626/release-notes.html#sofie-code-generation-for-fast-inference-of-deep-learning-models
- SOFIE was created by Sitong An (https://sitongan.github.io/), a Marie Curie Fellow at CERN
- Supported ONNX operators: https://github.com/root-project/root/blob/master/tmva/sofie/inc/TMVA/OperatorList.hxx
- SOFIE is part of TMVA, the ROOT Machine Learning library https://root.cern/manual/tmva/
- Sitong's demo models can be found here: https://github.com/sitongan/TMVAFastInferencePrototype
Requires building ROOT from source with experimental flags: https://root.cern/install/build_from_source/

```bash
git clone git@github.com:root-project/root.git
cd root
mkdir root_install root_build
cd root_build
cmake -DCMAKE_INSTALL_PREFIX=../root_install ../ -Dtmva-sofie=ON -Dtmva-pymva=ON -DPython3_EXECUTABLE=CHOOSEYOURPYTHONBINARY
cmake --build . --target install -j2
source ../root_install/bin/thisroot.sh
```
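Once ROOT is built and a header has been generated, it can be included directly in a project. A minimal sketch of the calling side, assuming a model named `Linear_2`, the `TMVA_SOFIE_<model>` namespace and `Session` pattern used by the generated headers, and a made-up input size:

```cpp
#include "Linear_2.hxx" // header generated by SOFIE

#include <vector>

int main()
{
   float input[2] = {0.5f, -0.5f}; // made-up input; match your model's input shape

   // The generated header wraps inference in a Session object that loads the
   // trained weights from the .dat file written alongside the header.
   TMVA_SOFIE_Linear_2::Session session("Linear_2.dat");
   std::vector<float> output = session.infer(input);
   return output.empty();
}
```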
Requires the Eigen library to be installed on Bela. I installed it via `apt-get install libeigen3-dev`, but for some reason could only get it included by editing the model header files to use `#include "../../../usr/include/eigen3/Eigen/Eigen"` :/. (Adding `/usr/include/eigen3` to the compiler's include path, e.g. with `-I/usr/include/eigen3`, would presumably be the cleaner fix.)
| ONNX model | Time (µs) | CPU (%) | Notes |
|---|---|---|---|
| Linear_2 | 650 | 9.3 | |
| Linear_4 | 720 | 9.5 | |
| Linear_8 | 920 | 10 | |
| Linear_16 | 1480 | 10.3 | |
| Linear_32 | | | segfault |
| Linear_64 | | | no .hxx file available |
| Linear_event | | | undefined reference to `sgemv_` |
| Linear_RDF | | | undefined reference to `sgemv_` |
| LinearNN | | | segfault |
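`sgemv_` is the single-precision matrix-vector multiply from BLAS, which the generated code depends on (see the announcement above), so the undefined references presumably mean those binaries weren't linked against a BLAS library (e.g. adding `-lblas` to the linker flags).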
Tried with this model: https://github.com/rodrigodzf/DeepLearningForBela/blob/main/python/mlp_pytorch.py and got ~570 ms per pass, apparently 25x slower than ArmNN!