Arrow FYP - Update on Software Component

General Update

  • Progress has been made on porting tests from the MLPerf Inference Benchmark suite to C++.
  • I have not yet started working on the gem5 model; I plan to do so over the remainder of the winter break.

Porting ML Inference Algorithms from Python to C++

Research:

  • The MLPerf Inference reference implementation supports multiple Python ML frameworks, such as TensorFlow, PyTorch, and ONNX.
  • Using tools like Cython/Nuitka to port the reference implementation 'as-is' led to many issues (e.g., missing dependencies).
  • I've opted to look into the Python ML frameworks mentioned above to see whether they offer a C/C++ API.
  • TensorFlow and PyTorch both offer a C++ API, which makes it possible to import ML models and run inference on them in pure C++.

Implementation:

  • I am currently using PyTorch's C++ API (libtorch) to write inference tests in pure C++; a minimal sketch follows this list.
  • Code Demo
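As a rough illustration of this approach, here is a minimal libtorch inference test. It assumes a model already exported to TorchScript from Python; the `resnet50.pt` file name and the 1x3x224x224 input shape are placeholders matching the ResNet50 benchmark:

```cpp
#include <torch/script.h>  // libtorch TorchScript API

#include <iostream>
#include <vector>

int main() {
  // Load a model previously exported from Python via torch.jit.trace/script.
  // "resnet50.pt" is a placeholder path.
  torch::jit::script::Module module;
  try {
    module = torch::jit::load("resnet50.pt");
  } catch (const c10::Error& e) {
    std::cerr << "Failed to load model: " << e.what() << '\n';
    return 1;
  }
  module.eval();

  // Dummy input batch: 1 x 3 x 224 x 224, matching ResNet50's expected shape.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::randn({1, 3, 224, 224}));

  // Run inference without autograd overhead and print the top-1 class index.
  torch::NoGradGuard no_grad;
  at::Tensor output = module.forward(inputs).toTensor();
  std::cout << "Top-1 class index: " << output.argmax(1).item<int64_t>() << '\n';
  return 0;
}
```

Such a program is built against libtorch via CMake, using `find_package(Torch REQUIRED)` and linking `${TORCH_LIBRARIES}`.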

Running valid MLPerf Inference Tests

Research:

  • The MLPerf Inference submission system contains a system-under-test (SUT), the Load Generator (LoadGen), a dataset, and an accuracy script. The dataset, LoadGen, and accuracy script are fixed for all submissions and are provided by MLPerf; submitters implement the SUT according to their architecture's requirements and engineering judgment. A skeletal sketch of these interfaces is shown below.
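For illustration, the skeleton below shows roughly what the SUT and QSL interfaces look like when implemented against LoadGen's C++ headers. The exact virtual method signatures vary between LoadGen versions (e.g., some versions drop `ReportLatencyResults` from the SUT interface), so treat this as a sketch rather than a drop-in implementation:

```cpp
#include <string>
#include <vector>

#include "loadgen.h"
#include "query_sample_library.h"
#include "system_under_test.h"

// Skeletal SUT: completes every query immediately with an empty response.
// A real SUT would run model inference (e.g., through libtorch) in IssueQuery.
class DummySut : public mlperf::SystemUnderTest {
 public:
  const std::string& Name() const override { return name_; }
  void IssueQuery(const std::vector<mlperf::QuerySample>& samples) override {
    std::vector<mlperf::QuerySampleResponse> responses;
    responses.reserve(samples.size());
    for (const auto& s : samples)
      responses.push_back({s.id, 0, 0});  // no result data in this stub
    mlperf::QuerySamplesComplete(responses.data(), responses.size());
  }
  void FlushQueries() override {}
  void ReportLatencyResults(
      const std::vector<mlperf::QuerySampleLatency>&) override {}

 private:
  std::string name_ = "DummySut";
};

// Skeletal QSL: pretends to manage a pool of 1024 preprocessed samples.
class DummyQsl : public mlperf::QuerySampleLibrary {
 public:
  const std::string& Name() const override { return name_; }
  size_t TotalSampleCount() override { return 1024; }
  size_t PerformanceSampleCount() override { return 1024; }
  void LoadSamplesToRam(
      const std::vector<mlperf::QuerySampleIndex>&) override {}
  void UnloadSamplesFromRam(
      const std::vector<mlperf::QuerySampleIndex>&) override {}

 private:
  std::string name_ = "DummyQsl";
};
```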


  • For the purposes of our research, we will focus on porting the Edge-device suite offered by MLPerf, summarized in the table below:
| Task | Model | Dataset | QSL Size | Quality | Required Scenarios | Reference App | Framework |
|------|-------|---------|----------|---------|--------------------|---------------|-----------|
| Image Classification | Resnet50-v1.5 | ImageNet (224x224) | 1024 | 99% of FP32 (76.46%) | Single Stream, Offline | Link | tensorflow, pytorch, onnx |
| Object Detection (large) | SSD-ResNet34 | COCO (1200x1200) | 64 | 99% of FP32 (0.20 mAP) | Single Stream, Offline | Link | tensorflow, pytorch, onnx |
| Object Detection (small) | SSD-MobileNets-v1 | COCO (300x300) | 256 | 99% of FP32 (0.22 mAP) | Single Stream, Offline | Link | tensorflow, pytorch, onnx |
| Medical Image Segmentation | 3D UNET | BraTS 2019 (224x224x160) | 16 | 99% of FP32 and 99.9% of FP32 (0.85300 mean DICE score) | Single Stream, Offline | Link | tensorflow (?), pytorch, onnx (?) |
| Speech-to-Text | RNNT | Librispeech dev-clean (samples < 15 seconds) | 2513 | 99% of FP32 (1 - WER, where WER=7.452253714852645%) | Single Stream, Offline | Link | tensorflow (?), pytorch, onnx (?) |
| Language Processing | BERT | SQuAD v1.1 (max_seq_len=384) | 10833 | 99% of FP32 (f1_score=90.874%) | Single Stream, Offline | Link | pytorch |
  • There are four evaluation scenarios in MLPerf Inference, selected to represent real-world critical inference applications: (1) single-stream, (2) multi-stream, (3) server, and (4) offline. The two we target map onto LoadGen settings as sketched after the table below.
| Scenario | Query Generation | Duration | Samples/query | Latency Constraint | Tail Latency | Performance Metric |
|----------|------------------|----------|---------------|--------------------|--------------|--------------------|
| Single stream | LoadGen sends the next query as soon as the SUT completes the previous query | 1024 queries and 60 seconds | 1 | None | 90% | 90th-percentile measured latency |
| Offline | LoadGen sends all queries to the SUT at the start | 1 query and 60 seconds | At least 24,576 | None | N/A | Measured throughput |
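Assuming SUT and QSL implementations along the lines of the earlier skeleton, the two scenarios we target map onto `mlperf::TestSettings` roughly as shown below. Field names follow LoadGen's `test_settings.h` from this era (newer versions add parameters to `StartTest`); the query/duration minimums mirror the table above, and the Offline QPS value is a placeholder:

```cpp
#include "loadgen.h"
#include "query_sample_library.h"
#include "system_under_test.h"
#include "test_settings.h"

// Single-stream performance run: LoadGen issues one sample per query and
// waits for the SUT to complete it before issuing the next.
void RunSingleStream(mlperf::SystemUnderTest* sut,
                     mlperf::QuerySampleLibrary* qsl) {
  mlperf::TestSettings settings;
  settings.scenario = mlperf::TestScenario::SingleStream;
  settings.mode = mlperf::TestMode::PerformanceOnly;
  settings.min_query_count = 1024;   // at least 1024 queries...
  settings.min_duration_ms = 60000;  // ...and at least 60 seconds
  mlperf::StartTest(sut, qsl, settings);
}

// Offline run: LoadGen sends all samples up front in one large query;
// the metric is measured throughput rather than per-query latency.
void RunOffline(mlperf::SystemUnderTest* sut,
                mlperf::QuerySampleLibrary* qsl) {
  mlperf::TestSettings settings;
  settings.scenario = mlperf::TestScenario::Offline;
  settings.mode = mlperf::TestMode::PerformanceOnly;
  settings.offline_expected_qps = 100;  // placeholder throughput hint
  mlperf::StartTest(sut, qsl, settings);
}
```

In a complete test harness, `main` would construct the concrete SUT and QSL, select a scenario from the command line, and call one of these helpers.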