@1duo
Last active June 8, 2018 01:01
TensorRT Performance on NVIDIA Volta V100 GPU.

Ubuntu 16.04, CUDA 9.0, cuDNN 7, Python 2.7, TensorRT 4.0.0.3, ResNet-50 trained using Caffe.

Floating Point 32

BatchSize DataType Elapsed(ms) Images/sec
1 FP32 2.67725 373.517602016995
2 FP32 4.08289 489.849101984134
4 FP32 6.08236 657.639468890365
8 FP32 8.27648 966.594494277761
16 FP32 13.0539 1225.68734248003
32 FP32 23.1328 1383.31719463273
36 FP32 25.2694 1424.64799322501
40 FP32 27.9643 1430.39518242903
44 FP32 29.0452 1514.88025560161
48 FP32 31.2893 1534.0707526215
52 FP32 32.7521 1587.68445382128
56 FP32 38.3085 1461.81656812457
64 FP32 42.5329 1504.71752455158
128 FP32 80.5629 1588.82066062667
256 FP32 160.733 1592.70342742312
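The Images/sec column is consistent with batch size divided by the elapsed time, with the elapsed time read as milliseconds per batch. A minimal sketch of that arithmetic follows. The `throughput` helper is a hypothetical name introduced here for illustration and is not part of the original benchmark.

```shell
# Hypothetical helper: images/sec from batch size and elapsed time per batch.
# Assumes the Elapsed column is in milliseconds.
throughput() {
    awk -v b="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", b * 1000 / ms }'
}

throughput 1 2.67725     # prints 373.52
throughput 256 160.733   # prints 1592.70
```

Both values match the first and last FP32 rows above after rounding to two decimals.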

Floating Point 16 using Tensor Cores

BatchSize DataType Elapsed(ms) Images/sec
1 FP16 2.0779 481.255113335579
2 FP16 2.65073 754.509135219355
4 FP16 2.83658 1410.14884121019
8 FP16 3.2938 2428.80563482907
16 FP16 4.35118 3677.16343612537
32 FP16 6.58944 4856.25485625486
36 FP16 7.0443 5110.5148843746
40 FP16 7.63894 5236.328600565
44 FP16 7.93037 5548.29093724505
48 FP16 8.32809 5763.6264737773
52 FP16 8.64164 6017.37633134451
56 FP16 10.7251 5221.39653709523
64 FP16 11.5623 5535.23087966927
128 FP16 21.2693 6018.06359400638
256 FP16 39.7254 6444.2397055788

Integer 8

BatchSize DataType Elapsed(ms) Images/sec
1 INT8 1.22277 817.815288238998
2 INT8 1.71519 1166.05157446114
4 INT8 2.45023 1632.49980614065
8 INT8 2.90755 2751.45741259824
16 INT8 4.26373 3752.58283240261
32 INT8 6.78533 4716.05655141312
40 INT8 8.21903 4866.75434935753
48 INT8 8.94034 5368.9233295378
56 INT8 10.9543 5112.14774106972
64 INT8 11.7372 5452.74852605391
128 INT8 22.0699 5799.75441664892
256 INT8 43.2764 5915.46431773438
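To compare the three precisions, one option is to divide the Images/sec figures at the same batch size. The sketch below does this for batch size 256 using the numbers from the tables. The `speedup` helper is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: ratio of two throughput figures (images/sec).
speedup() {
    awk -v fast="$1" -v base="$2" 'BEGIN { printf "%.2f\n", fast / base }'
}

# Batch size 256, relative to FP32 (1592.70342742312 images/sec)
speedup 6444.2397055788  1592.70342742312   # FP16: prints 4.05
speedup 5915.46431773438 1592.70342742312   # INT8: prints 3.71
```

In these measurements, FP16 is slightly ahead of INT8 at batch size 256, while INT8 is ahead at the smallest batch sizes.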

Test Command:

# Batch size to benchmark; change it to reproduce the other rows in the tables
BATCH=56
DEPLOY=./deploy.prototxt
MODEL=./snapshots/resnet50.caffemodel

# FP32 (default precision)
/usr/src/tensorrt/bin/giexec --deploy=$DEPLOY --model=$MODEL --output=prob --batch=$BATCH
# FP16 (--half2)
/usr/src/tensorrt/bin/giexec --deploy=$DEPLOY --model=$MODEL --output=prob --batch=$BATCH --half2
# INT8 (--int8)
/usr/src/tensorrt/bin/giexec --deploy=$DEPLOY --model=$MODEL --output=prob --batch=$BATCH --int8
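The tables cover many batch sizes, so the commands above could be wrapped in a loop. The following is a sketch under assumptions, not the script used to produce the results: the `giexec_cmd` and `sweep` helpers are hypothetical, and it uses only the flags shown in the test command. When `giexec` is not found at the expected path, it prints the commands instead of running them.

```shell
#!/bin/sh
# Paths are taken from the test command; override them via the environment if needed.
GIEXEC=${GIEXEC:-/usr/src/tensorrt/bin/giexec}
DEPLOY=${DEPLOY:-./deploy.prototxt}
MODEL=${MODEL:-./snapshots/resnet50.caffemodel}

# Hypothetical helper: build the giexec command line for one batch size
# and an optional precision flag ("" for FP32, --half2, or --int8).
giexec_cmd() {
    printf '%s --deploy=%s --model=%s --output=prob --batch=%s%s\n' \
        "$GIEXEC" "$DEPLOY" "$MODEL" "$1" "${2:+ $2}"
}

# Hypothetical helper: run every batch size given as an argument in all
# three precisions. Falls back to printing the command if giexec is missing.
sweep() {
    for batch in "$@"; do
        for flag in "" --half2 --int8; do
            cmd=$(giexec_cmd "$batch" "$flag")
            if [ -x "$GIEXEC" ]; then
                $cmd    # relies on word splitting; assumes paths without spaces
            else
                echo "$cmd"
            fi
        done
    done
}

sweep 1 2 4 8 16 32 64 128 256
```

Keeping command construction in its own function makes it possible to check the generated command lines without a GPU or a TensorRT installation.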