TensorRT Performance on NVIDIA Volta V100 GPU
Environment: Ubuntu 16.04, CUDA 9.0, cuDNN 7, Python 2.7, TensorRT 4.0.0.3; ResNet-50 trained with Caffe. Elapsed times are per-batch latencies in milliseconds.
Floating Point 32

BatchSize   DataType   Elapsed (ms)   Images/sec
1           FP32       2.67725        373.5
2           FP32       4.08289        489.8
4           FP32       6.08236        657.6
8           FP32       8.27648        966.6
16          FP32       13.0539        1225.7
32          FP32       23.1328        1383.3
36          FP32       25.2694        1424.6
40          FP32       27.9643        1430.4
44          FP32       29.0452        1514.9
48          FP32       31.2893        1534.1
52          FP32       32.7521        1587.7
56          FP32       38.3085        1461.8
64          FP32       42.5329        1504.7
128         FP32       80.5629        1588.8
256         FP32       160.733        1592.7
Floating Point 16 using Tensor Cores

BatchSize   DataType   Elapsed (ms)   Images/sec
1           FP16       2.0779         481.3
2           FP16       2.65073        754.5
4           FP16       2.83658        1410.1
8           FP16       3.2938         2428.8
16          FP16       4.35118        3677.2
32          FP16       6.58944        4856.3
36          FP16       7.0443         5110.5
40          FP16       7.63894        5236.3
44          FP16       7.93037        5548.3
48          FP16       8.32809        5763.6
52          FP16       8.64164        6017.4
56          FP16       10.7251        5221.4
64          FP16       11.5623        5535.2
128         FP16       21.2693        6018.1
256         FP16       39.7254        6444.2
INT8

BatchSize   DataType   Elapsed (ms)   Images/sec
1           INT8       1.22277        817.8
2           INT8       1.71519        1166.1
4           INT8       2.45023        1632.5
8           INT8       2.90755        2751.5
16          INT8       4.26373        3752.6
32          INT8       6.78533        4716.1
40          INT8       8.21903        4866.8
48          INT8       8.94034        5368.9
56          INT8       10.9543        5112.1
64          INT8       11.7372        5452.7
128         INT8       22.0699        5799.8
256         INT8       43.2764        5915.5
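The Images/sec column in the tables above is derived from the per-batch latency: throughput = BatchSize * 1000 / Elapsed(ms). A quick sanity check against a few rows (values copied from the tables):

```python
# Verify: Images/sec = BatchSize * 1000 / Elapsed(ms).
# Rows taken from the tables above: (batch, elapsed_ms, reported_images_per_sec).
rows = [
    (1,    2.67725,  373.5),   # FP32, batch 1
    (56,  10.7251,  5221.4),   # FP16, batch 56
    (256, 43.2764,  5915.5),   # INT8, batch 256
]
for batch, elapsed_ms, reported in rows:
    throughput = batch * 1000.0 / elapsed_ms
    # Reported values are rounded, so allow a small tolerance.
    assert abs(throughput - reported) < 0.1, (batch, throughput, reported)
    print("batch %3d: %.1f images/sec" % (batch, throughput))
```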
Commands used (set BATCH to a single batch size; vary it to reproduce each row):

BATCH=56
DEPLOY=./deploy.prototxt
MODEL=./snapshots/resnet50.caffemodel
# FP32 (default precision)
/usr/src/tensorrt/bin/giexec --deploy=$DEPLOY --model=$MODEL --output=prob --batch=$BATCH
# FP16 (half2 mode, runs on Tensor Cores)
/usr/src/tensorrt/bin/giexec --deploy=$DEPLOY --model=$MODEL --output=prob --batch=$BATCH --half2
# INT8
/usr/src/tensorrt/bin/giexec --deploy=$DEPLOY --model=$MODEL --output=prob --batch=$BATCH --int8
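A full sweep over the batch sizes in the tables can be scripted. This sketch only prints the commands rather than running them, since the giexec location and model paths are system-specific assumptions; remove the leading echo to execute:

```shell
#!/bin/sh
# Dry-run sketch: print one giexec invocation per (batch size, precision)
# pair from the tables above. Paths below are assumptions from this document.
DEPLOY=./deploy.prototxt                # assumed model prototxt path
MODEL=./snapshots/resnet50.caffemodel   # assumed weights path
GIEXEC=/usr/src/tensorrt/bin/giexec     # assumed TensorRT 4 install location
for BATCH in 1 2 4 8 16 32 36 40 44 48 52 56 64 128 256; do
  for FLAG in "" "--half2" "--int8"; do   # empty flag = FP32
    echo "$GIEXEC --deploy=$DEPLOY --model=$MODEL --output=prob --batch=$BATCH $FLAG"
  done
done
```

Note the tables above do not report every batch size for INT8; the sweep simply covers the union of batch sizes that appear.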