@trevor-m
Last active January 7, 2021 23:09
Slow conv2d in TensorRT (nvprof comparison with TVM + cuDNN)
==21743== NVPROF is profiling process 21743, command: python3 -m pytest ../tests/python/contrib/test_tensorrt.py -s -k test_slow
enabled targets: llvm -device=arm_cpu; nvptx; cuda; llvm
pytest marker:
============================= test session starts ==============================
platform linux -- Python 3.6.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1
rootdir: /data/neo-ai-tvm, configfile: pytest.ini
plugins: arraydiff-0.2, cov-2.10.1, openfiles-0.3.0, remotedata-0.3.2, doctestplus-0.1.3
collected 48 items / 47 deselected / 1 selected
../tests/python/contrib/test_tensorrt.py [21:29:43] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:245: CUDNN Found 8 fwd algorithms, choosing CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED
[21:29:43] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 0) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED - time: 74.9478 ms, Memory: 3085369344
[21:29:43] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 1) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED - time: 80.2795 ms, Memory: 3085369344
[21:29:43] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 2) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: 152.414 ms, Memory: 18882048
[21:29:43] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 3) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: 154.598 ms, Memory: 18882048
[21:29:43] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 4) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM - time: 200.901 ms, Memory: 0
[21:29:43] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 5) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD - time: 214.456 ms, Memory: 52429824
[21:29:43] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 6) CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING - time: 262.18 ms, Memory: 4287102976
[21:29:43] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 7) CUDNN_CONVOLUTION_FWD_ALGO_GEMM - time: 284.236 ms, Memory: 8028979200
[21:29:45] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:245: CUDNN Found 8 fwd algorithms, choosing CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED
[21:29:45] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 0) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED - time: 74.7333 ms, Memory: 3085369344
[21:29:45] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 1) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED - time: 79.6577 ms, Memory: 3085369344
[21:29:45] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 2) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: 154.622 ms, Memory: 18882048
[21:29:45] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 3) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: 155.829 ms, Memory: 18882048
[21:29:45] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 4) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM - time: 201.082 ms, Memory: 0
[21:29:45] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 5) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD - time: 205.48 ms, Memory: 52429824
[21:29:45] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 6) CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING - time: 261.588 ms, Memory: 4287102976
[21:29:45] /data/neo-ai-tvm/src/runtime/contrib/cudnn/conv_forward.cc:248: 7) CUDNN_CONVOLUTION_FWD_ALGO_GEMM - time: 294.803 ms, Memory: 8028979200
[21:29:45] /data/neo-ai-tvm/src/driver/driver_api.cc:241: Specified target cuda -keys=cuda,gpu -libs=cudnn -max_num_threads=1024 -thread_warp_size=32 but cannot find device code. Did you forget to bind?
Mean inference time (std dev): 96.47 ms (1.35 ms)
.
=============================== warnings summary ===============================
tests/python/contrib/test_tensorrt.py::test_slow
/data/neo-ai-tvm/tests/python/contrib/test_tensorrt.py:110: DeprecationWarning: legacy graph runtime behavior of producing json / lib / params will be removed in the next release. Please see documents of tvm.contrib.graph_runtime.GraphModule for the new recommended usage.
graph, lib, params = relay.build(mod, params=params, target="cuda -libs=cudnn")
-- Docs: https://docs.pytest.org/en/stable/warnings.html
================= 1 passed, 47 deselected, 1 warning in 12.90s =================
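For reference, below is a minimal sketch of the cuDNN-offloaded TVM build and timing loop that the run above appears to exercise. This is a hypothetical reconstruction: the target string and the relay.build call come from the warning above, while the convolution shape is inferred from the TensorRT engine log further down (Float(2048,33,33) -> Float(256,33,33) with 18874368 bytes of fp32 weights, i.e. a 3x3, 2048->256 channel convolution on a 33x33 feature map). The actual test_slow case may differ.

# Hypothetical reconstruction of the cuDNN build referenced by the DeprecationWarning above.
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

data = relay.var("data", shape=(1, 2048, 33, 33), dtype="float32")
weight = relay.var("weight", shape=(256, 2048, 3, 3), dtype="float32")
conv = relay.nn.conv2d(data, weight, channels=256, kernel_size=(3, 3), padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], conv))
params = {"weight": np.random.uniform(-1, 1, (256, 2048, 3, 3)).astype("float32")}

# Offload conv2d to cuDNN rather than TVM's own CUDA schedules (same target string
# as in the warning above).
graph, lib, params = relay.build(mod, params=params, target="cuda -libs=cudnn")

ctx = tvm.gpu(0)
module = graph_runtime.create(graph, lib, ctx)
module.set_input("data", np.random.uniform(-1, 1, (1, 2048, 33, 33)).astype("float32"))
module.set_input(**params)

# Produces a "Mean inference time (std dev)" line like the one above.
ftimer = module.module.time_evaluator("run", ctx, number=1, repeat=10)
times_ms = np.array(ftimer().results) * 1000.0
print("Mean inference time (std dev): %.2f ms (%.2f ms)"
      % (np.mean(times_ms), np.std(times_ms)))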
==21743== Profiling application: python3 -m pytest ../tests/python/contrib/test_tensorrt.py -s -k test_slow
==21743== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 77.78% 3.06654s 41 74.794ms 62.740ms 93.789ms volta_sgemm_128x128_nn
19.34% 762.39ms 41 18.595ms 18.396ms 19.170ms void cudnn::winograd_nonfused::winogradForwardData4x4<float, float>(cudnn::winograd_nonfused::WinogradDataParams<float, float>)
2.37% 93.606ms 41 2.2831ms 2.2780ms 2.2896ms void cudnn::winograd_nonfused::winogradForwardOutput4x4<float, float>(cudnn::winograd_nonfused::WinogradOutputParams<float, float>)
0.51% 19.972ms 41 487.11us 481.53us 494.27us void cudnn::winograd_nonfused::winogradForwardFilter4x4<float, float>(cudnn::winograd_nonfused::WinogradFilterParams<float, float>)
API calls: 75.71% 3.94098s 11 358.27ms 84.617ms 400.60ms cudaStreamSynchronize
14.22% 740.17ms 164 4.5132ms 4.4100us 739.03ms cudaLaunchKernel
9.91% 515.66ms 226 2.2817ms 3.0950us 36.999ms cuModuleUnload
0.15% 7.9887ms 12 665.72us 5.9030us 3.3861ms cudaFree
0.01% 500.03us 1 500.03us 500.03us 500.03us cudaFreeHost
0.00% 66.178us 12 5.5140us 2.5890us 29.841us cudaStreamDestroy
0.00% 50.382us 56 899ns 463ns 3.6890us cudaSetDevice
0.00% 23.518us 30 783ns 563ns 2.7960us cudaEventDestroy
0.00% 22.516us 123 183ns 147ns 357ns cudaGetLastError
0.00% 13.924us 4 3.4810us 1.6840us 7.6110us cudaDeviceSynchronize
0.00% 1.9860us 3 662ns 309ns 1.0130us cuDevicePrimaryCtxRelease
0.00% 1.3780us 1 1.3780us 1.3780us 1.3780us cuDeviceGetCount
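The second profile below comes from a standalone TensorRT script (run.py) running the same convolution. A minimal sketch of what such a script might look like follows, assuming the TensorRT 7 Python API, an implicit-batch network (matching the 3-D CHW tensor shapes in the log), and the same 3x3, 2048->256 convolution; the real run.py may differ in its build options and execution loop.

# Hypothetical reconstruction of run.py: a single 2048->256, 3x3 convolution built
# through the TensorRT 7 Python API with verbose logging (which produces the
# [TensorRT] VERBOSE output below). Exact build options are assumptions.
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network()  # implicit batch, matching the 3-D CHW shapes below

data = network.add_input("data", trt.float32, (2048, 33, 33))
kernel = np.random.uniform(-1, 1, (256, 2048, 3, 3)).astype(np.float32)
conv = network.add_convolution(input=data, num_output_maps=256, kernel_shape=(3, 3),
                               kernel=kernel, bias=trt.Weights())
conv.padding = (1, 1)  # keeps the 33x33 spatial size
network.mark_output(conv.get_output(0))

builder.max_batch_size = 1
config = builder.create_builder_config()
config.max_workspace_size = 1 << 33  # 8 GiB, matching "Block size 8589934592" below
engine = builder.build_engine(network, config)

# Execution (the "ExecutionContext::execute" NVTX range below) then binds device
# buffers, e.g. with PyCUDA, and calls context.execute(batch_size=1, bindings=bindings)
# inside a timing loop.
context = engine.create_execution_context()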
==1519== NVPROF is profiling process 1519, command: python3 run.py
[TensorRT] VERBOSE: Applying generic optimizations to the graph for inference.
[TensorRT] VERBOSE: Original: 1 layers
[TensorRT] VERBOSE: After dead-layer removal: 1 layers
[TensorRT] VERBOSE: After Myelin optimization: 1 layers
[TensorRT] VERBOSE: After scale fusion: 1 layers
[TensorRT] VERBOSE: After vertical fusions: 1 layers
[TensorRT] VERBOSE: After dupe layer removal: 1 layers
[TensorRT] VERBOSE: After final dead-layer removal: 1 layers
[TensorRT] VERBOSE: After tensor merging: 1 layers
[TensorRT] VERBOSE: After concat removal: 1 layers
[TensorRT] VERBOSE: Graph construction and optimization completed in 0.000835183 seconds.
[TensorRT] VERBOSE: Constructing optimization profile number 0 [1/1].
[TensorRT] VERBOSE: --------------- Timing Runner: <reformat> (Reformat)
[TensorRT] VERBOSE: Tactic: 1002 time 11.811
[TensorRT] VERBOSE: Tactic: 0 time 21.3475
[TensorRT] VERBOSE: Fastest Tactic: 1002 Time: 11.811
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(1,33,1089,2230272) -> Float(1,33,1089,278784) ***************
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_medium_nn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn_winograd) Set Tactic Name: volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_xregs_large_nn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_small_nn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_xregs_large_nn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_small_nn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_medium_nn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_medium_nn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_small_nn_v1
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 0) [Convolution] (CaskConvolution)
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_medium_nn_v1
[TensorRT] VERBOSE: Tactic: 1825138533642645384 time 160.088
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn_winograd) Set Tactic Name: volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1
[TensorRT] VERBOSE: Tactic: 2775507031594384867 time 203.421
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_xregs_large_nn_v1
[TensorRT] VERBOSE: Tactic: 2842488832350522458 time 175.159
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_small_nn_v1
[TensorRT] VERBOSE: Tactic: 3915320020053085238 time 157.629
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_xregs_large_nn_v1
[TensorRT] VERBOSE: Tactic: 6448355332020552203 time 159.349
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_small_nn_v1
[TensorRT] VERBOSE: Tactic: 6808617066150061604 time 176.213
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_relu_medium_nn_v1
[TensorRT] VERBOSE: Tactic: -8060443123034038864 time 178.616
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_medium_nn_v1
[TensorRT] VERBOSE: Tactic: -4420849921117327522 time 236.755
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_relu_small_nn_v1
[TensorRT] VERBOSE: Tactic: -3946921629105938337 time 223.079
[TensorRT] VERBOSE: Fastest Tactic: 3915320020053085238 Time: 157.629
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 0) [Convolution] (CudaConvolution)
[TensorRT] VERBOSE: Tactic: 0 time 235.459
[TensorRT] VERBOSE: Tactic: 1 time 154.482
[TensorRT] VERBOSE: Tactic: 2 time 312.281
[TensorRT] VERBOSE: Tactic: 5 time 242.73
[TensorRT] VERBOSE: Tactic: 6 time 206.311
[TensorRT] VERBOSE: Tactic: 56 time 229.123
[TensorRT] VERBOSE: Tactic: 57 time 154.966
[TensorRT] VERBOSE: Tactic: 58 time 309.228
[TensorRT] VERBOSE: Tactic: 61 time 242.596
[TensorRT] VERBOSE: Tactic: 62 time 210.168
[TensorRT] VERBOSE: Fastest Tactic: 1 Time: 154.482
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 0) [Convolution] (CudaDepthwiseConvolution)
[TensorRT] VERBOSE: CudaDepthwiseConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 0) [Convolution] (CublasConvolution)
[TensorRT] VERBOSE: CublasConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: >>>>>>>>>>>>>>> Chose Runner Type: CudaConvolution Tactic: 1
[TensorRT] VERBOSE:
[TensorRT] VERBOSE: *************** Autotuning format combination: Float(2048,67584,1,2230272) -> Float(256,8448,1,278784) ***************
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_large_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_sliced1x2_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_sliced1x2_ldg4_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_exp_large_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_sliced1x2_ldg4_relu_exp_large_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 0) [Convolution] (CaskConvolution)
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: 861694390046228376 time 166.846
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_large_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: 1017870653102653567 time 164.691
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_sliced1x2_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: 5258189349241541167 time 199.062
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: 5821621277990374316 time 170.188
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: 5863767799113001648 time 197.194
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: -9147980667639709536 time 165.595
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_sliced1x2_ldg4_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: -8850904373104590857 time 200.528
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: -7751035352149795660 time 164.486
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_exp_large_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: -3853827649136781465 time 172.252
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_sliced1x2_ldg4_relu_exp_large_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: -3263369460438823196 time 202.494
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: Tactic: -423878181466897819 time 202.942
[TensorRT] VERBOSE: Fastest Tactic: -7751035352149795660 Time: 164.486
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 0) [Convolution] (CudaConvolution)
[TensorRT] VERBOSE: CudaConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 0) [Convolution] (CudaDepthwiseConvolution)
[TensorRT] VERBOSE: CudaDepthwiseConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: --------------- Timing Runner: (Unnamed Layer* 0) [Convolution] (CublasConvolution)
[TensorRT] VERBOSE: CublasConvolution has no valid tactics for this config, skipping
[TensorRT] VERBOSE: >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: -7751035352149795660
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE:
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_large_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_sliced1x2_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_sliced1x2_ldg4_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_relu_exp_large_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x64_sliced1x2_ldg4_relu_exp_large_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x32_sliced1x4_ldg4_relu_exp_medium_nhwc_tn_v1
[TensorRT] VERBOSE: (Unnamed Layer* 0) [Convolution] (scudnn) Set Tactic Name: volta_scudnn_128x128_ldg4_relu_exp_small_nhwc_tn_v1
[TensorRT] VERBOSE: --------------- Timing Runner: <reformat> (Reformat)
[TensorRT] VERBOSE: Tactic: 1002 time 1.49984
[TensorRT] VERBOSE: Tactic: 0 time 1.33254
[TensorRT] VERBOSE: Fastest Tactic: 0 Time: 1.33254
[TensorRT] VERBOSE: Formats and tactics selection completed in 20.1546 seconds.
[TensorRT] VERBOSE: After reformat layers: 1 layers
[TensorRT] VERBOSE: Block size 8589934592
[TensorRT] VERBOSE: Total Activation Memory: 8589934592
[TensorRT] INFO: Detected 1 inputs and 1 output network tensors.
[TensorRT] VERBOSE: Layer: (Unnamed Layer* 0) [Convolution] Weights: 18874368 HostPersistent: 8 DevicePersistent: 0
[TensorRT] VERBOSE: Total Host Persistent Memory: 8
[TensorRT] VERBOSE: Total Device Persistent Memory: 0
[TensorRT] VERBOSE: Total Weight Memory: 18874368
[TensorRT] VERBOSE: Engine generation completed in 21.6594 seconds.
[TensorRT] VERBOSE: Builder timing cache: created 4 entries, 0 hit(s)
[TensorRT] VERBOSE: Engine Layer Information:
[TensorRT] VERBOSE: Layer(Convolution): (Unnamed Layer* 0) [Convolution], Tactic: 1, input[Float(2048,33,33)] -> output[Float(256,33,33)]
[TensorRT] VERBOSE: Allocated persistent device memory of size 0
[TensorRT] VERBOSE: Allocated activation device memory of size 47793152
[TensorRT] VERBOSE: Assigning persistent memory blocks for various profiles
==1519== Profiling application: python3 run.py
==1519== Profiling result:
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 100.00% 154.90ms 1 154.90ms 154.90ms 154.90ms volta_scudnn_128x128_relu_small_nn_v1
0.00% 2.2720us 1 2.2720us 2.2720us 2.2720us cask_cudnn::computeOffsetsKernel(cask_cudnn::ComputeOffsetsParams)
API calls: 52.00% 1.52723s 2 763.62ms 14.257us 1.52722s cudaLaunchKernel
38.59% 1.13347s 1 1.13347s 1.13347s 1.13347s cuCtxDetach
5.27% 154.87ms 1 154.87ms 154.87ms 154.87ms cudaStreamSynchronize
3.98% 117.02ms 2 58.512ms 13.107ms 103.92ms cuMemFreeHost
0.08% 2.3135ms 2 1.1567ms 1.1184ms 1.1951ms cuMemFree
0.05% 1.4694ms 14 104.96us 4.2850us 1.0537ms cudaFree
0.01% 433.84us 1 433.84us 433.84us 433.84us cudaFreeHost
0.00% 59.491us 12 4.9570us 3.6560us 14.304us cudaStreamDestroy
0.00% 40.794us 52 784ns 418ns 4.4290us cudaEventDestroy
0.00% 18.659us 9 2.0730us 1.0440us 4.9380us cudaDeviceSynchronize
0.00% 5.6750us 1 5.6750us 5.6750us 5.6750us cudaEventRecord
0.00% 3.0340us 5 606ns 295ns 936ns cuCtxPopCurrent
0.00% 1.3990us 1 1.3990us 1.3990us 1.3990us cuDeviceGetCount
==1519== NVTX result:
==1519== Thread "<unnamed>" (id = 2083157824)
==1519== Domain "TensorRT"
==1519== Range "(Unnamed Layer* 0) [Convolution]"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 1.52760s 1 1.52760s 1.52760s 1.52760s (Unnamed Layer* 0) [Convolution]
GPU activities: 100.00% 154.90ms 1 154.90ms 154.90ms 154.90ms volta_scudnn_128x128_relu_small_nn_v1
0.00% 2.2720us 1 2.2720us 2.2720us 2.2720us cask_cudnn::computeOffsetsKernel(cask_cudnn::ComputeOffsetsParams)
API calls: 100.00% 1.52723s 2 763.62ms 14.257us 1.52722s cudaLaunchKernel
==1519== Range "ExecutionContext::execute"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 1.68252s 1 1.68252s 1.68252s 1.68252s ExecutionContext::execute
GPU activities: 100.00% 154.90ms 1 154.90ms 154.90ms 154.90ms volta_scudnn_128x128_relu_small_nn_v1
0.00% 2.2720us 1 2.2720us 2.2720us 2.2720us cask_cudnn::computeOffsetsKernel(cask_cudnn::ComputeOffsetsParams)
API calls: 100.00% 1.52723s 2 763.62ms 14.257us 1.52722s cudaLaunchKernel
==1519== Range "ExecutionContext::recompute"
Type Time(%) Time Calls Avg Min Max Name
Range: 100.00% 16.296us 1 16.296us 16.296us 16.296us ExecutionContext::recompute
No kernels were profiled in this range.
No API activities were profiled in this range.