@ChaiBapchya
Last active August 1, 2019 20:42
Large Tensor Support ON vs OFF | MKL ON vs OFF | ctx=CPU

Runtime Features (INT64_TENSOR_SIZE ON, MKLDNN OFF)

  1. BLAS_APPLE : ✖ BLAS_APPLE
  2. BLAS_ATLAS : ✖ BLAS_ATLAS
  3. BLAS_MKL : ✖ BLAS_MKL
  4. BLAS_OPEN : ✔ BLAS_OPEN
  5. CAFFE : ✖ CAFFE
  6. CPU_AVX : ✔ CPU_AVX
  7. CPU_AVX2 : ✖ CPU_AVX2
  8. CPU_SSE : ✔ CPU_SSE
  9. CPU_SSE2 : ✔ CPU_SSE2
  10. CPU_SSE3 : ✔ CPU_SSE3
  11. CPU_SSE4A : ✖ CPU_SSE4A
  12. CPU_SSE4_1 : ✔ CPU_SSE4_1
  13. CPU_SSE4_2 : ✔ CPU_SSE4_2
  14. CUDA : ✔ CUDA
  15. CUDA_RTC : ✔ CUDA_RTC
  16. CUDNN : ✔ CUDNN
  17. CXX14 : ✖ CXX14
  18. DEBUG : ✔ DEBUG
  19. DIST_KVSTORE : ✖ DIST_KVSTORE
  20. F16C : ✔ F16C
  21. INT64_TENSOR_SIZE : ✔ INT64_TENSOR_SIZE
  22. JEMALLOC : ✔ JEMALLOC
  23. LAPACK : ✔ LAPACK
  24. MKLDNN : ✖ MKLDNN
  25. NCCL : ✖ NCCL
  26. OPENCV : ✔ OPENCV
  27. OPENMP : ✔ OPENMP
  28. PROFILER : ✖ PROFILER
  29. SIGNAL_HANDLER : ✔ SIGNAL_HANDLER
  30. SSE : ✖ SSE
  31. TENSORRT : ✖ TENSORRT
  32. TVM_OP : ✖ TVM_OP
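The result sets in this gist are distinguished by the INT64_TENSOR_SIZE and MKLDNN entries of the feature list. As a minimal sketch, a pasted list in the format above can be turned into a name-to-boolean mapping so those two flags are easy to check. The helper name `parse_features` is hypothetical, and the two sample lines are copied from the list above.

```python
import re

# Matches lines such as "  21. INT64_TENSOR_SIZE : ✔ INT64_TENSOR_SIZE".
FEATURE_LINE = re.compile(r"^\s*\d+\.\s*(\w+)\s*:\s*([✔✖])")

def parse_features(text):
    """Map each feature name to True (✔) or False (✖)."""
    flags = {}
    for line in text.splitlines():
        match = FEATURE_LINE.match(line)
        if match:
            name, mark = match.groups()
            flags[name] = (mark == "✔")
    return flags

sample = """
  21. INT64_TENSOR_SIZE : ✔ INT64_TENSOR_SIZE
  24. MKLDNN : ✖ MKLDNN
"""
flags = parse_features(sample)
print(flags)
```

Lines that do not match the pattern are skipped, so the whole list can be pasted in unchanged.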

Benchmark Results

Operator Avg Forward Time (ms) Avg. Backward Time (ms) Max Mem Usage (Storage) (Bytes) Inputs
FullyConnected 39.8663 109.7813 8.192 {'data': (32, 3, 256, 256), 'weight': (64, 196608), 'bias': (64,), 'num_hidden': 64}
argmin 55.1682 --- 4.096 {'data': (1024, 1024), 'axis': 0}
argmin 0.495 --- 0.002 {'data': (10000, 1), 'axis': 0}
argmin 35.3864 --- 0.2 {'data': (10000, 100), 'axis': 0}
argsort 139.6459 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
argsort 4.372 --- 20.0 {'data': (10000, 1), 'axis': 0}
argsort 174.554 --- 4000.0 {'data': (10000, 100), 'axis': 0}
broadcast_like 0.1231 --- 0.012 {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]}
broadcast_to 1.8884 --- 2097.1521 {'data': (1024, 1024), 'shape': (1024, 1024)}
broadcast_to 0.1103 --- 20.0 {'data': (10000, 1), 'shape': (10000, 1)}
broadcast_to 1.8062 --- 4000.0 {'data': (10000, 100), 'shape': (10000, 100)}
clip 0.5253 0.546 4194.3042 {'data': (1024, 1024), 'a_min': 1, 'a_max': 8}
clip 0.091 0.0767 20.0 {'data': (10000, 1), 'a_min': 1, 'a_max': 8}
clip 0.5106 0.5201 2000.0 {'data': (10000, 100), 'a_min': 1, 'a_max': 8}
depth_to_space 0.1011 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
depth_to_space 1.4567 --- 500.0 {'data': (10, 25, 10, 100), 'block_size': 5}
diag 0.3312 43.2945 4.092 {'data': (1024, 1024), 'k': 1}
diag 0.0787 0.4899 --- {'data': (10000, 1), 'k': 1}
diag 0.0947 41.6317 0.396 {'data': (10000, 100), 'k': 1}
expand_dims 0.4528 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
expand_dims 0.0648 --- 20.0 {'data': (10000, 1), 'axis': 0}
expand_dims 0.4266 --- 2000.0 {'data': (10000, 100), 'axis': 0}
flip 0.0893 --- 0.064 {'data': (1, 4, 2, 4), 'axis': 0}
flip 0.7804 --- 500.0 {'data': (10, 25, 10, 100), 'axis': 0}
ones_like 0.2238 --- 2097.1521 {'data': (1024, 1024)}
ones_like 0.0953 --- 40.0 {'data': (10000, 1)}
ones_like 0.2198 --- 2000.0 {'data': (10000, 100)}
pick 0.0908 43.4164 4.096 {'data': (1024, 1024), 'index': (1024,), 'axis': 0}
pick 0.0893 0.4924 0.002 {'data': (10000, 1), 'index': (1,), 'axis': 0}
pick 0.0893 40.8091 0.4 {'data': (10000, 100), 'index': (100,), 'axis': 0}
random_randint 4.6972 --- 2097.1521 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_randint 0.107 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_randint 4.501 --- 2000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
random_uniform 10.9605 --- 4194.3042 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_uniform 0.1915 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_uniform 10.4434 --- 4000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
softmax 3.8177 2.9884 2097.1521 {'data': (1024, 1024), 'axis': 0}
softmax 0.6586 0.2648 20.0 {'data': (10000, 1), 'axis': 0}
softmax 6.1592 5.1133 4000.0 {'data': (10000, 100), 'axis': 0}
sort 179.5931 --- 8388.6084 {'data': (1024, 1024), 'axis': 0}
sort 4.3598 --- 60.0 {'data': (10000, 1), 'axis': 0}
sort 149.9891 --- 8000.0 {'data': (10000, 100), 'axis': 0}
space_to_depth 0.0776 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
space_to_depth 1.5666 --- 500.0 {'data': (10, 25, 10, 100), 'block_size': 5}
split --- --- 4194.3042 {'data': (1024, 1024), 'num_outputs': 2, 'axis': 0}
split --- --- 30.0 {'data': (10000, 1), 'num_outputs': 2, 'axis': 0}
split --- --- 4000.0 {'data': (10000, 100), 'num_outputs': 2, 'axis': 0}
swapaxes 0.0934 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 0, 'dim2': 1}
swapaxes 0.0816 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 1, 'dim2': 2}
swapaxes 0.0762 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 2, 'dim2': 3}
swapaxes 0.0817 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 3, 'dim2': 0}
take 0.1875 79.8202 2097.1521 {'a': (1024, 1024), 'indices': (1024,), 'axis': 0}
take 0.0945 0.1832 0.002 {'a': (10000, 1), 'indices': (1,), 'axis': 0}
take 0.0949 32.4908 20.0 {'a': (10000, 100), 'indices': (100,), 'axis': 0}
tile 5.9054 13.9355 4194.3042 {'data': (1024, 1024), 'reps': (2,)}
tile 0.1792 0.1644 40.0 {'data': (10000, 1), 'reps': (2,)}
tile 7.7242 20.1307 8000.0 {'data': (10000, 100), 'reps': (2,)}
topk 140.7752 --- 4.096 {'data': (1024, 1024), 'axis': 0, 'k': 1}
topk 1.6848 --- 0.004 {'data': (10000, 1), 'axis': 0, 'k': 1}
topk 122.0279 --- 0.2 {'data': (10000, 100), 'axis': 0, 'k': 1}
transpose 0.0677 --- 0.064 {'data': (1, 4, 2, 4)}
transpose 1.58 --- 500.0 {'data': (10, 25, 10, 100)}
zeros_like 0.209 --- 2097.1521 {'data': (1024, 1024)}
zeros_like 0.0545 --- 20.0 {'data': (10000, 1)}
zeros_like 0.214 --- 2000.0 {'data': (10000, 100)}

Runtime Features (INT64_TENSOR_SIZE OFF, MKLDNN OFF)

  1. BLAS_APPLE : ✖ BLAS_APPLE
  2. BLAS_ATLAS : ✖ BLAS_ATLAS
  3. BLAS_MKL : ✖ BLAS_MKL
  4. BLAS_OPEN : ✔ BLAS_OPEN
  5. CAFFE : ✖ CAFFE
  6. CPU_AVX : ✔ CPU_AVX
  7. CPU_AVX2 : ✖ CPU_AVX2
  8. CPU_SSE : ✔ CPU_SSE
  9. CPU_SSE2 : ✔ CPU_SSE2
  10. CPU_SSE3 : ✔ CPU_SSE3
  11. CPU_SSE4A : ✖ CPU_SSE4A
  12. CPU_SSE4_1 : ✔ CPU_SSE4_1
  13. CPU_SSE4_2 : ✔ CPU_SSE4_2
  14. CUDA : ✔ CUDA
  15. CUDA_RTC : ✔ CUDA_RTC
  16. CUDNN : ✔ CUDNN
  17. CXX14 : ✖ CXX14
  18. DEBUG : ✔ DEBUG
  19. DIST_KVSTORE : ✖ DIST_KVSTORE
  20. F16C : ✔ F16C
  21. INT64_TENSOR_SIZE : ✖ INT64_TENSOR_SIZE
  22. JEMALLOC : ✔ JEMALLOC
  23. LAPACK : ✔ LAPACK
  24. MKLDNN : ✖ MKLDNN
  25. NCCL : ✖ NCCL
  26. OPENCV : ✔ OPENCV
  27. OPENMP : ✔ OPENMP
  28. PROFILER : ✖ PROFILER
  29. SIGNAL_HANDLER : ✔ SIGNAL_HANDLER
  30. SSE : ✖ SSE
  31. TENSORRT : ✖ TENSORRT
  32. TVM_OP : ✖ TVM_OP

Benchmark Results

Operator Avg Forward Time (ms) Avg. Backward Time (ms) Max Mem Usage (Storage) (Bytes) Inputs
FullyConnected 41.8118 108.6247 8.192 {'data': (32, 3, 256, 256), 'weight': (64, 196608), 'bias': (64,), 'num_hidden': 64}
argmin 22.8819 --- 4.096 {'data': (1024, 1024), 'axis': 0}
argmin 0.1974 --- 0.002 {'data': (10000, 1), 'axis': 0}
argmin 27.6257 --- 0.4 {'data': (10000, 100), 'axis': 0}
argsort 94.9644 --- 4194.3042 {'data': (1024, 1024), 'axis': 0}
argsort 4.0834 --- 20.0 {'data': (10000, 1), 'axis': 0}
argsort 105.5335 --- 4000.0 {'data': (10000, 100), 'axis': 0}
broadcast_like 0.1044 --- 0.012 {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]}
broadcast_to 1.2694 --- 4194.3042 {'data': (1024, 1024), 'shape': (1024, 1024)}
broadcast_to 0.1047 --- 20.0 {'data': (10000, 1), 'shape': (10000, 1)}
broadcast_to 1.1546 --- 2000.0 {'data': (10000, 100), 'shape': (10000, 100)}
clip 0.523 0.5314 2097.1521 {'data': (1024, 1024), 'a_min': 1, 'a_max': 8}
clip 0.0893 0.0773 20.0 {'data': (10000, 1), 'a_min': 1, 'a_max': 8}
clip 0.5078 0.4981 4000.0 {'data': (10000, 100), 'a_min': 1, 'a_max': 8}
depth_to_space 0.1104 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
depth_to_space 0.8795 --- 500.0 {'data': (10, 25, 10, 100), 'block_size': 5}
diag 0.2689 38.0228 4.092 {'data': (1024, 1024), 'k': 1}
diag 0.079 0.4904 --- {'data': (10000, 1), 'k': 1}
diag 0.0922 42.0098 0.396 {'data': (10000, 100), 'k': 1}
expand_dims 0.3525 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
expand_dims 0.0416 --- 20.0 {'data': (10000, 1), 'axis': 0}
expand_dims 0.3492 --- 4000.0 {'data': (10000, 100), 'axis': 0}
flip 0.0559 --- 0.064 {'data': (1, 4, 2, 4), 'axis': 0}
flip 0.3763 --- 500.0 {'data': (10, 25, 10, 100), 'axis': 0}
ones_like 0.2117 --- 2097.1521 {'data': (1024, 1024)}
ones_like 0.0925 --- 20.0 {'data': (10000, 1)}
ones_like 0.2218 --- 2000.0 {'data': (10000, 100)}
pick 0.0904 42.9551 4.096 {'data': (1024, 1024), 'index': (1024,), 'axis': 0}
pick 0.0773 0.4988 0.002 {'data': (10000, 1), 'index': (1,), 'axis': 0}
pick 0.0928 42.168 0.4 {'data': (10000, 100), 'index': (100,), 'axis': 0}
random_randint 3.3302 --- 2097.1521 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_randint 0.1273 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_randint 3.1818 --- 2000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
random_uniform 11.0115 --- 2097.1521 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_uniform 0.2116 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_uniform 9.1816 --- 2000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
softmax 3.651 1.9923 4194.3042 {'data': (1024, 1024), 'axis': 0}
softmax 0.6612 0.2269 20.0 {'data': (10000, 1), 'axis': 0}
softmax 5.883 4.8679 2000.0 {'data': (10000, 100), 'axis': 0}
sort 108.5507 --- 6291.4561 {'data': (1024, 1024), 'axis': 0}
sort 4.4298 --- 40.0 {'data': (10000, 1), 'axis': 0}
sort 102.9695 --- 6000.0 {'data': (10000, 100), 'axis': 0}
space_to_depth 0.1023 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
space_to_depth 0.9134 --- 500.0 {'data': (10, 25, 10, 100), 'block_size': 5}
split --- --- 3145.728 {'data': (1024, 1024), 'num_outputs': 2, 'axis': 0}
split --- --- 30.0 {'data': (10000, 1), 'num_outputs': 2, 'axis': 0}
split --- --- 3000.0 {'data': (10000, 100), 'num_outputs': 2, 'axis': 0}
swapaxes 0.0888 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 0, 'dim2': 1}
swapaxes 0.0907 --- 0.128 {'data': (1, 4, 2, 4), 'dim1': 1, 'dim2': 2}
swapaxes 0.0901 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 2, 'dim2': 3}
swapaxes 0.0907 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 3, 'dim2': 0}
take 0.169 97.8915 4194.3042 {'a': (1024, 1024), 'indices': (1024,), 'axis': 0}
take 0.071 0.1744 0.002 {'a': (10000, 1), 'indices': (1,), 'axis': 0}
take 0.097 41.6468 40.0 {'a': (10000, 100), 'indices': (100,), 'axis': 0}
tile 4.0543 12.992 4194.3042 {'data': (1024, 1024), 'reps': (2,)}
tile 0.158 0.153 40.0 {'data': (10000, 1), 'reps': (2,)}
tile 3.9041 11.8146 8000.0 {'data': (10000, 100), 'reps': (2,)}
topk 65.0825 --- 2.048 {'data': (1024, 1024), 'axis': 0, 'k': 1}
topk 0.7007 --- 0.002 {'data': (10000, 1), 'axis': 0, 'k': 1}
topk 57.9985 --- 0.2 {'data': (10000, 100), 'axis': 0, 'k': 1}
transpose 0.0808 --- 0.064 {'data': (1, 4, 2, 4)}
transpose 0.689 --- 500.0 {'data': (10, 25, 10, 100)}
zeros_like 0.1891 --- 2097.1521 {'data': (1024, 1024)}
zeros_like 0.0531 --- 20.0 {'data': (10000, 1)}
zeros_like 0.1904 --- 2000.0 {'data': (10000, 100)}

Runtime Features (INT64_TENSOR_SIZE ON, MKLDNN ON)

  1. BLAS_APPLE : ✖ BLAS_APPLE
  2. BLAS_ATLAS : ✖ BLAS_ATLAS
  3. BLAS_MKL : ✖ BLAS_MKL
  4. BLAS_OPEN : ✔ BLAS_OPEN
  5. CAFFE : ✖ CAFFE
  6. CPU_AVX : ✔ CPU_AVX
  7. CPU_AVX2 : ✖ CPU_AVX2
  8. CPU_SSE : ✔ CPU_SSE
  9. CPU_SSE2 : ✔ CPU_SSE2
  10. CPU_SSE3 : ✔ CPU_SSE3
  11. CPU_SSE4A : ✖ CPU_SSE4A
  12. CPU_SSE4_1 : ✔ CPU_SSE4_1
  13. CPU_SSE4_2 : ✔ CPU_SSE4_2
  14. CUDA : ✔ CUDA
  15. CUDA_RTC : ✔ CUDA_RTC
  16. CUDNN : ✔ CUDNN
  17. CXX14 : ✖ CXX14
  18. DEBUG : ✔ DEBUG
  19. DIST_KVSTORE : ✖ DIST_KVSTORE
  20. F16C : ✔ F16C
  21. INT64_TENSOR_SIZE : ✔ INT64_TENSOR_SIZE
  22. JEMALLOC : ✔ JEMALLOC
  23. LAPACK : ✔ LAPACK
  24. MKLDNN : ✔ MKLDNN
  25. NCCL : ✖ NCCL
  26. OPENCV : ✔ OPENCV
  27. OPENMP : ✔ OPENMP
  28. PROFILER : ✖ PROFILER
  29. SIGNAL_HANDLER : ✔ SIGNAL_HANDLER
  30. SSE : ✖ SSE
  31. TENSORRT : ✖ TENSORRT
  32. TVM_OP : ✖ TVM_OP

Benchmark Results

Operator Avg Forward Time (ms) Avg. Backward Time (ms) Max Mem Usage (Storage) (Bytes) Inputs
FullyConnected 41.5176 124.3903 8.192 {'data': (32, 3, 256, 256), 'weight': (64, 196608), 'bias': (64,), 'num_hidden': 64}
argmin 45.2478 --- 4.096 {'data': (1024, 1024), 'axis': 0}
argmin 0.5421 --- 0.002 {'data': (10000, 1), 'axis': 0}
argmin 34.6892 --- 0.2 {'data': (10000, 100), 'axis': 0}
argsort 144.3384 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
argsort 4.4859 --- 40.0 {'data': (10000, 1), 'axis': 0}
argsort 134.118 --- 2000.0 {'data': (10000, 100), 'axis': 0}
broadcast_like 0.1028 --- 0.012 {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]}
broadcast_to 1.9175 --- 4194.3042 {'data': (1024, 1024), 'shape': (1024, 1024)}
broadcast_to 0.1036 --- 20.0 {'data': (10000, 1), 'shape': (10000, 1)}
broadcast_to 1.8308 --- 2000.0 {'data': (10000, 100), 'shape': (10000, 100)}
clip 0.5579 0.5456 4194.3042 {'data': (1024, 1024), 'a_min': 1, 'a_max': 8}
clip 0.0926 0.0762 20.0 {'data': (10000, 1), 'a_min': 1, 'a_max': 8}
clip 0.4732 0.3969 4000.0 {'data': (10000, 100), 'a_min': 1, 'a_max': 8}
depth_to_space 0.1126 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
depth_to_space 1.0985 --- 500.0 {'data': (10, 25, 10, 100), 'block_size': 5}
diag 0.108 32.8571 2.046 {'data': (1024, 1024), 'k': 1}
diag 0.0832 0.429 --- {'data': (10000, 1), 'k': 1}
diag 0.0909 31.3499 0.198 {'data': (10000, 100), 'k': 1}
expand_dims 0.4257 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
expand_dims 0.0732 --- 20.0 {'data': (10000, 1), 'axis': 0}
expand_dims 0.4134 --- 2000.0 {'data': (10000, 100), 'axis': 0}
flip 0.096 --- 0.064 {'data': (1, 4, 2, 4), 'axis': 0}
flip 0.5488 --- 500.0 {'data': (10, 25, 10, 100), 'axis': 0}
ones_like 0.2262 --- 2097.1521 {'data': (1024, 1024)}
ones_like 0.0955 --- 20.0 {'data': (10000, 1)}
ones_like 0.2235 --- 2000.0 {'data': (10000, 100)}
pick 0.1023 32.9918 2.048 {'data': (1024, 1024), 'index': (1024,), 'axis': 0}
pick 0.0972 0.4588 0.002 {'data': (10000, 1), 'index': (1,), 'axis': 0}
pick 0.0944 31.4887 0.2 {'data': (10000, 100), 'index': (100,), 'axis': 0}
random_randint 3.9903 --- 2097.1521 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_randint 0.1409 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_randint 4.5664 --- 4000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
random_uniform 9.7449 --- 2097.1521 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_uniform 0.1891 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_uniform 12.2506 --- 4000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
softmax 3.7926 2.0096 4194.3042 {'data': (1024, 1024), 'axis': 0}
softmax 0.6555 0.2536 20.0 {'data': (10000, 1), 'axis': 0}
softmax 5.6594 4.7096 4000.0 {'data': (10000, 100), 'axis': 0}
sort 194.9989 --- 8388.6084 {'data': (1024, 1024), 'axis': 0}
sort 5.0902 --- 60.0 {'data': (10000, 1), 'axis': 0}
sort 151.5422 --- 8000.0 {'data': (10000, 100), 'axis': 0}
space_to_depth 0.089 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
space_to_depth 1.6132 --- 500.0 {'data': (10, 25, 10, 100), 'block_size': 5}
split --- --- 3145.728 {'data': (1024, 1024), 'num_outputs': 2, 'axis': 0}
split --- --- 30.0 {'data': (10000, 1), 'num_outputs': 2, 'axis': 0}
split --- --- 3000.0 {'data': (10000, 100), 'num_outputs': 2, 'axis': 0}
swapaxes 0.099 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 0, 'dim2': 1}
swapaxes 0.1007 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 1, 'dim2': 2}
swapaxes 0.099 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 2, 'dim2': 3}
swapaxes 0.1035 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 3, 'dim2': 0}
take 0.1628 104.0802 4194.3042 {'a': (1024, 1024), 'indices': (1024,), 'axis': 0}
take 0.0959 0.1749 0.004 {'a': (10000, 1), 'indices': (1,), 'axis': 0}
take 0.0996 41.8818 40.0 {'a': (10000, 100), 'indices': (100,), 'axis': 0}
tile 6.9296 15.6418 4194.3042 {'data': (1024, 1024), 'reps': (2,)}
tile 0.2069 0.1754 80.0 {'data': (10000, 1), 'reps': (2,)}
tile 7.3153 16.9362 8000.0 {'data': (10000, 100), 'reps': (2,)}
topk 161.6241 --- 4.096 {'data': (1024, 1024), 'axis': 0, 'k': 1}
topk 1.317 --- 0.002 {'data': (10000, 1), 'axis': 0, 'k': 1}
topk 110.4164 --- 0.2 {'data': (10000, 100), 'axis': 0, 'k': 1}
transpose 0.103 --- 0.064 {'data': (1, 4, 2, 4)}
transpose 0.2255 --- 500.0 {'data': (10, 25, 10, 100)}
zeros_like 0.2274 --- 2097.1521 {'data': (1024, 1024)}
zeros_like 0.0658 --- 40.0 {'data': (10000, 1)}
zeros_like 0.2247 --- 2000.0 {'data': (10000, 100)}

Runtime Features (INT64_TENSOR_SIZE OFF, MKLDNN ON)

  1. BLAS_APPLE : ✖ BLAS_APPLE
  2. BLAS_ATLAS : ✖ BLAS_ATLAS
  3. BLAS_MKL : ✖ BLAS_MKL
  4. BLAS_OPEN : ✔ BLAS_OPEN
  5. CAFFE : ✖ CAFFE
  6. CPU_AVX : ✔ CPU_AVX
  7. CPU_AVX2 : ✖ CPU_AVX2
  8. CPU_SSE : ✔ CPU_SSE
  9. CPU_SSE2 : ✔ CPU_SSE2
  10. CPU_SSE3 : ✔ CPU_SSE3
  11. CPU_SSE4A : ✖ CPU_SSE4A
  12. CPU_SSE4_1 : ✔ CPU_SSE4_1
  13. CPU_SSE4_2 : ✔ CPU_SSE4_2
  14. CUDA : ✔ CUDA
  15. CUDA_RTC : ✔ CUDA_RTC
  16. CUDNN : ✔ CUDNN
  17. CXX14 : ✖ CXX14
  18. DEBUG : ✔ DEBUG
  19. DIST_KVSTORE : ✖ DIST_KVSTORE
  20. F16C : ✔ F16C
  21. INT64_TENSOR_SIZE : ✖ INT64_TENSOR_SIZE
  22. JEMALLOC : ✔ JEMALLOC
  23. LAPACK : ✔ LAPACK
  24. MKLDNN : ✔ MKLDNN
  25. NCCL : ✖ NCCL
  26. OPENCV : ✔ OPENCV
  27. OPENMP : ✔ OPENMP
  28. PROFILER : ✖ PROFILER
  29. SIGNAL_HANDLER : ✔ SIGNAL_HANDLER
  30. SSE : ✖ SSE
  31. TENSORRT : ✖ TENSORRT
  32. TVM_OP : ✖ TVM_OP

Benchmark Results

Operator Avg Forward Time (ms) Avg. Backward Time (ms) Max Mem Usage (Storage) (Bytes) Inputs
FullyConnected 56.0858 152.1666 8.192 {'data': (32, 3, 256, 256), 'weight': (64, 196608), 'bias': (64,), 'num_hidden': 64}
argmin 35.4789 --- 4.096 {'data': (1024, 1024), 'axis': 0}
argmin 1.0013 --- 0.004 {'data': (10000, 1), 'axis': 0}
argmin 26.2885 --- 0.4 {'data': (10000, 100), 'axis': 0}
argsort 116.4863 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
argsort 4.3148 --- 20.0 {'data': (10000, 1), 'axis': 0}
argsort 118.7844 --- 4000.0 {'data': (10000, 100), 'axis': 0}
broadcast_like 0.0884 --- 0.012 {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]}
broadcast_to 1.2232 --- 2097.1521 {'data': (1024, 1024), 'shape': (1024, 1024)}
broadcast_to 0.1175 --- 20.0 {'data': (10000, 1), 'shape': (10000, 1)}
broadcast_to 1.1786 --- 2000.0 {'data': (10000, 100), 'shape': (10000, 100)}
clip 0.5708 0.8614 4194.3042 {'data': (1024, 1024), 'a_min': 1, 'a_max': 8}
clip 0.0988 0.0644 40.0 {'data': (10000, 1), 'a_min': 1, 'a_max': 8}
clip 0.7765 0.8307 4000.0 {'data': (10000, 100), 'a_min': 1, 'a_max': 8}
depth_to_space 0.1156 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
depth_to_space 0.9208 --- 1000.0 {'data': (10, 25, 10, 100), 'block_size': 5}
diag 0.0968 44.3838 4.092 {'data': (1024, 1024), 'k': 1}
diag 0.0779 0.4817 0.0 {'data': (10000, 1), 'k': 1}
diag 0.0935 42.0182 0.396 {'data': (10000, 100), 'k': 1}
expand_dims 0.4314 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
expand_dims 0.0651 --- 40.0 {'data': (10000, 1), 'axis': 0}
expand_dims 0.4453 --- 2000.0 {'data': (10000, 100), 'axis': 0}
flip 0.0898 --- 0.064 {'data': (1, 4, 2, 4), 'axis': 0}
flip 0.6172 --- 500.0 {'data': (10, 25, 10, 100), 'axis': 0}
ones_like 0.219 --- 2097.1521 {'data': (1024, 1024)}
ones_like 0.0938 --- 20.0 {'data': (10000, 1)}
ones_like 0.2058 --- 2000.0 {'data': (10000, 100)}
pick 0.1024 43.8672 4.096 {'data': (1024, 1024), 'index': (1024,), 'axis': 0}
pick 0.0888 0.4865 0.004 {'data': (10000, 1), 'index': (1,), 'axis': 0}
pick 0.1478 38.391 0.4 {'data': (10000, 100), 'index': (100,), 'axis': 0}
random_randint 3.403 --- 2097.1521 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_randint 0.1292 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_randint 3.2702 --- 4000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
random_uniform 9.6187 --- 2097.1521 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_uniform 0.1884 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_uniform 9.265 --- 2000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
softmax 4.053 2.3699 4194.3042 {'data': (1024, 1024), 'axis': 0}
softmax 0.679 0.2539 40.0 {'data': (10000, 1), 'axis': 0}
softmax 5.1732 4.2304 4000.0 {'data': (10000, 100), 'axis': 0}
sort 125.4268 --- 6291.4561 {'data': (1024, 1024), 'axis': 0}
sort 5.0743 --- 40.0 {'data': (10000, 1), 'axis': 0}
sort 109.5413 --- 6000.0 {'data': (10000, 100), 'axis': 0}
space_to_depth 0.1071 --- 0.128 {'data': (1, 4, 2, 4), 'block_size': 2}
space_to_depth 0.9492 --- 500.0 {'data': (10, 25, 10, 100), 'block_size': 5}
split --- --- 4194.3042 {'data': (1024, 1024), 'num_outputs': 2, 'axis': 0}
split --- --- 40.0 {'data': (10000, 1), 'num_outputs': 2, 'axis': 0}
split --- --- 4000.0 {'data': (10000, 100), 'num_outputs': 2, 'axis': 0}
swapaxes 0.0796 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 0, 'dim2': 1}
swapaxes 0.0792 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 1, 'dim2': 2}
swapaxes 0.098 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 2, 'dim2': 3}
swapaxes 0.0967 --- 0.064 {'data': (1, 4, 2, 4), 'dim1': 3, 'dim2': 0}
take 0.7868 102.5945 4194.3042 {'a': (1024, 1024), 'indices': (1024,), 'axis': 0}
take 0.101 0.2013 0.002 {'a': (10000, 1), 'indices': (1,), 'axis': 0}
take 0.2079 40.784 40.0 {'a': (10000, 100), 'indices': (100,), 'axis': 0}
tile 2.8714 9.0327 4194.3042 {'data': (1024, 1024), 'reps': (2,)}
tile 0.1468 0.1414 40.0 {'data': (10000, 1), 'reps': (2,)}
tile 3.4732 10.2888 8000.0 {'data': (10000, 100), 'reps': (2,)}
topk 91.2732 --- 4.096 {'data': (1024, 1024), 'axis': 0, 'k': 1}
topk 0.9208 --- 0.002 {'data': (10000, 1), 'axis': 0, 'k': 1}
topk 82.5583 --- 0.4 {'data': (10000, 100), 'axis': 0, 'k': 1}
transpose 0.1052 --- 0.064 {'data': (1, 4, 2, 4)}
transpose 0.2086 --- 500.0 {'data': (10, 25, 10, 100)}
zeros_like 0.2224 --- 2097.1521 {'data': (1024, 1024)}
zeros_like 0.0588 --- 20.0 {'data': (10000, 1)}
zeros_like 0.2694 --- 2000.0 {'data': (10000, 100)}
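To work with these tables programmatically, each row can be split into its five columns. The sketch below is one way to do that, assuming the row layout shown above: operator name, forward time, backward time, memory, then a Python-style dict of inputs, with `---` marking a missing value. The function names `parse_row` and `to_float` are hypothetical.

```python
import ast

def to_float(cell):
    """Convert a numeric cell to float; '---' means no measurement."""
    return None if cell == "---" else float(cell)

def parse_row(line):
    """Split one flattened table row into its five columns."""
    # The inputs column is the only one containing spaces, and it always
    # starts with '{', so split there first.
    head, _, inputs = line.partition(" {")
    operator, forward, backward, memory = head.split()
    return {
        "operator": operator,
        "forward_ms": to_float(forward),
        "backward_ms": to_float(backward),
        "max_mem": to_float(memory),
        "inputs": ast.literal_eval("{" + inputs),
    }

row = parse_row("argmin 55.1682 --- 4.096 {'data': (1024, 1024), 'axis': 0}")
print(row["operator"], row["forward_ms"], row["inputs"]["data"])
```

`ast.literal_eval` is used rather than `eval` so that only literal dicts, tuples, lists, and numbers are accepted from the inputs column.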