Array Manipulation branch MXNet OpPerf

Runtime Features

  1. BLAS_APPLE : ✖ BLAS_APPLE
  2. BLAS_ATLAS : ✖ BLAS_ATLAS
  3. BLAS_MKL : ✖ BLAS_MKL
  4. BLAS_OPEN : ✔ BLAS_OPEN
  5. CAFFE : ✖ CAFFE
  6. CPU_AVX : ✔ CPU_AVX
  7. CPU_AVX2 : ✖ CPU_AVX2
  8. CPU_SSE : ✔ CPU_SSE
  9. CPU_SSE2 : ✔ CPU_SSE2
  10. CPU_SSE3 : ✔ CPU_SSE3
  11. CPU_SSE4A : ✖ CPU_SSE4A
  12. CPU_SSE4_1 : ✔ CPU_SSE4_1
  13. CPU_SSE4_2 : ✔ CPU_SSE4_2
  14. CUDA : ✖ CUDA
  15. CUDA_RTC : ✖ CUDA_RTC
  16. CUDNN : ✖ CUDNN
  17. CXX14 : ✖ CXX14
  18. DEBUG : ✖ DEBUG
  19. DIST_KVSTORE : ✖ DIST_KVSTORE
  20. F16C : ✔ F16C
  21. INT64_TENSOR_SIZE : ✔ INT64_TENSOR_SIZE
  22. JEMALLOC : ✔ JEMALLOC
  23. LAPACK : ✔ LAPACK
  24. MKLDNN : ✔ MKLDNN
  25. NCCL : ✖ NCCL
  26. OPENCV : ✔ OPENCV
  27. OPENMP : ✔ OPENMP
  28. PROFILER : ✖ PROFILER
  29. SIGNAL_HANDLER : ✔ SIGNAL_HANDLER
  30. SSE : ✖ SSE
  31. TENSORRT : ✖ TENSORRT
  32. TVM_OP : ✖ TVM_OP
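
The feature list above can be turned into a lookup table when comparing builds. The following is a minimal sketch that assumes the exact line layout shown here ("NAME : ✔ NAME" or "NAME : ✖ NAME", optionally preceded by an index); the names FEATURE_LINE and parse_features are hypothetical helpers for illustration, not part of MXNet.

```python
import re

# Matches lines such as "  4. BLAS_OPEN : ✔ BLAS_OPEN"; the leading index is optional.
FEATURE_LINE = re.compile(r"^\s*(?:\d+\.\s*)?(\w+)\s*:\s*([✔✖])")


def parse_features(text):
    """Return {feature_name: enabled} for every feature line found in text."""
    features = {}
    for line in text.splitlines():
        match = FEATURE_LINE.match(line)
        if match:
            name, mark = match.groups()
            features[name] = (mark == "✔")
    return features


# Three lines copied from the list above.
sample = """
  4. BLAS_OPEN : ✔ BLAS_OPEN
  14. CUDA : ✖ CUDA
  24. MKLDNN : ✔ MKLDNN
"""

enabled = sorted(name for name, on in parse_features(sample).items() if on)
print(enabled)  # ['BLAS_OPEN', 'MKLDNN']
```

For this build the result would show a CPU-only configuration (OpenBLAS and MKLDNN enabled, CUDA disabled), which is worth keeping in mind when reading the timings below.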

Benchmark Results

Operator Avg Forward Time (ms) Avg Backward Time (ms) Max Mem Usage (Storage) (Bytes) Inputs
BatchNorm 2.4547 1.3703 25165.8359 {'data': (32, 3, 256, 256), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)}
BatchNorm 6.0028 1.5685 38400.0117 {'data': (32, 3, 10000, 10), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)}
BlockGrad 0.5237 --- 2097.1521 {'data': (1024, 1024)}
BlockGrad 0.0145 --- 20.0 {'data': (10000, 1)}
BlockGrad 0.5857 --- 2000.0 {'data': (10000, 100)}
CTCLoss --- --- 3492195.5 {'data': (1024, 100, 100), 'label': (100, 100)}
Convolution 0.5066 0.9342 98815.9375 {'data': (32, 3, 256), 'weight': (64, 3, 3), 'bias': (64,), 'kernel': (3,), 'stride': (1,), 'dilate': (1,), 'pad': (0,), 'num_filter': 64, 'layout': 'NCW'}
Deconvolution 5.2545 8.364 2835571.75 {'data': (32, 3, 256), 'weight': (3, 64, 3), 'bias': (64,), 'kernel': (3,), 'stride': (1,), 'dilate': (1,), 'pad': (0,), 'adj': (0,), 'num_filter': 64, 'no_bias': False, 'layout': 'NCW'}
Dropout 2.8362 0.6007 37748.7344 {'data': (32, 3, 256, 256), 'p': 0.5, 'mode': 'always'}
Dropout 0.2912 0.0546 400.0 {'data': (10000, 10), 'p': 0.5, 'mode': 'always'}
FullyConnected 2.465 86.5237 8.192 {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 196608), 'bias': (64,), 'flatten': True}
FullyConnected 1.1209 32.8333 6291.4561 {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 256), 'bias': (64,), 'flatten': False}
LeakyReLU 0.0696 0.0817 2097.1521 {'data': (1024, 1024), 'act_type': 'leaky', 'slope': 0.1}
LeakyReLU 0.0486 0.0466 40.0 {'data': (10000, 1), 'act_type': 'leaky', 'slope': 0.1}
LeakyReLU 0.0854 0.0839 2000.0 {'data': (10000, 100), 'act_type': 'leaky', 'slope': 0.1}
LeakyReLU 0.1382 0.7319 2097.1521 {'data': (1024, 1024), 'act_type': 'elu', 'slope': 0.1}
LeakyReLU 0.0844 0.0497 20.0 {'data': (10000, 1), 'act_type': 'elu', 'slope': 0.1}
LeakyReLU 0.1359 0.6828 2000.0 {'data': (10000, 100), 'act_type': 'elu', 'slope': 0.1}
LeakyReLU 0.7239 0.2762 2097.1521 {'data': (1024, 1024), 'act_type': 'selu'}
LeakyReLU 0.0382 0.0347 20.0 {'data': (10000, 1), 'act_type': 'selu'}
LeakyReLU 0.7685 0.3158 4000.0 {'data': (10000, 100), 'act_type': 'selu'}
LeakyReLU 0.4101 8.3719 4194.3042 {'data': (1024, 1024), 'act_type': 'prelu', 'gamma': (1, 1024)}
LeakyReLU 0.0425 0.7114 20.0 {'data': (10000, 1), 'act_type': 'prelu', 'gamma': (1, 1)}
LeakyReLU 0.3988 7.9939 4000.0 {'data': (10000, 100), 'act_type': 'prelu', 'gamma': (1, 100)}
MakeLoss 0.2846 0.0365 3478006.75 {'data': (1024, 1024)}
MakeLoss 0.0324 0.0239 3475949.75 {'data': (10000, 1)}
MakeLoss 0.086 0.036 3479929.75 {'data': (10000, 100)}
Pooling 0.2163 0.2732 49.152 {'data': (32, 3, 256), 'kernel': 3, 'pool_type': 'avg', 'global_pool': 0, 'stride': 1, 'pad': 1}
SwapAxis 3.0576 3.1598 4194.3042 {'data': (1024, 1024), 'dim1': 0, 'dim2': 1}
SwapAxis 0.0548 0.0503 20.0 {'data': (10000, 1), 'dim1': 0, 'dim2': 1}
SwapAxis 2.5586 2.5601 2000.0 {'data': (10000, 100), 'dim1': 0, 'dim2': 1}
abs 0.0555 0.2705 2097.1521 {'data': (1024, 1024)}
abs 0.0163 0.0775 20.0 {'data': (10000, 1)}
abs 0.0544 0.3541 4000.0 {'data': (10000, 100)}
adam_update 0.3242 --- 2770335.75 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mean': (1024, 1024), 'var': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 'rescale_grad': 0.4, 'lazy_update': 0}
adam_update 0.0376 --- 2768278.5 {'weight': (10000, 1), 'grad': (10000, 1), 'mean': (10000, 1), 'var': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 'rescale_grad': 0.4, 'lazy_update': 0}
adam_update 0.6383 --- 2772258.5 {'weight': (10000, 100), 'grad': (10000, 100), 'mean': (10000, 100), 'var': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 'rescale_grad': 0.4, 'lazy_update': 0}
arccos 0.8253 0.6489 2097.1521 {'data': (1024, 1024)}
arccos 0.0488 0.0268 20.0 {'data': (10000, 1)}
arccos 0.8131 0.7397 2000.0 {'data': (10000, 100)}
arccosh 0.7674 0.8093 2097.1521 {'data': (1024, 1024)}
arccosh 0.0329 0.0293 20.0 {'data': (10000, 1)}
arccosh 0.7672 0.7945 2000.0 {'data': (10000, 100)}
arcsin 0.752 0.6436 2097.1521 {'data': (1024, 1024)}
arcsin 0.0312 0.0273 20.0 {'data': (10000, 1)}
arcsin 0.774 0.7634 2000.0 {'data': (10000, 100)}
arcsinh 1.4861 0.5057 4194.3042 {'data': (1024, 1024)}
arcsinh 0.0447 0.0333 40.0 {'data': (10000, 1)}
arcsinh 1.7063 0.6787 2000.0 {'data': (10000, 100)}
arctan 0.635 0.0452 2097.1521 {'data': (1024, 1024)}
arctan 0.0384 0.0246 20.0 {'data': (10000, 1)}
arctan 0.883 0.0811 2000.0 {'data': (10000, 100)}
arctanh 1.307 0.0802 2097.1521 {'data': (1024, 1024)}
arctanh 0.0547 0.0343 20.0 {'data': (10000, 1)}
arctanh 1.2393 0.0824 4000.0 {'data': (10000, 100)}
argmax 18.4832 --- 2.048 {'data': (1024, 1024), 'axis': 0}
argmax 0.1647 --- 0.002 {'data': (10000, 1), 'axis': 0}
argmax 14.4155 --- 0.2 {'data': (10000, 100), 'axis': 0}
argmax_channel 0.6482 --- 2.048 {'data': (1024, 1024)}
argmax_channel 0.0435 --- 20.0 {'data': (10000, 1)}
argmax_channel 0.4859 --- 20.0 {'data': (10000, 100)}
argmin 17.7454 --- 2.048 {'data': (1024, 1024), 'axis': 0}
argmin 0.1595 --- 0.002 {'data': (10000, 1), 'axis': 0}
argmin 14.3292 --- 0.2 {'data': (10000, 100), 'axis': 0}
argsort 64.5372 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
argsort 1.3778 --- 20.0 {'data': (10000, 1), 'axis': 0}
argsort 59.7691 --- 2000.0 {'data': (10000, 100), 'axis': 0}
batch_dot 535.6865 --- 134217.7344 {'lhs': (32, 1024, 1024), 'rhs': (32, 1024, 1024)}
batch_dot 380.7793 --- 128000.0 {'lhs': (32, 1000, 10), 'rhs': (32, 1000, 10), 'transpose_b': True}
batch_dot 4.269 --- 12.8 {'lhs': (32, 1000, 1), 'rhs': (32, 100, 1000), 'transpose_a': True, 'transpose_b': True}
broadcast_add 0.0805 0.104 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_add 0.0389 0.0496 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_add 0.0286 0.0267 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_axes 0.0367 --- 4.096 {'data': (1, 1024), 'axis': 0, 'size': 2}
broadcast_axes 0.0364 --- 0.004 {'data': (1, 1), 'axis': 0, 'size': 2}
broadcast_axes 0.0357 --- 0.4 {'data': (1, 100), 'axis': 0, 'size': 2}
broadcast_axis 0.0362 --- 4.096 {'data': (1, 1024), 'axis': 0, 'size': 2}
broadcast_axis 0.0357 --- 0.004 {'data': (1, 1), 'axis': 0, 'size': 2}
broadcast_axis 0.0362 --- 0.4 {'data': (1, 100), 'axis': 0, 'size': 2}
broadcast_div 0.0696 0.1254 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_div 0.0392 0.0562 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_div 0.025 0.029 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_equal 0.0679 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_equal 0.0364 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_equal 0.0218 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_greater 0.065 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_greater 0.1909 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_greater 0.0237 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_greater_equal 0.0641 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_greater_equal 0.0362 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_greater_equal 0.0237 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_hypot 0.4676 1.0331 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_hypot 0.0744 0.1468 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_hypot 0.0369 0.0512 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_lesser 0.0637 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_lesser 0.0384 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_lesser 0.0235 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_lesser_equal 0.0665 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_lesser_equal 0.0366 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_lesser_equal 0.0236 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_like 0.9105 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_like 0.1282 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_like 0.052 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_logical_and 0.0664 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_logical_and 0.0367 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_logical_and 0.0243 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_logical_or 0.0673 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_logical_or 0.0351 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_logical_or 0.0248 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_logical_xor 0.1612 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_logical_xor 0.043 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_logical_xor 0.0342 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_maximum 0.0975 0.1265 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_maximum 0.0435 0.0544 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_maximum 0.041 0.0382 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_minimum 0.0929 0.1232 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_minimum 0.0391 0.0526 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_minimum 0.0348 0.0348 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_minus 0.0547 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_minus 0.0365 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_minus 0.0248 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_mod 1.452 0.1408 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_mod 0.2005 0.0536 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_mod 0.0432 0.0611 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_mul 0.0719 0.1064 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_mul 0.0398 0.054 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_mul 0.0259 0.0287 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_not_equal 0.0593 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_not_equal 0.0362 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_not_equal 0.0223 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_plus 0.0623 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_plus 0.0379 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_plus 0.0235 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_power 2.4128 5.668 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_power 0.2574 0.5698 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_power 0.1101 0.0918 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_sub 0.0693 0.0848 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_sub 0.0387 0.0507 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_sub 0.0263 0.0257 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_to 0.9184 --- 4194.3042 {'data': (1, 1024), 'shape': (1024, 1024)}
broadcast_to 0.0463 --- 20.0 {'data': (1, 1), 'shape': (10000, 1)}
broadcast_to 1.4525 --- 4000.0 {'data': (1, 100), 'shape': (10000, 100)}
cbrt 1.3946 0.0813 4194.3042 {'data': (1024, 1024)}
cbrt 0.0418 0.024 20.0 {'data': (10000, 1)}
cbrt 1.4034 0.0819 4000.0 {'data': (10000, 100)}
ceil 0.1926 --- 2097.1521 {'data': (1024, 1024)}
ceil 0.054 --- 20.0 {'data': (10000, 1)}
ceil 0.1758 --- 4000.0 {'data': (10000, 100)}
cos 0.54 0.5433 4194.3042 {'data': (1024, 1024)}
cos 0.0329 0.0309 20.0 {'data': (10000, 1)}
cos 0.532 0.5122 2000.0 {'data': (10000, 100)}
cosh 0.6319 1.0282 4194.3042 {'data': (1024, 1024)}
cosh 0.0358 0.0352 20.0 {'data': (10000, 1)}
cosh 0.8056 1.4169 2000.0 {'data': (10000, 100)}
ctc_loss --- 2.9964 3559850.25 {'data': (1024, 100, 100), 'label': (100, 100)}
degrees 0.0807 0.0472 4194.3042 {'data': (1024, 1024)}
degrees 0.0264 0.021 40.0 {'data': (10000, 1)}
degrees 0.1904 0.0483 4000.0 {'data': (10000, 100)}
depth_to_space 0.0502 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
depth_to_space 0.9073 --- 1000.0 {'data': (10, 25, 10, 100), 'block_size': 5}
diag 0.035 0.3348 2.046 {'data': (1024, 1024), 'k': 1}
diag 0.0298 0.046 --- {'data': (10000, 1), 'k': 1}
diag 0.05 0.3138 0.198 {'data': (10000, 100), 'k': 1}
dot 3.8458 5.8184 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
dot 0.4278 4.5966 4000.0 {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True}
dot 0.1662 0.24 0.2 {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_a': True, 'transpose_b': True}
elemwise_add 0.0886 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_add 0.0635 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_add 0.0541 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
elemwise_div 0.0763 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_div 0.0389 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_div 0.0224 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
elemwise_mul 0.0768 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_mul 0.0383 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_mul 0.0217 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
elemwise_sub 0.0767 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_sub 0.0375 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_sub 0.0221 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
erf 0.8126 0.6297 4194.3042 {'data': (1024, 1024)}
erf 0.0429 0.0347 40.0 {'data': (10000, 1)}
erf 0.8759 0.7495 4000.0 {'data': (10000, 100)}
erfinv 2.9241 0.6029 4194.3042 {'data': (1024, 1024)}
erfinv 0.0591 0.0358 20.0 {'data': (10000, 1)}
erfinv 3.3703 0.748 4000.0 {'data': (10000, 100)}
exp 0.6265 --- 2097.1521 {'data': (1024, 1024)}
exp 0.032 --- 20.0 {'data': (10000, 1)}
exp 0.604 --- 2000.0 {'data': (10000, 100)}
expand_dims 0.0704 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
expand_dims 0.0521 --- 20.0 {'data': (10000, 1), 'axis': 0}
expand_dims 0.0698 --- 2000.0 {'data': (10000, 100), 'axis': 0}
expm1 1.025 0.6943 4194.3042 {'data': (1024, 1024)}
expm1 0.0402 0.0322 40.0 {'data': (10000, 1)}
expm1 0.9514 0.6478 2000.0 {'data': (10000, 100)}
fix 0.3445 --- 2097.1521 {'data': (1024, 1024)}
fix 0.0292 --- 20.0 {'data': (10000, 1)}
fix 0.3399 --- 4000.0 {'data': (10000, 100)}
flatten 0.0562 --- 4194.3042 {'data': (1024, 1024)}
flatten 0.0369 --- 20.0 {'data': (10000, 1)}
flatten 0.0532 --- 2000.0 {'data': (10000, 100)}
flip 1.2305 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
flip 0.0402 --- 40.0 {'data': (10000, 1), 'axis': 0}
flip 1.1872 --- 4000.0 {'data': (10000, 100), 'axis': 0}
floor 0.1904 --- 4194.3042 {'data': (1024, 1024)}
floor 0.0619 --- 40.0 {'data': (10000, 1)}
floor 0.1711 --- 4000.0 {'data': (10000, 100)}
ftml_update 1.5383 --- 2776550.0 {'weight': (1024, 1024), 'grad': (1024, 1024), 'd': (1024, 1024), 'v': (1024, 1024), 'z': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 't': 1, 'rescale_grad': 0.4, 'clip_grad': -1.0}
ftml_update 0.0437 --- 2774492.75 {'weight': (10000, 1), 'grad': (10000, 1), 'd': (10000, 1), 'v': (10000, 1), 'z': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 't': 1, 'rescale_grad': 0.4, 'clip_grad': -1.0}
ftml_update 1.6264 --- 2778472.75 {'weight': (10000, 100), 'grad': (10000, 100), 'd': (10000, 100), 'v': (10000, 100), 'z': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 't': 1, 'rescale_grad': 0.4, 'clip_grad': -1.0}
ftrl_update 1.1971 --- 2782764.25 {'weight': (1024, 1024), 'grad': (1024, 1024), 'z': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4}
ftrl_update 0.0393 --- 2780707.25 {'weight': (10000, 1), 'grad': (10000, 1), 'z': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4}
ftrl_update 3.3391 --- 2784687.25 {'weight': (10000, 100), 'grad': (10000, 100), 'z': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4}
gamma 3.812 6.915 2097.1521 {'data': (1024, 1024)}
gamma 0.0609 0.0861 20.0 {'data': (10000, 1)}
gamma 3.6748 6.6537 2000.0 {'data': (10000, 100)}
gammaln 27.0604 2.3544 2097.1521 {'data': (1024, 1024)}
gammaln 0.2901 0.0429 20.0 {'data': (10000, 1)}
gammaln 25.7833 2.0524 2000.0 {'data': (10000, 100)}
hard_sigmoid 0.1153 0.1333 2097.1521 {'data': (1024, 1024), 'alpha': 0.25, 'beta': 0.5}
hard_sigmoid 0.0367 0.032 40.0 {'data': (10000, 1), 'alpha': 0.25, 'beta': 0.5}
hard_sigmoid 0.1108 0.1274 4000.0 {'data': (10000, 100), 'alpha': 0.25, 'beta': 0.5}
identity --- --- 2097.1521 {'data': (1024, 1024)}
identity --- --- 20.0 {'data': (10000, 1)}
identity --- --- 2000.0 {'data': (10000, 100)}
log 0.6242 0.0443 2097.1521 {'data': (1024, 1024)}
log 0.0389 0.025 20.0 {'data': (10000, 1)}
log 0.7693 0.0647 2000.0 {'data': (10000, 100)}
log10 1.0121 0.0825 4194.3042 {'data': (1024, 1024)}
log10 0.0394 0.0222 20.0 {'data': (10000, 1)}
log10 0.9536 0.0782 4000.0 {'data': (10000, 100)}
log1p 1.0159 0.0818 2097.1521 {'data': (1024, 1024)}
log1p 0.0405 0.0229 20.0 {'data': (10000, 1)}
log1p 0.9541 0.0797 2000.0 {'data': (10000, 100)}
log2 0.8358 0.0774 4194.3042 {'data': (1024, 1024)}
log2 0.0412 0.0225 40.0 {'data': (10000, 1)}
log2 0.7859 0.0756 2000.0 {'data': (10000, 100)}
log_softmax 1.6662 0.7953 2097.1521 {'data': (1024, 1024), 'axis': -1, 'temperature': 0.5}
log_softmax 0.0543 0.0452 20.0 {'data': (10000, 1), 'axis': -1, 'temperature': 0.5}
log_softmax 1.5958 0.7584 4000.0 {'data': (10000, 100), 'axis': -1, 'temperature': 0.5}
logical_not 0.0484 --- 4194.3042 {'data': (1024, 1024)}
logical_not 0.0203 --- 20.0 {'data': (10000, 1)}
logical_not 0.0487 --- 4000.0 {'data': (10000, 100)}
make_loss 0.3954 --- 2097.1521 {'data': (1024, 1024)}
make_loss 0.0193 --- 20.0 {'data': (10000, 1)}
make_loss 0.3792 --- 4000.0 {'data': (10000, 100)}
max 1.7964 1.5455 4.096 {'data': (1024, 1024), 'axis': 0}
max 0.3288 0.0406 0.002 {'data': (10000, 1), 'axis': 0}
max 1.0818 0.9315 0.2 {'data': (10000, 100), 'axis': 0}
max_axis 1.7976 --- 4.096 {'data': (1024, 1024), 'axis': 0}
max_axis 0.3277 --- 0.002 {'data': (10000, 1), 'axis': 0}
max_axis 1.5417 --- 0.4 {'data': (10000, 100), 'axis': 0}
mean 1.9617 2.1988 4.096 {'data': (1024, 1024), 'axis': 0}
mean 0.5004 0.0677 0.004 {'data': (10000, 1), 'axis': 0}
mean 1.4045 1.8053 0.2 {'data': (10000, 100), 'axis': 0}
min 1.0851 0.978 2.048 {'data': (1024, 1024), 'axis': 0}
min 0.3317 0.0362 0.002 {'data': (10000, 1), 'axis': 0}
min 1.2383 1.4078 0.4 {'data': (10000, 100), 'axis': 0}
min_axis 1.7617 --- 4.096 {'data': (1024, 1024), 'axis': 0}
min_axis 0.265 --- 0.002 {'data': (10000, 1), 'axis': 0}
min_axis 1.0856 --- 0.2 {'data': (10000, 100), 'axis': 0}
mp_sgd_mom_update 0.2292 --- 2788978.5 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_mom_update 0.045 --- 2786921.5 {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_mom_update 0.2199 --- 2790901.5 {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_update 0.1828 --- 2795193.0 {'weight': (1024, 1024), 'grad': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_update 0.0876 --- 2793135.75 {'weight': (10000, 1), 'grad': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_update 0.1723 --- 2797115.75 {'weight': (10000, 100), 'grad': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
nanprod 1.3615 1.1304 2.048 {'data': (1024, 1024), 'axis': 0}
nanprod 0.3334 0.0379 0.002 {'data': (10000, 1), 'axis': 0}
nanprod 1.3297 1.4296 0.4 {'data': (10000, 100), 'axis': 0}
nansum 1.8905 1.4722 4.096 {'data': (1024, 1024), 'axis': 0}
nansum 0.3359 0.0407 0.002 {'data': (10000, 1), 'axis': 0}
nansum 1.6127 1.411 0.4 {'data': (10000, 100), 'axis': 0}
negative 0.0472 --- 4194.3042 {'data': (1024, 1024)}
negative 0.0205 --- 40.0 {'data': (10000, 1)}
negative 0.1123 --- 4000.0 {'data': (10000, 100)}
ones_like 0.0431 --- 2097.1521 {'data': (1024, 1024)}
ones_like 0.0187 --- 40.0 {'data': (10000, 1)}
ones_like 0.0431 --- 2000.0 {'data': (10000, 100)}
pad --- --- 0.192 {'data': (1, 4, 2, 4), 'mode': 'constant', 'pad_width': (0, 0, 0, 0, 1, 1, 1, 1)}
pad --- --- 612.0 {'data': (10, 25, 10, 100), 'mode': 'constant', 'pad_width': (0, 0, 0, 0, 1, 1, 1, 1)}
prod 1.849 1.4723 2.048 {'data': (1024, 1024), 'axis': 0}
prod 0.3253 0.0415 0.002 {'data': (10000, 1), 'axis': 0}
prod 1.5403 1.4172 0.4 {'data': (10000, 100), 'axis': 0}
radians 0.0587 0.0526 2097.1521 {'data': (1024, 1024)}
radians 0.0251 0.0219 40.0 {'data': (10000, 1)}
radians 0.0551 0.0502 2000.0 {'data': (10000, 100)}
random_exponential 2.5054 --- 4194.3042 {'shape': (1024, 1024)}
random_exponential 0.0538 --- 20.0 {'shape': (10000, 1)}
random_exponential 2.4009 --- 4000.0 {'shape': (10000, 100)}
random_gamma 7.2308 --- 4194.3042 {'shape': (1024, 1024)}
random_gamma 0.0958 --- 40.0 {'shape': (10000, 1)}
random_gamma 6.9175 --- 4000.0 {'shape': (10000, 100)}
random_generalized_negative_binomial 9.6304 --- 4194.3042 {'shape': (1024, 1024)}
random_generalized_negative_binomial 0.1185 --- 20.0 {'shape': (10000, 1)}
random_generalized_negative_binomial 8.3206 --- 4000.0 {'shape': (10000, 100)}
random_negative_binomial 8.1951 --- 4194.3042 {'k': 1, 'p': 1, 'shape': (1024, 1024)}
random_negative_binomial 0.1167 --- 20.0 {'k': 1, 'p': 1, 'shape': (10000, 1)}
random_negative_binomial 7.0832 --- 4000.0 {'k': 1, 'p': 1, 'shape': (10000, 100)}
random_normal 2.7703 --- 4194.3042 {'shape': (1024, 1024)}
random_normal 0.052 --- 20.0 {'shape': (10000, 1)}
random_normal 2.8538 --- 4000.0 {'shape': (10000, 100)}
random_pdf_dirichlet 0.036 --- 0.004 {'sample': (2,), 'alpha': [0.0, 2.5]}
random_pdf_exponential 0.0362 --- 0.004 {'sample': (2,), 'lam': [1.0, 8.5]}
random_pdf_gamma 0.0359 --- 0.004 {'sample': (2,), 'alpha': [0.0, 2.5], 'beta': [1.0, 0.7]}
random_pdf_generalized_negative_binomial 0.0367 --- 0.004 {'sample': (2,), 'mu': [2.0, 2.5], 'alpha': [0.0, 2.5]}
random_pdf_negative_binomial 0.0379 --- 0.004 {'sample': (2,), 'k': [20, 49], 'p': [0.4, 0.77]}
random_pdf_normal 0.0382 --- 0.004 {'sample': (2,), 'mu': [2.0, 2.5], 'sigma': [1.0, 3.7]}
random_pdf_poisson 0.0366 --- 0.004 {'sample': (2,), 'lam': [1.0, 8.5]}
random_pdf_uniform 0.038 --- 0.004 {'sample': (2,), 'low': [0.0, 2.5], 'high': [1.0, 3.7]}
random_poisson 1.7797 --- 2097.1521 {'shape': (1024, 1024)}
random_poisson 0.0538 --- 20.0 {'shape': (10000, 1)}
random_poisson 2.2778 --- 4000.0 {'shape': (10000, 100)}
random_randint 1.6953 --- 4194.3042 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_randint 0.0494 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_randint 1.6299 --- 4000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
random_uniform 0.9135 --- 4194.3042 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_uniform 0.0407 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_uniform 0.8816 --- 2000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
rcbrt 1.3697 1.3793 2097.1521 {'data': (1024, 1024)}
rcbrt 0.0435 0.0393 20.0 {'data': (10000, 1)}
rcbrt 0.8965 0.8649 2000.0 {'data': (10000, 100)}
reciprocal 0.0572 0.0503 2097.1521 {'data': (1024, 1024)}
reciprocal 0.0265 0.0258 20.0 {'data': (10000, 1)}
reciprocal 0.0541 0.049 2000.0 {'data': (10000, 100)}
relu 0.0679 0.0725 2097.1521 {'data': (1024, 1024)}
relu 0.0307 0.0347 20.0 {'data': (10000, 1)}
relu 0.0631 0.0668 2000.0 {'data': (10000, 100)}
repeat 6.7251 9.2498 8388.6084 {'data': (1024, 1024), 'repeats': 2, 'axis': 0}
repeat 0.0821 0.0882 40.0 {'data': (10000, 1), 'repeats': 2, 'axis': 0}
repeat 6.6146 8.9965 8000.0 {'data': (10000, 100), 'repeats': 2, 'axis': 0}
reshape 0.1159 --- 4194.3042 {'data': (1024, 1024), 'shape': (1024, 1024)}
reshape 0.0393 --- 20.0 {'data': (10000, 1), 'shape': (10000, 1)}
reshape 0.1467 --- 2000.0 {'data': (10000, 100), 'shape': (10000, 100)}
reshape_like 0.4189 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
reshape_like 0.0525 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
reshape_like 0.0183 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
reverse 1.348 1.2252 4194.3042 {'data': (1024, 1024), 'axis': 0}
reverse 0.0448 0.0382 20.0 {'data': (10000, 1), 'axis': 0}
reverse 1.2309 1.1925 2000.0 {'data': (10000, 100), 'axis': 0}
rint 0.2959 --- 4194.3042 {'data': (1024, 1024)}
rint 0.0339 --- 20.0 {'data': (10000, 1)}
rint 0.2864 --- 2000.0 {'data': (10000, 100)}
rmsprop_update 0.3979 --- 2801407.25 {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmsprop_update 0.0359 --- 2799350.0 {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmsprop_update 0.5375 --- 2803330.0 {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmspropalex_update 0.5847 --- 2807621.5 {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'g': (1024, 1024), 'delta': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'gamma2': 0.1, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmspropalex_update 0.0501 --- 2805564.25 {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'g': (10000, 1), 'delta': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'gamma2': 0.5, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmspropalex_update 1.0995 --- 2809544.25 {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'g': (10000, 100), 'delta': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'gamma2': 0.9, 'epsilon': 1e-08, 'rescale_grad': 0.4}
round 0.334 --- 2097.1521 {'data': (1024, 1024)}
round 0.0812 --- 40.0 {'data': (10000, 1)}
round 0.3124 --- 2000.0 {'data': (10000, 100)}
rsqrt 0.8001 0.8138 4194.3042 {'data': (1024, 1024)}
rsqrt 0.04 0.0341 20.0 {'data': (10000, 1)}
rsqrt 0.9483 0.8989 4000.0 {'data': (10000, 100)}
sample_exponential 4.659 --- 8388.6084 {'lam': [1.0, 8.5], 'shape': (1024, 1024)}
sample_exponential 0.0723 --- 40.0 {'lam': [1.0, 8.5], 'shape': (10000, 1)}
sample_exponential 3.9018 --- 4000.0 {'lam': [1.0, 8.5], 'shape': (10000, 100)}
sample_gamma 12.5227 --- 8388.6084 {'alpha': [0.0, 2.5], 'shape': (1024, 1024), 'beta': [1.0, 0.7]}
sample_gamma 0.2556 --- 40.0 {'alpha': [0.0, 2.5], 'shape': (10000, 1), 'beta': [1.0, 0.7]}
sample_gamma 16.9475 --- 8000.0 {'alpha': [0.0, 2.5], 'shape': (10000, 100), 'beta': [1.0, 0.7]}
sample_generalized_negative_binomial 27.9329 --- 4194.3042 {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'alpha': [0.0, 2.5]}
sample_generalized_negative_binomial 0.304 --- 40.0 {'mu': [2.0, 2.5], 'shape': (10000, 1), 'alpha': [0.0, 2.5]}
sample_generalized_negative_binomial 26.4842 --- 8000.0 {'mu': [2.0, 2.5], 'shape': (10000, 100), 'alpha': [0.0, 2.5]}
sample_negative_binomial 348.2019 --- 8388.6084 {'k': [20, 49], 'shape': (1024, 1024), 'p': [0.4, 0.77]}
sample_negative_binomial 3.392 --- 80.0 {'k': [20, 49], 'shape': (10000, 1), 'p': [0.4, 0.77]}
sample_negative_binomial 333.3187 --- 8000.0 {'k': [20, 49], 'shape': (10000, 100), 'p': [0.4, 0.77]}
sample_normal 5.4808 --- 8388.6084 {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'sigma': [1.0, 3.7]}
sample_normal 0.0865 --- 40.0 {'mu': [2.0, 2.5], 'shape': (10000, 1), 'sigma': [1.0, 3.7]}
sample_normal 5.4253 --- 8000.0 {'mu': [2.0, 2.5], 'shape': (10000, 100), 'sigma': [1.0, 3.7]}
sample_poisson 9.9675 --- 8388.6084 {'lam': [1.0, 8.5], 'shape': (1024, 1024)}
sample_poisson 0.1336 --- 80.0 {'lam': [1.0, 8.5], 'shape': (10000, 1)}
sample_poisson 8.4468 --- 8000.0 {'lam': [1.0, 8.5], 'shape': (10000, 100)}
sample_uniform 1.8085 --- 8388.6084 {'low': [0.0, 2.5], 'shape': (1024, 1024), 'high': [1.0, 3.7]}
sample_uniform 0.0503 --- 40.0 {'low': [0.0, 2.5], 'shape': (10000, 1), 'high': [1.0, 3.7]}
sample_uniform 1.7414 --- 4000.0 {'low': [0.0, 2.5], 'shape': (10000, 100), 'high': [1.0, 3.7]}
sgd_mom_update 0.2096 --- 2813835.75 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_mom_update 0.0435 --- 2811778.75 {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_mom_update 0.2025 --- 2815758.75 {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_update 0.184 --- 2820050.0 {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_update 0.0352 --- 2817993.0 {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_update 0.1758 --- 2821973.0 {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
shape_array 0.0154 --- 0.016 {'data': (1024, 1024)}
shape_array 0.0167 --- 0.016 {'data': (10000, 1)}
shape_array 0.0169 --- 0.008 {'data': (10000, 100)}
shuffle 0.9945 --- 4194.3042 {'data': (1024, 1024)}
shuffle 0.3834 --- 20.0 {'data': (10000, 1)}
shuffle 1.2289 --- 2000.0 {'data': (10000, 100)}
sigmoid 0.9567 0.0783 2097.1521 {'data': (1024, 1024)}
sigmoid 0.0308 0.0189 20.0 {'data': (10000, 1)}
sigmoid 0.7113 0.0588 2000.0 {'data': (10000, 100)}
sign 0.2668 0.0479 2097.1521 {'data': (1024, 1024)}
sign 0.0665 0.0161 20.0 {'data': (10000, 1)}
sign 0.2572 0.0461 2000.0 {'data': (10000, 100)}
signsgd_update 0.2092 --- 2826264.5 {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4}
signsgd_update 0.0347 --- 2824207.25 {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4}
signsgd_update 0.1997 --- 2828187.25 {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4}
signum_update 0.3595 --- 2832478.75 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4}
signum_update 0.038 --- 2830421.5 {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4}
signum_update 0.3437 --- 2834401.5 {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4}
sin 0.5153 0.5249 2097.1521 {'data': (1024, 1024)}
sin 0.0286 0.0261 20.0 {'data': (10000, 1)}
sin 0.4978 0.4983 2000.0 {'data': (10000, 100)}
sinh 1.6528 1.0047 4194.3042 {'data': (1024, 1024)}
sinh 0.0413 0.0326 20.0 {'data': (10000, 1)}
sinh 1.4752 0.9075 2000.0 {'data': (10000, 100)}
size_array 0.0118 --- 0.008 {'data': (1024, 1024)}
size_array 0.012 --- 0.004 {'data': (10000, 1)}
size_array 0.0119 --- 0.008 {'data': (10000, 100)}
smooth_l1 0.3027 0.2908 3525181.5 {'data': (1024, 1024)}
smooth_l1 0.0653 0.059 3523124.5 {'data': (10000, 1)}
smooth_l1 0.4443 0.2827 3527104.5 {'data': (10000, 100)}
softmax 1.7664 0.1404 2097.1521 {'data': (1024, 1024), 'axis': -1, 'temperature': 0.5}
softmax 0.0518 0.0401 20.0 {'data': (10000, 1), 'axis': -1, 'temperature': 0.5}
softmax 1.7034 0.1508 2000.0 {'data': (10000, 100), 'axis': -1, 'temperature': 0.5}
softmax_cross_entropy 0.8424 --- 3525104.5 {'data': (1024, 1024), 'label': (1024,)}
softsign 0.0648 0.0614 2097.1521 {'data': (1024, 1024)}
softsign 0.0205 0.0183 40.0 {'data': (10000, 1)}
softsign 0.1246 0.2147 2000.0 {'data': (10000, 100)}
sort 65.2196 --- 6291.4561 {'data': (1024, 1024), 'axis': 0}
sort 1.4063 --- 60.0 {'data': (10000, 1), 'axis': 0}
sort 60.064 --- 6000.0 {'data': (10000, 100), 'axis': 0}
space_to_depth 0.0603 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
space_to_depth 1.1028 --- 500.0 {'data': (10, 25, 10, 100), 'block_size': 5}
split --- --- 2097.1521 {'data': (1024, 1024), 'num_outputs': 1, 'axis': 0}
split --- --- 20.0 {'data': (10000, 1), 'num_outputs': 1, 'axis': 0}
split --- --- 2000.0 {'data': (10000, 100), 'num_outputs': 1, 'axis': 0}
sqrt 0.6781 0.0613 2097.1521 {'data': (1024, 1024)}
sqrt 0.0376 0.0253 20.0 {'data': (10000, 1)}
sqrt 0.7613 0.0631 4000.0 {'data': (10000, 100)}
square 0.1501 0.0604 4194.3042 {'data': (1024, 1024)}
square 0.0274 0.0226 40.0 {'data': (10000, 1)}
square 0.1878 0.0584 4000.0 {'data': (10000, 100)}
stop_gradient --- --- 2097.1521 {'data': (1024, 1024)}
stop_gradient --- --- 20.0 {'data': (10000, 1)}
stop_gradient --- --- 4000.0 {'data': (10000, 100)}
sum 1.9129 1.449 2.048 {'data': (1024, 1024), 'axis': 0}
sum 0.3474 0.0384 0.002 {'data': (10000, 1), 'axis': 0}
sum 1.6377 1.391 0.4 {'data': (10000, 100), 'axis': 0}
sum_axis 1.9243 --- 4.096 {'data': (1024, 1024), 'axis': 0}
sum_axis 0.3433 --- 0.004 {'data': (10000, 1), 'axis': 0}
sum_axis 1.6669 --- 0.2 {'data': (10000, 100), 'axis': 0}
swapaxes 3.0577 --- 4194.3042 {'data': (1024, 1024), 'dim1': 0, 'dim2': 1}
swapaxes 0.0575 --- 20.0 {'data': (10000, 1), 'dim1': 0, 'dim2': 1}
swapaxes 2.7522 --- 4000.0 {'data': (10000, 100), 'dim1': 0, 'dim2': 1}
tan 1.178 0.0876 2097.1521 {'data': (1024, 1024)}
tan 0.0437 0.0247 20.0 {'data': (10000, 1)}
tan 1.1238 0.0835 2000.0 {'data': (10000, 100)}
tanh 1.4582 0.0829 2097.1521 {'data': (1024, 1024)}
tanh 0.0471 0.0262 20.0 {'data': (10000, 1)}
tanh 1.3859 0.0836 4000.0 {'data': (10000, 100)}
tile 6.824 8.9998 8388.6084 {'data': (1024, 1024), 'reps': 2}
tile 0.169 0.1915 40.0 {'data': (10000, 1), 'reps': 2}
tile 6.7106 9.2244 8000.0 {'data': (10000, 100), 'reps': 2}
topk 58.6909 --- 2.048 {'data': (1024, 1024), 'axis': 0, 'k': 1}
topk 0.625 --- 0.004 {'data': (10000, 1), 'axis': 0, 'k': 1}
topk 55.8119 --- 0.4 {'data': (10000, 100), 'axis': 0, 'k': 1}
transpose 0.0938 --- 4194.3042 {'data': (1024, 1024)}
transpose 0.0436 --- 40.0 {'data': (10000, 1)}
transpose 0.214 --- 4000.0 {'data': (10000, 100)}
trunc 0.3037 --- 4194.3042 {'data': (1024, 1024)}
trunc 0.0872 --- 20.0 {'data': (10000, 1)}
trunc 0.3006 --- 4000.0 {'data': (10000, 100)}
zeros_like 0.0465 --- 4194.3042 {'data': (1024, 1024)}
zeros_like 0.0185 --- 40.0 {'data': (10000, 1)}
zeros_like 0.0443 --- 2000.0 {'data': (10000, 100)}
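
The rows above are whitespace-flattened, so post-processing them needs a small parser. The following is a minimal sketch, not part of OpPerf itself: `parse_row` and its dictionary keys are hypothetical names, and it assumes every row has exactly four whitespace-separated fields before the inputs dict, with `---` meaning "not measured".

```python
import ast


def to_float(field):
    """Convert a numeric column to float; '---' means the value was not measured."""
    return None if field == "---" else float(field)


def parse_row(line):
    """Split one flattened benchmark row into its five columns (hypothetical helper)."""
    # The inputs dict is the only column that contains '{', so split there first.
    head, brace, tail = line.partition("{")
    if not brace:
        raise ValueError("no inputs dict found in row")
    operator, forward, backward, memory = head.split()
    return {
        "operator": operator,
        "forward_ms": to_float(forward),
        "backward_ms": to_float(backward),
        "max_mem": to_float(memory),  # unit as stated in the table header
        "inputs": ast.literal_eval(brace + tail),
    }


row = parse_row("sort 65.2196 --- 6291.4561 {'data': (1024, 1024), 'axis': 0}")
print(row["operator"], row["forward_ms"], row["backward_ms"], row["inputs"])
```

`ast.literal_eval` is used rather than `eval` because the inputs column contains only Python literals (tuples, numbers, strings, booleans).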

Runtime Features

  1. BLAS_APPLE : ✖ BLAS_APPLE
  2. BLAS_ATLAS : ✖ BLAS_ATLAS
  3. BLAS_MKL : ✖ BLAS_MKL
  4. BLAS_OPEN : ✔ BLAS_OPEN
  5. CAFFE : ✖ CAFFE
  6. CPU_AVX : ✔ CPU_AVX
  7. CPU_AVX2 : ✖ CPU_AVX2
  8. CPU_SSE : ✔ CPU_SSE
  9. CPU_SSE2 : ✔ CPU_SSE2
  10. CPU_SSE3 : ✔ CPU_SSE3
  11. CPU_SSE4A : ✖ CPU_SSE4A
  12. CPU_SSE4_1 : ✔ CPU_SSE4_1
  13. CPU_SSE4_2 : ✔ CPU_SSE4_2
  14. CUDA : ✔ CUDA
  15. CUDA_RTC : ✔ CUDA_RTC
  16. CUDNN : ✔ CUDNN
  17. CXX14 : ✖ CXX14
  18. DEBUG : ✖ DEBUG
  19. DIST_KVSTORE : ✖ DIST_KVSTORE
  20. F16C : ✔ F16C
  21. INT64_TENSOR_SIZE : ✖ INT64_TENSOR_SIZE
  22. JEMALLOC : ✔ JEMALLOC
  23. LAPACK : ✔ LAPACK
  24. MKLDNN : ✔ MKLDNN
  25. NCCL : ✖ NCCL
  26. OPENCV : ✔ OPENCV
  27. OPENMP : ✔ OPENMP
  28. PROFILER : ✖ PROFILER
  29. SIGNAL_HANDLER : ✔ SIGNAL_HANDLER
  30. SSE : ✖ SSE
  31. TENSORRT : ✖ TENSORRT
  32. TVM_OP : ✖ TVM_OP
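
Because this gist lists runtime features once per benchmark run, it can help to diff the lists before comparing timings. The sketch below is illustrative only: `parse_features` is a hypothetical helper, it assumes each line has the form `N. NAME : ✔ NAME` or `N. NAME : ✖ NAME`, and the sample strings are short excerpts of the two lists in this gist.

```python
def parse_features(text):
    """Map each feature name to True (enabled, ✔) or False (disabled, ✖)."""
    features = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip headings and blank lines
        left, right = line.split(":", 1)
        name = left.split(".", 1)[-1].strip()  # drop the "14." list number
        parts = right.split()
        if not parts:
            continue
        features[name] = parts[0] == "✔"
    return features


# Excerpts from the first and second feature lists in this gist.
first = parse_features("14. CUDA : ✖ CUDA\n16. CUDNN : ✖ CUDNN\n23. LAPACK : ✔ LAPACK")
second = parse_features("14. CUDA : ✔ CUDA\n16. CUDNN : ✔ CUDNN\n23. LAPACK : ✔ LAPACK")

# Names whose enabled/disabled state differs between the two runs.
changed = sorted(name for name in first if first[name] != second.get(name))
print(changed)  # ['CUDA', 'CUDNN']
```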

Benchmark Results

Operator Avg. Forward Time (ms) Avg. Backward Time (ms) Max Mem Usage (Storage) (KB) Inputs
BatchNorm 1.187 2.3393 25165.8359 {'data': (32, 3, 256, 256), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)}
BatchNorm 1.8038 3.561 38400.0117 {'data': (32, 3, 10000, 10), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)}
BlockGrad 0.0461 --- 2097.1521 {'data': (1024, 1024)}
BlockGrad 0.0697 --- 40.0 {'data': (10000, 1)}
BlockGrad 0.043 --- 4000.0 {'data': (10000, 100)}
CTCLoss --- --- 1259642.125 {'data': (1024, 100, 100), 'label': (100, 100)}
Convolution 0.1133 0.1788 17924.0508 {'data': (32, 3, 256), 'weight': (64, 3, 3), 'bias': (64,), 'kernel': (3,), 'stride': (1,), 'dilate': (1,), 'pad': (0,), 'num_filter': 64, 'layout': 'NCW'}
Deconvolution 0.1669 0.1472 648924.3125 {'data': (32, 3, 256), 'weight': (3, 64, 3), 'bias': (64,), 'kernel': (3,), 'stride': (1,), 'dilate': (1,), 'pad': (0,), 'adj': (0,), 'num_filter': 64, 'no_bias': False, 'layout': 'NCW'}
Dropout 0.1527 0.1032 37748.7344 {'data': (32, 3, 256, 256), 'p': 0.5, 'mode': 'always'}
Dropout 0.0665 0.0395 600.0 {'data': (10000, 10), 'p': 0.5, 'mode': 'always'}
FullyConnected 0.335 0.3779 8.192 {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 196608), 'bias': (64,), 'flatten': True}
FullyConnected 0.1999 0.2843 6291.4561 {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 256), 'bias': (64,), 'flatten': False}
LeakyReLU 0.0451 0.0484 4194.3042 {'data': (1024, 1024), 'act_type': 'leaky', 'slope': 0.1}
LeakyReLU 0.0495 0.0348 40.0 {'data': (10000, 1), 'act_type': 'leaky', 'slope': 0.1}
LeakyReLU 0.0431 0.0461 2000.0 {'data': (10000, 100), 'act_type': 'leaky', 'slope': 0.1}
LeakyReLU 0.0467 0.0501 2097.1521 {'data': (1024, 1024), 'act_type': 'elu', 'slope': 0.1}
LeakyReLU 0.032 0.0325 20.0 {'data': (10000, 1), 'act_type': 'elu', 'slope': 0.1}
LeakyReLU 0.0522 0.0519 4000.0 {'data': (10000, 100), 'act_type': 'elu', 'slope': 0.1}
LeakyReLU 0.0501 0.0504 4194.3042 {'data': (1024, 1024), 'act_type': 'selu'}
LeakyReLU 0.0608 0.0443 40.0 {'data': (10000, 1), 'act_type': 'selu'}
LeakyReLU 0.049 0.0504 4000.0 {'data': (10000, 100), 'act_type': 'selu'}
LeakyReLU 0.0734 0.1204 4194.3042 {'data': (1024, 1024), 'act_type': 'prelu', 'gamma': (1, 1024)}
LeakyReLU 0.0492 0.0748 40.0 {'data': (10000, 1), 'act_type': 'prelu', 'gamma': (1, 1)}
LeakyReLU 0.0695 0.1284 4000.0 {'data': (10000, 100), 'act_type': 'prelu', 'gamma': (1, 100)}
MakeLoss 0.0435 0.0333 1245453.375 {'data': (1024, 1024)}
MakeLoss 0.0292 0.0255 1243396.25 {'data': (10000, 1)}
MakeLoss 0.0389 0.0311 1247376.25 {'data': (10000, 100)}
Pooling 0.0242 0.0291 49.152 {'data': (32, 3, 256), 'kernel': 3, 'pool_type': 'avg', 'global_pool': 0, 'stride': 1, 'pad': 1}
SwapAxis 0.0563 0.0481 4194.3042 {'data': (1024, 1024), 'dim1': 0, 'dim2': 1}
SwapAxis 0.0315 0.0274 20.0 {'data': (10000, 1), 'dim1': 0, 'dim2': 1}
SwapAxis 0.0503 0.0433 2000.0 {'data': (10000, 100), 'dim1': 0, 'dim2': 1}
abs 0.0414 0.0412 4194.3042 {'data': (1024, 1024)}
abs 0.0278 0.025 20.0 {'data': (10000, 1)}
abs 0.0353 0.0369 4000.0 {'data': (10000, 100)}
adam_update 0.0801 --- 583688.125 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mean': (1024, 1024), 'var': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 'rescale_grad': 0.4, 'lazy_update': 0}
adam_update 0.0481 --- 581631.0 {'weight': (10000, 1), 'grad': (10000, 1), 'mean': (10000, 1), 'var': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 'rescale_grad': 0.4, 'lazy_update': 0}
adam_update 0.0792 --- 585611.0 {'weight': (10000, 100), 'grad': (10000, 100), 'mean': (10000, 100), 'var': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 'rescale_grad': 0.4, 'lazy_update': 0}
arccos 0.0373 0.0416 4194.3042 {'data': (1024, 1024)}
arccos 0.0255 0.024 20.0 {'data': (10000, 1)}
arccos 0.0375 0.0417 2000.0 {'data': (10000, 100)}
arccosh 0.047 0.0455 4194.3042 {'data': (1024, 1024)}
arccosh 0.0289 0.0271 40.0 {'data': (10000, 1)}
arccosh 0.0355 0.0417 2000.0 {'data': (10000, 100)}
arcsin 0.037 0.0442 2097.1521 {'data': (1024, 1024)}
arcsin 0.0241 0.0248 20.0 {'data': (10000, 1)}
arcsin 0.0329 0.041 2000.0 {'data': (10000, 100)}
arcsinh 0.0345 0.0414 2097.1521 {'data': (1024, 1024)}
arcsinh 0.0235 0.0255 20.0 {'data': (10000, 1)}
arcsinh 0.0335 0.0397 2000.0 {'data': (10000, 100)}
arctan 0.0496 0.0494 4194.3042 {'data': (1024, 1024)}
arctan 0.0294 0.0269 20.0 {'data': (10000, 1)}
arctan 0.0367 0.0411 2000.0 {'data': (10000, 100)}
arctanh 0.0473 0.0485 4194.3042 {'data': (1024, 1024)}
arctanh 0.0353 0.0312 20.0 {'data': (10000, 1)}
arctanh 0.0414 0.0414 2000.0 {'data': (10000, 100)}
argmax 0.3082 --- 4.096 {'data': (1024, 1024), 'axis': 0}
argmax 1.3093 --- 0.004 {'data': (10000, 1), 'axis': 0}
argmax 2.412 --- 0.4 {'data': (10000, 100), 'axis': 0}
argmax_channel 0.3038 --- 4.096 {'data': (1024, 1024)}
argmax_channel 0.0429 --- 40.0 {'data': (10000, 1)}
argmax_channel 0.0662 --- 40.0 {'data': (10000, 100)}
argmin 0.2968 --- 4.096 {'data': (1024, 1024), 'axis': 0}
argmin 1.3119 --- 0.004 {'data': (10000, 1), 'axis': 0}
argmin 2.4093 --- 0.4 {'data': (10000, 100), 'axis': 0}
argsort 1.263 --- 4194.3042 {'data': (1024, 1024), 'axis': 0}
argsort 0.2307 --- 40.0 {'data': (10000, 1), 'axis': 0}
argsort 1.1869 --- 4000.0 {'data': (10000, 100), 'axis': 0}
batch_dot 4.908 --- 67108.8672 {'lhs': (32, 1024, 1024), 'rhs': (32, 1024, 1024)}
batch_dot 0.5305 --- 64000.0 {'lhs': (32, 1000, 10), 'rhs': (32, 1000, 10), 'transpose_b': True}
batch_dot 0.0544 --- 12.8 {'lhs': (32, 1000, 1), 'rhs': (32, 100, 1000), 'transpose_a': True, 'transpose_b': True}
broadcast_add 0.0667 0.0588 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_add 0.054 0.0408 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_add 0.052 0.0388 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_axes 0.038 --- 8.192 {'data': (1, 1024), 'axis': 0, 'size': 2}
broadcast_axes 0.0383 --- 0.008 {'data': (1, 1), 'axis': 0, 'size': 2}
broadcast_axes 0.0386 --- 0.8 {'data': (1, 100), 'axis': 0, 'size': 2}
broadcast_axis 0.0386 --- 8.192 {'data': (1, 1024), 'axis': 0, 'size': 2}
broadcast_axis 0.0384 --- 0.008 {'data': (1, 1), 'axis': 0, 'size': 2}
broadcast_axis 0.0394 --- 0.8 {'data': (1, 100), 'axis': 0, 'size': 2}
broadcast_div 0.0527 0.0756 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_div 0.0379 0.0427 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_div 0.0382 0.0419 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_equal 0.053 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_equal 0.0372 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_equal 0.0357 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_greater 0.0534 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_greater 0.0374 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_greater 0.035 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_greater_equal 0.0532 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_greater_equal 0.0383 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_greater_equal 0.0347 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_hypot 0.0512 0.0806 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_hypot 0.0378 0.0437 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_hypot 0.038 0.0429 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_lesser 0.0529 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_lesser 0.036 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_lesser 0.0343 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_lesser_equal 0.0533 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_lesser_equal 0.0361 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_lesser_equal 0.0364 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_like 0.1243 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_like 0.0439 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_like 0.0392 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_logical_and 0.0531 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_logical_and 0.0387 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_logical_and 0.0334 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_logical_or 0.0485 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_logical_or 0.0373 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_logical_or 0.0367 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_logical_xor 0.0535 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_logical_xor 0.0351 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_logical_xor 0.0357 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_maximum 0.0522 0.0771 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_maximum 0.0375 0.0413 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_maximum 0.0385 0.0419 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_minimum 0.0528 0.0799 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_minimum 0.0377 0.0425 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_minimum 0.0379 0.0422 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_minus 0.0529 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_minus 0.0382 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_minus 0.0357 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_mod 0.1119 0.062 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_mod 0.047 0.0443 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_mod 0.042 0.0424 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_mul 0.0533 0.0696 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_mul 0.038 0.0434 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_mul 0.0386 0.0433 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_not_equal 0.0542 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_not_equal 0.0365 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_not_equal 0.035 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_plus 0.0532 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_plus 0.0336 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_plus 0.0331 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_power 0.0513 0.0828 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_power 0.0387 0.0423 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_power 0.0383 0.0419 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_sub 0.0525 0.054 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_sub 0.0366 0.0407 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_sub 0.0376 0.0413 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_to 0.1055 --- 4194.3042 {'data': (1, 1024), 'shape': (1024, 1024)}
broadcast_to 0.0387 --- 40.0 {'data': (1, 1), 'shape': (10000, 1)}
broadcast_to 0.0966 --- 4000.0 {'data': (1, 100), 'shape': (10000, 100)}
cbrt 0.0459 0.048 2097.1521 {'data': (1024, 1024)}
cbrt 0.0358 0.0326 40.0 {'data': (10000, 1)}
cbrt 0.0422 0.0433 2000.0 {'data': (10000, 100)}
ceil 0.045 --- 4194.3042 {'data': (1024, 1024)}
ceil 0.034 --- 40.0 {'data': (10000, 1)}
ceil 0.0448 --- 4000.0 {'data': (10000, 100)}
cos 0.0463 0.0475 2097.1521 {'data': (1024, 1024)}
cos 0.036 0.0334 40.0 {'data': (10000, 1)}
cos 0.0444 0.0433 2000.0 {'data': (10000, 100)}
cosh 0.0463 0.0469 4194.3042 {'data': (1024, 1024)}
cosh 0.0323 0.0308 20.0 {'data': (10000, 1)}
cosh 0.0412 0.0402 2000.0 {'data': (10000, 100)}
ctc_loss --- 0.1446 1286336.875 {'data': (1024, 100, 100), 'label': (100, 100)}
degrees 0.0458 0.0415 4194.3042 {'data': (1024, 1024)}
degrees 0.0335 0.0325 20.0 {'data': (10000, 1)}
degrees 0.0417 0.0347 2000.0 {'data': (10000, 100)}
depth_to_space 0.0373 --- 0.128 {'data': (1, 4, 2, 4), 'block_size': 2}
depth_to_space 0.071 --- 1000.0 {'data': (10, 25, 10, 100), 'block_size': 5}
diag 0.0369 0.0417 4.092 {'data': (1024, 1024), 'k': 1}
diag 0.0185 0.0335 --- {'data': (10000, 1), 'k': 1}
diag 0.0374 0.0407 0.198 {'data': (10000, 100), 'k': 1}
dot 0.2585 0.4344 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
dot 0.0725 0.096 4000.0 {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True}
dot 0.0806 0.0694 0.4 {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_a': True, 'transpose_b': True}
elemwise_add 0.053 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_add 0.0349 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_add 0.0346 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
elemwise_div 0.052 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_div 0.0352 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_div 0.0369 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
elemwise_mul 0.0533 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_mul 0.0359 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_mul 0.0348 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
elemwise_sub 0.0538 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_sub 0.0367 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_sub 0.0369 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
erf 0.0466 0.0467 2097.1521 {'data': (1024, 1024)}
erf 0.0344 0.032 40.0 {'data': (10000, 1)}
erf 0.0457 0.0457 4000.0 {'data': (10000, 100)}
erfinv 0.122 0.0491 4194.3042 {'data': (1024, 1024)}
erfinv 0.0377 0.0318 20.0 {'data': (10000, 1)}
erfinv 0.1151 0.0459 2000.0 {'data': (10000, 100)}
exp 0.0457 --- 4194.3042 {'data': (1024, 1024)}
exp 0.0312 --- 40.0 {'data': (10000, 1)}
exp 0.0448 --- 4000.0 {'data': (10000, 100)}
expand_dims 0.0533 --- 4194.3042 {'data': (1024, 1024), 'axis': 0}
expand_dims 0.0438 --- 40.0 {'data': (10000, 1), 'axis': 0}
expand_dims 0.0545 --- 4000.0 {'data': (10000, 100), 'axis': 0}
expm1 0.047 0.0452 4194.3042 {'data': (1024, 1024)}
expm1 0.0305 0.0281 20.0 {'data': (10000, 1)}
expm1 0.0355 0.0401 2000.0 {'data': (10000, 100)}
fix 0.0471 --- 4194.3042 {'data': (1024, 1024)}
fix 0.0361 --- 40.0 {'data': (10000, 1)}
fix 0.0461 --- 4000.0 {'data': (10000, 100)}
flatten 0.0522 --- 4194.3042 {'data': (1024, 1024)}
flatten 0.0459 --- 40.0 {'data': (10000, 1)}
flatten 0.0586 --- 4000.0 {'data': (10000, 100)}
flip 0.0536 --- 4194.3042 {'data': (1024, 1024), 'axis': 0}
flip 0.043 --- 40.0 {'data': (10000, 1), 'axis': 0}
flip 0.0541 --- 4000.0 {'data': (10000, 100), 'axis': 0}
floor 0.0452 --- 4194.3042 {'data': (1024, 1024)}
floor 0.0359 --- 40.0 {'data': (10000, 1)}
floor 0.0448 --- 4000.0 {'data': (10000, 100)}
ftml_update 0.099 --- 589902.4375 {'weight': (1024, 1024), 'grad': (1024, 1024), 'd': (1024, 1024), 'v': (1024, 1024), 'z': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 't': 1, 'rescale_grad': 0.4, 'clip_grad': -1.0}
ftml_update 0.0409 --- 587845.3125 {'weight': (10000, 1), 'grad': (10000, 1), 'd': (10000, 1), 'v': (10000, 1), 'z': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 't': 1, 'rescale_grad': 0.4, 'clip_grad': -1.0}
ftml_update 0.1 --- 591825.3125 {'weight': (10000, 100), 'grad': (10000, 100), 'd': (10000, 100), 'v': (10000, 100), 'z': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 't': 1, 'rescale_grad': 0.4, 'clip_grad': -1.0}
ftrl_update 0.0794 --- 596116.75 {'weight': (1024, 1024), 'grad': (1024, 1024), 'z': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4}
ftrl_update 0.04 --- 594059.625 {'weight': (10000, 1), 'grad': (10000, 1), 'z': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4}
ftrl_update 0.0772 --- 598039.625 {'weight': (10000, 100), 'grad': (10000, 100), 'z': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4}
gamma 0.0675 0.0893 4194.3042 {'data': (1024, 1024)}
gamma 0.0337 0.0332 20.0 {'data': (10000, 1)}
gamma 0.0439 0.0752 2000.0 {'data': (10000, 100)}
gammaln 0.0564 0.0747 4194.3042 {'data': (1024, 1024)}
gammaln 0.0248 0.0241 20.0 {'data': (10000, 1)}
gammaln 0.0423 0.0624 2000.0 {'data': (10000, 100)}
hard_sigmoid 0.0341 0.036 2097.1521 {'data': (1024, 1024), 'alpha': 0.25, 'beta': 0.5}
hard_sigmoid 0.0237 0.0221 20.0 {'data': (10000, 1), 'alpha': 0.25, 'beta': 0.5}
hard_sigmoid 0.0328 0.0345 2000.0 {'data': (10000, 100), 'alpha': 0.25, 'beta': 0.5}
identity --- --- 4194.3042 {'data': (1024, 1024)}
identity --- --- 20.0 {'data': (10000, 1)}
identity --- --- 2000.0 {'data': (10000, 100)}
log 0.0475 0.049 4194.3042 {'data': (1024, 1024)}
log 0.0327 0.032 20.0 {'data': (10000, 1)}
log 0.0402 0.0416 4000.0 {'data': (10000, 100)}
log10 0.0457 0.0479 4194.3042 {'data': (1024, 1024)}
log10 0.0294 0.0273 20.0 {'data': (10000, 1)}
log10 0.0371 0.0402 2000.0 {'data': (10000, 100)}
log1p 0.0463 0.0483 4194.3042 {'data': (1024, 1024)}
log1p 0.0349 0.0311 20.0 {'data': (10000, 1)}
log1p 0.0439 0.0431 2000.0 {'data': (10000, 100)}
log2 0.0431 0.0455 2097.1521 {'data': (1024, 1024)}
log2 0.0302 0.0279 20.0 {'data': (10000, 1)}
log2 0.0373 0.0398 2000.0 {'data': (10000, 100)}
log_softmax 0.0586 0.0544 2097.1521 {'data': (1024, 1024), 'axis': -1, 'temperature': 0.5}
log_softmax 0.0723 0.0556 40.0 {'data': (10000, 1), 'axis': -1, 'temperature': 0.5}
log_softmax 0.0587 0.0546 2000.0 {'data': (10000, 100), 'axis': -1, 'temperature': 0.5}
logical_not 0.0358 --- 4194.3042 {'data': (1024, 1024)}
logical_not 0.0235 --- 40.0 {'data': (10000, 1)}
logical_not 0.0354 --- 4000.0 {'data': (10000, 100)}
make_loss 0.0404 --- 4194.3042 {'data': (1024, 1024)}
make_loss 0.0289 --- 40.0 {'data': (10000, 1)}
make_loss 0.0421 --- 4000.0 {'data': (10000, 100)}
max 0.0807 0.1123 4.096 {'data': (1024, 1024), 'axis': 0}
max 0.0502 0.0368 0.002 {'data': (10000, 1), 'axis': 0}
max 0.0795 0.1086 0.4 {'data': (10000, 100), 'axis': 0}
max_axis 0.0574 --- 4.096 {'data': (1024, 1024), 'axis': 0}
max_axis 0.0502 --- 0.004 {'data': (10000, 1), 'axis': 0}
max_axis 0.063 --- 0.4 {'data': (10000, 100), 'axis': 0}
mean 0.0722 0.1033 4.096 {'data': (1024, 1024), 'axis': 0}
mean 0.0535 0.0454 0.002 {'data': (10000, 1), 'axis': 0}
mean 0.0766 0.0994 0.4 {'data': (10000, 100), 'axis': 0}
min 0.0712 0.1112 4.096 {'data': (1024, 1024), 'axis': 0}
min 0.0511 0.0403 0.002 {'data': (10000, 1), 'axis': 0}
min 0.063 0.0995 0.2 {'data': (10000, 100), 'axis': 0}
min_axis 0.0478 --- 4.096 {'data': (1024, 1024), 'axis': 0}
min_axis 0.0431 --- 0.002 {'data': (10000, 1), 'axis': 0}
min_axis 0.0524 --- 0.2 {'data': (10000, 100), 'axis': 0}
mp_sgd_mom_update 0.0696 --- 602331.0625 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_mom_update 0.0388 --- 600273.875 {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_mom_update 0.0682 --- 604253.875 {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_update 0.0586 --- 608545.375 {'weight': (1024, 1024), 'grad': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_update 0.0391 --- 606488.1875 {'weight': (10000, 1), 'grad': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_update 0.0584 --- 610468.1875 {'weight': (10000, 100), 'grad': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
nanprod 0.0611 0.0996 4.096 {'data': (1024, 1024), 'axis': 0}
nanprod 0.0355 0.0231 0.002 {'data': (10000, 1), 'axis': 0}
nanprod 0.0559 0.0921 0.4 {'data': (10000, 100), 'axis': 0}
nansum 0.0531 0.0956 4.096 {'data': (1024, 1024), 'axis': 0}
nansum 0.0366 0.0215 0.002 {'data': (10000, 1), 'axis': 0}
nansum 0.0579 0.0916 0.4 {'data': (10000, 100), 'axis': 0}
negative 0.045 --- 4194.3042 {'data': (1024, 1024)}
negative 0.0343 --- 40.0 {'data': (10000, 1)}
negative 0.0458 --- 4000.0 {'data': (10000, 100)}
ones_like 0.0339 --- 4194.3042 {'data': (1024, 1024)}
ones_like 0.0244 --- 40.0 {'data': (10000, 1)}
ones_like 0.0321 --- 4000.0 {'data': (10000, 100)}
pad --- --- 0.384 {'data': (1, 4, 2, 4), 'mode': 'constant', 'pad_width': (0, 0, 0, 0, 1, 1, 1, 1)}
pad --- --- 1224.0 {'data': (10, 25, 10, 100), 'mode': 'constant', 'pad_width': (0, 0, 0, 0, 1, 1, 1, 1)}
prod 0.0643 0.0995 4.096 {'data': (1024, 1024), 'axis': 0}
prod 0.034 0.0228 0.002 {'data': (10000, 1), 'axis': 0}
prod 0.0526 0.092 0.4 {'data': (10000, 100), 'axis': 0}
radians 0.0361 0.0328 2097.1521 {'data': (1024, 1024)}
radians 0.0262 0.0248 20.0 {'data': (10000, 1)}
radians 0.0362 0.0321 2000.0 {'data': (10000, 100)}
random_exponential 2.916 --- 2097.1521 {'shape': (1024, 1024)}
random_exponential 0.0508 --- 20.0 {'shape': (10000, 1)}
random_exponential 2.5791 --- 4000.0 {'shape': (10000, 100)}
random_gamma 7.8382 --- 4194.3042 {'shape': (1024, 1024)}
random_gamma 0.1 --- 20.0 {'shape': (10000, 1)}
random_gamma 7.7596 --- 2000.0 {'shape': (10000, 100)}
random_generalized_negative_binomial 8.6073 --- 2097.1521 {'shape': (1024, 1024)}
random_generalized_negative_binomial 0.0938 --- 20.0 {'shape': (10000, 1)}
random_generalized_negative_binomial 6.5623 --- 4000.0 {'shape': (10000, 100)}
random_negative_binomial 5.775 --- 4194.3042 {'k': 1, 'p': 1, 'shape': (1024, 1024)}
random_negative_binomial 0.1201 --- 40.0 {'k': 1, 'p': 1, 'shape': (10000, 1)}
random_negative_binomial 7.9775 --- 4000.0 {'k': 1, 'p': 1, 'shape': (10000, 100)}
random_normal 2.493 --- 4194.3042 {'shape': (1024, 1024)}
random_normal 0.0571 --- 20.0 {'shape': (10000, 1)}
random_normal 2.363 --- 4000.0 {'shape': (10000, 100)}
random_pdf_dirichlet 0.0456 --- 0.004 {'sample': (2,), 'alpha': [0.0, 2.5]}
random_pdf_exponential 0.043 --- 0.008 {'sample': (2,), 'lam': [1.0, 8.5]}
random_pdf_gamma 0.047 --- 0.008 {'sample': (2,), 'alpha': [0.0, 2.5], 'beta': [1.0, 0.7]}
random_pdf_generalized_negative_binomial 0.0461 --- 0.008 {'sample': (2,), 'mu': [2.0, 2.5], 'alpha': [0.0, 2.5]}
random_pdf_negative_binomial 0.0382 --- 0.008 {'sample': (2,), 'k': [20, 49], 'p': [0.4, 0.77]}
random_pdf_normal 0.0378 --- 0.008 {'sample': (2,), 'mu': [2.0, 2.5], 'sigma': [1.0, 3.7]}
random_pdf_poisson 0.0177 --- 0.004 {'sample': (2,), 'lam': [1.0, 8.5]}
random_pdf_uniform 0.0236 --- 0.008 {'sample': (2,), 'low': [0.0, 2.5], 'high': [1.0, 3.7]}
random_poisson 2.3348 --- 2097.1521 {'shape': (1024, 1024)}
random_poisson 0.0527 --- 20.0 {'shape': (10000, 1)}
random_poisson 1.6591 --- 2000.0 {'shape': (10000, 100)}
random_randint 0.9378 --- 2097.1521 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_randint 0.0423 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_randint 0.8959 --- 2000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
random_uniform 0.3582 --- 2097.1521 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_uniform 0.0387 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_uniform 0.3482 --- 2000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
rcbrt 0.0482 0.0491 4194.3042 {'data': (1024, 1024)}
rcbrt 0.0354 0.0309 20.0 {'data': (10000, 1)}
rcbrt 0.0435 0.0415 2000.0 {'data': (10000, 100)}
reciprocal 0.0441 0.0453 4194.3042 {'data': (1024, 1024)}
reciprocal 0.0307 0.0266 40.0 {'data': (10000, 1)}
reciprocal 0.046 0.0459 4000.0 {'data': (10000, 100)}
relu 0.0514 0.0496 4194.3042 {'data': (1024, 1024)}
relu 0.0249 0.0241 40.0 {'data': (10000, 1)}
relu 0.0491 0.0472 4000.0 {'data': (10000, 100)}
repeat 0.1931 0.0951 8388.6084 {'data': (1024, 1024), 'repeats': 2, 'axis': 0}
repeat 0.0415 0.041 80.0 {'data': (10000, 1), 'repeats': 2, 'axis': 0}
repeat 0.1779 0.0945 8000.0 {'data': (10000, 100), 'repeats': 2, 'axis': 0}
reshape 0.058 --- 4194.3042 {'data': (1024, 1024), 'shape': (1024, 1024)}
reshape 0.0417 --- 20.0 {'data': (10000, 1), 'shape': (10000, 1)}
reshape 0.0528 --- 2000.0 {'data': (10000, 100), 'shape': (10000, 100)}
reshape_like 0.0534 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
reshape_like 0.0458 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
reshape_like 0.0431 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
reverse 0.0442 0.0391 2097.1521 {'data': (1024, 1024), 'axis': 0}
reverse 0.0291 0.0279 20.0 {'data': (10000, 1), 'axis': 0}
reverse 0.0419 0.0399 2000.0 {'data': (10000, 100), 'axis': 0}
rint 0.0436 --- 4194.3042 {'data': (1024, 1024)}
rint 0.0318 --- 40.0 {'data': (10000, 1)}
rint 0.0426 --- 4000.0 {'data': (10000, 100)}
rmsprop_update 0.0649 --- 614759.6875 {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmsprop_update 0.0386 --- 612702.5 {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmsprop_update 0.064 --- 616682.5 {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmspropalex_update 0.0926 --- 620974.0 {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'g': (1024, 1024), 'delta': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'gamma2': 0.1, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmspropalex_update 0.031 --- 618916.8125 {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'g': (10000, 1), 'delta': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'gamma2': 0.5, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmspropalex_update 0.0881 --- 622896.8125 {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'g': (10000, 100), 'delta': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'gamma2': 0.9, 'epsilon': 1e-08, 'rescale_grad': 0.4}
round 0.069 --- 4194.3042 {'data': (1024, 1024)}
round 0.0532 --- 40.0 {'data': (10000, 1)}
round 0.0641 --- 4000.0 {'data': (10000, 100)}
rsqrt 0.0478 0.0513 4194.3042 {'data': (1024, 1024)}
rsqrt 0.0391 0.0349 40.0 {'data': (10000, 1)}
rsqrt 0.0497 0.0515 4000.0 {'data': (10000, 100)}
sample_exponential 0.1656 --- 8388.6084 {'lam': [1.0, 8.5], 'shape': (1024, 1024)}
sample_exponential 0.1043 --- 80.0 {'lam': [1.0, 8.5], 'shape': (10000, 1)}
sample_exponential 0.1557 --- 8000.0 {'lam': [1.0, 8.5], 'shape': (10000, 100)}
sample_gamma 0.3236 --- 8388.6084 {'alpha': [0.0, 2.5], 'shape': (1024, 1024), 'beta': [1.0, 0.7]}
sample_gamma 0.2764 --- 80.0 {'alpha': [0.0, 2.5], 'shape': (10000, 1), 'beta': [1.0, 0.7]}
sample_gamma 0.3106 --- 8000.0 {'alpha': [0.0, 2.5], 'shape': (10000, 100), 'beta': [1.0, 0.7]}
sample_generalized_negative_binomial 0.9559 --- 8388.6084 {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'alpha': [0.0, 2.5]}
sample_generalized_negative_binomial 0.5525 --- 80.0 {'mu': [2.0, 2.5], 'shape': (10000, 1), 'alpha': [0.0, 2.5]}
sample_generalized_negative_binomial 0.9381 --- 8000.0 {'mu': [2.0, 2.5], 'shape': (10000, 100), 'alpha': [0.0, 2.5]}
sample_negative_binomial 1.4427 --- 8388.6084 {'k': [20, 49], 'shape': (1024, 1024), 'p': [0.4, 0.77]}
sample_negative_binomial 0.8663 --- 80.0 {'k': [20, 49], 'shape': (10000, 1), 'p': [0.4, 0.77]}
sample_negative_binomial 1.4467 --- 8000.0 {'k': [20, 49], 'shape': (10000, 100), 'p': [0.4, 0.77]}
sample_normal 0.1471 --- 8388.6084 {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'sigma': [1.0, 3.7]}
sample_normal 0.0622 --- 80.0 {'mu': [2.0, 2.5], 'shape': (10000, 1), 'sigma': [1.0, 3.7]}
sample_normal 0.1462 --- 8000.0 {'mu': [2.0, 2.5], 'shape': (10000, 100), 'sigma': [1.0, 3.7]}
sample_poisson 0.441 --- 8388.6084 {'lam': [1.0, 8.5], 'shape': (1024, 1024)}
sample_poisson 0.3674 --- 80.0 {'lam': [1.0, 8.5], 'shape': (10000, 1)}
sample_poisson 0.4476 --- 8000.0 {'lam': [1.0, 8.5], 'shape': (10000, 100)}
sample_uniform 0.165 --- 8388.6084 {'low': [0.0, 2.5], 'shape': (1024, 1024), 'high': [1.0, 3.7]}
sample_uniform 0.0614 --- 80.0 {'low': [0.0, 2.5], 'shape': (10000, 1), 'high': [1.0, 3.7]}
sample_uniform 0.1557 --- 8000.0 {'low': [0.0, 2.5], 'shape': (10000, 100), 'high': [1.0, 3.7]}
sgd_mom_update 0.0586 --- 627188.25 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_mom_update 0.033 --- 625131.125 {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_mom_update 0.049 --- 629111.125 {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_update 0.0355 --- 633402.5625 {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_update 0.0197 --- 631345.4375 {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_update 0.0334 --- 635325.4375 {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
shape_array 0.0307 --- 0.016 {'data': (1024, 1024)}
shape_array 0.0579 --- 0.016 {'data': (10000, 1)}
shape_array 0.0313 --- 0.016 {'data': (10000, 100)}
shuffle 0.1734 --- 4194.3042 {'data': (1024, 1024)}
shuffle 0.3943 --- 40.0 {'data': (10000, 1)}
shuffle 0.4167 --- 4000.0 {'data': (10000, 100)}
sigmoid 0.0488 0.048 4194.3042 {'data': (1024, 1024)}
sigmoid 0.0315 0.0294 20.0 {'data': (10000, 1)}
sigmoid 0.0359 0.0398 2000.0 {'data': (10000, 100)}
sign 0.0454 0.041 2097.1521 {'data': (1024, 1024)}
sign 0.0319 0.0316 20.0 {'data': (10000, 1)}
sign 0.0436 0.0365 4000.0 {'data': (10000, 100)}
signsgd_update 0.0436 --- 639616.875 {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4}
signsgd_update 0.0272 --- 637559.75 {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4}
signsgd_update 0.0426 --- 641539.75 {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4}
signum_update 0.0553 --- 645831.1875 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4}
signum_update 0.0299 --- 643774.0625 {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4}
signum_update 0.0548 --- 647754.0625 {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4}
sin 0.0435 0.0434 4194.3042 {'data': (1024, 1024)}
sin 0.0418 0.0339 40.0 {'data': (10000, 1)}
sin 0.0531 0.0488 4000.0 {'data': (10000, 100)}
sinh 0.0488 0.0488 2097.1521 {'data': (1024, 1024)}
sinh 0.0357 0.0322 40.0 {'data': (10000, 1)}
sinh 0.0385 0.0397 2000.0 {'data': (10000, 100)}
size_array 0.0273 --- 0.008 {'data': (1024, 1024)}
size_array 0.0257 --- 0.008 {'data': (10000, 1)}
size_array 0.0247 --- 0.008 {'data': (10000, 100)}
smooth_l1 0.0327 0.0338 1251668.125 {'data': (1024, 1024)}
smooth_l1 0.0213 0.0196 1249611.0 {'data': (10000, 1)}
smooth_l1 0.0313 0.0331 1253591.0 {'data': (10000, 100)}
softmax 0.0773 0.0605 4194.3042 {'data': (1024, 1024), 'axis': -1, 'temperature': 0.5}
softmax 0.0751 0.0536 40.0 {'data': (10000, 1), 'axis': -1, 'temperature': 0.5}
softmax 0.0625 0.0563 4000.0 {'data': (10000, 100), 'axis': -1, 'temperature': 0.5}
softmax_cross_entropy 0.0495 --- 1251591.0 {'data': (1024, 1024), 'label': (1024,)}
softsign 0.0432 0.0434 2097.1521 {'data': (1024, 1024)}
softsign 0.0322 0.0282 40.0 {'data': (10000, 1)}
softsign 0.0425 0.0413 2000.0 {'data': (10000, 100)}
sort 1.2868 --- 6291.4561 {'data': (1024, 1024), 'axis': 0}
sort 0.2408 --- 60.0 {'data': (10000, 1), 'axis': 0}
sort 1.2224 --- 6000.0 {'data': (10000, 100), 'axis': 0}
space_to_depth 0.047 --- 0.128 {'data': (1, 4, 2, 4), 'block_size': 2}
space_to_depth 0.0524 --- 1000.0 {'data': (10, 25, 10, 100), 'block_size': 5}
split --- --- 4194.3042 {'data': (1024, 1024), 'num_outputs': 1, 'axis': 0}
split --- --- 40.0 {'data': (10000, 1), 'num_outputs': 1, 'axis': 0}
split --- --- 2000.0 {'data': (10000, 100), 'num_outputs': 1, 'axis': 0}
sqrt 0.0716 0.045 4194.3042 {'data': (1024, 1024)}
sqrt 0.0336 0.0309 20.0 {'data': (10000, 1)}
sqrt 0.0387 0.0421 2000.0 {'data': (10000, 100)}
square 0.0351 0.0405 2097.1521 {'data': (1024, 1024)}
square 0.0301 0.0282 20.0 {'data': (10000, 1)}
square 0.0366 0.0397 2000.0 {'data': (10000, 100)}
stop_gradient --- --- 4194.3042 {'data': (1024, 1024)}
stop_gradient --- --- 40.0 {'data': (10000, 1)}
stop_gradient --- --- 4000.0 {'data': (10000, 100)}
sum 0.0616 0.0855 4.096 {'data': (1024, 1024), 'axis': 0}
sum 0.0483 0.0298 0.004 {'data': (10000, 1), 'axis': 0}
sum 0.0708 0.0843 0.4 {'data': (10000, 100), 'axis': 0}
sum_axis 0.0534 --- 4.096 {'data': (1024, 1024), 'axis': 0}
sum_axis 0.0499 --- 0.004 {'data': (10000, 1), 'axis': 0}
sum_axis 0.0677 --- 0.4 {'data': (10000, 100), 'axis': 0}
swapaxes 0.0496 --- 4194.3042 {'data': (1024, 1024), 'dim1': 0, 'dim2': 1}
swapaxes 0.036 --- 40.0 {'data': (10000, 1), 'dim1': 0, 'dim2': 1}
swapaxes 0.0508 --- 4000.0 {'data': (10000, 100), 'dim1': 0, 'dim2': 1}
tan 0.0432 0.0422 4194.3042 {'data': (1024, 1024)}
tan 0.0285 0.0252 20.0 {'data': (10000, 1)}
tan 0.0356 0.0384 2000.0 {'data': (10000, 100)}
tanh 0.0355 0.0384 2097.1521 {'data': (1024, 1024)}
tanh 0.027 0.0263 20.0 {'data': (10000, 1)}
tanh 0.035 0.0387 2000.0 {'data': (10000, 100)}
tile 0.2002 0.0998 8388.6084 {'data': (1024, 1024), 'reps': 2}
tile 0.0369 0.036 40.0 {'data': (10000, 1), 'reps': 2}
tile 0.1754 0.0927 4000.0 {'data': (10000, 100), 'reps': 2}
topk 0.1062 --- 2.048 {'data': (1024, 1024), 'axis': 0, 'k': 1}
topk 0.0693 --- 0.002 {'data': (10000, 1), 'axis': 0, 'k': 1}
topk 0.1154 --- 0.2 {'data': (10000, 100), 'axis': 0, 'k': 1}
transpose 0.054 --- 4194.3042 {'data': (1024, 1024)}
transpose 0.0319 --- 40.0 {'data': (10000, 1)}
transpose 0.0526 --- 4000.0 {'data': (10000, 100)}
trunc 0.0395 --- 4194.3042 {'data': (1024, 1024)}
trunc 0.0308 --- 40.0 {'data': (10000, 1)}
trunc 0.0399 --- 4000.0 {'data': (10000, 100)}
zeros_like 0.0326 --- 4194.3042 {'data': (1024, 1024)}
zeros_like 0.0283 --- 40.0 {'data': (10000, 1)}
zeros_like 0.0321 --- 4000.0 {'data': (10000, 100)}