@ChaiBapchya
Created January 31, 2020 02:06
Updated markdown for native profiler

Runtime Features

  1. BLAS_APPLE : ✖ BLAS_APPLE
  2. BLAS_ATLAS : ✖ BLAS_ATLAS
  3. BLAS_MKL : ✖ BLAS_MKL
  4. BLAS_OPEN : ✔ BLAS_OPEN
  5. CAFFE : ✖ CAFFE
  6. CPU_AVX : ✔ CPU_AVX
  7. CPU_AVX2 : ✖ CPU_AVX2
  8. CPU_SSE : ✔ CPU_SSE
  9. CPU_SSE2 : ✔ CPU_SSE2
  10. CPU_SSE3 : ✔ CPU_SSE3
  11. CPU_SSE4A : ✖ CPU_SSE4A
  12. CPU_SSE4_1 : ✔ CPU_SSE4_1
  13. CPU_SSE4_2 : ✔ CPU_SSE4_2
  14. CUDA : ✔ CUDA
  15. CUDA_RTC : ✔ CUDA_RTC
  16. CUDNN : ✔ CUDNN
  17. CXX14 : ✖ CXX14
  18. DEBUG : ✖ DEBUG
  19. DIST_KVSTORE : ✖ DIST_KVSTORE
  20. F16C : ✔ F16C
  21. INT64_TENSOR_SIZE : ✖ INT64_TENSOR_SIZE
  22. JEMALLOC : ✔ JEMALLOC
  23. LAPACK : ✔ LAPACK
  24. MKLDNN : ✔ MKLDNN
  25. NCCL : ✖ NCCL
  26. OPENCV : ✔ OPENCV
  27. OPENMP : ✔ OPENMP
  28. PROFILER : ✖ PROFILER
  29. SIGNAL_HANDLER : ✔ SIGNAL_HANDLER
  30. SSE : ✖ SSE
  31. TENSORRT : ✖ TENSORRT
  32. TVM_OP : ✖ TVM_OP
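To check programmatically which features this build had enabled, the list above can be read back into a dictionary. The following is a minimal sketch that assumes the exact line format shown above (`N. NAME : ✔ NAME` or `N. NAME : ✖ NAME`); the names `FEATURE_LINE` and `parse_features` are hypothetical and not part of any library.

```python
import re

# Matches lines such as "  14. CUDA : ✔ CUDA" (format assumed from the list above).
FEATURE_LINE = re.compile(r"^\s*\d+\.\s+(\w+)\s*:\s*([✔✖])\s+\w+\s*$")


def parse_features(text):
    """Return {feature_name: enabled} for every line that matches the pattern."""
    flags = {}
    for line in text.splitlines():
        match = FEATURE_LINE.match(line)
        if match:
            name, mark = match.groups()
            flags[name] = (mark == "✔")
    return flags


# A few lines copied from the list above.
sample = """
  4. BLAS_OPEN : ✔ BLAS_OPEN
  14. CUDA : ✔ CUDA
  24. MKLDNN : ✔ MKLDNN
  28. PROFILER : ✖ PROFILER
"""

features = parse_features(sample)
enabled = sorted(name for name, on in features.items() if on)
print(enabled)
```

Lines that do not match the pattern are skipped silently, so the same function can be run over the whole report without stripping headings first.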

Benchmark Results

Operator Avg Forward Time (ms) Avg Backward Time (ms) Max Mem Usage (Storage) (kB) Inputs
BatchNorm 2.5248 1.2922 25165.8359 {'data': (32, 3, 256, 256), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)}
BatchNorm 5.8611 1.524 38400.0117 {'data': (32, 3, 10000, 10), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)}
BlockGrad 0.3609 --- 2097.1521 {'data': (1024, 1024)}
BlockGrad 0.0089 --- 20.0 {'data': (10000, 1)}
BlockGrad 0.3618 --- 2000.0 {'data': (10000, 100)}
Convolution 0.5094 0.9317 2080.7681 {'data': (32, 3, 256), 'weight': (64, 3, 3), 'bias': (64,), 'kernel': (3,), 'stride': (1,), 'dilate': (1,), 'pad': (0,), 'num_filter': 64, 'layout': 'NCW'}
Deconvolution 2.2693 6.6616 2704448.25 {'data': (32, 3, 256), 'weight': (3, 64, 3), 'bias': (64,), 'kernel': (3,), 'stride': (1,), 'dilate': (1,), 'pad': (0,), 'adj': (0,), 'num_filter': 64, 'no_bias': False, 'layout': 'NCW'}
Dropout 2.6628 0.5423 37748.7344 {'data': (32, 3, 256, 256), 'p': 0.5, 'mode': 'always'}
Dropout 0.0831 0.0349 600.0 {'data': (10000, 10), 'p': 0.5, 'mode': 'always'}
FullyConnected 1.5243 98.0013 8.192 {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 196608), 'bias': (64,), 'flatten': True}
FullyConnected 1.0998 30.8501 6291.4561 {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 256), 'bias': (64,), 'flatten': False}
LeakyReLU 0.0667 0.0676 4194.3042 {'data': (1024, 1024), 'act_type': 'leaky', 'slope': 0.1}
LeakyReLU 0.0534 0.049 20.0 {'data': (10000, 1), 'act_type': 'leaky', 'slope': 0.1}
LeakyReLU 0.0865 0.0811 2000.0 {'data': (10000, 100), 'act_type': 'leaky', 'slope': 0.1}
LeakyReLU 0.1206 0.7309 2097.1521 {'data': (1024, 1024), 'act_type': 'elu', 'slope': 0.1}
LeakyReLU 0.0448 0.04 20.0 {'data': (10000, 1), 'act_type': 'elu', 'slope': 0.1}
LeakyReLU 0.1212 0.7229 2000.0 {'data': (10000, 100), 'act_type': 'elu', 'slope': 0.1}
LeakyReLU 0.704 0.2679 2097.1521 {'data': (1024, 1024), 'act_type': 'selu'}
LeakyReLU 0.0319 0.0287 20.0 {'data': (10000, 1), 'act_type': 'selu'}
LeakyReLU 0.6942 0.2591 2000.0 {'data': (10000, 100), 'act_type': 'selu'}
LeakyReLU 0.4296 3.7836 4194.3042 {'data': (1024, 1024), 'act_type': 'prelu', 'gamma': (1, 1024)}
LeakyReLU 0.0412 0.2741 20.0 {'data': (10000, 1), 'act_type': 'prelu', 'gamma': (1, 1)}
LeakyReLU 0.4179 3.2919 4000.0 {'data': (10000, 100), 'act_type': 'prelu', 'gamma': (1, 100)}
Pooling 0.2986 0.3777 98.304 {'data': (32, 3, 256), 'kernel': 3, 'pool_type': 'avg', 'global_pool': 0, 'stride': 1, 'pad': 1}
abs 0.0513 0.3088 4194.3042 {'data': (1024, 1024)}
abs 0.0323 0.0748 20.0 {'data': (10000, 1)}
abs 0.5531 0.7512 4000.0 {'data': (10000, 100)}
adam_update 0.4475 --- 2639212.0 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mean': (1024, 1024), 'var': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 'rescale_grad': 0.4, 'lazy_update': 0}
adam_update 0.0538 --- 2637154.75 {'weight': (10000, 1), 'grad': (10000, 1), 'mean': (10000, 1), 'var': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 'rescale_grad': 0.4, 'lazy_update': 0}
adam_update 0.6825 --- 2641134.75 {'weight': (10000, 100), 'grad': (10000, 100), 'mean': (10000, 100), 'var': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 'rescale_grad': 0.4, 'lazy_update': 0}
arccos 1.0392 0.6424 2097.1521 {'data': (1024, 1024)}
arccos 0.0344 0.0292 20.0 {'data': (10000, 1)}
arccos 0.769 0.6015 2000.0 {'data': (10000, 100)}
arccosh 0.6852 0.7021 2097.1521 {'data': (1024, 1024)}
arccosh 0.04 0.0237 20.0 {'data': (10000, 1)}
arccosh 0.6658 0.671 2000.0 {'data': (10000, 100)}
arcsin 0.9986 0.7568 2097.1521 {'data': (1024, 1024)}
arcsin 0.0267 0.0241 20.0 {'data': (10000, 1)}
arcsin 0.7345 0.6657 2000.0 {'data': (10000, 100)}
arcsinh 1.4487 0.4922 2097.1521 {'data': (1024, 1024)}
arcsinh 0.0401 0.0318 20.0 {'data': (10000, 1)}
arcsinh 1.6481 0.4774 2000.0 {'data': (10000, 100)}
arctan 0.8992 0.0822 2097.1521 {'data': (1024, 1024)}
arctan 0.0323 0.0258 20.0 {'data': (10000, 1)}
arctan 0.94 0.0845 2000.0 {'data': (10000, 100)}
arctanh 1.3642 0.0813 4194.3042 {'data': (1024, 1024)}
arctanh 0.0345 0.0247 20.0 {'data': (10000, 1)}
arctanh 1.3843 0.1367 2000.0 {'data': (10000, 100)}
argmax 10.5619 --- 4.096 {'data': (1024, 1024), 'axis': 0}
argmax 0.0614 --- 0.002 {'data': (10000, 1), 'axis': 0}
argmax 6.553 --- 0.2 {'data': (10000, 100), 'axis': 0}
argmax_channel 0.2887 --- 2.048 {'data': (1024, 1024)}
argmax_channel 0.0364 --- 20.0 {'data': (10000, 1)}
argmax_channel 0.3419 --- 40.0 {'data': (10000, 100)}
argmin 7.2113 --- 2.048 {'data': (1024, 1024), 'axis': 0}
argmin 0.0877 --- 0.002 {'data': (10000, 1), 'axis': 0}
argmin 6.1747 --- 0.2 {'data': (10000, 100), 'axis': 0}
argsort 36.1523 --- 4194.3042 {'data': (1024, 1024), 'axis': 0}
argsort 1.1341 --- 40.0 {'data': (10000, 1), 'axis': 0}
argsort 28.6094 --- 4000.0 {'data': (10000, 100), 'axis': 0}
batch_dot 566.924 --- 67108.8672 {'lhs': (32, 1024, 1024), 'rhs': (32, 1024, 1024)}
batch_dot 381.4693 --- 128000.0 {'lhs': (32, 1000, 10), 'rhs': (32, 1000, 10), 'transpose_b': True}
batch_dot 3.2809 --- 12.8 {'lhs': (32, 1000, 1), 'rhs': (32, 100, 1000), 'transpose_a': True, 'transpose_b': True}
broadcast_add 0.0693 0.0825 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_add 0.038 0.0458 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_add 0.0297 0.0327 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_div 0.072 0.1186 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_div 0.0365 0.0493 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_div 0.0327 0.0549 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_equal 0.1043 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_equal 0.0389 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_equal 0.0329 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_greater 0.1042 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_greater 0.0366 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_greater 0.0373 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_greater_equal 0.0708 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_greater_equal 0.042 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_greater_equal 0.0437 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_hypot 0.4855 1.0332 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_hypot 0.0746 0.1288 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_hypot 0.0403 0.0546 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_lesser 0.2353 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_lesser 0.0346 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_lesser 0.0355 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_lesser_equal 0.1437 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_lesser_equal 0.0337 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_lesser_equal 0.0307 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_logical_and 0.0988 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_logical_and 0.0309 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_logical_and 0.0328 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_logical_or 0.1435 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_logical_or 0.033 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_logical_or 0.0358 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_logical_xor 0.2049 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_logical_xor 0.0409 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_logical_xor 0.0416 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_maximum 0.1068 0.124 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_maximum 0.0385 0.0452 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_maximum 0.0386 0.0631 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_minimum 0.0786 0.1034 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_minimum 0.0415 0.0547 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_minimum 0.0378 0.0525 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_minus 0.0741 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_minus 0.04 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_minus 0.032 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_mod 1.4273 0.1419 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_mod 0.2105 0.0541 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_mod 0.051 0.0662 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_mul 0.0721 0.1085 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_mul 0.038 0.0516 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_mul 0.0327 0.0416 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_not_equal 0.3561 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_not_equal 0.0382 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_not_equal 0.0323 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_plus 0.1912 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_plus 0.0372 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_plus 0.0293 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_power 1.8165 3.9704 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_power 0.2463 0.416 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_power 0.0529 0.0814 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
broadcast_sub 0.0642 0.0722 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
broadcast_sub 0.0417 0.0541 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
broadcast_sub 0.0349 0.0393 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
cbrt 1.3988 0.0834 2097.1521 {'data': (1024, 1024)}
cbrt 0.039 0.028 20.0 {'data': (10000, 1)}
cbrt 1.3426 0.0825 2000.0 {'data': (10000, 100)}
ceil 0.1624 --- 4194.3042 {'data': (1024, 1024)}
ceil 0.0484 --- 20.0 {'data': (10000, 1)}
ceil 0.1578 --- 2000.0 {'data': (10000, 100)}
cos 0.5093 0.5672 2097.1521 {'data': (1024, 1024)}
cos 0.1051 0.0349 20.0 {'data': (10000, 1)}
cos 0.4706 0.5068 2000.0 {'data': (10000, 100)}
cosh 0.8828 1.5931 2097.1521 {'data': (1024, 1024)}
cosh 0.0329 0.0353 20.0 {'data': (10000, 1)}
cosh 0.778 1.3467 4000.0 {'data': (10000, 100)}
degrees 0.0497 0.0461 2097.1521 {'data': (1024, 1024)}
degrees 0.0274 0.0259 20.0 {'data': (10000, 1)}
degrees 0.058 0.0506 2000.0 {'data': (10000, 100)}
depth_to_space 0.0482 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
depth_to_space 0.4207 --- 500.0 {'data': (10, 25, 10, 100), 'block_size': 5}
dot 3.4368 5.7041 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
dot 0.4234 4.5814 2000.0 {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True}
dot 0.1543 0.2657 0.2 {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_a': True, 'transpose_b': True}
elemwise_add 0.1565 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_add 0.0459 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_add 0.0424 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
elemwise_div 0.1586 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_div 0.0305 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_div 0.0248 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
elemwise_mul 0.0938 --- 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_mul 0.0316 --- 200.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_mul 0.0236 --- 20.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
elemwise_sub 0.0667 --- 2097.1521 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}
elemwise_sub 0.0323 --- 400.0 {'lhs': (10000, 10), 'rhs': (10000, 10)}
elemwise_sub 0.0228 --- 40.0 {'lhs': (10000, 1), 'rhs': (10000, 1)}
erf 0.9298 0.7784 2097.1521 {'data': (1024, 1024)}
erf 0.0365 0.0306 20.0 {'data': (10000, 1)}
erf 0.9764 0.7217 4000.0 {'data': (10000, 100)}
erfinv 3.06 0.6606 4194.3042 {'data': (1024, 1024)}
erfinv 0.0726 0.0397 20.0 {'data': (10000, 1)}
erfinv 2.3107 0.477 4000.0 {'data': (10000, 100)}
exp 0.3763 --- 2097.1521 {'data': (1024, 1024)}
exp 0.0351 --- 40.0 {'data': (10000, 1)}
exp 0.362 --- 2000.0 {'data': (10000, 100)}
expm1 0.9999 0.6668 4194.3042 {'data': (1024, 1024)}
expm1 0.0323 0.0273 20.0 {'data': (10000, 1)}
expm1 1.0027 0.6459 4000.0 {'data': (10000, 100)}
fix 0.3574 --- 4194.3042 {'data': (1024, 1024)}
fix 0.0355 --- 40.0 {'data': (10000, 1)}
fix 0.3398 --- 4000.0 {'data': (10000, 100)}
flatten 0.0807 --- 4194.3042 {'data': (1024, 1024)}
flatten 0.0623 --- 40.0 {'data': (10000, 1)}
flatten 0.0835 --- 4000.0 {'data': (10000, 100)}
flip 0.4665 --- 2097.1521 {'data': (1024, 1024), 'axis': 0}
flip 0.0395 --- 20.0 {'data': (10000, 1), 'axis': 0}
flip 0.4598 --- 4000.0 {'data': (10000, 100), 'axis': 0}
floor 0.2024 --- 4194.3042 {'data': (1024, 1024)}
floor 0.0647 --- 40.0 {'data': (10000, 1)}
floor 0.2069 --- 4000.0 {'data': (10000, 100)}
ftml_update 0.8405 --- 2645426.25 {'weight': (1024, 1024), 'grad': (1024, 1024), 'd': (1024, 1024), 'v': (1024, 1024), 'z': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 't': 1, 'rescale_grad': 0.4, 'clip_grad': -1.0}
ftml_update 0.0474 --- 2643369.25 {'weight': (10000, 1), 'grad': (10000, 1), 'd': (10000, 1), 'v': (10000, 1), 'z': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 't': 1, 'rescale_grad': 0.4, 'clip_grad': -1.0}
ftml_update 0.8968 --- 2647349.25 {'weight': (10000, 100), 'grad': (10000, 100), 'd': (10000, 100), 'v': (10000, 100), 'z': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 't': 1, 'rescale_grad': 0.4, 'clip_grad': -1.0}
ftrl_update 1.002 --- 2651640.5 {'weight': (1024, 1024), 'grad': (1024, 1024), 'z': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4}
ftrl_update 0.0456 --- 2649583.5 {'weight': (10000, 1), 'grad': (10000, 1), 'z': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4}
ftrl_update 1.1469 --- 2653563.5 {'weight': (10000, 100), 'grad': (10000, 100), 'z': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4}
gamma 2.9704 5.2562 4194.3042 {'data': (1024, 1024)}
gamma 0.0655 0.095 20.0 {'data': (10000, 1)}
gamma 3.7014 6.6354 4000.0 {'data': (10000, 100)}
gammaln 29.6384 2.4416 4194.3042 {'data': (1024, 1024)}
gammaln 0.3629 0.0522 20.0 {'data': (10000, 1)}
gammaln 27.7632 3.2959 4000.0 {'data': (10000, 100)}
hard_sigmoid 0.0984 0.0984 2097.1521 {'data': (1024, 1024), 'alpha': 0.25, 'beta': 0.5}
hard_sigmoid 0.0381 0.0331 20.0 {'data': (10000, 1), 'alpha': 0.25, 'beta': 0.5}
hard_sigmoid 0.0932 0.0944 2000.0 {'data': (10000, 100), 'alpha': 0.25, 'beta': 0.5}
identity --- --- 4194.3042 {'data': (1024, 1024)}
identity --- --- 40.0 {'data': (10000, 1)}
identity --- --- 4000.0 {'data': (10000, 100)}
log 0.8534 0.0808 2097.1521 {'data': (1024, 1024)}
log 0.0376 0.0299 20.0 {'data': (10000, 1)}
log 0.8084 0.0775 2000.0 {'data': (10000, 100)}
log10 0.9989 0.0825 2097.1521 {'data': (1024, 1024)}
log10 0.0517 0.052 20.0 {'data': (10000, 1)}
log10 0.7217 0.0465 2000.0 {'data': (10000, 100)}
log1p 1.0008 0.085 2097.1521 {'data': (1024, 1024)}
log1p 0.0392 0.0333 20.0 {'data': (10000, 1)}
log1p 0.954 0.0813 2000.0 {'data': (10000, 100)}
log2 0.8624 0.0797 2097.1521 {'data': (1024, 1024)}
log2 0.0449 0.0318 20.0 {'data': (10000, 1)}
log2 0.9151 0.0703 4000.0 {'data': (10000, 100)}
log_softmax 1.6722 0.8149 2097.1521 {'data': (1024, 1024), 'axis': -1, 'temperature': 0.5}
log_softmax 0.042 0.036 20.0 {'data': (10000, 1), 'axis': -1, 'temperature': 0.5}
log_softmax 1.1919 0.5942 2000.0 {'data': (10000, 100), 'axis': -1, 'temperature': 0.5}
logical_not 0.0574 --- 2097.1521 {'data': (1024, 1024)}
logical_not 0.0302 --- 20.0 {'data': (10000, 1)}
logical_not 0.054 --- 2000.0 {'data': (10000, 100)}
make_loss 0.3905 --- 2097.1521 {'data': (1024, 1024)}
make_loss 0.0192 --- 20.0 {'data': (10000, 1)}
make_loss 0.3912 --- 4000.0 {'data': (10000, 100)}
max 0.7253 0.8652 2.048 {'data': (1024, 1024), 'axis': 0}
max 0.1004 0.0333 0.002 {'data': (10000, 1), 'axis': 0}
max 0.5944 0.8481 0.2 {'data': (10000, 100), 'axis': 0}
max_axis 0.7089 --- 2.048 {'data': (1024, 1024), 'axis': 0}
max_axis 0.1393 --- 0.004 {'data': (10000, 1), 'axis': 0}
max_axis 0.6674 --- 0.2 {'data': (10000, 100), 'axis': 0}
mean 0.6215 1.1323 4.096 {'data': (1024, 1024), 'axis': 0}
mean 0.1909 0.0532 0.002 {'data': (10000, 1), 'axis': 0}
mean 0.9125 1.4182 0.4 {'data': (10000, 100), 'axis': 0}
min 0.4774 0.4824 2.048 {'data': (1024, 1024), 'axis': 0}
min 0.1494 0.0412 0.002 {'data': (10000, 1), 'axis': 0}
min 0.5034 0.4568 0.2 {'data': (10000, 100), 'axis': 0}
min_axis 0.7033 --- 2.048 {'data': (1024, 1024), 'axis': 0}
min_axis 0.1374 --- 0.002 {'data': (10000, 1), 'axis': 0}
min_axis 0.6716 --- 0.4 {'data': (10000, 100), 'axis': 0}
mp_sgd_mom_update 0.2236 --- 2657855.0 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_mom_update 0.0371 --- 2655797.75 {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_mom_update 0.1963 --- 2659777.75 {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_update 0.1748 --- 2664069.25 {'weight': (1024, 1024), 'grad': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_update 0.0315 --- 2662012.0 {'weight': (10000, 1), 'grad': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
mp_sgd_update 0.1708 --- 2665992.0 {'weight': (10000, 100), 'grad': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
nanprod 0.7865 0.7833 2.048 {'data': (1024, 1024), 'axis': 0}
nanprod 0.0939 0.0319 0.002 {'data': (10000, 1), 'axis': 0}
nanprod 0.586 0.765 0.4 {'data': (10000, 100), 'axis': 0}
nansum 0.7961 0.7197 2.048 {'data': (1024, 1024), 'axis': 0}
nansum 0.1528 0.0357 0.002 {'data': (10000, 1), 'axis': 0}
nansum 0.7476 0.697 0.4 {'data': (10000, 100), 'axis': 0}
negative 0.0515 --- 2097.1521 {'data': (1024, 1024)}
negative 0.026 --- 40.0 {'data': (10000, 1)}
negative 0.0528 --- 4000.0 {'data': (10000, 100)}
ones_like 0.0416 --- 4194.3042 {'data': (1024, 1024)}
ones_like 0.0228 --- 40.0 {'data': (10000, 1)}
ones_like 0.0423 --- 4000.0 {'data': (10000, 100)}
prod 0.7739 0.7642 2.048 {'data': (1024, 1024), 'axis': 0}
prod 0.1388 0.0387 0.002 {'data': (10000, 1), 'axis': 0}
prod 0.6413 0.6501 0.2 {'data': (10000, 100), 'axis': 0}
radians 0.0717 0.0498 2097.1521 {'data': (1024, 1024)}
radians 0.0275 0.0261 40.0 {'data': (10000, 1)}
radians 0.0816 0.0475 4000.0 {'data': (10000, 100)}
random_exponential 2.1319 --- 4194.3042 {'shape': (1024, 1024)}
random_exponential 0.0516 --- 20.0 {'shape': (10000, 1)}
random_exponential 2.1279 --- 2000.0 {'shape': (10000, 100)}
random_gamma 7.0137 --- 4194.3042 {'shape': (1024, 1024)}
random_gamma 0.1017 --- 20.0 {'shape': (10000, 1)}
random_gamma 6.6949 --- 4000.0 {'shape': (10000, 100)}
random_generalized_negative_binomial 9.5154 --- 4194.3042 {'shape': (1024, 1024)}
random_generalized_negative_binomial 0.1137 --- 20.0 {'shape': (10000, 1)}
random_generalized_negative_binomial 6.0115 --- 2000.0 {'shape': (10000, 100)}
random_negative_binomial 5.0877 --- 2097.1521 {'k': 1, 'p': 1, 'shape': (1024, 1024)}
random_negative_binomial 0.0879 --- 20.0 {'k': 1, 'p': 1, 'shape': (10000, 1)}
random_negative_binomial 4.8165 --- 2000.0 {'k': 1, 'p': 1, 'shape': (10000, 100)}
random_normal 1.593 --- 2097.1521 {'shape': (1024, 1024)}
random_normal 0.0506 --- 20.0 {'shape': (10000, 1)}
random_normal 1.9039 --- 2000.0 {'shape': (10000, 100)}
random_pdf_dirichlet 0.0371 --- 0.002 {'sample': (2,), 'alpha': [0.0, 2.5]}
random_pdf_exponential 0.0345 --- 0.004 {'sample': (2,), 'lam': [1.0, 8.5]}
random_pdf_gamma 0.037 --- 0.004 {'sample': (2,), 'alpha': [0.0, 2.5], 'beta': [1.0, 0.7]}
random_pdf_generalized_negative_binomial 0.0364 --- 0.004 {'sample': (2,), 'mu': [2.0, 2.5], 'alpha': [0.0, 2.5]}
random_pdf_negative_binomial 0.038 --- 0.008 {'sample': (2,), 'k': [20, 49], 'p': [0.4, 0.77]}
random_pdf_normal 0.0532 --- 0.008 {'sample': (2,), 'mu': [2.0, 2.5], 'sigma': [1.0, 3.7]}
random_pdf_poisson 0.0355 --- 0.004 {'sample': (2,), 'lam': [1.0, 8.5]}
random_pdf_uniform 0.0296 --- 0.008 {'sample': (2,), 'low': [0.0, 2.5], 'high': [1.0, 3.7]}
random_poisson 1.9602 --- 2097.1521 {'shape': (1024, 1024)}
random_poisson 0.0493 --- 20.0 {'shape': (10000, 1)}
random_poisson 1.6006 --- 4000.0 {'shape': (10000, 100)}
random_randint 1.0988 --- 4194.3042 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_randint 0.05 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_randint 1.3119 --- 4000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
random_uniform 0.5839 --- 2097.1521 {'low': 0, 'high': 5, 'shape': (1024, 1024)}
random_uniform 0.0374 --- 20.0 {'low': 0, 'high': 5, 'shape': (10000, 1)}
random_uniform 0.5641 --- 4000.0 {'low': 0, 'high': 5, 'shape': (10000, 100)}
rcbrt 1.3523 1.4009 4194.3042 {'data': (1024, 1024)}
rcbrt 0.0432 0.0394 20.0 {'data': (10000, 1)}
rcbrt 1.2844 1.3316 2000.0 {'data': (10000, 100)}
reciprocal 0.0811 0.0619 2097.1521 {'data': (1024, 1024)}
reciprocal 0.0298 0.0327 20.0 {'data': (10000, 1)}
reciprocal 0.0735 0.0563 2000.0 {'data': (10000, 100)}
relu 0.0938 0.0975 4194.3042 {'data': (1024, 1024)}
relu 0.0289 0.0357 20.0 {'data': (10000, 1)}
relu 0.0781 0.0827 4000.0 {'data': (10000, 100)}
rint 0.3149 --- 2097.1521 {'data': (1024, 1024)}
rint 0.0384 --- 20.0 {'data': (10000, 1)}
rint 0.1954 --- 2000.0 {'data': (10000, 100)}
rmsprop_update 0.4186 --- 2670283.5 {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmsprop_update 0.0387 --- 2668226.25 {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmsprop_update 0.5333 --- 2672206.25 {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmspropalex_update 0.642 --- 2676497.75 {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'g': (1024, 1024), 'delta': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'gamma2': 0.1, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmspropalex_update 0.0488 --- 2674440.75 {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'g': (10000, 1), 'delta': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'gamma2': 0.5, 'epsilon': 1e-08, 'rescale_grad': 0.4}
rmspropalex_update 1.1672 --- 2678420.75 {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'g': (10000, 100), 'delta': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'gamma2': 0.9, 'epsilon': 1e-08, 'rescale_grad': 0.4}
round 0.2487 --- 2097.1521 {'data': (1024, 1024)}
round 0.0968 --- 20.0 {'data': (10000, 1)}
round 0.2385 --- 2000.0 {'data': (10000, 100)}
rsqrt 0.6115 0.5608 2097.1521 {'data': (1024, 1024)}
rsqrt 0.0909 0.0355 20.0 {'data': (10000, 1)}
rsqrt 0.6664 0.6492 4000.0 {'data': (10000, 100)}
sample_exponential 4.1897 --- 8388.6084 {'lam': [1.0, 8.5], 'shape': (1024, 1024)}
sample_exponential 0.0779 --- 40.0 {'lam': [1.0, 8.5], 'shape': (10000, 1)}
sample_exponential 4.2833 --- 8000.0 {'lam': [1.0, 8.5], 'shape': (10000, 100)}
sample_gamma 16.064 --- 8388.6084 {'alpha': [0.0, 2.5], 'shape': (1024, 1024), 'beta': [1.0, 0.7]}
sample_gamma 0.1433 --- 40.0 {'alpha': [0.0, 2.5], 'shape': (10000, 1), 'beta': [1.0, 0.7]}
sample_gamma 9.8149 --- 4000.0 {'alpha': [0.0, 2.5], 'shape': (10000, 100), 'beta': [1.0, 0.7]}
sample_generalized_negative_binomial 30.0658 --- 8388.6084 {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'alpha': [0.0, 2.5]}
sample_generalized_negative_binomial 0.3096 --- 40.0 {'mu': [2.0, 2.5], 'shape': (10000, 1), 'alpha': [0.0, 2.5]}
sample_generalized_negative_binomial 28.0669 --- 8000.0 {'mu': [2.0, 2.5], 'shape': (10000, 100), 'alpha': [0.0, 2.5]}
sample_negative_binomial 347.9748 --- 4194.3042 {'k': [20, 49], 'shape': (1024, 1024), 'p': [0.4, 0.77]}
sample_negative_binomial 3.098 --- 80.0 {'k': [20, 49], 'shape': (10000, 1), 'p': [0.4, 0.77]}
sample_negative_binomial 306.4638 --- 4000.0 {'k': [20, 49], 'shape': (10000, 100), 'p': [0.4, 0.77]}
sample_normal 4.9098 --- 8388.6084 {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'sigma': [1.0, 3.7]}
sample_normal 0.0861 --- 40.0 {'mu': [2.0, 2.5], 'shape': (10000, 1), 'sigma': [1.0, 3.7]}
sample_normal 4.7348 --- 8000.0 {'mu': [2.0, 2.5], 'shape': (10000, 100), 'sigma': [1.0, 3.7]}
sample_poisson 10.097 --- 4194.3042 {'lam': [1.0, 8.5], 'shape': (1024, 1024)}
sample_poisson 0.1211 --- 40.0 {'lam': [1.0, 8.5], 'shape': (10000, 1)}
sample_poisson 9.5847 --- 4000.0 {'lam': [1.0, 8.5], 'shape': (10000, 100)}
sample_uniform 1.1467 --- 4194.3042 {'low': [0.0, 2.5], 'shape': (1024, 1024), 'high': [1.0, 3.7]}
sample_uniform 0.0384 --- 40.0 {'low': [0.0, 2.5], 'shape': (10000, 1), 'high': [1.0, 3.7]}
sample_uniform 1.1347 --- 8000.0 {'low': [0.0, 2.5], 'shape': (10000, 100), 'high': [1.0, 3.7]}
sgd_mom_update 0.2134 --- 2682712.0 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_mom_update 0.0419 --- 2680655.0 {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_mom_update 0.2044 --- 2684635.0 {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_update 0.1791 --- 2688926.5 {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_update 0.0367 --- 2686869.25 {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0}
sgd_update 0.1686 --- 2690849.25 {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0}
shape_array 0.0165 --- 0.016 {'data': (1024, 1024)}
shape_array 0.0166 --- 0.016 {'data': (10000, 1)}
shape_array 0.0161 --- 0.016 {'data': (10000, 100)}
shuffle 1.1006 --- 4194.3042 {'data': (1024, 1024)}
shuffle 0.4032 --- 20.0 {'data': (10000, 1)}
shuffle 1.2646 --- 2000.0 {'data': (10000, 100)}
sigmoid 0.8481 0.0758 2097.1521 {'data': (1024, 1024)}
sigmoid 0.0337 0.0273 20.0 {'data': (10000, 1)}
sigmoid 0.7093 0.0667 2000.0 {'data': (10000, 100)}
sign 0.2681 0.0484 2097.1521 {'data': (1024, 1024)}
sign 0.0687 0.022 20.0 {'data': (10000, 1)}
sign 0.2627 0.0448 2000.0 {'data': (10000, 100)}
signsgd_update 0.2254 --- 2695140.75 {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4}
signsgd_update 0.0339 --- 2693083.5 {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4}
signsgd_update 0.2173 --- 2697063.5 {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4}
signum_update 0.3357 --- 2701355.0 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'rescale_grad': 0.4}
signum_update 0.0377 --- 2699297.75 {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'rescale_grad': 0.4}
signum_update 0.3269 --- 2703277.75 {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'rescale_grad': 0.4}
sin 0.5448 0.5685 2097.1521 {'data': (1024, 1024)}
sin 0.0298 0.0275 20.0 {'data': (10000, 1)}
sin 0.4843 0.4851 2000.0 {'data': (10000, 100)}
sinh 1.5249 0.9406 2097.1521 {'data': (1024, 1024)}
sinh 0.0452 0.0384 20.0 {'data': (10000, 1)}
sinh 1.4619 0.8909 4000.0 {'data': (10000, 100)}
size_array 0.014 --- 0.008 {'data': (1024, 1024)}
size_array 0.0139 --- 0.008 {'data': (10000, 1)}
size_array 0.0139 --- 0.008 {'data': (10000, 100)}
softmax 1.3116 0.1007 2097.1521 {'data': (1024, 1024), 'axis': -1, 'temperature': 0.5}
softmax 0.0492 0.036 20.0 {'data': (10000, 1), 'axis': -1, 'temperature': 0.5}
softmax 1.1855 0.1003 2000.0 {'data': (10000, 100), 'axis': -1, 'temperature': 0.5}
softsign 0.0783 0.0609 2097.1521 {'data': (1024, 1024)}
softsign 0.0324 0.036 20.0 {'data': (10000, 1)}
softsign 0.076 0.059 2000.0 {'data': (10000, 100)}
sort 25.079 --- 6291.4561 {'data': (1024, 1024), 'axis': 0}
sort 1.0047 --- 40.0 {'data': (10000, 1), 'axis': 0}
sort 26.6501 --- 6000.0 {'data': (10000, 100), 'axis': 0}
space_to_depth 0.0493 --- 0.064 {'data': (1, 4, 2, 4), 'block_size': 2}
space_to_depth 0.4216 --- 500.0 {'data': (10, 25, 10, 100), 'block_size': 5}
sqrt 0.9161 0.2843 2097.1521 {'data': (1024, 1024)}
sqrt 0.0399 0.0316 20.0 {'data': (10000, 1)}
sqrt 0.7284 0.1733 2000.0 {'data': (10000, 100)}
square 0.0698 0.0616 2097.1521 {'data': (1024, 1024)}
square 0.029 0.0288 40.0 {'data': (10000, 1)}
square 0.0669 0.0636 4000.0 {'data': (10000, 100)}
stop_gradient --- --- 4194.3042 {'data': (1024, 1024)}
stop_gradient --- --- 20.0 {'data': (10000, 1)}
stop_gradient --- --- 2000.0 {'data': (10000, 100)}
sum 0.8737 0.698 4.096 {'data': (1024, 1024), 'axis': 0}
sum 0.1644 0.0335 0.002 {'data': (10000, 1), 'axis': 0}
sum 0.7912 0.6712 0.2 {'data': (10000, 100), 'axis': 0}
sum_axis 0.7821 --- 2.048 {'data': (1024, 1024), 'axis': 0}
sum_axis 0.1451 --- 0.002 {'data': (10000, 1), 'axis': 0}
sum_axis 0.7964 --- 0.4 {'data': (10000, 100), 'axis': 0}
swapaxes 1.4262 --- 2097.1521 {'data': (1024, 1024), 'dim1': 0, 'dim2': 1}
swapaxes 0.0452 --- 20.0 {'data': (10000, 1), 'dim1': 0, 'dim2': 1}
swapaxes 1.3266 --- 4000.0 {'data': (10000, 100), 'dim1': 0, 'dim2': 1}
tan 1.1632 0.081 2097.1521 {'data': (1024, 1024)}
tan 0.0372 0.0256 20.0 {'data': (10000, 1)}
tan 1.146 0.0842 2000.0 {'data': (10000, 100)}
tanh 1.4511 0.082 2097.1521 {'data': (1024, 1024)}
tanh 0.041 0.0263 40.0 {'data': (10000, 1)}
tanh 1.3866 0.079 4000.0 {'data': (10000, 100)}
topk 29.8296 --- 4.096 {'data': (1024, 1024), 'axis': 0, 'k': 1}
topk 0.2789 --- 0.002 {'data': (10000, 1), 'axis': 0, 'k': 1}
topk 24.2809 --- 0.4 {'data': (10000, 100), 'axis': 0, 'k': 1}
transpose 0.0961 --- 4194.3042 {'data': (1024, 1024)}
transpose 0.0454 --- 40.0 {'data': (10000, 1)}
transpose 0.2221 --- 4000.0 {'data': (10000, 100)}
trunc 0.2947 --- 4194.3042 {'data': (1024, 1024)}
trunc 0.0851 --- 40.0 {'data': (10000, 1)}
trunc 0.2885 --- 4000.0 {'data': (10000, 100)}
zeros_like 0.042 --- 4194.3042 {'data': (1024, 1024)}
zeros_like 0.0189 --- 40.0 {'data': (10000, 1)}
zeros_like 0.0454 --- 4000.0 {'data': (10000, 100)}
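For further analysis, rows of the table above can be loaded into structured records. The sketch below assumes each row has the layout used here: operator name, forward time, backward time, memory figure, then a Python-literal dict of inputs, with `---` marking a missing measurement. `BenchmarkRow` and `parse_row` are hypothetical names introduced only for this illustration.

```python
import ast
from typing import NamedTuple, Optional


class BenchmarkRow(NamedTuple):
    operator: str
    forward_ms: Optional[float]   # None when the table shows "---"
    backward_ms: Optional[float]  # None when the table shows "---"
    max_mem: float
    inputs: dict


def _to_float(token):
    """Convert a numeric column, treating '---' as a missing value."""
    return None if token == "---" else float(token)


def parse_row(line):
    """Split one table row into its four leading columns and the inputs dict."""
    head, brace, tail = line.partition("{")
    operator, forward, backward, mem = head.split()
    inputs = ast.literal_eval(brace + tail)  # the inputs column is a dict literal
    return BenchmarkRow(operator, _to_float(forward), _to_float(backward),
                        float(mem), inputs)


# Three rows copied from the table above.
rows = [parse_row(line) for line in [
    "BlockGrad 0.3609 --- 2097.1521 {'data': (1024, 1024)}",
    "dot 3.4368 5.7041 4194.3042 {'lhs': (1024, 1024), 'rhs': (1024, 1024)}",
    "sort 25.079 --- 6291.4561 {'data': (1024, 1024), 'axis': 0}",
]]

# Find the row with the largest forward time, ignoring missing values.
slowest = max(rows, key=lambda r: r.forward_ms or 0.0)
print(slowest.operator, slowest.forward_ms)  # prints "sort 25.079"
```

Using `ast.literal_eval` rather than `eval` keeps the parsing limited to literals, which is all the inputs column contains.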