@connorgoggins
Last active February 14, 2020 23:13

Runtime Features

  1. BLAS_APPLE : ✖
  2. BLAS_ATLAS : ✖
  3. BLAS_MKL : ✖
  4. BLAS_OPEN : ✔
  5. CAFFE : ✖
  6. CPU_AVX : ✔
  7. CPU_AVX2 : ✖
  8. CPU_SSE : ✔
  9. CPU_SSE2 : ✔
  10. CPU_SSE3 : ✔
  11. CPU_SSE4A : ✖
  12. CPU_SSE4_1 : ✔
  13. CPU_SSE4_2 : ✔
  14. CUDA : ✖
  15. CUDA_RTC : ✖
  16. CUDNN : ✖
  17. CXX14 : ✖
  18. DEBUG : ✔
  19. DIST_KVSTORE : ✖
  20. F16C : ✔
  21. INT64_TENSOR_SIZE : ✔
  22. JEMALLOC : ✖
  23. LAPACK : ✔
  24. MKLDNN : ✖
  25. NCCL : ✖
  26. OPENCV : ✖
  27. OPENMP : ✔
  28. PROFILER : ✖
  29. SIGNAL_HANDLER : ✖
  30. SSE : ✖
  31. TENSORRT : ✖
  32. TVM_OP : ✖
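
The feature flags above determine how the timings that follow should be read: this build reports OpenBLAS, LAPACK and OpenMP as enabled, with CUDA, MKLDNN and the profiler flag disabled. As a minimal sketch, assuming the listing has been saved as plain text, the flags can be turned into a lookup table for comparing runs. The helper `parse_features` and the sample string are hypothetical and not part of MXNet; in a live session, `mxnet.runtime.Features()` should report the same information directly.

```python
import re

# A few sample lines copied from the Runtime Features listing.
FEATURE_LINES = """
BLAS_OPEN : ✔ BLAS_OPEN
CPU_AVX2 : ✖ CPU_AVX2
CUDA : ✖ CUDA
OPENMP : ✔ OPENMP
"""

# Hypothetical pattern: a feature name, a colon, then a check or cross mark.
_FEATURE_RE = re.compile(r"(\w+)\s*:\s*([✔✖])")


def parse_features(text):
    """Map each feature name to True (enabled) or False (disabled)."""
    flags = {}
    for line in text.splitlines():
        match = _FEATURE_RE.search(line)
        if match:
            name, mark = match.groups()
            flags[name] = (mark == "✔")
    return flags


features = parse_features(FEATURE_LINES)
enabled = sorted(name for name, on in features.items() if on)
print(enabled)  # ['BLAS_OPEN', 'OPENMP']
```

Lines that do not match the pattern (blank lines, headings) are skipped, so the full listing can be pasted in unchanged.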

Benchmark Results

Operator Inputs Max Mem Usage (Storage) (Bytes) Avg. Forward Time (ms) Avg. Backward Time (ms)
Activation {'data': (1024, 1024), 'act_type': 'relu'} 963876.9375 0.3758 0.3706
Activation {'data': (1024, 1024), 'act_type': 'sigmoid'} 968071.25 0.624 0.2184
Activation {'data': (1024, 1024), 'act_type': 'softrelu'} 972265.5625 1.363 0.6018
Activation {'data': (1024, 1024), 'act_type': 'softsign'} 976459.875 0.329 0.2568
Activation {'data': (1024, 1024), 'act_type': 'tanh'} 980654.1875 1.0498 0.2032
BatchNorm {'data': (32, 3, 256, 256), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)} 917088.3125 150.7382 37.4914
BatchNorm {'data': (32, 3, 10000, 10), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)} 955488.375 225.9198 57.3142
BilinearSampler {'data': (32, 2, 256, 256), 'grid': (32, 2, 256, 256)} 91637.4531 249.187 308.75
BlockGrad {'data': (1024, 1024)} 2097.1521 1.2824 ---
BlockGrad {'data': (10000, 1)} 20.0 0.024 ---
BlockGrad {'data': (10000, 100)} 2000.0 1.2422 ---
CTCLoss {'data': (1024, 100, 100), 'label': (100, 100)} 3982897.25 120.0944 ---
Convolution {'data': (32, 3, 64, 64), 'weight': (64, 3, 3, 3), 'bias': (64,), 'kernel': (3, 3), 'stride': (1, 1), 'dilate': (1, 1), 'pad': (0, 0), 'num_filter': 64, 'layout': 'NCHW'} 2336265.75 37.7908 65.7456
Custom {'args': [(1024, 1024)], 'op_type': 'CustomAddOne'} 4346207.0 0.0234 0.0156
Custom {'args': [(10000, 1)], 'op_type': 'CustomAddOne'} 4346207.0 0.0095 0.014
Custom {'args': [(10000, 10)], 'op_type': 'CustomAddOne'} 4346207.0 0.0093 0.0134
Deconvolution {'data': (32, 3, 64, 64), 'weight': (3, 64, 3, 3), 'bias': (64,), 'kernel': (3, 3), 'stride': (1, 1), 'dilate': (1, 1), 'pad': (0, 0), 'num_filter': 64, 'no_bias': False, 'layout': 'NCHW'} 3959778.75 416.8632 280.7622
Dropout {'data': (32, 3, 256, 256), 'p': 0.5, 'mode': 'always'} 866156.6875 30.1472 1.3324
Dropout {'data': (10000, 10), 'p': 0.5, 'mode': 'always'} 866756.6875 0.5516 0.0518
ElementWiseSum {'args': (1024, 1024)} 2097.1521 0.2022 ---
FullyConnected {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 196608), 'bias': (64,), 'flatten': True} 784367.75 1.0534 5.3758
FullyConnected {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 256), 'bias': (64,), 'flatten': False} 790659.1875 1.5328 5.066
GridGenerator {'data': (32, 2, 256, 256), 'transform_type': 'warp', 'target_shape': (256, 6)} 125454.0234 28.213 20.387
GridGenerator {'data': (256, 6), 'transform_type': 'affine', 'target_shape': (256, 6)} 128346.8281 1.1386 0.1196
LeakyReLU {'data': (1024, 1024), 'act_type': 'leaky'} 989042.8125 0.324 0.36
LeakyReLU {'data': (1024, 1024), 'act_type': 'elu'} 993237.0625 0.657 0.3744
LeakyReLU {'data': (1024, 1024), 'act_type': 'selu'} 997431.375 0.643 0.3458
LeakyReLU {'data': (1024, 1024), 'act_type': 'gelu'} 1001625.6875 0.717 1.0024
MakeLoss {'data': (1024, 1024)} 3989188.5 0.4068 0.2872
MakeLoss {'data': (10000, 1)} 3987131.25 0.0444 0.0248
MakeLoss {'data': (10000, 100)} 3991111.25 0.4052 0.2756
Pooling {'data': (32, 3, 64, 64), 'kernel': (3, 3), 'pool_type': 'sum', 'global_pool': 1, 'stride': (1, 1), 'pad': (0, 0)} 1226811.0 1.9442 3.6538
ROIPooling {'data': (32, 3, 64, 64), 'rois': (32, 5), 'pooled_size': (2, 2), 'spatial_scale': 0.5} 1226823.25 0.2136 0.1412
SequenceLast {'data': (1024, 1024), 'axis': 0} 4346213.0 0.0458 ---
SequenceLast {'data': (10000, 1), 'axis': 0} 4346211.0 0.0334 ---
SequenceLast {'data': (10000, 100), 'axis': 0} 4346211.5 0.0366 ---
SequenceMask {'data': (1024, 1024), 'axis': 0} 4354600.0 0.4798 0.4728
SequenceMask {'data': (10000, 1), 'axis': 0} 4354640.0 0.0368 0.0304
SequenceMask {'data': (10000, 100), 'axis': 0} 4358620.0 0.4608 0.4544
SequenceReverse {'data': (1024, 1024), 'axis': 0} 4367008.5 1.8988 2.0938
SequenceReverse {'data': (10000, 1), 'axis': 0} 4367048.5 0.0446 0.037
SequenceReverse {'data': (10000, 100), 'axis': 0} 4371028.5 1.6934 1.7924
Softmax {'data': (1024, 1024), 'label': (1024, 1024)} 1007917.125 0.7094 ---
Softmax {'data': (10000, 1), 'label': (10000, 1)} 1005860.0 0.0676 ---
Softmax {'data': (10000, 100), 'label': (10000, 100)} 1009840.0 0.6962 ---
SoftmaxActivation {'data': (1024, 1024)} 1016228.625 1.0122 2.2824
SoftmaxActivation {'data': (10000, 1)} 1016268.625 0.065 0.0922
SoftmaxActivation {'data': (10000, 100)} 1020268.625 0.9908 2.1706
UpSampling {'args': (32, 3, 256, 256), 'scale': 2, 'sample_type': 'nearest'} 4284767.0 40.5752 11.365
UpSampling {'args': (32, 3, 10000, 1), 'scale': 4, 'sample_type': 'nearest'} 4346207.0 25.3042 3.813
abs {'data': (1024, 1024)} 2097.1521 0.212 0.4136
abs {'data': (10000, 1)} 20.0 0.027 0.0252
abs {'data': (10000, 100)} 2000.0 0.2022 0.393
adam_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mean': (1024, 1024), 'var': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 2342557.25 0.8752 ---
adam_update {'weight': (10000, 1), 'grad': (10000, 1), 'mean': (10000, 1), 'var': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 2340500.0 0.0788 ---
adam_update {'weight': (10000, 100), 'grad': (10000, 100), 'mean': (10000, 100), 'var': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 2344480.0 0.9712 ---
add_n {'args': [(1024, 1024)]} 4083440.5 0.0252 ---
add_n {'args': [(10000, 1)]} 4083440.5 0.0242 ---
add_n {'args': [(10000, 10)]} 4083440.5 0.0268 ---
all_finite {'data': (1024, 1024)} 4371028.5 2.4546 ---
all_finite {'data': (10000, 1)} 4371028.5 0.0784 ---
all_finite {'data': (10000, 100)} 4371028.5 0.51 ---
amp_cast {'data': (1024, 1024), 'dtype': 'float32'} 2097.1521 7.6808 7.6698
amp_cast {'data': (10000, 1), 'dtype': 'int32'} 20.0 0.1086 0.101
amp_cast {'data': (10000, 100), 'dtype': 'float32'} 2000.0 7.375 7.3848
amp_multicast {'args': [(1024, 1024)], 'num_outputs': 1} 0.004 0.0886 0.0828
amp_multicast {'args': [(10000, 1)], 'num_outputs': 1} 0.004 0.0976 0.0852
arccos {'data': (1024, 1024)} 2097.1521 0.7154 0.6098
arccos {'data': (10000, 1)} 20.0 0.0314 0.0286
arccos {'data': (10000, 100)} 2000.0 0.6826 0.5764
arccosh {'data': (1024, 1024)} 2097.1521 0.5528 0.6402
arccosh {'data': (10000, 1)} 20.0 0.0312 0.0264
arccosh {'data': (10000, 100)} 2000.0 0.5322 0.6022
arcsin {'data': (1024, 1024)} 2097.1521 0.6546 0.594
arcsin {'data': (10000, 1)} 20.0 0.031 0.027
arcsin {'data': (10000, 100)} 2000.0 0.631 0.5702
arcsinh {'data': (1024, 1024)} 2097.1521 1.0848 0.4514
arcsinh {'data': (10000, 1)} 20.0 0.0342 0.0258
arcsinh {'data': (10000, 100)} 2000.0 1.0358 0.4384
arctan {'data': (1024, 1024)} 2097.1521 0.7906 0.3354
arctan {'data': (10000, 1)} 20.0 0.031 0.0244
arctan {'data': (10000, 100)} 2000.0 0.7562 0.3314
arctanh {'data': (1024, 1024)} 2097.1521 0.9712 0.3348
arctanh {'data': (10000, 1)} 20.0 0.103 0.076
arctanh {'data': (10000, 100)} 2000.0 2.0224 0.6334
argmax {'data': (1024, 1024), 'axis': 0} 727309.375 48.2922 ---
argmax {'data': (10000, 1), 'axis': 0} 727307.375 0.275 ---
argmax {'data': (10000, 100), 'axis': 0} 727307.75 25.4064 ---
argmax_channel {'data': (1024, 1024)} 2.048 0.9084 ---
argmax_channel {'data': (10000, 1)} 20.0 0.0662 ---
argmax_channel {'data': (10000, 100)} 20.0 0.8928 ---
argmin {'data': (1024, 1024), 'axis': 0} 727313.6875 37.2968 ---
argmin {'data': (10000, 1), 'axis': 0} 727311.625 0.3754 ---
argmin {'data': (10000, 100), 'axis': 0} 727311.875 37.1692 ---
argsort {'data': (1024, 1024), 'axis': 0, 'dtype': 'float32'} 733603.3125 121.7474 ---
argsort {'data': (10000, 1), 'axis': 0, 'dtype': 'int32'} 731546.125 3.633 ---
argsort {'data': (10000, 100), 'axis': 0, 'dtype': 'float32'} 735526.125 103.1716 ---
batch_dot {'lhs': (32, 1024, 1024), 'rhs': (32, 1024, 1024)} 134217.7344 70.0451 ---
batch_dot {'lhs': (32, 1000, 10), 'rhs': (32, 1000, 10), 'transpose_b': True} 128000.0 67.3045 ---
batch_dot {'lhs': (32, 1000, 1), 'rhs': (32, 100, 1000), 'transpose_a': True, 'transpose_b': True} 41305.7773 18.1213 ---
broadcast_add {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.2344 0.3276
broadcast_add {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.047 0.055
broadcast_add {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0296 0.0306
broadcast_div {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.234 0.763
broadcast_div {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0478 0.096
broadcast_div {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.03 0.036
broadcast_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.2518 ---
broadcast_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0452 ---
broadcast_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0258 ---
broadcast_greater {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.3814 ---
broadcast_greater {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0576 ---
broadcast_greater {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0276 ---
broadcast_greater_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.3804 ---
broadcast_greater_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0592 ---
broadcast_greater_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.027 ---
broadcast_hypot {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.4126 1.0688
broadcast_hypot {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0668 0.1304
broadcast_hypot {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.032 0.041
broadcast_lesser {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.362 ---
broadcast_lesser {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0566 ---
broadcast_lesser {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0276 ---
broadcast_lesser_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.3878 ---
broadcast_lesser_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0598 ---
broadcast_lesser_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0268 ---
broadcast_logical_and {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.2538 ---
broadcast_logical_and {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.047 ---
broadcast_logical_and {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0258 ---
broadcast_logical_or {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.2254 ---
broadcast_logical_or {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0454 ---
broadcast_logical_or {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0262 ---
broadcast_logical_xor {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.276 ---
broadcast_logical_xor {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0476 ---
broadcast_logical_xor {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0264 ---
broadcast_maximum {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.4624 0.8854
broadcast_maximum {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.07 0.1176
broadcast_maximum {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.033 0.037
broadcast_minimum {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.463 0.9152
broadcast_minimum {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0728 0.114
broadcast_minimum {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0324 0.0374
broadcast_minus {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.227 ---
broadcast_minus {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0456 ---
broadcast_minus {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0266 ---
broadcast_mod {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 1.3962 0.6158
broadcast_mod {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.1614 0.0874
broadcast_mod {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0406 0.0382
broadcast_mul {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.238 0.6024
broadcast_mul {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0496 0.084
broadcast_mul {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0304 0.0366
broadcast_not_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.253 ---
broadcast_not_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.046 ---
broadcast_not_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0274 ---
broadcast_plus {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.2306 ---
broadcast_plus {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.044 ---
broadcast_plus {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.027 ---
broadcast_power {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 1.8618 4.3794
broadcast_power {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.2036 0.4468
broadcast_power {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0434 0.07
broadcast_sub {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.2324 0.335
broadcast_sub {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0474 0.0548
broadcast_sub {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.03 0.0316
cast {'data': (1024, 1024), 'dtype': 'float32'} 2097.1521 --- 7.6468
cast {'data': (10000, 1), 'dtype': 'int32'} 20.0 --- 0.0992
cast {'data': (10000, 100), 'dtype': 'float32'} 2000.0 --- 7.384
cast_storage {'data': (1024, 1024), 'stype': 'default'} 4377320.0 0.8756 ---
cast_storage {'data': (10000, 1), 'stype': 'csr'} 4375423.0 0.159 ---
cast_storage {'data': (10000, 100), 'stype': 'row_sparse'} 4379403.0 0.9774 ---
cbrt {'data': (1024, 1024)} 2097.1521 1.0168 0.3444
cbrt {'data': (10000, 1)} 20.0 0.0344 0.0254
cbrt {'data': (10000, 100)} 2000.0 0.9692 0.3102
ceil {'data': (1024, 1024)} 2097.1521 0.2384 ---
ceil {'data': (10000, 1)} 20.0 0.0244 ---
ceil {'data': (10000, 100)} 2000.0 0.226 ---
choose_element_0index {'data': (1024, 1024), 'index': (1, 1024), 'axis': 0} 2.048 0.0322 ---
choose_element_0index {'data': (10000, 1), 'index': (1, 1), 'axis': 0} 0.002 0.0236 ---
choose_element_0index {'data': (10000, 100), 'index': (1, 100), 'axis': 0} 0.2 0.0252 ---
clip {'data': (1024, 1024), 'a_min': 0.1, 'a_max': 0.9} 4385751.5 0.385 0.398
clip {'data': (10000, 1), 'a_min': 0.1, 'a_max': 0.9} 4385791.5 0.0328 0.027
clip {'data': (10000, 100), 'a_min': 0.1, 'a_max': 0.9} 4389771.5 0.3686 0.3776
cos {'data': (1024, 1024)} 2097.1521 0.5498 0.6272
cos {'data': (10000, 1)} 20.0 0.0308 0.0288
cos {'data': (10000, 100)} 2000.0 0.5236 0.5956
cosh {'data': (1024, 1024)} 2097.1521 0.7586 1.2504
cosh {'data': (10000, 1)} 20.0 0.0342 0.0338
cosh {'data': (10000, 100)} 2000.0 0.7274 1.1916
ctc_loss {'data': (1024, 100, 100), 'label': (100, 100)} 4071032.0 90.3162 ---
cumsum {'a': (1024, 1024), 'axis': 0, 'dtype': 'float32'} 4396063.0 0.6782 ---
cumsum {'a': (1024, 1024), 'axis': 0, 'dtype': 'int32'} 4398160.0 13.5304 ---
cumsum {'a': (1024, 1024), 'axis': 0, 'dtype': 'float32'} 4400257.5 0.6888 ---
degrees {'data': (1024, 1024)} 2097.1521 0.211 0.24
degrees {'data': (10000, 1)} 20.0 0.028 0.0248
degrees {'data': (10000, 100)} 2000.0 0.2032 0.2292
depth_to_space {'data': (1, 4, 2, 4), 'block_size': 2} 752173.5625 0.0308 ---
depth_to_space {'data': (10, 25, 10, 100), 'block_size': 5} 753173.5 0.5724 ---
dot {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 1.0748 2.4216
dot {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} 2000.0 0.132 0.4854
dot {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_a': True, 'transpose_b': True} 0.2 0.1334 0.0994
elemwise_add {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.2254 ---
elemwise_add {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0444 ---
elemwise_add {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0274 ---
elemwise_div {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.2454 ---
elemwise_div {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0442 ---
elemwise_div {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0264 ---
elemwise_mul {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.24 ---
elemwise_mul {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0446 ---
elemwise_mul {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0266 ---
elemwise_sub {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.2406 ---
elemwise_sub {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.045 ---
elemwise_sub {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0276 ---
erf {'data': (1024, 1024)} 2097.1521 0.815 0.7676
erf {'data': (10000, 1)} 20.0 0.0332 0.0296
erf {'data': (10000, 100)} 2000.0 0.784 0.727
erfinv {'data': (1024, 1024)} 2097.1521 2.478 0.8618
erfinv {'data': (10000, 1)} 20.0 0.0514 0.031
erfinv {'data': (10000, 100)} 2000.0 2.37 0.8186
exp {'data': (1024, 1024)} 2097.1521 0.6018 ---
exp {'data': (10000, 1)} 20.0 0.0286 ---
exp {'data': (10000, 100)} 2000.0 0.5718 ---
expm1 {'data': (1024, 1024)} 2097.1521 0.8604 0.6254
expm1 {'data': (10000, 1)} 20.0 0.0328 0.0274
expm1 {'data': (10000, 100)} 2000.0 0.8238 0.5858
fill_element_0index {'lhs': (1024, 1024), 'mhs': (1024,), 'rhs': (1024,)} 4404451.5 0.4546 ---
fill_element_0index {'lhs': (10000, 1), 'mhs': (10000,), 'rhs': (10000,)} 4402394.5 0.0256 ---
fill_element_0index {'lhs': (10000, 100), 'mhs': (10000,), 'rhs': (10000,)} 4406374.5 0.4398 ---
fix {'data': (1024, 1024)} 2097.1521 0.6238 ---
fix {'data': (10000, 1)} 20.0 0.0276 ---
fix {'data': (10000, 100)} 2000.0 0.5962 ---
flatten {'data': (1024, 1024)} 2097.1521 1.1694 ---
flatten {'data': (10000, 1)} 20.0 0.0246 ---
flatten {'data': (10000, 100)} 2000.0 1.0998 ---
flip {'data': (1024, 1024), 'axis': 0} 758964.9375 1.066 ---
flip {'data': (10000, 1), 'axis': 0} 756907.8125 0.0408 ---
flip {'data': (10000, 100), 'axis': 0} 760887.8125 1.0178 ---
floor {'data': (1024, 1024)} 2097.1521 0.2388 ---
floor {'data': (10000, 1)} 20.0 0.0252 ---
floor {'data': (10000, 100)} 2000.0 0.2276 ---
ftml_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'd': (1024, 1024), 'v': (1024, 1024), 'z': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 't': 1, 'wd': 0.1, 'rescale_grad': 0.4, 'clip_grad': -1.0} 2344577.25 3.0608 ---
ftml_update {'weight': (10000, 1), 'grad': (10000, 1), 'd': (10000, 1), 'v': (10000, 1), 'z': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 't': 1, 'wd': 0.5, 'rescale_grad': 0.4, 'clip_grad': -1.0} 2344597.25 0.0932 ---
ftml_update {'weight': (10000, 100), 'grad': (10000, 100), 'd': (10000, 100), 'v': (10000, 100), 'z': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 't': 1, 'wd': 0.9, 'rescale_grad': 0.4, 'clip_grad': -1.0} 2346597.25 3.1814 ---
ftrl_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'z': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 2352888.75 1.9592 ---
ftrl_update {'weight': (10000, 1), 'grad': (10000, 1), 'z': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 2350831.5 0.0744 ---
ftrl_update {'weight': (10000, 100), 'grad': (10000, 100), 'z': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 2354811.5 1.8568 ---
gamma {'data': (1024, 1024)} 2097.1521 2.5252 5.578
gamma {'data': (10000, 1)} 20.0 0.0508 0.0762
gamma {'data': (10000, 100)} 2000.0 2.4068 5.2962
gammaln {'data': (1024, 1024)} 2097.1521 30.2336 3.4212
gammaln {'data': (10000, 1)} 20.0 0.3154 0.055
gammaln {'data': (10000, 100)} 2000.0 29.218 3.2628
gather_nd {'data': (1024, 1024), 'indices': (1, 1)} 771822.6875 0.0374 ---
gather_nd {'data': (10000, 1), 'indices': (1, 1)} 771820.625 0.0336 ---
gather_nd {'data': (10000, 100), 'indices': (1, 1)} 771821.0625 0.0336 ---
hard_sigmoid {'data': (1024, 1024)} 1028657.1875 0.2768 0.2174
hard_sigmoid {'data': (10000, 1)} 1028697.1875 0.0492 0.0442
hard_sigmoid {'data': (10000, 100)} 1032697.1875 0.2654 0.2156
identity {'data': (1024, 1024)} 2097.1521 1.1592 ---
identity {'data': (10000, 1)} 20.0 0.0232 ---
identity {'data': (10000, 100)} 2000.0 1.0952 ---
khatri_rao {'args': [(32, 32), (32, 32)]} 41305.8008 0.0408 ---
khatri_rao {'args': [(64, 64), (64, 64)]} 41305.8086 0.0366 ---
lamb_update_phase1 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mean': (1024, 1024), 'var': (1024, 1024), 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 't': 1, 'wd': 0.1, 'rescale_grad': 0.4} 2359103.0 1.2922 ---
lamb_update_phase1 {'weight': (10000, 1), 'grad': (10000, 1), 'mean': (10000, 1), 'var': (10000, 1), 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 't': 1, 'wd': 0.5, 'rescale_grad': 0.4} 2357045.75 0.0696 ---
lamb_update_phase1 {'weight': (10000, 100), 'grad': (10000, 100), 'mean': (10000, 100), 'var': (10000, 100), 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 't': 1, 'wd': 0.9, 'rescale_grad': 0.4} 2361025.75 1.4844 ---
lamb_update_phase2 {'weight': (1024, 1024), 'g': (1024, 1024), 'r1': (1, 1024), 'r2': (1, 1024), 'lr': 0.1} 2365317.25 0.4618 ---
lamb_update_phase2 {'weight': (10000, 1), 'g': (10000, 1), 'r1': (1, 1), 'r2': (1, 1), 'lr': 0.5} 2363260.25 0.0556 ---
lamb_update_phase2 {'weight': (10000, 100), 'g': (10000, 100), 'r1': (1, 100), 'r2': (1, 100), 'lr': 0.9} 2367240.25 0.4386 ---
linalg_det {'A': (1024, 1024)} 4431564.5 7.1228 12.001
linalg_extractdiag {'A': (1024, 1024)} 4431572.5 0.0356 0.117
linalg_extracttrian {'A': (1024, 1024)} 4435771.0 0.693 0.775
linalg_gelqf {'A': (1024, 1024)} 4448354.0 22.3398 ---
linalg_gemm {'A': (1024, 1024), 'B': (1024, 1024), 'C': (1024, 1024), 'axis': 0} 4452548.0 2.7158 3.4128
linalg_gemm2 {'A': (1024, 1024), 'B': (1024, 1024), 'axis': 0} 4460937.0 0.872 1.8476
linalg_inverse {'A': (1024, 1024)} 4469325.5 12.5016 2.473
linalg_makediag {'A': (1024, 1024)} 13059260.0 414.9408 5.4604
linalg_maketrian {'A': (1024, 1035)} 10924218.0 2.9564 2.529
linalg_potrf {'A': [[1, 0], [0, 1]]} 4427366.0 0.0366 ---
linalg_potrf {'A': [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]} 4427366.0 0.0374 ---
linalg_potri {'A': (1024, 1024)} 10932607.0 10.259 3.7036
linalg_slogdet {'A': (1024, 1024)} 10936805.0 6.1854 ---
linalg_sumlogdiag {'A': (1024, 1024)} 10936805.0 0.0798 0.4576
linalg_syrk {'A': (1024, 1024)} 10945194.0 1.9796 2.1174
linalg_trmm {'A': (1024, 1024), 'B': (1024, 1024)} 10951485.0 2.0516 3.887
linalg_trsm {'A': (1024, 1024), 'B': (1024, 1024)} 10959874.0 2.4162 3.7948
log {'data': (1024, 1024)} 2097.1521 0.7208 0.2924
log {'data': (10000, 1)} 20.0 0.031 0.0242
log {'data': (10000, 100)} 2000.0 0.6834 0.258
log10 {'data': (1024, 1024)} 2097.1521 0.7662 0.251
log10 {'data': (10000, 1)} 20.0 0.0316 0.0248
log10 {'data': (10000, 100)} 2000.0 0.732 0.2488
log1p {'data': (1024, 1024)} 2097.1521 0.8328 0.285
log1p {'data': (10000, 1)} 20.0 0.0318 0.0248
log1p {'data': (10000, 100)} 2000.0 0.7932 0.2696
log2 {'data': (1024, 1024)} 2097.1521 0.6838 0.2668
log2 {'data': (10000, 1)} 20.0 0.0312 0.0242
log2 {'data': (10000, 100)} 2000.0 0.6498 0.2356
log_softmax {'data': (1024, 1024), 'axis': 0, 'dtype': 'float16'} 1036891.5 2.271 1.8714
log_softmax {'data': (10000, 1), 'axis': 0, 'dtype': 'float32'} 1036931.5 0.6102 0.3026
log_softmax {'data': (10000, 100), 'axis': 0, 'dtype': 'float64'} 1044931.5 2.3142 2.7954
logical_not {'data': (1024, 1024)} 2097.1521 0.1844 ---
logical_not {'data': (10000, 1)} 20.0 0.029 ---
logical_not {'data': (10000, 100)} 2000.0 0.271 ---
make_loss {'data': (1024, 1024)} 2097.1521 1.13 ---
make_loss {'data': (10000, 1)} 20.0 0.0234 ---
make_loss {'data': (10000, 100)} 2000.0 1.0682 ---
max {'data': (1024, 1024), 'axis': 0} 727233.875 1.6338 1.8232
max {'data': (10000, 1), 'axis': 0} 727233.875 0.6616 0.074
max {'data': (10000, 100), 'axis': 0} 727234.3125 2.1136 1.7612
max_axis {'data': (1024, 1024), 'axis': 0} 727240.4375 1.3418 ---
max_axis {'data': (10000, 1), 'axis': 0} 727238.375 0.5166 ---
max_axis {'data': (10000, 100), 'axis': 0} 727238.8125 1.8182 ---
mean {'data': (1024, 1024), 'axis': 0} 727246.75 2.6064 12.5616
mean {'data': (10000, 1), 'axis': 0} 727246.8125 0.7466 0.1882
mean {'data': (10000, 100), 'axis': 0} 727247.1875 1.9712 11.9828
min {'data': (1024, 1024), 'axis': 0} 727255.375 1.6338 1.818
min {'data': (10000, 1), 'axis': 0} 727255.375 0.6574 0.0738
min {'data': (10000, 100), 'axis': 0} 727255.75 2.0888 1.7288
min_axis {'data': (1024, 1024), 'axis': 0} 727261.9375 1.3366 ---
min_axis {'data': (10000, 1), 'axis': 0} 727259.875 0.5204 ---
min_axis {'data': (10000, 100), 'axis': 0} 727260.25 1.804 ---
moments {'data': (1024, 1024), 'axes': [0, 1]} 10959874.0 109.9294 ---
moments {'data': (10000, 1), 'axes': [0, 1]} 10959874.0 1.07 ---
moments {'data': (10000, 100), 'axes': [0, 1]} 10959874.0 103.1208 ---
mp_nag_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 2371531.5 0.7488 ---
mp_nag_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 2369474.5 0.0614 ---
mp_nag_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 2373454.5 0.6854 ---
mp_sgd_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 2375648.75 0.577 ---
mp_sgd_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 2375668.75 0.0484 ---
mp_sgd_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 2377668.75 0.5378 ---
mp_sgd_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 2383960.25 0.4572 ---
mp_sgd_update {'weight': (10000, 1), 'grad': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 2381903.0 0.0592 ---
mp_sgd_update {'weight': (10000, 100), 'grad': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 2385883.0 0.4364 ---
multi_all_finite {'args': [(1024, 1024)], 'num_arrays': 1} 4083440.5 0.0286 ---
multi_all_finite {'args': [(10000, 1)], 'num_arrays': 1} 4083440.5 0.0352 ---
multi_all_finite {'args': [(10000, 10)], 'num_arrays': 1} 4083440.5 0.029 ---
multi_lars {'lrs': (1024, 1024), 'weights_sum_sq': (1024, 1024), 'grads_sum_sq': (1024, 1024), 'wds': (1024, 1024), 'eta': 0.5, 'eps': 1e-08, 'rescale_grad': 0.4} 4410666.0 0.8282 ---
multi_lars {'lrs': (10000, 1), 'weights_sum_sq': (10000, 1), 'grads_sum_sq': (10000, 1), 'wds': (10000, 1), 'eta': 0.5, 'eps': 1e-08, 'rescale_grad': 0.4} 4408609.0 0.054 ---
multi_lars {'lrs': (10000, 100), 'weights_sum_sq': (10000, 100), 'grads_sum_sq': (10000, 100), 'wds': (10000, 100), 'eta': 0.5, 'eps': 1e-08, 'rescale_grad': 0.4} 4412589.0 0.7576 ---
multi_mp_sgd_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'lrs': 0.1, 'wds': 0.2, 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0256 ---
multi_sgd_mom_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'lrs': 0.1, 'wds': 0.2, 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0265 ---
multi_sgd_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'lrs': 0.1, 'wds': 0.2, 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0302 ---
multi_sum_sq {'args': [(1024, 1024)], 'num_arrays': 1} 4083440.5 0.028 ---
multi_sum_sq {'args': [(10000, 1)], 'num_arrays': 1} 4083440.5 0.0224 ---
multi_sum_sq {'args': [(10000, 10)], 'num_arrays': 1} 4083440.5 0.0218 ---
nag_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 2390174.5 0.6658 ---
nag_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 2388117.25 0.0314 ---
nag_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 2392097.25 0.591 ---
nanprod {'data': (1024, 1024), 'axis': 0} 727264.1875 2.2062 3.0144
nanprod {'data': (10000, 1), 'axis': 0} 727264.1875 0.6608 0.0616
nanprod {'data': (10000, 100), 'axis': 0} 727264.5625 2.6818 2.8756
nansum {'data': (1024, 1024), 'axis': 0} 727272.75 2.1792 3.0018
nansum {'data': (10000, 1), 'axis': 0} 727272.75 0.657 0.0602
nansum {'data': (10000, 100), 'axis': 0} 727273.1875 2.2692 2.8666
negative {'data': (1024, 1024)} 2097.1521 0.1794 ---
negative {'data': (10000, 1)} 20.0 0.0228 ---
negative {'data': (10000, 100)} 2000.0 0.1722 ---
norm {'data': (1024, 1024), 'axis': 0} 727281.375 2.2716 2.9272
norm {'data': (10000, 1), 'axis': 0} 727281.375 0.7012 0.0664
norm {'data': (10000, 100), 'axis': 0} 727281.75 14.4066 3.6176
one_hot {'indices': (1, 1), 'depth': 0, 'dtype': 'float32'} --- 0.0152 ---
one_hot {'indices': (1, 1), 'depth': 0, 'dtype': 'int32'} --- 0.014 ---
one_hot {'indices': (1, 1), 'depth': 0, 'dtype': 'float32'} --- 0.0158 ---
ones_like {'data': (1024, 1024)} 2097.1521 0.1024 ---
ones_like {'data': (10000, 1)} 20.0 0.0238 ---
ones_like {'data': (10000, 100)} 2000.0 0.104 ---
pick {'data': (1024, 1024), 'index': (1, 1024), 'axis': 0} 771827.0 0.0328 7.6398
pick {'data': (10000, 1), 'index': (1, 1), 'axis': 0} 771824.9375 0.023 0.0978
pick {'data': (10000, 100), 'index': (1, 100), 'axis': 0} 771825.3125 0.0284 7.3404
preloaded_multi_mp_sgd_mom_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'args3': '<NDArray 5x5 @cpu(0)>', 'args4': '<NDArray 1 @cpu(0)>', 'args5': '<NDArray 1 @cpu(0)>', 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0321 ---
preloaded_multi_mp_sgd_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'args3': '<NDArray 1 @cpu(0)>', 'args4': '<NDArray 1 @cpu(0)>', 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0302 ---
preloaded_multi_sgd_mom_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'args3': '<NDArray 1 @cpu(0)>', 'args4': '<NDArray 1 @cpu(0)>', 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0307 ---
prod {'data': (1024, 1024), 'axis': 0} 727289.9375 2.0932 2.993
prod {'data': (10000, 1), 'axis': 0} 727289.9375 0.6086 0.0736
prod {'data': (10000, 100), 'axis': 0} 727290.375 2.5532 2.853
radians {'data': (1024, 1024)} 2097.1521 0.2124 0.2554
radians {'data': (10000, 1)} 20.0 0.0262 0.0234
radians {'data': (10000, 100)} 2000.0 0.2012 0.243
random_exponential {'shape': (1024, 1024), 'dtype': 'float16'} 131492.5469 7.2024 ---
random_exponential {'shape': (10000, 1), 'dtype': 'float32'} 130483.9766 0.1016 ---
random_exponential {'shape': (10000, 100), 'dtype': 'float64'} 138463.9844 6.8354 ---
random_gamma {'shape': (1024, 1024), 'dtype': 'float16'} 137609.7031 19.3216 ---
random_gamma {'shape': (10000, 1), 'dtype': 'float32'} 136601.125 0.2718 ---
random_gamma {'shape': (10000, 100), 'dtype': 'float64'} 144581.125 18.5524 ---
random_generalized_negative_binomial {'shape': (1024, 1024), 'dtype': 'float16'} 143726.8594 26.0656 ---
random_generalized_negative_binomial {'shape': (10000, 1), 'dtype': 'float32'} 142718.2812 0.3742 ---
random_generalized_negative_binomial {'shape': (10000, 100), 'dtype': 'float64'} 150698.2812 24.8472 ---
random_negative_binomial {'k': 1, 'p': 1, 'shape': (1024, 1024), 'dtype': 'float16'} 149844.0156 21.4266 ---
random_negative_binomial {'k': 1, 'p': 1, 'shape': (10000, 1), 'dtype': 'float32'} 148835.4375 0.318 ---
random_negative_binomial {'k': 1, 'p': 1, 'shape': (10000, 100), 'dtype': 'float64'} 156815.4375 20.3856 ---
random_normal {'shape': (1024, 1024), 'dtype': 'float16'} 155961.1562 12.1396 ---
random_normal {'shape': (10000, 1), 'dtype': 'float32'} 154952.5938 0.185 ---
random_normal {'shape': (10000, 100), 'dtype': 'float64'} 162932.5938 11.547 ---
random_pdf_dirichlet {'sample': (2,), 'alpha': [0.0, 2.5]} 158932.5938 0.0482 ---
random_pdf_exponential {'sample': (2,), 'lam': [1.0, 8.5]} 158932.6094 0.0472 ---
random_pdf_gamma {'sample': (2,), 'alpha': [0.0, 2.5], 'beta': [1.0, 0.7]} 158932.6094 0.0504 ---
random_pdf_generalized_negative_binomial {'sample': (2,), 'mu': [2.0, 2.5], 'alpha': [0.0, 2.5]} 158932.6094 0.0384 ---
random_pdf_negative_binomial {'sample': (2,), 'k': [20, 49], 'p': [0.4, 0.77]} 158932.625 0.0378 ---
random_pdf_normal {'sample': (2,), 'mu': [2.0, 2.5], 'sigma': [1.0, 3.7]} 158932.625 0.0366 ---
random_pdf_poisson {'sample': (2,), 'lam': [1.0, 8.5]} 158932.6406 0.0316 ---
random_pdf_uniform {'sample': (2,), 'low': [0.0, 2.5], 'high': [1.0, 3.7]} 158932.6406 0.0274 ---
random_poisson {'shape': (1024, 1024), 'dtype': 'float16'} 162078.3594 11.0416 ---
random_poisson {'shape': (10000, 1), 'dtype': 'float32'} 161069.7969 0.1528 ---
random_poisson {'shape': (10000, 100), 'dtype': 'float64'} 169049.7969 10.2932 ---
random_randint {'low': 0, 'high': 5, 'shape': (1024, 1024), 'dtype': 'int32'} 171341.25 2.6056 ---
random_randint {'low': 0, 'high': 5, 'shape': (10000, 1), 'dtype': 'int64'} 169324.0938 0.0616 ---
random_randint {'low': 0, 'high': 5, 'shape': (10000, 100), 'dtype': 'int32'} 173284.0938 2.5188 ---
random_uniform {'low': 0, 'high': 5, 'shape': (1024, 1024), 'dtype': 'float16'} 174429.8281 6.8714 ---
random_uniform {'low': 0, 'high': 5, 'shape': (10000, 1), 'dtype': 'float32'} 173421.25 0.098 ---
random_uniform {'low': 0, 'high': 5, 'shape': (10000, 100), 'dtype': 'float64'} 181401.25 6.427 ---
ravel_multi_index {'data': (2, 1024), 'shape': (1024, 1024)} 771831.25 0.031 ---
ravel_multi_index {'data': (2, 1024), 'shape': (10000, 1)} 771833.3125 0.027 ---
ravel_multi_index {'data': (2, 1024), 'shape': (10000, 100)} 771835.375 0.0272 ---
rcbrt {'data': (1024, 1024)} 2097.1521 1.026 1.114
rcbrt {'data': (10000, 1)} 20.0 0.0362 0.0322
rcbrt {'data': (10000, 100)} 2000.0 0.9796 1.0628
reciprocal {'data': (1024, 1024)} 2097.1521 0.212 0.3184
reciprocal {'data': (10000, 1)} 20.0 0.0284 0.0258
reciprocal {'data': (10000, 100)} 2000.0 0.2028 0.3016
relu {'data': (1024, 1024)} 2097.1521 0.4144 0.4808
relu {'data': (10000, 1)} 20.0 0.0308 0.0272
relu {'data': (10000, 100)} 2000.0 0.396 0.4534
reset_arrays {'args': [(1024, 1024)], 'num_arrays': 1} --- 0.0122 ---
reset_arrays {'args': [(10000, 1)], 'num_arrays': 1} --- 0.0106 ---
reset_arrays {'args': [(10000, 10)], 'num_arrays': 1} --- 0.0144 ---
reshape_like {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 1.1102 ---
reshape_like {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0676 ---
reshape_like {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0228 ---
rint {'data': (1024, 1024)} 2097.1521 0.6204 ---
rint {'data': (10000, 1)} 20.0 0.0284 ---
rint {'data': (10000, 100)} 2000.0 0.5922 ---
rmsprop_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'epsilon': 1e-08, 'wd': 0.1, 'rescale_grad': 0.4} 2392194.5 0.897 ---
rmsprop_update {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'epsilon': 1e-08, 'wd': 0.5, 'rescale_grad': 0.4} 2392214.5 0.0484 ---
rmsprop_update {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'epsilon': 1e-08, 'wd': 0.9, 'rescale_grad': 0.4} 2394214.5 0.975 ---
rmspropalex_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'g': (1024, 1024), 'delta': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'gamma2': 0.1, 'epsilon': 1e-08, 'wd': 0.1, 'rescale_grad': 0.4} 2400506.0 1.2392 ---
rmspropalex_update {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'g': (10000, 1), 'delta': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'gamma2': 0.5, 'epsilon': 1e-08, 'wd': 0.5, 'rescale_grad': 0.4} 2398448.75 0.083 ---
rmspropalex_update {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'g': (10000, 100), 'delta': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'gamma2': 0.9, 'epsilon': 1e-08, 'wd': 0.9, 'rescale_grad': 0.4} 2402428.75 1.171 ---
round {'data': (1024, 1024)} 2097.1521 0.3796 ---
round {'data': (10000, 1)} 20.0 0.0258 ---
round {'data': (10000, 100)} 2000.0 0.3668 ---
rsqrt {'data': (1024, 1024)} 2097.1521 0.5206 0.599
rsqrt {'data': (10000, 1)} 20.0 0.1318 0.1184
rsqrt {'data': (10000, 100)} 2000.0 1.672 0.5748
sample_exponential {'lam': [1.0, 8.5], 'shape': (1024, 1024), 'dtype': 'float16'} 183692.7031 14.451 ---
sample_exponential {'lam': [1.0, 8.5], 'shape': (10000, 1), 'dtype': 'float32'} 181675.5469 0.1636 ---
sample_exponential {'lam': [1.0, 8.5], 'shape': (10000, 100), 'dtype': 'float64'} 197635.5469 13.569 ---
sample_gamma {'alpha': [0.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'beta': [1.0, 0.7]} 195927.0 53.41 ---
sample_gamma {'alpha': [0.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'beta': [1.0, 0.7]} 193909.8594 0.5882 ---
sample_gamma {'alpha': [0.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'beta': [1.0, 0.7]} 209869.8594 58.1334 ---
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'alpha': [0.0, 2.5]} 208161.3125 89.2 ---
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'alpha': [0.0, 2.5]} 206144.1562 0.8816 ---
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'alpha': [0.0, 2.5]} 222104.1562 85.515 ---
sample_multinomial {'data': (32, 32), 'shape': (1024, 1024), 'dtype': 'float16'} 583202.9375 5384.4893 ---
sample_multinomial {'data': (32, 32), 'shape': (10000, 1), 'dtype': 'float32'} 550928.5 51.4338 ---
sample_multinomial {'data': (32, 32), 'shape': (10000, 100), 'dtype': 'float64'} 806288.5 5108.4062 ---
sample_negative_binomial {'k': [20, 49], 'shape': (1024, 1024), 'dtype': 'float16', 'p': [0.4, 0.77]} 684579.9375 428.873 ---
sample_negative_binomial {'k': [20, 49], 'shape': (10000, 1), 'dtype': 'float32', 'p': [0.4, 0.77]} 682522.8125 4.232 ---
sample_negative_binomial {'k': [20, 49], 'shape': (10000, 100), 'dtype': 'float64', 'p': [0.4, 0.77]} 698522.8125 404.8968 ---
sample_normal {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'sigma': [1.0, 3.7]} 696814.25 24.144 ---
sample_normal {'mu': [2.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'sigma': [1.0, 3.7]} 694797.0625 1.693 ---
sample_normal {'mu': [2.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'sigma': [1.0, 3.7]} 710757.0625 46.6316 ---
sample_poisson {'lam': [1.0, 8.5], 'shape': (1024, 1024), 'dtype': 'float16'} 709048.5625 80.0988 ---
sample_poisson {'lam': [1.0, 8.5], 'shape': (10000, 1), 'dtype': 'float32'} 706991.375 0.7566 ---
sample_poisson {'lam': [1.0, 8.5], 'shape': (10000, 100), 'dtype': 'float64'} 722991.375 67.3928 ---
sample_uniform {'low': [0.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'high': [1.0, 3.7]} 721282.875 13.6692 ---
sample_uniform {'low': [0.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'high': [1.0, 3.7]} 719265.6875 0.1758 ---
sample_uniform {'low': [0.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'high': [1.0, 3.7]} 735225.6875 9.6772 ---
sgd_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 2404623.25 0.4874 ---
sgd_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 2404643.25 0.045 ---
sgd_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 2406643.25 0.4482 ---
sgd_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 2408740.25 0.3374 ---
sgd_update {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 2408780.25 0.038 ---
sgd_update {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 2412760.25 0.32 ---
shape_array {'data': (1024, 1024)} 0.008 0.0184 ---
shape_array {'data': (10000, 1)} 0.008 0.0164 ---
shape_array {'data': (10000, 100)} 0.008 0.0156 ---
shuffle {'data': (1024, 1024)} 2097.1521 2.2788 ---
shuffle {'data': (10000, 1)} 20.0 0.647 ---
shuffle {'data': (10000, 100)} 2000.0 2.394 ---
sigmoid {'data': (1024, 1024)} 2097.1521 0.605 0.3084
sigmoid {'data': (10000, 1)} 20.0 0.0336 0.0268
sigmoid {'data': (10000, 100)} 2000.0 0.5842 0.2892
sign {'data': (1024, 1024)} 2097.1521 0.3576 0.228
sign {'data': (10000, 1)} 20.0 0.0296 0.025
sign {'data': (10000, 100)} 2000.0 0.3418 0.2182
signsgd_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 2417051.75 0.3604 ---
signsgd_update {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 2414994.5 0.0304 ---
signsgd_update {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 2418974.5 0.3482 ---
signum_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 2423266.0 0.5978 ---
signum_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 2421209.0 0.0376 ---
signum_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 2425189.0 0.5558 ---
sin {'data': (1024, 1024)} 2097.1521 0.5496 0.608
sin {'data': (10000, 1)} 20.0 0.0302 0.0276
sin {'data': (10000, 100)} 2000.0 0.526 0.5756
sinh {'data': (1024, 1024)} 2097.1521 1.1628 0.7774
sinh {'data': (10000, 1)} 20.0 0.0374 0.0292
sinh {'data': (10000, 100)} 2000.0 1.1074 0.7492
size_array {'data': (1024, 1024)} 0.004 0.0182 ---
size_array {'data': (10000, 1)} 0.004 0.0158 ---
size_array {'data': (10000, 100)} 0.004 0.0154 ---
slice {'data': (1024, 1024), 'begin': 0, 'end': 1} 771841.5 0.0864 0.134
slice {'data': (10000, 1), 'begin': 0, 'end': 1} 771841.5 0.0492 0.043
slice {'data': (10000, 100), 'begin': 0, 'end': 1} 771841.9375 0.0526 0.1092
slice_axis {'data': (1024, 1024), 'axis': 0, 'begin': 0, 'end': 1} 771850.125 0.1092 0.2768
slice_axis {'data': (10000, 1), 'axis': 0, 'begin': 0, 'end': 1} 771850.125 0.05 0.0526
slice_axis {'data': (10000, 100), 'axis': 0, 'begin': 0, 'end': 1} 771850.5 0.0594 0.2204
slice_like {'data': (1024, 1024), 'shape_like': (100, 100), 'axes': [0, 1]} 771930.5 0.0786 0.16
slice_like {'data': (10000, 1), 'shape_like': (10, 1), 'axes': [0, 1]} 771930.5625 0.0648 0.06
slice_like {'data': (10000, 100), 'shape_like': (100, 10), 'axes': [0, 1]} 771934.5625 0.0678 0.1442
smooth_l1 {'data': (1024, 1024)} 4079420.5 0.5924 0.6638
smooth_l1 {'data': (10000, 1)} 4079460.5 0.0354 0.031
smooth_l1 {'data': (10000, 100)} 4083440.5 0.5672 0.631
softmax {'data': (1024, 1024), 'axis': 0, 'dtype': 'float16'} 1049125.875 2.5002 2.1656
softmax {'data': (10000, 1), 'axis': 0, 'dtype': 'float32'} 1049165.875 0.6192 0.1764
softmax {'data': (10000, 100), 'axis': 0, 'dtype': 'float64'} 1057165.875 2.4482 2.0212
softmax_cross_entropy {'data': (1024, 1024), 'label': (1024,)} 4083440.5 1.2728 ---
softmin {'data': (1024, 1024), 'axis': 0, 'dtype': 'float16'} 1061360.125 2.5134 2.1578
softmin {'data': (10000, 1), 'axis': 0, 'dtype': 'float32'} 1061400.125 0.625 0.183
softmin {'data': (10000, 100), 'axis': 0, 'dtype': 'float64'} 1069400.125 2.3696 2.0374
softsign {'data': (1024, 1024)} 2097.1521 0.2364 0.3586
softsign {'data': (10000, 1)} 20.0 0.027 0.0254
softsign {'data': (10000, 100)} 2000.0 0.2252 0.3428
sort {'data': (1024, 1024), 'axis': 0} 748206.25 123.189 ---
sort {'data': (10000, 1), 'axis': 0} 746189.0625 3.9742 ---
sort {'data': (10000, 100), 'axis': 0} 754169.0625 104.927 ---
space_to_depth {'data': (1, 4, 2, 4), 'block_size': 2} 758888.0 0.0262 ---
space_to_depth {'data': (10, 25, 10, 100), 'block_size': 5} 759887.9375 0.6516 ---
sqrt {'data': (1024, 1024)} 2097.1521 0.4914 0.2722
sqrt {'data': (10000, 1)} 20.0 0.0304 0.0244
sqrt {'data': (10000, 100)} 2000.0 0.4744 0.2782
square {'data': (1024, 1024)} 2097.1521 0.2452 0.2684
square {'data': (10000, 1)} 20.0 0.0284 0.024
square {'data': (10000, 100)} 2000.0 0.2332 0.2554
squeeze {'data': (1, 1024, 1024), 'axis': 0} 4418977.5 1.0562 1.032
squeeze {'data': (32, 1, 256, 256), 'axis': 1} 4427366.0 2.2192 1.9254
stop_gradient {'data': (1024, 1024)} 2097.1521 1.0836 ---
stop_gradient {'data': (10000, 1)} 20.0 0.0226 ---
stop_gradient {'data': (10000, 100)} 2000.0 1.0606 ---
sum {'data': (1024, 1024), 'axis': 0} 727298.5625 2.2956 2.8428
sum {'data': (10000, 1), 'axis': 0} 727298.5625 0.7038 0.0538
sum {'data': (10000, 100), 'axis': 0} 727298.9375 2.3928 2.7378
sum_axis {'data': (1024, 1024), 'axis': 0} 727305.125 2.2698 ---
sum_axis {'data': (10000, 1), 'axis': 0} 727303.0625 0.7012 ---
sum_axis {'data': (10000, 100), 'axis': 0} 727303.4375 2.3688 ---
swapaxes {'data': (1024, 1024), 'dim1': 0, 'dim2': 1} 765679.375 2.5424 ---
swapaxes {'data': (10000, 1), 'dim1': 0, 'dim2': 1} 763622.25 0.0534 ---
swapaxes {'data': (10000, 100), 'dim1': 0, 'dim2': 1} 767602.25 2.2424 ---
take {'a': (1024, 1024), 'indices': (1, 1), 'axis': 0} 771942.75 0.0502 0.3292
tan {'data': (1024, 1024)} 2097.1521 0.9126 0.3128
tan {'data': (10000, 1)} 20.0 0.0324 0.0242
tan {'data': (10000, 100)} 2000.0 0.8686 0.2986
tanh {'data': (1024, 1024)} 2097.1521 1.0974 0.3138
tanh {'data': (10000, 1)} 20.0 0.0358 0.0246
tanh {'data': (10000, 100)} 2000.0 1.0448 0.2942
topk {'data': (1024, 1024), 'axis': 0, 'k': 1, 'dtype': 'float32'} 752175.1875 111.0136 ---
topk {'data': (10000, 1), 'axis': 0, 'k': 1, 'dtype': 'int32'} 752173.1875 1.0394 ---
topk {'data': (10000, 100), 'axis': 0, 'k': 1, 'dtype': 'float32'} 752173.5625 92.158 ---
transpose {'data': (1024, 1024), 'axes': [0, 1]} 771893.6875 0.8228 ---
transpose {'data': (10000, 1), 'axes': [0, 1]} 769836.5625 0.023 ---
transpose {'data': (10000, 100), 'axes': [0, 1]} 773816.5625 0.7946 ---
trunc {'data': (1024, 1024)} 2097.1521 0.3674 ---
trunc {'data': (10000, 1)} 20.0 0.0256 ---
trunc {'data': (10000, 100)} 2000.0 0.3492 ---
where {'condition': (1024,), 'x': (1024, 1024), 'y': (1024, 1024)} 780331.375 0.4704 0.8414
where {'condition': (10000,), 'x': (10000, 1), 'y': (10000, 1)} 780371.375 0.1902 0.0448
where {'condition': (10000,), 'x': (10000, 100), 'y': (10000, 100)} 784351.375 0.454 0.7998
zeros_like {'data': (1024, 1024)} 2097.1521 0.1076 ---
zeros_like {'data': (10000, 1)} 20.0 0.018 ---
zeros_like {'data': (10000, 100)} 2000.0 0.1062 ---