@connorgoggins
Created February 15, 2020 00:56

Runtime Features

  1. BLAS_APPLE: ✖
  2. BLAS_ATLAS: ✖
  3. BLAS_MKL: ✖
  4. BLAS_OPEN: ✔
  5. CAFFE: ✖
  6. CPU_AVX: ✔
  7. CPU_AVX2: ✖
  8. CPU_SSE: ✔
  9. CPU_SSE2: ✔
  10. CPU_SSE3: ✔
  11. CPU_SSE4A: ✖
  12. CPU_SSE4_1: ✔
  13. CPU_SSE4_2: ✔
  14. CUDA: ✖
  15. CUDA_RTC: ✖
  16. CUDNN: ✖
  17. CXX14: ✖
  18. DEBUG: ✖
  19. DIST_KVSTORE: ✖
  20. F16C: ✔
  21. INT64_TENSOR_SIZE: ✔
  22. JEMALLOC: ✖
  23. LAPACK: ✔
  24. MKLDNN: ✔
  25. NCCL: ✖
  26. OPENCV: ✖
  27. OPENMP: ✔
  28. PROFILER: ✖
  29. SIGNAL_HANDLER: ✖
  30. SSE: ✖
  31. TENSORRT: ✖
  32. TVM_OP: ✖
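
To compare this build against another one, the feature list can be turned into a name-to-flag mapping. The sketch below is only an illustration: it assumes each entry looks like `N. NAME : ✔` or `N. NAME : ✖` (anything after the mark is ignored), and `parse_features` is a hypothetical helper, not part of any library.

```python
import re

# Assumed line shape: "<number>. <NAME> : <✔ or ✖>", optionally followed by more text.
FEATURE_RE = re.compile(r"^\s*\d+\.\s*(\w+)\s*:\s*([✔✖])")

def parse_features(lines):
    """Map each feature name to True (enabled) or False (disabled)."""
    features = {}
    for line in lines:
        match = FEATURE_RE.match(line)
        if match:  # silently skip lines that are not feature entries
            features[match.group(1)] = match.group(2) == "✔"
    return features

# Three entries copied from the list above.
sample = [
    "  4. BLAS_OPEN : ✔ BLAS_OPEN",
    "  14. CUDA : ✖ CUDA",
    "  24. MKLDNN : ✔ MKLDNN",
]
enabled = sorted(name for name, on in parse_features(sample).items() if on)
print(enabled)  # ['BLAS_OPEN', 'MKLDNN']
```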

Benchmark Results
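
Each result row holds five space-separated columns: the operator name, the inputs as a Python dict literal, the maximum memory usage, and the average forward and backward times, with `---` where no value was recorded. A minimal sketch for loading a row into a structured record follows; `parse_row` and its field names are hypothetical and chosen here only for illustration.

```python
import ast

def parse_row(line):
    """Split one flattened benchmark row into its five columns."""
    # The operator name is the first token and contains no spaces.
    operator, rest = line.split(" ", 1)
    # The inputs column is a dict literal; it ends at the last '}'.
    end = rest.rindex("}") + 1
    inputs = ast.literal_eval(rest[:end])
    # The three remaining tokens are numbers or the '---' placeholder.
    mem, fwd, bwd = (None if tok == "---" else float(tok)
                     for tok in rest[end:].split())
    return {
        "operator": operator,
        "inputs": inputs,
        "max_mem": mem,
        "forward_ms": fwd,
        "backward_ms": bwd,
    }

row = parse_row(
    "Activation {'data': (1024, 1024), 'act_type': 'relu'} 357422.5312 0.0552 0.1204"
)
print(row["operator"], row["inputs"]["data"], row["forward_ms"])
```

Using `ast.literal_eval` rather than `eval` keeps the parsing limited to plain literals (dicts, tuples, lists, strings, numbers), which covers every inputs column shown here.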

Operator Inputs Max Mem Usage (Storage) (Bytes) Avg Forward Time (ms) Avg Backward Time (ms)
Activation {'data': (1024, 1024), 'act_type': 'relu'} 357422.5312 0.0552 0.1204
Activation {'data': (1024, 1024), 'act_type': 'sigmoid'} 359519.6875 0.1732 0.9284
Activation {'data': (1024, 1024), 'act_type': 'softrelu'} 361616.8438 0.1674 0.8194
Activation {'data': (1024, 1024), 'act_type': 'softsign'} 363714.0 0.0536 0.0596
Activation {'data': (1024, 1024), 'act_type': 'tanh'} 365811.1562 0.1902 1.7102
BatchNorm {'data': (32, 3, 256, 256), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)} 344514.0 2.6362 1.4606
BatchNorm {'data': (32, 3, 10000, 10), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)} 370331.0938 6.5164 1.9314
BilinearSampler {'data': (32, 2, 256, 256), 'grid': (32, 2, 256, 256)} 25165.8242 103.5358 117.9724
BlockGrad {'data': (1024, 1024)} 2097.1521 0.4334 ---
BlockGrad {'data': (10000, 1)} 20.0 0.0216 ---
BlockGrad {'data': (10000, 100)} 2000.0 0.4104 ---
CTCLoss {'data': (1024, 100, 100), 'label': (100, 100)} 4147407.75 80.882 ---
Convolution {'data': (32, 3, 64, 64), 'weight': (64, 3, 3, 3), 'bias': (64,), 'kernel': (3, 3), 'stride': (1, 1), 'dilate': (1, 1), 'pad': (0, 0), 'num_filter': 64, 'layout': 'NCHW'} 3172700.5 6.7754 4.3082
Custom {'args': [(1024, 1024)], 'op_type': 'CustomAddOne'} 4270740.0 0.007 0.0088
Custom {'args': [(10000, 1)], 'op_type': 'CustomAddOne'} 4270740.0 0.007 0.0072
Custom {'args': [(10000, 10)], 'op_type': 'CustomAddOne'} 4270740.0 0.006 0.0068
Deconvolution {'data': (32, 3, 64, 64), 'weight': (3, 64, 3, 3), 'bias': (64,), 'kernel': (3, 3), 'stride': (1, 1), 'dilate': (1, 1), 'pad': (0, 0), 'num_filter': 64, 'no_bias': False, 'layout': 'NCHW'} 4124289.25 25.1678 24.35
Dropout {'data': (32, 3, 256, 256), 'p': 0.5, 'mode': 'always'} 318948.1875 3.5208 0.756
Dropout {'data': (10000, 10), 'p': 0.5, 'mode': 'always'} 306965.2812 0.0868 0.0264
ElementWiseSum {'args': (1024, 1024)} 2097.1521 0.0906 ---
FullyConnected {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 196608), 'bias': (64,), 'flatten': True} 252891.9844 1.6274 4.4118
FullyConnected {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 256), 'bias': (64,), 'flatten': False} 259179.3438 1.3982 6.1992
GridGenerator {'data': (32, 2, 256, 256), 'transform_type': 'warp', 'target_shape': (256, 6)} 8650.752 29.8418 21.215
GridGenerator {'data': (256, 6), 'transform_type': 'affine', 'target_shape': (256, 6)} 1582.08 0.4098 0.578
LeakyReLU {'data': (1024, 1024), 'act_type': 'leaky'} 370005.4688 0.0488 0.0556
LeakyReLU {'data': (1024, 1024), 'act_type': 'elu'} 372102.625 0.1116 0.721
LeakyReLU {'data': (1024, 1024), 'act_type': 'selu'} 374199.75 0.6926 0.2626
LeakyReLU {'data': (1024, 1024), 'act_type': 'gelu'} 376296.9062 0.2298 2.0292
MakeLoss {'data': (1024, 1024)} 4133219.0 0.071 0.03
MakeLoss {'data': (10000, 1)} 4131161.75 0.0198 0.0106
MakeLoss {'data': (10000, 100)} 4135141.75 4.8696 0.0288
Pooling {'data': (32, 3, 64, 64), 'kernel': (3, 3), 'pool_type': 'sum', 'global_pool': 1, 'stride': (1, 1), 'pad': (0, 0)} 477931.0312 0.4686 0.2812
ROIPooling {'data': (32, 3, 64, 64), 'rois': (32, 5), 'pooled_size': (2, 2), 'spatial_scale': 0.5} 477938.5 1.19 0.095
SequenceLast {'data': (1024, 1024), 'axis': 0} 4270746.0 0.0308 ---
SequenceLast {'data': (10000, 1), 'axis': 0} 4270744.0 0.0186 ---
SequenceLast {'data': (10000, 100), 'axis': 0} 4270744.5 0.019 ---
SequenceMask {'data': (1024, 1024), 'axis': 0} 4277035.5 0.2584 0.2368
SequenceMask {'data': (10000, 1), 'axis': 0} 4274978.5 0.021 0.0134
SequenceMask {'data': (10000, 100), 'axis': 0} 4278958.5 0.2248 0.2208
SequenceReverse {'data': (1024, 1024), 'axis': 0} 4283250.0 2.4684 2.607
SequenceReverse {'data': (10000, 1), 'axis': 0} 4281192.5 0.0282 0.0242
SequenceReverse {'data': (10000, 100), 'axis': 0} 4285172.5 1.9482 2.0684
Softmax {'data': (1024, 1024), 'label': (1024, 1024)} 380491.2188 0.1542 ---
Softmax {'data': (10000, 1), 'label': (10000, 1)} 378434.0625 0.1078 ---
Softmax {'data': (10000, 100), 'label': (10000, 100)} 382414.0625 0.2178 ---
SoftmaxActivation {'data': (1024, 1024)} 386705.5312 0.7064 2.5148
SoftmaxActivation {'data': (10000, 1)} 384648.375 0.0186 0.0558
SoftmaxActivation {'data': (10000, 100)} 388628.375 0.6738 2.4004
UpSampling {'args': (32, 3, 256, 256), 'scale': 2, 'sample_type': 'nearest'} 4290351.5 23.397 3.6678
UpSampling {'args': (32, 3, 10000, 1), 'scale': 4, 'sample_type': 'nearest'} 4301460.0 16.8458 2.4004
abs {'data': (1024, 1024)} 4194.3042 0.1058 0.6502
abs {'data': (10000, 1)} 40.0 0.0628 0.0586
abs {'data': (10000, 100)} 4000.0 0.1046 0.629
adam_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mean': (1024, 1024), 'var': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 3163247.0 0.4316 ---
adam_update {'weight': (10000, 1), 'grad': (10000, 1), 'mean': (10000, 1), 'var': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 3161189.75 0.4754 ---
adam_update {'weight': (10000, 100), 'grad': (10000, 100), 'mean': (10000, 100), 'var': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 3165169.75 2.5598 ---
add_n {'args': [(1024, 1024)]} 4139356.5 0.0234 ---
add_n {'args': [(10000, 1)]} 4139356.5 0.0166 ---
add_n {'args': [(10000, 10)]} 4139356.5 0.0162 ---
all_finite {'data': (1024, 1024)} 4283172.5 0.068 ---
all_finite {'data': (10000, 1)} 4283172.5 0.0128 ---
all_finite {'data': (10000, 100)} 4283172.5 0.0622 ---
amp_cast {'data': (1024, 1024), 'dtype': 'float32'} 4194.3042 0.3786 0.4678
amp_cast {'data': (10000, 1), 'dtype': 'int32'} 40.0 0.0494 0.0424
amp_cast {'data': (10000, 100), 'dtype': 'float32'} 4000.0 0.3944 0.4732
amp_multicast {'args': [(1024, 1024)], 'num_outputs': 1} 0.004 0.1572 0.162
amp_multicast {'args': [(10000, 1)], 'num_outputs': 1} 0.004 0.1478 0.134
arccos {'data': (1024, 1024)} 4194.3042 1.1568 0.8204
arccos {'data': (10000, 1)} 20.0 0.0188 0.0144
arccos {'data': (10000, 100)} 2000.0 0.9992 0.7378
arccosh {'data': (1024, 1024)} 2097.1521 0.7904 0.8444
arccosh {'data': (10000, 1)} 20.0 0.0168 0.015
arccosh {'data': (10000, 100)} 2000.0 0.6904 0.743
arcsin {'data': (1024, 1024)} 2097.1521 0.9504 0.7368
arcsin {'data': (10000, 1)} 20.0 0.0174 0.0148
arcsin {'data': (10000, 100)} 2000.0 1.0078 0.7752
arcsinh {'data': (1024, 1024)} 2097.1521 1.7784 0.4492
arcsinh {'data': (10000, 1)} 20.0 0.0232 0.0134
arcsinh {'data': (10000, 100)} 2000.0 1.6044 0.4174
arctan {'data': (1024, 1024)} 2097.1521 0.8594 0.0626
arctan {'data': (10000, 1)} 20.0 0.0178 0.0098
arctan {'data': (10000, 100)} 2000.0 0.8828 0.0512
arctanh {'data': (1024, 1024)} 4194.3042 1.2732 0.0628
arctanh {'data': (10000, 1)} 40.0 0.038 0.0084
arctanh {'data': (10000, 100)} 4000.0 1.2022 0.0754
argmax {'data': (1024, 1024), 'axis': 0} 214515.375 27.4112 ---
argmax {'data': (10000, 1), 'axis': 0} 214513.3281 0.189 ---
argmax {'data': (10000, 100), 'axis': 0} 214513.7188 24.7242 ---
argmax_channel {'data': (1024, 1024)} 2.048 0.646 ---
argmax_channel {'data': (10000, 1)} 20.0 0.1812 ---
argmax_channel {'data': (10000, 100)} 20.0 0.7264 ---
argmin {'data': (1024, 1024), 'axis': 0} 214519.6719 29.0478 ---
argmin {'data': (10000, 1), 'axis': 0} 214517.625 0.21 ---
argmin {'data': (10000, 100), 'axis': 0} 214518.0156 15.3104 ---
argsort {'data': (1024, 1024), 'axis': 0, 'dtype': 'float32'} 220809.2812 63.8092 ---
argsort {'data': (10000, 1), 'axis': 0, 'dtype': 'int32'} 218752.125 1.3494 ---
argsort {'data': (10000, 100), 'axis': 0, 'dtype': 'float32'} 222732.125 76.6904 ---
batch_dot {'lhs': (32, 1024, 1024), 'rhs': (32, 1024, 1024)} 134217.7344 88.9665 ---
batch_dot {'lhs': (32, 1000, 10), 'rhs': (32, 1000, 10), 'transpose_b': True} 128000.0 130.6619 ---
batch_dot {'lhs': (32, 1000, 1), 'rhs': (32, 100, 1000), 'transpose_a': True, 'transpose_b': True} 12.8 0.5856 ---
broadcast_add {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.058 0.1162
broadcast_add {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0228 0.0196
broadcast_add {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0184 0.0218
broadcast_div {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0658 0.1528
broadcast_div {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.021 0.0342
broadcast_div {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0188 0.0212
broadcast_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0514 ---
broadcast_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0224 ---
broadcast_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0176 ---
broadcast_greater {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0514 ---
broadcast_greater {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0184 ---
broadcast_greater {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0178 ---
broadcast_greater_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0494 ---
broadcast_greater_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0182 ---
broadcast_greater_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0176 ---
broadcast_hypot {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.4218 0.9334
broadcast_hypot {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0612 0.1666
broadcast_hypot {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.021 0.0276
broadcast_lesser {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0508 ---
broadcast_lesser {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0222 ---
broadcast_lesser {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0164 ---
broadcast_lesser_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0508 ---
broadcast_lesser_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.019 ---
broadcast_lesser_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0178 ---
broadcast_logical_and {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0902 ---
broadcast_logical_and {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0622 ---
broadcast_logical_and {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0108 ---
broadcast_logical_or {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.109 ---
broadcast_logical_or {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0228 ---
broadcast_logical_or {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.012 ---
broadcast_logical_xor {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.1376 ---
broadcast_logical_xor {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0238 ---
broadcast_logical_xor {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0128 ---
broadcast_maximum {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0694 0.1108
broadcast_maximum {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0184 0.022
broadcast_maximum {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0134 0.0158
broadcast_minimum {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0966 0.1106
broadcast_minimum {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.018 0.024
broadcast_minimum {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.013 0.0156
broadcast_minus {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0478 ---
broadcast_minus {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0188 ---
broadcast_minus {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0126 ---
broadcast_mod {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 2.011 0.169
broadcast_mod {'lhs': (10000, 10), 'rhs': (10000, 10)} 400.0 0.2096 0.028
broadcast_mod {'lhs': (10000, 1), 'rhs': (10000, 1)} 40.0 0.0344 0.021
broadcast_mul {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0532 0.0872
broadcast_mul {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0168 0.0196
broadcast_mul {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0246 0.0156
broadcast_not_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0458 ---
broadcast_not_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0216 ---
broadcast_not_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0128 ---
broadcast_plus {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0464 ---
broadcast_plus {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0154 ---
broadcast_plus {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0122 ---
broadcast_power {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 2.4362 6.0946
broadcast_power {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.381 0.8694
broadcast_power {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0612 0.0946
broadcast_sub {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0514 0.0648
broadcast_sub {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0154 0.0162
broadcast_sub {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0128 0.0134
cast {'data': (1024, 1024), 'dtype': 'float32'} 2097.1521 --- 0.568
cast {'data': (10000, 1), 'dtype': 'int32'} 20.0 --- 0.015
cast {'data': (10000, 100), 'dtype': 'float32'} 2000.0 --- 0.577
cast_storage {'data': (1024, 1024), 'stype': 'default'} 4289464.0 0.3226 ---
cast_storage {'data': (10000, 1), 'stype': 'csr'} 4287507.0 0.1238 ---
cast_storage {'data': (10000, 100), 'stype': 'row_sparse'} 4289547.0 0.6306 ---
cbrt {'data': (1024, 1024)} 2097.1521 1.429 0.06
cbrt {'data': (10000, 1)} 20.0 0.0248 0.0108
cbrt {'data': (10000, 100)} 2000.0 1.287 0.06
ceil {'data': (1024, 1024)} 2097.1521 0.1252 ---
ceil {'data': (10000, 1)} 20.0 0.0112 ---
ceil {'data': (10000, 100)} 2000.0 0.1316 ---
choose_element_0index {'data': (1024, 1024), 'index': (1, 1024), 'axis': 0} 2.048 0.013 ---
choose_element_0index {'data': (10000, 1), 'index': (1, 1), 'axis': 0} 0.002 0.011 ---
choose_element_0index {'data': (10000, 100), 'index': (1, 100), 'axis': 0} 0.2 0.0118 ---
clip {'data': (1024, 1024), 'a_min': 0.1, 'a_max': 0.9} 4295798.5 0.1544 0.2572
clip {'data': (10000, 1), 'a_min': 0.1, 'a_max': 0.9} 4293741.5 0.0122 0.0096
clip {'data': (10000, 100), 'a_min': 0.1, 'a_max': 0.9} 4297721.5 0.1334 0.2468
cos {'data': (1024, 1024)} 4194.3042 0.476 0.515
cos {'data': (10000, 1)} 40.0 0.0192 0.0134
cos {'data': (10000, 100)} 4000.0 0.4694 0.5726
cosh {'data': (1024, 1024)} 2097.1521 0.8858 1.5906
cosh {'data': (10000, 1)} 20.0 0.0302 0.0242
cosh {'data': (10000, 100)} 2000.0 0.8496 1.5106
ctc_loss {'data': (1024, 100, 100), 'label': (100, 100)} 4174102.25 56.0514 ---
cumsum {'a': (1024, 1024), 'axis': 0, 'dtype': 'float32'} 4302013.0 1.1744 ---
cumsum {'a': (1024, 1024), 'axis': 0, 'dtype': 'int32'} 4304110.0 18.6308 ---
cumsum {'a': (1024, 1024), 'axis': 0, 'dtype': 'float32'} 4306207.0 1.1912 ---
degrees {'data': (1024, 1024)} 2097.1521 0.0408 0.037
degrees {'data': (10000, 1)} 20.0 0.0118 0.01
degrees {'data': (10000, 100)} 2000.0 0.111 0.0378
depth_to_space {'data': (1, 4, 2, 4), 'block_size': 2} 226950.9219 0.0202 ---
depth_to_space {'data': (10, 25, 10, 100), 'block_size': 5} 227950.8594 1.873 ---
dot {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 2.3134 5.078
dot {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} 2000.0 0.1722 0.2912
dot {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_a': True, 'transpose_b': True} 0.2 0.0896 0.0506
elemwise_add {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.1428 ---
elemwise_add {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0472 ---
elemwise_add {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0346 ---
elemwise_div {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.2986 ---
elemwise_div {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0194 ---
elemwise_div {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0128 ---
elemwise_mul {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.3588 ---
elemwise_mul {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0166 ---
elemwise_mul {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0128 ---
elemwise_sub {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0494 ---
elemwise_sub {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0178 ---
elemwise_sub {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0154 ---
erf {'data': (1024, 1024)} 4194.3042 0.929 0.7918
erf {'data': (10000, 1)} 40.0 0.0236 0.0158
erf {'data': (10000, 100)} 4000.0 0.859 0.7768
erfinv {'data': (1024, 1024)} 2097.1521 3.3352 0.8054
erfinv {'data': (10000, 1)} 20.0 0.0474 0.0186
erfinv {'data': (10000, 100)} 2000.0 4.061 0.8742
exp {'data': (1024, 1024)} 2097.1521 0.6932 ---
exp {'data': (10000, 1)} 20.0 0.0206 ---
exp {'data': (10000, 100)} 2000.0 0.6642 ---
expm1 {'data': (1024, 1024)} 2097.1521 1.1448 0.6482
expm1 {'data': (10000, 1)} 20.0 0.0236 0.043
expm1 {'data': (10000, 100)} 2000.0 1.0888 0.603
fill_element_0index {'lhs': (1024, 1024), 'mhs': (1024,), 'rhs': (1024,)} 4310401.5 0.0804 ---
fill_element_0index {'lhs': (10000, 1), 'mhs': (10000,), 'rhs': (10000,)} 4308344.0 0.0116 ---
fill_element_0index {'lhs': (10000, 100), 'mhs': (10000,), 'rhs': (10000,)} 4312324.0 0.0772 ---
fix {'data': (1024, 1024)} 2097.1521 0.3224 ---
fix {'data': (10000, 1)} 20.0 0.0166 ---
fix {'data': (10000, 100)} 2000.0 0.3128 ---
flatten {'data': (1024, 1024)} 2097.1521 0.0556 ---
flatten {'data': (10000, 1)} 20.0 0.0354 ---
flatten {'data': (10000, 100)} 2000.0 0.074 ---
flip {'data': (1024, 1024), 'axis': 0} 233742.3125 1.196 ---
flip {'data': (10000, 1), 'axis': 0} 231685.1562 0.0278 ---
flip {'data': (10000, 100), 'axis': 0} 235665.1562 1.1392 ---
floor {'data': (1024, 1024)} 2097.1521 0.1414 ---
floor {'data': (10000, 1)} 20.0 0.0154 ---
floor {'data': (10000, 100)} 2000.0 0.119 ---
ftml_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'd': (1024, 1024), 'v': (1024, 1024), 'z': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 't': 1, 'wd': 0.1, 'rescale_grad': 0.4, 'clip_grad': -1.0} 3169461.25 1.565 ---
ftml_update {'weight': (10000, 1), 'grad': (10000, 1), 'd': (10000, 1), 'v': (10000, 1), 'z': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 't': 1, 'wd': 0.5, 'rescale_grad': 0.4, 'clip_grad': -1.0} 3167404.0 0.0352 ---
ftml_update {'weight': (10000, 100), 'grad': (10000, 100), 'd': (10000, 100), 'v': (10000, 100), 'z': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 't': 1, 'wd': 0.9, 'rescale_grad': 0.4, 'clip_grad': -1.0} 3171384.0 2.5164 ---
ftrl_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'z': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 3175675.5 2.263 ---
ftrl_update {'weight': (10000, 1), 'grad': (10000, 1), 'z': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 3173618.5 0.0584 ---
ftrl_update {'weight': (10000, 100), 'grad': (10000, 100), 'z': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 3177598.5 2.2906 ---
gamma {'data': (1024, 1024)} 4194.3042 3.8958 7.119
gamma {'data': (10000, 1)} 40.0 0.0518 0.0768
gamma {'data': (10000, 100)} 4000.0 3.696 6.9238
gammaln {'data': (1024, 1024)} 4194.3042 23.4108 3.4744
gammaln {'data': (10000, 1)} 20.0 0.2374 0.0414
gammaln {'data': (10000, 100)} 2000.0 22.1348 3.2474
gather_nd {'data': (1024, 1024), 'indices': (1, 1)} 246597.9844 0.0224 ---
gather_nd {'data': (10000, 1), 'indices': (1, 1)} 246597.9844 0.0208 ---
gather_nd {'data': (10000, 100), 'indices': (1, 1)} 246598.1875 0.0214 ---
hard_sigmoid {'data': (1024, 1024)} 392919.8125 0.0974 0.1074
hard_sigmoid {'data': (10000, 1)} 390862.6875 0.0126 0.0102
hard_sigmoid {'data': (10000, 100)} 394842.6875 0.097 0.1034
identity {'data': (1024, 1024)} 2097.1521 0.337 ---
identity {'data': (10000, 1)} 20.0 0.0186 ---
identity {'data': (10000, 100)} 2000.0 0.4072 ---
khatri_rao {'args': [(32, 32), (32, 32)]} 0.008 0.017 ---
khatri_rao {'args': [(64, 64), (64, 64)]} 0.008 0.0158 ---
lamb_update_phase1 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mean': (1024, 1024), 'var': (1024, 1024), 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 't': 1, 'wd': 0.1, 'rescale_grad': 0.4} 3181889.75 0.6372 ---
lamb_update_phase1 {'weight': (10000, 1), 'grad': (10000, 1), 'mean': (10000, 1), 'var': (10000, 1), 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 't': 1, 'wd': 0.5, 'rescale_grad': 0.4} 3179832.75 0.0384 ---
lamb_update_phase1 {'weight': (10000, 100), 'grad': (10000, 100), 'mean': (10000, 100), 'var': (10000, 100), 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 't': 1, 'wd': 0.9, 'rescale_grad': 0.4} 3183812.75 1.4582 ---
lamb_update_phase2 {'weight': (1024, 1024), 'g': (1024, 1024), 'r1': (1, 1024), 'r2': (1, 1024), 'lr': 0.1} 3188104.25 0.2096 ---
lamb_update_phase2 {'weight': (10000, 1), 'g': (10000, 1), 'r1': (1, 1), 'r2': (1, 1), 'lr': 0.5} 3186047.0 0.1938 ---
lamb_update_phase2 {'weight': (10000, 100), 'g': (10000, 100), 'r1': (1, 100), 'r2': (1, 100), 'lr': 0.9} 3190027.0 0.2496 ---
linalg_det {'A': (1024, 1024)} 4327026.5 7.4042 13.9204
linalg_extractdiag {'A': (1024, 1024)} 4324933.5 0.0152 0.044
linalg_extracttrian {'A': (1024, 1024)} 4329131.5 0.5524 0.5548
linalg_gelqf {'A': (1024, 1024)} 4340665.0 32.4998 ---
linalg_gemm {'A': (1024, 1024), 'B': (1024, 1024), 'C': (1024, 1024), 'axis': 0} 4342762.0 3.7104 6.553
linalg_gemm2 {'A': (1024, 1024), 'B': (1024, 1024), 'axis': 0} 4346956.5 2.1652 4.6222
linalg_inverse {'A': (1024, 1024)} 4353248.0 12.8776 4.9036
linalg_makediag {'A': (1024, 1024)} 10795699.0 393.1286 63.4552
linalg_maketrian {'A': (1024, 1035)} 8664804.0 1.3042 1.2714
linalg_potrf {'A': [[1, 0], [0, 1]]} 4324927.0 0.1056 ---
linalg_potrf {'A': [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]} 4324927.0 0.0994 ---
linalg_potri {'A': (1024, 1024)} 8666948.0 9.8314 6.9932
linalg_slogdet {'A': (1024, 1024)} 8666950.0 6.4578 ---
linalg_sumlogdiag {'A': (1024, 1024)} 8664851.0 0.0534 0.4248
linalg_syrk {'A': (1024, 1024)} 8671143.0 2.8064 4.4366
linalg_trmm {'A': (1024, 1024), 'B': (1024, 1024)} 8675337.0 3.321 6.7742
linalg_trsm {'A': (1024, 1024), 'B': (1024, 1024)} 8679531.0 3.2282 6.8102
log {'data': (1024, 1024)} 2097.1521 1.0424 0.0922
log {'data': (10000, 1)} 20.0 0.0188 0.0106
log {'data': (10000, 100)} 2000.0 0.9218 0.0502
log10 {'data': (1024, 1024)} 2097.1521 1.139 0.055
log10 {'data': (10000, 1)} 20.0 0.0198 0.0108
log10 {'data': (10000, 100)} 2000.0 0.9732 0.0496
log1p {'data': (1024, 1024)} 2097.1521 1.0444 0.0556
log1p {'data': (10000, 1)} 20.0 0.0264 0.015
log1p {'data': (10000, 100)} 2000.0 0.9316 0.0512
log2 {'data': (1024, 1024)} 2097.1521 0.8226 0.0502
log2 {'data': (10000, 1)} 20.0 0.0242 0.0146
log2 {'data': (10000, 100)} 2000.0 0.8384 0.0494
log_softmax {'data': (1024, 1024), 'axis': 0, 'dtype': 'float16'} 395988.4062 3.0658 1.8808
log_softmax {'data': (10000, 1), 'axis': 0, 'dtype': 'float32'} 394979.8125 0.702 0.319
log_softmax {'data': (10000, 100), 'axis': 0, 'dtype': 'float64'} 402959.8125 5.6274 4.5846
logical_not {'data': (1024, 1024)} 2097.1521 0.1672 ---
logical_not {'data': (10000, 1)} 20.0 0.0164 ---
logical_not {'data': (10000, 100)} 2000.0 0.0398 ---
make_loss {'data': (1024, 1024)} 2097.1521 0.3752 ---
make_loss {'data': (10000, 1)} 20.0 0.0162 ---
make_loss {'data': (10000, 100)} 2000.0 0.432 ---
max {'data': (1024, 1024), 'axis': 0} 214467.8906 1.7804 1.47
max {'data': (10000, 1), 'axis': 0} 214465.8438 0.3502 0.0246
max {'data': (10000, 100), 'axis': 0} 214466.25 1.7192 1.424
max_axis {'data': (1024, 1024), 'axis': 0} 214472.1875 1.8448 ---
max_axis {'data': (10000, 1), 'axis': 0} 214470.1406 0.432 ---
max_axis {'data': (10000, 100), 'axis': 0} 214470.5469 1.8058 ---
mean {'data': (1024, 1024), 'axis': 0} 214476.4844 1.86 2.1958
mean {'data': (10000, 1), 'axis': 0} 214474.4375 0.4254 0.0362
mean {'data': (10000, 100), 'axis': 0} 214474.8438 1.9616 2.2664
min {'data': (1024, 1024), 'axis': 0} 214480.9844 1.7412 1.4844
min {'data': (10000, 1), 'axis': 0} 214478.9375 0.3492 0.0248
min {'data': (10000, 100), 'axis': 0} 214479.3438 1.7012 1.4112
min_axis {'data': (1024, 1024), 'axis': 0} 214485.2812 1.7702 ---
min_axis {'data': (10000, 1), 'axis': 0} 214483.2344 0.3404 ---
min_axis {'data': (10000, 100), 'axis': 0} 214483.6406 1.669 ---
moments {'data': (1024, 1024), 'axes': [0, 1]} 8677434.0 50.9642 ---
moments {'data': (10000, 1), 'axes': [0, 1]} 8677434.0 0.5358 ---
moments {'data': (10000, 100), 'axes': [0, 1]} 8677434.0 49.0232 ---
mp_nag_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 3194318.5 0.299 ---
mp_nag_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 3192261.25 0.0306 ---
mp_nag_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 3196241.25 0.2694 ---
mp_sgd_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 3200532.75 0.2008 ---
mp_sgd_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 3198475.5 0.0392 ---
mp_sgd_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 3202455.5 0.21 ---
mp_sgd_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 3206747.0 0.168 ---
mp_sgd_update {'weight': (10000, 1), 'grad': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 3204690.0 0.018 ---
mp_sgd_update {'weight': (10000, 100), 'grad': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 3208670.0 0.1958 ---
multi_all_finite {'args': [(1024, 1024)], 'num_arrays': 1} 4139356.5 0.0152 ---
multi_all_finite {'args': [(10000, 1)], 'num_arrays': 1} 4139356.5 0.0636 ---
multi_all_finite {'args': [(10000, 10)], 'num_arrays': 1} 4139356.5 0.0208 ---
multi_lars {'lrs': (1024, 1024), 'weights_sum_sq': (1024, 1024), 'grads_sum_sq': (1024, 1024), 'wds': (1024, 1024), 'eta': 0.5, 'eps': 1e-08, 'rescale_grad': 0.4} 4316615.5 1.699 ---
multi_lars {'lrs': (10000, 1), 'weights_sum_sq': (10000, 1), 'grads_sum_sq': (10000, 1), 'wds': (10000, 1), 'eta': 0.5, 'eps': 1e-08, 'rescale_grad': 0.4} 4314558.5 0.02 ---
multi_lars {'lrs': (10000, 100), 'weights_sum_sq': (10000, 100), 'grads_sum_sq': (10000, 100), 'wds': (10000, 100), 'eta': 0.5, 'eps': 1e-08, 'rescale_grad': 0.4} 4318538.5 3.399 ---
multi_mp_sgd_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'lrs': 0.1, 'wds': 0.2, 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0146 ---
multi_sgd_mom_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'lrs': 0.1, 'wds': 0.2, 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0115 ---
multi_sgd_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'lrs': 0.1, 'wds': 0.2, 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0118 ---
multi_sum_sq {'args': [(1024, 1024)], 'num_arrays': 1} 4139356.5 0.013 ---
multi_sum_sq {'args': [(10000, 1)], 'num_arrays': 1} 4139356.5 0.0134 ---
multi_sum_sq {'args': [(10000, 10)], 'num_arrays': 1} 4139356.5 0.0132 ---
nag_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 3212961.25 0.3648 ---
nag_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 3210904.25 0.0214 ---
nag_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 3214884.25 0.2836 ---
nanprod {'data': (1024, 1024), 'axis': 0} 214489.5781 1.901 1.492
nanprod {'data': (10000, 1), 'axis': 0} 214487.5312 0.3438 0.0244
nanprod {'data': (10000, 100), 'axis': 0} 214487.9375 1.6876 1.4132
nansum {'data': (1024, 1024), 'axis': 0} 214493.875 1.91 1.7302
nansum {'data': (10000, 1), 'axis': 0} 214491.8281 0.3524 0.0266
nansum {'data': (10000, 100), 'axis': 0} 214492.2344 1.658 1.4848
negative {'data': (1024, 1024)} 2097.1521 0.0514 ---
negative {'data': (10000, 1)} 20.0 0.0146 ---
negative {'data': (10000, 100)} 2000.0 0.0476 ---
norm {'data': (1024, 1024), 'axis': 0} 214498.1719 2.0084 1.44
norm {'data': (10000, 1), 'axis': 0} 214496.125 0.378 0.025
norm {'data': (10000, 100), 'axis': 0} 214496.5312 1.7686 1.386
one_hot {'indices': (1, 1), 'depth': 0, 'dtype': 'float32'} --- 0.0048 ---
one_hot {'indices': (1, 1), 'depth': 0, 'dtype': 'int32'} --- 0.005 ---
one_hot {'indices': (1, 1), 'depth': 0, 'dtype': 'float32'} --- 0.0048 ---
ones_like {'data': (1024, 1024)} 2097.1521 0.0268 ---
ones_like {'data': (10000, 1)} 20.0 0.0734 ---
ones_like {'data': (10000, 100)} 2000.0 0.026 ---
pick {'data': (1024, 1024), 'index': (1, 1024), 'axis': 0} 246604.3438 0.016 0.2774
pick {'data': (10000, 1), 'index': (1, 1), 'axis': 0} 246602.2969 0.0132 0.0158
pick {'data': (10000, 100), 'index': (1, 100), 'axis': 0} 246602.6875 0.0126 0.3116
preloaded_multi_mp_sgd_mom_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'args3': '<NDArray 5x5 @cpu(0)>', 'args4': '<NDArray 1 @cpu(0)>', 'args5': '<NDArray 1 @cpu(0)>', 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0119 ---
preloaded_multi_mp_sgd_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'args3': '<NDArray 1 @cpu(0)>', 'args4': '<NDArray 1 @cpu(0)>', 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0111 ---
preloaded_multi_sgd_mom_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'args3': '<NDArray 1 @cpu(0)>', 'args4': '<NDArray 1 @cpu(0)>', 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0114 ---
prod {'data': (1024, 1024), 'axis': 0} 214502.4688 1.8732 1.4706
prod {'data': (10000, 1), 'axis': 0} 214500.4375 0.33 0.0254
prod {'data': (10000, 100), 'axis': 0} 214500.8281 1.6508 1.3866
radians {'data': (1024, 1024)} 4194.3042 0.05 0.0374
radians {'data': (10000, 1)} 20.0 0.0162 0.0108
radians {'data': (10000, 100)} 2000.0 0.047 0.0356
random_exponential {'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 2.7968 ---
random_exponential {'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0424 ---
random_exponential {'shape': (10000, 100), 'dtype': 'float64'} 4000.0 2.6212 ---
random_gamma {'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 7.1436 ---
random_gamma {'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0842 ---
random_gamma {'shape': (10000, 100), 'dtype': 'float64'} 4000.0 8.557 ---
random_generalized_negative_binomial {'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 9.6646 ---
random_generalized_negative_binomial {'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.113 ---
random_generalized_negative_binomial {'shape': (10000, 100), 'dtype': 'float64'} 4000.0 9.2784 ---
random_negative_binomial {'k': 1, 'p': 1, 'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 8.717 ---
random_negative_binomial {'k': 1, 'p': 1, 'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0988 ---
random_negative_binomial {'k': 1, 'p': 1, 'shape': (10000, 100), 'dtype': 'float64'} 4000.0 8.1354 ---
random_normal {'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 4.206 ---
random_normal {'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0464 ---
random_normal {'shape': (10000, 100), 'dtype': 'float64'} 4000.0 3.9712 ---
random_pdf_dirichlet {'sample': (2,), 'alpha': [0.0, 2.5]} 0.002 0.0126 ---
random_pdf_exponential {'sample': (2,), 'lam': [1.0, 8.5]} 0.004 0.0118 ---
random_pdf_gamma {'sample': (2,), 'alpha': [0.0, 2.5], 'beta': [1.0, 0.7]} 0.004 0.0128 ---
random_pdf_generalized_negative_binomial {'sample': (2,), 'mu': [2.0, 2.5], 'alpha': [0.0, 2.5]} 0.004 0.0128 ---
random_pdf_negative_binomial {'sample': (2,), 'k': [20, 49], 'p': [0.4, 0.77]} 0.004 0.0128 ---
random_pdf_normal {'sample': (2,), 'mu': [2.0, 2.5], 'sigma': [1.0, 3.7]} 0.004 0.0126 ---
random_pdf_poisson {'sample': (2,), 'lam': [1.0, 8.5]} 0.004 0.0118 ---
random_pdf_uniform {'sample': (2,), 'low': [0.0, 2.5], 'high': [1.0, 3.7]} 0.004 0.0134 ---
random_poisson {'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 2.4058 ---
random_poisson {'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0372 ---
random_poisson {'shape': (10000, 100), 'dtype': 'float64'} 4000.0 2.278 ---
random_randint {'low': 0, 'high': 5, 'shape': (1024, 1024), 'dtype': 'int32'} 2097.1521 1.7014 ---
random_randint {'low': 0, 'high': 5, 'shape': (10000, 1), 'dtype': 'int64'} 40.0 0.0526 ---
random_randint {'low': 0, 'high': 5, 'shape': (10000, 100), 'dtype': 'int32'} 2000.0 2.5374 ---
random_uniform {'low': 0, 'high': 5, 'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 1.2788 ---
random_uniform {'low': 0, 'high': 5, 'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0228 ---
random_uniform {'low': 0, 'high': 5, 'shape': (10000, 100), 'dtype': 'float64'} 8000.0 1.1772 ---
ravel_multi_index {'data': (2, 1024), 'shape': (1024, 1024)} 246608.6406 0.0124 ---
ravel_multi_index {'data': (2, 1024), 'shape': (10000, 1)} 246610.6875 0.0122 ---
ravel_multi_index {'data': (2, 1024), 'shape': (10000, 100)} 246612.7344 0.0124 ---
rcbrt {'data': (1024, 1024)} 2097.1521 1.3436 1.3696
rcbrt {'data': (10000, 1)} 20.0 0.0232 0.0222
rcbrt {'data': (10000, 100)} 4000.0 1.3468 1.634
reciprocal {'data': (1024, 1024)} 2097.1521 0.048 0.051
reciprocal {'data': (10000, 1)} 20.0 0.0114 0.0098
reciprocal {'data': (10000, 100)} 2000.0 0.0428 0.0498
relu {'data': (1024, 1024)} 2097.1521 0.0882 0.0894
relu {'data': (10000, 1)} 20.0 0.012 0.0108
relu {'data': (10000, 100)} 2000.0 0.0684 0.0862
reset_arrays {'args': [(1024, 1024)], 'num_arrays': 1} --- 0.0044 ---
reset_arrays {'args': [(10000, 1)], 'num_arrays': 1} --- 0.0034 ---
reset_arrays {'args': [(10000, 10)], 'num_arrays': 1} --- 0.0032 ---
reshape_like {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 1.642 ---
reshape_like {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0638 ---
reshape_like {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0176 ---
rint {'data': (1024, 1024)} 2097.1521 0.2874 ---
rint {'data': (10000, 1)} 20.0 0.0164 ---
rint {'data': (10000, 100)} 2000.0 0.2702 ---
rmsprop_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'epsilon': 1e-08, 'wd': 0.1, 'rescale_grad': 0.4} 3219175.75 0.3986 ---
rmsprop_update {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'epsilon': 1e-08, 'wd': 0.5, 'rescale_grad': 0.4} 3217118.5 0.026 ---
rmsprop_update {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'epsilon': 1e-08, 'wd': 0.9, 'rescale_grad': 0.4} 3221098.5 0.8536 ---
rmspropalex_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'g': (1024, 1024), 'delta': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'gamma2': 0.1, 'epsilon': 1e-08, 'wd': 0.1, 'rescale_grad': 0.4} 3225390.0 0.8296 ---
rmspropalex_update {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'g': (10000, 1), 'delta': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'gamma2': 0.5, 'epsilon': 1e-08, 'wd': 0.5, 'rescale_grad': 0.4} 3223332.75 0.379 ---
rmspropalex_update {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'g': (10000, 100), 'delta': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'gamma2': 0.9, 'epsilon': 1e-08, 'wd': 0.9, 'rescale_grad': 0.4} 3227312.75 1.2188 ---
round {'data': (1024, 1024)} 2097.1521 0.3074 ---
round {'data': (10000, 1)} 20.0 0.016 ---
round {'data': (10000, 100)} 2000.0 0.2922 ---
rsqrt {'data': (1024, 1024)} 2097.1521 0.6496 0.6744
rsqrt {'data': (10000, 1)} 20.0 0.0182 0.0164
rsqrt {'data': (10000, 100)} 2000.0 0.6212 0.6388
sample_exponential {'lam': [1.0, 8.5], 'shape': (1024, 1024), 'dtype': 'float16'} 2097.1521 5.5818 ---
sample_exponential {'lam': [1.0, 8.5], 'shape': (10000, 1), 'dtype': 'float32'} 40.0 0.0626 ---
sample_exponential {'lam': [1.0, 8.5], 'shape': (10000, 100), 'dtype': 'float64'} 16000.0 5.1922 ---
sample_gamma {'alpha': [0.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'beta': [1.0, 0.7]} 2097.1521 18.2952 ---
sample_gamma {'alpha': [0.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'beta': [1.0, 0.7]} 40.0 0.1946 ---
sample_gamma {'alpha': [0.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'beta': [1.0, 0.7]} 8000.0 21.4168 ---
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'alpha': [0.0, 2.5]} 4194.3042 31.4892 ---
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'alpha': [0.0, 2.5]} 40.0 0.339 ---
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'alpha': [0.0, 2.5]} 8000.0 29.4162 ---
sample_multinomial {'data': (32, 32), 'shape': (1024, 1024), 'dtype': 'float16'} 70438.9609 169.9096 ---
sample_multinomial {'data': (32, 32), 'shape': (10000, 1), 'dtype': 'float32'} 38164.5312 1.603 ---
sample_multinomial {'data': (32, 32), 'shape': (10000, 100), 'dtype': 'float64'} 293524.5312 165.1896 ---
sample_negative_binomial {'k': [20, 49], 'shape': (1024, 1024), 'dtype': 'float16', 'p': [0.4, 0.77]} 169718.8281 373.159 ---
sample_negative_binomial {'k': [20, 49], 'shape': (10000, 1), 'dtype': 'float32', 'p': [0.4, 0.77]} 169798.8281 3.4792 ---
sample_negative_binomial {'k': [20, 49], 'shape': (10000, 100), 'dtype': 'float64', 'p': [0.4, 0.77]} 185758.8281 353.4468 ---
sample_normal {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'sigma': [1.0, 3.7]} 184050.2969 8.488 ---
sample_normal {'mu': [2.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'sigma': [1.0, 3.7]} 182033.1406 0.0724 ---
sample_normal {'mu': [2.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'sigma': [1.0, 3.7]} 197993.1406 8.0432 ---
sample_poisson {'lam': [1.0, 8.5], 'shape': (1024, 1024), 'dtype': 'float16'} 196284.5938 10.3098 ---
sample_poisson {'lam': [1.0, 8.5], 'shape': (10000, 1), 'dtype': 'float32'} 194227.4375 0.132 ---
sample_poisson {'lam': [1.0, 8.5], 'shape': (10000, 100), 'dtype': 'float64'} 210227.4375 10.787 ---
sample_uniform {'low': [0.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'high': [1.0, 3.7]} 208518.8906 2.666 ---
sample_uniform {'low': [0.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'high': [1.0, 3.7]} 206501.75 0.032 ---
sample_uniform {'low': [0.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'high': [1.0, 3.7]} 222461.75 2.4786 ---
sgd_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 3231604.25 0.1892 ---
sgd_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 3229547.0 0.0742 ---
sgd_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 3233527.0 0.1926 ---
sgd_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 3237818.5 0.167 ---
sgd_update {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 3235761.5 0.0148 ---
sgd_update {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 3239741.5 0.1484 ---
shape_array {'data': (1024, 1024)} 0.008 0.006 ---
shape_array {'data': (10000, 1)} 0.008 0.0052 ---
shape_array {'data': (10000, 100)} 0.008 0.0054 ---
shuffle {'data': (1024, 1024)} 2097.1521 0.9654 ---
shuffle {'data': (10000, 1)} 20.0 0.5262 ---
shuffle {'data': (10000, 100)} 2000.0 1.6004 ---
sigmoid {'data': (1024, 1024)} 2097.1521 0.6882 0.0508
sigmoid {'data': (10000, 1)} 20.0 0.018 0.0114
sigmoid {'data': (10000, 100)} 2000.0 0.665 0.0492
sign {'data': (1024, 1024)} 4194.3042 0.2434 0.04
sign {'data': (10000, 1)} 40.0 0.0174 0.0098
sign {'data': (10000, 100)} 2000.0 0.2352 0.043
signsgd_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 3244033.0 0.1864 ---
signsgd_update {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 3241975.75 0.015 ---
signsgd_update {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 3245955.75 0.1788 ---
signum_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 3250247.25 0.3112 ---
signum_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 3248190.0 0.2088 ---
signum_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 3252170.0 0.325 ---
sin {'data': (1024, 1024)} 2097.1521 0.4714 0.4946
sin {'data': (10000, 1)} 20.0 0.015 0.0134
sin {'data': (10000, 100)} 2000.0 0.4596 0.4754
sinh {'data': (1024, 1024)} 2097.1521 1.5124 0.9122
sinh {'data': (10000, 1)} 20.0 0.0284 0.0176
sinh {'data': (10000, 100)} 2000.0 1.4062 0.8826
size_array {'data': (1024, 1024)} 0.004 0.006 ---
size_array {'data': (10000, 1)} 0.004 0.0056 ---
size_array {'data': (10000, 100)} 0.004 0.006 ---
slice {'data': (1024, 1024), 'begin': 0, 'end': 1} 246616.8281 0.0232 0.026
slice {'data': (10000, 1), 'begin': 0, 'end': 1} 246614.7812 0.0324 0.0126
slice {'data': (10000, 100), 'begin': 0, 'end': 1} 246615.1875 0.0314 0.0268
slice_axis {'data': (1024, 1024), 'axis': 0, 'begin': 0, 'end': 1} 246621.125 0.0126 0.0244
slice_axis {'data': (10000, 1), 'axis': 0, 'begin': 0, 'end': 1} 246619.0781 0.0108 0.0136
slice_axis {'data': (10000, 100), 'axis': 0, 'begin': 0, 'end': 1} 246619.4844 0.0112 0.0254
slice_like {'data': (1024, 1024), 'shape_like': (100, 100), 'axes': [0, 1]} 246679.2812 0.0166 0.0324
slice_like {'data': (10000, 1), 'shape_like': (10, 1), 'axes': [0, 1]} 246659.3125 0.0146 0.0142
slice_like {'data': (10000, 100), 'shape_like': (100, 10), 'axes': [0, 1]} 246663.2969 0.015 0.0292
smooth_l1 {'data': (1024, 1024)} 4139433.5 0.2902 0.29
smooth_l1 {'data': (10000, 1)} 4137376.5 0.0294 0.011
smooth_l1 {'data': (10000, 100)} 4141356.5 0.3582 0.2798
softmax {'data': (1024, 1024), 'axis': 0, 'dtype': 'float16'} 402105.5625 2.1964 0.8692
softmax {'data': (10000, 1), 'axis': 0, 'dtype': 'float32'} 401096.9688 0.4202 0.033
softmax {'data': (10000, 100), 'axis': 0, 'dtype': 'float64'} 409076.9688 5.5678 1.7112
softmax_cross_entropy {'data': (1024, 1024), 'label': (1024,)} 4139356.5 0.8222 ---
softmin {'data': (1024, 1024), 'axis': 0, 'dtype': 'float16'} 408222.7188 2.0126 0.8584
softmin {'data': (10000, 1), 'axis': 0, 'dtype': 'float32'} 407214.125 0.3604 0.0294
softmin {'data': (10000, 100), 'axis': 0, 'dtype': 'float64'} 415194.125 5.5684 1.6654
softsign {'data': (1024, 1024)} 2097.1521 0.0476 0.0474
softsign {'data': (10000, 1)} 20.0 0.011 0.009
softsign {'data': (10000, 100)} 2000.0 0.046 0.0442
sort {'data': (1024, 1024), 'axis': 0} 231217.8906 67.7188 ---
sort {'data': (10000, 1), 'axis': 0} 224986.4219 1.657 ---
sort {'data': (10000, 100), 'axis': 0} 232946.4219 80.4938 ---
space_to_depth {'data': (1, 4, 2, 4), 'block_size': 2} 233665.3438 0.0206 ---
space_to_depth {'data': (10, 25, 10, 100), 'block_size': 5} 234665.2812 0.918 ---
sqrt {'data': (1024, 1024)} 2097.1521 0.8226 0.051
sqrt {'data': (10000, 1)} 20.0 0.0164 0.0084
sqrt {'data': (10000, 100)} 2000.0 0.7366 0.0484
square {'data': (1024, 1024)} 2097.1521 0.0382 0.046
square {'data': (10000, 1)} 20.0 0.0094 0.0108
square {'data': (10000, 100)} 2000.0 0.0374 0.1336
squeeze {'data': (1, 1024, 1024), 'axis': 0} 4322830.0 0.351 0.591
squeeze {'data': (32, 1, 256, 256), 'axis': 1} 4329121.5 13.4974 4.5714
stop_gradient {'data': (1024, 1024)} 2097.1521 0.3172 ---
stop_gradient {'data': (10000, 1)} 20.0 0.0196 ---
stop_gradient {'data': (10000, 100)} 2000.0 0.4142 ---
sum {'data': (1024, 1024), 'axis': 0} 214506.7656 1.898 1.421
sum {'data': (10000, 1), 'axis': 0} 214504.7344 0.3772 0.0212
sum {'data': (10000, 100), 'axis': 0} 214505.125 1.6958 1.358
sum_axis {'data': (1024, 1024), 'axis': 0} 214511.0625 1.873 ---
sum_axis {'data': (10000, 1), 'axis': 0} 214509.0312 0.3752 ---
sum_axis {'data': (10000, 100), 'axis': 0} 214509.4219 1.7654 ---
swapaxes {'data': (1024, 1024), 'dim1': 0, 'dim2': 1} 240456.7344 3.0176 ---
swapaxes {'data': (10000, 1), 'dim1': 0, 'dim2': 1} 238399.5938 0.0414 ---
swapaxes {'data': (10000, 100), 'dim1': 0, 'dim2': 1} 242379.5938 2.544 ---
take {'a': (1024, 1024), 'indices': (1, 1), 'axis': 0} 246667.4375 0.0134 0.03
tan {'data': (1024, 1024)} 2097.1521 1.1328 0.0546
tan {'data': (10000, 1)} 20.0 0.0268 0.014
tan {'data': (10000, 100)} 2000.0 1.0948 0.051
tanh {'data': (1024, 1024)} 4194.3042 1.3814 0.059
tanh {'data': (10000, 1)} 40.0 0.0276 0.0096
tanh {'data': (10000, 100)} 4000.0 1.3408 0.052
topk {'data': (1024, 1024), 'axis': 0, 'k': 1, 'dtype': 'float32'} 226952.5781 59.3784 ---
topk {'data': (10000, 1), 'axis': 0, 'k': 1, 'dtype': 'int32'} 226950.5312 0.5862 ---
topk {'data': (10000, 100), 'axis': 0, 'k': 1, 'dtype': 'float32'} 226950.9219 54.2758 ---
transpose {'data': (1024, 1024), 'axes': [0, 1]} 246671.0469 0.0488 ---
transpose {'data': (10000, 1), 'axes': [0, 1]} 244613.8906 0.0418 ---
transpose {'data': (10000, 100), 'axes': [0, 1]} 248593.8906 0.0572 ---
trunc {'data': (1024, 1024)} 2097.1521 0.2644 ---
trunc {'data': (10000, 1)} 20.0 0.0166 ---
trunc {'data': (10000, 100)} 2000.0 0.2662 ---
where {'condition': (1024,), 'x': (1024, 1024), 'y': (1024, 1024)} 252956.8438 0.5552 1.1214
where {'condition': (10000,), 'x': (10000, 1), 'y': (10000, 1)} 250899.7031 0.0242 0.0222
where {'condition': (10000,), 'x': (10000, 100), 'y': (10000, 100)} 254879.7031 0.5366 1.0564
zeros_like {'data': (1024, 1024)} 2097.1521 0.0288 ---
zeros_like {'data': (10000, 1)} 20.0 0.0142 ---
zeros_like {'data': (10000, 100)} 2000.0 0.0784 ---