@connorgoggins
Created February 17, 2020 23:24

Runtime Features

  1. BLAS_APPLE: ✖
  2. BLAS_ATLAS: ✖
  3. BLAS_MKL: ✖
  4. BLAS_OPEN: ✔
  5. CAFFE: ✖
  6. CPU_AVX: ✔
  7. CPU_AVX2: ✖
  8. CPU_SSE: ✔
  9. CPU_SSE2: ✔
  10. CPU_SSE3: ✔
  11. CPU_SSE4A: ✖
  12. CPU_SSE4_1: ✔
  13. CPU_SSE4_2: ✔
  14. CUDA: ✖
  15. CUDA_RTC: ✖
  16. CUDNN: ✖
  17. CXX14: ✖
  18. DEBUG: ✖
  19. DIST_KVSTORE: ✖
  20. F16C: ✔
  21. INT64_TENSOR_SIZE: ✔
  22. JEMALLOC: ✖
  23. LAPACK: ✔
  24. MKLDNN: ✖
  25. NCCL: ✖
  26. OPENCV: ✖
  27. OPENMP: ✔
  28. PROFILER: ✖
  29. SIGNAL_HANDLER: ✖
  30. SSE: ✖
  31. TENSORRT: ✖
  32. TVM_OP: ✖

Benchmark Results
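
Each row of the results that follow is a space-separated record: the operator name, the input dict, and three numeric columns, with `---` where no value was recorded. The sketch below shows one way to read such a row into a Python dict, assuming exactly that layout. `parse_row` and `parse_number` are hypothetical helpers written for illustration; they are not part of the benchmark tooling.

```python
import ast
import re

# Hypothetical helper (not from the gist): one row is
# "<operator> <inputs dict> <max mem> <avg forward ms> <avg backward ms>".
# The greedy group captures the dict up to the last '}' that is still
# followed by exactly three whitespace-free fields.
ROW_RE = re.compile(r"^(\S+) (\{.*\}) (\S+) (\S+) (\S+)$")


def parse_number(field):
    """Return the field as a float, or None for the '---' placeholder."""
    return None if field == "---" else float(field)


def parse_row(line):
    """Split one flattened table row into named fields."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    operator, inputs, mem, fwd, bwd = match.groups()
    return {
        "operator": operator,
        # The inputs column is a Python literal, so literal_eval can read it.
        "inputs": ast.literal_eval(inputs),
        "max_mem": parse_number(mem),
        "forward_ms": parse_number(fwd),
        "backward_ms": parse_number(bwd),
    }


if __name__ == "__main__":
    row = parse_row(
        "Activation {'data': (1024, 1024), 'act_type': 'relu'} "
        "366488.4062 0.0612 0.1002"
    )
    print(row["operator"], row["inputs"]["act_type"], row["forward_ms"])
```

Mapping `---` to `None` rather than to zero keeps missing measurements distinguishable from very fast ones when the rows are later sorted or averaged.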

Operator Inputs Max Mem Usage (Storage) (Bytes) Avg Forward Time (ms) Avg Backward Time (ms)
Activation {'data': (1024, 1024), 'act_type': 'relu'} 366488.4062 0.0612 0.1002
Activation {'data': (1024, 1024), 'act_type': 'sigmoid'} 368585.5625 0.6748 0.056
Activation {'data': (1024, 1024), 'act_type': 'softrelu'} 370682.7188 1.7744 0.9588
Activation {'data': (1024, 1024), 'act_type': 'softsign'} 372779.8438 0.0532 0.0534
Activation {'data': (1024, 1024), 'act_type': 'tanh'} 374877.0 1.4206 0.0676
BatchNorm {'data': (32, 3, 256, 256), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)} 353579.875 14.6506 8.1104
BatchNorm {'data': (32, 3, 10000, 10), 'gamma': (3,), 'beta': (3,), 'moving_mean': (3,), 'moving_var': (3,)} 379396.9688 17.5304 13.6382
BilinearSampler {'data': (32, 2, 256, 256), 'grid': (32, 2, 256, 256)} 16777.2168 100.6782 119.2164
BlockGrad {'data': (1024, 1024)} 2097.1521 0.3224 ---
BlockGrad {'data': (10000, 1)} 20.0 0.011 ---
BlockGrad {'data': (10000, 100)} 2000.0 0.3918 ---
CTCLoss {'data': (1024, 100, 100), 'label': (100, 100)} 2663714.0 51.9336 ---
Convolution {'data': (32, 3, 64, 64), 'weight': (64, 3, 3, 3), 'bias': (64,), 'kernel': (3, 3), 'stride': (1, 1), 'dilate': (1, 1), 'pad': (0, 0), 'num_filter': 64, 'layout': 'NCHW'} 1341985.625 7.8122 13.3234
Custom {'args': [(1024, 1024)], 'op_type': 'CustomAddOne'} 2787046.25 0.0055 0.0072
Custom {'args': [(10000, 1)], 'op_type': 'CustomAddOne'} 2787046.25 0.005 0.0062
Custom {'args': [(10000, 10)], 'op_type': 'CustomAddOne'} 2787046.25 0.0056 0.0058
Deconvolution {'data': (32, 3, 64, 64), 'weight': (3, 64, 3, 3), 'bias': (64,), 'kernel': (3, 3), 'stride': (1, 1), 'dilate': (1, 1), 'pad': (0, 0), 'num_filter': 64, 'no_bias': False, 'layout': 'NCHW'} 2661075.75 306.0796 108.6012
Dropout {'data': (32, 3, 256, 256), 'p': 0.5, 'mode': 'always'} 327814.0312 5.224 0.984
Dropout {'data': (10000, 10), 'p': 0.5, 'mode': 'always'} 315831.125 0.0802 0.02
ElementWiseSum {'args': (1024, 1024)} 2097.1521 0.0872 ---
FullyConnected {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 196608), 'bias': (64,), 'flatten': True} 261757.8438 1.9692 3.9592
FullyConnected {'data': (32, 3, 256, 256), 'num_hidden': 64, 'weight': (64, 256), 'bias': (64,), 'flatten': False} 268045.1875 9.1298 7.3656
GridGenerator {'data': (32, 2, 256, 256), 'transform_type': 'warp', 'target_shape': (256, 6)} 8650.752 32.7878 28.4014
GridGenerator {'data': (256, 6), 'transform_type': 'affine', 'target_shape': (256, 6)} 1582.08 0.4452 0.3376
LeakyReLU {'data': (1024, 1024), 'act_type': 'leaky'} 379071.3125 0.2542 0.0532
LeakyReLU {'data': (1024, 1024), 'act_type': 'elu'} 381168.4688 0.6718 0.3506
LeakyReLU {'data': (1024, 1024), 'act_type': 'selu'} 383265.625 0.6952 0.267
LeakyReLU {'data': (1024, 1024), 'act_type': 'gelu'} 385362.7812 0.7508 1.7076
MakeLoss {'data': (1024, 1024)} 2649525.5 0.0254 0.0166
MakeLoss {'data': (10000, 1)} 2647468.25 0.017 0.0678
MakeLoss {'data': (10000, 100)} 2651448.25 0.0372 0.0178
Pooling {'data': (32, 3, 64, 64), 'kernel': (3, 3), 'pool_type': 'sum', 'global_pool': 1, 'stride': (1, 1), 'pad': (0, 0)} 499384.1562 0.4854 0.266
ROIPooling {'data': (32, 3, 64, 64), 'rois': (32, 5), 'pooled_size': (2, 2), 'spatial_scale': 0.5} 499391.625 0.1358 0.0284
SequenceLast {'data': (1024, 1024), 'axis': 0} 2787052.5 0.0212 ---
SequenceLast {'data': (10000, 1), 'axis': 0} 2787050.5 0.015 ---
SequenceLast {'data': (10000, 100), 'axis': 0} 2787050.75 0.0172 ---
SequenceMask {'data': (1024, 1024), 'axis': 0} 2793342.0 0.236 0.2312
SequenceMask {'data': (10000, 1), 'axis': 0} 2791285.0 0.0182 0.0144
SequenceMask {'data': (10000, 100), 'axis': 0} 2795265.0 1.1988 0.2206
SequenceReverse {'data': (1024, 1024), 'axis': 0} 2799556.25 2.413 2.5898
SequenceReverse {'data': (10000, 1), 'axis': 0} 2797499.25 0.114 0.027
SequenceReverse {'data': (10000, 100), 'axis': 0} 2801479.25 2.1198 2.227
Softmax {'data': (1024, 1024), 'label': (1024, 1024)} 389557.0625 0.7084 ---
Softmax {'data': (10000, 1), 'label': (10000, 1)} 387499.9375 0.0212 ---
Softmax {'data': (10000, 100), 'label': (10000, 100)} 391479.9375 0.677 ---
SoftmaxActivation {'data': (1024, 1024)} 395771.375 0.722 2.5686
SoftmaxActivation {'data': (10000, 1)} 393714.2188 0.018 0.0458
SoftmaxActivation {'data': (10000, 100)} 397694.2188 0.6598 2.4262
UpSampling {'args': (32, 3, 256, 256), 'scale': 2, 'sample_type': 'nearest'} 2806658.0 20.4806 3.576
UpSampling {'args': (32, 3, 10000, 1), 'scale': 4, 'sample_type': 'nearest'} 2817766.25 15.3894 2.4824
abs {'data': (1024, 1024)} 2097.1521 0.0412 0.4446
abs {'data': (10000, 1)} 20.0 0.0706 0.0114
abs {'data': (10000, 100)} 2000.0 0.0396 0.3768
adam_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mean': (1024, 1024), 'var': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 1332532.125 0.4328 ---
adam_update {'weight': (10000, 1), 'grad': (10000, 1), 'mean': (10000, 1), 'var': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 1330475.0 0.071 ---
adam_update {'weight': (10000, 100), 'grad': (10000, 100), 'mean': (10000, 100), 'var': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 1334455.0 0.7548 ---
add_n {'args': [(1024, 1024)]} 2655663.0 0.0086 ---
add_n {'args': [(10000, 1)]} 2655663.0 0.0082 ---
add_n {'args': [(10000, 10)]} 2655663.0 0.021 ---
all_finite {'data': (1024, 1024)} 2799479.25 0.0584 ---
all_finite {'data': (10000, 1)} 2799479.25 0.0154 ---
all_finite {'data': (10000, 100)} 2799479.25 0.058 ---
amp_cast {'data': (1024, 1024), 'dtype': 'float32'} 2097.1521 0.3822 0.7264
amp_cast {'data': (10000, 1), 'dtype': 'int32'} 20.0 0.0178 0.0156
amp_cast {'data': (10000, 100), 'dtype': 'float32'} 2000.0 0.4632 0.718
amp_multicast {'args': [(1024, 1024)], 'num_outputs': 1} 0.004 0.1494 0.2182
amp_multicast {'args': [(10000, 1)], 'num_outputs': 1} 0.004 0.2058 0.1646
arccos {'data': (1024, 1024)} 2097.1521 0.7754 0.5562
arccos {'data': (10000, 1)} 20.0 0.019 0.0138
arccos {'data': (10000, 100)} 2000.0 0.7388 0.473
arccosh {'data': (1024, 1024)} 2097.1521 0.6592 0.679
arccosh {'data': (10000, 1)} 20.0 0.0184 0.0164
arccosh {'data': (10000, 100)} 2000.0 0.6574 0.6262
arcsin {'data': (1024, 1024)} 2097.1521 0.7198 0.5258
arcsin {'data': (10000, 1)} 20.0 0.0172 0.0136
arcsin {'data': (10000, 100)} 2000.0 0.7036 0.478
arcsinh {'data': (1024, 1024)} 2097.1521 1.506 0.4308
arcsinh {'data': (10000, 1)} 20.0 0.0236 0.0144
arcsinh {'data': (10000, 100)} 2000.0 1.393 0.412
arctan {'data': (1024, 1024)} 2097.1521 1.1018 0.0558
arctan {'data': (10000, 1)} 20.0 0.0234 0.0142
arctan {'data': (10000, 100)} 2000.0 0.8488 0.053
arctanh {'data': (1024, 1024)} 2097.1521 1.3018 0.059
arctanh {'data': (10000, 1)} 20.0 0.026 0.0148
arctanh {'data': (10000, 100)} 2000.0 1.2358 0.0554
argmax {'data': (1024, 1024), 'axis': 0} 225478.4375 28.4932 ---
argmax {'data': (10000, 1), 'axis': 0} 225476.3906 0.2746 ---
argmax {'data': (10000, 100), 'axis': 0} 225476.7969 20.8574 ---
argmax_channel {'data': (1024, 1024)} 2.048 0.6618 ---
argmax_channel {'data': (10000, 1)} 20.0 0.2182 ---
argmax_channel {'data': (10000, 100)} 20.0 0.7302 ---
argmin {'data': (1024, 1024), 'axis': 0} 225482.7344 27.1264 ---
argmin {'data': (10000, 1), 'axis': 0} 225480.6875 0.1894 ---
argmin {'data': (10000, 100), 'axis': 0} 225481.0938 22.1958 ---
argsort {'data': (1024, 1024), 'axis': 0, 'dtype': 'float32'} 231772.3438 65.4412 ---
argsort {'data': (10000, 1), 'axis': 0, 'dtype': 'int32'} 229715.2031 1.3666 ---
argsort {'data': (10000, 100), 'axis': 0, 'dtype': 'float32'} 233695.2031 61.1342 ---
batch_dot {'lhs': (32, 1024, 1024), 'rhs': (32, 1024, 1024)} 67108.8672 67.6699 ---
batch_dot {'lhs': (32, 1000, 10), 'rhs': (32, 1000, 10), 'transpose_b': True} 64000.0 28.2041 ---
batch_dot {'lhs': (32, 1000, 1), 'rhs': (32, 100, 1000), 'transpose_a': True, 'transpose_b': True} 6.4 0.5757 ---
broadcast_add {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0496 0.0616
broadcast_add {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0208 0.016
broadcast_add {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0132 0.0142
broadcast_div {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0484 0.0966
broadcast_div {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.029 0.023
broadcast_div {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0138 0.017
broadcast_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0456 ---
broadcast_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0162 ---
broadcast_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.013 ---
broadcast_greater {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0452 ---
broadcast_greater {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.016 ---
broadcast_greater {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0134 ---
broadcast_greater_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0456 ---
broadcast_greater_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.071 ---
broadcast_greater_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0116 ---
broadcast_hypot {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.4212 0.935
broadcast_hypot {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.052 0.1026
broadcast_hypot {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.016 0.0228
broadcast_lesser {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0448 ---
broadcast_lesser {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.1598 ---
broadcast_lesser {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0112 ---
broadcast_lesser_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0444 ---
broadcast_lesser_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0134 ---
broadcast_lesser_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0112 ---
broadcast_logical_and {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0472 ---
broadcast_logical_and {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0152 ---
broadcast_logical_and {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.012 ---
broadcast_logical_or {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.049 ---
broadcast_logical_or {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0148 ---
broadcast_logical_or {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0118 ---
broadcast_logical_xor {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.1372 ---
broadcast_logical_xor {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.023 ---
broadcast_logical_xor {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0124 ---
broadcast_maximum {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.066 0.1162
broadcast_maximum {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0184 0.0206
broadcast_maximum {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0132 0.0166
broadcast_minimum {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0658 0.1048
broadcast_minimum {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0184 0.0218
broadcast_minimum {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0136 0.0224
broadcast_minus {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.048 ---
broadcast_minus {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0188 ---
broadcast_minus {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0134 ---
broadcast_mod {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 2.0048 0.1652
broadcast_mod {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.2156 0.0288
broadcast_mod {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0306 0.0182
broadcast_mul {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.048 0.0916
broadcast_mul {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.015 0.0188
broadcast_mul {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.083 0.0162
broadcast_not_equal {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0474 ---
broadcast_not_equal {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0176 ---
broadcast_not_equal {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0122 ---
broadcast_plus {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0472 ---
broadcast_plus {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0152 ---
broadcast_plus {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0124 ---
broadcast_power {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 2.452 5.6488
broadcast_power {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.2396 0.5526
broadcast_power {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0362 0.0694
broadcast_sub {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.049 0.0602
broadcast_sub {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0158 0.015
broadcast_sub {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0132 0.0134
cast {'data': (1024, 1024), 'dtype': 'float32'} 2097.1521 --- 0.6986
cast {'data': (10000, 1), 'dtype': 'int32'} 20.0 --- 0.016
cast {'data': (10000, 100), 'dtype': 'float32'} 2000.0 --- 0.6988
cast_storage {'data': (1024, 1024), 'stype': 'default'} 2805770.75 0.3338 ---
cast_storage {'data': (10000, 1), 'stype': 'csr'} 2803813.5 1.7382 ---
cast_storage {'data': (10000, 100), 'stype': 'row_sparse'} 2805853.5 0.8246 ---
cbrt {'data': (1024, 1024)} 2097.1521 1.3416 0.0564
cbrt {'data': (10000, 1)} 20.0 0.0826 0.0108
cbrt {'data': (10000, 100)} 2000.0 1.4136 0.0636
ceil {'data': (1024, 1024)} 2097.1521 0.158 ---
ceil {'data': (10000, 1)} 20.0 0.0106 ---
ceil {'data': (10000, 100)} 2000.0 5.2406 ---
choose_element_0index {'data': (1024, 1024), 'index': (1, 1024), 'axis': 0} 2.048 0.013 ---
choose_element_0index {'data': (10000, 1), 'index': (1, 1), 'axis': 0} 0.002 0.01 ---
choose_element_0index {'data': (10000, 100), 'index': (1, 100), 'axis': 0} 0.2 0.0106 ---
clip {'data': (1024, 1024), 'a_min': 0.1, 'a_max': 0.9} 2812105.0 0.1434 0.272
clip {'data': (10000, 1), 'a_min': 0.1, 'a_max': 0.9} 2810047.75 0.0106 0.0094
clip {'data': (10000, 100), 'a_min': 0.1, 'a_max': 0.9} 2814027.75 0.1488 0.2476
cos {'data': (1024, 1024)} 2097.1521 0.4746 0.533
cos {'data': (10000, 1)} 20.0 0.0262 0.0222
cos {'data': (10000, 100)} 2000.0 0.6506 0.6954
cosh {'data': (1024, 1024)} 2097.1521 0.9284 1.6436
cosh {'data': (10000, 1)} 20.0 0.026 0.029
cosh {'data': (10000, 100)} 2000.0 0.834 1.5324
ctc_loss {'data': (1024, 100, 100), 'label': (100, 100)} 2690408.75 56.6516 ---
cumsum {'a': (1024, 1024), 'axis': 0, 'dtype': 'float32'} 2818319.25 1.271 ---
cumsum {'a': (1024, 1024), 'axis': 0, 'dtype': 'int32'} 2820416.5 1.2022 ---
cumsum {'a': (1024, 1024), 'axis': 0, 'dtype': 'float32'} 2822513.5 1.2452 ---
degrees {'data': (1024, 1024)} 2097.1521 0.0462 0.0384
degrees {'data': (10000, 1)} 20.0 0.0126 0.012
degrees {'data': (10000, 100)} 2000.0 0.061 0.036
depth_to_space {'data': (1, 4, 2, 4), 'block_size': 2} 237913.9844 0.02 ---
depth_to_space {'data': (10, 25, 10, 100), 'block_size': 5} 238913.9219 1.1982 ---
dot {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 2.3594 5.0956
dot {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} 2000.0 0.2376 0.3486
dot {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_a': True, 'transpose_b': True} 0.2 0.496 0.3552
elemwise_add {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0462 ---
elemwise_add {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.09 ---
elemwise_add {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0128 ---
elemwise_div {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0464 ---
elemwise_div {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0134 ---
elemwise_div {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0112 ---
elemwise_mul {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.048 ---
elemwise_mul {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.015 ---
elemwise_mul {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.012 ---
elemwise_sub {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.0444 ---
elemwise_sub {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0142 ---
elemwise_sub {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0128 ---
erf {'data': (1024, 1024)} 2097.1521 0.911 0.778
erf {'data': (10000, 1)} 20.0 0.0218 0.0184
erf {'data': (10000, 100)} 2000.0 0.8886 0.7576
erfinv {'data': (1024, 1024)} 2097.1521 4.0932 0.8178
erfinv {'data': (10000, 1)} 20.0 0.0548 0.0782
erfinv {'data': (10000, 100)} 2000.0 3.4034 0.7242
exp {'data': (1024, 1024)} 2097.1521 0.5958 ---
exp {'data': (10000, 1)} 20.0 0.0206 ---
exp {'data': (10000, 100)} 2000.0 0.5546 ---
expm1 {'data': (1024, 1024)} 2097.1521 0.998 0.635
expm1 {'data': (10000, 1)} 20.0 0.0792 0.0172
expm1 {'data': (10000, 100)} 2000.0 0.952 0.6044
fill_element_0index {'lhs': (1024, 1024), 'mhs': (1024,), 'rhs': (1024,)} 2826707.75 0.0784 ---
fill_element_0index {'lhs': (10000, 1), 'mhs': (10000,), 'rhs': (10000,)} 2824650.75 0.0124 ---
fill_element_0index {'lhs': (10000, 100), 'mhs': (10000,), 'rhs': (10000,)} 2828630.75 0.078 ---
fix {'data': (1024, 1024)} 2097.1521 0.3086 ---
fix {'data': (10000, 1)} 20.0 0.0174 ---
fix {'data': (10000, 100)} 2000.0 0.3006 ---
flatten {'data': (1024, 1024)} 2097.1521 0.4218 ---
flatten {'data': (10000, 1)} 20.0 0.016 ---
flatten {'data': (10000, 100)} 2000.0 0.4934 ---
flip {'data': (1024, 1024), 'axis': 0} 242608.2344 1.201 ---
flip {'data': (10000, 1), 'axis': 0} 242628.2344 0.025 ---
flip {'data': (10000, 100), 'axis': 0} 244628.2344 1.173 ---
floor {'data': (1024, 1024)} 2097.1521 0.123 ---
floor {'data': (10000, 1)} 20.0 0.0124 ---
floor {'data': (10000, 100)} 2000.0 0.1232 ---
ftml_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'd': (1024, 1024), 'v': (1024, 1024), 'z': (1024, 1024), 'lr': 0.1, 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 't': 1, 'wd': 0.1, 'rescale_grad': 0.4, 'clip_grad': -1.0} 1334552.125 1.5026 ---
ftml_update {'weight': (10000, 1), 'grad': (10000, 1), 'd': (10000, 1), 'v': (10000, 1), 'z': (10000, 1), 'lr': 0.5, 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 't': 1, 'wd': 0.5, 'rescale_grad': 0.4, 'clip_grad': -1.0} 1334592.125 0.0418 ---
ftml_update {'weight': (10000, 100), 'grad': (10000, 100), 'd': (10000, 100), 'v': (10000, 100), 'z': (10000, 100), 'lr': 0.9, 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 't': 1, 'wd': 0.9, 'rescale_grad': 0.4, 'clip_grad': -1.0} 1338572.125 1.8204 ---
ftrl_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'z': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 1342863.625 1.6096 ---
ftrl_update {'weight': (10000, 1), 'grad': (10000, 1), 'z': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 1340806.375 0.0344 ---
ftrl_update {'weight': (10000, 100), 'grad': (10000, 100), 'z': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 1344786.375 1.7374 ---
gamma {'data': (1024, 1024)} 2097.1521 4.523 8.283
gamma {'data': (10000, 1)} 20.0 0.0658 0.0816
gamma {'data': (10000, 100)} 2000.0 3.684 6.9538
gammaln {'data': (1024, 1024)} 2097.1521 22.1688 3.6534
gammaln {'data': (10000, 1)} 20.0 0.2214 0.0528
gammaln {'data': (10000, 100)} 2000.0 21.1138 3.1828
gather_nd {'data': (1024, 1024), 'indices': (1, 1)} 255465.8906 0.0248 ---
gather_nd {'data': (10000, 1), 'indices': (1, 1)} 255463.8438 0.0222 ---
gather_nd {'data': (10000, 100), 'indices': (1, 1)} 255464.25 0.035 ---
hard_sigmoid {'data': (1024, 1024)} 401985.6875 0.0962 0.1264
hard_sigmoid {'data': (10000, 1)} 399928.5312 0.011 0.0108
hard_sigmoid {'data': (10000, 100)} 403908.5312 0.0902 0.099
identity {'data': (1024, 1024)} 2097.1521 0.3702 ---
identity {'data': (10000, 1)} 20.0 0.0124 ---
identity {'data': (10000, 100)} 2000.0 0.4864 ---
khatri_rao {'args': [(32, 32), (32, 32)]} 0.008 0.0162 ---
khatri_rao {'args': [(64, 64), (64, 64)]} 0.008 0.0124 ---
lamb_update_phase1 {'weight': (1024, 1024), 'grad': (1024, 1024), 'mean': (1024, 1024), 'var': (1024, 1024), 'beta1': 0.1, 'beta2': 0.1, 'epsilon': 1e-08, 't': 1, 'wd': 0.1, 'rescale_grad': 0.4} 1349077.875 0.6448 ---
lamb_update_phase1 {'weight': (10000, 1), 'grad': (10000, 1), 'mean': (10000, 1), 'var': (10000, 1), 'beta1': 0.5, 'beta2': 0.5, 'epsilon': 1e-08, 't': 1, 'wd': 0.5, 'rescale_grad': 0.4} 1347020.75 0.0678 ---
lamb_update_phase1 {'weight': (10000, 100), 'grad': (10000, 100), 'mean': (10000, 100), 'var': (10000, 100), 'beta1': 0.9, 'beta2': 0.9, 'epsilon': 1e-08, 't': 1, 'wd': 0.9, 'rescale_grad': 0.4} 1351000.75 1.3144 ---
lamb_update_phase2 {'weight': (1024, 1024), 'g': (1024, 1024), 'r1': (1, 1024), 'r2': (1, 1024), 'lr': 0.1} 1355292.125 0.2024 ---
lamb_update_phase2 {'weight': (10000, 1), 'g': (10000, 1), 'r1': (1, 1), 'r2': (1, 1), 'lr': 0.5} 1353235.0 0.014 ---
lamb_update_phase2 {'weight': (10000, 100), 'g': (10000, 100), 'r1': (1, 100), 'r2': (1, 100), 'lr': 0.9} 1357215.0 0.1996 ---
linalg_det {'A': (1024, 1024)} 2843332.75 6.273 14.2136
linalg_extractdiag {'A': (1024, 1024)} 2841239.75 0.0122 0.0246
linalg_extracttrian {'A': (1024, 1024)} 2844386.5 0.4962 0.4698
linalg_gelqf {'A': (1024, 1024)} 2855919.75 27.4016 ---
linalg_gemm {'A': (1024, 1024), 'B': (1024, 1024), 'C': (1024, 1024), 'axis': 0} 2858017.0 2.182 4.914
linalg_gemm2 {'A': (1024, 1024), 'B': (1024, 1024), 'axis': 0} 2862211.25 1.232 2.9446
linalg_inverse {'A': (1024, 1024)} 2866405.75 14.0604 5.3512
linalg_makediag {'A': (1024, 1024)} 9306759.0 510.868 10.3234
linalg_maketrian {'A': (1024, 1035)} 7171717.5 2.8278 2.6514
linalg_potrf {'A': [[1, 0], [0, 1]]} 2841233.75 0.011 ---
linalg_potrf {'A': [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]} 2841233.75 0.0102 ---
linalg_potri {'A': (1024, 1024)} 7178009.0 11.0338 7.3288
linalg_slogdet {'A': (1024, 1024)} 7180108.0 5.2206 ---
linalg_sumlogdiag {'A': (1024, 1024)} 7180108.0 0.0622 0.4224
linalg_syrk {'A': (1024, 1024)} 7186399.5 2.6536 4.3462
linalg_trmm {'A': (1024, 1024), 'B': (1024, 1024)} 7194788.0 3.055 6.9746
linalg_trsm {'A': (1024, 1024), 'B': (1024, 1024)} 7198982.5 2.7888 6.965
log {'data': (1024, 1024)} 2097.1521 0.8582 0.114
log {'data': (10000, 1)} 20.0 0.0186 0.0106
log {'data': (10000, 100)} 2000.0 0.773 0.0496
log10 {'data': (1024, 1024)} 2097.1521 0.9688 0.0516
log10 {'data': (10000, 1)} 20.0 0.0242 0.012
log10 {'data': (10000, 100)} 2000.0 1.012 0.0538
log1p {'data': (1024, 1024)} 2097.1521 0.9614 0.0502
log1p {'data': (10000, 1)} 20.0 0.0194 0.011
log1p {'data': (10000, 100)} 2000.0 0.9288 0.0504
log2 {'data': (1024, 1024)} 2097.1521 0.7932 0.0666
log2 {'data': (10000, 1)} 20.0 0.0224 0.0102
log2 {'data': (10000, 100)} 2000.0 0.7478 0.0488
log_softmax {'data': (1024, 1024), 'axis': 0, 'dtype': 'float16'} 405054.25 3.0146 1.8958
log_softmax {'data': (10000, 1), 'axis': 0, 'dtype': 'float32'} 404045.6875 0.4254 0.222
log_softmax {'data': (10000, 100), 'axis': 0, 'dtype': 'float64'} 412025.6875 4.3692 3.8542
logical_not {'data': (1024, 1024)} 2097.1521 0.0384 ---
logical_not {'data': (10000, 1)} 20.0 0.0102 ---
logical_not {'data': (10000, 100)} 2000.0 0.0568 ---
make_loss {'data': (1024, 1024)} 2097.1521 0.3632 ---
make_loss {'data': (10000, 1)} 20.0 0.0142 ---
make_loss {'data': (10000, 100)} 2000.0 0.456 ---
max {'data': (1024, 1024), 'axis': 0} 225433.0156 1.1784 0.9798
max {'data': (10000, 1), 'axis': 0} 225430.9688 0.3048 0.021
max {'data': (10000, 100), 'axis': 0} 225431.3594 1.257 0.9096
max_axis {'data': (1024, 1024), 'axis': 0} 225437.3125 1.0052 ---
max_axis {'data': (10000, 1), 'axis': 0} 225435.2656 0.248 ---
max_axis {'data': (10000, 100), 'axis': 0} 225435.6562 1.1448 ---
mean {'data': (1024, 1024), 'axis': 0} 225441.6094 1.192 1.6202
mean {'data': (10000, 1), 'axis': 0} 225439.5625 0.339 0.0298
mean {'data': (10000, 100), 'axis': 0} 225439.9531 1.2318 1.4816
min {'data': (1024, 1024), 'axis': 0} 225445.9062 1.097 0.9514
min {'data': (10000, 1), 'axis': 0} 225443.8594 0.3134 0.0212
min {'data': (10000, 100), 'axis': 0} 225444.2656 1.208 0.9864
min_axis {'data': (1024, 1024), 'axis': 0} 225450.2031 1.0054 ---
min_axis {'data': (10000, 1), 'axis': 0} 225448.1562 0.2446 ---
min_axis {'data': (10000, 100), 'axis': 0} 225448.5625 1.1444 ---
moments {'data': (1024, 1024), 'axes': [0, 1]} 7196885.5 51.8952 ---
moments {'data': (10000, 1), 'axes': [0, 1]} 7196885.5 0.5656 ---
moments {'data': (10000, 100), 'axes': [0, 1]} 7196885.5 52.1542 ---
mp_nag_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 1361506.5 0.3314 ---
mp_nag_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 1359449.375 0.0208 ---
mp_nag_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 1363429.375 0.2442 ---
mp_sgd_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 1367720.75 0.1882 ---
mp_sgd_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 1365663.625 0.0178 ---
mp_sgd_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 1369643.625 0.1814 ---
mp_sgd_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'weight32': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 1373935.125 0.174 ---
mp_sgd_update {'weight': (10000, 1), 'grad': (10000, 1), 'weight32': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 1371878.0 0.0158 ---
mp_sgd_update {'weight': (10000, 100), 'grad': (10000, 100), 'weight32': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 1375858.0 0.1712 ---
multi_all_finite {'args': [(1024, 1024)], 'num_arrays': 1} 2655663.0 0.0708 ---
multi_all_finite {'args': [(10000, 1)], 'num_arrays': 1} 2655663.0 0.0588 ---
multi_all_finite {'args': [(10000, 10)], 'num_arrays': 1} 2655663.0 0.0604 ---
multi_lars {'lrs': (1024, 1024), 'weights_sum_sq': (1024, 1024), 'grads_sum_sq': (1024, 1024), 'wds': (1024, 1024), 'eta': 0.5, 'eps': 1e-08, 'rescale_grad': 0.4} 2832922.25 4.0508 ---
multi_lars {'lrs': (10000, 1), 'weights_sum_sq': (10000, 1), 'grads_sum_sq': (10000, 1), 'wds': (10000, 1), 'eta': 0.5, 'eps': 1e-08, 'rescale_grad': 0.4} 2830865.0 0.074 ---
multi_lars {'lrs': (10000, 100), 'weights_sum_sq': (10000, 100), 'grads_sum_sq': (10000, 100), 'wds': (10000, 100), 'eta': 0.5, 'eps': 1e-08, 'rescale_grad': 0.4} 2834845.0 0.9 ---
multi_mp_sgd_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'lrs': 0.1, 'wds': 0.2, 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0135 ---
multi_sgd_mom_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'lrs': 0.1, 'wds': 0.2, 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0148 ---
multi_sgd_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'lrs': 0.1, 'wds': 0.2, 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.036 ---
multi_sum_sq {'args': [(1024, 1024)], 'num_arrays': 1} 2655663.0 0.0088 ---
multi_sum_sq {'args': [(10000, 1)], 'num_arrays': 1} 2655663.0 0.008 ---
multi_sum_sq {'args': [(10000, 10)], 'num_arrays': 1} 2655663.0 0.0078 ---
nag_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 1380149.375 0.274 ---
nag_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 1378092.25 0.0172 ---
nag_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 1382072.25 0.2622 ---
nanprod {'data': (1024, 1024), 'axis': 0} 225454.5 1.2514 0.9976
nanprod {'data': (10000, 1), 'axis': 0} 225452.4531 0.301 0.0212
nanprod {'data': (10000, 100), 'axis': 0} 225452.8594 1.2488 0.9458
nansum {'data': (1024, 1024), 'axis': 0} 225458.7969 1.201 1.0496
nansum {'data': (10000, 1), 'axis': 0} 225456.75 0.3112 0.0216
nansum {'data': (10000, 100), 'axis': 0} 225457.1562 1.2022 0.8632
negative {'data': (1024, 1024)} 2097.1521 0.0402 ---
negative {'data': (10000, 1)} 20.0 0.0326 ---
negative {'data': (10000, 100)} 2000.0 0.0602 ---
norm {'data': (1024, 1024), 'axis': 0} 225463.0938 1.9666 1.4628
norm {'data': (10000, 1), 'axis': 0} 225461.0469 0.3276 0.0254
norm {'data': (10000, 100), 'axis': 0} 225461.4531 1.3678 1.3804
one_hot {'indices': (1, 1), 'depth': 0, 'dtype': 'float32'} --- 0.008 ---
one_hot {'indices': (1, 1), 'depth': 0, 'dtype': 'int32'} --- 0.008 ---
one_hot {'indices': (1, 1), 'depth': 0, 'dtype': 'float32'} --- 0.0078 ---
ones_like {'data': (1024, 1024)} 2097.1521 0.0376 ---
ones_like {'data': (10000, 1)} 20.0 0.0176 ---
ones_like {'data': (10000, 100)} 2000.0 0.0436 ---
pick {'data': (1024, 1024), 'index': (1, 1024), 'axis': 0} 255470.1875 0.0192 0.2586
pick {'data': (10000, 1), 'index': (1, 1), 'axis': 0} 255468.1406 0.0158 0.0196
pick {'data': (10000, 100), 'index': (1, 100), 'axis': 0} 255468.5469 0.0168 0.2888
preloaded_multi_mp_sgd_mom_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'args3': '<NDArray 5x5 @cpu(0)>', 'args4': '<NDArray 1 @cpu(0)>', 'args5': '<NDArray 1 @cpu(0)>', 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.017 ---
preloaded_multi_mp_sgd_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'args3': '<NDArray 1 @cpu(0)>', 'args4': '<NDArray 1 @cpu(0)>', 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.0153 ---
preloaded_multi_sgd_mom_update {'args0': '<NDArray 5x5 @cpu(0)>', 'args1': '<NDArray 5x5 @cpu(0)>', 'args2': '<NDArray 5x5 @cpu(0)>', 'args3': '<NDArray 1 @cpu(0)>', 'args4': '<NDArray 1 @cpu(0)>', 'out': '<NDArray 5x5 @cpu(0)>'} --- 0.015 ---
prod {'data': (1024, 1024), 'axis': 0} 225467.3906 1.7888 1.4696
prod {'data': (10000, 1), 'axis': 0} 225465.3438 0.2978 0.0252
prod {'data': (10000, 100), 'axis': 0} 225465.75 1.4932 1.4082
radians {'data': (1024, 1024)} 2097.1521 0.0478 0.0406
radians {'data': (10000, 1)} 20.0 0.0154 0.0136
radians {'data': (10000, 100)} 2000.0 0.053 0.039
random_exponential {'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 2.8246 ---
random_exponential {'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0456 ---
random_exponential {'shape': (10000, 100), 'dtype': 'float64'} 4000.0 2.6586 ---
random_gamma {'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 7.3196 ---
random_gamma {'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0824 ---
random_gamma {'shape': (10000, 100), 'dtype': 'float64'} 4000.0 8.7672 ---
random_generalized_negative_binomial {'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 11.67 ---
random_generalized_negative_binomial {'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.1076 ---
random_generalized_negative_binomial {'shape': (10000, 100), 'dtype': 'float64'} 4000.0 10.1344 ---
random_negative_binomial {'k': 1, 'p': 1, 'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 8.5006 ---
random_negative_binomial {'k': 1, 'p': 1, 'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0944 ---
random_negative_binomial {'k': 1, 'p': 1, 'shape': (10000, 100), 'dtype': 'float64'} 4000.0 8.1284 ---
random_normal {'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 4.2624 ---
random_normal {'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0392 ---
random_normal {'shape': (10000, 100), 'dtype': 'float64'} 4000.0 4.4492 ---
random_pdf_dirichlet {'sample': (2,), 'alpha': [0.0, 2.5]} 0.002 0.009 ---
random_pdf_exponential {'sample': (2,), 'lam': [1.0, 8.5]} 0.004 0.0094 ---
random_pdf_gamma {'sample': (2,), 'alpha': [0.0, 2.5], 'beta': [1.0, 0.7]} 0.004 0.0122 ---
random_pdf_generalized_negative_binomial {'sample': (2,), 'mu': [2.0, 2.5], 'alpha': [0.0, 2.5]} 0.004 0.012 ---
random_pdf_negative_binomial {'sample': (2,), 'k': [20, 49], 'p': [0.4, 0.77]} 0.004 0.0124 ---
random_pdf_normal {'sample': (2,), 'mu': [2.0, 2.5], 'sigma': [1.0, 3.7]} 0.004 0.012 ---
random_pdf_poisson {'sample': (2,), 'lam': [1.0, 8.5]} 0.004 0.0122 ---
random_pdf_uniform {'sample': (2,), 'low': [0.0, 2.5], 'high': [1.0, 3.7]} 0.004 0.0126 ---
random_poisson {'shape': (1024, 1024), 'dtype': 'float16'} 1048.576 2.5328 ---
random_poisson {'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0348 ---
random_poisson {'shape': (10000, 100), 'dtype': 'float64'} 4000.0 2.2502 ---
random_randint {'low': 0, 'high': 5, 'shape': (1024, 1024), 'dtype': 'int32'} 2097.1521 1.7264 ---
random_randint {'low': 0, 'high': 5, 'shape': (10000, 1), 'dtype': 'int64'} 40.0 0.0282 ---
random_randint {'low': 0, 'high': 5, 'shape': (10000, 100), 'dtype': 'int32'} 2000.0 1.6516 ---
random_uniform {'low': 0, 'high': 5, 'shape': (1024, 1024), 'dtype': 'float16'} 2097.1521 1.4548 ---
random_uniform {'low': 0, 'high': 5, 'shape': (10000, 1), 'dtype': 'float32'} 20.0 0.0336 ---
random_uniform {'low': 0, 'high': 5, 'shape': (10000, 100), 'dtype': 'float64'} 8000.0 1.6414 ---
ravel_multi_index {'data': (2, 1024), 'shape': (1024, 1024)} 255474.4844 0.0172 ---
ravel_multi_index {'data': (2, 1024), 'shape': (10000, 1)} 255476.5312 0.0168 ---
ravel_multi_index {'data': (2, 1024), 'shape': (10000, 100)} 255478.5938 0.0182 ---
rcbrt {'data': (1024, 1024)} 2097.1521 1.4202 1.507
rcbrt {'data': (10000, 1)} 20.0 0.0256 0.0246
rcbrt {'data': (10000, 100)} 2000.0 1.2624 1.3266
reciprocal {'data': (1024, 1024)} 2097.1521 0.0528 0.0528
reciprocal {'data': (10000, 1)} 20.0 0.0168 0.0122
reciprocal {'data': (10000, 100)} 2000.0 0.051 0.0504
relu {'data': (1024, 1024)} 2097.1521 0.0712 0.1052
relu {'data': (10000, 1)} 20.0 0.0118 0.0118
relu {'data': (10000, 100)} 2000.0 0.0634 0.092
reset_arrays {'args': [(1024, 1024)], 'num_arrays': 1} --- 0.005 ---
reset_arrays {'args': [(10000, 1)], 'num_arrays': 1} --- 0.0032 ---
reset_arrays {'args': [(10000, 10)], 'num_arrays': 1} --- 0.003 ---
reshape_like {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 2097.1521 0.3992 ---
reshape_like {'lhs': (10000, 10), 'rhs': (10000, 10)} 200.0 0.0536 ---
reshape_like {'lhs': (10000, 1), 'rhs': (10000, 1)} 20.0 0.0104 ---
rint {'data': (1024, 1024)} 2097.1521 0.2502 ---
rint {'data': (10000, 1)} 20.0 0.0124 ---
rint {'data': (10000, 100)} 2000.0 0.2364 ---
rmsprop_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'epsilon': 1e-08, 'wd': 0.1, 'rescale_grad': 0.4} 1386363.75 0.4028 ---
rmsprop_update {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'epsilon': 1e-08, 'wd': 0.5, 'rescale_grad': 0.4} 1384306.5 0.0232 ---
rmsprop_update {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'epsilon': 1e-08, 'wd': 0.9, 'rescale_grad': 0.4} 1388286.5 0.8 ---
rmspropalex_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'n': (1024, 1024), 'g': (1024, 1024), 'delta': (1024, 1024), 'lr': 0.1, 'gamma1': 0.1, 'gamma2': 0.1, 'epsilon': 1e-08, 'wd': 0.1, 'rescale_grad': 0.4} 1392578.0 0.5422 ---
rmspropalex_update {'weight': (10000, 1), 'grad': (10000, 1), 'n': (10000, 1), 'g': (10000, 1), 'delta': (10000, 1), 'lr': 0.5, 'gamma1': 0.5, 'gamma2': 0.5, 'epsilon': 1e-08, 'wd': 0.5, 'rescale_grad': 0.4} 1390520.875 0.0378 ---
rmspropalex_update {'weight': (10000, 100), 'grad': (10000, 100), 'n': (10000, 100), 'g': (10000, 100), 'delta': (10000, 100), 'lr': 0.9, 'gamma1': 0.9, 'gamma2': 0.9, 'epsilon': 1e-08, 'wd': 0.9, 'rescale_grad': 0.4} 1394500.875 0.804 ---
round {'data': (1024, 1024)} 2097.1521 0.3482 ---
round {'data': (10000, 1)} 20.0 0.0146 ---
round {'data': (10000, 100)} 2000.0 0.4528 ---
rsqrt {'data': (1024, 1024)} 2097.1521 0.6472 0.6758
rsqrt {'data': (10000, 1)} 20.0 0.0184 0.0156
rsqrt {'data': (10000, 100)} 2000.0 0.601 0.654
sample_exponential {'lam': [1.0, 8.5], 'shape': (1024, 1024), 'dtype': 'float16'} 2097.1521 6.1304 ---
sample_exponential {'lam': [1.0, 8.5], 'shape': (10000, 1), 'dtype': 'float32'} 40.0 0.0698 ---
sample_exponential {'lam': [1.0, 8.5], 'shape': (10000, 100), 'dtype': 'float64'} 16000.0 5.2318 ---
sample_gamma {'alpha': [0.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'beta': [1.0, 0.7]} 2097.1521 18.391 ---
sample_gamma {'alpha': [0.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'beta': [1.0, 0.7]} 40.0 0.2092 ---
sample_gamma {'alpha': [0.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'beta': [1.0, 0.7]} 8000.0 21.4558 ---
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'alpha': [0.0, 2.5]} 2097.1521 35.7084 ---
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'alpha': [0.0, 2.5]} 40.0 0.435 ---
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'alpha': [0.0, 2.5]} 8000.0 34.1968 ---
sample_multinomial {'data': (32, 32), 'shape': (1024, 1024), 'dtype': 'float16'} 81404.0781 257.0554 ---
sample_multinomial {'data': (32, 32), 'shape': (10000, 1), 'dtype': 'float32'} 49129.6484 1.7624 ---
sample_multinomial {'data': (32, 32), 'shape': (10000, 100), 'dtype': 'float64'} 304489.6562 171.6686 ---
sample_negative_binomial {'k': [20, 49], 'shape': (1024, 1024), 'dtype': 'float16', 'p': [0.4, 0.77]} 182781.1094 299.2208 ---
sample_negative_binomial {'k': [20, 49], 'shape': (10000, 1), 'dtype': 'float32', 'p': [0.4, 0.77]} 180763.9531 2.8592 ---
sample_negative_binomial {'k': [20, 49], 'shape': (10000, 100), 'dtype': 'float64', 'p': [0.4, 0.77]} 196723.9531 284.7328 ---
sample_normal {'mu': [2.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'sigma': [1.0, 3.7]} 195015.4062 6.015 ---
sample_normal {'mu': [2.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'sigma': [1.0, 3.7]} 192998.2656 0.0668 ---
sample_normal {'mu': [2.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'sigma': [1.0, 3.7]} 208958.2656 5.7466 ---
sample_poisson {'lam': [1.0, 8.5], 'shape': (1024, 1024), 'dtype': 'float16'} 207249.7188 6.8382 ---
sample_poisson {'lam': [1.0, 8.5], 'shape': (10000, 1), 'dtype': 'float32'} 205232.5625 0.087 ---
sample_poisson {'lam': [1.0, 8.5], 'shape': (10000, 100), 'dtype': 'float64'} 221192.5625 6.6176 ---
sample_uniform {'low': [0.0, 2.5], 'shape': (1024, 1024), 'dtype': 'float16', 'high': [1.0, 3.7]} 219484.0156 1.6996 ---
sample_uniform {'low': [0.0, 2.5], 'shape': (10000, 1), 'dtype': 'float32', 'high': [1.0, 3.7]} 217466.8594 0.038 ---
sample_uniform {'low': [0.0, 2.5], 'shape': (10000, 100), 'dtype': 'float64', 'high': [1.0, 3.7]} 233426.8594 1.4822 ---
sgd_mom_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 1398792.25 0.1456 ---
sgd_mom_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 1396735.125 0.013 ---
sgd_mom_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 1400715.125 0.1264 ---
sgd_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4, 'lazy_update': 0} 1405006.625 0.0944 ---
sgd_update {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4, 'lazy_update': 0} 1402949.5 0.0164 ---
sgd_update {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4, 'lazy_update': 0} 1406929.5 0.0902 ---
shape_array {'data': (1024, 1024)} 0.008 0.008 ---
shape_array {'data': (10000, 1)} 0.008 0.005 ---
shape_array {'data': (10000, 100)} 0.008 0.005 ---
shuffle {'data': (1024, 1024)} 2097.1521 1.0618 ---
shuffle {'data': (10000, 1)} 20.0 0.573 ---
shuffle {'data': (10000, 100)} 2000.0 1.6734 ---
sigmoid {'data': (1024, 1024)} 2097.1521 0.688 0.0498
sigmoid {'data': (10000, 1)} 20.0 0.0198 0.011
sigmoid {'data': (10000, 100)} 2000.0 0.6562 0.049
sign {'data': (1024, 1024)} 2097.1521 0.2382 0.0394
sign {'data': (10000, 1)} 20.0 0.0124 0.0114
sign {'data': (10000, 100)} 2000.0 0.2266 0.0346
signsgd_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 1411220.875 0.124 ---
signsgd_update {'weight': (10000, 1), 'grad': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 1409163.75 0.0126 ---
signsgd_update {'weight': (10000, 100), 'grad': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 1413143.75 0.1178 ---
signum_update {'weight': (1024, 1024), 'grad': (1024, 1024), 'mom': (1024, 1024), 'lr': 0.1, 'wd': 0.1, 'rescale_grad': 0.4} 1417435.25 0.3168 ---
signum_update {'weight': (10000, 1), 'grad': (10000, 1), 'mom': (10000, 1), 'lr': 0.5, 'wd': 0.5, 'rescale_grad': 0.4} 1415378.125 0.021 ---
signum_update {'weight': (10000, 100), 'grad': (10000, 100), 'mom': (10000, 100), 'lr': 0.9, 'wd': 0.9, 'rescale_grad': 0.4} 1419358.125 0.292 ---
sin {'data': (1024, 1024)} 2097.1521 0.4764 0.485
sin {'data': (10000, 1)} 20.0 0.0146 0.0134
sin {'data': (10000, 100)} 2000.0 0.4518 0.474
sinh {'data': (1024, 1024)} 2097.1521 1.4904 0.9036
sinh {'data': (10000, 1)} 20.0 0.0244 0.018
sinh {'data': (10000, 100)} 2000.0 1.424 0.8778
size_array {'data': (1024, 1024)} 0.004 0.0062 ---
size_array {'data': (10000, 1)} 0.004 0.0056 ---
size_array {'data': (10000, 100)} 0.004 0.0062 ---
slice {'data': (1024, 1024), 'begin': 0, 'end': 1} 255482.6875 0.017 0.025
slice {'data': (10000, 1), 'begin': 0, 'end': 1} 255480.6406 0.0156 0.0108
slice {'data': (10000, 100), 'begin': 0, 'end': 1} 255481.0312 0.0142 0.0288
slice_axis {'data': (1024, 1024), 'axis': 0, 'begin': 0, 'end': 1} 255486.9844 0.0128 0.0254
slice_axis {'data': (10000, 1), 'axis': 0, 'begin': 0, 'end': 1} 255484.9375 0.0124 0.0158
slice_axis {'data': (10000, 100), 'axis': 0, 'begin': 0, 'end': 1} 255485.3281 0.0126 0.0268
slice_like {'data': (1024, 1024), 'shape_like': (100, 100), 'axes': [0, 1]} 255545.1406 0.0158 0.036
slice_like {'data': (10000, 1), 'shape_like': (10, 1), 'axes': [0, 1]} 255525.1719 0.0148 0.0164
slice_like {'data': (10000, 100), 'shape_like': (100, 10), 'axes': [0, 1]} 255529.1562 0.0152 0.0312
smooth_l1 {'data': (1024, 1024)} 2655740.0 0.2966 0.219
smooth_l1 {'data': (10000, 1)} 2653683.0 0.013 0.0102
smooth_l1 {'data': (10000, 100)} 2657663.0 0.2484 0.204
softmax {'data': (1024, 1024), 'axis': 0, 'dtype': 'float16'} 411171.4062 3.2496 1.7626
softmax {'data': (10000, 1), 'axis': 0, 'dtype': 'float32'} 410162.8438 0.3944 0.034
softmax {'data': (10000, 100), 'axis': 0, 'dtype': 'float64'} 418142.8438 4.4234 1.9228
softmax_cross_entropy {'data': (1024, 1024), 'label': (1024,)} 2655663.0 0.5932 ---
softmin {'data': (1024, 1024), 'axis': 0, 'dtype': 'float16'} 417288.5625 3.3336 1.7892
softmin {'data': (10000, 1), 'axis': 0, 'dtype': 'float32'} 416280.0 0.4106 0.0374
softmin {'data': (10000, 100), 'axis': 0, 'dtype': 'float64'} 424260.0 4.5022 1.982
softsign {'data': (1024, 1024)} 2097.1521 0.0452 0.0466
softsign {'data': (10000, 1)} 20.0 0.0114 0.0104
softsign {'data': (10000, 100)} 2000.0 0.1784 0.097
sort {'data': (1024, 1024), 'axis': 0} 242180.9531 66.2036 ---
sort {'data': (10000, 1), 'axis': 0} 235969.5 1.4408 ---
sort {'data': (10000, 100), 'axis': 0} 243909.5 63.1696 ---
space_to_depth {'data': (1, 4, 2, 4), 'block_size': 2} 244628.2969 0.017 ---
space_to_depth {'data': (10, 25, 10, 100), 'block_size': 5} 245128.2969 0.8904 ---
sqrt {'data': (1024, 1024)} 2097.1521 0.6336 0.0488
sqrt {'data': (10000, 1)} 20.0 0.0216 0.0098
sqrt {'data': (10000, 100)} 2000.0 0.6446 0.1016
square {'data': (1024, 1024)} 2097.1521 0.0366 0.0438
square {'data': (10000, 1)} 20.0 0.0142 0.012
square {'data': (10000, 100)} 2000.0 0.0456 0.0474
squeeze {'data': (1, 1024, 1024), 'axis': 0} 2839136.5 0.4142 0.625
squeeze {'data': (32, 1, 256, 256), 'axis': 1} 2845428.0 1.314 1.165
stop_gradient {'data': (1024, 1024)} 2097.1521 0.3286 ---
stop_gradient {'data': (10000, 1)} 20.0 0.0116 ---
stop_gradient {'data': (10000, 100)} 2000.0 0.3582 ---
sum {'data': (1024, 1024), 'axis': 0} 225471.8906 1.8752 1.4526
sum {'data': (10000, 1), 'axis': 0} 225469.8438 0.369 0.0228
sum {'data': (10000, 100), 'axis': 0} 225470.25 1.7088 1.3704
sum_axis {'data': (1024, 1024), 'axis': 0} 225474.1406 1.9974 ---
sum_axis {'data': (10000, 1), 'axis': 0} 225474.1406 0.3746 ---
sum_axis {'data': (10000, 100), 'axis': 0} 225474.3438 1.7386 ---
swapaxes {'data': (1024, 1024), 'dim1': 0, 'dim2': 1} 247225.4531 3.0952 ---
swapaxes {'data': (10000, 1), 'dim1': 0, 'dim2': 1} 247265.4531 0.0416 ---
swapaxes {'data': (10000, 100), 'dim1': 0, 'dim2': 1} 251245.4531 1.9396 ---
take {'a': (1024, 1024), 'indices': (1, 1), 'axis': 0} 255533.2969 0.0142 0.0332
tan {'data': (1024, 1024)} 2097.1521 1.1422 0.0506
tan {'data': (10000, 1)} 20.0 0.019 0.01
tan {'data': (10000, 100)} 2000.0 1.0806 0.0492
tanh {'data': (1024, 1024)} 2097.1521 1.3942 0.0494
tanh {'data': (10000, 1)} 40.0 0.0642 0.0092
tanh {'data': (10000, 100)} 4000.0 1.339 0.0582
topk {'data': (1024, 1024), 'axis': 0, 'k': 1, 'dtype': 'float32'} 237915.6406 67.2932 ---
topk {'data': (10000, 1), 'axis': 0, 'k': 1, 'dtype': 'int32'} 237913.5938 0.5962 ---
topk {'data': (10000, 100), 'axis': 0, 'k': 1, 'dtype': 'float32'} 237914.0 55.9822 ---
transpose {'data': (1024, 1024), 'axes': [0, 1]} 255536.9062 0.3544 ---
transpose {'data': (10000, 1), 'axes': [0, 1]} 253479.75 0.0162 ---
transpose {'data': (10000, 100), 'axes': [0, 1]} 257459.75 0.3684 ---
trunc {'data': (1024, 1024)} 2097.1521 0.2564 ---
trunc {'data': (10000, 1)} 20.0 0.0124 ---
trunc {'data': (10000, 100)} 2000.0 0.2468 ---
where {'condition': (1024,), 'x': (1024, 1024), 'y': (1024, 1024)} 261822.7031 0.586 1.0934
where {'condition': (10000,), 'x': (10000, 1), 'y': (10000, 1)} 259765.5469 0.0264 0.0226
where {'condition': (10000,), 'x': (10000, 100), 'y': (10000, 100)} 263745.5625 0.5384 1.065
zeros_like {'data': (1024, 1024)} 2097.1521 0.0236 ---
zeros_like {'data': (10000, 1)} 20.0 0.0098 ---
zeros_like {'data': (10000, 100)} 2000.0 0.0276 ---