@sandeep-krishnamurthy
Last active July 2, 2019 06:43
MXNet operator performance 1.4.1 and 1.5.RC2 - CPU (c5.18x) - MXNet-MKL

CPU

Operator Inputs Avg. Forward Time (ms) (1.4.1) Avg. Forward Time (ms) (1.5.0) Avg. Backward Time (ms) (1.4.1) Avg. Backward Time (ms) (1.5.0) Max Mem Usage (Storage) (Bytes) (1.4.1) Max Mem Usage (Storage) (Bytes) (1.5.0)
BatchNorm {'beta': (3,), 'moving_mean': (3,), 'gamma': (3,), 'data': (32, 3, 256, 256), 'moving_var': (3,)} 8.0009 7.9855 7.9907 8.014 12582.9238 12582.9238
BatchNorm {'beta': (3,), 'moving_mean': (3,), 'gamma': (3,), 'data': (32, 3, 10000, 10), 'moving_var': (3,)} 14.4459 14.5993 12.363 12.5098 38400.0117 19200.0117
BlockGrad {'data': (1024, 1024)} 0.9345 1.0886 --- --- 2097.1521 2097.1521
BlockGrad {'data': (10000, 1)} 0.0038 0.0054 --- --- 20.0 20.0
BlockGrad {'data': (10000, 100)} 1.1126 1.1151 --- --- 2000.0 2000.0
Convolution {'kernel': (3,), 'weight': (64, 3, 3), 'bias': (64,), 'data': (32, 3, 256), 'layout': 'NCW', 'pad': (0,), 'dilate': (1,), 'num_filter': 64, 'stride': (1,)} 1.7578 0.242 3.3886 0.5067 2080.7681 2080.7681
Dropout {'mode': 'always', 'data': (32, 3, 256, 256), 'p': 0.5} 2.1245 1.8425 0.7703 0.6013 37748.7344 37748.7344
Dropout {'mode': 'always', 'data': (10000, 10), 'p': 0.5} 0.0241 0.0365 0.0078 0.0121 400.0 400.0
Flatten {'data': (1024, 1024)} 1.181 1.1202 --- --- 2097.1521 2097.1521
Flatten {'data': (10000, 1)} 0.0035 0.0103 --- --- 20.0 20.0
Flatten {'data': (10000, 100)} 1.1459 1.0995 --- --- 2000.0 2000.0
FullyConnected {'num_hidden': 64, 'flatten': True, 'weight': (64, 196608), 'bias': (64,), 'data': (32, 3, 256, 256)} 1.7245 1.6666 4.664 4.5218 8.192 8.192
FullyConnected {'num_hidden': 64, 'flatten': False, 'weight': (64, 256), 'bias': (64,), 'data': (32, 3, 256, 256)} 0.7191 0.4847 1.7348 1.8349 6291.4561 6291.4561
LeakyReLU {'slope': 0.1, 'act_type': 'leaky', 'data': (1024, 1024)} 0.433 0.162 0.0546 0.0178 2097.1521 2097.1521
LeakyReLU {'slope': 0.1, 'act_type': 'leaky', 'data': (10000, 1)} 0.0105 0.0294 0.0108 0.0147 20.0 20.0
LeakyReLU {'slope': 0.1, 'act_type': 'leaky', 'data': (10000, 100)} 0.1542 0.1556 0.0203 0.0132 2000.0 2000.0
LeakyReLU {'slope': 0.1, 'act_type': 'elu', 'data': (1024, 1024)} 0.4995 0.497 0.1719 0.1792 4194.3042 4194.3042
LeakyReLU {'slope': 0.1, 'act_type': 'elu', 'data': (10000, 1)} 0.0105 0.0119 0.0074 0.0456 20.0 20.0
LeakyReLU {'slope': 0.1, 'act_type': 'elu', 'data': (10000, 100)} 0.4669 0.4832 0.1624 0.1688 2000.0 2000.0
LeakyReLU {'act_type': 'selu', 'data': (1024, 1024)} 0.4989 0.5195 0.171 0.1744 2097.1521 2097.1521
LeakyReLU {'act_type': 'selu', 'data': (10000, 1)} 0.0099 0.0108 0.0074 0.047 20.0 20.0
LeakyReLU {'act_type': 'selu', 'data': (10000, 100)} 0.4729 0.4954 0.1598 0.1659 2000.0 2000.0
LeakyReLU {'act_type': 'prelu', 'gamma': (1, 1024), 'data': (1024, 1024)} 0.1798 0.2583 4.8877 2.3837 2097.1521 2097.1521
LeakyReLU {'act_type': 'prelu', 'gamma': (1, 1), 'data': (10000, 1)} 0.0148 0.0116 0.5144 0.2028 20.0 20.0
LeakyReLU {'act_type': 'prelu', 'gamma': (1, 100), 'data': (10000, 100)} 0.1752 0.2649 3.8858 2.1013 2000.0 2000.0
Pooling {'kernel': 3, 'data': (32, 3, 256), 'global_pool': 0, 'pad': 1, 'pool_type': 'avg', 'stride': 1} 0.1805 0.1746 0.2323 0.2315 49.152 49.152
abs {'data': (1024, 1024)} 0.0197 0.0207 0.1646 0.1722 2097.1521 2097.1521
abs {'data': (10000, 1)} 0.0074 0.0091 0.0071 0.038 20.0 20.0
abs {'data': (10000, 100)} 0.019 0.0205 0.1572 0.1638 2000.0 2000.0
arccos {'data': (1024, 1024)} 0.5484 0.574 0.3664 0.3713 2097.1521 2097.1521
arccos {'data': (10000, 1)} 0.0125 0.0137 0.0973 0.0928 20.0 20.0
arccos {'data': (10000, 100)} 0.5266 0.5505 0.3972 0.3564 2000.0 2000.0
arccosh {'data': (1024, 1024)} 0.4695 0.3511 0.4696 0.3398 2097.1521 2097.1521
arccosh {'data': (10000, 1)} 0.0158 0.0116 0.01 0.0108 20.0 20.0
arccosh {'data': (10000, 100)} 0.4494 0.4338 0.451 0.4419 2000.0 2000.0
arcsin {'data': (1024, 1024)} 0.5129 0.3933 0.3663 0.2937 2097.1521 2097.1521
arcsin {'data': (10000, 1)} 0.0136 0.0126 0.104 0.0958 20.0 20.0
arcsin {'data': (10000, 100)} 0.493 0.37 0.3513 0.2768 2000.0 2000.0
arcsinh {'data': (1024, 1024)} 1.0431 1.0381 0.2859 0.279 2097.1521 2097.1521
arcsinh {'data': (10000, 1)} 0.0141 0.0209 0.0117 0.0108 20.0 20.0
arcsinh {'data': (10000, 100)} 0.9962 0.9878 0.2718 0.2684 2000.0 2000.0
arctan {'data': (1024, 1024)} 0.6258 0.4473 0.0396 0.0318 2097.1521 2097.1521
arctan {'data': (10000, 1)} 0.0103 0.0115 0.0067 0.0163 20.0 20.0
arctan {'data': (10000, 100)} 0.6085 0.4275 0.0318 0.0138 4000.0 2000.0
arctanh {'data': (1024, 1024)} 0.8912 0.6664 0.0383 0.0382 2097.1521 2097.1521
arctanh {'data': (10000, 1)} 0.0176 0.0154 0.0077 0.0164 20.0 20.0
arctanh {'data': (10000, 100)} 0.8508 0.6387 0.0309 0.0194 2000.0 2000.0
argmax_channel {'data': (1024, 1024)} 0.4417 0.1278 --- --- 4.096 2.048
argmax_channel {'data': (10000, 1)} 0.0174 0.012 --- --- 20.0 20.0
argmax_channel {'data': (10000, 100)} 0.447 0.1417 --- --- 20.0 20.0
batch_dot {'lhs': (32, 1024, 1024), 'rhs': (32, 1024, 1024)} 60.4961 56.3872 72.3274 70.2494 134217.7344 134217.7344
batch_dot {'lhs': (32, 1000, 10), 'rhs': (32, 1000, 10), 'transpose_b': True} 32.023 35.8877 6.6535 6.3897 128000.0 128000.0
batch_dot {'lhs': (32, 1000, 1), 'rhs': (32, 100, 1000), 'transpose_b': True, 'transpose_a': True} 0.207 0.6677 1.1905 1.2125 12.8 12.8
broadcast_add {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0025 0.0046 0.0021 0.0035 0.012 0.012
broadcast_div {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0025 0.0044 0.003 0.0049 0.012 0.012
broadcast_equal {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0025 0.0034 --- --- 0.012 0.012
broadcast_greater {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0018 0.0036 --- --- 0.012 0.012
broadcast_greater_equal {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0027 0.0034 --- --- 0.012 0.012
broadcast_hypot {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0021 0.0045 0.0024 0.0051 0.012 0.012
broadcast_lesser {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0018 0.0036 --- --- 0.012 0.012
broadcast_lesser_equal {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0018 0.0035 --- --- 0.012 0.012
broadcast_logical_and {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0018 0.0034 --- --- 0.012 0.012
broadcast_logical_or {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0027 0.0038 --- --- 0.012 0.012
broadcast_logical_xor {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0025 0.0033 --- --- 0.012 0.012
broadcast_maximum {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.002 0.0043 0.0022 0.0052 0.012 0.012
broadcast_minimum {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.002 0.0045 0.0022 0.0047 0.012 0.012
broadcast_minus {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} --- --- --- --- 0.012 0.012
broadcast_mod {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0021 0.0045 0.0022 0.0051 0.012 0.012
broadcast_mul {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0019 0.0047 0.0022 0.0052 0.012 0.012
broadcast_not_equal {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0026 0.0034 --- --- 0.012 0.012
broadcast_plus {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} --- --- --- --- 0.012 0.012
broadcast_power {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0028 0.005 0.0037 0.0059 0.012 0.012
broadcast_sub {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0025 0.0045 0.0022 0.0037 0.012 0.012
cbrt {'data': (1024, 1024)} 0.9478 0.6315 0.0332 0.0239 2097.1521 2097.1521
cbrt {'data': (10000, 1)} 0.0154 0.0154 0.01 0.0179 20.0 20.0
cbrt {'data': (10000, 100)} 0.9075 0.6018 0.0308 0.0233 2000.0 2000.0
ceil {'data': (1024, 1024)} 0.1269 0.0813 --- --- 2097.1521 2097.1521
ceil {'data': (10000, 1)} 0.0213 0.0243 --- --- 20.0 20.0
ceil {'data': (10000, 100)} 0.0767 0.0781 --- --- 2000.0 2000.0
cos {'data': (1024, 1024)} 0.3447 0.2635 0.3667 0.2383 2097.1521 2097.1521
cos {'data': (10000, 1)} 0.0093 0.0097 0.0096 0.01 20.0 20.0
cos {'data': (10000, 100)} 0.3281 0.2539 0.3493 0.2285 2000.0 2000.0
cosh {'data': (1024, 1024)} 0.6565 0.654 1.1456 1.149 2097.1521 2097.1521
cosh {'data': (10000, 1)} 0.0167 0.0116 0.0154 0.0156 20.0 20.0
cosh {'data': (10000, 100)} 0.6261 0.6237 1.0902 1.0988 2000.0 2000.0
degrees {'data': (1024, 1024)} 0.0211 0.023 0.0176 0.0205 2097.1521 2097.1521
degrees {'data': (10000, 1)} 0.0038 0.011 0.0067 0.0128 20.0 20.0
degrees {'data': (10000, 100)} 0.0204 0.0223 0.0143 0.0164 2000.0 2000.0
dot {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 1.0055 0.8465 2.3081 1.9458 4194.3042 2097.1521
dot {'lhs': (1000, 10), 'rhs': (1000, 10), 'transpose_b': True} 0.1167 0.0817 0.1221 0.1437 2000.0 2000.0
dot {'lhs': (1000, 1), 'rhs': (100, 1000), 'transpose_b': True, 'transpose_a': True} 0.0102 0.0161 0.0238 0.0429 0.2 0.2
elemwise_add {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0214 0.0187 --- --- 0.012 0.024
elemwise_div {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0025 0.0033 --- --- 0.012 0.012
elemwise_mul {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0026 0.0034 --- --- 0.012 0.012
elemwise_sub {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0026 0.0035 --- --- 0.012 0.012
erf {'data': (1024, 1024)} 0.6751 0.5095 0.6064 0.4381 2097.1521 2097.1521
erf {'data': (10000, 1)} 0.0146 0.0128 0.0139 0.0125 20.0 20.0
erf {'data': (10000, 100)} 0.6425 0.5344 0.5882 0.4183 4000.0 2000.0
exp {'data': (1024, 1024)} 0.4367 0.2743 --- --- 2097.1521 2097.1521
exp {'data': (10000, 1)} 0.0123 0.0103 --- --- 20.0 20.0
exp {'data': (10000, 100)} 0.4143 0.2635 --- --- 2000.0 2000.0
expm1 {'data': (1024, 1024)} 0.7395 0.7458 0.4674 0.4593 2097.1521 2097.1521
expm1 {'data': (10000, 1)} 0.0224 0.0139 0.0115 0.0118 20.0 20.0
expm1 {'data': (10000, 100)} 0.7062 0.7087 0.4415 0.4374 2000.0 2000.0
fix {'data': (1024, 1024)} 0.2035 0.2007 --- --- 2097.1521 2097.1521
fix {'data': (10000, 1)} 0.0084 0.0425 --- --- 20.0 20.0
fix {'data': (10000, 100)} 0.1926 0.2019 --- --- 2000.0 4000.0
flatten {'data': (1024, 1024)} --- --- --- --- 2097.1521 2097.1521
flatten {'data': (10000, 1)} --- --- --- --- 20.0 20.0
flatten {'data': (10000, 100)} --- --- --- --- 2000.0 2000.0
floor {'data': (1024, 1024)} 0.0855 0.0771 --- --- 2097.1521 2097.1521
floor {'data': (10000, 1)} 0.0231 0.0218 --- --- 20.0 20.0
floor {'data': (10000, 100)} 0.0786 0.0743 --- --- 2000.0 2000.0
gamma {'data': (1024, 1024)} 2.7952 2.7983 5.2844 5.1828 4194.3042 2097.1521
gamma {'data': (10000, 1)} 0.0419 0.0317 0.0561 0.0564 20.0 20.0
gamma {'data': (10000, 100)} 2.6624 2.2738 5.0535 4.1741 4000.0 2000.0
gammaln {'data': (1024, 1024)} 37.6309 17.0666 2.0739 2.5733 4194.3042 2097.1521
gammaln {'data': (10000, 1)} 0.3838 0.1885 0.0302 0.0325 20.0 20.0
gammaln {'data': (10000, 100)} 35.9658 19.2171 1.8351 1.9163 4000.0 2000.0
hard_sigmoid {'alpha': 0.25, 'beta': 0.5, 'data': (1024, 1024)} 0.0579 0.0663 0.0506 0.0825 2097.1521 2097.1521
hard_sigmoid {'alpha': 0.25, 'beta': 0.5, 'data': (10000, 1)} 0.0077 0.0076 0.0075 0.0079 20.0 20.0
hard_sigmoid {'alpha': 0.25, 'beta': 0.5, 'data': (10000, 100)} 0.0555 0.0632 0.0479 0.0792 2000.0 2000.0
identity {'data': (1024, 1024)} --- --- --- --- 2097.1521 2097.1521
identity {'data': (10000, 1)} --- --- --- --- 20.0 20.0
identity {'data': (10000, 100)} --- --- --- --- 2000.0 2000.0
log {'data': (1024, 1024)} 0.5515 0.5511 0.0288 0.0297 2097.1521 2097.1521
log {'data': (10000, 1)} 0.0137 0.0131 0.0074 0.015 20.0 20.0
log {'data': (10000, 100)} 0.5272 0.5276 0.0165 0.0178 2000.0 2000.0
log10 {'data': (1024, 1024)} 0.6513 0.6611 0.0368 0.0321 2097.1521 2097.1521
log10 {'data': (10000, 1)} 0.0147 0.0143 0.0072 0.0126 20.0 20.0
log10 {'data': (10000, 100)} 0.6318 0.6224 0.0311 0.0245 2000.0 2000.0
log1p {'data': (1024, 1024)} 0.6822 0.7117 0.0354 0.0337 2097.1521 2097.1521
log1p {'data': (10000, 1)} 0.0131 0.0114 0.0076 0.0128 20.0 20.0
log1p {'data': (10000, 100)} 0.6593 0.6774 0.0272 0.028 2000.0 2000.0
log2 {'data': (1024, 1024)} 0.5325 0.378 0.0306 0.0283 2097.1521 2097.1521
log2 {'data': (10000, 1)} 0.0141 0.0121 0.0073 0.0149 20.0 20.0
log2 {'data': (10000, 100)} 0.5092 0.3642 0.0181 0.0132 2000.0 2000.0
log_softmax {'temperature': 0.5, 'axis': -1, 'data': (1024, 1024)} 1.176 1.1758 0.5789 0.5908 4194.3042 2097.1521
log_softmax {'temperature': 0.5, 'axis': -1, 'data': (10000, 1)} 0.0213 0.0172 0.0184 0.0139 20.0 20.0
log_softmax {'temperature': 0.5, 'axis': -1, 'data': (10000, 100)} 0.9789 1.1135 0.4785 0.5578 2000.0 2000.0
logical_not {'data': (1024, 1024)} 0.0342 0.0328 --- --- 2097.1521 2097.1521
logical_not {'data': (10000, 1)} 0.0048 0.0175 --- --- 40.0 20.0
logical_not {'data': (10000, 100)} 0.056 0.0314 --- --- 4000.0 2000.0
make_loss {'data': (1024, 1024)} 1.2697 1.1637 --- --- 2097.1521 2097.1521
make_loss {'data': (10000, 1)} 0.0058 0.0051 --- --- 20.0 20.0
make_loss {'data': (10000, 100)} 1.49 1.0948 --- --- 4000.0 2000.0
max {'axis': (), 'data': (1024, 1024)} 19.1602 7.1825 1.8421 0.3114 0.002 0.002
max {'axis': 0, 'data': (10000, 1)} 0.218 0.1065 0.0256 0.0121 0.002 0.002
max {'axis': (0, 1), 'data': (10000, 100)} 18.2947 6.8747 1.7538 0.3819 0.002 0.002
max_axis {'axis': (), 'data': (1024, 1024)} 21.3432 7.3994 --- --- 0.004 0.004
max_axis {'axis': 0, 'data': (10000, 1)} 0.1923 0.0794 --- --- 0.002 0.002
max_axis {'axis': (0, 1), 'data': (10000, 100)} 19.08 7.1096 --- --- 0.004 0.002
mean {'axis': (), 'data': (1024, 1024)} 20.3092 7.6116 1.7335 0.8835 0.004 0.002
mean {'axis': 0, 'data': (10000, 1)} 0.2564 0.1173 0.0272 0.0185 0.002 0.002
mean {'axis': (0, 1), 'data': (10000, 100)} 22.8492 7.2628 1.7059 0.7517 0.004 0.002
min {'axis': (), 'data': (1024, 1024)} 19.0906 7.3093 1.8435 0.4186 0.002 0.004
min {'axis': 0, 'data': (10000, 1)} 0.2145 0.0732 0.0256 0.0195 0.002 0.002
min {'axis': (0, 1), 'data': (10000, 100)} 18.2143 7.62 1.754 0.525 0.002 0.004
min_axis {'axis': (), 'data': (1024, 1024)} 19.0678 7.1708 --- --- 0.002 0.004
min_axis {'axis': 0, 'data': (10000, 1)} 0.1914 0.0726 --- --- 0.002 0.002
min_axis {'axis': (0, 1), 'data': (10000, 100)} 18.1892 7.2402 --- --- 0.002 0.002
nanprod {'axis': (), 'data': (1024, 1024)} 18.9492 7.2374 1.8323 0.4292 0.002 0.002
nanprod {'axis': 0, 'data': (10000, 1)} 0.2108 0.1044 0.0252 0.014 0.002 0.002
nanprod {'axis': (0, 1), 'data': (10000, 100)} 18.0501 6.9665 1.7485 0.4629 0.002 0.004
nansum {'axis': (), 'data': (1024, 1024)} 19.3539 9.5764 0.9261 0.3318 0.002 0.004
nansum {'axis': 0, 'data': (10000, 1)} 0.218 0.1204 0.0165 0.0128 0.002 0.002
nansum {'axis': (0, 1), 'data': (10000, 100)} 18.4616 8.982 0.8795 0.3825 0.002 0.004
negative {'data': (1024, 1024)} 0.0309 0.0313 --- --- 2097.1521 2097.1521
negative {'data': (10000, 1)} 0.0038 0.0099 --- --- 20.0 20.0
negative {'data': (10000, 100)} 0.0294 0.0307 --- --- 2000.0 2000.0
ones_like {'data': (1024, 1024)} 0.0273 0.0279 --- --- 2097.1521 2097.1521
ones_like {'data': (10000, 1)} 0.0031 0.0099 --- --- 20.0 20.0
ones_like {'data': (10000, 100)} 0.0264 0.0278 --- --- 2000.0 2000.0
prod {'axis': (), 'data': (1024, 1024)} 18.3052 7.3333 1.8268 0.6137 0.002 0.004
prod {'axis': 0, 'data': (10000, 1)} 0.2063 0.0969 0.0251 0.0165 0.002 0.002
prod {'axis': (0, 1), 'data': (10000, 100)} 17.4361 8.0385 1.746 0.6575 0.002 0.004
radians {'data': (1024, 1024)} 0.0229 0.0205 0.0192 0.0194 2097.1521 2097.1521
radians {'data': (10000, 1)} 0.0086 0.0071 0.0068 0.0095 20.0 20.0
radians {'data': (10000, 100)} 0.0222 0.0202 0.0155 0.0143 2000.0 2000.0
random_exponential {'shape': (1024, 1024)} 1.6658 1.1995 --- --- 4194.3042 4194.3042
random_exponential {'shape': (10000, 1)} 0.026 0.0245 --- --- 20.0 20.0
random_exponential {'shape': (10000, 100)} 1.762 1.1383 --- --- 4000.0 2000.0
random_gamma {'shape': (1024, 1024)} 4.4205 5.1698 --- --- 2097.1521 2097.1521
random_gamma {'shape': (10000, 1)} 0.058 0.0654 --- --- 20.0 20.0
random_gamma {'shape': (10000, 100)} 4.2231 4.949 --- --- 4000.0 4000.0
random_generalized_negative_binomial {'shape': (1024, 1024)} 6.2619 7.1031 --- --- 4194.3042 4194.3042
random_generalized_negative_binomial {'shape': (10000, 1)} 0.0681 0.0883 --- --- 20.0 20.0
random_generalized_negative_binomial {'shape': (10000, 100)} 5.11 6.7376 --- --- 2000.0 4000.0
random_negative_binomial {'shape': (1024, 1024), 'k': 1, 'p': 1} 5.9661 4.019 --- --- 2097.1521 2097.1521
random_negative_binomial {'shape': (10000, 1), 'k': 1, 'p': 1} 0.0731 0.0558 --- --- 20.0 20.0
random_negative_binomial {'shape': (10000, 100), 'k': 1, 'p': 1} 5.7027 3.9245 --- --- 4000.0 2000.0
random_normal {'shape': (1024, 1024)} 2.1622 1.8507 --- --- 2097.1521 2097.1521
random_normal {'shape': (10000, 1)} 0.0295 0.034 --- --- 20.0 20.0
random_normal {'shape': (10000, 100)} 2.0216 1.7918 --- --- 2000.0 4000.0
random_poisson {'shape': (1024, 1024)} 1.8634 1.6017 --- --- 2097.1521 4194.3042
random_poisson {'shape': (10000, 1)} 0.0265 0.0258 --- --- 20.0 40.0
random_poisson {'shape': (10000, 100)} 1.7835 1.5337 --- --- 2000.0 4000.0
random_randint {'low': 0, 'shape': (1024, 1024), 'high': 5} 1.0431 0.8278 --- --- 4194.3042 2097.1521
random_randint {'low': 0, 'shape': (10000, 1), 'high': 5} 0.0183 0.0167 --- --- 20.0 20.0
random_randint {'low': 0, 'shape': (10000, 100), 'high': 5} 1.0256 0.7907 --- --- 4000.0 2000.0
random_uniform {'low': 0, 'shape': (1024, 1024), 'high': 5} 0.5809 0.4009 --- --- 2097.1521 4194.3042
random_uniform {'low': 0, 'shape': (10000, 1), 'high': 5} 0.0128 0.0124 --- --- 20.0 20.0
random_uniform {'low': 0, 'shape': (10000, 100), 'high': 5} 0.5552 0.4814 --- --- 2000.0 2000.0
rcbrt {'data': (1024, 1024)} 1.0 0.6241 1.0904 0.6415 4194.3042 2097.1521
rcbrt {'data': (10000, 1)} 0.0148 0.0148 0.0152 0.0142 20.0 20.0
rcbrt {'data': (10000, 100)} 0.9322 0.5979 1.0092 0.613 4000.0 2000.0
reciprocal {'data': (1024, 1024)} 0.0313 0.0214 0.0566 0.0312 2097.1521 2097.1521
reciprocal {'data': (10000, 1)} 0.0043 0.0134 0.007 0.0159 20.0 20.0
reciprocal {'data': (10000, 100)} 0.0302 0.0213 0.0296 0.0155 2000.0 2000.0
relu {'data': (1024, 1024)} 0.022 0.0353 0.0225 0.037 2097.1521 2097.1521
relu {'data': (10000, 1)} 0.0036 0.0144 0.0068 0.0187 20.0 20.0
relu {'data': (10000, 100)} 0.0213 0.034 0.0175 0.0345 2000.0 2000.0
rint {'data': (1024, 1024)} 0.2766 0.1615 --- --- 2097.1521 2097.1521
rint {'data': (10000, 1)} 0.0078 0.0483 --- --- 20.0 20.0
rint {'data': (10000, 100)} 0.158 0.1544 --- --- 2000.0 2000.0
round {'data': (1024, 1024)} 0.2226 0.1733 --- --- 2097.1521 2097.1521
round {'data': (10000, 1)} 0.0082 0.0596 --- --- 20.0 20.0
round {'data': (10000, 100)} 0.212 0.1669 --- --- 2000.0 2000.0
rsqrt {'data': (1024, 1024)} 0.4565 0.3737 0.4722 0.3615 2097.1521 2097.1521
rsqrt {'data': (10000, 1)} 0.0115 0.0105 0.01 0.0108 20.0 20.0
rsqrt {'data': (10000, 100)} 0.4381 0.3602 0.4502 0.3447 2000.0 2000.0
sample_exponential {'shape': (1024, 1024), 'lam': [1.0, 8.5]} 3.6523 3.1681 --- --- 8388.6084 8388.6084
sample_exponential {'shape': (10000, 1), 'lam': [1.0, 8.5]} 0.0404 0.0384 --- --- 40.0 40.0
sample_exponential {'shape': (10000, 100), 'lam': [1.0, 8.5]} 2.7679 2.9097 --- --- 4000.0 8000.0
sample_gamma {'shape': (1024, 1024), 'alpha': [0.0, 2.5], 'beta': [1.0, 0.7]} 13.1229 12.3537 --- --- 8388.6084 8388.6084
sample_gamma {'shape': (10000, 1), 'alpha': [0.0, 2.5], 'beta': [1.0, 0.7]} 0.1378 0.1359 --- --- 80.0 40.0
sample_gamma {'shape': (10000, 100), 'alpha': [0.0, 2.5], 'beta': [1.0, 0.7]} 12.4871 11.8388 --- --- 8000.0 4000.0
sample_generalized_negative_binomial {'shape': (1024, 1024), 'alpha': [0.0, 2.5], 'mu': [2.0, 2.5]} 27.3342 27.56 --- --- 4194.3042 8388.6084
sample_generalized_negative_binomial {'shape': (10000, 1), 'alpha': [0.0, 2.5], 'mu': [2.0, 2.5]} 0.2789 0.2808 --- --- 40.0 40.0
sample_generalized_negative_binomial {'shape': (10000, 100), 'alpha': [0.0, 2.5], 'mu': [2.0, 2.5]} 26.0722 25.7926 --- --- 4000.0 4000.0
sample_negative_binomial {'shape': (1024, 1024), 'k': [20, 49], 'p': [0.4, 0.77]} 335.0974 270.0913 --- --- 8388.6084 4194.3042
sample_negative_binomial {'shape': (10000, 1), 'k': [20, 49], 'p': [0.4, 0.77]} 3.1917 2.5851 --- --- 80.0 80.0
sample_negative_binomial {'shape': (10000, 100), 'k': [20, 49], 'p': [0.4, 0.77]} 318.5452 257.783 --- --- 8000.0 8000.0
sample_normal {'shape': (1024, 1024), 'sigma': [1.0, 3.7], 'mu': [2.0, 2.5]} 2.8849 3.7052 --- --- 4194.3042 8388.6084
sample_normal {'shape': (10000, 1), 'sigma': [1.0, 3.7], 'mu': [2.0, 2.5]} 0.0461 0.0449 --- --- 40.0 40.0
sample_normal {'shape': (10000, 100), 'sigma': [1.0, 3.7], 'mu': [2.0, 2.5]} 4.0236 3.5366 --- --- 8000.0 8000.0
sample_poisson {'shape': (1024, 1024), 'lam': [1.0, 8.5]} 5.1852 5.1714 --- --- 4194.3042 8388.6084
sample_poisson {'shape': (10000, 1), 'lam': [1.0, 8.5]} 0.0598 0.0661 --- --- 40.0 40.0
sample_poisson {'shape': (10000, 100), 'lam': [1.0, 8.5]} 4.9439 4.9404 --- --- 4000.0 8000.0
sample_uniform {'low': [0.0, 2.5], 'shape': (1024, 1024), 'high': [1.0, 3.7]} 0.8399 0.7192 --- --- 4194.3042 4194.3042
sample_uniform {'low': [0.0, 2.5], 'shape': (10000, 1), 'high': [1.0, 3.7]} 0.0184 0.0155 --- --- 40.0 40.0
sample_uniform {'low': [0.0, 2.5], 'shape': (10000, 100), 'high': [1.0, 3.7]} 0.8011 0.692 --- --- 4000.0 4000.0
shuffle {'data': (1024, 1024)} 1.5753 1.4351 --- --- 4194.3042 2097.1521
shuffle {'data': (10000, 1)} 0.2571 0.2901 --- --- 20.0 20.0
shuffle {'data': (10000, 100)} 1.8581 1.9192 --- --- 2000.0 2000.0
sigmoid {'data': (1024, 1024)} 0.4888 0.6225 0.0278 0.0293 4194.3042 2097.1521
sigmoid {'data': (10000, 1)} 0.0148 0.0132 0.0108 0.0184 20.0 20.0
sigmoid {'data': (10000, 100)} 0.4766 0.4633 0.0192 0.0179 2000.0 2000.0
sign {'data': (1024, 1024)} 0.1648 0.1591 0.0183 0.0176 2097.1521 2097.1521
sign {'data': (10000, 1)} 0.0322 0.0497 0.0066 0.0129 20.0 20.0
sign {'data': (10000, 100)} 0.158 0.1531 0.0159 0.0123 2000.0 2000.0
sin {'data': (1024, 1024)} 0.3474 0.3467 0.349 0.3563 2097.1521 2097.1521
sin {'data': (10000, 1)} 0.0083 0.0108 0.0077 0.0106 20.0 20.0
sin {'data': (10000, 100)} 0.3324 0.3732 0.3335 0.4071 2000.0 2000.0
sinh {'data': (1024, 1024)} 1.1067 1.1262 0.6819 0.6829 2097.1521 2097.1521
sinh {'data': (10000, 1)} 0.0191 0.0165 0.0124 0.0134 20.0 20.0
sinh {'data': (10000, 100)} 1.0553 1.065 0.6514 0.6535 2000.0 2000.0
size_array {'data': (1024, 1024)} 0.0015 0.0026 --- --- 0.004 0.004
size_array {'data': (10000, 1)} 0.0016 0.0026 --- --- 0.004 0.008
size_array {'data': (10000, 100)} 0.0016 0.0026 --- --- 0.004 0.004
softmax {'temperature': 0.5, 'axis': -1, 'data': (1024, 1024)} 1.2038 1.2071 0.0934 0.0876 4194.3042 2097.1521
softmax {'temperature': 0.5, 'axis': -1, 'data': (10000, 1)} 0.02 0.0168 0.015 0.0117 20.0 20.0
softmax {'temperature': 0.5, 'axis': -1, 'data': (10000, 100)} 1.1498 1.1432 0.096 0.0936 2000.0 2000.0
softsign {'data': (1024, 1024)} 0.0248 0.0259 0.0358 0.0348 2097.1521 2097.1521
softsign {'data': (10000, 1)} 0.0053 0.0154 0.0078 0.0171 20.0 20.0
softsign {'data': (10000, 100)} 0.0244 0.0252 0.0198 0.0193 2000.0 2000.0
sqrt {'data': (1024, 1024)} 0.4656 0.4432 0.0223 0.0221 2097.1521 2097.1521
sqrt {'data': (10000, 1)} 0.0099 0.0097 0.0088 0.0147 20.0 20.0
sqrt {'data': (10000, 100)} 0.4424 0.4254 0.0186 0.0172 2000.0 2000.0
square {'data': (1024, 1024)} 0.0219 0.0191 0.0274 0.0246 2097.1521 2097.1521
square {'data': (10000, 1)} 0.0037 0.0063 0.0069 0.0107 20.0 20.0
square {'data': (10000, 100)} 0.0222 0.0194 0.0172 0.015 2000.0 2000.0
stop_gradient {'data': (1024, 1024)} --- --- --- --- 4194.3042 2097.1521
stop_gradient {'data': (10000, 1)} --- --- --- --- 40.0 20.0
stop_gradient {'data': (10000, 100)} --- --- --- --- 4000.0 2000.0
sum {'axis': (), 'data': (1024, 1024)} 18.7897 7.5997 0.9073 0.4699 0.002 0.002
sum {'axis': 0, 'data': (10000, 1)} 0.2124 0.1119 0.0149 0.0131 0.002 0.004
sum {'axis': (0, 1), 'data': (10000, 100)} 17.9343 8.0291 0.863 0.4388 0.002 0.004
sum_axis {'axis': (), 'data': (1024, 1024)} 21.5143 7.6222 --- --- 0.004 0.004
sum_axis {'axis': 0, 'data': (10000, 1)} 0.1886 0.0834 --- --- 0.002 0.002
sum_axis {'axis': (0, 1), 'data': (10000, 100)} 19.2096 7.2101 --- --- 0.004 0.002
tan {'data': (1024, 1024)} 0.7864 0.7897 0.033 0.0304 2097.1521 2097.1521
tan {'data': (10000, 1)} 0.014 0.0148 0.0097 0.0165 20.0 20.0
tan {'data': (10000, 100)} 0.7533 0.753 0.0314 0.0314 2000.0 2000.0
tanh {'data': (1024, 1024)} 1.0414 1.0349 0.0361 0.0312 2097.1521 2097.1521
tanh {'data': (10000, 1)} 0.0184 0.0142 0.0156 0.014 40.0 20.0
tanh {'data': (10000, 100)} 1.0187 0.9849 0.0523 0.0288 2000.0 2000.0
trunc {'data': (1024, 1024)} 0.2086 0.2099 --- --- 2097.1521 2097.1521
trunc {'data': (10000, 1)} 0.0079 0.053 --- --- 20.0 20.0
trunc {'data': (10000, 100)} 0.1992 0.2015 --- --- 2000.0 2000.0
zeros_like {'data': (1024, 1024)} 0.0326 0.0336 --- --- 2097.1521 2097.1521
zeros_like {'data': (10000, 1)} 0.0035 0.0048 --- --- 20.0 20.0
zeros_like {'data': (10000, 100)} 0.0315 0.0329 --- --- 2000.0 2000.0
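
The rows above can be compared programmatically. The following is a minimal sketch, not part of the original benchmark tooling, assuming each row keeps the layout used here: operator name, a Python dict literal of inputs, then six metrics, with `---` marking a value that was not reported. The helper names `parse_row` and `speedup` and the metric keys (`fwd_old`, `fwd_new`, and so on) are hypothetical and chosen only for illustration.

```python
import ast

# Hypothetical key names for the six metric columns, in table order:
# forward time (1.4.1, 1.5.0), backward time (1.4.1, 1.5.0), memory (1.4.1, 1.5.0).
METRIC_KEYS = ("fwd_old", "fwd_new", "bwd_old", "bwd_new", "mem_old", "mem_new")


def parse_row(line):
    """Split one flattened table row into (operator, inputs, metrics)."""
    start = line.index("{")
    end = line.rindex("}") + 1
    operator = line[:start].strip()
    # The Inputs column is a Python dict literal, so literal_eval can read it.
    inputs = ast.literal_eval(line[start:end])
    # "---" means the metric was not reported for this operator.
    values = [None if tok == "---" else float(tok) for tok in line[end:].split()]
    if len(values) != len(METRIC_KEYS):
        raise ValueError("expected 6 metric columns, got %d" % len(values))
    return operator, inputs, dict(zip(METRIC_KEYS, values))


def speedup(old, new):
    """Return old/new (above 1 means the newer build is faster), or None if unavailable."""
    if old is None or new is None or new == 0:
        return None
    return old / new


row = ("Convolution {'kernel': (3,), 'weight': (64, 3, 3), 'bias': (64,), "
       "'data': (32, 3, 256), 'layout': 'NCW', 'pad': (0,), 'dilate': (1,), "
       "'num_filter': 64, 'stride': (1,)} 1.7578 0.242 3.3886 0.5067 2080.7681 2080.7681")
op, inputs, m = parse_row(row)
print(op, "forward speedup:", speedup(m["fwd_old"], m["fwd_new"]))
print(op, "backward speedup:", speedup(m["bwd_old"], m["bwd_new"]))
```

A ratio is more informative for the larger inputs. For the `(10000, 1)` rows the timings are a few microseconds, so small absolute differences produce large ratios.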

GPU

Operator Inputs Avg. Forward Time (ms) (v1) Avg. Forward Time (ms) (v2) Avg. Backward Time (ms) (v1) Avg. Backward Time (ms) (v2) Max Mem Usage (Storage) (Bytes) (v1) Max Mem Usage (Storage) (Bytes) (v2)
BatchNorm {'moving_mean': (3,), 'moving_var': (3,), 'beta': (3,), 'gamma': (3,), 'data': (32, 3, 256, 256)} 1.1285 1.1327 2.3064 2.3057 12582.9238 12582.9238
BatchNorm {'moving_mean': (3,), 'moving_var': (3,), 'beta': (3,), 'gamma': (3,), 'data': (32, 3, 10000, 10)} 1.716 1.7203 3.5197 3.5286 19200.0117 19200.0117
BlockGrad {'data': (1024, 1024)} 0.0322 0.0329 --- --- 2097.1521 2097.1521
BlockGrad {'data': (10000, 1)} 0.0203 0.0202 --- --- 20.0 20.0
BlockGrad {'data': (10000, 100)} 0.0344 0.0344 --- --- 2000.0 2000.0
Convolution {'weight': (64, 3, 3), 'stride': (1,), 'layout': 'NCW', 'kernel': (3,), 'num_filter': 64, 'data': (32, 3, 256), 'bias': (64,), 'pad': (0,), 'dilate': (1,)} 0.05 0.0515 0.0995 0.1014 1040.384 1040.384
Dropout {'p': 0.5, 'mode': 'always', 'data': (32, 3, 256, 256)} 5.3855 0.1185 0.11 0.0945 25165.8242 25165.8242
Dropout {'p': 0.5, 'mode': 'always', 'data': (10000, 10)} 0.0856 0.0589 0.0189 0.022 400.0 400.0
Flatten {'data': (1024, 1024)} 0.0321 0.0334 --- --- 2097.1521 2097.1521
Flatten {'data': (10000, 1)} 0.0204 0.0216 --- --- 20.0 20.0
Flatten {'data': (10000, 100)} 0.0342 0.0357 --- --- 2000.0 2000.0
FullyConnected {'bias': (64,), 'weight': (64, 196608), 'num_hidden': 64, 'flatten': True, 'data': (32, 3, 256, 256)} 0.423 0.3097 0.3726 0.3474 4.096 4.096
FullyConnected {'bias': (64,), 'weight': (64, 256), 'num_hidden': 64, 'flatten': False, 'data': (32, 3, 256, 256)} 0.1327 0.1261 0.5494 0.2547 3145.728 3145.728
LeakyReLU {'act_type': 'leaky', 'slope': 0.1, 'data': (1024, 1024)} 0.0291 0.0302 0.0305 0.0318 2097.1521 2097.1521
LeakyReLU {'act_type': 'leaky', 'slope': 0.1, 'data': (10000, 1)} 0.0249 0.0256 0.0171 0.0183 20.0 20.0
LeakyReLU {'act_type': 'leaky', 'slope': 0.1, 'data': (10000, 100)} 0.0286 0.0292 0.03 0.031 2000.0 2000.0
LeakyReLU {'act_type': 'elu', 'slope': 0.1, 'data': (1024, 1024)} 0.0294 0.0307 0.0309 0.032 2097.1521 2097.1521
LeakyReLU {'act_type': 'elu', 'slope': 0.1, 'data': (10000, 1)} 0.018 0.0189 0.017 0.0181 20.0 20.0
LeakyReLU {'act_type': 'elu', 'slope': 0.1, 'data': (10000, 100)} 0.0291 0.0296 0.0311 0.0315 2000.0 2000.0
LeakyReLU {'act_type': 'selu', 'data': (1024, 1024)} 0.0319 0.0304 0.034 0.0322 2097.1521 2097.1521
LeakyReLU {'act_type': 'selu', 'data': (10000, 1)} 0.0183 0.0189 0.0172 0.0183 20.0 20.0
LeakyReLU {'act_type': 'selu', 'data': (10000, 100)} 0.0287 0.0298 0.0299 0.031 2000.0 2000.0
LeakyReLU {'act_type': 'prelu', 'gamma': (1, 1024), 'data': (1024, 1024)} 0.0305 0.0319 0.0935 0.0909 2097.1521 2097.1521
LeakyReLU {'act_type': 'prelu', 'gamma': (1, 1), 'data': (10000, 1)} 0.0201 0.0219 0.0435 0.0453 20.0 20.0
LeakyReLU {'act_type': 'prelu', 'gamma': (1, 100), 'data': (10000, 100)} 0.0298 0.0315 0.1041 0.0993 4000.0 2000.0
Pooling {'pool_type': 'avg', 'stride': 1, 'kernel': 3, 'data': (32, 3, 256), 'global_pool': 0, 'pad': 1} 0.0188 0.0202 0.0227 0.0242 49.152 49.152
abs {'data': (1024, 1024)} 0.029 0.0289 0.031 0.0307 2097.1521 2097.1521
abs {'data': (10000, 1)} 0.0178 0.0177 0.0171 0.0172 20.0 20.0
abs {'data': (10000, 100)} 0.0287 0.0281 0.0302 0.0305 2000.0 2000.0
arccos {'data': (1024, 1024)} 0.0292 0.0295 0.0345 0.0349 2097.1521 2097.1521
arccos {'data': (10000, 1)} 0.0181 0.0178 0.0175 0.0177 20.0 20.0
arccos {'data': (10000, 100)} 0.0291 0.0286 0.0331 0.0333 2000.0 2000.0
arccosh {'data': (1024, 1024)} 0.0313 0.0316 0.0339 0.0341 2097.1521 2097.1521
arccosh {'data': (10000, 1)} 0.0179 0.0182 0.0175 0.0177 20.0 20.0
arccosh {'data': (10000, 100)} 0.0309 0.0309 0.0323 0.0328 2000.0 2000.0
arcsin {'data': (1024, 1024)} 0.0287 0.0295 0.0333 0.0338 2097.1521 2097.1521
arcsin {'data': (10000, 1)} 0.0174 0.018 0.0171 0.0177 20.0 20.0
arcsin {'data': (10000, 100)} 0.0284 0.0289 0.0318 0.0326 2000.0 2000.0
arcsinh {'data': (1024, 1024)} 0.0303 0.0326 0.033 0.0339 2097.1521 4194.3042
arcsinh {'data': (10000, 1)} 0.0182 0.0197 0.0174 0.0182 20.0 20.0
arcsinh {'data': (10000, 100)} 0.0301 0.0315 0.0315 0.0325 2000.0 2000.0
arctan {'data': (1024, 1024)} 0.0308 0.0294 0.036 0.0325 2097.1521 2097.1521
arctan {'data': (10000, 1)} 0.0181 0.0179 0.0176 0.0173 20.0 20.0
arctan {'data': (10000, 100)} 0.0291 0.0289 0.0314 0.0314 2000.0 2000.0
arctanh {'data': (1024, 1024)} 0.0294 0.0297 0.0324 0.0323 2097.1521 2097.1521
arctanh {'data': (10000, 1)} 0.0178 0.0179 0.017 0.0172 20.0 20.0
arctanh {'data': (10000, 100)} 0.0288 0.0289 0.0312 0.0313 2000.0 2000.0
argmax_channel {'data': (1024, 1024)} 0.282 0.2803 --- --- 2.048 2.048
argmax_channel {'data': (10000, 1)} 0.019 0.0189 --- --- 20.0 20.0
argmax_channel {'data': (10000, 100)} 0.0449 0.0359 --- --- 20.0 20.0
batch_dot {'lhs': (32, 1024, 1024), 'rhs': (32, 1024, 1024)} 4.6829 4.7 9.2628 9.2645 67108.8672 67108.8672
batch_dot {'lhs': (32, 1000, 10), 'transpose_b': True, 'rhs': (32, 1000, 10)} 0.2658 0.291 1.2259 1.2288 64000.0 64000.0
batch_dot {'transpose_a': True, 'lhs': (32, 1000, 1), 'transpose_b': True, 'rhs': (32, 100, 1000)} 0.0436 0.0459 0.0614 0.0643 6.4 6.4
broadcast_add {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0242 0.0197 0.0261 0.0203 0.012 0.024
broadcast_div {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0257 0.0184 0.0289 0.022 0.012 0.012
broadcast_equal {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0178 0.0185 --- --- 0.012 0.012
broadcast_greater {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0237 0.0185 --- --- 0.024 0.012
broadcast_greater_equal {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0233 0.0186 --- --- 0.012 0.012
broadcast_hypot {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0206 0.0184 0.0227 0.0218 0.012 0.012
broadcast_lesser {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0235 0.0185 --- --- 0.012 0.012
broadcast_lesser_equal {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0239 0.0199 --- --- 0.012 0.012
broadcast_logical_and {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0238 0.0189 --- --- 0.012 0.012
broadcast_logical_or {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.024 0.0184 --- --- 0.012 0.012
broadcast_logical_xor {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0236 0.018 --- --- 0.012 0.012
broadcast_maximum {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0253 0.0187 0.028 0.0214 0.012 0.012
broadcast_minimum {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0254 0.0187 0.0279 0.0214 0.012 0.012
broadcast_minus {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} --- --- --- --- 0.012 0.012
broadcast_mod {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0184 0.019 0.0212 0.0223 0.012 0.012
broadcast_mul {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0253 0.0187 0.0281 0.0216 0.012 0.012
broadcast_not_equal {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.018 0.0183 --- --- 0.012 0.012
broadcast_plus {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} --- --- --- --- 0.012 0.012
broadcast_power {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0184 0.0189 0.0217 0.0218 0.012 0.012
broadcast_sub {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0252 0.0187 0.0273 0.0206 0.012 0.012
cbrt {'data': (1024, 1024)} 0.0288 0.0289 0.0324 0.0325 2097.1521 2097.1521
cbrt {'data': (10000, 1)} 0.0176 0.0181 0.0171 0.0167 20.0 20.0
cbrt {'data': (10000, 100)} 0.0283 0.0286 0.0314 0.0309 2000.0 2000.0
ceil {'data': (1024, 1024)} 0.0292 0.0292 --- --- 2097.1521 2097.1521
ceil {'data': (10000, 1)} 0.0178 0.0177 --- --- 20.0 20.0
ceil {'data': (10000, 100)} 0.0285 0.0283 --- --- 2000.0 2000.0
cos {'data': (1024, 1024)} 0.0291 0.0311 0.0306 0.0323 2097.1521 2097.1521
cos {'data': (10000, 1)} 0.0176 0.0196 0.0163 0.0181 20.0 20.0
cos {'data': (10000, 100)} 0.0286 0.0302 0.03 0.0317 2000.0 2000.0
cosh {'data': (1024, 1024)} 0.0291 0.0294 0.0316 0.0313 2097.1521 2097.1521
cosh {'data': (10000, 1)} 0.0178 0.0179 0.0173 0.0171 20.0 20.0
cosh {'data': (10000, 100)} 0.0286 0.032 0.0308 0.0345 2000.0 2000.0
degrees {'data': (1024, 1024)} 0.0289 0.029 0.0256 0.0251 2097.1521 2097.1521
degrees {'data': (10000, 1)} 0.0176 0.0177 0.0172 0.0169 20.0 20.0
degrees {'data': (10000, 100)} 0.0286 0.0286 0.0245 0.0242 2000.0 2000.0
dot {'lhs': (1024, 1024), 'rhs': (1024, 1024)} 0.2165 0.2162 0.4085 0.4054 2097.1521 2097.1521
dot {'lhs': (1000, 10), 'transpose_b': True, 'rhs': (1000, 10)} 0.0308 0.0303 0.0604 0.0606 2000.0 2000.0
dot {'transpose_a': True, 'lhs': (1000, 1), 'transpose_b': True, 'rhs': (100, 1000)} 0.0421 0.041 0.0375 0.038 0.2 0.2
elemwise_add {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0186 0.0185 --- --- 0.012 0.012
elemwise_div {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0188 0.0181 --- --- 0.012 0.012
elemwise_mul {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0188 0.0181 --- --- 0.012 0.012
elemwise_sub {'lhs': [(1024, 1024), (10000, 10), (10000, 1)], 'rhs': [(1024, 1024), (10000, 10), (10000, 1)]} 0.0186 0.0181 --- --- 0.012 0.012
erf {'data': (1024, 1024)} 0.0287 0.0293 0.0307 0.0309 2097.1521 2097.1521
erf {'data': (10000, 1)} 0.017 0.0181 0.0164 0.0172 20.0 20.0
erf {'data': (10000, 100)} 0.0279 0.0286 0.0298 0.0306 2000.0 2000.0
exp {'data': (1024, 1024)} 0.0291 0.0294 --- --- 2097.1521 2097.1521
exp {'data': (10000, 1)} 0.0178 0.0178 --- --- 20.0 20.0
exp {'data': (10000, 100)} 0.0286 0.0286 --- --- 2000.0 2000.0
expm1 {'data': (1024, 1024)} 0.0293 0.0294 0.0312 0.0305 2097.1521 2097.1521
expm1 {'data': (10000, 1)} 0.0177 0.018 0.0169 0.0171 20.0 20.0
expm1 {'data': (10000, 100)} 0.0287 0.0286 0.0303 0.0305 2000.0 2000.0
fix {'data': (1024, 1024)} 0.029 0.0299 --- --- 4194.3042 2097.1521
fix {'data': (10000, 1)} 0.0177 0.0183 --- --- 20.0 20.0
fix {'data': (10000, 100)} 0.0284 0.0292 --- --- 2000.0 2000.0
flatten {'data': (1024, 1024)} --- --- --- --- 2097.1521 2097.1521
flatten {'data': (10000, 1)} --- --- --- --- 20.0 20.0
flatten {'data': (10000, 100)} --- --- --- --- 2000.0 2000.0
floor {'data': (1024, 1024)} 0.0289 0.0291 --- --- 2097.1521 2097.1521
floor {'data': (10000, 1)} 0.0175 0.0183 --- --- 20.0 20.0
floor {'data': (10000, 100)} 0.0282 0.0297 --- --- 2000.0 2000.0
gamma {'data': (1024, 1024)} 0.0323 0.0343 0.0683 0.0683 2097.1521 2097.1521
gamma {'data': (10000, 1)} 0.0181 0.0186 0.0197 0.0199 20.0 20.0
gamma {'data': (10000, 100)} 0.0321 0.032 0.0663 0.0656 2000.0 2000.0
gammaln {'data': (1024, 1024)} 0.0365 0.0371 0.0589 0.0593 2097.1521 2097.1521
gammaln {'data': (10000, 1)} 0.0181 0.0182 0.0189 0.0191 20.0 20.0
gammaln {'data': (10000, 100)} 0.037 0.036 0.0575 0.057 2000.0 2000.0
hard_sigmoid {'alpha': 0.25, 'beta': 0.5, 'data': (1024, 1024)} 0.0285 0.0294 0.0319 0.0325 2097.1521 2097.1521
hard_sigmoid {'alpha': 0.25, 'beta': 0.5, 'data': (10000, 1)} 0.0173 0.0183 0.0166 0.0167 20.0 20.0
hard_sigmoid {'alpha': 0.25, 'beta': 0.5, 'data': (10000, 100)} 0.0277 0.0286 0.0308 0.0309 2000.0 2000.0
identity {'data': (1024, 1024)} --- --- --- --- 2097.1521 2097.1521
identity {'data': (10000, 1)} --- --- --- --- 20.0 20.0
identity {'data': (10000, 100)} --- --- --- --- 2000.0 2000.0
log {'data': (1024, 1024)} 0.0289 0.0291 0.0325 0.0324 2097.1521 2097.1521
log {'data': (10000, 1)} 0.0175 0.0178 0.0169 0.0172 20.0 20.0
log {'data': (10000, 100)} 0.0286 0.0285 0.031 0.0314 2000.0 2000.0
log10 {'data': (1024, 1024)} 0.0287 0.0294 0.0321 0.0326 2097.1521 2097.1521
log10 {'data': (10000, 1)} 0.0174 0.0179 0.0165 0.0172 20.0 20.0
log10 {'data': (10000, 100)} 0.0282 0.0284 0.0307 0.0313 2000.0 2000.0
log1p {'data': (1024, 1024)} 0.029 0.0296 0.0323 0.0328 2097.1521 2097.1521
log1p {'data': (10000, 1)} 0.0178 0.0181 0.017 0.0175 20.0 20.0
log1p {'data': (10000, 100)} 0.0287 0.0287 0.0309 0.0316 2000.0 2000.0
log2 {'data': (1024, 1024)} 0.029 0.0291 0.0327 0.0326 2097.1521 2097.1521
log2 {'data': (10000, 1)} 0.0176 0.0179 0.0171 0.0169 20.0 20.0
log2 {'data': (10000, 100)} 0.0285 0.0287 0.0313 0.0315 2000.0 2000.0
log_softmax {'temperature': 0.5, 'axis': -1, 'data': (1024, 1024)} 0.0395 0.0406 0.0406 0.0367 2097.1521 2097.1521
log_softmax {'temperature': 0.5, 'axis': -1, 'data': (10000, 1)} 0.0604 0.0521 0.0443 0.0367 20.0 20.0
log_softmax {'temperature': 0.5, 'axis': -1, 'data': (10000, 100)} 0.0852 0.0773 0.0605 0.0526 2000.0 2000.0
logical_not {'data': (1024, 1024)} 0.0292 0.0285 --- --- 2097.1521 2097.1521
logical_not {'data': (10000, 1)} 0.0177 0.017 --- --- 20.0 20.0
logical_not {'data': (10000, 100)} 0.0284 0.0288 --- --- 2000.0 2000.0
make_loss {'data': (1024, 1024)} 0.0316 0.0326 --- --- 2097.1521 2097.1521
make_loss {'data': (10000, 1)} 0.0197 0.02 --- --- 20.0 20.0
make_loss {'data': (10000, 100)} 0.034 0.0341 --- --- 2000.0 2000.0
max {'axis': (), 'data': (1024, 1024)} 0.0693 0.0643 0.0453 0.0973 0.002 0.002
max {'axis': 0, 'data': (10000, 1)} 0.0303 0.0307 0.02 0.0204 0.002 0.002
max {'axis': (0, 1), 'data': (10000, 100)} 0.068 0.0639 0.0444 0.0938 0.002 0.002
max_axis {'axis': (), 'data': (1024, 1024)} 0.0626 0.0545 --- --- 0.002 0.002
max_axis {'axis': 0, 'data': (10000, 1)} 0.0311 0.0298 --- --- 0.002 0.002
max_axis {'axis': (0, 1), 'data': (10000, 100)} 0.062 0.0539 --- --- 0.002 0.002
mean {'axis': (), 'data': (1024, 1024)} 0.0856 0.077 0.0417 0.04 0.002 0.002
mean {'axis': 0, 'data': (10000, 1)} 0.0359 0.0351 0.0223 0.0236 0.002 0.002
mean {'axis': (0, 1), 'data': (10000, 100)} 0.0834 0.0756 0.0403 0.0394 0.004 0.002
min {'axis': (), 'data': (1024, 1024)} 0.0724 0.064 0.0461 0.0976 0.002 0.002
min {'axis': 0, 'data': (10000, 1)} 0.0325 0.0304 0.0209 0.0206 0.002 0.002
min {'axis': (0, 1), 'data': (10000, 100)} 0.0716 0.0633 0.0453 0.0943 0.002 0.002
min_axis {'axis': (), 'data': (1024, 1024)} 0.0622 0.0547 --- --- 0.002 0.002
min_axis {'axis': 0, 'data': (10000, 1)} 0.0304 0.0298 --- --- 0.002 0.002
min_axis {'axis': (0, 1), 'data': (10000, 100)} 0.0615 0.0542 --- --- 0.002 0.002
nanprod {'axis': (), 'data': (1024, 1024)} 0.0726 0.0681 0.0503 0.098 0.002 0.002
nanprod {'axis': 0, 'data': (10000, 1)} 0.0315 0.0314 0.0209 0.0204 0.002 0.002
nanprod {'axis': (0, 1), 'data': (10000, 100)} 0.0714 0.0674 0.0491 0.0943 0.002 0.002
nansum {'axis': (), 'data': (1024, 1024)} 0.0764 0.0707 0.0362 0.0984 0.002 0.002
nansum {'axis': 0, 'data': (10000, 1)} 0.0341 0.0338 0.0203 0.0205 0.002 0.002
nansum {'axis': (0, 1), 'data': (10000, 100)} 0.0753 0.0699 0.0359 0.095 0.002 0.002
negative {'data': (1024, 1024)} 0.0285 0.0291 --- --- 2097.1521 2097.1521
negative {'data': (10000, 1)} 0.0175 0.0174 --- --- 20.0 20.0
negative {'data': (10000, 100)} 0.0281 0.0285 --- --- 2000.0 2000.0
ones_like {'data': (1024, 1024)} 0.0248 0.0248 --- --- 2097.1521 2097.1521
ones_like {'data': (10000, 1)} 0.0175 0.0174 --- --- 20.0 20.0
ones_like {'data': (10000, 100)} 0.0246 0.0243 --- --- 4000.0 2000.0
prod {'axis': (), 'data': (1024, 1024)} 0.0691 0.0651 0.0493 0.0985 0.002 0.002
prod {'axis': 0, 'data': (10000, 1)} 0.0307 0.0309 0.0207 0.0207 0.002 0.002
prod {'axis': (0, 1), 'data': (10000, 100)} 0.0679 0.0633 0.0493 0.0947 0.002 0.002
radians {'data': (1024, 1024)} 0.029 0.0289 0.0255 0.0252 2097.1521 2097.1521
radians {'data': (10000, 1)} 0.0176 0.0179 0.0171 0.0172 20.0 20.0
radians {'data': (10000, 100)} 0.0286 0.0286 0.0237 0.0243 2000.0 2000.0
random_exponential {'shape': (1024, 1024)} 18.9581 15.748 --- --- 2097.1521 2097.1521
random_exponential {'shape': (10000, 1)} 0.1939 0.1881 --- --- 20.0 20.0
random_exponential {'shape': (10000, 100)} 18.1409 15.0313 --- --- 2000.0 2000.0
random_gamma {'shape': (1024, 1024)} 57.6804 46.4024 --- --- 2097.1521 2097.1521
random_gamma {'shape': (10000, 1)} 0.5812 0.4907 --- --- 20.0 20.0
random_gamma {'shape': (10000, 100)} 55.0833 44.321 --- --- 2000.0 2000.0
random_generalized_negative_binomial {'shape': (1024, 1024)} 71.1868 63.0437 --- --- 2097.1521 2097.1521
random_generalized_negative_binomial {'shape': (10000, 1)} 0.7127 0.6538 --- --- 20.0 20.0
random_generalized_negative_binomial {'shape': (10000, 100)} 68.1477 60.159 --- --- 2000.0 2000.0
random_negative_binomial {'p': 1, 'k': 1, 'shape': (1024, 1024)} 61.784 52.0455 --- --- 2097.1521 2097.1521
random_negative_binomial {'p': 1, 'k': 1, 'shape': (10000, 1)} 0.62 0.548 --- --- 20.0 20.0
random_negative_binomial {'p': 1, 'k': 1, 'shape': (10000, 100)} 59.2521 49.6821 --- --- 2000.0 2000.0
random_normal {'shape': (1024, 1024)} 19.0541 16.3413 --- --- 2097.1521 2097.1521
random_normal {'shape': (10000, 1)} 0.2101 0.1922 --- --- 20.0 20.0
random_normal {'shape': (10000, 100)} 18.2038 15.5969 --- --- 2000.0 2000.0
random_poisson {'shape': (1024, 1024)} 18.8666 15.4039 --- --- 2097.1521 2097.1521
random_poisson {'shape': (10000, 1)} 0.2039 0.1787 --- --- 20.0 20.0
random_poisson {'shape': (10000, 100)} 18.2745 14.9178 --- --- 2000.0 2000.0
random_randint {'high': 5, 'low': 0, 'shape': (1024, 1024)} 10.889 9.045 --- --- 2097.1521 2097.1521
random_randint {'high': 5, 'low': 0, 'shape': (10000, 1)} 0.1155 0.1178 --- --- 20.0 20.0
random_randint {'high': 5, 'low': 0, 'shape': (10000, 100)} 10.4834 8.6332 --- --- 2000.0 2000.0
random_uniform {'high': 5, 'low': 0, 'shape': (1024, 1024)} 6.141 3.1128 --- --- 2097.1521 2097.1521
random_uniform {'high': 5, 'low': 0, 'shape': (10000, 1)} 0.0798 0.0632 --- --- 20.0 20.0
random_uniform {'high': 5, 'low': 0, 'shape': (10000, 100)} 6.0672 2.9739 --- --- 2000.0 2000.0
rcbrt {'data': (1024, 1024)} 0.0294 0.0298 0.0317 0.0319 2097.1521 2097.1521
rcbrt {'data': (10000, 1)} 0.0177 0.0179 0.0165 0.0172 20.0 20.0
rcbrt {'data': (10000, 100)} 0.0285 0.0286 0.0298 0.0311 2000.0 2000.0
reciprocal {'data': (1024, 1024)} 0.0285 0.0298 0.0323 0.0325 2097.1521 2097.1521
reciprocal {'data': (10000, 1)} 0.0174 0.0182 0.0166 0.0174 20.0 20.0
reciprocal {'data': (10000, 100)} 0.028 0.0284 0.0308 0.0314 2000.0 2000.0
relu {'data': (1024, 1024)} 0.0283 0.029 0.0306 0.0305 2097.1521 2097.1521
relu {'data': (10000, 1)} 0.0175 0.0177 0.0168 0.0171 20.0 20.0
relu {'data': (10000, 100)} 0.0279 0.0281 0.0296 0.0297 2000.0 2000.0
rint {'data': (1024, 1024)} 0.0292 0.0296 --- --- 2097.1521 2097.1521
rint {'data': (10000, 1)} 0.0178 0.0186 --- --- 20.0 20.0
rint {'data': (10000, 100)} 0.0287 0.0292 --- --- 2000.0 2000.0
round {'data': (1024, 1024)} 0.0282 0.029 --- --- 2097.1521 2097.1521
round {'data': (10000, 1)} 0.0172 0.0177 --- --- 20.0 20.0
round {'data': (10000, 100)} 0.0276 0.0284 --- --- 2000.0 2000.0
rsqrt {'data': (1024, 1024)} 0.0292 0.0314 0.0343 0.0357 2097.1521 2097.1521
rsqrt {'data': (10000, 1)} 0.0174 0.0201 0.0172 0.0185 20.0 20.0
rsqrt {'data': (10000, 100)} 0.029 0.0308 0.0328 0.034 2000.0 2000.0
sample_exponential {'lam': [1.0, 8.5], 'shape': (1024, 1024)} 0.1351 0.1445 --- --- 4194.3042 4194.3042
sample_exponential {'lam': [1.0, 8.5], 'shape': (10000, 1)} 0.0752 0.0819 --- --- 40.0 40.0
sample_exponential {'lam': [1.0, 8.5], 'shape': (10000, 100)} 0.1329 0.1421 --- --- 4000.0 4000.0
sample_gamma {'alpha': [0.0, 2.5], 'beta': [1.0, 0.7], 'shape': (1024, 1024)} 0.3482 0.3107 --- --- 4194.3042 4194.3042
sample_gamma {'alpha': [0.0, 2.5], 'beta': [1.0, 0.7], 'shape': (10000, 1)} 0.2858 0.2473 --- --- 40.0 40.0
sample_gamma {'alpha': [0.0, 2.5], 'beta': [1.0, 0.7], 'shape': (10000, 100)} 0.346 0.3003 --- --- 4000.0 4000.0
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'alpha': [0.0, 2.5], 'shape': (1024, 1024)} 1.0807 1.0976 --- --- 4194.3042 4194.3042
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'alpha': [0.0, 2.5], 'shape': (10000, 1)} 0.5691 0.5931 --- --- 40.0 40.0
sample_generalized_negative_binomial {'mu': [2.0, 2.5], 'alpha': [0.0, 2.5], 'shape': (10000, 100)} 0.9628 0.9685 --- --- 8000.0 4000.0
sample_negative_binomial {'p': [0.4, 0.77], 'k': [20, 49], 'shape': (1024, 1024)} 1.3997 1.4196 --- --- 4194.3042 4194.3042
sample_negative_binomial {'p': [0.4, 0.77], 'k': [20, 49], 'shape': (10000, 1)} 0.8096 0.8071 --- --- 40.0 40.0
sample_negative_binomial {'p': [0.4, 0.77], 'k': [20, 49], 'shape': (10000, 100)} 1.3842 1.4067 --- --- 4000.0 8000.0
sample_normal {'mu': [2.0, 2.5], 'sigma': [1.0, 3.7], 'shape': (1024, 1024)} 0.1451 0.1323 --- --- 8388.6084 4194.3042
sample_normal {'mu': [2.0, 2.5], 'sigma': [1.0, 3.7], 'shape': (10000, 1)} 0.0538 0.0834 --- --- 40.0 40.0
sample_normal {'mu': [2.0, 2.5], 'sigma': [1.0, 3.7], 'shape': (10000, 100)} 0.1429 0.1298 --- --- 8000.0 4000.0
sample_poisson {'lam': [1.0, 8.5], 'shape': (1024, 1024)} 0.402 0.4752 --- --- 4194.3042 4194.3042
sample_poisson {'lam': [1.0, 8.5], 'shape': (10000, 1)} 0.3301 0.4037 --- --- 40.0 40.0
sample_poisson {'lam': [1.0, 8.5], 'shape': (10000, 100)} 0.4018 0.4745 --- --- 8000.0 4000.0
sample_uniform {'high': [1.0, 3.7], 'low': [0.0, 2.5], 'shape': (1024, 1024)} 0.1466 0.1485 --- --- 4194.3042 4194.3042
sample_uniform {'high': [1.0, 3.7], 'low': [0.0, 2.5], 'shape': (10000, 1)} 0.0683 0.0584 --- --- 40.0 40.0
sample_uniform {'high': [1.0, 3.7], 'low': [0.0, 2.5], 'shape': (10000, 100)} 0.144 0.1426 --- --- 4000.0 8000.0
shuffle {'data': (1024, 1024)} 0.0826 0.086 --- --- 2097.1521 2097.1521
shuffle {'data': (10000, 1)} 0.4134 0.2593 --- --- 100.0 60.0
shuffle {'data': (10000, 100)} 0.4204 0.2638 --- --- 2000.0 2000.0
sigmoid {'data': (1024, 1024)} 0.029 0.0328 0.0309 0.0341 2097.1521 2097.1521
sigmoid {'data': (10000, 1)} 0.0177 0.0182 0.0171 0.0175 20.0 20.0
sigmoid {'data': (10000, 100)} 0.0285 0.0286 0.03 0.03 2000.0 2000.0
sign {'data': (1024, 1024)} 0.0291 0.029 0.025 0.0252 2097.1521 2097.1521
sign {'data': (10000, 1)} 0.0175 0.0179 0.0162 0.017 20.0 20.0
sign {'data': (10000, 100)} 0.0285 0.0285 0.0234 0.0243 2000.0 2000.0
sin {'data': (1024, 1024)} 0.0298 0.0301 0.032 0.0314 2097.1521 2097.1521
sin {'data': (10000, 1)} 0.0253 0.0184 0.0176 0.0172 20.0 20.0
sin {'data': (10000, 100)} 0.0289 0.0291 0.031 0.031 2000.0 2000.0
sinh {'data': (1024, 1024)} 0.0295 0.0314 0.0311 0.032 2097.1521 2097.1521
sinh {'data': (10000, 1)} 0.0177 0.0198 0.0169 0.0182 20.0 20.0
sinh {'data': (10000, 100)} 0.0286 0.0298 0.0303 0.0315 2000.0 2000.0
size_array {'data': (1024, 1024)} 0.0201 0.0159 --- --- 0.004 0.004
size_array {'data': (10000, 1)} 0.0284 0.0228 --- --- 0.004 0.004
size_array {'data': (10000, 100)} 0.0199 0.016 --- --- 0.004 0.004
softmax {'temperature': 0.5, 'axis': -1, 'data': (1024, 1024)} 0.0431 0.0435 0.0448 0.0423 2097.1521 2097.1521
softmax {'temperature': 0.5, 'axis': -1, 'data': (10000, 1)} 0.0605 0.0531 0.0432 0.0362 20.0 20.0
softmax {'temperature': 0.5, 'axis': -1, 'data': (10000, 100)} 0.0856 0.0776 0.0606 0.0595 2000.0 2000.0
softsign {'data': (1024, 1024)} 0.0291 0.029 0.0315 0.0324 2097.1521 2097.1521
softsign {'data': (10000, 1)} 0.0179 0.0177 0.0161 0.0171 20.0 20.0
softsign {'data': (10000, 100)} 0.0285 0.0283 0.0301 0.0313 2000.0 2000.0
sqrt {'data': (1024, 1024)} 0.0293 0.0291 0.0337 0.0334 2097.1521 2097.1521
sqrt {'data': (10000, 1)} 0.0183 0.0179 0.0177 0.0174 20.0 20.0
sqrt {'data': (10000, 100)} 0.0289 0.0285 0.0323 0.0321 2000.0 2000.0
square {'data': (1024, 1024)} 0.0285 0.0292 0.0306 0.0309 2097.1521 2097.1521
square {'data': (10000, 1)} 0.0173 0.0177 0.0166 0.0174 20.0 20.0
square {'data': (10000, 100)} 0.0285 0.028 0.03 0.0303 2000.0 2000.0
stop_gradient {'data': (1024, 1024)} --- --- --- --- 2097.1521 2097.1521
stop_gradient {'data': (10000, 1)} --- --- --- --- 40.0 20.0
stop_gradient {'data': (10000, 100)} --- --- --- --- 2000.0 2000.0
sum {'axis': (), 'data': (1024, 1024)} 0.0836 0.0752 0.0328 0.0329 0.002 0.002
sum {'axis': 0, 'data': (10000, 1)} 0.0334 0.0344 0.0184 0.0197 0.002 0.002
sum {'axis': (0, 1), 'data': (10000, 100)} 0.0811 0.0737 0.0321 0.0319 0.002 0.002
sum_axis {'axis': (), 'data': (1024, 1024)} 0.0681 0.0616 --- --- 0.002 0.002
sum_axis {'axis': 0, 'data': (10000, 1)} 0.033 0.0328 --- --- 0.002 0.002
sum_axis {'axis': (0, 1), 'data': (10000, 100)} 0.0673 0.0611 --- --- 0.002 0.002
tan {'data': (1024, 1024)} 0.0304 0.0303 0.031 0.0305 2097.1521 2097.1521
tan {'data': (10000, 1)} 0.0178 0.0177 0.0168 0.017 20.0 20.0
tan {'data': (10000, 100)} 0.0299 0.0296 0.0301 0.0301 2000.0 2000.0
tanh {'data': (1024, 1024)} 0.0292 0.0301 0.0311 0.031 2097.1521 2097.1521
tanh {'data': (10000, 1)} 0.0177 0.0255 0.017 0.0174 20.0 20.0
tanh {'data': (10000, 100)} 0.0285 0.029 0.0301 0.0302 2000.0 2000.0
trunc {'data': (1024, 1024)} 0.0292 0.0297 --- --- 2097.1521 2097.1521
trunc {'data': (10000, 1)} 0.0179 0.0184 --- --- 20.0 20.0
trunc {'data': (10000, 100)} 0.0288 0.0292 --- --- 2000.0 2000.0
zeros_like {'data': (1024, 1024)} 0.025 0.0254 --- --- 2097.1521 2097.1521
zeros_like {'data': (10000, 1)} 0.0179 0.0185 --- --- 20.0 20.0
zeros_like {'data': (10000, 100)} 0.025 0.0252 --- --- 4000.0 2000.0
@aaronmarkham

It's alarming how different these results are for many ops.
It would be nice to see a diff % column. Can you supply an xls or csv for easy analysis?

@TaoLv

TaoLv commented Jun 30, 2019

Is MKL-DNN used to get the performance on CPU?

@sandeep-krishnamurthy
Author

Is MKL-DNN used to get the performance on CPU?

Yes, these numbers were measured using MXNet-MKL.

@sandeep-krishnamurthy
Author

It's alarming how different these results are for many ops.
It would be nice to see a diff % column. Can you supply an xls or csv for easy analysis?

Thanks for doing this with an Excel sheet for now. I will update the tool to do this automatically going forward.
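
For anyone who wants the diff % column in the meantime, here is a minimal sketch of how it could be computed and exported as CSV. It is not the benchmark tool's own code: the function names (`pct_diff`, `rows_to_csv`) and the CSV column names are assumptions made for illustration. The three sample rows are copied from the table above, and `---` is treated as "no timing recorded".

```python
import csv
import io


def pct_diff(old, new):
    """Return the percent change from old to new, or None if not computable."""
    try:
        old_v, new_v = float(old), float(new)
    except ValueError:
        # The table uses '---' where no timing was recorded.
        return None
    if old_v == 0.0:
        return None
    return (new_v - old_v) / old_v * 100.0


def fmt(value):
    """Format a percent change with sign, keeping '---' for missing values."""
    return "---" if value is None else f"{value:+.1f}"


# Rows copied from the table above:
# (operator, inputs, fwd 1.4.1, fwd 1.5.0, bwd 1.4.1, bwd 1.5.0), times in ms.
ROWS = [
    ("random_uniform", "{'high': 5, 'low': 0, 'shape': (1024, 1024)}",
     "6.141", "3.1128", "---", "---"),
    ("max", "{'axis': (), 'data': (1024, 1024)}",
     "0.0693", "0.0643", "0.0453", "0.0973"),
    ("sum", "{'axis': (), 'data': (1024, 1024)}",
     "0.0836", "0.0752", "0.0328", "0.0329"),
]


def rows_to_csv(rows):
    """Write the rows plus forward/backward diff % columns to a CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    # Column names are hypothetical; rename to match your own sheet.
    writer.writerow([
        "operator", "inputs",
        "fwd_1.4.1_ms", "fwd_1.5.0_ms", "fwd_diff_pct",
        "bwd_1.4.1_ms", "bwd_1.5.0_ms", "bwd_diff_pct",
    ])
    for op, inputs, f_old, f_new, b_old, b_new in rows:
        writer.writerow([
            op, inputs,
            f_old, f_new, fmt(pct_diff(f_old, f_new)),
            b_old, b_new, fmt(pct_diff(b_old, b_new)),
        ])
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(ROWS), end="")
```

With this sign convention a negative percentage means 1.5.0 is faster than 1.4.1 and a positive one means it is slower, so regressions such as the backward pass of `max` on (1024, 1024) stand out when the column is sorted.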
