- The tests were run on 4 independent machines with the same specs
- Each machine has multiple Xeon Platinum 8168 CPUs
- Python: 3.6, NumPy: 1.16, Intel MKL: 2019.4
- Either `MKL_ENABLE_INSTRUCTIONS` or `MKL_CBWR` was set for each test
- `MKL_DYNAMIC` was set to `FALSE` for all tests
- `MKL_NUM_THREADS` was set to either 1, 8 or 16
- Each test was run for float32 (`f4`) and float64 (`f8`) inputs
- All inputs were pre-aligned to a 512-bit boundary (for AVX-512)
- Each test marked as small=0 was run `25 * num_threads` times
- Each test marked as small=1 was run `250 * num_threads` times
- For each test, the median elapsed time was recorded for each machine
- Median times were then averaged across machines for each test
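
The protocol in the list above can be sketched roughly as follows. This is only an illustration, not the script that produced the numbers: the helper names (`mkl_env`, `repetitions`, `median_elapsed`, `average_across_machines`) are hypothetical, and the split between `MKL_ENABLE_INSTRUCTIONS` for regular runs and `MKL_CBWR=<branch>,STRICT` for strict CNR runs is an assumption about how "either ... or" was applied.

```python
import statistics
import time


def mkl_env(instructions, num_threads, strict=False):
    """Environment variables for one configuration (assumed mapping, see lead-in)."""
    env = {"MKL_DYNAMIC": "FALSE", "MKL_NUM_THREADS": str(num_threads)}
    if strict:
        # Strict CNR mode is requested by appending ",STRICT" to the CNR branch.
        env["MKL_CBWR"] = instructions.upper() + ",STRICT"
    else:
        env["MKL_ENABLE_INSTRUCTIONS"] = instructions.upper()
    return env


def repetitions(small, num_threads):
    """small=0 tests run 25 * num_threads times, small=1 tests 250 * num_threads."""
    return (250 if small else 25) * num_threads


def median_elapsed(fn, reps):
    """Median wall-clock time (in seconds) of `reps` calls to `fn`."""
    samples = []
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def average_across_machines(per_machine_medians):
    """Mean of the per-machine medians for one test."""
    return statistics.mean(per_machine_medians)
```

These variables generally need to be in the process environment before MKL is first used, for example by launching each configuration in a fresh subprocess with `env=mkl_env(...)`.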

- `dot` op: `np.dot()` was called without any alterations
- `gemm` op: `blas.dgemm()` was called (arguments rearranged to match `np.dot`)
- `gemm-strict` op: same as `gemm` but with MKL Strict CNR mode enabled
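
One way to rearrange arguments so that a column-major GEMM reproduces `np.dot` is to use the identity `(a @ b).T == b.T @ a.T`. The sketch below assumes SciPy's `scipy.linalg.blas.dgemm` wrapper and 2-D float64 inputs; the helper name `dot_via_dgemm` is hypothetical, and the actual rearrangement used in the benchmark is not shown in this document.

```python
import numpy as np
from scipy.linalg import blas


def dot_via_dgemm(a, b):
    """Compute a @ b for 2-D float64 arrays through BLAS dgemm.

    The transpose of a C-contiguous array is an F-contiguous view, so
    passing b.T and a.T hands column-major data to BLAS without copying
    when both inputs are C-contiguous. The product b.T @ a.T equals
    (a @ b).T, so transposing the result gives a @ b.
    """
    return blas.dgemm(1.0, b.T, a.T).T


rng = np.random.RandomState(0)
a = rng.standard_normal((4, 3))
b = rng.standard_normal((3, 5))
result = dot_via_dgemm(a, b)
```

For F-contiguous inputs the same call still returns the correct product, although the wrapper may then copy the operands internally.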

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `w`: (150,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 278.52 | 292.83 | - |
avx | Af-w | 247.07 | 276.66 | - |
avx | w-AcT | 333.17 | 401.95 | - |
avx | w-AfT | 182.39 | 191.47 | - |
avx2 | Ac-w | 225.19 | 226.94 | 315.4252 |
avx2 | Af-w | 251.00 | 211.06 | 558.11 |
avx2 | w-AcT | 306.19 | 240.45 | 304.9262 |
avx2 | w-AfT | 260.30 | 174.11 | 359.1774 |
avx512 | Ac-w | 187.97 | 186.89 | 351.1422 |
avx512 | Af-w | 255.45 | 172.14 | 430.1841 |
avx512 | w-AcT | 199.07 | 204.67 | 275.3029 |
avx512 | w-AfT | 156.65 | 163.57 | 333.9329 |
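
The operand layouts named in the legends can be built as in the sketch below, together with 512-bit (64-byte) alignment. It assumes that `AcT` and `AfT` are transposed views of `Ac` and `Af`, and that a label such as `w-AcT` means `np.dot(w, AcT)`. The helper `aligned_empty` is hypothetical and the shapes are scaled down.

```python
import numpy as np


def aligned_empty(shape, dtype, order="C", align=64):
    """Uninitialized array whose data pointer is a multiple of `align` bytes."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + align, dtype=np.uint8)  # over-allocate by `align` bytes
    offset = (-buf.ctypes.data) % align             # bytes to skip to reach alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape, order=order)


rng = np.random.RandomState(0)
n, k = 1500, 150  # scaled-down stand-in for (1500000, 150)

Ac = aligned_empty((n, k), "f8", order="C")
Ac[...] = rng.standard_normal((n, k))
Af = aligned_empty((n, k), "f8", order="F")
Af[...] = Ac                      # same values, column-major storage
AcT = Ac.T                        # (k, n) view, F-contiguous
AfT = Af.T                        # (k, n) view, C-contiguous
w = aligned_empty((k,), "f8")
w[...] = rng.standard_normal(k)

left = np.dot(Ac, w)              # "Ac-w"
right = np.dot(w, AcT)            # "w-AcT", same values as Ac-w
```

Transposing changes only the strides, so `Ac-w` and `w-AcT` read the same memory but reach BLAS with different layout flags.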

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `w`: (150,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 53.18 | 55.34 | - |
avx | Af-w | 34.53 | 37.49 | - |
avx | w-AcT | 62.42 | 55.15 | - |
avx | w-AfT | 35.37 | 38.60 | - |
avx2 | Ac-w | 41.14 | 42.85 | 59.9105 |
avx2 | Af-w | 31.07 | 49.07 | 67.6441 |
avx2 | w-AcT | 56.51 | 44.37 | 46.4869 |
avx2 | w-AfT | 34.69 | 35.70 | 68.4242 |
avx512 | Ac-w | 38.03 | 39.92 | 61.5696 |
avx512 | Af-w | 34.80 | 46.74 | 89.0299 |
avx512 | w-AcT | 44.63 | 43.11 | 72.8658 |
avx512 | w-AfT | 36.88 | 33.69 | 124.8525 |

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `w`: (150,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 29.94 | 31.84 | - |
avx | Af-w | 21.29 | 26.94 | - |
avx | w-AcT | 41.95 | 28.63 | - |
avx | w-AfT | 26.93 | 24.02 | - |
avx2 | Ac-w | 25.05 | 25.42 | 37.4955 |
avx2 | Af-w | 22.75 | 34.92 | 46.9344 |
avx2 | w-AcT | 34.64 | 23.10 | 28.3142 |
avx2 | w-AfT | 23.87 | 24.40 | 41.3183 |
avx512 | Ac-w | 23.64 | 24.66 | 40.7913 |
avx512 | Af-w | 25.01 | 25.53 | 41.4362 |
avx512 | w-AcT | 35.85 | 30.62 | 82.5378 |
avx512 | w-AfT | 31.93 | 23.21 | 128.4683 |

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `w`: (150,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 180.10 | 143.40 | - |
avx | Af-w | 80.08 | 135.02 | - |
avx | w-AcT | 199.21 | 196.09 | - |
avx | w-AfT | 83.04 | 83.06 | - |
avx2 | Ac-w | 152.79 | 115.79 | 252.4276 |
avx2 | Af-w | 134.94 | 84.69 | 233.166 |
avx2 | w-AcT | 167.70 | 120.72 | 191.1838 |
avx2 | w-AfT | 86.29 | 86.83 | 165.9119 |
avx512 | Ac-w | 126.44 | 130.84 | 206.7101 |
avx512 | Af-w | 76.48 | 76.74 | 244.271 |
avx512 | w-AcT | 182.31 | 130.66 | 128.2289 |
avx512 | w-AfT | 78.64 | 75.68 | 164.6559 |

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `w`: (150,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 27.22 | 26.22 | - |
avx | Af-w | 15.59 | 19.21 | - |
avx | w-AcT | 28.60 | 28.18 | - |
avx | w-AfT | 12.96 | 21.34 | - |
avx2 | Ac-w | 22.89 | 22.39 | 36.284 |
avx2 | Af-w | 20.24 | 17.54 | 42.0482 |
avx2 | w-AcT | 23.32 | 21.47 | 23.674 |
avx2 | w-AfT | 15.05 | 17.73 | 33.7815 |
avx512 | Ac-w | 26.33 | 23.56 | 33.5469 |
avx512 | Af-w | 12.20 | 15.81 | 43.6333 |
avx512 | w-AcT | 24.98 | 23.97 | 47.5395 |
avx512 | w-AfT | 16.93 | 12.48 | 46.1854 |

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `w`: (150,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 15.79 | 15.49 | - |
avx | Af-w | 12.97 | 15.59 | - |
avx | w-AcT | 18.08 | 14.08 | - |
avx | w-AfT | 9.31 | 10.15 | - |
avx2 | Ac-w | 16.62 | 13.06 | 20.6012 |
avx2 | Af-w | 11.00 | 13.23 | 26.0878 |
avx2 | w-AcT | 12.88 | 12.01 | 15.5666 |
avx2 | w-AfT | 14.57 | 9.91 | 19.25 |
avx512 | Ac-w | 17.61 | 15.65 | 18.7627 |
avx512 | Af-w | 13.45 | 16.86 | 26.3501 |
avx512 | w-AcT | 15.16 | 15.80 | 46.3575 |
avx512 | w-AfT | 8.91 | 15.15 | 43.3421 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `w`: (15,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 0.8597 | 0.8758 | - |
avx | Af-w | 0.9394 | 0.9731 | - |
avx | w-AcT | 0.9364 | 0.8839 | - |
avx | w-AfT | 0.9435 | 0.9509 | - |
avx2 | Ac-w | 0.8725 | 0.8707 | 1.1811 |
avx2 | Af-w | 1.0650 | 1.0676 | 0.9959 |
avx2 | w-AcT | 0.9238 | 0.8923 | 1.5117 |
avx2 | w-AfT | 1.0314 | 1.0499 | 1.0641 |
avx512 | Ac-w | 1.0768 | 0.9760 | 1.8678 |
avx512 | Af-w | 1.0759 | 1.0974 | 1.4689 |
avx512 | w-AcT | 1.0823 | 1.0121 | 1.4306 |
avx512 | w-AfT | 1.0760 | 1.0874 | 0.9563 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `w`: (15,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 0.2103 | 0.2854 | - |
avx | Af-w | 0.2105 | 0.2429 | - |
avx | w-AcT | 0.2830 | 0.2816 | - |
avx | w-AfT | 0.2939 | 0.2151 | - |
avx2 | Ac-w | 0.2108 | 0.2913 | 0.248 |
avx2 | Af-w | 0.2193 | 0.2544 | 0.2048 |
avx2 | w-AcT | 0.2828 | 0.2904 | 0.2813 |
avx2 | w-AfT | 0.3200 | 0.3285 | 0.2229 |
avx512 | Ac-w | 0.2396 | 0.3121 | 0.3481 |
avx512 | Af-w | 0.2394 | 0.2472 | 0.2849 |
avx512 | w-AcT | 0.3071 | 0.3164 | 0.4215 |
avx512 | w-AfT | 0.3076 | 0.2744 | 0.3433 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `w`: (15,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 0.1583 | 0.2518 | - |
avx | Af-w | 0.1597 | 0.2491 | - |
avx | w-AcT | 0.2880 | 0.1633 | - |
avx | w-AfT | 0.2592 | 0.1659 | - |
avx2 | Ac-w | 0.1612 | 0.1690 | 0.1488 |
avx2 | Af-w | 0.1760 | 0.1891 | 0.1593 |
avx2 | w-AcT | 0.2427 | 0.1669 | 0.1705 |
avx2 | w-AfT | 0.2480 | 0.1802 | 0.1585 |
avx512 | Ac-w | 0.1881 | 0.2303 | 0.2022 |
avx512 | Af-w | 0.1957 | 0.2170 | 0.1945 |
avx512 | w-AcT | 0.2015 | 0.1974 | 0.4064 |
avx512 | w-AfT | 0.2053 | 0.2038 | 0.3159 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `w`: (15,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 0.8495 | 0.8581 | - |
avx | Af-w | 0.4742 | 0.4859 | - |
avx | w-AcT | 0.8485 | 0.8526 | - |
avx | w-AfT | 0.4745 | 0.4835 | - |
avx2 | Ac-w | 0.4628 | 0.4592 | 1.2122 |
avx2 | Af-w | 0.4950 | 0.5033 | 1.0467 |
avx2 | w-AcT | 0.4542 | 0.4630 | 0.8676 |
avx2 | w-AfT | 0.4921 | 0.5007 | 0.596 |
avx512 | Ac-w | 0.6473 | 0.6624 | 1.4192 |
avx512 | Af-w | 0.4983 | 0.5091 | 1.1797 |
avx512 | w-AcT | 0.6514 | 0.6616 | 0.8051 |
avx512 | w-AfT | 0.4975 | 0.5084 | 0.4723 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `w`: (15,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 0.1769 | 0.1729 | - |
avx | Af-w | 0.1249 | 0.1199 | - |
avx | w-AcT | 0.1793 | 0.1853 | - |
avx | w-AfT | 0.1254 | 0.1597 | - |
avx2 | Ac-w | 0.1309 | 0.1259 | 0.1943 |
avx2 | Af-w | 0.1235 | 0.1202 | 0.1736 |
avx2 | w-AcT | 0.1326 | 0.1857 | 0.1695 |
avx2 | w-AfT | 0.1211 | 0.1543 | 0.1338 |
avx512 | Ac-w | 0.1671 | 0.1764 | 0.2275 |
avx512 | Af-w | 0.1202 | 0.1287 | 0.2022 |
avx512 | w-AcT | 0.1528 | 0.1957 | 0.2482 |
avx512 | w-AfT | 0.1342 | 0.1309 | 0.155 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `w`: (15,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Ac-w | 0.1258 | 0.1290 | - |
avx | Af-w | 0.0963 | 0.0968 | - |
avx | w-AcT | 0.1183 | 0.1619 | - |
avx | w-AfT | 0.0973 | 0.0966 | - |
avx2 | Ac-w | 0.0885 | 0.1125 | 0.1175 |
avx2 | Af-w | 0.0947 | 0.0935 | 0.1109 |
avx2 | w-AcT | 0.0939 | 0.1328 | 0.114 |
avx2 | w-AfT | 0.1228 | 0.1171 | 0.1278 |
avx512 | Ac-w | 0.1152 | 0.1273 | 0.1445 |
avx512 | Af-w | 0.1018 | 0.1073 | 0.1342 |
avx512 | w-AcT | 0.1137 | 0.2005 | 0.2601 |
avx512 | w-AfT | 0.1376 | 0.1400 | 0.2527 |

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `y`: (1500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 375.88 | 286.77 | - |
avx | AfT-y | 185.65 | 192.24 | - |
avx | y-Ac | 302.22 | 269.70 | - |
avx | y-Af | 186.64 | 184.56 | - |
avx2 | AcT-y | 225.73 | 214.31 | 464.2119 |
avx2 | AfT-y | 245.28 | 187.57 | 257.0441 |
avx2 | y-Ac | 227.87 | 232.95 | 398.9069 |
avx2 | y-Af | 177.51 | 187.09 | 215.1909 |
avx512 | AcT-y | 190.52 | 184.97 | 430.6951 |
avx512 | AfT-y | 150.14 | 155.64 | 274.4005 |
avx512 | y-Ac | 262.89 | 184.00 | 399.9746 |
avx512 | y-Af | 151.56 | 151.42 | 204.8347 |

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `y`: (1500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 72.51 | 57.69 | - |
avx | AfT-y | 37.06 | 57.96 | - |
avx | y-Ac | 56.33 | 60.31 | - |
avx | y-Af | 37.53 | 35.46 | - |
avx2 | AcT-y | 54.35 | 54.04 | 131.7934 |
avx2 | AfT-y | 32.51 | 37.91 | 51.929 |
avx2 | y-Ac | 50.19 | 54.50 | 151.298 |
avx2 | y-Af | 37.94 | 34.71 | 40.8832 |
avx512 | AcT-y | 55.61 | 52.77 | 126.3179 |
avx512 | AfT-y | 34.28 | 30.66 | 52.8289 |
avx512 | y-Ac | 50.71 | 47.45 | 139.7909 |
avx512 | y-Af | 34.32 | 31.92 | 40.3422 |

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `y`: (1500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 41.01 | 38.55 | - |
avx | AfT-y | 29.84 | 34.41 | - |
avx | y-Ac | 32.38 | 38.50 | - |
avx | y-Af | 22.83 | 22.11 | - |
avx2 | AcT-y | 27.64 | 25.57 | 97.4087 |
avx2 | AfT-y | 31.53 | 30.76 | 39.9516 |
avx2 | y-Ac | 35.63 | 29.91 | 105.489 |
avx2 | y-Af | 32.59 | 22.54 | 25.4169 |
avx512 | AcT-y | 30.92 | 21.10 | 85.5452 |
avx512 | AfT-y | 22.67 | 32.84 | 44.1593 |
avx512 | y-Ac | 34.40 | 33.87 | 96.9224 |
avx512 | y-Af | 33.44 | 21.84 | 31.5531 |

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `y`: (1500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 129.34 | 126.08 | - |
avx | AfT-y | 95.86 | 91.45 | - |
avx | y-Ac | 131.97 | 129.77 | - |
avx | y-Af | 127.58 | 92.65 | - |
avx2 | AcT-y | 141.98 | 100.37 | 228.5094 |
avx2 | AfT-y | 91.16 | 92.79 | 243.4499 |
avx2 | y-Ac | 105.25 | 102.64 | 131.5789 |
avx2 | y-Af | 86.90 | 90.44 | 140.4917 |
avx512 | AcT-y | 101.38 | 101.59 | 238.6456 |
avx512 | AfT-y | 83.84 | 83.96 | 169.2152 |
avx512 | y-Ac | 101.85 | 103.45 | 129.4417 |
avx512 | y-Af | 121.63 | 78.41 | 187.0246 |

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `y`: (1500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 31.56 | 29.80 | - |
avx | AfT-y | 25.76 | 17.96 | - |
avx | y-Ac | 35.22 | 20.67 | - |
avx | y-Af | 16.46 | 20.86 | - |
avx2 | AcT-y | 34.10 | 29.16 | 61.767 |
avx2 | AfT-y | 23.74 | 18.97 | 33.0189 |
avx2 | y-Ac | 24.00 | 29.31 | 91.0718 |
avx2 | y-Af | 15.78 | 16.60 | 27.9925 |
avx512 | AcT-y | 33.97 | 25.91 | 91.5085 |
avx512 | AfT-y | 23.10 | 18.32 | 31.8801 |
avx512 | y-Ac | 26.59 | 27.16 | 91.2677 |
avx512 | y-Af | 16.28 | 20.25 | 26.5968 |

- `Ac`: (1500000, 150) matrix, C-contiguous
- `AcT`: (1500000, 150) matrix, C-contiguous, transposed
- `Af`: (1500000, 150) matrix, F-contiguous
- `AfT`: (1500000, 150) matrix, F-contiguous, transposed
- `y`: (1500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 24.31 | 16.02 | - |
avx | AfT-y | 17.60 | 14.27 | - |
avx | y-Ac | 20.07 | 21.62 | - |
avx | y-Af | 10.54 | 17.72 | - |
avx2 | AcT-y | 18.87 | 14.36 | 57.865 |
avx2 | AfT-y | 15.81 | 16.70 | 26.4263 |
avx2 | y-Ac | 14.64 | 17.15 | 92.8601 |
avx2 | y-Af | 15.24 | 13.72 | 18.8649 |
avx512 | AcT-y | 13.59 | 15.01 | 58.7211 |
avx512 | AfT-y | 16.94 | 19.49 | 25.3721 |
avx512 | y-Ac | 20.83 | 15.26 | 72.0012 |
avx512 | y-Af | 10.97 | 14.51 | 19.9279 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `y`: (150000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 0.7522 | 0.8136 | - |
avx | AfT-y | 0.9217 | 0.9243 | - |
avx | y-Ac | 0.8121 | 0.8248 | - |
avx | y-Af | 0.9193 | 0.9346 | - |
avx2 | AcT-y | 0.7827 | 0.8042 | 2.0015 |
avx2 | AfT-y | 0.9606 | 0.9718 | 1.633 |
avx2 | y-Ac | 0.8590 | 0.9023 | 1.6313 |
avx2 | y-Af | 0.9764 | 0.9727 | 1.4566 |
avx512 | AcT-y | 0.8166 | 0.9619 | 2.9684 |
avx512 | AfT-y | 0.9460 | 0.9580 | 2.2258 |
avx512 | y-Ac | 0.8785 | 0.9055 | 2.0547 |
avx512 | y-Af | 0.9444 | 0.9565 | 1.6849 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `y`: (150000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 0.7480 | 0.5035 | - |
avx | AfT-y | 0.1042 | 0.1111 | - |
avx | y-Ac | 0.7770 | 0.6127 | - |
avx | y-Af | 0.1109 | 0.1116 | - |
avx2 | AcT-y | 1.0178 | 0.6397 | 0.9854 |
avx2 | AfT-y | 0.1011 | 0.1087 | 0.7269 |
avx2 | y-Ac | 0.8044 | 0.7367 | 0.9432 |
avx2 | y-Af | 0.1032 | 0.1080 | 0.7 |
avx512 | AcT-y | 0.5493 | 0.8249 | 0.9062 |
avx512 | AfT-y | 0.1073 | 0.1252 | 0.433 |
avx512 | y-Ac | 0.5462 | 0.9861 | 0.8636 |
avx512 | y-Af | 0.1080 | 0.1156 | 0.4184 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `y`: (150000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 0.2594 | 0.2844 | - |
avx | AfT-y | 0.0520 | 0.0601 | - |
avx | y-Ac | 0.2557 | 0.3841 | - |
avx | y-Af | 0.0522 | 0.0567 | - |
avx2 | AcT-y | 0.6394 | 0.7250 | 1.3473 |
avx2 | AfT-y | 0.0463 | 0.0536 | 0.7966 |
avx2 | y-Ac | 0.5724 | 0.6649 | 1.3349 |
avx2 | y-Af | 0.0446 | 0.0520 | 0.783 |
avx512 | AcT-y | 0.2518 | 0.2539 | 0.9742 |
avx512 | AfT-y | 0.0477 | 0.0544 | 0.4983 |
avx512 | y-Ac | 0.2556 | 0.2415 | 0.895 |
avx512 | y-Af | 0.0474 | 0.0527 | 0.4904 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `y`: (150000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 0.7520 | 0.7348 | - |
avx | AfT-y | 0.4706 | 0.4814 | - |
avx | y-Ac | 0.7246 | 0.7331 | - |
avx | y-Af | 0.4714 | 0.4783 | - |
avx2 | AcT-y | 0.4337 | 0.4416 | 1.4658 |
avx2 | AfT-y | 0.4721 | 0.4809 | 1.3509 |
avx2 | y-Ac | 0.4325 | 0.4430 | 1.2209 |
avx2 | y-Af | 0.4702 | 0.4779 | 1.0979 |
avx512 | AcT-y | 0.3794 | 0.3922 | 1.7906 |
avx512 | AfT-y | 0.4933 | 0.4984 | 1.6916 |
avx512 | y-Ac | 0.3810 | 0.3910 | 1.4169 |
avx512 | y-Af | 0.4756 | 0.4998 | 1.3061 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `y`: (150000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 0.6914 | 0.8855 | - |
avx | AfT-y | 0.0975 | 0.1006 | - |
avx | y-Ac | 0.7036 | 0.7992 | - |
avx | y-Af | 0.0410 | 0.0577 | - |
avx2 | AcT-y | 0.9887 | 1.2641 | 0.9136 |
avx2 | AfT-y | 0.0384 | 0.1013 | 0.7207 |
avx2 | y-Ac | 0.9842 | 1.0730 | 1.2324 |
avx2 | y-Af | 0.0937 | 0.0491 | 1.0977 |
avx512 | AcT-y | 0.5894 | 0.4787 | 0.6526 |
avx512 | AfT-y | 0.0422 | 0.0532 | 0.5004 |
avx512 | y-Ac | 0.5631 | 0.4685 | 0.6435 |
avx512 | y-Af | 0.0427 | 0.0541 | 0.4594 |

- `Ac`: (150000, 15) matrix, C-contiguous
- `AcT`: (150000, 15) matrix, C-contiguous, transposed
- `Af`: (150000, 15) matrix, F-contiguous
- `AfT`: (150000, 15) matrix, F-contiguous, transposed
- `y`: (150000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | AcT-y | 0.5946 | 0.3349 | - |
avx | AfT-y | 0.0355 | 0.0465 | - |
avx | y-Ac | 0.5720 | 0.3836 | - |
avx | y-Af | 0.0370 | 0.0415 | - |
avx2 | AcT-y | 0.8885 | 0.5543 | 1.1668 |
avx2 | AfT-y | 0.0358 | 0.0286 | 1.2774 |
avx2 | y-Ac | 0.9370 | 0.5372 | 1.2355 |
avx2 | y-Af | 0.0341 | 0.0455 | 1.0991 |
avx512 | AcT-y | 0.2873 | 0.4690 | 0.7153 |
avx512 | AfT-y | 0.0408 | 0.0305 | 0.5648 |
avx512 | y-Ac | 0.4008 | 0.3853 | 0.6986 |
avx512 | y-Af | 0.0398 | 0.0288 | 0.5527 |

- `Bc`: (3000, 3000) matrix, C-contiguous
- `Bf`: (3000, 3000) matrix, F-contiguous
- `Cc`: (3000, 3000) matrix, C-contiguous
- `Cf`: (3000, 3000) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 2375.17 | 2374.86 | - |
avx | Bc-Cf | 2384.14 | 2447.75 | - |
avx | Bf-Cc | 2373.71 | 2377.03 | - |
avx | Bf-Cf | 2363.71 | 2367.88 | - |
avx2 | Bc-Cc | 1263.58 | 1255.88 | 1256.8004 |
avx2 | Bc-Cf | 1252.99 | 1301.95 | 1254.4515 |
avx2 | Bf-Cc | 1251.43 | 1254.46 | 1300.8235 |
avx2 | Bf-Cf | 1246.28 | 1246.09 | 1251.3151 |
avx512 | Bc-Cc | 774.56 | 775.08 | 773.2522 |
avx512 | Bc-Cf | 771.82 | 776.34 | 799.6686 |
avx512 | Bf-Cc | 779.26 | 782.37 | 778.6236 |
avx512 | Bf-Cf | 769.04 | 773.98 | 775.7101 |

- `Bc`: (3000, 3000) matrix, C-contiguous
- `Bf`: (3000, 3000) matrix, F-contiguous
- `Cc`: (3000, 3000) matrix, C-contiguous
- `Cf`: (3000, 3000) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 375.62 | 362.89 | - |
avx | Bc-Cf | 347.77 | 339.80 | - |
avx | Bf-Cc | 346.43 | 339.93 | - |
avx | Bf-Cf | 343.72 | 372.99 | - |
avx2 | Bc-Cc | 214.43 | 212.76 | 217.7826 |
avx2 | Bc-Cf | 200.34 | 196.39 | 208.8227 |
avx2 | Bf-Cc | 200.54 | 198.09 | 206.0442 |
avx2 | Bf-Cf | 202.84 | 215.40 | 211.8449 |
avx512 | Bc-Cc | 162.93 | 158.49 | 151.5677 |
avx512 | Bc-Cf | 141.32 | 135.50 | 146.9132 |
avx512 | Bf-Cc | 146.14 | 147.11 | 150.8385 |
avx512 | Bf-Cf | 161.68 | 159.28 | 146.8157 |

- `Bc`: (3000, 3000) matrix, C-contiguous
- `Bf`: (3000, 3000) matrix, F-contiguous
- `Cc`: (3000, 3000) matrix, C-contiguous
- `Cf`: (3000, 3000) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 211.73 | 197.75 | - |
avx | Bc-Cf | 191.21 | 211.58 | - |
avx | Bf-Cc | 193.20 | 213.48 | - |
avx | Bf-Cf | 207.33 | 194.72 | - |
avx2 | Bc-Cc | 139.49 | 134.84 | 144.8766 |
avx2 | Bc-Cf | 132.68 | 136.62 | 145.2361 |
avx2 | Bf-Cc | 126.92 | 137.15 | 135.8298 |
avx2 | Bf-Cf | 137.72 | 125.34 | 143.8074 |
avx512 | Bc-Cc | 113.10 | 121.20 | 108.8694 |
avx512 | Bc-Cf | 105.97 | 121.44 | 110.4436 |
avx512 | Bf-Cc | 96.76 | 109.15 | 110.1264 |
avx512 | Bf-Cf | 107.85 | 93.54 | 92.4746 |

- `Bc`: (3000, 3000) matrix, C-contiguous
- `Bf`: (3000, 3000) matrix, F-contiguous
- `Cc`: (3000, 3000) matrix, C-contiguous
- `Cf`: (3000, 3000) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 1237.06 | 1266.51 | - |
avx | Bc-Cf | 1200.05 | 1197.30 | - |
avx | Bf-Cc | 1208.18 | 1200.05 | - |
avx | Bf-Cf | 1188.55 | 1186.36 | - |
avx2 | Bc-Cc | 651.27 | 646.33 | 622.0946 |
avx2 | Bc-Cf | 636.30 | 618.42 | 620.358 |
avx2 | Bf-Cc | 619.06 | 616.23 | 619.4856 |
avx2 | Bf-Cf | 618.98 | 614.67 | 615.4227 |
avx512 | Bc-Cc | 429.51 | 415.97 | 396.0257 |
avx512 | Bc-Cf | 425.59 | 386.70 | 385.7088 |
avx512 | Bf-Cc | 387.70 | 396.91 | 388.1412 |
avx512 | Bf-Cf | 389.95 | 386.39 | 385.8595 |

- `Bc`: (3000, 3000) matrix, C-contiguous
- `Bf`: (3000, 3000) matrix, F-contiguous
- `Cc`: (3000, 3000) matrix, C-contiguous
- `Cf`: (3000, 3000) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 181.08 | 183.31 | - |
avx | Bc-Cf | 174.86 | 174.90 | - |
avx | Bf-Cc | 181.26 | 182.11 | - |
avx | Bf-Cf | 172.69 | 173.52 | - |
avx2 | Bc-Cc | 119.05 | 112.96 | 114.547 |
avx2 | Bc-Cf | 102.87 | 101.86 | 106.1758 |
avx2 | Bf-Cc | 102.00 | 101.03 | 100.6712 |
avx2 | Bf-Cf | 101.77 | 96.69 | 96.692 |
avx512 | Bc-Cc | 74.11 | 75.73 | 71.1032 |
avx512 | Bc-Cf | 70.13 | 70.64 | 77.2659 |
avx512 | Bf-Cc | 76.91 | 72.17 | 72.1736 |
avx512 | Bf-Cf | 76.64 | 70.80 | 70.7051 |

- `Bc`: (3000, 3000) matrix, C-contiguous
- `Bf`: (3000, 3000) matrix, F-contiguous
- `Cc`: (3000, 3000) matrix, C-contiguous
- `Cf`: (3000, 3000) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 112.95 | 98.68 | - |
avx | Bc-Cf | 96.67 | 100.91 | - |
avx | Bf-Cc | 100.31 | 106.11 | - |
avx | Bf-Cf | 94.04 | 93.73 | - |
avx2 | Bc-Cc | 68.42 | 63.35 | 61.5305 |
avx2 | Bc-Cf | 62.37 | 59.23 | 66.0565 |
avx2 | Bf-Cc | 60.32 | 67.55 | 57.3751 |
avx2 | Bf-Cf | 58.60 | 58.15 | 53.5256 |
avx512 | Bc-Cc | 44.29 | 45.04 | 44.8097 |
avx512 | Bc-Cf | 43.71 | 53.66 | 53.3274 |
avx512 | Bf-Cc | 47.58 | 54.77 | 47.1386 |
avx512 | Bf-Cf | 45.44 | 41.61 | 40.4459 |

- `Bc`: (300, 300) matrix, C-contiguous
- `Bf`: (300, 300) matrix, F-contiguous
- `Cc`: (300, 300) matrix, C-contiguous
- `Cf`: (300, 300) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 2.4306 | 2.4424 | - |
avx | Bc-Cf | 2.4559 | 2.4641 | - |
avx | Bf-Cc | 2.4374 | 2.4397 | - |
avx | Bf-Cf | 2.4593 | 2.4652 | - |
avx2 | Bc-Cc | 1.2834 | 1.2929 | 1.2953 |
avx2 | Bc-Cf | 1.2904 | 1.2987 | 1.3664 |
avx2 | Bf-Cc | 1.2822 | 1.2872 | 1.2889 |
avx2 | Bf-Cf | 1.2877 | 1.2925 | 1.297 |
avx512 | Bc-Cc | 0.8655 | 0.8733 | 0.8759 |
avx512 | Bc-Cf | 0.8676 | 0.8721 | 0.875 |
avx512 | Bf-Cc | 0.8695 | 0.8740 | 0.874 |
avx512 | Bf-Cf | 0.8664 | 0.8750 | 0.8755 |

- `Bc`: (300, 300) matrix, C-contiguous
- `Bf`: (300, 300) matrix, F-contiguous
- `Cc`: (300, 300) matrix, C-contiguous
- `Cf`: (300, 300) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 0.4802 | 0.4499 | - |
avx | Bc-Cf | 0.4153 | 0.4215 | - |
avx | Bf-Cc | 0.4489 | 0.4723 | - |
avx | Bf-Cf | 0.4635 | 0.4230 | - |
avx2 | Bc-Cc | 0.2465 | 0.2542 | 0.2539 |
avx2 | Bc-Cf | 0.2437 | 0.2508 | 0.2897 |
avx2 | Bf-Cc | 0.2463 | 0.2759 | 0.2749 |
avx2 | Bf-Cf | 0.2553 | 0.2677 | 0.288 |
avx512 | Bc-Cc | 0.1957 | 0.2155 | 0.2053 |
avx512 | Bc-Cf | 0.2599 | 0.2286 | 0.2213 |
avx512 | Bf-Cc | 0.1965 | 0.2272 | 0.2265 |
avx512 | Bf-Cf | 0.2167 | 0.2275 | 0.2239 |

- `Bc`: (300, 300) matrix, C-contiguous
- `Bf`: (300, 300) matrix, F-contiguous
- `Cc`: (300, 300) matrix, C-contiguous
- `Cf`: (300, 300) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 0.2437 | 0.2511 | - |
avx | Bc-Cf | 0.2470 | 0.3000 | - |
avx | Bf-Cc | 0.2480 | 0.2551 | - |
avx | Bf-Cf | 0.2964 | 0.2968 | - |
avx2 | Bc-Cc | 0.1655 | 0.2024 | 0.1915 |
avx2 | Bc-Cf | 0.1698 | 0.1957 | 0.2005 |
avx2 | Bf-Cc | 0.1652 | 0.1838 | 0.1919 |
avx2 | Bf-Cf | 0.1845 | 0.1960 | 0.1907 |
avx512 | Bc-Cc | 0.1042 | 0.1657 | 0.1786 |
avx512 | Bc-Cf | 0.1268 | 0.1431 | 0.1636 |
avx512 | Bf-Cc | 0.1574 | 0.1633 | 0.1695 |
avx512 | Bf-Cf | 0.1645 | 0.1681 | 0.1669 |

- `Bc`: (300, 300) matrix, C-contiguous
- `Bf`: (300, 300) matrix, F-contiguous
- `Cc`: (300, 300) matrix, C-contiguous
- `Cf`: (300, 300) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 1.2710 | 1.2784 | - |
avx | Bc-Cf | 1.2810 | 1.2879 | - |
avx | Bf-Cc | 1.2412 | 1.2496 | - |
avx | Bf-Cf | 1.2515 | 1.2579 | - |
avx2 | Bc-Cc | 0.7203 | 0.7267 | 0.7401 |
avx2 | Bc-Cf | 0.7207 | 0.7269 | 0.7401 |
avx2 | Bf-Cc | 0.7167 | 0.7246 | 0.7374 |
avx2 | Bf-Cf | 0.7195 | 0.7253 | 0.7377 |
avx512 | Bc-Cc | 0.4485 | 0.4568 | 0.457 |
avx512 | Bc-Cf | 0.4451 | 0.4532 | 0.4537 |
avx512 | Bf-Cc | 0.4500 | 0.4585 | 0.4603 |
avx512 | Bf-Cf | 0.4478 | 0.4554 | 0.4556 |

- `Bc`: (300, 300) matrix, C-contiguous
- `Bf`: (300, 300) matrix, F-contiguous
- `Cc`: (300, 300) matrix, C-contiguous
- `Cf`: (300, 300) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 0.2458 | 0.2506 | - |
avx | Bc-Cf | 0.2422 | 0.2372 | - |
avx | Bf-Cc | 0.2346 | 0.2267 | - |
avx | Bf-Cf | 0.2282 | 0.2491 | - |
avx2 | Bc-Cc | 0.1359 | 0.1423 | 0.1435 |
avx2 | Bc-Cf | 0.1385 | 0.1683 | 0.1364 |
avx2 | Bf-Cc | 0.1414 | 0.1359 | 0.1347 |
avx2 | Bf-Cf | 0.1340 | 0.1402 | 0.1366 |
avx512 | Bc-Cc | 0.1121 | 0.1225 | 0.1175 |
avx512 | Bc-Cf | 0.1111 | 0.1059 | 0.1063 |
avx512 | Bf-Cc | 0.1070 | 0.1066 | 0.1059 |
avx512 | Bf-Cf | 0.1047 | 0.1128 | 0.1059 |

- `Bc`: (300, 300) matrix, C-contiguous
- `Bf`: (300, 300) matrix, F-contiguous
- `Cc`: (300, 300) matrix, C-contiguous
- `Cf`: (300, 300) matrix, F-contiguous

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | Bc-Cc | 0.1421 | 0.1822 | - |
avx | Bc-Cf | 0.1740 | 0.1488 | - |
avx | Bf-Cc | 0.1528 | 0.1450 | - |
avx | Bf-Cf | 0.1378 | 0.1545 | - |
avx2 | Bc-Cc | 0.0904 | 0.1061 | 0.1056 |
avx2 | Bc-Cf | 0.1040 | 0.0989 | 0.0994 |
avx2 | Bf-Cc | 0.0904 | 0.0954 | 0.0966 |
avx2 | Bf-Cf | 0.0954 | 0.0985 | 0.1011 |
avx512 | Bc-Cc | 0.0904 | 0.0770 | 0.0889 |
avx512 | Bc-Cf | 0.0894 | 0.0718 | 0.0887 |
avx512 | Bf-Cc | 0.0694 | 0.0753 | 0.0894 |
avx512 | Bf-Cf | 0.0668 | 0.0892 | 0.0997 |

- `a2`: (25000000,) vector
- `b2`: (25000000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 37.94 | 39.49 | - |
avx2 | a2-b2 | 32.57 | 33.14 | 67.5952 |
avx512 | a2-b2 | 29.98 | 29.58 | 80.7738 |

- `a2`: (25000000,) vector
- `b2`: (25000000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 6.81 | 7.17 | - |
avx2 | a2-b2 | 6.39 | 6.85 | 69.2601 |
avx512 | a2-b2 | 5.95 | 6.70 | 84.1317 |

- `a2`: (25000000,) vector
- `b2`: (25000000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 3.69 | 5.40 | - |
avx2 | a2-b2 | 4.71 | 5.67 | 71.512 |
avx512 | a2-b2 | 3.32 | 4.91 | 84.9237 |

- `a2`: (25000000,) vector
- `b2`: (25000000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 18.19 | 18.03 | - |
avx2 | a2-b2 | 16.06 | 16.19 | 54.189 |
avx512 | a2-b2 | 14.31 | 14.49 | 62.856 |

- `a2`: (25000000,) vector
- `b2`: (25000000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 3.49 | 2.59 | - |
avx2 | a2-b2 | 3.02 | 3.02 | 54.4306 |
avx512 | a2-b2 | 3.21 | 2.33 | 62.906 |

- `a2`: (25000000,) vector
- `b2`: (25000000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 2.04 | 2.21 | - |
avx2 | a2-b2 | 1.59 | 1.70 | 54.3137 |
avx512 | a2-b2 | 2.00 | 1.66 | 62.7139 |

- `a2`: (2500000,) vector
- `b2`: (2500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 2.2779 | 2.5178 | - |
avx2 | a2-b2 | 2.0921 | 2.1423 | 5.5889 |
avx512 | a2-b2 | 1.9314 | 1.9053 | 6.4217 |

- `a2`: (2500000,) vector
- `b2`: (2500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 0.2286 | 0.2406 | - |
avx2 | a2-b2 | 0.2222 | 0.2358 | 5.6441 |
avx512 | a2-b2 | 0.2286 | 0.2918 | 6.3899 |

- `a2`: (2500000,) vector
- `b2`: (2500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 0.1192 | 0.1383 | - |
avx2 | a2-b2 | 0.1154 | 0.1309 | 6.2925 |
avx512 | a2-b2 | 0.1428 | 0.1347 | 6.356 |

- `a2`: (2500000,) vector
- `b2`: (2500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 0.7795 | 0.8206 | - |
avx2 | a2-b2 | 0.7999 | 0.8123 | 5.2384 |
avx512 | a2-b2 | 0.7951 | 0.8414 | 5.6853 |

- `a2`: (2500000,) vector
- `b2`: (2500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 0.1030 | 0.1173 | - |
avx2 | a2-b2 | 0.1063 | 0.1171 | 5.2404 |
avx512 | a2-b2 | 0.1090 | 0.1171 | 5.6746 |

- `a2`: (2500000,) vector
- `b2`: (2500000,) vector

instructions | operands | dot | gemm | gemm-strict |
---|---|---|---|---|
avx | a2-b2 | 0.0429 | 0.0534 | - |
avx2 | a2-b2 | 0.0412 | 0.0515 | 5.475 |
avx512 | a2-b2 | 0.0610 | 0.0591 | 6.2094 |