@geohot
Created October 19, 2023 05:48
kernels for BS=1024 CIFAR BEAM=2 WINO=1
*** 0 E_64_32_6_6n5 arg 2 sz [64, 1, 1] [32, 1, 1] OPs 33M/ 0.00G mem 3.07 GB tm 3.20us/ 0.00ms (10483.20 GFLOPS, 297.02 GB/s)
*** 1 r_128_31_31_3_2_3_2_2_2_8n26 arg 3 sz [31, 31, 128] [2, 3, 1] OPs 283M/ 0.03G mem 3.07 GB tm 218.44us/ 0.22ms ( 1297.42 GFLOPS, 216.24 GB/s)
*** 2 r_1024_32_16_2_3_4_4_8n6 arg 3 sz [32, 1024, 1] [2, 16, 1] OPs 805M/ 0.32G mem 3.07 GB tm 64.68us/ 0.29ms (12450.43 GFLOPS, 1426.62 GB/s)
*** 3 E_262144_32_4n6 arg 2 sz [262144, 1, 1] [32, 1, 1] OPs 503M/ 1.12G mem 3.07 GB tm 93.80us/ 0.38ms ( 5365.85 GFLOPS, 1430.89 GB/s)
*** 4 E_16384_2_8_8_6_6n2 arg 2 sz [16384, 1, 1] [8, 8, 2] OPs 68174M/ 1.63G mem 3.07 GB tm 363.04us/ 0.74ms (187785.52 GFLOPS, 6007.66 GB/s)
*** 5 r_36_1024_4_16_4_32_4_4n10 arg 3 sz [4, 1024, 36] [4, 16, 1] OPs 9663M/ 69.80G mem 3.07 GB tm 586.09us/ 1.33ms (16488.50 GFLOPS, 773.15 GB/s)
*** 6 E_16384_4_32_4_4_2n6 arg 2 sz [4, 16384, 1] [32, 1, 1] OPs 33487M/ 79.46G mem 3.07 GB tm 623.77us/ 1.95ms (53685.71 GFLOPS, 4572.43 GB/s)
*** 7 r_131072_8_16_2_2n6 arg 2 sz [131072, 1, 1] [16, 8, 1] OPs 67M/ 112.95G mem 3.07 GB tm 208.00us/ 2.16ms ( 322.64 GFLOPS, 806.59 GB/s)
*** 8 E_1024_16_2_2_4_4_4_4n12 arg 2 sz [1024, 1, 1] [2, 2, 16] OPs 0M/ 113.02G mem 3.07 GB tm 56.96us/ 2.22ms ( 0.00 GFLOPS, 1767.23 GB/s)
*** 9 r_32768_32_16n6 arg 2 sz [32768, 1, 1] [32, 1, 1] OPs 16M/ 113.02G mem 3.07 GB tm 31.76us/ 2.25ms ( 528.25 GFLOPS, 2245.06 GB/s)
*** 10 r_64_16_8_8_16n6 arg 2 sz [64, 1, 1] [8, 16, 1] OPs 1M/ 113.03G mem 3.07 GB tm 4.60us/ 2.25ms ( 227.97 GFLOPS, 911.86 GB/s)
*** 11 r_256_8_2_4_2_4_4_16_2n6 arg 3 sz [8, 256, 1] [32, 4, 2] OPs 50M/ 113.04G mem 3.07 GB tm 26.60us/ 2.28ms ( 1892.17 GFLOPS, 2680.58 GB/s)
*** 12 r_64_4_16_4_16_4n14 arg 2 sz [64, 1, 1] [16, 4, 1] OPs 1M/ 113.09G mem 3.07 GB tm 4.68us/ 2.29ms ( 224.11 GFLOPS, 896.28 GB/s)
*** 13 E_256_16_2_6_6_2n12 arg 2 sz [256, 1, 1] [2, 16, 1] OPs 268M/ 113.09G mem 3.07 GB tm 4.64us/ 2.29ms (57838.34 GFLOPS, 1638.44 GB/s)
*** 14 E_128_32_6_6n4 arg 2 sz [128, 1, 1] [32, 1, 1] OPs 67M/ 113.36G mem 3.07 GB tm 3.36us/ 2.29ms (19968.00 GFLOPS, 565.70 GB/s)
*** 15 E_1024_32_16_2_8_2n31 arg 5 sz [32, 1024, 1] [2, 16, 1] OPs 318M/ 113.42G mem 3.07 GB tm 53.16us/ 2.35ms ( 5996.26 GFLOPS, 1262.38 GB/s)
*** 16 E_65536_4_4_6_6n30 arg 2 sz [65536, 1, 1] [4, 4, 1] OPs 34087M/ 113.74G mem 3.07 GB tm 90.60us/ 2.44ms (376233.25 GFLOPS, 12036.50 GB/s)
*** 17 r_36_1024_2_16_2_64_4_2_2n18 arg 3 sz [2, 1024, 36] [2, 16, 1] OPs 4831M/ 147.83G mem 3.07 GB tm 218.00us/ 2.66ms (22164.19 GFLOPS, 693.98 GB/s)
*** 18 E_8192_32_4_4_4n14 arg 2 sz [8192, 1, 1] [4, 32, 1] OPs 8371M/ 152.66G mem 3.07 GB tm 94.08us/ 2.75ms (88985.35 GFLOPS, 7578.91 GB/s)
*** 19 E_1024_16_2_2_4_4_4_4n12 arg 2 sz [1024, 1, 1] [2, 2, 16] OPs 0M/ 161.03G mem 3.07 GB tm 44.24us/ 2.79ms ( 0.00 GFLOPS, 2275.39 GB/s)
*** 20 r_32768_32_16n6 arg 2 sz [32768, 1, 1] [32, 1, 1] OPs 16M/ 161.03G mem 3.07 GB tm 33.56us/ 2.83ms ( 499.92 GFLOPS, 2124.65 GB/s)
*** 21 r_64_16_8_8_16n6 arg 2 sz [64, 1, 1] [8, 16, 1] OPs 1M/ 161.05G mem 3.07 GB tm 4.72us/ 2.83ms ( 222.17 GFLOPS, 888.68 GB/s)
*** 22 r_256_8_2_4_2_4_4_16_2n6 arg 3 sz [8, 256, 1] [32, 4, 2] OPs 50M/ 161.05G mem 3.07 GB tm 27.68us/ 2.86ms ( 1818.27 GFLOPS, 2575.90 GB/s)
*** 23 r_64_4_16_4_16_4n14 arg 2 sz [64, 1, 1] [16, 4, 1] OPs 1M/ 161.10G mem 3.07 GB tm 4.88us/ 2.86ms ( 214.92 GFLOPS, 859.54 GB/s)
*** 24 E_1024_8_4_8_16_4n9 arg 6 sz [4, 8, 1024] [16, 8, 1] OPs 335M/ 161.10G mem 3.07 GB tm 141.52us/ 3.01ms ( 2370.97 GFLOPS, 711.30 GB/s)
*** 25 E_65536_4_4_6_6n30 arg 2 sz [65536, 1, 1] [4, 4, 1] OPs 34087M/ 161.44G mem 3.07 GB tm 104.84us/ 3.11ms (325131.47 GFLOPS, 10401.65 GB/s)
*** 26 r_36_512_4_2_16_4_16_4_4_4n2 arg 3 sz [4, 512, 36] [4, 16, 2] OPs 19327M/ 195.52G mem 3.07 GB tm 917.17us/ 4.03ms (21072.84 GFLOPS, 412.87 GB/s)
*** 27 E_32768_4_32_4_4n18 arg 2 sz [4, 32768, 1] [32, 1, 1] OPs 33487M/ 214.85G mem 3.07 GB tm 554.85us/ 4.58ms (60354.37 GFLOPS, 5140.40 GB/s)
*** 28 r_131072_16_8_2_2n18 arg 2 sz [131072, 1, 1] [8, 16, 1] OPs 67M/ 248.34G mem 3.07 GB tm 204.84us/ 4.79ms ( 327.61 GFLOPS, 819.03 GB/s)
*** 29 E_131072_2_16_2_8n10 arg 3 sz [2, 131072, 1] [2, 16, 1] OPs 134M/ 248.41G mem 3.07 GB tm 332.16us/ 5.12ms ( 404.07 GFLOPS, 909.16 GB/s)
*** 30 r_524288_32_4n8 arg 2 sz [524288, 1, 1] [32, 1, 1] OPs 67M/ 248.54G mem 3.07 GB tm 224.08us/ 5.34ms ( 299.48 GFLOPS, 748.71 GB/s)
*** 31 E_131072_32_4n12 arg 2 sz [131072, 1, 1] [32, 1, 1] OPs 0M/ 248.61G mem 3.07 GB tm 75.20us/ 5.42ms ( 0.00 GFLOPS, 1338.59 GB/s)
*** 32 r_256_16_64_64n6 arg 2 sz [256, 1, 1] [16, 1, 1] OPs 16M/ 248.61G mem 3.07 GB tm 26.20us/ 5.45ms ( 640.34 GFLOPS, 2561.35 GB/s)
*** 33 r_256_4_4_64_16_4n6 arg 3 sz [256, 1, 1] [4, 4, 1] OPs 50M/ 248.62G mem 3.07 GB tm 25.28us/ 5.47ms ( 1991.01 GFLOPS, 2654.70 GB/s)
*** 34 E_2048_2_8_6_6_4n16 arg 2 sz [2048, 1, 1] [8, 2, 1] OPs 2146M/ 248.67G mem 3.07 GB tm 10.88us/ 5.48ms (197330.82 GFLOPS, 5589.85 GB/s)
*** 35 E_2048_32_6_6n4 arg 2 sz [2048, 1, 1] [32, 1, 1] OPs 1073M/ 250.82G mem 3.07 GB tm 6.20us/ 5.49ms (173141.88 GFLOPS, 4904.66 GB/s)
*** 36 E_1024_32_8_16_4n14 arg 5 sz [32, 1024, 1] [16, 8, 1] OPs 318M/ 251.89G mem 3.07 GB tm 75.84us/ 5.56ms ( 4203.10 GFLOPS, 1327.33 GB/s)
*** 37 E_8192_32_2_2_6_6n2 arg 2 sz [8192, 1, 1] [32, 1, 1] OPs 34087M/ 252.21G mem 3.07 GB tm 52.60us/ 5.62ms (648031.57 GFLOPS, 20731.91 GB/s)
*** 38 r_36_64_4_4_8_4_2_256_4_4n4 arg 3 sz [4, 64, 36] [8, 8, 4] OPs 19327M/ 286.30G mem 3.07 GB tm 885.33us/ 6.50ms (21830.70 GFLOPS, 175.88 GB/s)
*** 39 E_16384_2_32_4_4n10 arg 2 sz [2, 16384, 1] [32, 1, 1] OPs 8371M/ 305.63G mem 3.07 GB tm 117.68us/ 6.62ms (71140.04 GFLOPS, 6059.02 GB/s)
*** 40 E_131072_32_4n12 arg 2 sz [131072, 1, 1] [32, 1, 1] OPs 0M/ 314.00G mem 3.07 GB tm 46.60us/ 6.67ms ( 0.00 GFLOPS, 2160.11 GB/s)
*** 41 r_256_16_64_64n6 arg 2 sz [256, 1, 1] [16, 1, 1] OPs 16M/ 314.00G mem 3.07 GB tm 29.20us/ 6.70ms ( 574.57 GFLOPS, 2298.28 GB/s)
*** 42 r_256_4_4_64_16_4n6 arg 3 sz [256, 1, 1] [4, 4, 1] OPs 50M/ 314.02G mem 3.07 GB tm 25.08us/ 6.72ms ( 2006.88 GFLOPS, 2675.87 GB/s)
*** 43 E_1024_32_8_16_4n17 arg 6 sz [32, 1024, 1] [16, 8, 1] OPs 335M/ 314.07G mem 3.07 GB tm 127.88us/ 6.85ms ( 2623.88 GFLOPS, 1049.58 GB/s)
*** 44 E_8192_32_2_2_6_6n2 arg 2 sz [8192, 1, 1] [32, 1, 1] OPs 34087M/ 314.40G mem 3.07 GB tm 78.08us/ 6.93ms (436560.86 GFLOPS, 13966.51 GB/s)
*** 45 r_36_64_8_2_4_2_4_64_4_4_4_2_2n4 arg 3 sz [8, 64, 36] [8, 4, 2] OPs 38654M/ 348.49G mem 3.07 GB tm 1485.10us/ 8.41ms (26028.44 GFLOPS, 158.86 GB/s)
*** 46 E_32768_32_2_4_4n8 arg 2 sz [32768, 1, 1] [2, 32, 1] OPs 16743M/ 387.14G mem 3.07 GB tm 283.28us/ 8.69ms (59105.78 GFLOPS, 5034.06 GB/s)
*** 47 r_16384_32_2_2_4_4n10 arg 2 sz [16384, 1, 1] [32, 1, 1] OPs 33M/ 403.89G mem 3.07 GB tm 102.00us/ 8.80ms ( 328.96 GFLOPS, 822.40 GB/s)
*** 48 E_65536_32_2_2_4n6 arg 3 sz [65536, 1, 1] [2, 2, 32] OPs 67M/ 403.92G mem 3.07 GB tm 116.24us/ 8.91ms ( 577.33 GFLOPS, 1298.98 GB/s)
*** 49 r_16384_32_4_4_4n10 arg 2 sz [16384, 1, 1] [32, 1, 1] OPs 33M/ 403.99G mem 3.07 GB tm 88.00us/ 9.00ms ( 381.30 GFLOPS, 953.24 GB/s)
*** 50 E_1024_16_2_4_4_4_4n6 arg 2 sz [1024, 1, 1] [2, 16, 1] OPs 0M/ 404.02G mem 3.07 GB tm 34.88us/ 9.04ms ( 0.00 GFLOPS, 1442.95 GB/s)
*** 51 r_512_16_64_16n6 arg 2 sz [512, 1, 1] [16, 1, 1] OPs 8M/ 404.02G mem 3.07 GB tm 17.88us/ 9.05ms ( 469.19 GFLOPS, 1876.76 GB/s)
*** 52 r_512_16_64_16n13 arg 3 sz [512, 1, 1] [16, 1, 1] OPs 25M/ 404.03G mem 3.07 GB tm 17.60us/ 9.07ms ( 1429.91 GFLOPS, 1906.63 GB/s)
*** 53 E_8192_32_6_6n4 arg 2 sz [8192, 1, 1] [32, 1, 1] OPs 4293M/ 404.06G mem 3.07 GB tm 15.00us/ 9.09ms (286261.25 GFLOPS, 8109.00 GB/s)
*** 54 E_1024_32_8_4_2_4n18 arg 5 sz [32, 1024, 1] [2, 4, 8] OPs 159M/ 408.35G mem 3.07 GB tm 22.28us/ 9.11ms ( 7153.66 GFLOPS, 2259.33 GB/s)
*** 55 E_16384_32_6_6n2 arg 2 sz [16384, 1, 1] [32, 1, 1] OPs 17043M/ 408.51G mem 3.07 GB tm 29.64us/ 9.14ms (575018.70 GFLOPS, 18396.08 GB/s)
*** 56 r_36_64_8_8_2_2_4_2_128_4_4n4 arg 3 sz [8, 64, 36] [16, 2, 8] OPs 19327M/ 425.55G mem 3.07 GB tm 1160.69us/ 10.30ms (16651.58 GFLOPS, 81.31 GB/s)
*** 57 E_16384_32_4_4n16 arg 2 sz [16384, 1, 1] [32, 1, 1] OPs 4185M/ 444.88G mem 3.07 GB tm 23.24us/ 10.32ms (180109.09 GFLOPS, 15339.96 GB/s)
*** 58 E_1024_16_2_4_4_4_4n6 arg 2 sz [1024, 1, 1] [2, 16, 1] OPs 0M/ 449.07G mem 3.07 GB tm 20.44us/ 10.34ms ( 0.00 GFLOPS, 2462.41 GB/s)
*** 59 r_512_16_64_16n6 arg 2 sz [512, 1, 1] [16, 1, 1] OPs 8M/ 449.07G mem 3.07 GB tm 16.76us/ 10.36ms ( 500.54 GFLOPS, 2002.18 GB/s)
*** 60 r_512_16_64_16n13 arg 3 sz [512, 1, 1] [16, 1, 1] OPs 25M/ 449.07G mem 3.07 GB tm 17.00us/ 10.38ms ( 1480.38 GFLOPS, 1973.92 GB/s)
*** 61 E_1024_32_2_4_4_2_4n8 arg 6 sz [32, 1024, 1] [8, 4, 2] OPs 167M/ 449.10G mem 3.07 GB tm 39.04us/ 10.42ms ( 4297.44 GFLOPS, 1719.13 GB/s)
*** 62 r_16384_32_4_4n8 arg 2 sz [16384, 1, 1] [32, 1, 1] OPs 8M/ 449.27G mem 3.07 GB tm 7.56us/ 10.42ms ( 1109.60 GFLOPS, 2357.91 GB/s)
*** 63 E_16384_32_4_4n29 arg 3 sz [16384, 1, 1] [32, 1, 1] OPs 16M/ 449.28G mem 3.07 GB tm 16.00us/ 10.44ms ( 1048.58 GFLOPS, 2162.69 GB/s)
*** 64 r_16384_32_16n14 arg 2 sz [16384, 1, 1] [32, 1, 1] OPs 8M/ 449.29G mem 3.07 GB tm 8.00us/ 10.45ms ( 1048.58 GFLOPS, 2228.22 GB/s)
*** 65 E_n9 arg 1 sz [1, 1, 1] [1, 1, 1] OPs 0M/ 449.30G mem 3.07 GB tm 1.52us/ 10.45ms ( 0.00 GFLOPS, 0.00 GB/s)
*** 66 E_512_2n6 arg 2 sz [512, 1, 1] [1, 1, 1] OPs 0M/ 449.30G mem 3.07 GB tm 1.72us/ 10.45ms ( 1.19 GFLOPS, 1.19 GB/s)
*** 67 r_512_5_2_16_32_2n22 arg 3 sz [5, 512, 1] [16, 2, 1] OPs 10M/ 449.30G mem 3.07 GB tm 3.28us/ 10.45ms ( 3200.00 GFLOPS, 329.05 GB/s)
*** 68 r_32_32_10n8 arg 2 sz [32, 1, 1] [32, 1, 1] OPs 0M/ 449.31G mem 3.07 GB tm 1.80us/ 10.46ms ( 5.69 GFLOPS, 12.52 GB/s)
*** 69 E_64_10_16n6 arg 3 sz [10, 64, 1] [16, 1, 1] OPs 0M/ 449.31G mem 3.07 GB tm 1.96us/ 10.46ms ( 15.67 GFLOPS, 21.94 GB/s)
*** 70 r_32_32_10n13 arg 2 sz [32, 1, 1] [32, 1, 1] OPs 0M/ 449.31G mem 3.07 GB tm 1.96us/ 10.46ms ( 5.22 GFLOPS, 11.49 GB/s)
*** 71 r_256_4_10n14 arg 4 sz [256, 1, 1] [4, 1, 1] OPs 0M/ 449.31G mem 3.07 GB tm 2.08us/ 10.46ms ( 25.11 GFLOPS, 12.80 GB/s)
*** 72 E_128_5_8_2n8 arg 3 sz [5, 128, 1] [8, 1, 1] OPs 0M/ 449.31G mem 3.07 GB tm 1.84us/ 10.46ms ( 11.13 GFLOPS, 23.37 GB/s)
*** 73 r_32_32_10n13 arg 2 sz [32, 1, 1] [32, 1, 1] OPs 0M/ 449.31G mem 3.07 GB tm 1.60us/ 10.46ms ( 6.40 GFLOPS, 14.08 GB/s)
*** 74 r_512_2_10n18 arg 5 sz [512, 1, 1] [2, 1, 1] OPs 0M/ 449.31G mem 3.07 GB tm 2.04us/ 10.47ms ( 35.14 GFLOPS, 23.09 GB/s)
*** 75 E_64_5_16_2n20 arg 8 sz [5, 64, 1] [2, 16, 1] OPs 0M/ 449.31G mem 3.07 GB tm 1.92us/ 10.47ms ( 48.00 GFLOPS, 46.94 GB/s)
*** 76 r_32_8_8_16_10_4_4n5 arg 3 sz [8, 32, 1] [16, 8, 1] OPs 10M/ 449.31G mem 3.07 GB tm 3.16us/ 10.47ms ( 3318.28 GFLOPS, 341.55 GB/s)
*** 77 E_256_32_8_4_4_4_2n14 arg 8 sz [32, 256, 1] [4, 8, 1] OPs 310M/ 449.32G mem 3.07 GB tm 43.56us/ 10.52ms ( 7125.15 GFLOPS, 1974.00 GB/s)
*** 78 r_512_4_4_64_16n20 arg 3 sz [512, 1, 1] [4, 4, 1] OPs 25M/ 449.63G mem 3.07 GB tm 20.28us/ 10.54ms ( 1240.92 GFLOPS, 1654.76 GB/s)
*** 79 r_128_4_16_1024n12 arg 5 sz [128, 1, 1] [16, 4, 1] OPs 33M/ 449.66G mem 3.07 GB tm 26.56us/ 10.56ms ( 1263.51 GFLOPS, 2526.83 GB/s)
*** 80 r_512_16_64_16n32 arg 3 sz [512, 1, 1] [16, 1, 1] OPs 33M/ 449.69G mem 3.07 GB tm 16.76us/ 10.58ms ( 2002.05 GFLOPS, 2002.30 GB/s)
*** 81 r_512_2_8_64_16n26 arg 5 sz [512, 1, 1] [8, 2, 1] OPs 41M/ 449.72G mem 3.07 GB tm 16.84us/ 10.60ms ( 2490.74 GFLOPS, 1993.03 GB/s)
*** 82 E_512_32_2_16_4_4n8 arg 7 sz [32, 512, 1] [4, 16, 2] OPs 58M/ 449.77G mem 3.07 GB tm 28.80us/ 10.62ms ( 2038.90 GFLOPS, 2913.00 GB/s)
*** 83 E_65536_8_6n6 arg 2 sz [65536, 1, 1] [8, 1, 1] OPs 78M/ 449.83G mem 3.07 GB tm 11.20us/ 10.64ms ( 7021.71 GFLOPS, 2246.95 GB/s)
*** 84 E_16384_32_6_6n5 arg 3 sz [16384, 1, 1] [32, 1, 1] OPs 6606M/ 449.90G mem 3.07 GB tm 24.20us/ 10.66ms (272965.12 GFLOPS, 12998.34 GB/s)
*** 85 r_36_128_8_2_2_2_128_4_4_2_8n8 arg 3 sz [128, 36, 1] [4, 2, 8] OPs 19327M/ 456.51G mem 3.07 GB tm 918.45us/ 11.58ms (21043.47 GFLOPS, 102.75 GB/s)
*** 86 E_8192_32_6_2n4 arg 2 sz [8192, 1, 1] [32, 1, 1] OPs 113M/ 475.84G mem 3.07 GB tm 5.96us/ 11.58ms (19001.04 GFLOPS, 4926.20 GB/s)
*** 87 E_2048_2_4_4_6_4_2n8 arg 2 sz [2048, 1, 1] [4, 4, 2] OPs 113M/ 475.95G mem 3.07 GB tm 6.68us/ 11.59ms (16953.03 GFLOPS, 4395.23 GB/s)
*** 88 E_8192_16_6_4n18 arg 2 sz [8192, 1, 1] [16, 1, 1] OPs 113M/ 476.06G mem 3.07 GB tm 7.28us/ 11.60ms (15555.80 GFLOPS, 4032.99 GB/s)
*** 89 E_512_8_2_2_6_2_4_2_2n10 arg 2 sz [512, 1, 1] [2, 2, 8] OPs 113M/ 476.18G mem 3.07 GB tm 10.80us/ 11.61ms (10485.76 GFLOPS, 2718.53 GB/s)
*** 90 E_1024_32_6_4_4n4 arg 2 sz [1024, 1, 1] [32, 1, 1] OPs 113M/ 476.29G mem 3.07 GB tm 7.80us/ 11.62ms (14518.74 GFLOPS, 3764.12 GB/s)
*** 91 E_1024_32_6_2_4_2n6 arg 2 sz [1024, 1, 1] [32, 1, 1] OPs 113M/ 476.40G mem 3.07 GB tm 11.24us/ 11.63ms (10075.29 GFLOPS, 2612.11 GB/s)
*** 92 E_16384_32_6_6n8 arg 8 sz [16384, 1, 1] [32, 1, 1] OPs 4756M/ 476.52G mem 3.07 GB tm 54.20us/ 11.68ms (87755.36 GFLOPS, 5107.46 GB/s)
*** 93 r_8192_32_2_7_7_6_6n12 arg 2 sz [8192, 1, 1] [2, 32, 1] OPs 924M/ 481.27G mem 3.07 GB tm 30.84us/ 11.71ms (29988.46 GFLOPS, 61200.93 GB/s)
*** 94 E_512_32_2_16_4_4n22 arg 9 sz [32, 512, 1] [4, 16, 2] OPs 318M/ 482.20G mem 3.07 GB tm 179.96us/ 11.89ms ( 1771.30 GFLOPS, 571.05 GB/s)
*** 95 r_512_4_4_64_16n20 arg 3 sz [512, 1, 1] [4, 4, 1] OPs 25M/ 482.52G mem 3.07 GB tm 27.04us/ 11.92ms ( 930.65 GFLOPS, 1241.02 GB/s)
*** 96 r_128_4_16_1024n12 arg 5 sz [128, 1, 1] [16, 4, 1] OPs 33M/ 482.54G mem 3.07 GB tm 39.52us/ 11.96ms ( 849.19 GFLOPS, 1698.25 GB/s)
*** 97 r_512_16_64_16n32 arg 3 sz [512, 1, 1] [16, 1, 1] OPs 33M/ 482.58G mem 3.07 GB tm 16.84us/ 11.98ms ( 1992.43 GFLOPS, 1992.67 GB/s)
*** 98 r_512_2_8_64_16n26 arg 5 sz [512, 1, 1] [8, 2, 1] OPs 41M/ 482.61G mem 3.07 GB tm 17.36us/ 11.99ms ( 2416.13 GFLOPS, 1933.33 GB/s)
*** 99 E_512_128_16_4_2n16 arg 7 sz [128, 512, 1] [16, 1, 1] OPs 58M/ 482.65G mem 3.07 GB tm 28.56us/ 12.02ms ( 2055.96 GFLOPS, 2937.37 GB/s)
*** 100 E_65536_2_4_8_4_2n12 arg 4 sz [2, 65536, 1] [8, 4, 1] OPs 67M/ 482.71G mem 3.07 GB tm 201.84us/ 12.22ms ( 332.48 GFLOPS, 831.21 GB/s)
*** 101 E_32768_32_2_6n2 arg 2 sz [32768, 1, 1] [2, 32, 1] OPs 314M/ 482.78G mem 3.07 GB tm 94.88us/ 12.32ms ( 3315.45 GFLOPS, 1060.94 GB/s)
*** 102 E_32768_32_2_6_6n2 arg 3 sz [32768, 1, 1] [2, 32, 1] OPs 26424M/ 483.09G mem 3.07 GB tm 333.32us/ 12.65ms (79274.57 GFLOPS, 3774.98 GB/s)
*** 103 r_36_256_2_2_8_512_4_8_4n8 arg 3 sz [256, 36, 1] [8, 2, 2] OPs 38654M/ 509.52G mem 3.07 GB tm 1480.97us/ 14.13ms (26100.87 GFLOPS, 159.31 GB/s)
*** 104 E_16384_32_6_2n4 arg 2 sz [16384, 1, 1] [32, 1, 1] OPs 226M/ 548.17G mem 3.07 GB tm 10.64us/ 14.14ms (21286.88 GFLOPS, 5518.82 GB/s)
*** 105 E_16384_16_6_4n16 arg 2 sz [16384, 1, 1] [16, 1, 1] OPs 226M/ 548.40G mem 3.07 GB tm 11.52us/ 14.16ms (19660.80 GFLOPS, 5097.25 GB/s)
*** 106 E_16384_2_2_4_6_4n14 arg 2 sz [16384, 1, 1] [4, 2, 2] OPs 226M/ 548.62G mem 3.07 GB tm 33.00us/ 14.19ms ( 6863.41 GFLOPS, 1779.40 GB/s)
*** 107 E_16384_16_6_2_2n8 arg 2 sz [16384, 1, 1] [16, 1, 1] OPs 226M/ 548.85G mem 3.07 GB tm 25.68us/ 14.21ms ( 8819.45 GFLOPS, 2286.53 GB/s)
*** 108 E_32768_32_6n14 arg 2 sz [32768, 1, 1] [32, 1, 1] OPs 226M/ 549.08G mem 3.07 GB tm 37.56us/ 14.25ms ( 6030.15 GFLOPS, 1563.37 GB/s)
*** 109 E_16384_16_6_4n33 arg 2 sz [16384, 1, 1] [16, 1, 1] OPs 226M/ 549.30G mem 3.07 GB tm 32.92us/ 14.28ms ( 6880.09 GFLOPS, 1783.73 GB/s)
*** 110 E_32768_32_6_6n2 arg 8 sz [32768, 1, 1] [32, 1, 1] OPs 9512M/ 549.53G mem 3.07 GB tm 338.80us/ 14.62ms (28077.32 GFLOPS, 1634.13 GB/s)
*** 111 r_32_2_10_5_16_8_7_7_4_4_2_2n6 arg 2 sz [50, 2, 32] [8, 16, 1] OPs 1284M/ 559.04G mem 3.07 GB tm 626.45us/ 15.25ms ( 2050.47 GFLOPS, 4184.62 GB/s)
*** 112 E_1024_32_8_8_2_4n12 arg 6 sz [32, 1024, 1] [2, 8, 8] OPs 587M/ 560.33G mem 3.07 GB tm 225.28us/ 15.48ms ( 2606.52 GFLOPS, 744.73 GB/s)
*** 113 r_256_2_8_64_16_4n8 arg 3 sz [256, 1, 1] [8, 2, 1] OPs 50M/ 560.91G mem 3.07 GB tm 63.32us/ 15.54ms ( 794.87 GFLOPS, 1059.85 GB/s)
*** 114 r_256_16_64_64n21 arg 5 sz [256, 1, 1] [16, 1, 1] OPs 67M/ 560.96G mem 3.07 GB tm 147.16us/ 15.69ms ( 456.04 GFLOPS, 912.06 GB/s)
*** 115 r_256_16_64_64n28 arg 3 sz [256, 1, 1] [16, 1, 1] OPs 67M/ 561.03G mem 3.07 GB tm 54.24us/ 15.74ms ( 1237.26 GFLOPS, 1237.30 GB/s)
*** 116 r_256_4_4_64_16_4n15 arg 5 sz [256, 1, 1] [4, 4, 1] OPs 83M/ 561.10G mem 3.07 GB tm 78.84us/ 15.82ms ( 1064.01 GFLOPS, 851.26 GB/s)
*** 117 E_1024_32_8_16_4n8 arg 7 sz [32, 1024, 1] [16, 8, 1] OPs 117M/ 561.18G mem 3.07 GB tm 155.20us/ 15.97ms ( 756.70 GFLOPS, 1081.03 GB/s)
*** 118 E_32768_2_16_6n12 arg 2 sz [2, 32768, 1] [16, 1, 1] OPs 157M/ 561.30G mem 3.07 GB tm 38.52us/ 16.01ms ( 4083.13 GFLOPS, 1306.60 GB/s)
*** 119 E_16384_32_2_6_6n2 arg 3 sz [16384, 1, 1] [2, 32, 1] OPs 13212M/ 561.46G mem 3.07 GB tm 48.76us/ 16.06ms (270955.43 GFLOPS, 12902.64 GB/s)
*** 120 r_36_128_4_8_16_64_4_4_4n4 arg 3 sz [4, 128, 36] [16, 8, 1] OPs 19327M/ 574.67G mem 3.07 GB tm 790.77us/ 16.85ms (24441.27 GFLOPS, 196.91 GB/s)
*** 121 E_16384_32_6_2n4 arg 2 sz [16384, 1, 1] [32, 1, 1] OPs 226M/ 594.00G mem 3.07 GB tm 10.28us/ 16.86ms (22032.34 GFLOPS, 5712.09 GB/s)
*** 122 E_16384_16_6_4n16 arg 2 sz [16384, 1, 1] [16, 1, 1] OPs 226M/ 594.22G mem 3.07 GB tm 11.28us/ 16.87ms (20079.11 GFLOPS, 5205.70 GB/s)
*** 123 E_16384_2_2_4_6_4n14 arg 2 sz [16384, 1, 1] [4, 2, 2] OPs 226M/ 594.45G mem 3.07 GB tm 13.68us/ 16.89ms (16556.46 GFLOPS, 4292.42 GB/s)
*** 124 E_16384_16_6_2_2n8 arg 2 sz [16384, 1, 1] [16, 1, 1] OPs 226M/ 594.68G mem 3.07 GB tm 27.20us/ 16.91ms ( 8326.93 GFLOPS, 2158.83 GB/s)
*** 125 E_32768_32_6n14 arg 2 sz [32768, 1, 1] [32, 1, 1] OPs 226M/ 594.90G mem 3.07 GB tm 27.64us/ 16.94ms ( 8194.07 GFLOPS, 2124.39 GB/s)
*** 126 E_16384_16_6_4n33 arg 2 sz [16384, 1, 1] [16, 1, 1] OPs 226M/ 595.13G mem 3.07 GB tm 35.00us/ 16.98ms ( 6471.21 GFLOPS, 1677.72 GB/s)
*** 127 E_32768_32_6_6n2 arg 8 sz [32768, 1, 1] [32, 1, 1] OPs 9512M/ 595.35G mem 3.07 GB tm 347.48us/ 17.32ms (27375.96 GFLOPS, 1593.31 GB/s)
*** 128 r_32_2_10_5_16_8_7_7_4_4_2_2n6 arg 2 sz [50, 2, 32] [8, 16, 1] OPs 1284M/ 604.87G mem 3.07 GB tm 500.49us/ 17.83ms ( 2566.52 GFLOPS, 5237.80 GB/s)
*** 129 E_1024_32_8_8_2_4n15 arg 7 sz [32, 1024, 1] [2, 8, 8] OPs 603M/ 606.15G mem 3.07 GB tm 318.60us/ 18.14ms ( 1895.71 GFLOPS, 631.91 GB/s)
*** 130 r_256_2_8_64_16_4n8 arg 3 sz [256, 1, 1] [8, 2, 1] OPs 50M/ 606.76G mem 3.07 GB tm 71.60us/ 18.22ms ( 702.95 GFLOPS, 937.29 GB/s)
*** 131 r_256_16_64_64n21 arg 5 sz [256, 1, 1] [16, 1, 1] OPs 67M/ 606.81G mem 3.07 GB tm 120.12us/ 18.34ms ( 558.70 GFLOPS, 1117.38 GB/s)
*** 132 r_256_16_64_64n28 arg 3 sz [256, 1, 1] [16, 1, 1] OPs 67M/ 606.87G mem 3.07 GB tm 43.08us/ 18.38ms ( 1557.77 GFLOPS, 1557.82 GB/s)
*** 133 r_256_4_4_64_16_4n15 arg 5 sz [256, 1, 1] [4, 4, 1] OPs 83M/ 606.94G mem 3.07 GB tm 73.88us/ 18.45ms ( 1135.43 GFLOPS, 908.39 GB/s)
*** 134 E_1024_32_8_16_4n11 arg 7 sz [32, 1024, 1] [16, 8, 1] OPs 117M/ 607.02G mem 3.07 GB tm 151.20us/ 18.60ms ( 776.71 GFLOPS, 1109.62 GB/s)
*** 135 E_2097152_2_8_2n6 arg 4 sz [2097152, 1, 1] [2, 8, 2] OPs 134M/ 607.14G mem 3.07 GB tm 422.60us/ 19.03ms ( 317.60 GFLOPS, 793.99 GB/s)
*** 136 E_32768_32_4_6n2 arg 2 sz [32768, 1, 1] [4, 32, 1] OPs 629M/ 607.28G mem 3.07 GB tm 236.04us/ 19.26ms ( 2665.40 GFLOPS, 852.93 GB/s)
*** 137 E_32768_32_4_6_6n2 arg 3 sz [32768, 1, 1] [4, 32, 1] OPs 52848M/ 607.91G mem 3.07 GB tm 902.29us/ 20.16ms (58571.29 GFLOPS, 2789.11 GB/s)
*** 138 r_36_1024_8_4_256_4_4_2n8 arg 3 sz [1024, 36, 1] [4, 8, 1] OPs 19327M/ 660.75G mem 3.07 GB tm 812.41us/ 20.98ms (23790.20 GFLOPS, 466.10 GB/s)
*** 139 E_16384_32_6_2n11 arg 2 sz [16384, 1, 1] [32, 1, 1] OPs 226M/ 680.08G mem 3.07 GB tm 10.56us/ 20.99ms (21448.15 GFLOPS, 5560.63 GB/s)
*** 140 E_16384_2_4_2_6_4n26 arg 2 sz [16384, 1, 1] [2, 4, 2] OPs 226M/ 680.31G mem 3.07 GB tm 24.72us/ 21.01ms ( 9161.94 GFLOPS, 2375.32 GB/s)
*** 141 E_16384_16_6_4n52 arg 2 sz [16384, 1, 1] [16, 1, 1] OPs 226M/ 680.53G mem 3.07 GB tm 25.60us/ 21.04ms ( 8847.36 GFLOPS, 2293.76 GB/s)
*** 142 E_16384_16_6_4n59 arg 2 sz [16384, 1, 1] [16, 1, 1] OPs 226M/ 680.76G mem 3.07 GB tm 25.84us/ 21.06ms ( 8765.19 GFLOPS, 2272.46 GB/s)
*** 143 E_16384_2_16_6_2n12 arg 2 sz [16384, 1, 1] [16, 2, 1] OPs 226M/ 680.99G mem 3.07 GB tm 27.00us/ 21.09ms ( 8388.61 GFLOPS, 2174.83 GB/s)
*** 144 E_16384_16_2_6_2n30 arg 2 sz [16384, 1, 1] [2, 16, 1] OPs 226M/ 681.21G mem 3.07 GB tm 30.44us/ 21.12ms ( 7440.37 GFLOPS, 1928.99 GB/s)
*** 145 E_32768_32_6_6n5 arg 8 sz [32768, 1, 1] [32, 1, 1] OPs 9512M/ 681.44G mem 3.07 GB tm 337.36us/ 21.46ms (28197.17 GFLOPS, 1641.11 GB/s)
*** 146 r_32_2_18_9_32_7_7_4_2_2_4n4 arg 2 sz [162, 2, 32] [32, 1, 1] OPs 1040M/ 690.95G mem 3.07 GB tm 649.21us/ 22.11ms ( 1602.65 GFLOPS, 3270.71 GB/s)
*** 147 E_1024_32_2_16_4_4n14 arg 6 sz [32, 1024, 1] [4, 16, 2] OPs 587M/ 691.99G mem 3.07 GB tm 202.20us/ 22.31ms ( 2904.04 GFLOPS, 829.73 GB/s)
*** 148 r_1024_64_2_8_16n22 arg 4 sz [2, 64, 1024] [8, 1, 1] OPs 67M/ 692.58G mem 3.07 GB tm 196.40us/ 22.51ms ( 341.69 GFLOPS, 704.74 GB/s)
*** 149 r_64_16_4_4_16_4n34 arg 2 sz [64, 1, 1] [4, 16, 1] OPs 1M/ 692.65G mem 3.07 GB tm 7.32us/ 22.51ms ( 143.25 GFLOPS, 573.03 GB/s)
*** 150 r_64_64_4_16_4n34 arg 3 sz [64, 1, 1] [64, 1, 1] OPs 1M/ 692.65G mem 3.07 GB tm 5.56us/ 22.52ms ( 188.72 GFLOPS, 754.47 GB/s)
*** 151 r_512_64_16_16_2n24 arg 4 sz [64, 512, 1] [16, 1, 1] OPs 83M/ 692.65G mem 3.07 GB tm 61.96us/ 22.58ms ( 1353.87 GFLOPS, 1150.80 GB/s)
*** 152 r_512_64_4_2_2_16_2n10 arg 3 sz [64, 512, 1] [2, 2, 4] OPs 67M/ 692.73G mem 3.07 GB tm 87.28us/ 22.67ms ( 768.88 GFLOPS, 816.94 GB/s)
*** 153 r_64_16_4_4_16_4n34 arg 2 sz [64, 1, 1] [4, 16, 1] OPs 1M/ 692.80G mem 3.07 GB tm 4.24us/ 22.67ms ( 247.31 GFLOPS, 989.28 GB/s)
*** 154 r_64_8_8_4_16_4n26 arg 3 sz [64, 1, 1] [8, 8, 1] OPs 1M/ 692.80G mem 3.07 GB tm 5.88us/ 22.68ms ( 178.35 GFLOPS, 713.40 GB/s)
*** 155 E_1024_32_2_16_4_4n17 arg 8 sz [32, 1024, 1] [4, 16, 2] OPs 687M/ 692.80G mem 3.07 GB tm 132.68us/ 22.81ms ( 5184.36 GFLOPS, 758.70 GB/s)
*** 156 E_8192_32_4_6n10 arg 2 sz [8192, 1, 1] [4, 32, 1] OPs 157M/ 693.49G mem 3.07 GB tm 19.92us/ 22.83ms ( 7895.90 GFLOPS, 2526.69 GB/s)
*** 157 E_8192_32_4_6_6n2 arg 3 sz [8192, 1, 1] [4, 32, 1] OPs 13212M/ 693.65G mem 3.07 GB tm 59.44us/ 22.89ms (222271.79 GFLOPS, 10584.37 GB/s)
*** 158 r_36_1024_2_8_4_4_4_4_4_2_2n6 arg 3 sz [2, 1024, 36] [4, 8, 1] OPs 4831M/ 706.86G mem 3.07 GB tm 212.60us/ 23.10ms (22727.15 GFLOPS, 711.61 GB/s)
*** 159 E_16384_32_6_2n11 arg 2 sz [16384, 1, 1] [32, 1, 1] OPs 226M/ 711.69G mem 3.07 GB tm 10.20us/ 23.11ms (22205.14 GFLOPS, 5756.89 GB/s)
*** 160 E_16384_2_4_2_6_4n26 arg 2 sz [16384, 1, 1] [2, 4, 2] OPs 226M/ 711.92G mem 3.07 GB tm 10.84us/ 23.12ms (20894.13 GFLOPS, 5417.00 GB/s)
*** 161 E_16384_16_6_4n52 arg 2 sz [16384, 1, 1] [16, 1, 1] OPs 226M/ 712.14G mem 3.07 GB tm 13.00us/ 23.14ms (17422.49 GFLOPS, 4516.94 GB/s)
*** 162 E_16384_16_6_4n59 arg 2 sz [16384, 1, 1] [16, 1, 1] OPs 226M/ 712.37G mem 3.07 GB tm 31.16us/ 23.17ms ( 7268.46 GFLOPS, 1884.42 GB/s)
*** 163 E_16384_2_16_6_2n12 arg 2 sz [16384, 1, 1] [16, 2, 1] OPs 226M/ 712.60G mem 3.07 GB tm 39.40us/ 23.21ms ( 5748.54 GFLOPS, 1490.36 GB/s)
*** 164 E_16384_16_2_6_2n30 arg 2 sz [16384, 1, 1] [2, 16, 1] OPs 226M/ 712.82G mem 3.07 GB tm 70.52us/ 23.28ms ( 3211.75 GFLOPS, 832.68 GB/s)
*** 165 E_32768_32_6_6n5 arg 8 sz [32768, 1, 1] [32, 1, 1] OPs 9512M/ 713.05G mem 3.07 GB tm 420.28us/ 23.70ms (22633.94 GFLOPS, 1317.32 GB/s)
*** 166 r_32_2_18_9_32_7_7_4_2_2_4n4 arg 2 sz [162, 2, 32] [32, 1, 1] OPs 1040M/ 722.56G mem 3.07 GB tm 636.29us/ 24.33ms ( 1635.19 GFLOPS, 3337.13 GB/s)
*** 167 E_1024_32_2_16_4_4n20 arg 7 sz [32, 1024, 1] [4, 16, 2] OPs 603M/ 723.60G mem 3.07 GB tm 275.48us/ 24.61ms ( 2192.44 GFLOPS, 730.82 GB/s)
*** 168 r_32768_32_16n6 arg 2 sz [32768, 1, 1] [32, 1, 1] OPs 16M/ 724.21G mem 3.07 GB tm 77.36us/ 24.69ms ( 216.87 GFLOPS, 921.69 GB/s)
*** 169 r_64_16_4_4_16_4n34 arg 2 sz [64, 1, 1] [4, 16, 1] OPs 1M/ 724.22G mem 3.07 GB tm 4.36us/ 24.69ms ( 240.50 GFLOPS, 962.06 GB/s)
*** 170 E_8_2_4n20 arg 4 sz [8, 1, 1] [1, 1, 1] OPs 0M/ 724.23G mem 3.07 GB tm 1.80us/ 24.69ms ( 0.14 GFLOPS, 0.57 GB/s)
*** 171 r_32768_32_16n6 arg 2 sz [32768, 1, 1] [32, 1, 1] OPs 16M/ 724.23G mem 3.07 GB tm 87.24us/ 24.78ms ( 192.31 GFLOPS, 817.31 GB/s)
*** 172 r_64_16_4_4_16_4n34 arg 2 sz [64, 1, 1] [4, 16, 1] OPs 1M/ 724.24G mem 3.07 GB tm 4.48us/ 24.79ms ( 234.06 GFLOPS, 936.29 GB/s)
*** 173 E_8_2_4n20 arg 4 sz [8, 1, 1] [1, 1, 1] OPs 0M/ 724.24G mem 3.07 GB tm 1.64us/ 24.79ms ( 0.16 GFLOPS, 0.63 GB/s)
*** 174 r_256_16_64_64n39 arg 2 sz [256, 1, 1] [16, 1, 1] OPs 16M/ 724.24G mem 3.07 GB tm 84.16us/ 24.87ms ( 199.35 GFLOPS, 797.40 GB/s)
*** 175 E_32_2_4n16 arg 4 sz [32, 1, 1] [1, 1, 1] OPs 0M/ 724.26G mem 3.07 GB tm 1.84us/ 24.87ms ( 0.56 GFLOPS, 2.23 GB/s)
*** 176 r_256_16_64_64n39 arg 2 sz [256, 1, 1] [16, 1, 1] OPs 16M/ 724.26G mem 3.07 GB tm 79.84us/ 24.95ms ( 210.14 GFLOPS, 840.55 GB/s)
*** 177 E_32_2_4n16 arg 4 sz [32, 1, 1] [1, 1, 1] OPs 0M/ 724.28G mem 3.07 GB tm 1.68us/ 24.95ms ( 0.61 GFLOPS, 2.44 GB/s)
*** 178 r_512_16_64_16n45 arg 2 sz [512, 1, 1] [16, 1, 1] OPs 8M/ 724.28G mem 3.07 GB tm 40.76us/ 25.00ms ( 205.80 GFLOPS, 823.27 GB/s)
*** 179 E_256_2n17 arg 4 sz [256, 1, 1] [1, 1, 1] OPs 0M/ 724.29G mem 3.07 GB tm 1.96us/ 25.00ms ( 1.04 GFLOPS, 4.18 GB/s)
*** 180 r_512_16_64_16n45 arg 2 sz [512, 1, 1] [16, 1, 1] OPs 8M/ 724.29G mem 3.07 GB tm 40.48us/ 25.04ms ( 207.23 GFLOPS, 828.96 GB/s)
*** 181 E_256_2n17 arg 4 sz [256, 1, 1] [1, 1, 1] OPs 0M/ 724.29G mem 3.07 GB tm 1.84us/ 25.04ms ( 1.11 GFLOPS, 4.46 GB/s)
*** 182 E_8_4_2n16 arg 5 sz [8, 1, 1] [4, 1, 1] OPs 0M/ 724.29G mem 3.07 GB tm 2.12us/ 25.04ms ( 0.18 GFLOPS, 0.49 GB/s)
*** 183 E_8_4_2n16 arg 5 sz [8, 1, 1] [4, 1, 1] OPs 0M/ 724.29G mem 3.07 GB tm 1.68us/ 25.04ms ( 0.23 GFLOPS, 0.62 GB/s)
*** 184 E_128_2n25 arg 5 sz [128, 1, 1] [1, 1, 1] OPs 0M/ 724.29G mem 3.07 GB tm 1.84us/ 25.05ms ( 0.83 GFLOPS, 2.23 GB/s)
*** 185 E_128_2n25 arg 5 sz [128, 1, 1] [1, 1, 1] OPs 0M/ 724.29G mem 3.07 GB tm 1.56us/ 25.05ms ( 0.98 GFLOPS, 2.63 GB/s)
*** 186 E_64_2_4n18 arg 5 sz [64, 1, 1] [1, 1, 1] OPs 0M/ 724.29G mem 3.07 GB tm 1.96us/ 25.05ms ( 1.57 GFLOPS, 4.18 GB/s)
*** 187 E_64_2_4n18 arg 5 sz [64, 1, 1] [1, 1, 1] OPs 0M/ 724.29G mem 3.07 GB tm 1.64us/ 25.05ms ( 1.87 GFLOPS, 5.00 GB/s)
*** 188 E_1048576_4_2_2_4n8 arg 3 sz [1048576, 1, 1] [2, 2, 4] OPs 134M/ 724.29G mem 3.07 GB tm 327.68us/ 25.38ms ( 409.60 GFLOPS, 921.59 GB/s)
*** 189 r_524288_32_4n17 arg 2 sz [524288, 1, 1] [32, 1, 1] OPs 67M/ 724.43G mem 3.07 GB tm 221.92us/ 25.60ms ( 302.40 GFLOPS, 755.99 GB/s)
*** 190 r_1024_64_2_8_16n22 arg 4 sz [2, 64, 1024] [8, 1, 1] OPs 67M/ 724.50G mem 3.07 GB tm 176.88us/ 25.78ms ( 379.40 GFLOPS, 782.52 GB/s)
*** 191 r_64_16_4_4_16_4n34 arg 2 sz [64, 1, 1] [4, 16, 1] OPs 1M/ 724.56G mem 3.07 GB tm 6.72us/ 25.78ms ( 156.04 GFLOPS, 624.19 GB/s)
*** 192 r_64_64_4_16_4n34 arg 3 sz [64, 1, 1] [64, 1, 1] OPs 1M/ 724.56G mem 3.07 GB tm 5.24us/ 25.79ms ( 200.24 GFLOPS, 800.54 GB/s)
*** 193 r_512_64_16_16_2n24 arg 4 sz [64, 512, 1] [16, 1, 1] OPs 83M/ 724.56G mem 3.07 GB tm 58.76us/ 25.85ms ( 1427.58 GFLOPS, 1213.45 GB/s)
*** 194 r_512_64_4_2_2_16_2n10 arg 3 sz [64, 512, 1] [2, 2, 4] OPs 67M/ 724.65G mem 3.07 GB tm 87.56us/ 25.94ms ( 766.42 GFLOPS, 814.33 GB/s)
*** 195 r_64_16_4_4_16_4n34 arg 2 sz [64, 1, 1] [4, 16, 1] OPs 1M/ 724.72G mem 3.07 GB tm 4.40us/ 25.94ms ( 238.31 GFLOPS, 953.31 GB/s)
*** 196 r_64_8_8_4_16_4n26 arg 3 sz [64, 1, 1] [8, 8, 1] OPs 1M/ 724.72G mem 3.07 GB tm 5.36us/ 25.95ms ( 195.65 GFLOPS, 782.62 GB/s)
*** 197 E_1024_32_2_16_4_4n23 arg 9 sz [32, 1024, 1] [4, 16, 2] OPs 704M/ 724.72G mem 3.07 GB tm 188.24us/ 26.13ms ( 3743.28 GFLOPS, 713.01 GB/s)
*** 198 E_1048576_2_16_2n22 arg 4 sz [1048576, 1, 1] [2, 16, 2] OPs 134M/ 725.42G mem 3.07 GB tm 413.16us/ 26.55ms ( 324.85 GFLOPS, 812.13 GB/s)
*** 199 E_32768_16_8_6n2 arg 2 sz [32768, 1, 1] [8, 16, 1] OPs 629M/ 725.56G mem 3.07 GB tm 124.44us/ 26.67ms ( 5055.73 GFLOPS, 1617.83 GB/s)
*** 200 E_32768_16_8_6_6n2 arg 3 sz [32768, 1, 1] [8, 16, 1] OPs 52848M/ 726.19G mem 3.07 GB tm 936.49us/ 27.61ms (56432.30 GFLOPS, 2687.25 GB/s)
*** 201 r_36_1024_8_4_8_4_4_4_4_2n18 arg 3 sz [8, 1024, 36] [8, 4, 1] OPs 9663M/ 779.03G mem 3.07 GB tm 613.21us/ 28.22ms (15759.27 GFLOPS, 738.96 GB/s)
*** 202 E_32768_32_6_2n4 arg 2 sz [32768, 1, 1] [32, 1, 1] OPs 452M/ 788.70G mem 3.07 GB tm 20.80us/ 28.24ms (21778.12 GFLOPS, 5646.18 GB/s)
*** 203 E_32768_16_6_4n10 arg 2 sz [32768, 1, 1] [16, 1, 1] OPs 452M/ 789.15G mem 3.07 GB tm 51.52us/ 28.29ms ( 8792.24 GFLOPS, 2279.47 GB/s)
*** 204 E_32768_16_6_4n19 arg 2 sz [32768, 1, 1] [16, 1, 1] OPs 452M/ 789.60G mem 3.07 GB tm 69.32us/ 28.36ms ( 6534.60 GFLOPS, 1694.16 GB/s)
*** 205 E_32768_16_6_4n30 arg 2 sz [32768, 1, 1] [16, 1, 1] OPs 452M/ 790.06G mem 3.07 GB tm 58.52us/ 28.42ms ( 7740.55 GFLOPS, 2006.81 GB/s)
*** 206 E_32768_2_4_2_6_4n12 arg 2 sz [32768, 1, 1] [2, 4, 2] OPs 452M/ 790.51G mem 3.07 GB tm 74.24us/ 28.50ms ( 6101.55 GFLOPS, 1581.88 GB/s)
*** 207 E_32768_16_6_4n47 arg 2 sz [32768, 1, 1] [16, 1, 1] OPs 452M/ 790.96G mem 3.07 GB tm 73.36us/ 28.57ms ( 6174.74 GFLOPS, 1600.86 GB/s)
*** 208 E_131072_16_6_6n8 arg 8 sz [131072, 1, 1] [16, 1, 1] OPs 19025M/ 791.42G mem 3.07 GB tm 637.09us/ 29.21ms (29863.10 GFLOPS, 1738.06 GB/s)
*** 209 r_32_8_34_17_4_2_4_7_7_4_2n6 arg 2 sz [578, 8, 32] [4, 2, 4] OPs 1856M/ 810.44G mem 3.07 GB tm 1026.73us/ 30.23ms ( 1807.79 GFLOPS, 3689.36 GB/s)
*** 210 E_32768_2_16_8_4n10 arg 3 sz [2, 32768, 1] [8, 16, 1] OPs 1040M/ 812.30G mem 3.07 GB tm 256.24us/ 30.49ms ( 4059.38 GFLOPS, 785.69 GB/s)
*** 211 r_1024_32_3_4_4_2_8_4_2_2n18 arg 3 sz [3, 32, 1024] [2, 4, 4] OPs 805M/ 813.34G mem 3.07 GB tm 236.00us/ 30.72ms ( 3412.29 GFLOPS, 497.63 GB/s)
*** 212 r_32_2_3_16_64_8_4_2n10 arg 2 sz [2, 32, 1] [16, 3, 1] OPs 12M/ 814.14G mem 3.07 GB tm 31.12us/ 30.76ms ( 404.34 GFLOPS, 808.70 GB/s)
*** 213 E_3_32_4n14 arg 4 sz [3, 1, 1] [32, 1, 1] OPs 0M/ 814.15G mem 3.07 GB tm 1.80us/ 30.76ms ( 0.85 GFLOPS, 1.71 GB/s)
*** 214 r_36_8_4_8_8_1024_16_4n26 arg 3 sz [4, 8, 36] [8, 8, 1] OPs 9663M/ 814.15G mem 3.07 GB tm 731.25us/ 31.49ms (13215.34 GFLOPS, 619.67 GB/s)
*** 215 E_64_32_3_3n2 arg 2 sz [64, 1, 1] [32, 1, 1] OPs 7M/ 823.82G mem 3.07 GB tm 3.36us/ 31.49ms ( 2139.43 GFLOPS, 249.93 GB/s)
*** 216 E_192_32_3n20 arg 4 sz [192, 1, 1] [32, 1, 1] OPs 0M/ 823.83G mem 3.07 GB tm 1.92us/ 31.49ms ( 38.40 GFLOPS, 76.80 GB/s)
*** 217 r_36_4_4_4_16_1024_16_4n18 arg 3 sz [4, 4, 36] [16, 4, 1] OPs 4831M/ 823.83G mem 3.07 GB tm 312.72us/ 31.81ms (15450.86 GFLOPS, 483.78 GB/s)
*** 218 E_128_32_3_3n2 arg 2 sz [128, 1, 1] [32, 1, 1] OPs 14M/ 828.66G mem 3.07 GB tm 3.32us/ 31.81ms ( 4330.41 GFLOPS, 505.85 GB/s)
*** 219 E_384_32_3n18 arg 4 sz [384, 1, 1] [32, 1, 1] OPs 0M/ 828.67G mem 3.07 GB tm 2.16us/ 31.81ms ( 68.27 GFLOPS, 136.54 GB/s)
*** 220 r_36_8_8_16_1024_16_4_4n2 arg 3 sz [8, 36, 1] [16, 8, 1] OPs 19327M/ 828.67G mem 3.07 GB tm 1315.29us/ 33.13ms (14694.33 GFLOPS, 287.90 GB/s)
*** 221 E_256_32_2_3_3n8 arg 2 sz [256, 1, 1] [2, 32, 1] OPs 57M/ 848.00G mem 3.07 GB tm 3.92us/ 33.13ms (14670.37 GFLOPS, 1713.65 GB/s)
*** 222 E_1536_32_3n18 arg 4 sz [1536, 1, 1] [32, 1, 1] OPs 0M/ 848.06G mem 3.07 GB tm 2.84us/ 33.13ms ( 207.68 GFLOPS, 415.37 GB/s)
*** 223 r_36_16_16_2_4_2_2_1024_4_4_2n4 arg 3 sz [16, 16, 36] [4, 4, 2] OPs 19327M/ 848.06G mem 3.07 GB tm 895.77us/ 34.03ms (21576.27 GFLOPS, 173.83 GB/s)
*** 224 E_1024_4_8_3_3_2n10 arg 2 sz [1024, 1, 1] [8, 4, 1] OPs 230M/ 867.38G mem 3.07 GB tm 6.64us/ 34.04ms (34643.28 GFLOPS, 4046.66 GB/s)
*** 225 E_6144_32_3n21 arg 4 sz [6144, 1, 1] [32, 1, 1] OPs 2M/ 867.61G mem 3.07 GB tm 5.32us/ 34.04ms ( 443.48 GFLOPS, 886.95 GB/s)
*** 226 r_36_32_2_16_4_1024_4_4_4_2n16 arg 3 sz [2, 32, 36] [4, 16, 1] OPs 38654M/ 867.62G mem 3.07 GB tm 1462.85us/ 35.50ms (26424.17 GFLOPS, 161.28 GB/s)
*** 227 E_1024_32_3_3_4n6 arg 2 sz [1024, 1, 1] [32, 1, 1] OPs 460M/ 906.27G mem 3.07 GB tm 11.12us/ 35.52ms (41372.55 GFLOPS, 4832.70 GB/s)
*** 228 E_12288_32_3n18 arg 4 sz [12288, 1, 1] [32, 1, 1] OPs 4M/ 906.73G mem 3.07 GB tm 9.80us/ 35.53ms ( 481.49 GFLOPS, 962.98 GB/s)
*** 229 r_36_8_16_4_4_4_1024_4_4_2n12 arg 3 sz [16, 8, 36] [4, 4, 4] OPs 19327M/ 906.74G mem 3.07 GB tm 746.21us/ 36.27ms (25900.79 GFLOPS, 126.47 GB/s)
*** 230 E_8192_32_3_3n12 arg 2 sz [8192, 1, 1] [32, 1, 1] OPs 920M/ 926.06G mem 3.07 GB tm 12.52us/ 36.28ms (73486.58 GFLOPS, 8583.91 GB/s)
*** 231 E_24576_32_3n18 arg 4 sz [24576, 1, 1] [32, 1, 1] OPs 9M/ 926.98G mem 3.07 GB tm 18.88us/ 36.30ms ( 499.82 GFLOPS, 999.65 GB/s)
*** 232 r_10_32_4_8_128_4n6 arg 3 sz [32, 10, 1] [8, 4, 1] OPs 10M/ 926.99G mem 3.07 GB tm 7.36us/ 36.31ms ( 1424.70 GFLOPS, 146.64 GB/s)
*** 233 E_160_8_4n18 arg 4 sz [160, 1, 1] [8, 1, 1] OPs 0M/ 927.00G mem 3.07 GB tm 1.88us/ 36.31ms ( 10.89 GFLOPS, 21.79 GB/s)
*** 234 E_96_4n20 arg 5 sz [96, 1, 1] [1, 1, 1] OPs 0M/ 927.00G mem 3.07 GB tm 1.92us/ 36.31ms ( 1.20 GFLOPS, 1.60 GB/s)
*** 235 E_288_32_2n20 arg 5 sz [288, 1, 1] [32, 1, 1] OPs 0M/ 927.00G mem 3.07 GB tm 1.92us/ 36.32ms ( 57.60 GFLOPS, 76.80 GB/s)
*** 236 E_384_32_3n25 arg 5 sz [384, 1, 1] [32, 1, 1] OPs 0M/ 927.00G mem 3.07 GB tm 2.20us/ 36.32ms ( 100.54 GFLOPS, 134.05 GB/s)
*** 237 E_2304_32_2n21 arg 5 sz [2304, 1, 1] [32, 1, 1] OPs 0M/ 927.00G mem 3.07 GB tm 2.92us/ 36.32ms ( 302.99 GFLOPS, 403.99 GB/s)
*** 238 E_6144_32_3n28 arg 5 sz [6144, 1, 1] [32, 1, 1] OPs 3M/ 927.01G mem 3.07 GB tm 6.00us/ 36.33ms ( 589.82 GFLOPS, 786.43 GB/s)
*** 239 E_12288_32_3n31 arg 5 sz [12288, 1, 1] [32, 1, 1] OPs 7M/ 927.01G mem 3.07 GB tm 10.92us/ 36.34ms ( 648.16 GFLOPS, 864.21 GB/s)
*** 240 E_24576_32_3n25 arg 5 sz [24576, 1, 1] [32, 1, 1] OPs 14M/ 927.02G mem 3.07 GB tm 8.48us/ 36.35ms ( 1669.31 GFLOPS, 2225.75 GB/s)
*** 241 E_40_32_4n12 arg 5 sz [40, 1, 1] [32, 1, 1] OPs 0M/ 927.03G mem 3.07 GB tm 1.84us/ 36.35ms ( 16.70 GFLOPS, 22.26 GB/s)
*** 242 E_n14 arg 2 sz [1, 1, 1] [1, 1, 1] OPs 0M/ 927.03G mem 3.07 GB tm 1.84us/ 36.35ms ( 0.00 GFLOPS, 0.00 GB/s)
*** 243 E_n4 arg 4 sz [1, 1, 1] [1, 1, 1] OPs 0M/ 927.03G mem 3.07 GB tm 1.92us/ 36.35ms ( 0.01 GFLOPS, 0.01 GB/s)
*** 244 E_n14 arg 2 sz [1, 1, 1] [1, 1, 1] OPs 0M/ 927.03G mem 3.07 GB tm 1.72us/ 36.35ms ( 0.00 GFLOPS, 0.00 GB/s)
*** 245 E_n4 arg 4 sz [1, 1, 1] [1, 1, 1] OPs 0M/ 927.03G mem 3.07 GB tm 1.68us/ 36.36ms ( 0.01 GFLOPS, 0.01 GB/s)
*** 246 E_256_4n14 arg 2 sz [256, 1, 1] [4, 1, 1] OPs 0M/ 927.03G mem 3.07 GB tm 1.84us/ 36.36ms ( 1.11 GFLOPS, 2.23 GB/s)
*** 247 r_32_16_10_2n14 arg 5 sz [16, 32, 1] [10, 1, 1] OPs 0M/ 927.03G mem 3.07 GB tm 2.12us/ 36.36ms ( 28.98 GFLOPS, 22.22 GB/s)
*** 248 r_64_4_4n10 arg 2 sz [1, 1, 1] [64, 1, 1] OPs 0M/ 927.03G mem 3.07 GB tm 2.36us/ 36.36ms ( 1.30 GFLOPS, 0.87 GB/s)
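To see which kernels dominate the run, the log lines above can be parsed and summed by kernel name. The following is a minimal sketch, not part of the original gist. It assumes each line follows the layout shown above, and it assumes that the value before the slash in the `tm` field is the per-kernel time in microseconds, with the running total after it. The names `LINE_RE`, `parse_line`, and `slowest_kernels` are hypothetical helpers introduced here for illustration.

```python
import re
from collections import defaultdict

# Pattern for one log line, written against the layout seen in this log.
# Field meanings (per-kernel vs. cumulative values) are an assumption.
LINE_RE = re.compile(
    r"^\*\*\*\s+(?P<idx>\d+)\s+(?P<name>\S+)\s+arg\s+(?P<nargs>\d+)\s+"
    r"sz\s+\[(?P<gsz>[^\]]*)\]\s+\[(?P<lsz>[^\]]*)\]\s+"
    r"OPs\s+(?P<mops>\d+)M/\s*(?P<cum_gops>[\d.]+)G\s+"
    r"mem\s+(?P<mem_gb>[\d.]+)\s+GB\s+"
    r"tm\s+(?P<us>[\d.]+)us/\s*(?P<cum_ms>[\d.]+)ms\s+"
    r"\(\s*(?P<gflops>[\d.]+)\s+GFLOPS,\s+(?P<gbs>[\d.]+)\s+GB/s\)"
)


def parse_line(line):
    """Return a dict for one kernel line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "idx": int(m["idx"]),
        "name": m["name"],
        "time_us": float(m["us"]),
        "gflops": float(m["gflops"]),
        "gbps": float(m["gbs"]),
    }


def slowest_kernels(lines, top=5):
    """Sum per-kernel time by kernel name and return the largest totals first."""
    totals = defaultdict(float)
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            totals[rec["name"]] += rec["time_us"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Three lines copied from the log above, used as sample input.
SAMPLE = [
    "*** 0 E_64_32_6_6n5 arg 2 sz [64, 1, 1] [32, 1, 1] OPs 33M/ 0.00G mem 3.07 GB tm 3.20us/ 0.00ms (10483.20 GFLOPS, 297.02 GB/s)",
    "*** 1 r_128_31_31_3_2_3_2_2_2_8n26 arg 3 sz [31, 31, 128] [2, 3, 1] OPs 283M/ 0.03G mem 3.07 GB tm 218.44us/ 0.22ms ( 1297.42 GFLOPS, 216.24 GB/s)",
    "*** 45 r_36_64_8_2_4_2_4_64_4_4_4_2_2n4 arg 3 sz [8, 64, 36] [8, 4, 2] OPs 38654M/ 348.49G mem 3.07 GB tm 1485.10us/ 8.41ms (26028.44 GFLOPS, 158.86 GB/s)",
]

if __name__ == "__main__":
    for name, total_us in slowest_kernels(SAMPLE, top=3):
        print(f"{total_us:10.2f} us  {name}")
```

Feeding the full log through `slowest_kernels` instead of `SAMPLE` would group kernels that are launched more than once, such as `r_32768_32_16n6`, under a single total.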