Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save iree-github-actions-bot/417b80f2c61ffd6fa31a37135e45cfb6 to your computer and use it in GitHub Desktop.
Save iree-github-actions-bot/417b80f2c61ffd6fa31a37135e45cfb6 to your computer and use it in GitHub Desktop.

Full Benchmark Summary

Regressed Latencies 🚩

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 82.343 (vs. 71.565, 15.06%↑) 82.466 0.617
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 90.556 (vs. 79.760, 13.53%↑) 90.544 0.582

Similar Latencies

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 90.536 (vs. 94.234, 3.92%↓) 90.149 1.420
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 71.388 (vs. 74.206, 3.80%↓) 71.352 0.641
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 72.086 (vs. 74.629, 3.41%↓) 72.125 0.188
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 102.992 (vs. 105.475, 2.35%↓) 102.969 0.276
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 94.716 (vs. 96.746, 2.10%↓) 94.748 0.147
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 121.806 (vs. 123.899, 1.69%↓) 121.793 0.373
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 78.440 (vs. 79.380, 1.18%↓) 78.294 0.812
MobileBertSquad\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] 261.407 (vs. 259.732, 0.65%↑) 261.295 0.540
MobileBertSquad\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] 260.906 (vs. 260.529, 0.14%↑) 260.497 0.977

All Compilation Metrics

Benchmark Name Compilation Time (ms) Total Dispatch Size (bytes) Total Artifact Size (bytes) Stream IR Dispatch Count (# of cmd.dispatch ops)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1667 (vs. 1373, 21.41%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 24141 (vs. 22517, 7.21%↑) 144032 (vs. 144032, 0.00%) 399877 (vs. 399877, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 42854 (vs. 42021, 1.98%↑) 238720 (vs. 238720, 0.00%) 10458245 (vs. 10458245, 0.00%) 97 (vs. 97, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 30293 (vs. 36458, 16.91%↓) 177680 (vs. 177680, 0.00%) 2959045 (vs. 2959045, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 58717 (vs. 61682, 4.81%↓) 680000 (vs. 680000, 0.00%) 5603397 (vs. 5603397, 0.00%) 89 (vs. 89, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 28290 (vs. 22395, 26.32%↑) 174400 (vs. 174400, 0.00%) 17094405 (vs. 17094405, 0.00%) 51 (vs. 51, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 31694 (vs. 35493, 10.70%↓) 190096 (vs. 190096, 0.00%) 14173189 (vs. 14173189, 0.00%) 74 (vs. 74, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 63370 (vs. 64315, 1.47%↓) 568928 (vs. 568928, 0.00%) 4219653 (vs. 4219653, 0.00%) 144 (vs. 144, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 53208 (vs. 48130, 10.55%↑) 287600 (vs. 287600, 0.00%) 18229253 (vs. 18229253, 0.00%) 124 (vs. 124, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 25533 (vs. 23339, 9.40%↑) 142288 (vs. 142288, 0.00%) 5195653 (vs. 5195653, 0.00%) 48 (vs. 48, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 64970 (vs. 70348, 7.64%↓) 84496 (vs. 84496, 0.00%) 99926661 (vs. 99926661, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 67765 (vs. 72124, 6.04%↓) 93056 (vs. 93056, 0.00%) 98443077 (vs. 98443077, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 279518 (vs. 288262, 3.03%↓) 5843296 (vs. 5843296, 0.00%) 32142341 (vs. 32142341, 0.00%) 1102 (vs. 1102, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 101671 (vs. 104332, 2.55%↓) 216304 (vs. 216304, 0.00%) 164497708 (vs. 164497708, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 31261 (vs. 33190, 5.81%↓) 59344 (vs. 59344, 0.00%) 133996207 (vs. 133996207, 0.00%) 209 (vs. 209, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 36802 (vs. 34832, 5.66%↑) 27920 (vs. 27920, 0.00%) 652755092 (vs. 652755092, 0.00%) 233 (vs. 233, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 30593 (vs. 31128, 1.72%↓) 14736 (vs. 14736, 0.00%) 652736084 (vs. 652736084, 0.00%) 246 (vs. 246, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 38274 (vs. 39199, 2.36%↓) 68608 (vs. 68608, 0.00%) 533841087 (vs. 533841087, 0.00%) 212 (vs. 212, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 31582 (vs. 38808, 18.62%↓) 50128 (vs. 50128, 0.00%) 1336025023 (vs. 1336025023, 0.00%) 413 (vs. 413, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 1649 (vs. 1359, 21.34%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 22100 (vs. 26605, 16.93%↓) 107248 (vs. 107248, 0.00%) 371397 (vs. 371397, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 33964 (vs. 32964, 3.03%↑) 96944 (vs. 96944, 0.00%) 10398725 (vs. 10398725, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 26124 (vs. 29310, 10.87%↓) 113392 (vs. 113392, 0.00%) 2921349 (vs. 2921349, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 49229 (vs. 48475, 1.56%↑) 269104 (vs. 269104, 0.00%) 5219205 (vs. 5219205, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 18213 (vs. 18496, 1.53%↓) 60288 (vs. 60288, 0.00%) 17017925 (vs. 17017925, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 31046 (vs. 30252, 2.62%↑) 95248 (vs. 95248, 0.00%) 14133765 (vs. 14133765, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 53914 (vs. 55964, 3.66%↓) 330048 (vs. 330048, 0.00%) 4003653 (vs. 4003653, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 42547 (vs. 43730, 2.71%↓) 136080 (vs. 136080, 0.00%) 18361413 (vs. 18361413, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 15891 (vs. 16867, 5.79%↓) 44208 (vs. 44208, 0.00%) 5147973 (vs. 5147973, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 85660 (vs. 88421, 3.12%↓) 52400 (vs. 52400, 0.00%) 100083141 (vs. 100083141, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 84363 (vs. 84235, 0.15%↑) 52832 (vs. 52832, 0.00%) 98588357 (vs. 98588357, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 240443 (vs. 252326, 4.71%↓) 2610240 (vs. 2610240, 0.00%) 29088325 (vs. 29088325, 0.00%) 2160 (vs. 2160, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 107673 (vs. 114110, 5.64%↓) 154864 (vs. 154864, 0.00%) 169910060 (vs. 169910060, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 36250 (vs. 36269, 0.05%↓) 37936 (vs. 37936, 0.00%) 219509935 (vs. 219509935, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 42104 (vs. 43196, 2.53%↓) 19328 (vs. 19328, 0.00%) 992542612 (vs. 992542612, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 39664 (vs. 40524, 2.12%↓) 12864 (vs. 12864, 0.00%) 992543572 (vs. 992543572, 0.00%) 367 (vs. 367, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 45146 (vs. 45548, 0.88%↓) 40144 (vs. 40144, 0.00%) 875939327 (vs. 875939327, 0.00%) 334 (vs. 334, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 53778 (vs. 53729, 0.09%↑) 43840 (vs. 43840, 0.00%) 1336053183 (vs. 1336053183, 0.00%) 654 (vs. 654, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1618 (vs. 1502, 7.72%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 24317 (vs. 26360, 7.75%↓) 99488 (vs. 99488, 0.00%) 363653 (vs. 363653, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 33156 (vs. 35192, 5.79%↓) 98656 (vs. 98656, 0.00%) 10400453 (vs. 10400453, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 28646 (vs. 34459, 16.87%↓) 117504 (vs. 117504, 0.00%) 2925445 (vs. 2925445, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 48886 (vs. 49212, 0.66%↓) 257952 (vs. 257952, 0.00%) 5208069 (vs. 5208069, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 18589 (vs. 19833, 6.27%↓) 65376 (vs. 65376, 0.00%) 17023045 (vs. 17023045, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 30827 (vs. 30440, 1.27%↑) 99104 (vs. 99104, 0.00%) 14137669 (vs. 14137669, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 57243 (vs. 59631, 4.00%↓) 323792 (vs. 323792, 0.00%) 3997381 (vs. 3997381, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 43487 (vs. 43323, 0.38%↑) 122656 (vs. 122656, 0.00%) 18348037 (vs. 18348037, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 17157 (vs. 19185, 10.57%↓) 47200 (vs. 47200, 0.00%) 5150981 (vs. 5150981, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 81894 (vs. 89434, 8.43%↓) 38880 (vs. 38880, 0.00%) 100069637 (vs. 100069637, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 84826 (vs. 89049, 4.74%↓) 39296 (vs. 39296, 0.00%) 98574789 (vs. 98574789, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 228010 (vs. 242646, 6.03%↓) 2599872 (vs. 2599872, 0.00%) 29077957 (vs. 29077957, 0.00%) 2160 (vs. 2160, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 109785 (vs. 111783, 1.79%↓) 140240 (vs. 140240, 0.00%) 169895468 (vs. 169895468, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 40362 (vs. 38513, 4.80%↑) 33904 (vs. 33904, 0.00%) 219505903 (vs. 219505903, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 46210 (vs. 45915, 0.64%↑) 18640 (vs. 18640, 0.00%) 992541972 (vs. 992541972, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 48738 (vs. 46486, 4.84%↑) 12336 (vs. 12336, 0.00%) 992543060 (vs. 992543060, 0.00%) 367 (vs. 367, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 47430 (vs. 49740, 4.64%↓) 34928 (vs. 34928, 0.00%) 875934143 (vs. 875934143, 0.00%) 334 (vs. 334, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 61156 (vs. 53159, 15.04%↑) 39360 (vs. 39360, 0.00%) 1336048703 (vs. 1336048703, 0.00%) 654 (vs. 654, 0.00%)
EfficientNetV2STF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 89075 (vs. 97956, 9.07%↓) 844192 (vs. 844192, 0.00%) 165158932 (vs. 165158932, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 22817 (vs. 25713, 11.26%↓) 145212 (vs. 145212, 0.00%) 134088431 (vs. 134088431, 0.00%) 209 (vs. 209, 0.00%)
BertForMaskedLMTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 33946 (vs. 35234, 3.66%↓) 227968 (vs. 227968, 0.00%) 534005858 (vs. 534005858, 0.00%) 212 (vs. 212, 0.00%)
BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 33509 (vs. 32937, 1.74%↑) 130852 (vs. 130852, 0.00%) 1336110955 (vs. 1336110955, 0.00%) 413 (vs. 413, 0.00%)
matmul_3456x1024x2048_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1633 (vs. 1501, 8.79%↑) 30540 (vs. 30540, 0.00%) 41263 (vs. 41263, 0.00%) 1 (vs. 1, 0.00%)
matmul_3456x1024x2048_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1994 (vs. 1839, 8.43%↑) 43516 (vs. 43516, 0.00%) 54239 (vs. 54239, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1602 (vs. 1443, 11.02%↑) 28492 (vs. 28492, 0.00%) 39151 (vs. 39151, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1861 (vs. 1780, 4.55%↑) 39892 (vs. 39892, 0.00%) 50551 (vs. 50551, 0.00%) 1 (vs. 1, 0.00%)
matmul_2564x2564x2564_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2234 (vs. 2010, 11.14%↑) 85344 (vs. 85344, 0.00%) 96003 (vs. 96003, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2564x2562_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2234 (vs. 1990, 12.26%↑) 88224 (vs. 88224, 0.00%) 98947 (vs. 98947, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2536 (vs. 2556, 0.78%↓) 82792 (vs. 82792, 0.00%) 93515 (vs. 93515, 0.00%) 1 (vs. 1, 0.00%)
matmul_123x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2057 (vs. 1915, 7.42%↑) 49772 (vs. 49772, 0.00%) 60494 (vs. 60494, 0.00%) 1 (vs. 1, 0.00%)
matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 912 (vs. 902, 1.11%↑) 8828 (vs. 8828, 0.00%) 26181 (vs. 26181, 0.00%) 2 (vs. 2, 0.00%)
matmul_128x256x8192_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 1058 (vs. 927, 14.13%↑) 9784 (vs. 9784, 0.00%) 27137 (vs. 27137, 0.00%) 2 (vs. 2, 0.00%)
MiniLML12H384Uncased(stablehlo) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 32235 (vs. 30266, 6.51%↑) 50552 (vs. 50552, 0.00%) 133987441 (vs. 133987441, 0.00%) 209 (vs. 209, 0.00%)
DeepLabV3_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 19282 (vs. 21164, 8.89%↓) 42840 (vs. 42840, 0.00%) 2824263 (vs. 2824263, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 40129 (vs. 42228, 4.97%↓) 185480 (vs. 185480, 0.00%) 5108871 (vs. 5108871, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 63931 (vs. 69619, 8.17%↓) 49416 (vs. 49416, 0.00%) 98399431 (vs. 98399431, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 234614 (vs. 244158, 3.91%↓) 2108296 (vs. 2108296, 0.00%) 28407303 (vs. 28407303, 0.00%) 1102 (vs. 1102, 0.00%)
MobileNetV1_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 16511 (vs. 16414, 0.59%↑) 50984 (vs. 50984, 0.00%) 16976263 (vs. 16976263, 0.00%) 65 (vs. 65, 0.00%)
MobileNetV2_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 48979 (vs. 53256, 8.03%↓) 217032 (vs. 217032, 0.00%) 3867719 (vs. 3867719, 0.00%) 144 (vs. 144, 0.00%)
PersonDetect_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 17390 (vs. 20327, 14.45%↓) 58936 (vs. 58936, 0.00%) 314759 (vs. 314759, 0.00%) 60 (vs. 60, 0.00%)
EfficientNet_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 50816 (vs. 50985, 0.33%↓) 390748 (vs. 390748, 0.00%) 5314183 (vs. 5314183, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 271833 (vs. 286274, 5.04%↓) 2116860 (vs. 2116860, 0.00%) 28415879 (vs. 28415879, 0.00%) 1102 (vs. 1102, 0.00%)
PersonDetect_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 20453 (vs. 25353, 19.33%↓) 124428 (vs. 124428, 0.00%) 380231 (vs. 380231, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV2_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 58921 (vs. 62663, 5.97%↓) 328664 (vs. 328664, 0.00%) 3979399 (vs. 3979399, 0.00%) 144 (vs. 144, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 19064 (vs. 22021, 13.43%↓) 56272 (vs. 56272, 0.00%) 2837637 (vs. 2837637, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 47181 (vs. 47842, 1.38%↓) 32912 (vs. 32912, 0.00%) 98382917 (vs. 98382917, 0.00%) 704 (vs. 704, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 36299 (vs. 35022, 3.65%↑) 20320 (vs. 20320, 0.00%) 652747476 (vs. 652747476, 0.00%) 233 (vs. 233, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 34298 (vs. 33964, 0.98%↑) 9024 (vs. 9024, 0.00%) 652730324 (vs. 652730324, 0.00%) 246 (vs. 246, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 270382 (vs. 289032, 6.45%↓) 4387712 (vs. 4387712, 0.00%) 30686725 (vs. 30686725, 0.00%) 1102 (vs. 1102, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 103351 (vs. 107750, 4.08%↓) 694624 (vs. 694624, 0.00%) 88728645 (vs. 88728645, 0.00%) 268 (vs. 268, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 23682 (vs. 27588, 14.16%↓) 49984 (vs. 49984, 0.00%) 2848965 (vs. 2848965, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 83788 (vs. 85836, 2.39%↓) 21600 (vs. 21600, 0.00%) 98557061 (vs. 98557061, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 41268 (vs. 43454, 5.03%↓) 11040 (vs. 11040, 0.00%) 992509844 (vs. 992509844, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 48303 (vs. 44070, 9.61%↑) 9232 (vs. 9232, 0.00%) 992515476 (vs. 992515476, 0.00%) 367 (vs. 367, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 211263 (vs. 224297, 5.81%↓) 1372976 (vs. 1372976, 0.00%) 27851077 (vs. 27851077, 0.00%) 2160 (vs. 2160, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 79167 (vs. 85303, 7.19%↓) 125600 (vs. 125600, 0.00%) 88155333 (vs. 88155333, 0.00%) 386 (vs. 386, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 21890 (vs. 23767, 7.90%↓) 41008 (vs. 41008, 0.00%) 2840005 (vs. 2840005, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 77175 (vs. 79957, 3.48%↓) 22096 (vs. 22096, 0.00%) 98557509 (vs. 98557509, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 42767 (vs. 44220, 3.29%↓) 10640 (vs. 10640, 0.00%) 992509460 (vs. 992509460, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 39720 (vs. 39343, 0.96%↑) 8928 (vs. 8928, 0.00%) 992515156 (vs. 992515156, 0.00%) 367 (vs. 367, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 211888 (vs. 220690, 3.99%↓) 1379904 (vs. 1379904, 0.00%) 27857989 (vs. 27857989, 0.00%) 2160 (vs. 2160, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 67742 (vs. 70870, 4.41%↓) 128848 (vs. 128848, 0.00%) 88158533 (vs. 88158533, 0.00%) 386 (vs. 386, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1273 (vs. 993, 28.20%↑) 3872 (vs. 3872, 0.00%) 271673 (vs. 271673, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2081 (vs. 2061, 0.97%↑) 4480 (vs. 4480, 0.00%) 273605 (vs. 273605, 0.00%) 4 (vs. 4, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1597 (vs. 1490, 7.18%↑) 6512 (vs. 6512, 0.00%) 275653 (vs. 275653, 0.00%) 4 (vs. 4, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 62759 (vs. 65722, 4.51%↓) 246320 (vs. 246320, 0.00%) 98597530 (vs. 98597530, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 63310 (vs. 71425, 11.36%↓) 246320 (vs. 246320, 0.00%) 98597530 (vs. 98597530, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 72473 (vs. 69284, 4.60%↑) 143012 (vs. 143012, 0.00%) 98493753 (vs. 98493753, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16,compile-stats] 94960 (vs. 104257, 8.92%↓) 3004452 (vs. 3004452, 0.00%) 52898580 (vs. 52898580, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 188108 (vs. 195068, 3.57%↓) 6756400 (vs. 6756400, 0.00%) 33056470 (vs. 33056470, 0.00%) 1102 (vs. 1102, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 65823 (vs. 68346, 3.69%↓) 143012 (vs. 143012, 0.00%) 98493881 (vs. 98493881, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16,compile-stats] 79979 (vs. 89144, 10.28%↓) 3008480 (vs. 3008480, 0.00%) 52905492 (vs. 52905492, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 188648 (vs. 193306, 2.41%↓) 6736144 (vs. 6736144, 0.00%) 33024150 (vs. 33024150, 0.00%) 1102 (vs. 1102, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 103157 (vs. 105628, 2.34%↓) 143012 (vs. 143012, 0.00%) 99716025 (vs. 99716025, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16,compile-stats] 122703 (vs. 126490, 2.99%↓) 3008480 (vs. 3008480, 0.00%) 54169300 (vs. 54169300, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 291684 (vs. 304314, 4.15%↓) 6736144 (vs. 6736144, 0.00%) 34937238 (vs. 34937238, 0.00%) 1102 (vs. 1102, 0.00%)
MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 21554 (vs. 21167, 1.83%↑) 208629 (vs. 208629, 0.00%) 14213630 (vs. 14213630, 0.00%) 172 (vs. 172, 0.00%)
MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 34016 (vs. 33107, 2.75%↑) 311349 (vs. 311349, 0.00%) 10551678 (vs. 10551678, 0.00%) 210 (vs. 210, 0.00%)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment