Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save iree-github-actions-bot/c8c305ed3a478fb275e22f4bd7d0103d to your computer and use it in GitHub Desktop.
Save iree-github-actions-bot/c8c305ed3a478fb275e22f4bd7d0103d to your computer and use it in GitHub Desktop.

Full Benchmark Summary

Data-Tiling Comparison Table

Name No-DT (baseline) DT-Only DT-UK
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 223.638 (1.0X) 136.319 (1.6X) 108.061 (2.1X)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 795.289 (1.0X) 277.674 (2.9X) 223.405 (3.6X)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 31.868 (1.0X) 36.733 (0.9X) 29.907 (1.1X)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 6.905 (1.0X) 9.323 (0.7X) 8.470 (0.8X)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 271.769 (1.0X) 257.914 (1.1X) 228.485 (1.2X)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.946 (1.0X) 36.094 (1.0X) 34.020 (1.1X)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 26.744 (1.0X) 51.504 (0.5X) 13.015 (2.1X)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.782 (1.0X) 10.973 (0.5X) 4.961 (1.2X)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 70.251 (1.0X) 39.142 (1.8X) 40.540 (1.7X)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.058 (1.0X) 8.522 (1.1X) 8.461 (1.1X)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 87.049 (1.0X) 41.961 (2.1X) 41.831 (2.1X)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.012 (1.0X) 8.960 (1.2X) 8.904 (1.2X)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.791 (1.0X) 78.770 (1.0X) 56.624 (1.4X)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.195 (1.0X) 15.342 (0.8X) 13.724 (0.9X)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 181.720 (1.0X) 249.378 (0.7X) 185.809 (1.0X)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.662 (1.0X) 65.461 (0.5X) 61.088 (0.6X)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 179.891 (1.0X) 258.015 (0.7X) 192.631 (0.9X)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.884 (1.0X) 65.805 (0.5X) 61.495 (0.6X)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 490.284 (1.0X) 1068.481 (0.5X) 213.945 (2.3X)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 66.370 (1.0X) 133.004 (0.5X) 62.010 (1.1X)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 25.303 (1.0X) 22.933 (1.1X) 17.381 (1.5X)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.558 (1.0X) 5.333 (0.9X) 4.525 (1.0X)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.745 (1.0X) 15.457 (0.8X) 11.705 (1.0X)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.717 (1.0X) 5.332 (0.7X) 4.836 (0.8X)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.516 (1.0X) 42.668 (0.5X) 11.825 (1.8X)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.812 (1.0X) 9.555 (0.6X) 5.359 (1.1X)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.769 (1.0X) 3.348 (0.8X) 2.726 (1.0X)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.875 (1.0X) 3.467 (0.8X) 2.833 (1.0X)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.316 (1.0X) 39.147 (0.9X) 31.209 (1.1X)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.382 (1.0X) 10.903 (0.8X) 9.744 (0.9X)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.699 (1.0X) 1.294 (0.5X) 0.568 (1.2X)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.767 (1.0X) 1.381 (0.6X) 0.631 (1.2X)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.471 (1.0X) 24.110 (0.7X) 19.243 (0.9X)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.094 (1.0X) 5.892 (0.7X) 5.134 (0.8X)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.566 (1.0X) 7.568 (1.0X) 7.599 (1.0X)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 48.447 (1.0X) 85.753 (0.6X) 43.804 (1.1X)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 50.367 (1.0X) 87.264 (0.6X) 44.505 (1.1X)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 30.256 (1.0X) 50.734 (0.6X) 27.658 (1.1X)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 92.684 (1.0X) 22.457 (4.1X) 21.462 (4.3X)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 93.512 (1.0X) 22.656 (4.1X) 22.088 (4.2X)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 52.179 (1.0X) 22.329 (2.3X) 21.787 (2.4X)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 141.728 (1.0X) 28.366 (5.0X) 27.134 (5.2X)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 139.420 (1.0X) 30.376 (4.6X) 29.124 (4.8X)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 76.489 (1.0X) 26.460 (2.9X) 26.670 (2.9X)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 698.097 (1.0X) 448.479 (1.6X) 349.676 (2.0X)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 705.542 (1.0X) 467.320 (1.5X) 362.855 (1.9X)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 399.205 (1.0X) 276.041 (1.4X) 216.754 (1.8X)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1127.034 (1.0X) 1115.804 (1.0X) 319.583 (3.5X)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1127.525 (1.0X) 1085.853 (1.0X) 314.203 (3.6X)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 582.697 (1.0X) 591.305 (1.0X) 188.041 (3.1X)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 2101.216 (1.0X) 1899.129 (1.1X) 302.231 (7.0X)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 2104.000 (1.0X) 1890.775 (1.1X) 299.558 (7.0X)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1129.740 (1.0X) 1076.847 (1.0X) 179.183 (6.3X)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 12.378 (1.0X) 14.459 (0.9X) 1.302 (9.5X)

Regressed Latencies 🚩

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 25.303 (vs. 23.662, 6.93%↑) 25.712 1.052
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 141.728 (vs. 134.324, 5.51%↑) 141.615 0.578

Improved Latencies 🎉

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 30.376 (vs. 32.913, 7.71%↓) 30.400 0.696
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 276.041 (vs. 296.368, 6.86%↓) 278.112 8.030
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 27.134 (vs. 29.036, 6.55%↓) 27.379 0.907
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 22.088 (vs. 23.582, 6.34%↓) 22.059 0.070
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 216.754 (vs. 231.079, 6.20%↓) 219.237 6.073
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 29.124 (vs. 30.976, 5.98%↓) 29.071 0.954
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 26.460 (vs. 28.113, 5.88%↓) 26.518 0.442
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 21.462 (vs. 22.791, 5.83%↓) 21.405 0.302
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 179.183 (vs. 190.278, 5.83%↓) 180.818 4.289
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 85.753 (vs. 90.917, 5.68%↓) 86.051 0.974
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 43.804 (vs. 46.414, 5.62%↓) 43.844 0.268
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 27.658 (vs. 29.284, 5.55%↓) 27.897 0.726
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 50.734 (vs. 53.661, 5.46%↓) 51.076 1.301
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 82.513 (vs. 87.142, 5.31%↓) 82.372 0.335
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1076.847 (vs. 1136.609, 5.26%↓) 1083.155 22.653
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 21.787 (vs. 22.972, 5.16%↓) 21.842 0.177
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1.302 (vs. 1.371, 5.00%↓) 1.307 0.012
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 24.110 (vs. 25.159, 4.17%↓) 24.100 0.102
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 36.733 (vs. 37.789, 2.79%↓) 36.669 0.178

Similar Latencies

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 15.457 (vs. 16.768, 7.82%↓) 15.452 0.111
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.558 (vs. 4.884, 6.66%↓) 4.551 0.040
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 72.123 (vs. 75.861, 4.93%↓) 72.209 0.605
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 698.097 (vs. 732.844, 4.74%↓) 701.740 9.719
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 93.469 (vs. 98.068, 4.69%↓) 93.424 0.381
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 399.205 (vs. 418.710, 4.66%↓) 400.641 8.790
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1129.740 (vs. 1184.488, 4.62%↓) 1134.340 19.085
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 467.320 (vs. 489.078, 4.45%↓) 469.972 8.110
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 76.489 (vs. 73.319, 4.32%↑) 76.508 0.226
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 271.769 (vs. 283.989, 4.30%↓) 272.128 9.035
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 349.676 (vs. 365.016, 4.20%↓) 350.748 5.598
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 22.656 (vs. 23.636, 4.15%↓) 22.664 0.085
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 30.256 (vs. 31.559, 4.13%↓) 30.436 0.805
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 795.289 (vs. 763.837, 4.12%↑) 785.090 48.660
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 448.479 (vs. 467.520, 4.07%↓) 449.007 6.222
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 28.366 (vs. 29.523, 3.92%↓) 28.548 0.987
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 299.558 (vs. 311.779, 3.92%↓) 300.315 2.663
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 302.231 (vs. 314.340, 3.85%↓) 303.412 3.833
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 591.305 (vs. 614.992, 3.85%↓) 594.686 14.320
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 22.329 (vs. 23.194, 3.73%↓) 22.384 0.202
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.316 (vs. 34.047, 3.73%↑) 35.228 0.426
EfficientNetV2STF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 4.294 (vs. 4.459, 3.71%↓) 4.265 0.079
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1085.853 (vs. 1126.804, 3.63%↓) 1092.053 12.635
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 44.505 (vs. 46.173, 3.61%↓) 44.728 0.465
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.705 (vs. 11.309, 3.50%↑) 11.706 0.092
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1890.775 (vs. 1956.109, 3.34%↓) 1890.170 29.871
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 26.670 (vs. 27.564, 3.24%↓) 26.770 0.335
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1899.129 (vs. 1959.226, 3.07%↓) 1908.177 24.059
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 48.447 (vs. 49.966, 3.04%↓) 48.599 0.830
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 705.542 (vs. 727.285, 2.99%↓) 707.306 7.755
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 139.420 (vs. 135.388, 2.98%↑) 139.262 0.380
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 582.697 (vs. 600.416, 2.95%↓) 584.720 8.464
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 3.348 (vs. 3.446, 2.86%↓) 3.330 0.056
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.726 (vs. 2.651, 2.83%↑) 2.719 0.033
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 102.995 (vs. 105.855, 2.70%↓) 102.962 0.338
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 91.415 (vs. 93.895, 2.64%↓) 91.409 0.285
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 40.540 (vs. 39.529, 2.56%↑) 40.034 1.253
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 14.459 (vs. 14.839, 2.56%↓) 14.485 0.090
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 362.855 (vs. 371.815, 2.41%↓) 363.879 3.177
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.381 (vs. 17.797, 2.34%↓) 17.315 0.163
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 223.638 (vs. 218.649, 2.28%↑) 218.628 13.927
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 72.646 (vs. 74.276, 2.19%↓) 72.390 0.776
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 50.367 (vs. 51.481, 2.16%↓) 50.332 0.591
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.745 (vs. 11.510, 2.04%↑) 11.641 0.243
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 19.243 (vs. 18.863, 2.01%↑) 19.204 0.087
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 13.724 (vs. 14.003, 1.99%↓) 13.668 0.146
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 188.041 (vs. 191.789, 1.95%↓) 188.473 3.789
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 15.342 (vs. 15.637, 1.89%↓) 15.298 0.126
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 82.089 (vs. 83.658, 1.88%↓) 82.604 1.427
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.744 (vs. 9.908, 1.65%↓) 9.728 0.042
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 87.264 (vs. 88.707, 1.63%↓) 87.800 1.308
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.875 (vs. 2.918, 1.46%↓) 2.871 0.015
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 52.179 (vs. 52.943, 1.44%↓) 52.154 0.203
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.884 (vs. 34.376, 1.43%↓) 33.343 0.922
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.058 (vs. 9.177, 1.30%↓) 8.839 0.424
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.961 (vs. 5.025, 1.27%↓) 4.964 0.013
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 314.203 (vs. 318.235, 1.27%↓) 313.264 4.189
matmul\_128x256x8192\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.047 (vs. 0.048, 1.23%↓) 0.047 0.000
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 78.910 (vs. 79.881, 1.21%↓) 78.576 0.798
MiniLML12H384Uncased(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.602 (vs. 1.621, 1.20%↓) 1.602 0.009
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.094 (vs. 4.144, 1.19%↓) 4.092 0.012
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.769 (vs. 2.802, 1.18%↓) 2.763 0.018
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.836 (vs. 4.892, 1.16%↓) 4.834 0.014
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 319.583 (vs. 315.993, 1.14%↑) 320.086 2.667
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.359 (vs. 5.419, 1.10%↓) 5.359 0.016
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 179.891 (vs. 181.874, 1.09%↓) 178.876 2.435
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 122.821 (vs. 124.163, 1.08%↓) 122.817 0.772
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1115.804 (vs. 1127.788, 1.06%↓) 1115.802 7.681
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.662 (vs. 34.022, 1.06%↓) 33.175 0.905
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 39.147 (vs. 39.563, 1.05%↓) 38.925 0.502
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.332 (vs. 5.387, 1.02%↓) 5.330 0.012
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 87.049 (vs. 87.930, 1.00%↓) 87.378 3.015
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 2104.000 (vs. 2124.765, 0.98%↓) 2104.192 2.119
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.020 (vs. 34.355, 0.97%↓) 33.792 0.354
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 108.061 (vs. 107.052, 0.94%↑) 107.426 1.087
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 257.914 (vs. 260.322, 0.93%↓) 255.810 4.442
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.525 (vs. 4.567, 0.92%↓) 4.528 0.027
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.195 (vs. 12.307, 0.91%↓) 11.996 0.301
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.522 (vs. 8.448, 0.88%↑) 8.500 0.092
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 22.933 (vs. 23.135, 0.87%↓) 22.849 0.204
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 39.142 (vs. 39.453, 0.79%↓) 38.465 1.641
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.461 (vs. 8.526, 0.77%↓) 8.414 0.074
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 41.961 (vs. 42.285, 0.77%↓) 41.335 1.427
matmul\_2560x2560x2560\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.141 (vs. 0.142, 0.77%↓) 0.141 0.001
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 92.684 (vs. 93.394, 0.76%↓) 92.737 0.158
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.471 (vs. 17.604, 0.76%↓) 17.452 0.052
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.555 (vs. 9.626, 0.73%↓) 9.557 0.015
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.134 (vs. 5.171, 0.71%↓) 5.127 0.014
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 61.088 (vs. 61.520, 0.70%↓) 60.789 0.528
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 249.378 (vs. 251.133, 0.70%↓) 247.413 3.613
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 6.905 (vs. 6.954, 0.70%↓) 6.909 0.052
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 65.805 (vs. 66.263, 0.69%↓) 65.515 0.528
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.767 (vs. 0.772, 0.69%↓) 0.771 0.008
matmul\_3456x1024x2048\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.131 (vs. 0.130, 0.68%↑) 0.130 0.002
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.467 (vs. 3.490, 0.68%↓) 3.464 0.039
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.382 (vs. 8.438, 0.67%↓) 8.357 0.049
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 181.720 (vs. 180.517, 0.67%↑) 180.683 2.908
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 26.744 (vs. 26.920, 0.66%↓) 26.808 0.226
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 192.631 (vs. 191.392, 0.65%↑) 190.740 3.317
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1127.525 (vs. 1120.289, 0.65%↑) 1127.388 1.280
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 136.319 (vs. 137.197, 0.64%↓) 135.675 1.381
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 12.378 (vs. 12.457, 0.64%↓) 12.382 0.089
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.782 (vs. 5.819, 0.64%↓) 5.780 0.013
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 66.370 (vs. 66.786, 0.62%↓) 66.106 0.505
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 1.294 (vs. 1.302, 0.61%↓) 1.292 0.008
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 36.094 (vs. 36.316, 0.61%↓) 36.018 0.286
matmul\_128x256x8192\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.025 (vs. 0.026, 0.61%↓) 0.025 0.000
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.812 (vs. 5.845, 0.57%↓) 5.806 0.017
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 42.668 (vs. 42.913, 0.57%↓) 42.615 0.160
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.323 (vs. 9.376, 0.56%↓) 9.336 0.028
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 277.674 (vs. 276.254, 0.51%↑) 277.563 1.502
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.631 (vs. 0.635, 0.51%↓) 0.631 0.001
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.825 (vs. 11.884, 0.50%↓) 11.841 0.055
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.333 (vs. 5.307, 0.50%↑) 5.330 0.022
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.568 (vs. 0.571, 0.49%↓) 0.568 0.001
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 61.495 (vs. 61.794, 0.48%↓) 61.314 0.503
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 93.512 (vs. 93.957, 0.47%↓) 93.501 0.079
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 78.770 (vs. 79.145, 0.47%↓) 77.956 1.331
matmul\_2562x2564x2562\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.108 (vs. 1.113, 0.42%↓) 1.108 0.000
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.470 (vs. 8.504, 0.40%↓) 8.463 0.040
matmul\_2564x2564x2564\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.904 (vs. 0.908, 0.40%↓) 0.904 0.000
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.516 (vs. 21.601, 0.39%↓) 21.515 0.054
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1127.034 (vs. 1122.759, 0.38%↑) 1126.635 1.201
BertLargeTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 10.720 (vs. 10.682, 0.36%↑) 10.723 0.015
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 10.903 (vs. 10.942, 0.36%↓) 10.897 0.036
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 22.457 (vs. 22.535, 0.35%↓) 22.368 0.256
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 56.624 (vs. 56.428, 0.35%↑) 56.271 0.930
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.599 (vs. 7.574, 0.33%↑) 7.594 0.026
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.568 (vs. 7.593, 0.33%↓) 7.564 0.013
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 31.209 (vs. 31.311, 0.32%↓) 31.124 0.328
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 65.461 (vs. 65.660, 0.30%↓) 65.077 0.654
matmul\_123x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.222 (vs. 0.223, 0.29%↓) 0.222 0.000
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 62.010 (vs. 62.175, 0.27%↓) 61.868 0.373
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 258.015 (vs. 258.699, 0.26%↓) 256.343 3.343
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 13.015 (vs. 12.982, 0.26%↑) 13.000 0.051
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.904 (vs. 8.927, 0.26%↓) 8.880 0.072
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 29.907 (vs. 29.840, 0.22%↑) 29.850 0.134
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.717 (vs. 3.725, 0.21%↓) 3.711 0.018
matmul\_3456x1024x2048\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.063 (vs. 0.063, 0.21%↓) 0.063 0.000
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 51.504 (vs. 51.610, 0.21%↓) 51.487 0.118
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 228.485 (vs. 228.035, 0.20%↑) 226.501 4.488
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 1.381 (vs. 1.378, 0.20%↑) 1.383 0.007
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 223.405 (vs. 223.008, 0.18%↑) 223.250 1.979
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.791 (vs. 79.656, 0.17%↑) 79.261 1.351
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 490.284 (vs. 489.485, 0.16%↑) 489.130 2.146
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 185.809 (vs. 186.112, 0.16%↓) 184.492 2.893
MobileBertSquad\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] 258.914 (vs. 259.321, 0.16%↓) 258.897 0.650
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 41.831 (vs. 41.894, 0.15%↓) 41.078 1.619
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 1068.481 (vs. 1070.026, 0.14%↓) 1068.176 3.338
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.833 (vs. 2.837, 0.14%↓) 2.825 0.038
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 133.004 (vs. 132.815, 0.14%↑) 132.698 1.126
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.699 (vs. 0.698, 0.12%↑) 0.699 0.001
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.566 (vs. 7.556, 0.12%↑) 7.569 0.009
MobileNetV3Small\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 1802.227 (vs. 1804.460, 0.12%↓) 1800.974 2.834
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.892 (vs. 5.899, 0.12%↓) 5.892 0.015
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 213.945 (vs. 213.739, 0.10%↑) 213.465 1.324
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.946 (vs. 35.915, 0.09%↑) 35.289 1.030
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 10.973 (vs. 10.982, 0.09%↓) 10.970 0.017
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 70.251 (vs. 70.311, 0.08%↓) 70.303 2.547
BertForMaskedLMTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 6.926 (vs. 6.931, 0.08%↓) 6.908 0.053
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 2101.216 (vs. 2102.974, 0.08%↓) 2100.867 1.873
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.960 (vs. 8.955, 0.07%↑) 8.936 0.082
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.012 (vs. 11.016, 0.04%↓) 10.643 0.614
MobileBertSquad\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] 259.273 (vs. 259.183, 0.03%↑) 259.342 0.759
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 31.868 (vs. 31.858, 0.03%↑) 31.800 0.111
MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7903.512 (vs. 7901.060, 0.03%↑) 7905.096 5.073
matmul\_2560x2560x2560\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.303 (vs. 0.303, 0.03%↑) 0.303 0.000
matmul\_2562x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.534 (vs. 1.534, 0.02%↑) 1.534 0.000

All Compilation Metrics

Benchmark Name Compilation Time (ms) Total Dispatch Size (bytes) Total Artifact Size (bytes) Stream IR Dispatch Count (# of cmd.dispatch ops)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1643 (vs. 1210, 35.79%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 25212 (vs. 19941, 26.43%↑) 144032 (vs. 144032, 0.00%) 399877 (vs. 399877, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 42985 (vs. 43111, 0.29%↓) 238720 (vs. 238720, 0.00%) 10458245 (vs. 10458245, 0.00%) 97 (vs. 97, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 37808 (vs. 36379, 3.93%↑) 177680 (vs. 177680, 0.00%) 2959045 (vs. 2959045, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 62174 (vs. 53500, 16.21%↑) 680016 (vs. 680016, 0.00%) 5603397 (vs. 5603397, 0.00%) 89 (vs. 89, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 22962 (vs. 22332, 2.82%↑) 174400 (vs. 174400, 0.00%) 17094405 (vs. 17094405, 0.00%) 51 (vs. 51, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 37676 (vs. 34416, 9.47%↑) 190096 (vs. 190096, 0.00%) 14173189 (vs. 14173189, 0.00%) 74 (vs. 74, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 68246 (vs. 72415, 5.76%↓) 569008 (vs. 569008, 0.00%) 4219717 (vs. 4219717, 0.00%) 144 (vs. 144, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 49668 (vs. 45779, 8.50%↑) 287600 (vs. 287600, 0.00%) 18229253 (vs. 18229253, 0.00%) 124 (vs. 124, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 23822 (vs. 21940, 8.58%↑) 142288 (vs. 142288, 0.00%) 5195653 (vs. 5195653, 0.00%) 48 (vs. 48, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 69056 (vs. 67413, 2.44%↑) 84496 (vs. 84496, 0.00%) 99926661 (vs. 99926661, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 70232 (vs. 70507, 0.39%↓) 93056 (vs. 93056, 0.00%) 98443077 (vs. 98443077, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 294611 (vs. 282346, 4.34%↑) 5843296 (vs. 5843296, 0.00%) 32142341 (vs. 32142341, 0.00%) 1102 (vs. 1102, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 101953 (vs. 99983, 1.97%↑) 216304 (vs. 216304, 0.00%) 164497708 (vs. 164497708, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 28926 (vs. 27293, 5.98%↑) 59344 (vs. 59344, 0.00%) 133996207 (vs. 133996207, 0.00%) 209 (vs. 209, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 40202 (vs. 30765, 30.67%↑) 27920 (vs. 27920, 0.00%) 652755092 (vs. 652755092, 0.00%) 233 (vs. 233, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 34706 (vs. 32754, 5.96%↑) 14736 (vs. 14736, 0.00%) 652736084 (vs. 652736084, 0.00%) 246 (vs. 246, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 37456 (vs. 40118, 6.64%↓) 68608 (vs. 68608, 0.00%) 533841087 (vs. 533841087, 0.00%) 212 (vs. 212, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 36192 (vs. 35506, 1.93%↑) 50128 (vs. 50128, 0.00%) 1336025023 (vs. 1336025023, 0.00%) 413 (vs. 413, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 1533 (vs. 1242, 23.43%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 27156 (vs. 24775, 9.61%↑) 107248 (vs. 107248, 0.00%) 371397 (vs. 371397, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 35253 (vs. 29826, 18.20%↑) 96944 (vs. 96944, 0.00%) 10398725 (vs. 10398725, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 32228 (vs. 30264, 6.49%↑) 113392 (vs. 113392, 0.00%) 2921349 (vs. 2921349, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 50400 (vs. 44125, 14.22%↑) 269120 (vs. 269120, 0.00%) 5219205 (vs. 5219205, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 17950 (vs. 18313, 1.98%↓) 60288 (vs. 60288, 0.00%) 17017925 (vs. 17017925, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 33580 (vs. 26540, 26.53%↑) 95248 (vs. 95248, 0.00%) 14133765 (vs. 14133765, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 56973 (vs. 59798, 4.72%↓) 330176 (vs. 330176, 0.00%) 4003781 (vs. 4003781, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 47514 (vs. 40500, 17.32%↑) 136080 (vs. 136080, 0.00%) 18361413 (vs. 18361413, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 17064 (vs. 13298, 28.32%↑) 44208 (vs. 44208, 0.00%) 5147973 (vs. 5147973, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 91553 (vs. 89554, 2.23%↑) 53488 (vs. 53488, 0.00%) 100084229 (vs. 100084229, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 89438 (vs. 85824, 4.21%↑) 53920 (vs. 53920, 0.00%) 98589445 (vs. 98589445, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 254362 (vs. 241699, 5.24%↑) 2610608 (vs. 2610608, 0.00%) 29088709 (vs. 29088709, 0.00%) 2160 (vs. 2160, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 112692 (vs. 110592, 1.90%↑) 154864 (vs. 154864, 0.00%) 169910060 (vs. 169910060, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 38786 (vs. 33173, 16.92%↑) 36480 (vs. 36480, 0.00%) 219477487 (vs. 219477487, 0.00%) 342 (vs. 342, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 49854 (vs. 46847, 6.42%↑) 18224 (vs. 18224, 0.00%) 992542996 (vs. 992542996, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 41886 (vs. 39370, 6.39%↑) 11392 (vs. 11392, 0.00%) 992539860 (vs. 992539860, 0.00%) 355 (vs. 355, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 44516 (vs. 49523, 10.11%↓) 38336 (vs. 38336, 0.00%) 875869119 (vs. 875869119, 0.00%) 346 (vs. 346, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 52896 (vs. 47718, 10.85%↑) 41632 (vs. 41632, 0.00%) 1336061759 (vs. 1336061759, 0.00%) 678 (vs. 678, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1719 (vs. 1174, 46.42%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 26529 (vs. 22806, 16.32%↑) 99488 (vs. 99488, 0.00%) 363653 (vs. 363653, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 37740 (vs. 36806, 2.54%↑) 98656 (vs. 98656, 0.00%) 10400453 (vs. 10400453, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 33850 (vs. 33643, 0.62%↑) 117504 (vs. 117504, 0.00%) 2925445 (vs. 2925445, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 50187 (vs. 47436, 5.80%↑) 257968 (vs. 257968, 0.00%) 5208069 (vs. 5208069, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 19176 (vs. 20413, 6.06%↓) 65376 (vs. 65376, 0.00%) 17023045 (vs. 17023045, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 33614 (vs. 26609, 26.33%↑) 99104 (vs. 99104, 0.00%) 14137669 (vs. 14137669, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 62014 (vs. 51935, 19.41%↑) 323920 (vs. 323920, 0.00%) 3997509 (vs. 3997509, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 48168 (vs. 42311, 13.84%↑) 122656 (vs. 122656, 0.00%) 18348037 (vs. 18348037, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 18427 (vs. 18004, 2.35%↑) 47200 (vs. 47200, 0.00%) 5150981 (vs. 5150981, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 91102 (vs. 82836, 9.98%↑) 39040 (vs. 39040, 0.00%) 100069765 (vs. 100069765, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 89005 (vs. 78734, 13.05%↑) 39456 (vs. 39456, 0.00%) 98574981 (vs. 98574981, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 245365 (vs. 236664, 3.68%↑) 2600320 (vs. 2600320, 0.00%) 29078405 (vs. 29078405, 0.00%) 2160 (vs. 2160, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 113532 (vs. 108994, 4.16%↑) 140240 (vs. 140240, 0.00%) 169895468 (vs. 169895468, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 37756 (vs. 38401, 1.68%↓) 32480 (vs. 32480, 0.00%) 219473455 (vs. 219473455, 0.00%) 342 (vs. 342, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 50982 (vs. 48154, 5.87%↑) 18784 (vs. 18784, 0.00%) 992543572 (vs. 992543572, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 50341 (vs. 48473, 3.85%↑) 11280 (vs. 11280, 0.00%) 992539796 (vs. 992539796, 0.00%) 355 (vs. 355, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 45142 (vs. 52879, 14.63%↓) 33072 (vs. 33072, 0.00%) 875863871 (vs. 875863871, 0.00%) 346 (vs. 346, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 61193 (vs. 48047, 27.36%↑) 37328 (vs. 37328, 0.00%) 1336057407 (vs. 1336057407, 0.00%) 678 (vs. 678, 0.00%)
EfficientNetV2STF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 92991 (vs. 95171, 2.29%↓) 843788 (vs. 843796, 0.00%↓) 165158548 (vs. 165158548, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 25389 (vs. 25792, 1.56%↓) 145080 (vs. 145080, 0.00%) 134088303 (vs. 134088303, 0.00%) 209 (vs. 209, 0.00%)
BertForMaskedLMTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 35222 (vs. 30227, 16.52%↑) 227472 (vs. 227480, 0.00%↓) 534005346 (vs. 534005410, 0.00%↓) 212 (vs. 212, 0.00%)
BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 39370 (vs. 27654, 42.37%↑) 130696 (vs. 130704, 0.01%↓) 1336110763 (vs. 1336110827, 0.00%↓) 413 (vs. 413, 0.00%)
matmul_3456x1024x2048_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1673 (vs. 1305, 28.20%↑) 30556 (vs. 30556, 0.00%) 41279 (vs. 41279, 0.00%) 1 (vs. 1, 0.00%)
matmul_3456x1024x2048_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1993 (vs. 1615, 23.41%↑) 45028 (vs. 45028, 0.00%) 55751 (vs. 55751, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1783 (vs. 1273, 40.06%↑) 28524 (vs. 28524, 0.00%) 39183 (vs. 39183, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1883 (vs. 1550, 21.48%↑) 41404 (vs. 41404, 0.00%) 52063 (vs. 52063, 0.00%) 1 (vs. 1, 0.00%)
matmul_2564x2564x2564_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2221 (vs. 1946, 14.13%↑) 86844 (vs. 86844, 0.00%) 97503 (vs. 97503, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2564x2562_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2356 (vs. 1876, 25.59%↑) 89724 (vs. 89724, 0.00%) 100447 (vs. 100447, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2795 (vs. 2713, 3.02%↑) 84196 (vs. 84288, 0.11%↓) 94919 (vs. 95011, 0.10%↓) 1 (vs. 1, 0.00%)
matmul_123x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1970 (vs. 1869, 5.40%↑) 51232 (vs. 51328, 0.19%↓) 61954 (vs. 62050, 0.15%↓) 1 (vs. 1, 0.00%)
matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 984 (vs. 772, 27.46%↑) 8828 (vs. 8828, 0.00%) 26181 (vs. 26181, 0.00%) 2 (vs. 2, 0.00%)
matmul_128x256x8192_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 1081 (vs. 714, 51.40%↑) 9784 (vs. 9784, 0.00%) 27137 (vs. 27137, 0.00%) 2 (vs. 2, 0.00%)
MiniLML12H384Uncased(stablehlo) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 33547 (vs. 26165, 28.21%↑) 50648 (vs. 50648, 0.00%) 133987505 (vs. 133987505, 0.00%) 209 (vs. 209, 0.00%)
DeepLabV3_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 20512 (vs. 18340, 11.84%↑) 42840 (vs. 42840, 0.00%) 2824263 (vs. 2824263, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 40497 (vs. 35744, 13.30%↑) 185480 (vs. 185480, 0.00%) 5108871 (vs. 5108871, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 73134 (vs. 60983, 19.93%↑) 49416 (vs. 49416, 0.00%) 98399431 (vs. 98399431, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 253597 (vs. 237746, 6.67%↑) 2108280 (vs. 2108280, 0.00%) 28407303 (vs. 28407303, 0.00%) 1102 (vs. 1102, 0.00%)
MobileNetV1_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 15318 (vs. 15846, 3.33%↓) 50984 (vs. 50984, 0.00%) 16976263 (vs. 16976263, 0.00%) 65 (vs. 65, 0.00%)
MobileNetV2_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 52738 (vs. 54855, 3.86%↓) 217016 (vs. 217016, 0.00%) 3867719 (vs. 3867719, 0.00%) 144 (vs. 144, 0.00%)
PersonDetect_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 18227 (vs. 17596, 3.59%↑) 58936 (vs. 58936, 0.00%) 314759 (vs. 314759, 0.00%) 60 (vs. 60, 0.00%)
EfficientNet_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 53884 (vs. 47627, 13.14%↑) 390748 (vs. 390748, 0.00%) 5314183 (vs. 5314183, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 290888 (vs. 272210, 6.86%↑) 2116860 (vs. 2116860, 0.00%) 28415879 (vs. 28415879, 0.00%) 1102 (vs. 1102, 0.00%)
PersonDetect_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 21394 (vs. 18975, 12.75%↑) 124428 (vs. 124428, 0.00%) 380231 (vs. 380231, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV2_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 60819 (vs. 60897, 0.13%↓) 328664 (vs. 328664, 0.00%) 3979399 (vs. 3979399, 0.00%) 144 (vs. 144, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 22853 (vs. 20772, 10.02%↑) 56176 (vs. 56176, 0.00%) 2837573 (vs. 2837573, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 51509 (vs. 45479, 13.26%↑) 32912 (vs. 32912, 0.00%) 98382917 (vs. 98382917, 0.00%) 704 (vs. 704, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 38229 (vs. 30164, 26.74%↑) 20320 (vs. 20320, 0.00%) 652747476 (vs. 652747476, 0.00%) 233 (vs. 233, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 36903 (vs. 36959, 0.15%↓) 9024 (vs. 9024, 0.00%) 652730324 (vs. 652730324, 0.00%) 246 (vs. 246, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 285705 (vs. 279563, 2.20%↑) 4388208 (vs. 4387664, 0.01%↑) 30687237 (vs. 30686661, 0.00%↑) 1102 (vs. 1102, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 107362 (vs. 104742, 2.50%↑) 694752 (vs. 694752, 0.00%) 88728773 (vs. 88728773, 0.00%) 268 (vs. 268, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 31261 (vs. 22337, 39.95%↑) 50080 (vs. 50080, 0.00%) 2849093 (vs. 2849093, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 87808 (vs. 84245, 4.23%↑) 21520 (vs. 21520, 0.00%) 98556933 (vs. 98556933, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 45825 (vs. 39190, 16.93%↑) 11552 (vs. 11552, 0.00%) 992511764 (vs. 992511764, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 48209 (vs. 49829, 3.25%↓) 9120 (vs. 9120, 0.00%) 992513044 (vs. 992513044, 0.00%) 355 (vs. 355, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 224275 (vs. 216438, 3.62%↑) 1373856 (vs. 1373344, 0.04%↑) 27851973 (vs. 27851461, 0.00%↑) 2160 (vs. 2160, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 82719 (vs. 73403, 12.69%↑) 121904 (vs. 121904, 0.00%) 88145797 (vs. 88145797, 0.00%) 375 (vs. 375, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 25201 (vs. 21352, 18.03%↑) 40992 (vs. 40992, 0.00%) 2840005 (vs. 2840005, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 78173 (vs. 73663, 6.12%↑) 22064 (vs. 22064, 0.00%) 98557509 (vs. 98557509, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 43285 (vs. 43104, 0.42%↑) 10720 (vs. 10720, 0.00%) 992510932 (vs. 992510932, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 41467 (vs. 37380, 10.93%↑) 8688 (vs. 8688, 0.00%) 992512596 (vs. 992512596, 0.00%) 355 (vs. 355, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 226303 (vs. 215878, 4.83%↑) 1380992 (vs. 1380480, 0.04%↑) 27859077 (vs. 27858565, 0.00%↑) 2160 (vs. 2160, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 73138 (vs. 66759, 9.56%↑) 124400 (vs. 124400, 0.00%) 88148293 (vs. 88148293, 0.00%) 375 (vs. 375, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1202 (vs. 977, 23.03%↑) 3872 (vs. 3872, 0.00%) 271673 (vs. 271673, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2074 (vs. 1649, 25.77%↑) 4480 (vs. 4480, 0.00%) 273605 (vs. 273605, 0.00%) 4 (vs. 4, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1549 (vs. 1343, 15.34%↑) 6496 (vs. 6496, 0.00%) 275653 (vs. 275653, 0.00%) 4 (vs. 4, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 64657 (vs. 65865, 1.83%↓) 246320 (vs. 246320, 0.00%) 98597530 (vs. 98597530, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 74255 (vs. 79367, 6.44%↓) 246320 (vs. 246320, 0.00%) 98597530 (vs. 98597530, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 73943 (vs. 68790, 7.49%↑) 143012 (vs. 143012, 0.00%) 98493753 (vs. 98493753, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16,compile-stats] 101511 (vs. 106586, 4.76%↓) 3004452 (vs. 3004452, 0.00%) 52898580 (vs. 52898580, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 193126 (vs. 179562, 7.55%↑) 6756400 (vs. 6756400, 0.00%) 33056470 (vs. 33056470, 0.00%) 1102 (vs. 1102, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 70421 (vs. 59468, 18.42%↑) 143012 (vs. 143012, 0.00%) 98493881 (vs. 98493881, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16,compile-stats] 86510 (vs. 80866, 6.98%↑) 3008480 (vs. 3008480, 0.00%) 52905492 (vs. 52905492, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 195935 (vs. 180213, 8.72%↑) 6736144 (vs. 6736144, 0.00%) 33024150 (vs. 33024150, 0.00%) 1102 (vs. 1102, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 110243 (vs. 105197, 4.80%↑) 143012 (vs. 143012, 0.00%) 99716025 (vs. 99716025, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16,compile-stats] 128105 (vs. 120775, 6.07%↑) 3008480 (vs. 3008480, 0.00%) 54169300 (vs. 54169300, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 301704 (vs. 290501, 3.86%↑) 6736144 (vs. 6736144, 0.00%) 34937238 (vs. 34937238, 0.00%) 1102 (vs. 1102, 0.00%)
MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 21542 (vs. 21938, 1.81%↓) 208629 (vs. 208629, 0.00%) 14213630 (vs. 14213630, 0.00%) 172 (vs. 172, 0.00%)
MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 33222 (vs. 27576, 20.47%↑) 311349 (vs. 311349, 0.00%) 10551678 (vs. 10551678, 0.00%) 210 (vs. 210, 0.00%)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment