Full Benchmark Summary

Data-Tiling Comparison Table

| Name | No-DT (baseline) | DT-Only | DT-UK |
| --- | --- | --- | --- |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 227.422 (1.0X) | N/A | 104.759 (2.2X) |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 800.833 (1.0X) | N/A | 220.898 (3.6X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 6.949 (1.0X) | N/A | 8.481 (0.8X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 32.083 (1.0X) | N/A | 30.052 (1.1X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 35.857 (1.0X) | N/A | 34.199 (1.0X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 276.340 (1.0X) | N/A | 228.502 (1.2X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.830 (1.0X) | N/A | 4.972 (1.2X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 26.953 (1.0X) | N/A | 13.070 (2.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 9.304 (1.0X) | N/A | 8.780 (1.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 70.037 (1.0X) | N/A | 40.104 (1.7X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.035 (1.0X) | N/A | 8.603 (1.3X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 87.927 (1.0X) | N/A | 41.792 (2.1X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 12.206 (1.0X) | N/A | 12.947 (0.9X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 79.243 (1.0X) | N/A | 57.226 (1.4X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.086 (1.0X) | N/A | 60.991 (0.6X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 179.738 (1.0X) | N/A | 186.406 (1.0X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 33.967 (1.0X) | N/A | 61.386 (0.6X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 180.522 (1.0X) | N/A | 191.305 (0.9X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 66.799 (1.0X) | N/A | 61.883 (1.1X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 490.199 (1.0X) | N/A | 214.419 (2.3X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.766 (1.0X) | N/A | 4.551 (1.0X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 24.913 (1.0X) | N/A | 17.781 (1.4X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 3.727 (1.0X) | N/A | 4.897 (0.8X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 12.017 (1.0X) | N/A | 11.341 (1.1X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.868 (1.0X) | N/A | 5.420 (1.1X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 21.694 (1.0X) | N/A | 11.804 (1.8X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.867 (1.0X) | N/A | 2.820 (1.0X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.788 (1.0X) | N/A | 2.718 (1.0X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.533 (1.0X) | N/A | 9.835 (0.9X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.735 (1.0X) | N/A | 31.670 (1.1X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.766 (1.0X) | N/A | 0.632 (1.2X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.697 (1.0X) | N/A | 0.571 (1.2X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.139 (1.0X) | N/A | 5.136 (0.8X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.716 (1.0X) | N/A | 19.086 (0.9X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.563 (1.0X) | N/A | 7.565 (1.0X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 48.441 (1.0X) | N/A | 43.501 (1.1X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 50.244 (1.0X) | N/A | 44.233 (1.1X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 29.508 (1.0X) | N/A | 27.670 (1.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 92.046 (1.0X) | N/A | 21.410 (4.3X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 93.200 (1.0X) | N/A | 21.640 (4.3X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 52.366 (1.0X) | N/A | 22.027 (2.4X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 136.808 (1.0X) | N/A | 27.352 (5.0X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 138.163 (1.0X) | N/A | 28.868 (4.8X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 76.570 (1.0X) | N/A | 26.510 (2.9X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 702.075 (1.0X) | N/A | 350.395 (2.0X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 712.406 (1.0X) | N/A | 359.557 (2.0X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 394.100 (1.0X) | N/A | 216.485 (1.8X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 1114.063 (1.0X) | N/A | 304.435 (3.7X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1115.914 (1.0X) | N/A | 303.993 (3.7X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 575.180 (1.0X) | N/A | 181.078 (3.2X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 2103.239 (1.0X) | N/A | 303.562 (6.9X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 2106.533 (1.0X) | N/A | 306.298 (6.9X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1125.398 (1.0X) | N/A | 184.552 (6.1X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 12.127 (1.0X) | N/A | 1.302 (9.3X) |
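
The factors in parentheses in the table above appear to be the baseline (No-DT) latency divided by the DT-UK latency, rounded to one decimal place, so values below 1.0X mean DT-UK was slower than the baseline. The following is a minimal sketch of that arithmetic, not the report generator's actual code; the function name `speedup` is hypothetical.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> str:
    """Return the baseline/candidate latency ratio formatted like the table, e.g. '2.2X'.

    Assumption: the report divides the No-DT latency by the DT-UK latency
    and rounds to one decimal place. This matches the rows checked below.
    """
    return f"{baseline_ms / candidate_ms:.1f}X"


# Latencies copied from the table above.
print(speedup(227.422, 104.759))  # BertForMaskedLMTF, x86_64, 30-thread
print(speedup(6.949, 8.481))      # DeepLabV3_fp32, x86_64, 8-thread
print(speedup(12.127, 1.302))     # matmul_256x256x2048, pixel-6-pro
```

These three calls reproduce the 2.2X, 0.8X, and 9.3X entries shown for those rows.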

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 111.133 (vs. 89.571, 24.07%↑) | 111.922 | 2.049 |
| MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 89.208 (vs. 82.139, 8.61%↑) | 87.805 | 3.318 |
| MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 75.411 (vs. 69.460, 8.57%↑) | 75.434 | 0.719 |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 138.163 (vs. 127.510, 8.35%↑) | 138.003 | 0.520 |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 76.570 (vs. 71.461, 7.15%↑) | 76.401 | 0.563 |

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 21.640 (vs. 23.601, 8.31%↓) | 21.634 | 0.044 |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 28.868 (vs. 31.010, 6.91%↓) | 28.976 | 0.736 |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 27.352 (vs. 29.065, 5.89%↓) | 27.497 | 0.782 |
| DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 27.670 (vs. 29.376, 5.81%↓) | 27.904 | 0.713 |
| DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 29.508 (vs. 31.287, 5.69%↓) | 29.678 | 0.845 |
| MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 216.485 (vs. 229.406, 5.63%↓) | 219.614 | 6.894 |
| Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 184.552 (vs. 194.515, 5.12%↓) | 185.248 | 4.608 |
| Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 303.562 (vs. 319.614, 5.02%↓) | 303.939 | 3.560 |
| MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 24.913 (vs. 26.012, 4.22%↓) | 24.885 | 0.248 |
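
The "(vs. X, P%↑/↓)" annotations in the latency tables are consistent with a relative change of the current average latency against the previous one. The following is a rough sketch under that assumption; `percent_change` and `annotate` are hypothetical names, and recomputing from the rounded values printed here may differ in the last digit for some rows, since the report presumably works from unrounded measurements.

```python
def percent_change(current_ms: float, previous_ms: float) -> float:
    """Relative change of current vs. previous latency, in percent.

    Positive means the latency went up (slower); negative means it went down.
    """
    return (current_ms - previous_ms) / previous_ms * 100.0


def annotate(current_ms: float, previous_ms: float) -> str:
    """Format a latency cell in the style of the tables above (assumed format)."""
    change = percent_change(current_ms, previous_ms)
    arrow = "↑" if change > 0 else "↓"
    return f"{current_ms:.3f} (vs. {previous_ms:.3f}, {abs(change):.2f}%{arrow})"


# Values copied from the regressed and improved tables.
print(annotate(111.133, 89.571))  # MobileBertSquad_fp16, Vulkan, pixel-6-pro
print(annotate(21.640, 23.601))   # GPT2_117M_TF_1X1XI32, dt-uk, 1-thread
```

For these two rows the sketch reproduces the cells shown in the report: a 24.07% increase and an 8.31% decrease.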

Similar Latencies

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 227.422 (vs. 208.451, 9.10%↑) 222.179 14.891
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 800.833 (vs. 744.697, 7.54%↑) 790.668 39.152
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 22.027 (vs. 23.141, 4.81%↓) 22.043 0.193
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 21.410 (vs. 22.493, 4.81%↓) 21.314 0.281
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 394.100 (vs. 413.714, 4.74%↓) 394.682 8.697
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 181.078 (vs. 190.064, 4.73%↓) 182.128 4.005
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 44.233 (vs. 46.341, 4.55%↓) 44.240 0.324
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 359.557 (vs. 376.160, 4.41%↓) 357.933 6.293
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1.302 (vs. 1.359, 4.17%↓) 1.305 0.009
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1125.398 (vs. 1173.980, 4.14%↓) 1131.295 19.856
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 306.298 (vs. 318.759, 3.91%↓) 307.400 2.666
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 43.501 (vs. 45.125, 3.60%↓) 43.577 0.287
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 350.395 (vs. 363.121, 3.50%↓) 350.488 5.242
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 26.510 (vs. 27.467, 3.48%↓) 26.572 0.362
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 304.435 (vs. 315.208, 3.42%↓) 304.492 0.567
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 303.993 (vs. 314.551, 3.36%↓) 304.266 1.147
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 84.496 (vs. 81.852, 3.23%↑) 86.630 3.869
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 575.180 (vs. 594.314, 3.22%↓) 576.376 7.313
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 98.411 (vs. 95.373, 3.19%↑) 98.491 0.279
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 702.075 (vs. 724.533, 3.10%↓) 703.397 10.844
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 105.754 (vs. 102.673, 3.00%↑) 105.745 0.251
MiniLML12H384Uncased(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.670 (vs. 1.625, 2.75%↑) 1.664 0.017
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 81.070 (vs. 78.906, 2.74%↑) 80.906 0.558
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.788 (vs. 2.862, 2.59%↓) 2.786 0.012
EfficientNetV2STF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 4.385 (vs. 4.285, 2.33%↑) 4.389 0.072
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.017 (vs. 11.760, 2.19%↑) 12.036 0.087
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 50.244 (vs. 51.290, 2.04%↓) 50.133 0.678
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.533 (vs. 8.366, 1.99%↑) 8.523 0.053
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.304 (vs. 9.133, 1.88%↑) 9.125 0.493
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.341 (vs. 11.527, 1.62%↓) 11.338 0.089
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.766 (vs. 4.693, 1.56%↑) 4.740 0.076
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 712.406 (vs. 723.546, 1.54%↓) 714.435 9.289
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 12.127 (vs. 12.305, 1.45%↓) 12.133 0.081
matmul\_128x256x8192\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.026 (vs. 0.026, 1.42%↑) 0.026 0.000
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 31.670 (vs. 31.231, 1.40%↑) 31.622 0.373
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.139 (vs. 4.196, 1.34%↓) 4.134 0.015
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.716 (vs. 17.954, 1.32%↓) 17.717 0.086
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.820 (vs. 2.787, 1.18%↑) 2.815 0.037
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.867 (vs. 2.899, 1.12%↓) 2.866 0.017
BertLargeTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 10.802 (vs. 10.686, 1.09%↑) 10.758 0.079
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.718 (vs. 2.688, 1.08%↑) 2.722 0.015
BertForMaskedLMTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 7.009 (vs. 6.936, 1.05%↑) 6.991 0.055
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 52.366 (vs. 52.916, 1.04%↓) 52.305 0.179
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 93.200 (vs. 94.163, 1.02%↓) 93.150 0.118
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.735 (vs. 35.093, 1.02%↓) 34.662 0.274
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 40.104 (vs. 39.703, 1.01%↑) 39.594 1.332
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 136.808 (vs. 138.168, 0.98%↓) 136.709 0.358
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 48.441 (vs. 48.918, 0.98%↓) 48.236 0.714
matmul\_3456x1024x2048\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.166 (vs. 0.164, 0.90%↑) 0.166 0.000
matmul\_2564x2564x2564\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.949 (vs. 0.941, 0.85%↑) 0.945 0.014
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.035 (vs. 11.128, 0.83%↓) 10.735 0.598
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 92.046 (vs. 92.791, 0.80%↓) 92.146 0.217
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 104.759 (vs. 105.588, 0.79%↓) 104.102 1.244
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 2106.533 (vs. 2122.903, 0.77%↓) 2106.490 1.970
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.206 (vs. 12.113, 0.77%↑) 12.052 0.302
matmul\_123x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.202 (vs. 0.201, 0.76%↑) 0.202 0.000
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.727 (vs. 3.755, 0.73%↓) 3.726 0.031
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.897 (vs. 4.862, 0.72%↑) 4.899 0.023
MobileNetV3Small\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 1804.642 (vs. 1817.387, 0.70%↓) 1803.776 1.594
MobileBertSquad\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] 259.355 (vs. 261.124, 0.68%↓) 259.170 0.830
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 123.800 (vs. 122.970, 0.68%↑) 123.763 0.503
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.830 (vs. 5.791, 0.67%↑) 5.832 0.023
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 228.502 (vs. 229.803, 0.57%↓) 226.757 4.300
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.563 (vs. 7.605, 0.55%↓) 7.557 0.010
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 66.799 (vs. 66.436, 0.55%↑) 66.601 0.909
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 57.226 (vs. 56.916, 0.54%↑) 56.828 1.013
MobileBertSquad\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] 259.786 (vs. 261.169, 0.53%↓) 260.004 0.788
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.086 (vs. 34.265, 0.52%↓) 33.582 0.920
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 179.738 (vs. 180.680, 0.52%↓) 178.738 2.348
matmul\_128x256x8192\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.047 (vs. 0.047, 0.52%↑) 0.047 0.000
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.571 (vs. 0.568, 0.50%↑) 0.570 0.000
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.766 (vs. 0.770, 0.50%↓) 0.768 0.008
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.967 (vs. 33.806, 0.48%↑) 33.452 0.947
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 87.927 (vs. 88.333, 0.46%↓) 88.344 3.393
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 60.991 (vs. 61.266, 0.45%↓) 60.772 0.464
matmul\_2560x2560x2560\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.300 (vs. 0.299, 0.43%↑) 0.300 0.000
matmul\_3456x1024x2048\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.063 (vs. 0.063, 0.43%↑) 0.063 0.000
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 191.305 (vs. 190.509, 0.42%↑) 189.524 3.265
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 26.953 (vs. 26.848, 0.39%↑) 26.925 0.111
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.868 (vs. 5.848, 0.34%↑) 5.862 0.022
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.243 (vs. 78.982, 0.33%↑) 78.971 0.913
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.781 (vs. 17.840, 0.33%↓) 17.674 0.196
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 276.340 (vs. 275.496, 0.31%↑) 274.438 6.541
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 73.756 (vs. 73.536, 0.30%↑) 73.808 0.279
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.697 (vs. 0.699, 0.29%↓) 0.697 0.000
matmul\_2562x2564x2562\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.116 (vs. 1.119, 0.27%↓) 1.113 0.008
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.694 (vs. 21.637, 0.26%↑) 21.681 0.062
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.199 (vs. 34.112, 0.25%↑) 33.915 0.471
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.420 (vs. 5.433, 0.25%↓) 5.415 0.017
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 70.037 (vs. 70.205, 0.24%↓) 70.124 2.682
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.972 (vs. 4.984, 0.24%↓) 4.971 0.015
matmul\_2562x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.371 (vs. 1.368, 0.23%↑) 1.370 0.003
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.804 (vs. 11.829, 0.21%↓) 11.790 0.077
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.632 (vs. 0.631, 0.16%↑) 0.632 0.002
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 186.406 (vs. 186.108, 0.16%↑) 185.815 3.112
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.551 (vs. 4.544, 0.16%↑) 4.551 0.019
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.522 (vs. 180.796, 0.15%↓) 179.366 2.501
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.481 (vs. 8.468, 0.15%↑) 8.480 0.025
MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7895.757 (vs. 7907.404, 0.15%↓) 7897.777 4.704
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 41.792 (vs. 41.736, 0.13%↑) 41.050 1.681
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.565 (vs. 7.556, 0.13%↑) 7.566 0.007
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1115.914 (vs. 1114.593, 0.12%↑) 1115.864 1.008
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 19.086 (vs. 19.065, 0.11%↑) 19.068 0.100
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.136 (vs. 5.142, 0.11%↓) 5.133 0.021
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.083 (vs. 32.051, 0.10%↑) 32.054 0.136
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 490.199 (vs. 490.576, 0.08%↓) 489.183 2.257
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 6.949 (vs. 6.954, 0.07%↓) 6.943 0.025
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.947 (vs. 12.940, 0.06%↑) 12.986 0.109
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 220.898 (vs. 220.771, 0.06%↑) 220.489 2.643
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 2103.239 (vs. 2102.039, 0.06%↑) 2104.551 3.645
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 61.386 (vs. 61.421, 0.06%↓) 61.022 0.681
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 61.883 (vs. 61.915, 0.05%↓) 61.696 0.423
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.835 (vs. 9.831, 0.05%↑) 9.831 0.038
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 30.052 (vs. 30.041, 0.04%↑) 29.996 0.323
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1114.063 (vs. 1114.441, 0.03%↓) 1113.966 1.814
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 214.419 (vs. 214.379, 0.02%↑) 213.703 1.514
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.857 (vs. 35.851, 0.02%↑) 35.312 0.954
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 13.070 (vs. 13.069, 0.01%↑) 13.056 0.039
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.603 (vs. 8.604, 0.01%↓) 8.576 0.075
matmul\_2560x2560x2560\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.142 (vs. 0.142, 0.01%↓) 0.142 0.000
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.780 (vs. 8.780, 0.00%↑) 8.734 0.084
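
Each comparison cell in these tables has the form `current (vs. base, pct%↑/↓)`. The percentages are consistent with a plain relative change against the base value, for example 1392 vs. 1478 gives 5.82%↓. The snippet below is a minimal sketch for re-checking a cell under that assumption. The names `parse_cell` and `relative_change` are hypothetical helpers, not part of the benchmark tooling, and the formula is inferred from the reported numbers rather than taken from the generator's source.

```python
import re

# Matches cells such as "1392 (vs. 1478, 5.82%↓)" or "9664 (vs. 9664, 0.00%)".
# The arrow is optional because unchanged size/count cells omit it.
CELL = re.compile(r"([\d.]+) \(vs\. ([\d.]+), ([\d.]+)%([↑↓]?)\)")


def parse_cell(cell):
    """Return (current, base, reported_percent, arrow) parsed from one cell."""
    m = CELL.search(cell)
    if m is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    return float(m[1]), float(m[2]), float(m[3]), m[4]


def relative_change(current, base):
    """Signed relative change in percent; negative means current < base."""
    return (current - base) / base * 100.0


if __name__ == "__main__":
    current, base, reported, arrow = parse_cell("1392 (vs. 1478, 5.82%↓)")
    print(f"{relative_change(current, base):+.2f}% (reported {reported}%{arrow})")
```

The displayed values are rounded, so a recomputed percentage can differ from the reported one when the values are very small (for instance `0.047 (vs. 0.047, 0.52%↑)`); those cells cannot be verified from the table alone.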

All Compilation Metrics

Benchmark Name Compilation Time (ms) Total Dispatch Size (bytes) Total Artifact Size (bytes) Stream IR Dispatch Count (# of cmd.dispatch ops)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1392 (vs. 1478, 5.82%↓) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 22087 (vs. 27813, 20.59%↓) 144032 (vs. 144032, 0.00%) 399877 (vs. 399877, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 35637 (vs. 44640, 20.17%↓) 238720 (vs. 238720, 0.00%) 10458245 (vs. 10458245, 0.00%) 97 (vs. 97, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 33273 (vs. 30312, 9.77%↑) 177680 (vs. 177680, 0.00%) 2959045 (vs. 2959045, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 58793 (vs. 59534, 1.24%↓) 680000 (vs. 680000, 0.00%) 5603397 (vs. 5603397, 0.00%) 89 (vs. 89, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 21966 (vs. 23607, 6.95%↓) 174400 (vs. 174400, 0.00%) 17094405 (vs. 17094405, 0.00%) 51 (vs. 51, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 28830 (vs. 32876, 12.31%↓) 190096 (vs. 190096, 0.00%) 14173189 (vs. 14173189, 0.00%) 74 (vs. 74, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 65557 (vs. 64980, 0.89%↑) 568928 (vs. 568928, 0.00%) 4219653 (vs. 4219653, 0.00%) 144 (vs. 144, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 47525 (vs. 50757, 6.37%↓) 287600 (vs. 287600, 0.00%) 18229253 (vs. 18229253, 0.00%) 124 (vs. 124, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 22413 (vs. 20503, 9.32%↑) 142288 (vs. 142288, 0.00%) 5195653 (vs. 5195653, 0.00%) 48 (vs. 48, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 68207 (vs. 68042, 0.24%↑) 84496 (vs. 84496, 0.00%) 99926661 (vs. 99926661, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 68048 (vs. 65164, 4.43%↑) 93056 (vs. 93056, 0.00%) 98443077 (vs. 98443077, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 284984 (vs. 281320, 1.30%↑) 5843296 (vs. 5843296, 0.00%) 32142341 (vs. 32142341, 0.00%) 1102 (vs. 1102, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 100561 (vs. 100403, 0.16%↑) 216080 (vs. 216304, 0.10%↓) 164497516 (vs. 164497708, 0.00%↓) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 29717 (vs. 28404, 4.62%↑) 59344 (vs. 59344, 0.00%) 133996207 (vs. 133996207, 0.00%) 209 (vs. 209, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 35442 (vs. 34864, 1.66%↑) 27920 (vs. 27920, 0.00%) 652755092 (vs. 652755092, 0.00%) 233 (vs. 233, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 31194 (vs. 33441, 6.72%↓) 14736 (vs. 14736, 0.00%) 652736084 (vs. 652736084, 0.00%) 246 (vs. 246, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 39171 (vs. 39822, 1.63%↓) 68608 (vs. 68608, 0.00%) 533841087 (vs. 533841087, 0.00%) 212 (vs. 212, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 35289 (vs. 36152, 2.39%↓) 50128 (vs. 50128, 0.00%) 1336025023 (vs. 1336025023, 0.00%) 413 (vs. 413, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 1465 (vs. 1619, 9.51%↓) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 21238 (vs. 26612, 20.19%↓) 107248 (vs. 107248, 0.00%) 371397 (vs. 371397, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 29535 (vs. 34786, 15.10%↓) 96944 (vs. 96944, 0.00%) 10398725 (vs. 10398725, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 30102 (vs. 26628, 13.05%↑) 113392 (vs. 113392, 0.00%) 2921349 (vs. 2921349, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 46110 (vs. 49079, 6.05%↓) 269104 (vs. 269104, 0.00%) 5219205 (vs. 5219205, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 18425 (vs. 19631, 6.14%↓) 60288 (vs. 60288, 0.00%) 17017925 (vs. 17017925, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 25196 (vs. 29210, 13.74%↓) 95248 (vs. 95248, 0.00%) 14133765 (vs. 14133765, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 52419 (vs. 56452, 7.14%↓) 330048 (vs. 330048, 0.00%) 4003653 (vs. 4003653, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 39387 (vs. 42235, 6.74%↓) 136080 (vs. 136080, 0.00%) 18361413 (vs. 18361413, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 14462 (vs. 14901, 2.95%↓) 44208 (vs. 44208, 0.00%) 5147973 (vs. 5147973, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 87089 (vs. 88112, 1.16%↓) 52400 (vs. 52400, 0.00%) 100083141 (vs. 100083141, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 83464 (vs. 83496, 0.04%↓) 52832 (vs. 52832, 0.00%) 98588357 (vs. 98588357, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 245154 (vs. 247175, 0.82%↓) 2610240 (vs. 2610240, 0.00%) 29088325 (vs. 29088325, 0.00%) 2160 (vs. 2160, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 104937 (vs. 112729, 6.91%↓) 154640 (vs. 154864, 0.14%↓) 169909868 (vs. 169910060, 0.00%↓) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 36054 (vs. 38012, 5.15%↓) 37936 (vs. 37936, 0.00%) 219509935 (vs. 219509935, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 45116 (vs. 44352, 1.72%↑) 19328 (vs. 19328, 0.00%) 992542612 (vs. 992542612, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 39809 (vs. 40339, 1.31%↓) 12864 (vs. 12864, 0.00%) 992543572 (vs. 992543572, 0.00%) 367 (vs. 367, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 47178 (vs. 46554, 1.34%↑) 40144 (vs. 40144, 0.00%) 875939327 (vs. 875939327, 0.00%) 334 (vs. 334, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 50573 (vs. 52472, 3.62%↓) 43840 (vs. 43840, 0.00%) 1336053183 (vs. 1336053183, 0.00%) 654 (vs. 654, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1459 (vs. 1595, 8.53%↓) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 20324 (vs. 26861, 24.34%↓) 99488 (vs. 99488, 0.00%) 363653 (vs. 363653, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 32482 (vs. 32044, 1.37%↑) 98656 (vs. 98656, 0.00%) 10400453 (vs. 10400453, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 31733 (vs. 27956, 13.51%↑) 117504 (vs. 117504, 0.00%) 2925445 (vs. 2925445, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 46411 (vs. 49984, 7.15%↓) 257952 (vs. 257952, 0.00%) 5208069 (vs. 5208069, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 20774 (vs. 18819, 10.39%↑) 65376 (vs. 65376, 0.00%) 17023045 (vs. 17023045, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 28986 (vs. 33329, 13.03%↓) 99104 (vs. 99104, 0.00%) 14137669 (vs. 14137669, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 55426 (vs. 57342, 3.34%↓) 323792 (vs. 323792, 0.00%) 3997381 (vs. 3997381, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 40719 (vs. 43694, 6.81%↓) 122656 (vs. 122656, 0.00%) 18348037 (vs. 18348037, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 16650 (vs. 15989, 4.13%↑) 47200 (vs. 47200, 0.00%) 5150981 (vs. 5150981, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 84932 (vs. 86455, 1.76%↓) 38880 (vs. 38880, 0.00%) 100069637 (vs. 100069637, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 80787 (vs. 86669, 6.79%↓) 39296 (vs. 39296, 0.00%) 98574789 (vs. 98574789, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 233800 (vs. 229312, 1.96%↑) 2599872 (vs. 2599872, 0.00%) 29077957 (vs. 29077957, 0.00%) 2160 (vs. 2160, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 104951 (vs. 107224, 2.12%↓) 140016 (vs. 140240, 0.16%↓) 169895212 (vs. 169895468, 0.00%↓) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 38106 (vs. 36971, 3.07%↑) 33904 (vs. 33904, 0.00%) 219505903 (vs. 219505903, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 46185 (vs. 45428, 1.67%↑) 18640 (vs. 18640, 0.00%) 992541972 (vs. 992541972, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 47229 (vs. 46434, 1.71%↑) 12336 (vs. 12336, 0.00%) 992543060 (vs. 992543060, 0.00%) 367 (vs. 367, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 45636 (vs. 45667, 0.07%↓) 34928 (vs. 34928, 0.00%) 875934143 (vs. 875934143, 0.00%) 334 (vs. 334, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 57085 (vs. 55556, 2.75%↑) 39360 (vs. 39360, 0.00%) 1336048703 (vs. 1336048703, 0.00%) 654 (vs. 654, 0.00%)
EfficientNetV2STF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 89163 (vs. 91641, 2.70%↓) 836820 (vs. 844192, 0.87%↓) 165151572 (vs. 165158932, 0.00%↓) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 27041 (vs. 27442, 1.46%↓) 145212 (vs. 145212, 0.00%) 134088431 (vs. 134088431, 0.00%) 209 (vs. 209, 0.00%)
BertForMaskedLMTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 33864 (vs. 34177, 0.92%↓) 227968 (vs. 227968, 0.00%) 534005858 (vs. 534005858, 0.00%) 212 (vs. 212, 0.00%)
BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 31251 (vs. 35625, 12.28%↓) 130852 (vs. 130852, 0.00%) 1336110955 (vs. 1336110955, 0.00%) 413 (vs. 413, 0.00%)
matmul_3456x1024x2048_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1605 (vs. 1694, 5.25%↓) 30540 (vs. 30540, 0.00%) 41263 (vs. 41263, 0.00%) 1 (vs. 1, 0.00%)
matmul_3456x1024x2048_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1929 (vs. 2096, 7.97%↓) 43516 (vs. 43516, 0.00%) 54239 (vs. 54239, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1721 (vs. 1700, 1.24%↑) 28492 (vs. 28492, 0.00%) 39151 (vs. 39151, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1741 (vs. 1790, 2.74%↓) 39892 (vs. 39892, 0.00%) 50551 (vs. 50551, 0.00%) 1 (vs. 1, 0.00%)
matmul_2564x2564x2564_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2056 (vs. 2052, 0.19%↑) 85344 (vs. 85344, 0.00%) 96003 (vs. 96003, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2564x2562_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2042 (vs. 2062, 0.97%↓) 88224 (vs. 88224, 0.00%) 98947 (vs. 98947, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2573 (vs. 2513, 2.39%↑) 82792 (vs. 82792, 0.00%) 93515 (vs. 93515, 0.00%) 1 (vs. 1, 0.00%)
matmul_123x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2080 (vs. 1956, 6.34%↑) 49772 (vs. 49772, 0.00%) 60494 (vs. 60494, 0.00%) 1 (vs. 1, 0.00%)
matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 1032 (vs. 934, 10.49%↑) 8828 (vs. 8828, 0.00%) 26181 (vs. 26181, 0.00%) 2 (vs. 2, 0.00%)
matmul_128x256x8192_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 802 (vs. 957, 16.20%↓) 9784 (vs. 9784, 0.00%) 27137 (vs. 27137, 0.00%) 2 (vs. 2, 0.00%)
MiniLML12H384Uncased(stablehlo) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 32627 (vs. 31517, 3.52%↑) 50552 (vs. 50552, 0.00%) 133987441 (vs. 133987441, 0.00%) 209 (vs. 209, 0.00%)
DeepLabV3_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 19527 (vs. 18227, 7.13%↑) 42840 (vs. 42840, 0.00%) 2824263 (vs. 2824263, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 39609 (vs. 42194, 6.13%↓) 185480 (vs. 185480, 0.00%) 5108871 (vs. 5108871, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 70587 (vs. 71918, 1.85%↓) 49416 (vs. 49416, 0.00%) 98399431 (vs. 98399431, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 238205 (vs. 243847, 2.31%↓) 2108296 (vs. 2108296, 0.00%) 28407303 (vs. 28407303, 0.00%) 1102 (vs. 1102, 0.00%)
MobileNetV1_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 15717 (vs. 17691, 11.16%↓) 50984 (vs. 50984, 0.00%) 16976263 (vs. 16976263, 0.00%) 65 (vs. 65, 0.00%)
MobileNetV2_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 50357 (vs. 50934, 1.13%↓) 217032 (vs. 217032, 0.00%) 3867719 (vs. 3867719, 0.00%) 144 (vs. 144, 0.00%)
PersonDetect_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 17902 (vs. 20014, 10.55%↓) 58936 (vs. 58936, 0.00%) 314759 (vs. 314759, 0.00%) 60 (vs. 60, 0.00%)
EfficientNet_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 52524 (vs. 53702, 2.19%↓) 390748 (vs. 390748, 0.00%) 5314183 (vs. 5314183, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 272685 (vs. 281476, 3.12%↓) 2116860 (vs. 2116860, 0.00%) 28415879 (vs. 28415879, 0.00%) 1102 (vs. 1102, 0.00%)
PersonDetect_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 20738 (vs. 21594, 3.96%↓) 124428 (vs. 124428, 0.00%) 380231 (vs. 380231, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV2_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 56950 (vs. 59590, 4.43%↓) 328664 (vs. 328664, 0.00%) 3979399 (vs. 3979399, 0.00%) 144 (vs. 144, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 18857 (vs. 19871, 5.10%↓) 56272 (vs. 56272, 0.00%) 2837637 (vs. 2837637, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 46662 (vs. 50186, 7.02%↓) 32912 (vs. 32912, 0.00%) 98382917 (vs. 98382917, 0.00%) 704 (vs. 704, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 36229 (vs. 35195, 2.94%↑) 20320 (vs. 20320, 0.00%) 652747476 (vs. 652747476, 0.00%) 233 (vs. 233, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 32384 (vs. 34501, 6.14%↓) 9024 (vs. 9024, 0.00%) 652730324 (vs. 652730324, 0.00%) 246 (vs. 246, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 276806 (vs. 274663, 0.78%↑) 4387712 (vs. 4387712, 0.00%) 30686725 (vs. 30686725, 0.00%) 1102 (vs. 1102, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 101706 (vs. 106052, 4.10%↓) 694624 (vs. 694624, 0.00%) 88728645 (vs. 88728645, 0.00%) 268 (vs. 268, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 22936 (vs. 23859, 3.87%↓) 49984 (vs. 49984, 0.00%) 2848965 (vs. 2848965, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 85248 (vs. 86136, 1.03%↓) 21600 (vs. 21600, 0.00%) 98557061 (vs. 98557061, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 39815 (vs. 43650, 8.79%↓) 11040 (vs. 11040, 0.00%) 992509844 (vs. 992509844, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 45054 (vs. 45858, 1.75%↓) 9232 (vs. 9232, 0.00%) 992515476 (vs. 992515476, 0.00%) 367 (vs. 367, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 220342 (vs. 212485, 3.70%↑) 1372976 (vs. 1372976, 0.00%) 27851077 (vs. 27851077, 0.00%) 2160 (vs. 2160, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 75152 (vs. 81216, 7.47%↓) 125600 (vs. 125600, 0.00%) 88155333 (vs. 88155333, 0.00%) 386 (vs. 386, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 21144 (vs. 24012, 11.94%↓) 41008 (vs. 41008, 0.00%) 2840005 (vs. 2840005, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 75315 (vs. 77869, 3.28%↓) 22096 (vs. 22096, 0.00%) 98557509 (vs. 98557509, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 39477 (vs. 43348, 8.93%↓) 10640 (vs. 10640, 0.00%) 992509460 (vs. 992509460, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 37698 (vs. 41190, 8.48%↓) 8928 (vs. 8928, 0.00%) 992515156 (vs. 992515156, 0.00%) 367 (vs. 367, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 209005 (vs. 211584, 1.22%↓) 1235968 (vs. 1235968, 0.00%) 27714053 (vs. 27714053, 0.00%) 2160 (vs. 2160, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 67589 (vs. 68099, 0.75%↓) 116368 (vs. 116368, 0.00%) 88146053 (vs. 88146053, 0.00%) 386 (vs. 386, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1058 (vs. 1179, 10.26%↓) 3872 (vs. 3872, 0.00%) 271673 (vs. 271673, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 1998 (vs. 1960, 1.94%↑) 4480 (vs. 4480, 0.00%) 273605 (vs. 273605, 0.00%) 4 (vs. 4, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1134 (vs. 1305, 13.10%↓) 3856 (vs. 3856, 0.00%) 272965 (vs. 272965, 0.00%) 4 (vs. 4, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 62329 (vs. 61120, 1.98%↑) 246320 (vs. 246320, 0.00%) 98597530 (vs. 98597530, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 69761 (vs. 72553, 3.85%↓) 246320 (vs. 246320, 0.00%) 98597530 (vs. 98597530, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 64891 (vs. 66616, 2.59%↓) 143012 (vs. 143012, 0.00%) 98493753 (vs. 98493753, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16,compile-stats] 98465 (vs. 98323, 0.14%↑) 3004452 (vs. 3004452, 0.00%) 52898580 (vs. 52898580, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 184430 (vs. 191875, 3.88%↓) 6756400 (vs. 6756400, 0.00%) 33056470 (vs. 33056470, 0.00%) 1102 (vs. 1102, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 62399 (vs. 66018, 5.48%↓) 143012 (vs. 143012, 0.00%) 98493881 (vs. 98493881, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16,compile-stats] 82666 (vs. 85084, 2.84%↓) 3008480 (vs. 3008480, 0.00%) 52905492 (vs. 52905492, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 180249 (vs. 191395, 5.82%↓) 6736144 (vs. 6736144, 0.00%) 33024150 (vs. 33024150, 0.00%) 1102 (vs. 1102, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 103591 (vs. 104261, 0.64%↓) 143012 (vs. 143012, 0.00%) 99716025 (vs. 99716025, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16,compile-stats] 121344 (vs. 125680, 3.45%↓) 3008480 (vs. 3008480, 0.00%) 54169300 (vs. 54169300, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 294606 (vs. 297392, 0.94%↓) 6736144 (vs. 6736144, 0.00%) 34937238 (vs. 34937238, 0.00%) 1102 (vs. 1102, 0.00%)
MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 21263 (vs. 21268, 0.02%↓) 208629 (vs. 208629, 0.00%) 14213630 (vs. 14213630, 0.00%) 172 (vs. 172, 0.00%)
MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 28949 (vs. 31492, 8.08%↓) 311349 (vs. 311349, 0.00%) 10551678 (vs. 10551678, 0.00%) 210 (vs. 210, 0.00%)