Full Benchmark Summary

Data-Tiling Comparison Table

Values are average latencies in milliseconds. The factor in parentheses is the speedup relative to the No-DT baseline, so higher is better.

| Name | No-DT (baseline), ms | DT-Only, ms | DT-UK, ms |
| --- | --- | --- | --- |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 216.921 (1.0X) | N/A | 109.340 (2.0X) |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 758.494 (1.0X) | N/A | 223.328 (3.4X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 6.942 (1.0X) | N/A | 8.489 (0.8X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 32.131 (1.0X) | N/A | 29.966 (1.1X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.866 (1.0X) | N/A | 34.080 (1.0X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 276.768 (1.0X) | N/A | 225.886 (1.2X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.815 (1.0X) | N/A | 4.983 (1.2X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 26.858 (1.0X) | N/A | 13.115 (2.0X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 9.221 (1.0X) | N/A | 8.446 (1.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 70.083 (1.0X) | N/A | 39.393 (1.8X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.297 (1.0X) | N/A | 9.686 (1.2X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 88.344 (1.0X) | N/A | 41.488 (2.1X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 12.747 (1.0X) | N/A | 14.324 (0.9X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 79.687 (1.0X) | N/A | 57.401 (1.4X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.574 (1.0X) | N/A | 61.500 (0.6X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 179.747 (1.0X) | N/A | 184.963 (1.0X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.566 (1.0X) | N/A | 62.307 (0.6X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 183.036 (1.0X) | N/A | 197.783 (0.9X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 66.954 (1.0X) | N/A | 62.489 (1.1X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 492.452 (1.0X) | N/A | 214.371 (2.3X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.906 (1.0X) | N/A | 4.547 (1.1X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 24.593 (1.0X) | N/A | 17.876 (1.4X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 3.745 (1.0X) | N/A | 4.877 (0.8X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.673 (1.0X) | N/A | 11.799 (1.0X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.909 (1.0X) | N/A | 5.392 (1.1X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 21.607 (1.0X) | N/A | 11.758 (1.8X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.849 (1.0X) | N/A | 2.828 (1.0X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.821 (1.0X) | N/A | 2.711 (1.0X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.483 (1.0X) | N/A | 9.761 (0.9X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.513 (1.0X) | N/A | 31.717 (1.1X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.774 (1.0X) | N/A | 0.631 (1.2X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.700 (1.0X) | N/A | 0.566 (1.2X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.199 (1.0X) | N/A | 5.113 (0.8X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.663 (1.0X) | N/A | 18.918 (0.9X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.563 (1.0X) | N/A | 7.566 (1.0X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 49.380 (1.0X) | N/A | 43.718 (1.1X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 51.648 (1.0X) | N/A | 44.585 (1.2X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 31.058 (1.0X) | N/A | 28.112 (1.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 92.754 (1.0X) | N/A | 21.105 (4.4X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 93.098 (1.0X) | N/A | 22.085 (4.2X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 52.436 (1.0X) | N/A | 21.817 (2.4X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 138.138 (1.0X) | N/A | 27.482 (5.0X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 139.666 (1.0X) | N/A | 29.835 (4.7X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 69.901 (1.0X) | N/A | 27.310 (2.6X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 736.719 (1.0X) | N/A | 365.128 (2.0X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 739.152 (1.0X) | N/A | 368.830 (2.0X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 411.371 (1.0X) | N/A | 222.725 (1.8X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 1136.649 (1.0X) | N/A | 321.015 (3.5X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1135.628 (1.0X) | N/A | 321.286 (3.5X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 595.702 (1.0X) | N/A | 190.482 (3.1X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 2059.999 (1.0X) | N/A | 307.429 (6.7X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 2064.209 (1.0X) | N/A | 304.907 (6.8X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1121.338 (1.0X) | N/A | 181.830 (6.2X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 12.300 (1.0X) | N/A | 1.323 (9.3X) |
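The speedup factor is the No-DT baseline latency divided by the latency in that column, rounded to one decimal place. For example, BertLargeTF gives 758.494 / 223.328 ≈ 3.4X. The minimal Python sketch below reproduces that arithmetic under two assumptions: the report rounds to one decimal, and it prints `N/A` when a configuration was not run. `format_speedup` is a hypothetical helper, not part of the IREE benchmark tooling.

```python
from typing import Optional


def format_speedup(baseline_ms: float, candidate_ms: Optional[float]) -> str:
    """Return a speedup label such as '3.4X' (baseline latency / candidate latency).

    Hypothetical helper: assumes the ratio is rounded to one decimal place and
    that 'N/A' is printed when a configuration was not benchmarked.
    """
    if candidate_ms is None:
        return "N/A"
    if baseline_ms <= 0 or candidate_ms <= 0:
        raise ValueError("latencies must be positive")
    # Latency is lower-is-better, so the speedup is baseline / candidate.
    return f"{baseline_ms / candidate_ms:.1f}X"


# BertLargeTF on c2-standard-60, 30 threads: No-DT vs. DT-UK.
print(format_speedup(758.494, 223.328))  # 3.4X
# DeepLabV3_fp32, 8 threads: DT-UK is slower than the baseline.
print(format_speedup(6.942, 8.489))  # 0.8X
# DT-Only was not run.
print(format_speedup(216.921, None))  # N/A
```

Any factor below 1.0X marks a configuration where DT-UK was slower than the No-DT baseline.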

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 83.181 (vs. 75.567, 10.08%↑) | 83.092 | 0.206 |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 139.666 (vs. 129.693, 7.69%↑) | 139.560 | 0.415 |
| MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] | 1.693 (vs. 1.598, 5.97%↑) | 1.691 | 0.006 |
| MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 73.302 (vs. 69.217, 5.90%↑) | 73.473 | 0.608 |
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 95.452 (vs. 90.653, 5.29%↑) | 91.671 | 6.548 |
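Each "vs." entry lists the earlier average latency and the relative change of the current average against it. ↑ means the latency grew (slower) and ↓ means it shrank (faster). The sketch below reproduces that arithmetic, assuming the percentage is (current - previous) / previous × 100, printed as an absolute value with two decimals. `format_change` is a hypothetical helper. The report probably computes from unrounded averages, so the last digit can differ from a recomputation on the printed values. For example, the GPT2_117M_TF_1X4XI32 dt-uk local_sync row recomputes to 5.83% rather than the reported 5.82%.

```python
def format_change(current_ms: float, previous_ms: float) -> str:
    """Return a label such as '10.08%↑' for the change from previous_ms to current_ms.

    Hypothetical helper: assumes change = (current - previous) / previous * 100,
    printed as an absolute value with two decimals plus a direction arrow.
    """
    if previous_ms <= 0:
        raise ValueError("previous latency must be positive")
    pct = (current_ms - previous_ms) / previous_ms * 100.0
    # Higher latency is a slowdown (↑); lower latency is a speedup (↓).
    arrow = "↑" if pct > 0 else "↓" if pct < 0 else ""
    return f"{abs(pct):.2f}%{arrow}"


# MobileBertSquad_fp16 on the Pixel 6 Pro GPU (first regressed row).
print(format_change(83.181, 75.567))  # 10.08%↑
# Vit_int8, no-dt, 2 threads (an improved row).
print(format_change(1121.338, 1182.442))  # 5.17%↓
```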

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 27.482 (vs. 29.182, 5.82%↓) | 27.411 | 1.053 |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1121.338 (vs. 1182.442, 5.17%↓) | 1124.684 | 21.667 |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 181.830 (vs. 191.496, 5.05%↓) | 182.213 | 4.081 |

Similar Latencies

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.711 (vs. 3.146, 13.83%↓) | 2.693 | 0.037 |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 9.686 (vs. 8.989, 7.75%↑) | 9.658 | 0.083 |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 223.328 (vs. 236.138, 5.42%↓) | 222.622 | 1.658 |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 29.835 (vs. 31.362, 4.87%↓) | 29.748 | 0.730 |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 44.585 (vs. 46.778, 4.69%↓) | 44.575 | 0.262 |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 138.138 (vs. 132.200, 4.49%↑) | 138.135 | 0.109 |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 321.015 (vs. 335.800, 4.40%↓) | 321.181 | 1.691 |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 28.112 (vs. 29.401, 4.38%↓) | 28.411 | 0.829 |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 21.817 (vs. 22.756, 4.13%↓) | 21.856 | 0.186 |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.906 (vs. 4.711, 4.12%↑) | 4.871 | 0.096 |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 12.747 (vs. 12.248, 4.07%↑) | 12.614 | 0.313 |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 13.115 (vs. 13.668, 4.04%↓) | 13.114 | 0.107 |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 21.105 (vs. 21.942, 3.81%↓) | 21.075 | 0.077 |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 22.085 (vs. 22.959, 3.81%↓) | 22.067 | 0.135 |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 69.901 (vs. 72.539, 3.64%↓) | 69.882 | 0.350 |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 758.494 (vs. 786.772, 3.59%↓) | 741.793 | 47.077 |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.113 (vs. 5.300, 3.52%↓) | 5.116 | 0.014 |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.877 (vs. 5.053, 3.48%↓) | 4.882 | 0.030 |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 736.719 (vs. 712.047, 3.46%↑) | 739.214 | 10.488 |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 222.725 (vs. 230.518, 3.38%↓) | 224.219 | 6.188 |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 2059.999 (vs. 2128.879, 3.24%↓) | 2060.026 | 2.994 |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 197.783 (vs. 191.886, 3.07%↑) | 196.624 | 3.801 |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 190.482 (vs. 196.480, 3.05%↓) | 191.645 | 4.109 |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.392 (vs. 5.561, 3.04%↓) | 5.396 | 0.017 |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 27.310 (vs. 28.156, 3.00%↓) | 27.402 | 0.451 |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 9.761 (vs. 10.062, 2.99%↓) | 9.755 | 0.033 |
| MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 97.869 (vs. 95.088, 2.92%↑) | 97.916 | 0.238 |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 2064.209 (vs. 2124.217, 2.82%↓) | 2064.418 | 1.779 |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 43.718 (vs. 44.952, 2.74%↓) | 43.715 | 0.299 |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.080 (vs. 35.021, 2.69%↓) | 33.876 | 0.336 |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 365.128 (vs. 374.084, 2.39%↓) | 367.022 | 5.550 |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.799 (vs. 11.530, 2.33%↑) | 11.768 | 0.105 |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 321.286 (vs. 328.950, 2.33%↓) | 320.462 | 2.443 |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 307.429 (vs. 314.694, 2.31%↓) | 308.869 | 4.337 |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.866 (vs. 35.668, 2.25%↓) | 34.209 | 0.960 |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 12.300 (vs. 12.582, 2.24%↓) | 12.314 | 0.053 |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 61.500 (vs. 62.908, 2.24%↓) | 61.260 | 0.505 |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 1.323 (vs. 1.353, 2.17%↓) | 1.329 | 0.015 |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 109.340 (vs. 107.016, 2.17%↑) | 108.600 | 1.259 |
| EfficientNetV2STF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] | 4.357 (vs. 4.268, 2.10%↑) | 4.295 | 0.167 |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 14.324 (vs. 14.032, 2.08%↑) | 14.309 | 0.109 |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 304.907 (vs. 311.328, 2.06%↓) | 305.278 | 2.897 |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 225.886 (vs. 230.306, 1.92%↓) | 226.232 | 4.296 |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.297 (vs. 11.086, 1.91%↑) | 10.949 | 0.593 |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 39.393 (vs. 40.122, 1.82%↓) | 38.992 | 1.281 |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 31.058 (vs. 31.629, 1.80%↓) | 31.284 | 0.940 |
| BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] | 10.871 (vs. 10.682, 1.77%↑) | 10.866 | 0.025 |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 6.942 (vs. 7.067, 1.77%↓) | 6.946 | 0.038 |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 24.593 (vs. 24.190, 1.66%↑) | 24.822 | 0.539 |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.566 (vs. 34.018, 1.61%↑) | 34.117 | 0.993 |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.483 (vs. 8.621, 1.60%↓) | 8.458 | 0.048 |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 31.717 (vs. 31.220, 1.59%↑) | 31.671 | 0.335 |
| MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 101.316 (vs. 102.865, 1.51%↓) | 101.437 | 0.437 |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 52.436 (vs. 53.216, 1.46%↓) | 52.392 | 0.198 |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 93.098 (vs. 94.474, 1.46%↓) | 93.077 | 0.079 |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.983 (vs. 5.055, 1.41%↓) | 4.978 | 0.021 |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.673 (vs. 11.841, 1.41%↓) | 11.682 | 0.118 |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.489 (vs. 8.607, 1.37%↓) | 8.492 | 0.026 |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 368.830 (vs. 373.944, 1.37%↓) | 368.536 | 7.038 |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.663 (vs. 17.428, 1.35%↑) | 17.614 | 0.111 |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 595.702 (vs. 603.744, 1.33%↓) | 597.598 | 12.534 |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 51.648 (vs. 51.003, 1.26%↑) | 52.006 | 1.304 |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 41.488 (vs. 42.018, 1.26%↓) | 40.982 | 1.621 |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 62.307 (vs. 63.094, 1.25%↓) | 61.982 | 0.610 |
| MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 82.768 (vs. 81.808, 1.17%↑) | 82.767 | 0.193 |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.199 (vs. 4.248, 1.15%↓) | 4.204 | 0.023 |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 411.371 (vs. 416.086, 1.13%↓) | 415.581 | 11.291 |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.446 (vs. 8.543, 1.13%↓) | 8.407 | 0.131 |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.513 (vs. 34.134, 1.11%↑) | 34.525 | 0.339 |
| MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] | 258.211 (vs. 260.876, 1.02%↓) | 258.027 | 0.534 |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.876 (vs. 18.054, 0.98%↓) | 17.795 | 0.210 |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 18.918 (vs. 19.099, 0.95%↓) | 18.925 | 0.153 |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 92.754 (vs. 93.619, 0.92%↓) | 92.742 | 0.083 |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.849 (vs. 2.875, 0.90%↓) | 2.850 | 0.019 |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 88.344 (vs. 89.077, 0.82%↓) | 88.791 | 3.265 |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1135.628 (vs. 1126.364, 0.82%↑) | 1134.850 | 2.141 |
| MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] | 257.428 (vs. 259.546, 0.82%↓) | 257.649 | 0.509 |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 1136.649 (vs. 1127.739, 0.79%↑) | 1136.637 | 2.227 |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 29.966 (vs. 30.203, 0.78%↓) | 29.874 | 0.198 |
| matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] | 0.026 (vs. 0.025, 0.78%↑) | 0.026 | 0.000 |
| matmul_3456x1024x2048_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] | 0.063 (vs. 0.063, 0.70%↑) | 0.063 | 0.000 |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 184.963 (vs. 186.268, 0.70%↓) | 183.700 | 3.005 |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 70.083 (vs. 70.545, 0.65%↓) | 70.082 | 2.759 |
| MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local_task(vmvx_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 1813.691 (vs. 1802.272, 0.63%↑) | 1813.456 | 1.771 |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.566 (vs. 0.570, 0.62%↓) | 0.566 | 0.001 |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.828 (vs. 2.811, 0.59%↑) | 2.820 | 0.024 |
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] | 79.485 (vs. 79.037, 0.57%↑) | 79.368 | 0.848 |
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 74.082 (vs. 73.670, 0.56%↑) 73.913 0.457
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 183.036 (vs. 182.048, 0.54%↑) 182.013 2.251
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.574 (vs. 34.754, 0.52%↓) 34.248 0.954
matmul\_128x256x8192\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.047 (vs. 0.047, 0.52%↓) 0.047 0.000
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 492.452 (vs. 489.948, 0.51%↑) 491.380 2.155
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 122.802 (vs. 123.433, 0.51%↓) 122.784 0.103
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 216.921 (vs. 215.866, 0.49%↑) 210.808 16.026
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 739.152 (vs. 735.566, 0.49%↑) 741.798 10.669
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 49.380 (vs. 49.618, 0.48%↓) 49.617 1.249
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 276.768 (vs. 275.448, 0.48%↑) 276.115 3.705
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.547 (vs. 4.569, 0.47%↓) 4.545 0.022
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 179.747 (vs. 180.585, 0.46%↓) 179.045 2.563
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.774 (vs. 0.771, 0.46%↑) 0.773 0.007
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 66.954 (vs. 66.681, 0.41%↑) 66.635 0.500
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.815 (vs. 5.794, 0.36%↑) 5.808 0.018
matmul\_3456x1024x2048\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.130 (vs. 0.130, 0.35%↑) 0.130 0.001
matmul\_2562x2564x2562\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.109 (vs. 1.113, 0.34%↓) 1.109 0.001
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.758 (vs. 11.798, 0.34%↓) 11.745 0.053
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.221 (vs. 9.192, 0.31%↑) 8.970 0.446
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 62.489 (vs. 62.666, 0.28%↓) 62.334 0.327
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.909 (vs. 5.925, 0.27%↓) 5.914 0.019
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 26.858 (vs. 26.789, 0.26%↑) 26.842 0.069
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.607 (vs. 21.558, 0.23%↑) 21.585 0.086
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.745 (vs. 3.736, 0.23%↑) 3.748 0.013
matmul\_2564x2564x2564\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.904 (vs. 0.906, 0.22%↓) 0.904 0.001
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.563 (vs. 7.579, 0.21%↓) 7.559 0.014
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.131 (vs. 32.066, 0.20%↑) 32.147 0.199
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.821 (vs. 2.815, 0.20%↑) 2.818 0.013
BertForMaskedLMTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 7.057 (vs. 7.069, 0.17%↓) 7.034 0.078
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 57.401 (vs. 57.473, 0.12%↓) 57.087 0.903
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.566 (vs. 7.571, 0.07%↓) 7.563 0.017
matmul\_2560x2560x2560\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.141 (vs. 0.141, 0.05%↑) 0.141 0.000
MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7898.894 (vs. 7902.203, 0.04%↓) 7900.971 3.610
matmul\_2562x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.535 (vs. 1.534, 0.04%↑) 1.535 0.001
matmul\_123x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.222 (vs. 0.222, 0.03%↓) 0.223 0.001
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 214.371 (vs. 214.435, 0.03%↓) 214.519 1.839
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.700 (vs. 0.700, 0.03%↓) 0.700 0.000
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.631 (vs. 0.631, 0.02%↑) 0.631 0.001
matmul\_2560x2560x2560\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.303 (vs. 0.303, 0.00%↓) 0.303 0.000
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.687 (vs. 79.685, 0.00%↑) 79.266 0.983

Regressed Total Dispatch Sizes 🚩

Benchmark Name Total Dispatch Size (bytes)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 749360 (vs. 694752, 7.86%↑)
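Each `(vs. base, X%)` annotation in these tables gives the change of the current value relative to the base run. The sketch below assumes the percentage is |current − base| / base × 100, rounded to two decimals, with ↑ when the value grew and ↓ when it shrank. That formula is inferred from the numbers in this report, not taken from the report generator's source, and `relative_change` is a hypothetical helper. It reproduces the rows it is checked against:

```python
def relative_change(current: float, base: float) -> tuple[float, str]:
    """Hypothetical helper: percent change vs. base plus a direction arrow.

    Assumes pct = |current - base| / base * 100, rounded to 2 decimals,
    and no arrow when the two values are equal (as in "0.00%" rows).
    """
    if base == 0:
        raise ValueError("base must be non-zero")
    pct = abs(current - base) / base * 100.0
    if current > base:
        arrow = "↑"
    elif current < base:
        arrow = "↓"
    else:
        arrow = ""
    return round(pct, 2), arrow


# Values copied from rows in this report.
print(relative_change(749360, 694752))    # Vit_int8 dispatch size -> (7.86, '↑')
print(relative_change(28992, 32480))      # MiniLML12H384Uncased dispatch size -> (10.74, '↓')
print(relative_change(380, 330))          # GPT2_117M_TF_1X4XI32 dispatch count -> (15.15, '↑')
print(relative_change(258.211, 260.876))  # MobileBertSquad_fp32 latency (ms) -> (1.02, '↓')
```

Because the base is the denominator, the same absolute change reads as a larger percentage on a smaller base. That is why +50 dispatches is 15.15% for GPT2 (base 330) but 7.37% for BertLargeTF (base 678).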

Improved Total Dispatch Sizes 🎉

Benchmark Name Total Dispatch Size (bytes)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 28992 (vs. 32480, 10.74%↓)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 33520 (vs. 37328, 10.20%↓)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 37776 (vs. 41632, 9.26%↓)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 131572 (vs. 143012, 8.00%↓)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 131572 (vs. 143012, 8.00%↓)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 33584 (vs. 36480, 7.94%↓)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 30384 (vs. 32912, 7.68%↓)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 229200 (vs. 246320, 6.95%↓)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 229200 (vs. 246320, 6.95%↓)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 134196 (vs. 143012, 6.16%↓)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 17680 (vs. 18784, 5.88%↓)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 1301888 (vs. 1373856, 5.24%↓)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1309056 (vs. 1380992, 5.21%↓)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 88224 (vs. 93056, 5.19%↓)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 2468160 (vs. 2600320, 5.08%↓)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 2478544 (vs. 2610608, 5.06%↓)

Regressed Stream IR Dispatch Count (# of cmd.dispatch ops) 🚩

Benchmark Name Stream IR Dispatch Count (# of cmd.dispatch ops)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 380 (vs. 330, 15.15%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 380 (vs. 330, 15.15%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 380 (vs. 330, 15.15%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 380 (vs. 330, 15.15%↑)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 234 (vs. 209, 11.96%↑)
MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 234 (vs. 209, 11.96%↑)
MiniLML12H384Uncased(stablehlo) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 234 (vs. 209, 11.96%↑)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 462 (vs. 413, 11.86%↑)
BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 462 (vs. 413, 11.86%↑)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 237 (vs. 212, 11.79%↑)
BertForMaskedLMTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 237 (vs. 212, 11.79%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 258 (vs. 233, 10.73%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 258 (vs. 233, 10.73%↑)
MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 187 (vs. 172, 8.72%↑)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 372 (vs. 346, 7.51%↑)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 372 (vs. 346, 7.51%↑)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 728 (vs. 678, 7.37%↑)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 728 (vs. 678, 7.37%↑)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 367 (vs. 342, 7.31%↑)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 367 (vs. 342, 7.31%↑)
MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 220 (vs. 210, 4.76%↑)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 728 (vs. 704, 3.41%↑)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1126 (vs. 1102, 2.18%↑)
MobileBertSquad_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 1126 (vs. 1102, 2.18%↑)
MobileBertSquad_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 1126 (vs. 1102, 2.18%↑)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1126 (vs. 1102, 2.18%↑)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 1126 (vs. 1102, 2.18%↑)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 1126 (vs. 1102, 2.18%↑)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 1126 (vs. 1102, 2.18%↑)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 382 (vs. 375, 1.87%↑)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 382 (vs. 375, 1.87%↑)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 272 (vs. 268, 1.49%↑)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 1786 (vs. 1762, 1.36%↑)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1786 (vs. 1762, 1.36%↑)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 1786 (vs. 1762, 1.36%↑)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1786 (vs. 1762, 1.36%↑)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 2184 (vs. 2160, 1.11%↑)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 2184 (vs. 2160, 1.11%↑)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2184 (vs. 2160, 1.11%↑)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 2184 (vs. 2160, 1.11%↑)
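Rows in the single-metric compile-stats tables follow the pattern `<model>(<source>) [<target>][<flags>] <current> (vs. <base>, <pct>%<arrow>)`. The sketch below parses one such row for post-processing. It is hypothetical and not part of any IREE tooling. It assumes model names contain no spaces, underscores are unescaped, and the metric is an integer (true for dispatch sizes and dispatch counts here). The pattern is inferred from the rows in this report:

```python
import re
from dataclasses import dataclass

# Hypothetical pattern inferred from this report's rows; not an official format spec.
ROW_RE = re.compile(
    r"^(?P<name>\S+) "
    r"\[(?P<target>[^\]]*)\]\[(?P<flags>[^\]]*)\] "
    r"(?P<current>\d+) \(vs\. (?P<base>\d+), (?P<pct>\d+\.\d+)%(?P<arrow>[↑↓]?)\)$"
)


@dataclass
class MetricRow:
    name: str
    target: str
    flags: list[str]
    current: int
    base: int
    pct: float
    arrow: str

    @property
    def delta(self) -> int:
        # Absolute change against the base run (positive means the metric grew).
        return self.current - self.base


def parse_row(line: str) -> MetricRow:
    """Parse one single-metric row; raises ValueError on anything else."""
    m = ROW_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    return MetricRow(
        name=m["name"],
        target=m["target"],
        flags=m["flags"].split(","),
        current=int(m["current"]),
        base=int(m["base"]),
        pct=float(m["pct"]),
        arrow=m["arrow"],
    )


row = parse_row(
    "BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu]"
    "[default-flags,dt-uk,compile-stats] 728 (vs. 678, 7.37%↑)"
)
print(row.name, row.flags, row.delta)
# BertLargeTF(stablehlo) ['default-flags', 'dt-uk', 'compile-stats'] 50
```

Rows in the "All Compilation Metrics" table carry four metrics per line, so they would need an extended pattern.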

All Compilation Metrics

Benchmark Name Compilation Time (ms) Total Dispatch Size (bytes) Total Artifact Size (bytes) Stream IR Dispatch Count (# of cmd.dispatch ops)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1568 (vs. 1549, 1.23%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 26620 (vs. 23220, 14.64%↑) 144032 (vs. 144032, 0.00%) 399877 (vs. 399877, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 43834 (vs. 39575, 10.76%↑) 238720 (vs. 238720, 0.00%) 10458245 (vs. 10458245, 0.00%) 97 (vs. 97, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 29063 (vs. 35396, 17.89%↓) 177680 (vs. 177680, 0.00%) 2959045 (vs. 2959045, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 63589 (vs. 64481, 1.38%↓) 680016 (vs. 680016, 0.00%) 5603397 (vs. 5603397, 0.00%) 89 (vs. 89, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 29042 (vs. 23526, 23.45%↑) 174400 (vs. 174400, 0.00%) 17094405 (vs. 17094405, 0.00%) 51 (vs. 51, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 33758 (vs. 34182, 1.24%↓) 190096 (vs. 190096, 0.00%) 14173189 (vs. 14173189, 0.00%) 74 (vs. 74, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 66867 (vs. 67204, 0.50%↓) 569008 (vs. 569008, 0.00%) 4219717 (vs. 4219717, 0.00%) 144 (vs. 144, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 47539 (vs. 50931, 6.66%↓) 287600 (vs. 287600, 0.00%) 18229253 (vs. 18229253, 0.00%) 124 (vs. 124, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 24313 (vs. 23219, 4.71%↑) 142288 (vs. 142288, 0.00%) 5195653 (vs. 5195653, 0.00%) 48 (vs. 48, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 69626 (vs. 75095, 7.28%↓) 84496 (vs. 84496, 0.00%) 99926661 (vs. 99926661, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 67730 (vs. 72283, 6.30%↓) 88224 (vs. 93056, 5.19%↓) 98444037 (vs. 98443077, 0.00%↑) 728 (vs. 704, 3.41%↑)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 281404 (vs. 283475, 0.73%↓) 5721504 (vs. 5843296, 2.08%↓) 32028613 (vs. 32142341, 0.35%↓) 1126 (vs. 1102, 2.18%↑)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 103721 (vs. 96962, 6.97%↑) 216304 (vs. 216304, 0.00%) 164497708 (vs. 164497708, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 35341 (vs. 32204, 9.74%↑) 56416 (vs. 59344, 4.93%↓) 133999279 (vs. 133996207, 0.00%↑) 234 (vs. 209, 11.96%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 34905 (vs. 37728, 7.48%↓) 28016 (vs. 27920, 0.34%↑) 652761364 (vs. 652755092, 0.00%↑) 258 (vs. 233, 10.73%↑)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 33311 (vs. 35404, 5.91%↓) 14736 (vs. 14736, 0.00%) 652736084 (vs. 652736084, 0.00%) 246 (vs. 246, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 41247 (vs. 36199, 13.95%↑) 70464 (vs. 68608, 2.71%↑) 533849023 (vs. 533841087, 0.00%↑) 237 (vs. 212, 11.79%↑)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 35684 (vs. 36987, 3.52%↓) 49360 (vs. 50128, 1.53%↓) 1336036095 (vs. 1336025023, 0.00%↑) 462 (vs. 413, 11.86%↑)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 1511 (vs. 1512, 0.07%↓) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 29844 (vs. 24836, 20.16%↑) 107248 (vs. 107248, 0.00%) 371397 (vs. 371397, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 37847 (vs. 37293, 1.49%↑) 96944 (vs. 96944, 0.00%) 10398725 (vs. 10398725, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 27699 (vs. 30371, 8.80%↓) 113392 (vs. 113392, 0.00%) 2921349 (vs. 2921349, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 49924 (vs. 49158, 1.56%↑) 269120 (vs. 269120, 0.00%) 5219205 (vs. 5219205, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 19251 (vs. 18659, 3.17%↑) 60288 (vs. 60288, 0.00%) 17017925 (vs. 17017925, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 32490 (vs. 28341, 14.64%↑) 95248 (vs. 95248, 0.00%) 14133765 (vs. 14133765, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 56392 (vs. 59719, 5.57%↓) 330176 (vs. 330176, 0.00%) 4003781 (vs. 4003781, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 43525 (vs. 42295, 2.91%↑) 136080 (vs. 136080, 0.00%) 18361413 (vs. 18361413, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 17774 (vs. 15629, 13.72%↑) 44208 (vs. 44208, 0.00%) 5147973 (vs. 5147973, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 88064 (vs. 89504, 1.61%↓) 53488 (vs. 53488, 0.00%) 100084229 (vs. 100084229, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 84273 (vs. 85638, 1.59%↓) 53808 (vs. 53920, 0.21%↓) 98595333 (vs. 98589445, 0.01%↑) 1786 (vs. 1762, 1.36%↑)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 245836 (vs. 246977, 0.46%↓) 2478544 (vs. 2610608, 5.06%↓) 28946693 (vs. 29088709, 0.49%↓) 2184 (vs. 2160, 1.11%↑)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 110706 (vs. 111520, 0.73%↓) 154864 (vs. 154864, 0.00%) 169910060 (vs. 169910060, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 43100 (vs. 36873, 16.89%↑) 33584 (vs. 36480, 7.94%↓) 219480623 (vs. 219477487, 0.00%↑) 367 (vs. 342, 7.31%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 44365 (vs. 47239, 6.08%↓) 17968 (vs. 18224, 1.40%↓) 992556564 (vs. 992542996, 0.00%↑) 380 (vs. 330, 15.15%↑)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 41219 (vs. 44905, 8.21%↓) 11392 (vs. 11392, 0.00%) 992539860 (vs. 992539860, 0.00%) 355 (vs. 355, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 49791 (vs. 43959, 13.27%↑) 37488 (vs. 38336, 2.21%↓) 875874559 (vs. 875869119, 0.00%↑) 372 (vs. 346, 7.51%↑)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 54173 (vs. 52444, 3.30%↑) 37776 (vs. 41632, 9.26%↓) 1336070015 (vs. 1336061759, 0.00%↑) 728 (vs. 678, 7.37%↑)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1612 (vs. 1713, 5.90%↓) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 24298 (vs. 27859, 12.78%↓) 99488 (vs. 99488, 0.00%) 363653 (vs. 363653, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 37631 (vs. 33651, 11.83%↑) 98656 (vs. 98656, 0.00%) 10400453 (vs. 10400453, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 28455 (vs. 32025, 11.15%↓) 117504 (vs. 117504, 0.00%) 2925445 (vs. 2925445, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 49206 (vs. 48363, 1.74%↑) 257968 (vs. 257968, 0.00%) 5208069 (vs. 5208069, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 20641 (vs. 19437, 6.19%↑) 65376 (vs. 65376, 0.00%) 17023045 (vs. 17023045, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 35321 (vs. 31408, 12.46%↑) 99104 (vs. 99104, 0.00%) 14137669 (vs. 14137669, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 59688 (vs. 59636, 0.09%↑) 323920 (vs. 323920, 0.00%) 3997509 (vs. 3997509, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 44462 (vs. 44430, 0.07%↑) 122656 (vs. 122656, 0.00%) 18348037 (vs. 18348037, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 16955 (vs. 18833, 9.97%↓) 47200 (vs. 47200, 0.00%) 5150981 (vs. 5150981, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 86954 (vs. 83834, 3.72%↑) 39040 (vs. 39040, 0.00%) 100069765 (vs. 100069765, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 84957 (vs. 84122, 0.99%↑) 39360 (vs. 39456, 0.24%↓) 98580869 (vs. 98574981, 0.01%↑) 1786 (vs. 1762, 1.36%↑)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 231589 (vs. 239954, 3.49%↓) 2468160 (vs. 2600320, 5.08%↓) 28936325 (vs. 29078405, 0.49%↓) 2184 (vs. 2160, 1.11%↑)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 112325 (vs. 114572, 1.96%↓) 140240 (vs. 140240, 0.00%) 169895468 (vs. 169895468, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 42353 (vs. 36722, 15.33%↑) 28992 (vs. 32480, 10.74%↓) 219476015 (vs. 219473455, 0.00%↑) 367 (vs. 342, 7.31%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 47430 (vs. 51452, 7.82%↓) 17680 (vs. 18784, 5.88%↓) 992556308 (vs. 992543572, 0.00%↑) 380 (vs. 330, 15.15%↑)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 48533 (vs. 47872, 1.38%↑) 11280 (vs. 11280, 0.00%) 992539796 (vs. 992539796, 0.00%) 355 (vs. 355, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 45671 (vs. 46808, 2.43%↓) 31520 (vs. 33072, 4.69%↓) 875868607 (vs. 875863871, 0.00%↑) 372 (vs. 346, 7.51%↑)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 57726 (vs. 56350, 2.44%↑) 33520 (vs. 37328, 10.20%↓) 1336065791 (vs. 1336057407, 0.00%↑) 728 (vs. 678, 7.37%↑)
EfficientNetV2STF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 94432 (vs. 91950, 2.70%↑) 843788 (vs. 843788, 0.00%) 165158548 (vs. 165158548, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 25945 (vs. 24996, 3.80%↑) 145212 (vs. 145080, 0.09%↑) 134094813 (vs. 134088303, 0.00%↑) 234 (vs. 209, 11.96%↑)
BertForMaskedLMTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 38243 (vs. 32540, 17.53%↑) 233552 (vs. 227472, 2.67%↑) 534018303 (vs. 534005346, 0.00%↑) 237 (vs. 212, 11.79%↑)
BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 38640 (vs. 32247, 19.83%↑) 131556 (vs. 130696, 0.66%↑) 1336123809 (vs. 1336110763, 0.00%↑) 462 (vs. 413, 11.86%↑)
matmul_3456x1024x2048_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1856 (vs. 1789, 3.75%↑) 30556 (vs. 30556, 0.00%) 41279 (vs. 41279, 0.00%) 1 (vs. 1, 0.00%)
matmul_3456x1024x2048_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1983 (vs. 1835, 8.07%↑) 45028 (vs. 45028, 0.00%) 55751 (vs. 55751, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1516 (vs. 1618, 6.30%↓) 28524 (vs. 28524, 0.00%) 39183 (vs. 39183, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1940 (vs. 1888, 2.75%↑) 41404 (vs. 41404, 0.00%) 52063 (vs. 52063, 0.00%) 1 (vs. 1, 0.00%)
matmul_2564x2564x2564_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2356 (vs. 2143, 9.94%↑) 86844 (vs. 86844, 0.00%) 97503 (vs. 97503, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2564x2562_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2123 (vs. 2225, 4.58%↓) 89724 (vs. 89724, 0.00%) 100447 (vs. 100447, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2694 (vs. 2656, 1.43%↑) 84196 (vs. 84196, 0.00%) 94919 (vs. 94919, 0.00%) 1 (vs. 1, 0.00%)
matmul_123x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2107 (vs. 2064, 2.08%↑) 51232 (vs. 51232, 0.00%) 61954 (vs. 61954, 0.00%) 1 (vs. 1, 0.00%)
matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 1125 (vs. 788, 42.77%↑) 8828 (vs. 8828, 0.00%) 26181 (vs. 26181, 0.00%) 2 (vs. 2, 0.00%)
matmul_128x256x8192_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 1059 (vs. 1002, 5.69%↑) 9784 (vs. 9784, 0.00%) 27137 (vs. 27137, 0.00%) 2 (vs. 2, 0.00%)
MiniLML12H384Uncased(stablehlo) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 35920 (vs. 32657, 9.99%↑) 49448 (vs. 50648, 2.37%↓) 133992369 (vs. 133987505, 0.00%↑) 234 (vs. 209, 11.96%↑)
DeepLabV3_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 18935 (vs. 21729, 12.86%↓) 42840 (vs. 42840, 0.00%) 2824263 (vs. 2824263, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 39564 (vs. 39539, 0.06%↑) 185480 (vs. 185480, 0.00%) 5108871 (vs. 5108871, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 71765 (vs. 66503, 7.91%↑) 47768 (vs. 49416, 3.33%↓) 98403591 (vs. 98399431, 0.00%↑) 728 (vs. 704, 3.41%↑)
MobileBertSquad_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 249028 (vs. 239662, 3.91%↑) 2069864 (vs. 2108280, 1.82%↓) 28376967 (vs. 28407303, 0.11%↓) 1126 (vs. 1102, 2.18%↑)
MobileNetV1_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 15702 (vs. 18346, 14.41%↓) 50984 (vs. 50984, 0.00%) 16976263 (vs. 16976263, 0.00%) 65 (vs. 65, 0.00%)
MobileNetV2_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 50835 (vs. 54075, 5.99%↓) 217016 (vs. 217016, 0.00%) 3867719 (vs. 3867719, 0.00%) 144 (vs. 144, 0.00%)
PersonDetect_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 18766 (vs. 17782, 5.53%↑) 58936 (vs. 58936, 0.00%) 314759 (vs. 314759, 0.00%) 60 (vs. 60, 0.00%)
EfficientNet_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 52684 (vs. 50592, 4.14%↑) 390748 (vs. 390748, 0.00%) 5314183 (vs. 5314183, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 276747 (vs. 273804, 1.07%↑) 2080380 (vs. 2116860, 1.72%↓) 28387463 (vs. 28415879, 0.10%↓) 1126 (vs. 1102, 2.18%↑)
PersonDetect_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 20891 (vs. 20758, 0.64%↑) 124428 (vs. 124428, 0.00%) 380231 (vs. 380231, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV2_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 60248 (vs. 62561, 3.70%↓) 328664 (vs. 328664, 0.00%) 3979399 (vs. 3979399, 0.00%) 144 (vs. 144, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 18594 (vs. 19612, 5.19%↓) 56176 (vs. 56176, 0.00%) 2837573 (vs. 2837573, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 50333 (vs. 48489, 3.80%↑) 30384 (vs. 32912, 7.68%↓) 98386181 (vs. 98382917, 0.00%↑) 728 (vs. 704, 3.41%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 37055 (vs. 38769, 4.42%↓) 20464 (vs. 20320, 0.71%↑) 652753812 (vs. 652747476, 0.00%↑) 258 (vs. 233, 10.73%↑)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 37038 (vs. 40149, 7.75%↓) 9024 (vs. 9024, 0.00%) 652730324 (vs. 652730324, 0.00%) 246 (vs. 246, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 270757 (vs. 278851, 2.90%↓) 4219376 (vs. 4388208, 3.85%↓) 30526469 (vs. 30687237, 0.52%↓) 1126 (vs. 1102, 2.18%↑)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 110193 (vs. 108411, 1.64%↑) 749360 (vs. 694752, 7.86%↑) 88784773 (vs. 88728773, 0.06%↑) 272 (vs. 268, 1.49%↑)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 28403 (vs. 25442, 11.64%↑) 50080 (vs. 50080, 0.00%) 2849093 (vs. 2849093, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 86103 (vs. 83710, 2.86%↑) 21136 (vs. 21520, 1.78%↓) 98562565 (vs. 98556933, 0.01%↑) 1786 (vs. 1762, 1.36%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 47009 (vs. 49953, 5.89%↓) 11776 (vs. 11552, 1.94%↑) 992525780 (vs. 992511764, 0.00%↑) 380 (vs. 330, 15.15%↑)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 49320 (vs. 49103, 0.44%↑) 9120 (vs. 9120, 0.00%) 992513044 (vs. 992513044, 0.00%) 355 (vs. 355, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 221923 (vs. 219441, 1.13%↑) 1301888 (vs. 1373856, 5.24%↓) 27770053 (vs. 27851973, 0.29%↓) 2184 (vs. 2160, 1.11%↑)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 77907 (vs. 81518, 4.43%↓) 123680 (vs. 121904, 1.46%↑) 88153541 (vs. 88145797, 0.01%↑) 382 (vs. 375, 1.87%↑)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 27936 (vs. 22768, 22.70%↑) 40992 (vs. 40992, 0.00%) 2840005 (vs. 2840005, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 75818 (vs. 82532, 8.14%↓) 21680 (vs. 22064, 1.74%↓) 98563141 (vs. 98557509, 0.01%↑) 1786 (vs. 1762, 1.36%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 42742 (vs. 42794, 0.12%↓) 10944 (vs. 10720, 2.09%↑) 992524948 (vs. 992510932, 0.00%↑) 380 (vs. 330, 15.15%↑)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 41359 (vs. 44950, 7.99%↓) 8688 (vs. 8688, 0.00%) 992512596 (vs. 992512596, 0.00%) 355 (vs. 355, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 218747 (vs. 219618, 0.40%↓) 1309056 (vs. 1380992, 5.21%↓) 27777221 (vs. 27859077, 0.29%↓) 2184 (vs. 2160, 1.11%↑)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 72143 (vs. 72444, 0.42%↓) 126160 (vs. 124400, 1.41%↑) 88155973 (vs. 88148293, 0.01%↑) 382 (vs. 375, 1.87%↑)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1045 (vs. 1241, 15.79%↓) 3872 (vs. 3872, 0.00%) 271673 (vs. 271673, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 1984 (vs. 1901, 4.37%↑) 4480 (vs. 4480, 0.00%) 273605 (vs. 273605, 0.00%) 4 (vs. 4, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1663 (vs. 1498, 11.01%↑) 6496 (vs. 6496, 0.00%) 275653 (vs. 275653, 0.00%) 4 (vs. 4, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 61414 (vs. 63750, 3.66%↓) 229200 (vs. 246320, 6.95%↓) 98586202 (vs. 98597530, 0.01%↓) 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 68840 (vs. 76906, 10.49%↓) 229200 (vs. 246320, 6.95%↓) 98586202 (vs. 98597530, 0.01%↓) 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 66395 (vs. 70105, 5.29%↓) 134196 (vs. 143012, 6.16%↓) 98490745 (vs. 98493753, 0.00%↓) 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16,compile-stats] 102785 (vs. 106055, 3.08%↓) 3004452 (vs. 3004452, 0.00%) 52898580 (vs. 52898580, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 192933 (vs. 192549, 0.20%↑) 6716612 (vs. 6756400, 0.59%↓) 33024406 (vs. 33056470, 0.10%↓) 1126 (vs. 1102, 2.18%↑)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 66510 (vs. 65713, 1.21%↑) 131572 (vs. 143012, 8.00%↓) 98486009 (vs. 98493881, 0.01%↓) 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16,compile-stats] 85470 (vs. 88539, 3.47%↓) 3008480 (vs. 3008480, 0.00%) 52905492 (vs. 52905492, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 195748 (vs. 190467, 2.77%↑) 6714448 (vs. 6736144, 0.32%↓) 33010838 (vs. 33024150, 0.04%↓) 1126 (vs. 1102, 2.18%↑)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 108980 (vs. 105615, 3.19%↑) 131572 (vs. 143012, 8.00%↓) 99749817 (vs. 99716025, 0.03%↑) 728 (vs. 704, 3.41%↑)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16,compile-stats] 125504 (vs. 128126, 2.05%↓) 3008480 (vs. 3008480, 0.00%) 54169300 (vs. 54169300, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 299786 (vs. 297955, 0.61%↑) 6714448 (vs. 6736144, 0.32%↓) 34965526 (vs. 34937238, 0.08%↑) 1126 (vs. 1102, 2.18%↑)
MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 24757 (vs. 21218, 16.68%↑) 201397 (vs. 208629, 3.47%↓) 14209598 (vs. 14213630, 0.03%↓) 187 (vs. 172, 8.72%↑)
MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 33317 (vs. 33798, 1.42%↓) 307701 (vs. 311349, 1.17%↓) 10550462 (vs. 10551678, 0.01%↓) 220 (vs. 210, 4.76%↑)
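Each metric cell above reads `current (vs. base, delta%)`, where ↑ or ↓ marks the direction of change and no arrow means the value is unchanged. The reported deltas appear to match |current − base| / base × 100 formatted to two decimals. For example, 20641 vs. 19437 gives 6.19%, and 16955 vs. 18833 gives 9.97%. The minimal sketch below checks a single cell against that assumed formula. The regex and the `percent_change` and `check_cell` helpers are hypothetical illustrations, not part of the IREE benchmark tooling.

```python
import re

# Hypothetical pattern for one cell, e.g. "20641 (vs. 19437, 6.19%↑)".
# The arrow is optional because unchanged values carry no arrow.
CELL = re.compile(r"(\d+) \(vs\. (\d+), (\d+\.\d{2})%([↑↓]?)\)")


def percent_change(current: int, base: int) -> str:
    """Assumed formula: absolute relative change in percent, two decimals."""
    return f"{abs(current - base) / base * 100:.2f}"


def check_cell(cell: str) -> bool:
    """Return True if the cell's delta and arrow agree with the assumed formula."""
    m = CELL.fullmatch(cell)
    if m is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    current, base = int(m.group(1)), int(m.group(2))
    reported, arrow = m.group(3), m.group(4)
    # Direction comes from the raw values, so tiny changes that round to
    # 0.00% still carry an arrow (e.g. 219476015 vs. 219473455).
    if current > base:
        expected_arrow = "↑"
    elif current < base:
        expected_arrow = "↓"
    else:
        expected_arrow = ""
    return percent_change(current, base) == reported and arrow == expected_arrow


if __name__ == "__main__":
    for cell in (
        "20641 (vs. 19437, 6.19%↑)",
        "16955 (vs. 18833, 9.97%↓)",
        "65376 (vs. 65376, 0.00%)",
        "219476015 (vs. 219473455, 0.00%↑)",
    ):
        print(cell, check_cell(cell))
```

Comparing formatted strings rather than floats avoids spurious mismatches from binary floating-point rounding.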