Full Benchmark Summary

Data-Tiling Comparison Table

Name No-DT (baseline) DT-Only DT-UK
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 212.634 (1.0X) 136.582 (1.6X) 108.019 (2.0X)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 782.834 (1.0X) 270.840 (2.9X) 222.449 (3.5X)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.038 (1.0X) 36.978 (0.9X) 29.898 (1.1X)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7.083 (1.0X) 9.289 (0.8X) 8.485 (0.8X)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 273.012 (1.0X) 259.041 (1.1X) 228.362 (1.2X)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.729 (1.0X) 35.964 (1.0X) 34.136 (1.0X)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 26.885 (1.0X) 51.662 (0.5X) 13.072 (2.1X)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.802 (1.0X) 10.945 (0.5X) 4.994 (1.2X)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 68.878 (1.0X) 39.970 (1.7X) 40.339 (1.7X)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.164 (1.0X) 8.502 (1.1X) 8.483 (1.1X)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 87.410 (1.0X) 42.210 (2.1X) 41.902 (2.1X)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.111 (1.0X) 8.996 (1.2X) 8.986 (1.2X)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.032 (1.0X) 78.957 (1.0X) 56.799 (1.4X)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.520 (1.0X) 15.447 (0.8X) 13.752 (0.9X)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.632 (1.0X) 250.208 (0.7X) 185.065 (1.0X)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.114 (1.0X) 65.766 (0.5X) 61.634 (0.6X)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.382 (1.0X) 260.021 (0.7X) 189.807 (1.0X)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.943 (1.0X) 66.765 (0.5X) 61.551 (0.6X)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 490.261 (1.0X) 1071.706 (0.5X) 213.692 (2.3X)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 66.386 (1.0X) 132.343 (0.5X) 62.172 (1.1X)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 24.840 (1.0X) 23.079 (1.1X) 17.963 (1.4X)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.678 (1.0X) 5.333 (0.9X) 4.541 (1.0X)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.088 (1.0X) 15.372 (0.8X) 11.520 (1.0X)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.707 (1.0X) 5.385 (0.7X) 4.888 (0.8X)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.608 (1.0X) 42.677 (0.5X) 11.896 (1.8X)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.884 (1.0X) 9.566 (0.6X) 5.399 (1.1X)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.749 (1.0X) 3.344 (0.8X) 2.657 (1.0X)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.863 (1.0X) 3.496 (0.8X) 2.810 (1.0X)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.230 (1.0X) 39.355 (0.9X) 31.462 (1.1X)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.524 (1.0X) 10.918 (0.8X) 9.856 (0.9X)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.697 (1.0X) 1.295 (0.5X) 0.569 (1.2X)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.770 (1.0X) 1.379 (0.6X) 0.631 (1.2X)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.846 (1.0X) 24.306 (0.7X) 18.987 (0.9X)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.187 (1.0X) 5.938 (0.7X) 5.134 (0.8X)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.572 (1.0X) 7.547 (1.0X) 7.597 (1.0X)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 48.713 (1.0X) 84.937 (0.6X) 42.702 (1.1X)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 50.227 (1.0X) 86.346 (0.6X) 43.681 (1.1X)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 29.978 (1.0X) 49.751 (0.6X) 27.452 (1.1X)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 92.272 (1.0X) 22.318 (4.1X) 23.251 (4.0X)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 92.264 (1.0X) 23.165 (4.0X) 23.063 (4.0X)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 51.929 (1.0X) 22.489 (2.3X) 22.394 (2.3X)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 138.634 (1.0X) 28.299 (4.9X) 27.909 (5.0X)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 139.972 (1.0X) 30.402 (4.6X) 29.218 (4.8X)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 76.115 (1.0X) 26.977 (2.8X) 27.029 (2.8X)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 704.579 (1.0X) 456.507 (1.5X) 347.461 (2.0X)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 699.443 (1.0X) 458.146 (1.5X) 354.587 (2.0X)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 393.142 (1.0X) 273.676 (1.4X) 217.559 (1.8X)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1117.207 (1.0X) 1077.798 (1.0X) 304.056 (3.7X)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1117.128 (1.0X) 1055.845 (1.1X) 304.967 (3.7X)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 574.393 (1.0X) 583.447 (1.0X) 179.329 (3.2X)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 2102.566 (1.0X) 1849.042 (1.1X) 295.385 (7.1X)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 2103.631 (1.0X) 1879.486 (1.1X) 298.229 (7.1X)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1122.553 (1.0X) 1070.830 (1.0X) 177.885 (6.3X)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 12.130 (1.0X) 14.438 (0.8X) 1.328 (9.1X)
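The parenthesized factors in the table above appear to be the No-DT baseline latency divided by each configuration's latency, rounded to one decimal place. The latencies themselves match the "Average Latency (ms)" values in the tables further down. A minimal sketch under that assumption, using the BertForMaskedLMTF row (the helper name `speedup` is hypothetical and not part of the benchmark tooling):

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Latencies (ms) copied from the BertForMaskedLMTF row of the table.
no_dt, dt_only, dt_uk = 212.634, 136.582, 108.019

for label, latency in [("No-DT", no_dt), ("DT-Only", dt_only), ("DT-UK", dt_uk)]:
    # Same "latency (factor)" layout as the table cells.
    print(f"{label}: {latency:.3f} ({speedup(no_dt, latency):.1f}X)")
```

A factor below 1.0X means the configuration is slower than the baseline; for example, EfficientNet_int8 with DT-Only at 1 thread gives 26.885 / 51.662, which rounds to 0.5X.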

Regressed Latencies 🚩

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 103.662 (vs. 92.339, 12.26%↑) 102.683 7.225
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 95.818 (vs. 85.499, 12.07%↑) 96.383 2.410
matmul\_2562x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.534 (vs. 1.369, 12.07%↑) 1.534 0.001
matmul\_123x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.222 (vs. 0.201, 10.61%↑) 0.222 0.001
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 138.634 (vs. 130.283, 6.41%↑) 138.491 0.503
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 76.115 (vs. 72.304, 5.27%↑) 76.095 0.312
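The "Average Latency (ms)" cells in this report follow the pattern `current (vs. previous, change%)`, where the change looks like the relative difference against the previous run. A small sketch of that arithmetic, assuming this reading is right (the function names `percent_change` and `format_change` are hypothetical, not taken from the report generator):

```python
def percent_change(current_ms: float, previous_ms: float) -> float:
    """Relative change of the current latency against the previous one, in percent."""
    return (current_ms - previous_ms) / previous_ms * 100.0


def format_change(current_ms: float, previous_ms: float) -> str:
    """Render a cell in the same layout as the report's latency column."""
    change = percent_change(current_ms, previous_ms)
    # An upward arrow marks higher latency (slower); a downward arrow marks lower latency.
    arrow = "↑" if change > 0 else "↓"
    return f"{current_ms:.3f} (vs. {previous_ms:.3f}, {abs(change):.2f}%{arrow})"


# First row of the regressed table: MobileBertSquad_fp16 on Vulkan.
print(format_change(103.662, 92.339))
```

Because the report prints latencies rounded to three decimals, recomputing the percentage from the printed values can differ slightly from the reported figure for very small latencies (for example, the sub-millisecond CUDA matmul rows).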

Improved Latencies 🎉

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
matmul\_3456x1024x2048\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.130 (vs. 0.165, 21.15%↓) 0.130 0.000
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1077.798 (vs. 1217.547, 11.48%↓) 1078.856 17.231
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1055.845 (vs. 1182.092, 10.68%↓) 1060.264 12.370
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 583.447 (vs. 652.379, 10.57%↓) 588.415 10.953
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 177.885 (vs. 194.974, 8.77%↓) 178.441 3.894
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 42.702 (vs. 46.557, 8.28%↓) 42.739 0.257
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 295.385 (vs. 320.741, 7.91%↓) 296.646 4.127
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1849.042 (vs. 2004.352, 7.75%↓) 1856.732 21.548
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 347.461 (vs. 375.292, 7.42%↓) 349.831 5.329
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1070.830 (vs. 1154.594, 7.25%↓) 1076.398 24.865
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 27.452 (vs. 29.550, 7.10%↓) 27.792 0.791
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 298.229 (vs. 317.530, 6.08%↓) 298.558 2.441
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 14.438 (vs. 15.348, 5.93%↓) 14.479 0.108
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 304.056 (vs. 323.173, 5.92%↓) 304.131 1.037
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 354.587 (vs. 376.801, 5.90%↓) 355.117 4.711
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 179.329 (vs. 190.563, 5.90%↓) 180.465 3.817
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 29.218 (vs. 31.024, 5.82%↓) 29.211 0.703
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 393.142 (vs. 416.804, 5.68%↓) 394.691 9.342
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 27.909 (vs. 29.541, 5.52%↓) 28.167 0.843
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 12.130 (vs. 12.838, 5.52%↓) 12.154 0.069
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 217.559 (vs. 230.027, 5.42%↓) 219.987 6.324
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 273.676 (vs. 289.196, 5.37%↓) 275.931 7.778
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1879.486 (vs. 1984.119, 5.27%↓) 1879.933 28.877
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 43.681 (vs. 46.084, 5.22%↓) 43.612 0.418

Similar Latencies

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 15.447 (vs. 14.502, 6.51%↑) 15.394 0.130
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 40.339 (vs. 38.174, 5.67%↑) 39.780 1.447
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.986 (vs. 8.546, 5.14%↑) 8.927 0.129
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 29.978 (vs. 31.553, 4.99%↓) 30.058 0.775
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.032 (vs. 75.301, 4.95%↑) 78.810 1.263
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 458.146 (vs. 481.663, 4.88%↓) 458.829 5.817
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1122.553 (vs. 1179.505, 4.83%↓) 1129.639 19.642
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 41.902 (vs. 39.991, 4.78%↑) 41.076 1.611
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 49.751 (vs. 52.228, 4.74%↓) 50.070 1.396
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 84.937 (vs. 89.141, 4.72%↓) 85.071 1.001
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 26.977 (vs. 28.265, 4.56%↓) 27.088 0.359
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.657 (vs. 2.777, 4.34%↓) 2.654 0.026
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 30.402 (vs. 31.774, 4.32%↓) 30.429 0.717
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 13.752 (vs. 13.189, 4.27%↑) 13.714 0.152
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 456.507 (vs. 476.711, 4.24%↓) 457.545 6.197
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 574.393 (vs. 599.584, 4.20%↓) 575.740 8.161
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 39.970 (vs. 38.400, 4.09%↑) 39.551 1.223
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 28.299 (vs. 29.430, 3.84%↓) 28.472 0.949
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 304.967 (vs. 316.983, 3.79%↓) 304.269 2.066
matmul\_2564x2564x2564\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.907 (vs. 0.942, 3.71%↓) 0.904 0.008
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 699.443 (vs. 726.162, 3.68%↓) 701.609 6.710
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.230 (vs. 33.982, 3.67%↑) 35.164 0.386
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 78.403 (vs. 75.647, 3.64%↑) 78.232 0.630
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 782.834 (vs. 755.756, 3.58%↑) 758.675 50.872
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 24.840 (vs. 25.742, 3.50%↓) 24.815 0.311
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.483 (vs. 8.785, 3.44%↓) 8.431 0.084
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 22.394 (vs. 23.185, 3.41%↓) 22.443 0.157
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 260.021 (vs. 251.484, 3.39%↑) 258.350 3.886
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.520 (vs. 12.115, 3.34%↑) 12.316 0.292
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 139.972 (vs. 135.469, 3.32%↑) 139.837 0.520
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.996 (vs. 8.707, 3.32%↑) 8.963 0.090
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 704.579 (vs. 728.132, 3.23%↓) 706.783 10.812
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1.328 (vs. 1.372, 3.21%↓) 1.333 0.014
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.502 (vs. 8.758, 2.93%↓) 8.449 0.112
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 136.582 (vs. 132.752, 2.88%↑) 135.772 1.501
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 78.957 (vs. 76.783, 2.83%↑) 78.225 1.277
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 22.489 (vs. 23.114, 2.70%↓) 22.501 0.192
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.520 (vs. 11.230, 2.58%↑) 11.532 0.072
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 66.765 (vs. 65.193, 2.41%↑) 66.501 0.550
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 108.019 (vs. 105.562, 2.33%↑) 107.647 2.309
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 23.063 (vs. 23.611, 2.32%↓) 23.026 0.109
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7.083 (vs. 6.929, 2.23%↑) 7.075 0.036
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.088 (vs. 11.826, 2.22%↑) 11.961 0.322
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 27.029 (vs. 27.623, 2.15%↓) 27.079 0.356
BertLargeTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 10.657 (vs. 10.888, 2.12%↓) 10.657 0.006
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 23.251 (vs. 22.769, 2.12%↑) 22.433 2.514
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.114 (vs. 33.448, 1.99%↑) 33.519 0.977
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 86.244 (vs. 84.622, 1.92%↑) 86.388 0.803
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 222.449 (vs. 218.324, 1.89%↑) 223.216 2.171
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 73.511 (vs. 74.846, 1.78%↓) 73.358 0.717
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.707 (vs. 3.773, 1.74%↓) 3.699 0.024
matmul\_128x256x8192\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.025 (vs. 0.026, 1.69%↓) 0.025 0.000
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 92.264 (vs. 93.832, 1.67%↓) 92.266 0.088
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 68.878 (vs. 70.006, 1.61%↓) 68.319 2.755
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.749 (vs. 2.793, 1.58%↓) 2.744 0.037
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 50.227 (vs. 51.020, 1.55%↓) 50.088 0.596
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.896 (vs. 11.735, 1.37%↑) 11.898 0.083
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.846 (vs. 17.613, 1.32%↑) 17.830 0.094
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 56.799 (vs. 56.067, 1.30%↑) 56.149 1.066
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 80.627 (vs. 79.592, 1.30%↑) 80.438 0.788
matmul\_2560x2560x2560\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.303 (vs. 0.299, 1.27%↑) 0.303 0.000
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 86.346 (vs. 87.440, 1.25%↓) 86.203 1.547
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 51.929 (vs. 52.585, 1.25%↓) 51.861 0.304
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 87.410 (vs. 88.476, 1.21%↓) 87.102 3.605
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 42.210 (vs. 41.709, 1.20%↑) 41.570 1.456
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 18.987 (vs. 18.763, 1.20%↑) 18.977 0.113
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 2102.566 (vs. 2127.281, 1.16%↓) 2102.283 2.234
EfficientNetV2STF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 4.292 (vs. 4.244, 1.13%↑) 4.262 0.093
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.187 (vs. 4.141, 1.09%↑) 4.186 0.029
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 3.344 (vs. 3.380, 1.09%↓) 3.344 0.012
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 212.634 (vs. 214.928, 1.07%↓) 209.196 11.528
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.678 (vs. 4.629, 1.06%↑) 4.673 0.039
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.943 (vs. 33.591, 1.05%↑) 33.375 0.908
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 61.634 (vs. 61.004, 1.03%↑) 61.300 0.636
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 23.079 (vs. 22.847, 1.01%↑) 23.009 0.203
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 2103.631 (vs. 2125.148, 1.01%↓) 2103.162 2.321
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 15.372 (vs. 15.223, 0.98%↑) 15.358 0.114
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 65.766 (vs. 65.132, 0.97%↑) 65.456 0.556
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 250.208 (vs. 247.807, 0.97%↑) 248.159 3.864
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.289 (vs. 9.379, 0.96%↓) 9.273 0.035
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 39.355 (vs. 38.986, 0.95%↑) 39.147 0.515
matmul\_128x256x8192\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.047 (vs. 0.047, 0.91%↓) 0.047 0.000
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 31.462 (vs. 31.183, 0.90%↑) 31.366 0.345
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 270.840 (vs. 268.489, 0.88%↑) 271.237 1.685
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 48.713 (vs. 49.139, 0.87%↓) 49.160 0.829
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 24.306 (vs. 24.098, 0.86%↑) 24.295 0.080
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.810 (vs. 2.833, 0.83%↓) 2.798 0.038
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.485 (vs. 8.555, 0.82%↓) 8.485 0.025
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.164 (vs. 9.232, 0.74%↓) 8.912 0.445
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.964 (vs. 36.221, 0.71%↓) 35.772 0.383
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.333 (vs. 5.299, 0.63%↑) 5.330 0.027
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 13.072 (vs. 12.997, 0.58%↑) 13.071 0.044
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.547 (vs. 7.591, 0.58%↓) 7.565 0.055
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.632 (vs. 179.600, 0.57%↑) 179.602 2.479
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.888 (vs. 4.914, 0.53%↓) 4.883 0.023
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 23.165 (vs. 23.045, 0.52%↑) 22.971 0.458
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.134 (vs. 5.160, 0.52%↓) 5.129 0.023
matmul\_2562x2564x2562\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.110 (vs. 1.116, 0.52%↓) 1.108 0.006
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.863 (vs. 2.848, 0.50%↑) 2.859 0.011
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.884 (vs. 5.855, 0.50%↑) 5.879 0.023
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 92.272 (vs. 92.716, 0.48%↓) 92.265 0.184
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 124.547 (vs. 125.131, 0.47%↓) 124.649 0.350
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 35.729 (vs. 35.885, 0.44%↓) 35.275 0.818
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 1071.706 (vs. 1067.451, 0.40%↑) 1070.888 3.804
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 26.885 (vs. 26.780, 0.39%↑) 26.868 0.148
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.963 (vs. 17.895, 0.38%↑) 17.911 0.162
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.524 (vs. 8.493, 0.37%↑) 8.513 0.042
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.608 (vs. 21.531, 0.36%↑) 21.635 0.061
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.399 (vs. 5.417, 0.33%↓) 5.398 0.015
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 66.386 (vs. 66.604, 0.33%↓) 66.143 0.467
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 189.807 (vs. 189.192, 0.33%↑) 188.286 3.055
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 62.172 (vs. 61.980, 0.31%↑) 62.028 0.327
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.385 (vs. 5.369, 0.31%↑) 5.383 0.021
matmul\_3456x1024x2048\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.063 (vs. 0.063, 0.29%↑) 0.063 0.000
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 36.978 (vs. 36.878, 0.27%↑) 36.879 0.450
MiniLML12H384Uncased(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.607 (vs. 1.603, 0.27%↑) 1.607 0.003
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.038 (vs. 31.953, 0.27%↑) 32.024 0.116
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 22.318 (vs. 22.266, 0.24%↑) 22.271 0.111
matmul\_2560x2560x2560\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.141 (vs. 0.141, 0.23%↓) 0.141 0.000
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.136 (vs. 34.212, 0.22%↓) 33.975 0.338
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 273.012 (vs. 273.621, 0.22%↓) 271.451 3.634
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 97.886 (vs. 97.669, 0.22%↑) 97.899 0.279
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 51.662 (vs. 51.557, 0.20%↑) 51.618 0.102
BertForMaskedLMTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 6.942 (vs. 6.928, 0.20%↑) 6.901 0.132
MobileNetV3Small\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 1807.801 (vs. 1804.186, 0.20%↑) 1807.855 3.000
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.541 (vs. 4.532, 0.19%↑) 4.535 0.017
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.994 (vs. 5.003, 0.17%↓) 4.993 0.008
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 259.041 (vs. 258.601, 0.17%↑) 256.468 4.162
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 42.677 (vs. 42.744, 0.16%↓) 42.658 0.097
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.569 (vs. 0.569, 0.14%↑) 0.569 0.003
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.631 (vs. 0.632, 0.14%↓) 0.631 0.001
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.802 (vs. 5.810, 0.14%↓) 5.801 0.015
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1117.207 (vs. 1118.710, 0.13%↓) 1116.966 1.714
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 61.551 (vs. 61.474, 0.13%↑) 61.357 0.450
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.111 (vs. 11.125, 0.12%↓) 10.727 0.609
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 1.379 (vs. 1.380, 0.12%↓) 1.379 0.003
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.697 (vs. 0.698, 0.12%↓) 0.698 0.001
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.566 (vs. 9.577, 0.12%↓) 9.558 0.033
MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7892.110 (vs. 7901.133, 0.11%↓) 7891.911 0.561
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.938 (vs. 5.931, 0.11%↑) 5.939 0.018
MobileBertSquad\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] 260.214 (vs. 260.498, 0.11%↓) 259.928 1.050
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.496 (vs. 3.499, 0.11%↓) 3.481 0.034
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.856 (vs. 9.866, 0.10%↓) 9.858 0.035
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 490.261 (vs. 489.791, 0.10%↑) 488.847 2.312
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 185.065 (vs. 184.891, 0.09%↑) 183.845 2.956
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.770 (vs. 0.769, 0.09%↑) 0.771 0.004
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.572 (vs. 7.579, 0.09%↓) 7.576 0.015
MobileBertSquad\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] 260.168 (vs. 260.390, 0.09%↓) 260.323 0.990
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 213.692 (vs. 213.868, 0.08%↓) 213.090 1.307
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.597 (vs. 7.591, 0.08%↑) 7.584 0.031
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 10.918 (vs. 10.926, 0.07%↓) 10.926 0.053
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 29.898 (vs. 29.882, 0.05%↑) 29.798 0.201
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 105.781 (vs. 105.837, 0.05%↓) 105.743 0.303
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 228.362 (vs. 228.459, 0.04%↓) 226.124 4.734
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 10.945 (vs. 10.949, 0.04%↓) 10.941 0.020
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1117.128 (vs. 1116.859, 0.02%↑) 1116.278 1.550
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.382 (vs. 180.340, 0.02%↑) 179.512 2.156
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 132.343 (vs. 132.324, 0.01%↑) 131.956 0.982
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 1.295 (vs. 1.295, 0.00%↑) 1.295 0.002

Improved Total Dispatch Sizes 🎉

| Benchmark Name | Total Dispatch Size (bytes) |
| --- | --- |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 11392 (vs. 12864, 11.44%↓) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 11280 (vs. 12336, 8.56%↓) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 18224 (vs. 19328, 5.71%↓) |
| BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 33072 (vs. 34928, 5.31%↓) |
| BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 37328 (vs. 39360, 5.16%↓) |
| BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 41632 (vs. 43840, 5.04%↓) |

Regressed Stream IR Dispatch Count (# of cmd.dispatch ops) 🚩

| Benchmark Name | Stream IR Dispatch Count (# of cmd.dispatch ops) |
| --- | --- |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 330 (vs. 318, 3.77%↑) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 330 (vs. 318, 3.77%↑) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk,compile-stats] | 330 (vs. 318, 3.77%↑) |
| GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 330 (vs. 318, 3.77%↑) |
| BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 678 (vs. 654, 3.67%↑) |
| BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 678 (vs. 654, 3.67%↑) |
| MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 342 (vs. 330, 3.64%↑) |
| MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 342 (vs. 330, 3.64%↑) |
| BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 346 (vs. 334, 3.59%↑) |
| BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 346 (vs. 334, 3.59%↑) |

Improved Stream IR Dispatch Count (# of cmd.dispatch ops) 🎉

| Benchmark Name | Stream IR Dispatch Count (# of cmd.dispatch ops) |
| --- | --- |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 355 (vs. 367, 3.27%↓) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk,compile-stats] | 355 (vs. 367, 3.27%↓) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk,compile-stats] | 355 (vs. 367, 3.27%↓) |
| GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 355 (vs. 367, 3.27%↓) |
| Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk,compile-stats] | 375 (vs. 386, 2.85%↓) |
| Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,dt-only,compile-stats] | 375 (vs. 386, 2.85%↓) |
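
Each cell in the tables above has the form `current (vs. baseline, pct%↑/↓)`. The percentages are consistent with a relative change measured against the baseline value, for example (12864 − 11392) / 12864 ≈ 11.44%. The sketch below reproduces that arithmetic. It is inferred from the numbers shown here, not taken from the report generator, and the function names `relative_change` and `describe` are hypothetical.

```python
def relative_change(current: float, baseline: float) -> float:
    """Signed percentage change of `current` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


def describe(current: float, baseline: float) -> str:
    """Render a cell in the style used by the tables above (assumed format)."""
    change = relative_change(current, baseline)
    if change == 0:
        # Identical values are shown without an arrow.
        return f"{current} (vs. {baseline}, 0.00%)"
    arrow = "↑" if change > 0 else "↓"
    return f"{current} (vs. {baseline}, {abs(change):.2f}%{arrow})"


# Values taken from the dispatch size and dispatch count tables.
print(describe(11392, 12864))
print(describe(330, 318))
print(describe(355, 367))
```

A change that is non-zero but rounds to 0.00 still carries an arrow under this scheme, which matches cells such as `680016 (vs. 680000, 0.00%↑)` in the compilation metrics.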

All Compilation Metrics

Benchmark Name Compilation Time (ms) Total Dispatch Size (bytes) Total Artifact Size (bytes) Stream IR Dispatch Count (# of cmd.dispatch ops)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1532 (vs. 1462, 4.79%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 25239 (vs. 24013, 5.11%↑) 144032 (vs. 144032, 0.00%) 399877 (vs. 399877, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 41307 (vs. 43777, 5.64%↓) 238720 (vs. 238720, 0.00%) 10458245 (vs. 10458245, 0.00%) 97 (vs. 97, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 37167 (vs. 37437, 0.72%↓) 177680 (vs. 177680, 0.00%) 2959045 (vs. 2959045, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 65393 (vs. 65342, 0.08%↑) 680016 (vs. 680000, 0.00%↑) 5603397 (vs. 5603397, 0.00%) 89 (vs. 89, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 23953 (vs. 27859, 14.02%↓) 174400 (vs. 174400, 0.00%) 17094405 (vs. 17094405, 0.00%) 51 (vs. 51, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 37701 (vs. 35600, 5.90%↑) 190096 (vs. 190096, 0.00%) 14173189 (vs. 14173189, 0.00%) 74 (vs. 74, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 73809 (vs. 75837, 2.67%↓) 569008 (vs. 568928, 0.01%↑) 4219717 (vs. 4219653, 0.00%↑) 144 (vs. 144, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 53685 (vs. 49933, 7.51%↑) 287600 (vs. 287600, 0.00%) 18229253 (vs. 18229253, 0.00%) 124 (vs. 124, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 22863 (vs. 23884, 4.27%↓) 142288 (vs. 142288, 0.00%) 5195653 (vs. 5195653, 0.00%) 48 (vs. 48, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 77465 (vs. 75099, 3.15%↑) 84496 (vs. 84496, 0.00%) 99926661 (vs. 99926661, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 77041 (vs. 77539, 0.64%↓) 93056 (vs. 93056, 0.00%) 98443077 (vs. 98443077, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 298077 (vs. 292685, 1.84%↑) 5843296 (vs. 5843296, 0.00%) 32142341 (vs. 32142341, 0.00%) 1102 (vs. 1102, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 112502 (vs. 114259, 1.54%↓) 216304 (vs. 216304, 0.00%) 164497708 (vs. 164497708, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 32318 (vs. 29166, 10.81%↑) 59344 (vs. 59344, 0.00%) 133996207 (vs. 133996207, 0.00%) 209 (vs. 209, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 41202 (vs. 41406, 0.49%↓) 27920 (vs. 27920, 0.00%) 652755092 (vs. 652755092, 0.00%) 233 (vs. 233, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 35567 (vs. 38681, 8.05%↓) 14736 (vs. 14736, 0.00%) 652736084 (vs. 652736084, 0.00%) 246 (vs. 246, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 40057 (vs. 39740, 0.80%↑) 68608 (vs. 68608, 0.00%) 533841087 (vs. 533841087, 0.00%) 212 (vs. 212, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 37910 (vs. 36709, 3.27%↑) 50128 (vs. 50128, 0.00%) 1336025023 (vs. 1336025023, 0.00%) 413 (vs. 413, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 1519 (vs. 1326, 14.56%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 24347 (vs. 25159, 3.23%↓) 107248 (vs. 107248, 0.00%) 371397 (vs. 371397, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 34457 (vs. 34080, 1.11%↑) 96944 (vs. 96944, 0.00%) 10398725 (vs. 10398725, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 35035 (vs. 30261, 15.78%↑) 113392 (vs. 113392, 0.00%) 2921349 (vs. 2921349, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 52756 (vs. 51386, 2.67%↑) 269120 (vs. 269104, 0.01%↑) 5219205 (vs. 5219205, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 20748 (vs. 20528, 1.07%↑) 60288 (vs. 60288, 0.00%) 17017925 (vs. 17017925, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 35495 (vs. 31390, 13.08%↑) 95248 (vs. 95248, 0.00%) 14133765 (vs. 14133765, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 59703 (vs. 59396, 0.52%↑) 330176 (vs. 330048, 0.04%↑) 4003781 (vs. 4003653, 0.00%↑) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 51892 (vs. 41008, 26.54%↑) 136080 (vs. 136080, 0.00%) 18361413 (vs. 18361413, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 18874 (vs. 19484, 3.13%↓) 44208 (vs. 44208, 0.00%) 5147973 (vs. 5147973, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 98699 (vs. 95045, 3.84%↑) 53488 (vs. 52400, 2.08%↑) 100084229 (vs. 100083141, 0.00%↑) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 93013 (vs. 92168, 0.92%↑) 53920 (vs. 52832, 2.06%↑) 98589445 (vs. 98588357, 0.00%↑) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 265800 (vs. 255258, 4.13%↑) 2610608 (vs. 2610240, 0.01%↑) 29088709 (vs. 29088325, 0.00%↑) 2160 (vs. 2160, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 122804 (vs. 118362, 3.75%↑) 154864 (vs. 154864, 0.00%) 169910060 (vs. 169910060, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 38798 (vs. 41960, 7.54%↓) 36480 (vs. 37936, 3.84%↓) 219477487 (vs. 219509935, 0.01%↓) 342 (vs. 330, 3.64%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 53745 (vs. 48592, 10.60%↑) 18224 (vs. 19328, 5.71%↓) 992542996 (vs. 992542612, 0.00%↑) 330 (vs. 318, 3.77%↑)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 46189 (vs. 43282, 6.72%↑) 11392 (vs. 12864, 11.44%↓) 992539860 (vs. 992543572, 0.00%↓) 355 (vs. 367, 3.27%↓)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 51599 (vs. 51201, 0.78%↑) 38336 (vs. 40144, 4.50%↓) 875869119 (vs. 875939327, 0.01%↓) 346 (vs. 334, 3.59%↑)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 60034 (vs. 60359, 0.54%↓) 41632 (vs. 43840, 5.04%↓) 1336061759 (vs. 1336053183, 0.00%↑) 678 (vs. 654, 3.67%↑)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1553 (vs. 1466, 5.93%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 23805 (vs. 24963, 4.64%↓) 99488 (vs. 99488, 0.00%) 363653 (vs. 363653, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 36879 (vs. 34030, 8.37%↑) 98656 (vs. 98656, 0.00%) 10400453 (vs. 10400453, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 35374 (vs. 33843, 4.52%↑) 117504 (vs. 117504, 0.00%) 2925445 (vs. 2925445, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 55730 (vs. 54062, 3.09%↑) 257968 (vs. 257952, 0.01%↑) 5208069 (vs. 5208069, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 23545 (vs. 23060, 2.10%↑) 65376 (vs. 65376, 0.00%) 17023045 (vs. 17023045, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 34100 (vs. 35108, 2.87%↓) 99104 (vs. 99104, 0.00%) 14137669 (vs. 14137669, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 62806 (vs. 59508, 5.54%↑) 323920 (vs. 323792, 0.04%↑) 3997509 (vs. 3997381, 0.00%↑) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 45886 (vs. 45016, 1.93%↑) 122656 (vs. 122656, 0.00%) 18348037 (vs. 18348037, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 19214 (vs. 19518, 1.56%↓) 47200 (vs. 47200, 0.00%) 5150981 (vs. 5150981, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 98941 (vs. 96476, 2.56%↑) 39040 (vs. 38880, 0.41%↑) 100069765 (vs. 100069637, 0.00%↑) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 94139 (vs. 92699, 1.55%↑) 39456 (vs. 39296, 0.41%↑) 98574981 (vs. 98574789, 0.00%↑) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 252409 (vs. 245264, 2.91%↑) 2600320 (vs. 2599872, 0.02%↑) 29078405 (vs. 29077957, 0.00%↑) 2160 (vs. 2160, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 121029 (vs. 117415, 3.08%↑) 140240 (vs. 140240, 0.00%) 169895468 (vs. 169895468, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 40953 (vs. 39123, 4.68%↑) 32480 (vs. 33904, 4.20%↓) 219473455 (vs. 219505903, 0.01%↓) 342 (vs. 330, 3.64%↑)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 52850 (vs. 50443, 4.77%↑) 18784 (vs. 18640, 0.77%↑) 992543572 (vs. 992541972, 0.00%↑) 330 (vs. 318, 3.77%↑)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 52876 (vs. 50131, 5.48%↑) 11280 (vs. 12336, 8.56%↓) 992539796 (vs. 992543060, 0.00%↓) 355 (vs. 367, 3.27%↓)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 54280 (vs. 51471, 5.46%↑) 33072 (vs. 34928, 5.31%↓) 875863871 (vs. 875934143, 0.01%↓) 346 (vs. 334, 3.59%↑)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 62509 (vs. 63457, 1.49%↓) 37328 (vs. 39360, 5.16%↓) 1336057407 (vs. 1336048703, 0.00%↑) 678 (vs. 654, 3.67%↑)
EfficientNetV2STF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 103467 (vs. 96277, 7.47%↑) 843796 (vs. 844192, 0.05%↓) 165158548 (vs. 165158932, 0.00%↓) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 26526 (vs. 25247, 5.07%↑) 145080 (vs. 145212, 0.09%↓) 134088303 (vs. 134088431, 0.00%↓) 209 (vs. 209, 0.00%)
BertForMaskedLMTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 38875 (vs. 37263, 4.33%↑) 227480 (vs. 227968, 0.21%↓) 534005410 (vs. 534005858, 0.00%↓) 212 (vs. 212, 0.00%)
BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 36872 (vs. 35905, 2.69%↑) 130704 (vs. 130852, 0.11%↓) 1336110827 (vs. 1336110955, 0.00%↓) 413 (vs. 413, 0.00%)
matmul_3456x1024x2048_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1674 (vs. 1629, 2.76%↑) 30556 (vs. 30540, 0.05%↑) 41279 (vs. 41263, 0.04%↑) 1 (vs. 1, 0.00%)
matmul_3456x1024x2048_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2039 (vs. 1920, 6.20%↑) 45028 (vs. 43516, 3.47%↑) 55751 (vs. 54239, 2.79%↑) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1798 (vs. 1611, 11.61%↑) 28524 (vs. 28492, 0.11%↑) 39183 (vs. 39151, 0.08%↑) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1910 (vs. 1755, 8.83%↑) 41404 (vs. 39892, 3.79%↑) 52063 (vs. 50551, 2.99%↑) 1 (vs. 1, 0.00%)
matmul_2564x2564x2564_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2282 (vs. 2292, 0.44%↓) 86844 (vs. 85344, 1.76%↑) 97503 (vs. 96003, 1.56%↑) 1 (vs. 1, 0.00%)
matmul_2562x2564x2562_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2284 (vs. 2118, 7.84%↑) 89724 (vs. 88224, 1.70%↑) 100447 (vs. 98947, 1.52%↑) 1 (vs. 1, 0.00%)
matmul_2562x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2805 (vs. 2585, 8.51%↑) 84288 (vs. 82792, 1.81%↑) 95011 (vs. 93515, 1.60%↑) 1 (vs. 1, 0.00%)
matmul_123x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2125 (vs. 1907, 11.43%↑) 51328 (vs. 49772, 3.13%↑) 62050 (vs. 60494, 2.57%↑) 1 (vs. 1, 0.00%)
matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 1005 (vs. 904, 11.17%↑) 8828 (vs. 8828, 0.00%) 26181 (vs. 26181, 0.00%) 2 (vs. 2, 0.00%)
matmul_128x256x8192_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 964 (vs. 913, 5.59%↑) 9784 (vs. 9784, 0.00%) 27137 (vs. 27137, 0.00%) 2 (vs. 2, 0.00%)
MiniLML12H384Uncased(stablehlo) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 37487 (vs. 31853, 17.69%↑) 50648 (vs. 50552, 0.19%↑) 133987505 (vs. 133987441, 0.00%↑) 209 (vs. 209, 0.00%)
DeepLabV3_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 22239 (vs. 20586, 8.03%↑) 42840 (vs. 42840, 0.00%) 2824263 (vs. 2824263, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 42331 (vs. 43312, 2.26%↓) 185480 (vs. 185480, 0.00%) 5108871 (vs. 5108871, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 77723 (vs. 74032, 4.99%↑) 49416 (vs. 49416, 0.00%) 98399431 (vs. 98399431, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 259500 (vs. 255683, 1.49%↑) 2108280 (vs. 2108296, 0.00%↓) 28407303 (vs. 28407303, 0.00%) 1102 (vs. 1102, 0.00%)
MobileNetV1_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 18623 (vs. 20198, 7.80%↓) 50984 (vs. 50984, 0.00%) 16976263 (vs. 16976263, 0.00%) 65 (vs. 65, 0.00%)
MobileNetV2_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 58882 (vs. 59840, 1.60%↓) 217016 (vs. 217032, 0.01%↓) 3867719 (vs. 3867719, 0.00%) 144 (vs. 144, 0.00%)
PersonDetect_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 19757 (vs. 21361, 7.51%↓) 58936 (vs. 58936, 0.00%) 314759 (vs. 314759, 0.00%) 60 (vs. 60, 0.00%)
EfficientNet_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 56698 (vs. 55396, 2.35%↑) 390748 (vs. 390748, 0.00%) 5314183 (vs. 5314183, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 297818 (vs. 289497, 2.87%↑) 2116860 (vs. 2116860, 0.00%) 28415879 (vs. 28415879, 0.00%) 1102 (vs. 1102, 0.00%)
PersonDetect_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 22032 (vs. 21650, 1.76%↑) 124428 (vs. 124428, 0.00%) 380231 (vs. 380231, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV2_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 65721 (vs. 64271, 2.26%↑) 328664 (vs. 328664, 0.00%) 3979399 (vs. 3979399, 0.00%) 144 (vs. 144, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 21712 (vs. 21768, 0.26%↓) 56176 (vs. 56272, 0.17%↓) 2837573 (vs. 2837637, 0.00%↓) 79 (vs. 79, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 52178 (vs. 49651, 5.09%↑) 32912 (vs. 32912, 0.00%) 98382917 (vs. 98382917, 0.00%) 704 (vs. 704, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 39296 (vs. 35564, 10.49%↑) 20320 (vs. 20320, 0.00%) 652747476 (vs. 652747476, 0.00%) 233 (vs. 233, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 39567 (vs. 35153, 12.56%↑) 9024 (vs. 9024, 0.00%) 652730324 (vs. 652730324, 0.00%) 246 (vs. 246, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 287347 (vs. 292493, 1.76%↓) 4387664 (vs. 4387712, 0.00%↓) 30686661 (vs. 30686725, 0.00%↓) 1102 (vs. 1102, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 116316 (vs. 109850, 5.89%↑) 694752 (vs. 694624, 0.02%↑) 88728773 (vs. 88728645, 0.00%↑) 268 (vs. 268, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 28875 (vs. 25208, 14.55%↑) 50080 (vs. 49984, 0.19%↑) 2849093 (vs. 2848965, 0.00%↑) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 94248 (vs. 92322, 2.09%↑) 21520 (vs. 21600, 0.37%↓) 98556933 (vs. 98557061, 0.00%↓) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 48635 (vs. 48325, 0.64%↑) 11552 (vs. 11040, 4.64%↑) 992511764 (vs. 992509844, 0.00%↑) 330 (vs. 318, 3.77%↑)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 50435 (vs. 47455, 6.28%↑) 9120 (vs. 9232, 1.21%↓) 992513044 (vs. 992515476, 0.00%↓) 355 (vs. 367, 3.27%↓)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 236176 (vs. 226749, 4.16%↑) 1373344 (vs. 1372976, 0.03%↑) 27851461 (vs. 27851077, 0.00%↑) 2160 (vs. 2160, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 90064 (vs. 88416, 1.86%↑) 121904 (vs. 125600, 2.94%↓) 88145797 (vs. 88155333, 0.01%↓) 375 (vs. 386, 2.85%↓)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 24363 (vs. 23501, 3.67%↑) 40992 (vs. 41008, 0.04%↓) 2840005 (vs. 2840005, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 86185 (vs. 84423, 2.09%↑) 22064 (vs. 22096, 0.14%↓) 98557509 (vs. 98557509, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 48017 (vs. 48280, 0.54%↓) 10720 (vs. 10640, 0.75%↑) 992510932 (vs. 992509460, 0.00%↑) 330 (vs. 318, 3.77%↑)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 44705 (vs. 43069, 3.80%↑) 8688 (vs. 8928, 2.69%↓) 992512596 (vs. 992515156, 0.00%↓) 355 (vs. 367, 3.27%↓)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 235048 (vs. 226208, 3.91%↑) 1380480 (vs. 1379904, 0.04%↑) 27858565 (vs. 27857989, 0.00%↑) 2160 (vs. 2160, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 79590 (vs. 79086, 0.64%↑) 124400 (vs. 128848, 3.45%↓) 88148293 (vs. 88158533, 0.01%↓) 375 (vs. 386, 2.85%↓)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1261 (vs. 1093, 15.37%↑) 3872 (vs. 3872, 0.00%) 271673 (vs. 271673, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 2102 (vs. 1889, 11.28%↑) 4480 (vs. 4480, 0.00%) 273605 (vs. 273605, 0.00%) 4 (vs. 4, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1794 (vs. 1466, 22.37%↑) 6496 (vs. 6512, 0.25%↓) 275653 (vs. 275653, 0.00%) 4 (vs. 4, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 73905 (vs. 69095, 6.96%↑) 246320 (vs. 246320, 0.00%) 98597530 (vs. 98597530, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 76980 (vs. 75406, 2.09%↑) 246320 (vs. 246320, 0.00%) 98597530 (vs. 98597530, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 73552 (vs. 73482, 0.10%↑) 143012 (vs. 143012, 0.00%) 98493753 (vs. 98493753, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16,compile-stats] 108124 (vs. 131849, 17.99%↓) 3004452 (vs. 3004452, 0.00%) 52898580 (vs. 52898580, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 208691 (vs. 201195, 3.73%↑) 6756400 (vs. 6756400, 0.00%) 33056470 (vs. 33056470, 0.00%) 1102 (vs. 1102, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 75803 (vs. 71805, 5.57%↑) 143012 (vs. 143012, 0.00%) 98493881 (vs. 98493881, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16,compile-stats] 97590 (vs. 94120, 3.69%↑) 3008480 (vs. 3008480, 0.00%) 52905492 (vs. 52905492, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 207446 (vs. 195492, 6.11%↑) 6736144 (vs. 6736144, 0.00%) 33024150 (vs. 33024150, 0.00%) 1102 (vs. 1102, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 118895 (vs. 115150, 3.25%↑) 143012 (vs. 143012, 0.00%) 99716025 (vs. 99716025, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16,compile-stats] 135091 (vs. 131684, 2.59%↑) 3008480 (vs. 3008480, 0.00%) 54169300 (vs. 54169300, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 314304 (vs. 305824, 2.77%↑) 6736144 (vs. 6736144, 0.00%) 34937238 (vs. 34937238, 0.00%) 1102 (vs. 1102, 0.00%)
MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 24465 (vs. 22634, 8.09%↑) 208629 (vs. 208629, 0.00%) 14213630 (vs. 14213630, 0.00%) 172 (vs. 172, 0.00%)
MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 37050 (vs. 34814, 6.42%↑) 311349 (vs. 311349, 0.00%) 10551678 (vs. 10551678, 0.00%) 210 (vs. 210, 0.00%)
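
Note on reading the cells above: each one has the form `current (vs. base, pct%↑/↓)`. The percentages appear consistent with a plain relative change against the base value, rounded to two decimals, with the arrow following the sign of the raw difference (so a tiny change can show as `0.00%↑`). The sketch below reproduces that formatting in Python. The helper name `format_delta` is hypothetical and is not taken from the report generator; this is an assumption about how the cells are derived, not the bot's actual code.

```python
def format_delta(current: float, base: float) -> str:
    """Format one cell as 'current (vs. base, pct%arrow)'.

    Assumption: pct = |current - base| / base * 100, rounded to 2 decimals,
    and the arrow reflects the sign of the raw (unrounded) difference.
    """
    if base == 0:
        raise ValueError("base must be non-zero to compute a relative change")
    pct = abs(current - base) / base * 100
    if current > base:
        arrow = "↑"
    elif current < base:
        arrow = "↓"
    else:
        arrow = ""  # unchanged values carry no arrow in the table
    return f"{current} (vs. {base}, {pct:.2f}%{arrow})"


# Values taken from the MobileNetV2 no-dt rows above.
print(format_delta(37701, 35600))
print(format_delta(73809, 75837))
print(format_delta(190096, 190096))
```

When scanning the table, it may help to keep in mind that the first numeric column moves by several percent between runs even where the other columns are unchanged, so small deltas there are plausibly noise rather than regressions.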