Full Benchmark Summary

Data-Tiling Comparison Table

Name No-DT (baseline, ms) DT-Only (ms) DT-UK (ms)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 229.620 (1.0X) N/A 113.440 (2.0X)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 718.911 (1.0X) N/A 231.435 (3.1X)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7.042 (1.0X) N/A 8.602 (0.8X)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.121 (1.0X) N/A 33.562 (1.0X)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.528 (1.0X) N/A 33.727 (1.0X)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 266.210 (1.0X) N/A 233.687 (1.1X)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.904 (1.0X) N/A 5.290 (1.1X)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 28.616 (1.0X) N/A 15.424 (1.9X)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.909 (1.0X) N/A 8.533 (1.0X)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 70.081 (1.0X) N/A 39.435 (1.8X)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 10.532 (1.0X) N/A 8.255 (1.3X)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 88.751 (1.0X) N/A 41.536 (2.1X)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.092 (1.0X) N/A 12.987 (0.9X)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.397 (1.0X) N/A 62.312 (1.3X)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.744 (1.0X) N/A 57.934 (0.6X)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 179.842 (1.0X) N/A 187.279 (1.0X)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.434 (1.0X) N/A 58.462 (0.6X)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.228 (1.0X) N/A 192.676 (0.9X)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 60.298 (1.0X) N/A 63.623 (0.9X)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 481.252 (1.0X) N/A 213.240 (2.3X)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.889 (1.0X) N/A 4.520 (1.1X)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 27.250 (1.0X) N/A 18.371 (1.5X)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.680 (1.0X) N/A 4.927 (0.7X)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.800 (1.0X) N/A 12.382 (1.0X)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.762 (1.0X) N/A 5.600 (1.0X)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.470 (1.0X) N/A 13.936 (1.5X)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.901 (1.0X) N/A 3.234 (0.9X)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.764 (1.0X) N/A 3.130 (0.9X)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.463 (1.0X) N/A 9.582 (0.9X)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.182 (1.0X) N/A 32.811 (1.0X)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.775 (1.0X) N/A 0.660 (1.2X)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.708 (1.0X) N/A 0.596 (1.2X)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.114 (1.0X) N/A 5.276 (0.8X)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.588 (1.0X) N/A 20.969 (0.8X)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.594 (1.0X) N/A 7.592 (1.0X)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 49.331 (1.0X) N/A 77.538 (0.6X)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 50.491 (1.0X) N/A 78.137 (0.6X)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 30.345 (1.0X) N/A 46.240 (0.7X)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 92.677 (1.0X) N/A 21.035 (4.4X)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 93.827 (1.0X) N/A 21.949 (4.3X)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 52.577 (1.0X) N/A 21.812 (2.4X)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 133.041 (1.0X) N/A 27.282 (4.9X)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 139.479 (1.0X) N/A 28.952 (4.8X)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 75.378 (1.0X) N/A 26.234 (2.9X)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 690.395 (1.0X) N/A 365.637 (1.9X)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 701.393 (1.0X) N/A 363.567 (1.9X)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 395.483 (1.0X) N/A 223.156 (1.8X)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1046.193 (1.0X) N/A 257.241 (4.1X)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1047.262 (1.0X) N/A 257.292 (4.1X)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 543.116 (1.0X) N/A 151.883 (3.6X)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 2098.163 (1.0X) N/A 303.333 (6.9X)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 2099.427 (1.0X) N/A 307.269 (6.8X)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1118.831 (1.0X) N/A 182.836 (6.1X)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 12.169 (1.0X) N/A 1.438 (8.5X)
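
The factor in parentheses in the table above appears to be the No-DT baseline latency divided by the latency in that column, rounded to one decimal place, so values above 1.0X mean the variant is faster. The sketch below reproduces a few cells under that assumption; `speedup` and `format_cell` are hypothetical helper names, not part of the tooling that generated this report.

```python
def speedup(baseline_ms: float, variant_ms: float) -> float:
    """Return how many times faster the variant is than the baseline."""
    return baseline_ms / variant_ms


def format_cell(latency_ms: float, baseline_ms: float) -> str:
    """Format a latency the way the table cells look: 'latency (ratioX)'.

    Assumption: the ratio is baseline / latency, shown with one decimal.
    """
    return f"{latency_ms:.3f} ({speedup(baseline_ms, latency_ms):.1f}X)"


# BertForMaskedLMTF on x86_64, 30 threads: No-DT 229.620 ms, DT-UK 113.440 ms
print(format_cell(113.440, 229.620))
# DeepLabV3_fp32 on x86_64, 8 threads: No-DT 7.042 ms, DT-UK 8.602 ms (a slowdown)
print(format_cell(8.602, 7.042))
```

A ratio below 1.0X, as in the second call, marks a configuration where DT-UK is slower than the baseline.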

Regressed Latencies 🚩

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 105.460 (vs. 96.287, 9.53%↑) 107.809 5.163
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 139.479 (vs. 130.412, 6.95%↑) 139.381 0.475

Improved Latencies 🎉

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 81.551 (vs. 95.033, 14.19%↓) 81.556 0.977
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 69.876 (vs. 76.311, 8.43%↓) 69.706 0.753
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 28.952 (vs. 30.716, 5.74%↓) 29.174 0.704
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 27.282 (vs. 28.893, 5.58%↓) 27.617 0.949
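
The percentage next to each average latency is consistent with the relative change against the previous run, `(current - previous) / previous`. Every row listed as regressed or improved in this report changed by more than 5%, and every row under Similar Latencies changed by less, so the sketch below assumes a 5% cutoff. That threshold is inferred from the data here, not stated by the report, and `percent_change` and `classify` are hypothetical names.

```python
THRESHOLD_PCT = 5.0  # assumed cutoff, inferred from the rows in this report


def percent_change(current_ms: float, previous_ms: float) -> float:
    """Relative change of the current latency versus the previous one, in percent."""
    return (current_ms - previous_ms) / previous_ms * 100.0


def classify(current_ms: float, previous_ms: float,
             threshold_pct: float = THRESHOLD_PCT) -> str:
    """Bucket a benchmark as 'regressed', 'improved', or 'similar'."""
    change = percent_change(current_ms, previous_ms)
    if change > threshold_pct:
        return "regressed"   # latency went up by more than the threshold
    if change < -threshold_pct:
        return "improved"    # latency went down by more than the threshold
    return "similar"


# MobileBertSquad_fp16 on Vulkan: 105.460 ms vs. 96.287 ms
print(round(percent_change(105.460, 96.287), 2), classify(105.460, 96.287))
# MobileBertSquad_int8 on Vulkan: 81.551 ms vs. 95.033 ms
print(round(percent_change(81.551, 95.033), 2), classify(81.551, 95.033))
```

Since higher latency is worse, a positive change counts as a regression and a negative one as an improvement.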

Similar Latencies

Benchmark Name Average Latency (ms) Median Latency (ms) Latency Standard Deviation (ms)
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 182.836 (vs. 191.845, 4.70%↓) 183.871 4.586
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.889 (vs. 4.683, 4.41%↑) 4.884 0.035
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 26.234 (vs. 27.437, 4.38%↓) 26.379 0.347
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 151.883 (vs. 158.759, 4.33%↓) 153.375 3.620
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 39.435 (vs. 41.127, 4.11%↓) 38.972 1.304
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 77.538 (vs. 80.778, 4.01%↓) 77.810 0.964
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 363.567 (vs. 378.375, 3.91%↓) 364.432 3.810
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 21.949 (vs. 22.760, 3.56%↓) 21.834 0.271
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 94.776 (vs. 98.243, 3.53%↓) 94.732 0.279
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 21.812 (vs. 22.595, 3.47%↓) 21.847 0.192
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1.438 (vs. 1.489, 3.45%↓) 1.442 0.013
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 21.035 (vs. 21.763, 3.34%↓) 21.024 0.046
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 12.169 (vs. 12.589, 3.33%↓) 12.243 0.136
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 395.483 (vs. 409.089, 3.33%↓) 396.225 9.189
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.434 (vs. 33.368, 3.20%↑) 34.043 0.841
MobileBertSquad\_int8(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 72.218 (vs. 74.515, 3.08%↓) 72.205 0.277
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 30.345 (vs. 31.300, 3.05%↓) 30.566 0.753
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 75.378 (vs. 77.669, 2.95%↓) 75.307 0.294
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 365.637 (vs. 376.293, 2.83%↓) 366.757 5.665
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 223.156 (vs. 229.384, 2.72%↓) 225.484 6.509
MiniLML12H384Uncased(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.659 (vs. 1.616, 2.70%↑) 1.659 0.011
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 543.116 (vs. 558.128, 2.69%↓) 544.213 6.427
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 102.944 (vs. 105.736, 2.64%↓) 103.057 0.405
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 257.292 (vs. 264.235, 2.63%↓) 256.497 2.611
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 27.250 (vs. 26.558, 2.61%↑) 27.090 0.495
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1118.831 (vs. 1147.936, 2.54%↓) 1121.783 20.229
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 257.241 (vs. 263.751, 2.47%↓) 257.476 1.442
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.463 (vs. 8.274, 2.28%↑) 8.462 0.070
matmul\_128x256x8192\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.026 (vs. 0.026, 2.25%↑) 0.026 0.000
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 50.491 (vs. 51.627, 2.20%↓) 50.605 0.452
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.744 (vs. 34.008, 2.16%↑) 34.403 0.723
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] 83.201 (vs. 85.020, 2.14%↓) 83.226 0.199
GPT2\_117M\_TF\_1X4XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 133.041 (vs. 135.936, 2.13%↓) 132.978 0.292
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 303.333 (vs. 309.762, 2.08%↓) 304.195 3.613
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 307.269 (vs. 313.720, 2.06%↓) 306.849 3.016
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 17.588 (vs. 17.948, 2.00%↓) 17.583 0.076
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 2.764 (vs. 2.819, 1.93%↓) 2.762 0.011
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.562 (vs. 32.949, 1.86%↑) 33.300 0.529
BertForMaskedLMTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 7.064 (vs. 6.937, 1.84%↑) 7.031 0.094
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 46.240 (vs. 47.104, 1.83%↓) 46.404 0.723
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 41.536 (vs. 42.311, 1.83%↓) 41.002 1.624
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 3.130 (vs. 3.077, 1.73%↑) 3.129 0.007
EfficientNetV2STF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 4.361 (vs. 4.289, 1.67%↑) 4.350 0.059
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 78.137 (vs. 79.444, 1.65%↓) 78.348 1.007
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.520 (vs. 4.449, 1.61%↑) 4.517 0.019
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.234 (vs. 3.183, 1.58%↑) 3.226 0.024
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.927 (vs. 4.851, 1.57%↑) 4.933 0.022
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 11.800 (vs. 11.631, 1.46%↑) 11.715 0.230
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 7.042 (vs. 6.943, 1.43%↑) 7.046 0.027
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.092 (vs. 12.265, 1.41%↓) 11.935 0.287
BertLargeTF(stablehlo) [cuda-sm\_80-linux\_gnu-cuda][default-flags] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 10.831 (vs. 10.681, 1.41%↑) 10.824 0.021
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 701.393 (vs. 711.071, 1.36%↓) 701.261 10.810
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 62.312 (vs. 63.166, 1.35%↓) 62.057 0.963
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 231.435 (vs. 234.482, 1.30%↓) 229.240 6.487
MobileBertSquad\_fp32(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 122.787 (vs. 124.397, 1.29%↓) 122.803 0.306
MobileNetV1\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 18.371 (vs. 18.140, 1.28%↑) 18.309 0.214
matmul\_128x256x8192\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul,splitk] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.048 (vs. 0.047, 1.24%↑) 0.048 0.000
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.528 (vs. 34.117, 1.20%↑) 33.983 0.869
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.600 (vs. 5.534, 1.20%↑) 5.608 0.035
MobileBertSquad\_fp16(tflite) [arm-valhall-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16] vulkan(none)[full-inference,experimental-flags] with default @ pixel-6-pro[gpu] 78.915 (vs. 79.836, 1.15%↓) 78.841 0.608
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.987 (vs. 12.845, 1.11%↑) 13.021 0.160
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 15.424 (vs. 15.257, 1.10%↑) 15.402 0.050
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 20.969 (vs. 21.187, 1.03%↓) 20.932 0.091
matmul\_3456x1024x2048\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.064 (vs. 0.063, 1.03%↑) 0.064 0.000
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 4.114 (vs. 4.155, 0.98%↓) 4.121 0.024
MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 690.395 (vs. 696.265, 0.84%↓) 694.691 9.538
MobileBertSquad\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] 259.156 (vs. 261.235, 0.80%↓) 259.292 0.905
BertLargeTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 718.911 (vs. 724.560, 0.78%↓) 719.070 42.011
matmul\_2560x2560x2560\_f16t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.142 (vs. 0.141, 0.71%↑) 0.142 0.000
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 9.582 (vs. 9.516, 0.69%↑) 9.574 0.031
MobileNetV3Small\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 2.901 (vs. 2.881, 0.69%↑) 2.900 0.014
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.762 (vs. 5.725, 0.65%↑) 5.770 0.018
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.121 (vs. 32.330, 0.65%↓) 32.076 0.132
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 33.727 (vs. 33.514, 0.63%↑) 33.471 0.371
matmul\_3456x1024x2048\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.166 (vs. 0.165, 0.59%↑) 0.166 0.000
MobileNetV3Small\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 1304.254 (vs. 1297.030, 0.56%↑) 1304.868 2.439
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 93.827 (vs. 94.344, 0.55%↓) 93.834 0.042
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 92.677 (vs. 93.158, 0.52%↓) 92.571 0.229
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 192.676 (vs. 191.722, 0.50%↑) 190.751 3.735
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 266.210 (vs. 267.480, 0.47%↓) 265.373 3.211
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.909 (vs. 8.951, 0.46%↓) 8.652 0.430
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 180.228 (vs. 179.397, 0.46%↑) 179.047 2.725
GPT2\_117M\_TF\_1X1XI32(stablehlo) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 52.577 (vs. 52.817, 0.45%↓) 52.562 0.139
MobileBertSquad\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][default-flags] vulkan(none)[full-inference,default-flags] with default @ moto-edge-x30[gpu] 259.262 (vs. 260.438, 0.45%↓) 259.520 0.767
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 229.620 (vs. 228.608, 0.44%↑) 225.286 12.889
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.533 (vs. 8.495, 0.44%↑) 8.502 0.072
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.592 (vs. 7.560, 0.41%↑) 7.590 0.013
MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5695.132 (vs. 5672.545, 0.40%↑) 5703.307 22.106
matmul\_256x256x2048\_i8\_i4\_i32\_tile\_config\_default(linalg) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 7.594 (vs. 7.565, 0.38%↑) 7.591 0.012
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 13.936 (vs. 13.884, 0.38%↑) 13.929 0.032
matmul\_2560x2560x2560\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.301 (vs. 0.300, 0.38%↑) 0.301 0.000
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 34.182 (vs. 34.310, 0.37%↓) 34.118 0.267
GPT2\_117M\_TF\_1X1XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 70.081 (vs. 69.823, 0.37%↑) 69.757 2.650
MobileBertSquad\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 58.462 (vs. 58.248, 0.37%↑) 58.149 0.548
DeepLabV3\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 49.331 (vs. 49.155, 0.36%↑) 49.617 0.906
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 57.934 (vs. 57.747, 0.32%↑) 57.759 0.415
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.904 (vs. 5.923, 0.32%↓) 5.907 0.014
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 63.623 (vs. 63.804, 0.28%↓) 63.499 0.347
matmul\_2562x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.374 (vs. 1.370, 0.28%↑) 1.371 0.008
matmul\_2564x2564x2564\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.948 (vs. 0.946, 0.28%↑) 0.946 0.008
MobileSSD\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 32.811 (vs. 32.722, 0.27%↑) 32.615 0.320
matmul\_2562x2564x2562\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 1.116 (vs. 1.113, 0.27%↑) 1.114 0.005
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.255 (vs. 8.233, 0.27%↑) 8.223 0.083
PoseNet\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.276 (vs. 5.262, 0.26%↑) 5.270 0.021
MobileNetV2\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 21.470 (vs. 21.525, 0.26%↓) 21.452 0.074
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 10.532 (vs. 10.558, 0.25%↓) 10.162 0.584
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.775 (vs. 0.777, 0.23%↓) 0.777 0.004
GPT2\_117M\_TF\_1X4XI32(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 88.751 (vs. 88.548, 0.23%↑) 89.321 3.295
matmul\_123x2561x2561\_f32t\_f32t\_f32t\_tile\_config\_default(linalg) [cuda-sm\_80-linux\_gnu-cuda][ukernel,matmul] cuda(none)[full-inference,default-flags] with default @ a2-highgpu-1g[gpu] 0.202 (vs. 0.202, 0.23%↑) 0.202 0.001
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 5.290 (vs. 5.301, 0.21%↓) 5.282 0.026
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.596 (vs. 0.594, 0.19%↑) 0.596 0.001
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] 0.708 (vs. 0.707, 0.18%↑) 0.708 0.002
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 60.298 (vs. 60.398, 0.17%↓) 60.112 0.466
EfficientNet\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 28.616 (vs. 28.664, 0.17%↓) 28.572 0.114
PersonDetect\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 0.660 (vs. 0.659, 0.16%↑) 0.659 0.003
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 12.382 (vs. 12.399, 0.14%↓) 12.375 0.079
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 179.842 (vs. 180.069, 0.13%↓) 178.683 2.850
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 1047.262 (vs. 1046.310, 0.09%↑) 1046.827 1.290
MobileBertSquad\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 1046.193 (vs. 1047.001, 0.08%↓) 1045.586 1.873
EfficientNetV2STF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 233.687 (vs. 233.847, 0.07%↓) 231.755 4.116
MobileNetV2\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 3.680 (vs. 3.678, 0.05%↑) 3.671 0.022
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 481.252 (vs. 481.007, 0.05%↑) 480.487 1.973
MobileBertSquad\_fp16(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 187.279 (vs. 187.372, 0.05%↓) 186.170 2.984
BertForMaskedLMTF(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 113.440 (vs. 113.476, 0.03%↓) 112.708 1.258
MobileBertSquad\_int8(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 213.240 (vs. 213.192, 0.02%↑) 212.473 1.410
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_sync(embedded\_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] 2098.163 (vs. 2097.755, 0.02%↑) 2097.842 1.542
DeepLabV3\_fp32(tflite) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags,dt-uk] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 8.602 (vs. 8.604, 0.01%↓) 8.600 0.017
MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] 79.397 (vs. 79.391, 0.01%↑) 79.169 0.946
Vit\_int8(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,no-dt] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] 2099.427 (vs. 2099.540, 0.01%↓) 2099.118 1.226
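
Each comparison cell in the rows above has the form `current (vs. base, delta%↑)` or `current (vs. base, delta%↓)`. The following is a minimal sketch of how such a cell could be parsed and the delta recomputed. It assumes the percentage is the relative change of the current value against the base value. The report does not state this, but it is consistent with the listed rows. The helper names `parse_cell` and `relative_change` are hypothetical and not part of any IREE tooling. Because the report prints rounded values, a recomputed delta may differ from the printed one in the last digit.

```python
import re

# Matches a cell such as "32.121 (vs. 32.330, 0.65%↓)" or "9664 (vs. 9664, 0.00%)".
# The trailing arrow is optional because unchanged values are printed without one.
CELL = re.compile(
    r"(?P<cur>[\d.]+) \(vs\. (?P<base>[\d.]+), (?P<pct>[\d.]+)%(?P<dir>[↑↓]?)\)"
)


def parse_cell(cell):
    """Split a comparison cell into (current, base, printed_percent, direction)."""
    m = CELL.fullmatch(cell.strip())
    if m is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    return (
        float(m.group("cur")),
        float(m.group("base")),
        float(m.group("pct")),
        m.group("dir"),
    )


def relative_change(current, base):
    """Signed relative change in percent (assumed formula: (current - base) / base)."""
    return (current - base) / base * 100.0


if __name__ == "__main__":
    # Cell copied from the DeepLabV3_fp32 1-thread no-dt row above.
    cur, base, printed, direction = parse_cell("32.121 (vs. 32.330, 0.65%↓)")
    print(f"recomputed {relative_change(cur, base):+.2f}%, printed {printed}%{direction}")
```

A negative recomputed value corresponds to a `↓` cell (the current value is lower than the base) and a positive value to a `↑` cell.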

All Compilation Metrics

Benchmark Name Compilation Time (ms) Total Dispatch Size (bytes) Total Artifact Size (bytes) Stream IR Dispatch Count (# of cmd.dispatch ops)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 1361 (vs. 1325, 2.72%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 33184 (vs. 29827, 11.25%↑) 186928 (vs. 186928, 0.00%) 442757 (vs. 442757, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 40976 (vs. 39409, 3.98%↑) 238768 (vs. 238768, 0.00%) 10458309 (vs. 10458309, 0.00%) 97 (vs. 97, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 38010 (vs. 38126, 0.30%↓) 178288 (vs. 178288, 0.00%) 2959685 (vs. 2959685, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 61222 (vs. 57880, 5.77%↑) 598960 (vs. 598960, 0.00%) 5522373 (vs. 5522373, 0.00%) 89 (vs. 89, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 30359 (vs. 22493, 34.97%↑) 179392 (vs. 179392, 0.00%) 17099397 (vs. 17099397, 0.00%) 51 (vs. 51, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 35501 (vs. 33471, 6.06%↑) 177648 (vs. 177648, 0.00%) 14160773 (vs. 14160773, 0.00%) 74 (vs. 74, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 69598 (vs. 65209, 6.73%↑) 545824 (vs. 545824, 0.00%) 4196549 (vs. 4196549, 0.00%) 144 (vs. 144, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 49313 (vs. 47011, 4.90%↑) 296320 (vs. 296320, 0.00%) 18237957 (vs. 18237957, 0.00%) 124 (vs. 124, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 22618 (vs. 24201, 6.54%↓) 168064 (vs. 168064, 0.00%) 5221445 (vs. 5221445, 0.00%) 48 (vs. 48, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 76312 (vs. 68599, 11.24%↑) 87536 (vs. 87536, 0.00%) 99929733 (vs. 99929733, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 72650 (vs. 68739, 5.69%↑) 96800 (vs. 96800, 0.00%) 98446853 (vs. 98446853, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 299285 (vs. 295607, 1.24%↑) 5908560 (vs. 5908560, 0.00%) 32207557 (vs. 32207557, 0.00%) 1102 (vs. 1102, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 107247 (vs. 99589, 7.69%↑) 215712 (vs. 215712, 0.00%) 164497132 (vs. 164497132, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 33783 (vs. 33897, 0.34%↓) 64976 (vs. 64976, 0.00%) 134001839 (vs. 134001839, 0.00%) 209 (vs. 209, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 36684 (vs. 35004, 4.80%↑) 30272 (vs. 30272, 0.00%) 652757396 (vs. 652757396, 0.00%) 233 (vs. 233, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 36171 (vs. 33227, 8.86%↑) 16656 (vs. 16656, 0.00%) 652738004 (vs. 652738004, 0.00%) 246 (vs. 246, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 38097 (vs. 35555, 7.15%↑) 74464 (vs. 74464, 0.00%) 533846975 (vs. 533846975, 0.00%) 212 (vs. 212, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,no-dt,compile-stats] 38538 (vs. 36391, 5.90%↑) 56768 (vs. 56768, 0.00%) 1336031679 (vs. 1336031679, 0.00%) 413 (vs. 413, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 1456 (vs. 1474, 1.22%↓) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 23807 (vs. 23782, 0.11%↑) 106864 (vs. 106864, 0.00%) 371013 (vs. 371013, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 34758 (vs. 34127, 1.85%↑) 114608 (vs. 114608, 0.00%) 11446533 (vs. 11446533, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 37404 (vs. 32666, 14.50%↑) 127376 (vs. 127376, 0.00%) 2935237 (vs. 2935237, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 54710 (vs. 50900, 7.49%↑) 269840 (vs. 269840, 0.00%) 5219909 (vs. 5219909, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 19770 (vs. 16840, 17.40%↑) 62160 (vs. 62160, 0.00%) 17019781 (vs. 17019781, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 32038 (vs. 29446, 8.80%↑) 99152 (vs. 99152, 0.00%) 14219589 (vs. 14219589, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 59168 (vs. 57732, 2.49%↑) 322624 (vs. 322624, 0.00%) 3996229 (vs. 3996229, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 45974 (vs. 41584, 10.56%↑) 139168 (vs. 139168, 0.00%) 18503941 (vs. 18503941, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 35840 (vs. 36209, 1.02%↓) 84080 (vs. 84080, 0.00%) 5188101 (vs. 5188101, 0.00%) 81 (vs. 81, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 98185 (vs. 90536, 8.45%↑) 74048 (vs. 74048, 0.00%) 100104773 (vs. 100104773, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 93640 (vs. 84052, 11.41%↑) 73184 (vs. 73184, 0.00%) 98608709 (vs. 98608709, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 260179 (vs. 247074, 5.30%↑) 2642416 (vs. 2642416, 0.00%) 29120517 (vs. 29120517, 0.00%) 2160 (vs. 2160, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 114287 (vs. 108326, 5.50%↑) 179696 (vs. 179696, 0.00%) 179806124 (vs. 179806124, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 42661 (vs. 40461, 5.44%↑) 42224 (vs. 42224, 0.00%) 219514223 (vs. 219514223, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 51841 (vs. 44690, 16.00%↑) 23808 (vs. 23808, 0.00%) 992645460 (vs. 992645460, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 41834 (vs. 39451, 6.04%↑) 14688 (vs. 14688, 0.00%) 992644244 (vs. 992644244, 0.00%) 367 (vs. 367, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 50882 (vs. 45836, 11.01%↑) 42368 (vs. 42368, 0.00%) 875941567 (vs. 875941567, 0.00%) 334 (vs. 334, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][experimental-flags,dt-only,compile-stats] 64994 (vs. 52809, 23.07%↑) 76336 (vs. 76336, 0.00%) 1336085695 (vs. 1336085695, 0.00%) 654 (vs. 654, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 1595 (vs. 1475, 8.14%↑) 9664 (vs. 9664, 0.00%) 277497 (vs. 277497, 0.00%) 1 (vs. 1, 0.00%)
PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 23020 (vs. 22240, 3.51%↑) 91520 (vs. 91520, 0.00%) 355653 (vs. 355653, 0.00%) 87 (vs. 87, 0.00%)
MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 32992 (vs. 34184, 3.49%↓) 93472 (vs. 93472, 0.00%) 10395269 (vs. 10395269, 0.00%) 147 (vs. 147, 0.00%)
DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 31573 (vs. 31273, 0.96%↑) 106960 (vs. 106960, 0.00%) 2915333 (vs. 2915333, 0.00%) 144 (vs. 144, 0.00%)
EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 49760 (vs. 47313, 5.17%↑) 239056 (vs. 239056, 0.00%) 5189125 (vs. 5189125, 0.00%) 148 (vs. 148, 0.00%)
MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 20032 (vs. 16376, 22.33%↑) 57120 (vs. 57120, 0.00%) 17014789 (vs. 17014789, 0.00%) 79 (vs. 79, 0.00%)
MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 34313 (vs. 29653, 15.72%↑) 88784 (vs. 88784, 0.00%) 14127301 (vs. 14127301, 0.00%) 136 (vs. 136, 0.00%)
MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 53554 (vs. 58959, 9.17%↓) 291328 (vs. 291328, 0.00%) 3964933 (vs. 3964933, 0.00%) 213 (vs. 213, 0.00%)
MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 43152 (vs. 41543, 3.87%↑) 122128 (vs. 122128, 0.00%) 18347461 (vs. 18347461, 0.00%) 223 (vs. 223, 0.00%)
PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 19757 (vs. 20265, 2.51%↓) 55840 (vs. 55840, 0.00%) 5182597 (vs. 5182597, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 95271 (vs. 82392, 15.63%↑) 48928 (vs. 48928, 0.00%) 100108421 (vs. 100108421, 0.00%) 1786 (vs. 1786, 0.00%)
MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 94061 (vs. 85802, 9.63%↑) 50112 (vs. 50112, 0.00%) 98614277 (vs. 98614277, 0.00%) 1762 (vs. 1762, 0.00%)
MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 246103 (vs. 241467, 1.92%↑) 2631824 (vs. 2631824, 0.00%) 29117061 (vs. 29117061, 0.00%) 2160 (vs. 2160, 0.00%)
EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 116879 (vs. 108786, 7.44%↑) 150448 (vs. 150448, 0.00%) 169905644 (vs. 169905644, 0.00%) 439 (vs. 439, 0.00%)
MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 39460 (vs. 39549, 0.23%↓) 39792 (vs. 39792, 0.00%) 219511791 (vs. 219511791, 0.00%) 330 (vs. 330, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 50075 (vs. 42777, 17.06%↑) 19328 (vs. 19328, 0.00%) 992542612 (vs. 992542612, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 49054 (vs. 43765, 12.08%↑) 14112 (vs. 14112, 0.00%) 992544916 (vs. 992544916, 0.00%) 367 (vs. 367, 0.00%)
BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 45987 (vs. 42617, 7.91%↑) 39072 (vs. 39072, 0.00%) 875938303 (vs. 875938303, 0.00%) 334 (vs. 334, 0.00%)
BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags,dt-uk,compile-stats] 59676 (vs. 51890, 15.00%↑) 45760 (vs. 45760, 0.00%) 1336112447 (vs. 1336112447, 0.00%) 654 (vs. 654, 0.00%)
EfficientNetV2STF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 97790 (vs. 90235, 8.37%↑) 847160 (vs. 847160, 0.00%) 165161940 (vs. 165161940, 0.00%) 276 (vs. 276, 0.00%)
MiniLML12H384Uncased(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 27200 (vs. 30247, 10.07%↓) 145116 (vs. 145116, 0.00%) 134088367 (vs. 134088367, 0.00%) 209 (vs. 209, 0.00%)
BertForMaskedLMTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 34010 (vs. 36140, 5.89%↓) 227820 (vs. 227820, 0.00%) 534005730 (vs. 534005730, 0.00%) 212 (vs. 212, 0.00%)
BertLargeTF(stablehlo) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] 37828 (vs. 32540, 16.25%↑) 130808 (vs. 130808, 0.00%) 1336110891 (vs. 1336110891, 0.00%) 413 (vs. 413, 0.00%)
matmul_3456x1024x2048_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1476 (vs. 1500, 1.60%↓) 30540 (vs. 30540, 0.00%) 41263 (vs. 41263, 0.00%) 1 (vs. 1, 0.00%)
matmul_3456x1024x2048_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1756 (vs. 1739, 0.98%↑) 43516 (vs. 43516, 0.00%) 54239 (vs. 54239, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1526 (vs. 1510, 1.06%↑) 28492 (vs. 28492, 0.00%) 39151 (vs. 39151, 0.00%) 1 (vs. 1, 0.00%)
matmul_2560x2560x2560_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1707 (vs. 1788, 4.53%↓) 39892 (vs. 39892, 0.00%) 50551 (vs. 50551, 0.00%) 1 (vs. 1, 0.00%)
matmul_2564x2564x2564_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1882 (vs. 1908, 1.36%↓) 86300 (vs. 86300, 0.00%) 96959 (vs. 96959, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2564x2562_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2044 (vs. 2065, 1.02%↓) 89040 (vs. 89040, 0.00%) 99763 (vs. 99763, 0.00%) 1 (vs. 1, 0.00%)
matmul_2562x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 2501 (vs. 2367, 5.66%↑) 85768 (vs. 85768, 0.00%) 96491 (vs. 96491, 0.00%) 1 (vs. 1, 0.00%)
matmul_123x2561x2561_f32t_f32t_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,compile-stats] 1801 (vs. 1972, 8.67%↓) 51280 (vs. 51280, 0.00%) 62002 (vs. 62002, 0.00%) 1 (vs. 1, 0.00%)
matmul_128x256x8192_f16t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 863 (vs. 797, 8.28%↑) 8828 (vs. 8828, 0.00%) 26181 (vs. 26181, 0.00%) 2 (vs. 2, 0.00%)
matmul_128x256x8192_f32t_tile_config_default(linalg) [cuda-sm_80-linux_gnu-cuda][ukernel,matmul,splitk,compile-stats] 888 (vs. 826, 7.51%↑) 9784 (vs. 9784, 0.00%) 27137 (vs. 27137, 0.00%) 2 (vs. 2, 0.00%)
MiniLML12H384Uncased(stablehlo) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 36133 (vs. 33497, 7.87%↑) 51640 (vs. 51640, 0.00%) 133988529 (vs. 133988529, 0.00%) 209 (vs. 209, 0.00%)
DeepLabV3_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 22297 (vs. 21456, 3.92%↑) 43080 (vs. 43080, 0.00%) 2824455 (vs. 2824455, 0.00%) 79 (vs. 79, 0.00%)
EfficientNet_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 43404 (vs. 39195, 10.74%↑) 175288 (vs. 175288, 0.00%) 5098695 (vs. 5098695, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 75264 (vs. 68860, 9.30%↑) 51112 (vs. 51112, 0.00%) 98401159 (vs. 98401159, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 250374 (vs. 238218, 5.10%↑) 2291592 (vs. 2291592, 0.00%) 28590599 (vs. 28590599, 0.00%) 1102 (vs. 1102, 0.00%)
MobileNetV1_fp32(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 20782 (vs. 16549, 25.58%↑) 51560 (vs. 51560, 0.00%) 16976839 (vs. 16976839, 0.00%) 65 (vs. 65, 0.00%)
MobileNetV2_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 57054 (vs. 55929, 2.01%↑) 227592 (vs. 227592, 0.00%) 3878279 (vs. 3878279, 0.00%) 144 (vs. 144, 0.00%)
PersonDetect_int8(tflite) [riscv_64-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 19241 (vs. 21919, 12.22%↓) 73272 (vs. 73272, 0.00%) 329095 (vs. 329095, 0.00%) 60 (vs. 60, 0.00%)
EfficientNet_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 55519 (vs. 45885, 21.00%↑) 327308 (vs. 327308, 0.00%) 5250695 (vs. 5250695, 0.00%) 89 (vs. 89, 0.00%)
MobileBertSquad_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 301184 (vs. 281399, 7.03%↑) 2523180 (vs. 2523180, 0.00%) 28822215 (vs. 28822215, 0.00%) 1102 (vs. 1102, 0.00%)
PersonDetect_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 23642 (vs. 23829, 0.78%↓) 175772 (vs. 175772, 0.00%) 431623 (vs. 431623, 0.00%) 60 (vs. 60, 0.00%)
MobileNetV2_int8(tflite) [riscv_32-generic-linux_gnu-llvm_cpu][default-flags,compile-stats] 61845 (vs. 56173, 10.10%↑) 315096 (vs. 315096, 0.00%) 3965831 (vs. 3965831, 0.00%) 144 (vs. 144, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 23113 (vs. 20431, 13.13%↑) 56656 (vs. 56656, 0.00%) 2838021 (vs. 2838021, 0.00%) 79 (vs. 79, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 52245 (vs. 51172, 2.10%↑) 33120 (vs. 33120, 0.00%) 98383173 (vs. 98383173, 0.00%) 704 (vs. 704, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 38244 (vs. 34134, 12.04%↑) 20784 (vs. 20784, 0.00%) 652747924 (vs. 652747924, 0.00%) 233 (vs. 233, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 38093 (vs. 34390, 10.77%↑) 9584 (vs. 9584, 0.00%) 652730900 (vs. 652730900, 0.00%) 246 (vs. 246, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 285052 (vs. 278554, 2.33%↑) 4407168 (vs. 4407168, 0.00%) 30706181 (vs. 30706181, 0.00%) 1102 (vs. 1102, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 113326 (vs. 104125, 8.84%↑) 711552 (vs. 711552, 0.00%) 88745541 (vs. 88745541, 0.00%) 268 (vs. 268, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 22199 (vs. 21131, 5.05%↑) 41216 (vs. 41216, 0.00%) 2840133 (vs. 2840133, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 88591 (vs. 81947, 8.11%↑) 22384 (vs. 22384, 0.00%) 98570181 (vs. 98570181, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 42476 (vs. 40355, 5.26%↑) 11040 (vs. 11040, 0.00%) 992509780 (vs. 992509780, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 48274 (vs. 43082, 12.05%↑) 10048 (vs. 10048, 0.00%) 992516180 (vs. 992516180, 0.00%) 367 (vs. 367, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 230669 (vs. 217252, 6.18%↑) 1356736 (vs. 1356736, 0.00%) 27837893 (vs. 27837893, 0.00%) 2160 (vs. 2160, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 85248 (vs. 77083, 10.59%↑) 129888 (vs. 129888, 0.00%) 88159621 (vs. 88159621, 0.00%) 386 (vs. 386, 0.00%)
DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 22252 (vs. 21662, 2.72%↑) 41568 (vs. 41568, 0.00%) 2840517 (vs. 2840517, 0.00%) 144 (vs. 144, 0.00%)
MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 84833 (vs. 78374, 8.24%↑) 22496 (vs. 22496, 0.00%) 98558021 (vs. 98558021, 0.00%) 1762 (vs. 1762, 0.00%)
GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 46677 (vs. 39579, 17.93%↑) 11152 (vs. 11152, 0.00%) 992509908 (vs. 992509908, 0.00%) 318 (vs. 318, 0.00%)
GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 43073 (vs. 38695, 11.31%↑) 10096 (vs. 10096, 0.00%) 992516244 (vs. 992516244, 0.00%) 367 (vs. 367, 0.00%)
MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 220331 (vs. 205299, 7.32%↑) 1219136 (vs. 1219136, 0.00%) 27697221 (vs. 27697221, 0.00%) 2160 (vs. 2160, 0.00%)
Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 74149 (vs. 69831, 6.18%↑) 122416 (vs. 122416, 0.00%) 88152133 (vs. 88152133, 0.00%) 386 (vs. 386, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt,compile-stats] 1120 (vs. 909, 23.21%↑) 3872 (vs. 3872, 0.00%) 271673 (vs. 271673, 0.00%) 1 (vs. 1, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][default-flags,dt-uk,compile-stats] 1693 (vs. 1744, 2.92%↓) 4432 (vs. 4432, 0.00%) 273541 (vs. 273541, 0.00%) 4 (vs. 4, 0.00%)
matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only,compile-stats] 1295 (vs. 1158, 11.83%↑) 3856 (vs. 3856, 0.00%) 272965 (vs. 272965, 0.00%) 4 (vs. 4, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 65500 (vs. 64020, 2.31%↑) 246212 (vs. 246212, 0.00%) 98597402 (vs. 98597402, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,compile-stats] 73899 (vs. 68309, 8.18%↑) 246212 (vs. 246212, 0.00%) 98597402 (vs. 98597402, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 74243 (vs. 70944, 4.65%↑) 142904 (vs. 142904, 0.00%) 98493689 (vs. 98493689, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16,compile-stats] 100523 (vs. 101873, 1.33%↓) 3003516 (vs. 3003516, 0.00%) 52897620 (vs. 52897620, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,compile-stats] 203050 (vs. 192191, 5.65%↑) 6755236 (vs. 6755236, 0.00%) 33055318 (vs. 33055318, 0.00%) 1102 (vs. 1102, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 69507 (vs. 65058, 6.84%↑) 142904 (vs. 142904, 0.00%) 98493817 (vs. 98493817, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16,compile-stats] 88852 (vs. 85235, 4.24%↑) 3007544 (vs. 3007544, 0.00%) 52904532 (vs. 52904532, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,compile-stats] 199544 (vs. 190363, 4.82%↑) 6734980 (vs. 6734980, 0.00%) 33022998 (vs. 33022998, 0.00%) 1102 (vs. 1102, 0.00%)
MobileBertSquad_fp32(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 118021 (vs. 103119, 14.45%↑) 142904 (vs. 142904, 0.00%) 99715961 (vs. 99715961, 0.00%) 704 (vs. 704, 0.00%)
MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,demote-f32-to-f16,compile-stats] 132395 (vs. 120266, 10.09%↑) 3007544 (vs. 3007544, 0.00%) 54168340 (vs. 54168340, 0.00%) 728 (vs. 728, 0.00%)
MobileBertSquad_int8(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,repeated-kernel,compile-stats] 309745 (vs. 301444, 2.75%↑) 6734980 (vs. 6734980, 0.00%) 34936086 (vs. 34936086, 0.00%) 1102 (vs. 1102, 0.00%)
MobileNetV2_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 23571 (vs. 22439, 5.04%↑) 208053 (vs. 208053, 0.00%) 14213054 (vs. 14213054, 0.00%) 172 (vs. 172, 0.00%)
MobileNetV3Small_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags,compile-stats] 33373 (vs. 31657, 5.42%↑) 311157 (vs. 311157, 0.00%) 10551486 (vs. 10551486, 0.00%) 210 (vs. 210, 0.00%)