Full Benchmark Summary

Regressed Benchmarks 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 134 (vs. 122, 9.84%↑) | 132 | 15 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 18 (vs. 17, 5.88%↑) | 18 | 0 |
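The percentage in the Average Latency column is the relative change of the current average against the baseline average. A minimal sketch of that arithmetic (the function name and rounding are illustrative, not the bot's actual code):

```python
def percent_change(current_ms: float, baseline_ms: float) -> float:
    """Relative change of the current average latency vs. the baseline, in percent.

    Positive values mean the benchmark got slower (a potential regression);
    negative values mean it got faster.
    """
    return (current_ms - baseline_ms) / baseline_ms * 100.0

# First regressed row above: 134 ms average vs. a 122 ms baseline.
change = percent_change(134, 122)
print(f"{abs(round(change, 2))}%{'↑' if change > 0 else '↓'}")  # → 9.84%↑
```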

Improved Benchmarks 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 1130 (vs. 1255, 9.96%↓) | 1092 | 70 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 787 (vs. 870, 9.54%↓) | 757 | 54 |

Similar Benchmarks

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 227 (vs. 237, 4.22%↓) | 227 | 13 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 79 (vs. 77, 2.60%↑) | 78 | 3 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) full-inference with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 81 (vs. 79, 2.53%↑) | 81 | 2 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) full-inference with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 84 (vs. 82, 2.44%↑) | 85 | 5 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 1-thread,big-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 46 (vs. 45, 2.22%↑) | 46 | 0 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 195 (vs. 191, 2.09%↑) | 195 | 5 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 431 (vs. 440, 2.05%↓) | 420 | 51 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) big-core,full-inference with IREE-Dylib-Sync @ Pixel-4 (CPU-ARMv8.2-A) | 60 (vs. 61, 1.64%↓) | 61 | 0 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) little-core,full-inference with IREE-Dylib-Sync @ SM-G980F (CPU-ARMv8.2-A) | 374 (vs. 368, 1.63%↑) | 374 | 5 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 1-thread,big-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 141 (vs. 139, 1.44%↑) | 142 | 2 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 372 (vs. 375, 0.80%↓) | 373 | 3 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) big-core,full-inference with IREE-Dylib-Sync @ SM-G980F (CPU-ARMv8.2-A) | 141 (vs. 140, 0.71%↑) | 142 | 2 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 388 (vs. 386, 0.52%↑) | 388 | 2 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 966 (vs. 962, 0.42%↑) | 959 | 30 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 1328 (vs. 1325, 0.23%↑) | 1328 | 4 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-VMVX @ SM-G980F (CPU-ARMv8.2-A) | 63313 (vs. 63408, 0.15%↓) | 63413 | 216 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-VMVX @ SM-G980F (CPU-ARMv8.2-A) | 16938 (vs. 16952, 0.08%↓) | 16943 | 20 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-VMVX @ Pixel-4 (CPU-ARMv8.2-A) | 19119 (vs. 19134, 0.08%↓) | 19127 | 27 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) little-core,full-inference with IREE-Dylib-Sync @ Pixel-4 (CPU-ARMv8.2-A) | 1330 (vs. 1329, 0.08%↑) | 1328 | 4 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-VMVX @ Pixel-4 (CPU-ARMv8.2-A) | 71581 (vs. 71565, 0.02%↑) | 71555 | 69 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) little-core,full-inference with IREE-Dylib-Sync @ SM-G980F (CPU-ARMv8.2-A) | 1252 (vs. 1252, 0.00%) | 1251 | 5 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 20 (vs. 20, 0.00%) | 20 | 0 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 315 (vs. 315, 0.00%) | 315 | 6 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) big-core,full-inference with IREE-Dylib-Sync @ SM-G980F (CPU-ARMv8.2-A) | 46 (vs. 46, 0.00%) | 46 | 0 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) 1-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 170 (vs. 170, 0.00%) | 170 | 0 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) big-core,full-inference with IREE-Dylib-Sync @ Pixel-4 (CPU-ARMv8.2-A) | 211 (vs. 211, 0.00%) | 210 | 2 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) full-inference with IREE-Vulkan @ Pixel-4 (GPU-Adreno-640) | 74 (vs. 74, 0.00%) | 74 | 0 |
| MobileNetV2 [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ Pixel-4 (GPU-Adreno-640) | 70 (vs. 70, 0.00%) | 70 | 1 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) 1-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 51 (vs. 51, 0.00%) | 51 | 0 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) little-core,full-inference with IREE-Dylib-Sync @ Pixel-4 (CPU-ARMv8.2-A) | 388 (vs. 388, 0.00%) | 388 | 2 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) full-inference with IREE-Vulkan @ Pixel-4 (GPU-Adreno-640) | 38 (vs. 38, 0.00%) | 38 | 0 |
| MobileNetV3Small [fp32,imagenet] (TensorFlow) kernel-execution with IREE-Vulkan @ Pixel-4 (GPU-Adreno-640) | 35 (vs. 35, 0.00%) | 34 | 1 |
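The three sections above bucket each benchmark by the magnitude of its latency change. A 5% cutoff reproduces the grouping shown here (the smallest change listed as regressed is 5.88% and the largest listed as similar is 4.22%), but that threshold is an inference, not the bot's documented value. A sketch under that assumption:

```python
def classify(current_ms: float, baseline_ms: float, threshold_pct: float = 5.0) -> str:
    """Bucket a benchmark as regressed, improved, or similar.

    The 5.0 default threshold is an assumption inferred from the tables
    in this summary, not the bot's documented cutoff.
    """
    change = (current_ms - baseline_ms) / baseline_ms * 100.0
    if change >= threshold_pct:
        return "regressed"   # got slower by at least the threshold
    if change <= -threshold_pct:
        return "improved"    # got faster by at least the threshold
    return "similar"

print(classify(134, 122))    # first regressed row (9.84% slower)
print(classify(1130, 1255))  # first improved row (9.96% faster)
print(classify(227, 237))    # first similar row (4.22% faster)
```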

Similar Benchmarks

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad [fp16] (TensorFlow) kernel-execution with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 157 | 156 | 1 |
| MobileBertSquad [fp32] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 2091 | 2132 | 91 |
| MobileBertSquad [fp32] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 5419 | 5453 | 497 |
| MobileBertSquad [fp32] (TensorFlow) full-inference with IREE-Vulkan @ SM-G980F (GPU-Mali-G77) | 215 | 215 | 2 |
| MobileBertSquad [fp32] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 543 | 547 | 42 |
| MobileBertSquad [fp32] (TensorFlow) 1-thread,big-core,full-inference with IREE-Dylib @ SM-G980F (CPU-ARMv8.2-A) | 786 | 789 | 9 |
| MobileBertSquad [fp32] (TensorFlow) little-core,full-inference with IREE-Dylib-Sync @ SM-G980F (CPU-ARMv8.2-A) | 5827 | 5867 | 118 |
| MobileBertSquad [fp32] (TensorFlow) big-core,full-inference with IREE-Dylib-Sync @ SM-G980F (CPU-ARMv8.2-A) | 790 | 785 | 19 |
| MobileBertSquad [fp32] (TensorFlow) 3-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 2065 | 2062 | 16 |
| MobileBertSquad [fp32] (TensorFlow) 1-thread,little-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 5611 | 5610 | 6 |
| MobileBertSquad [fp32] (TensorFlow) 3-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 345 | 345 | 1 |
| MobileBertSquad [fp32] (TensorFlow) 1-thread,big-core,full-inference with IREE-Dylib @ Pixel-4 (CPU-ARMv8.2-A) | 729 | 728 | 4 |
| MobileBertSquad [fp32] (TensorFlow) little-core,full-inference with IREE-Dylib-Sync @ Pixel-4 (CPU-ARMv8.2-A) | 5640 | 5641 | 7 |
| MobileBertSquad [fp32] (TensorFlow) big-core,full-inference with IREE-Dylib-Sync @ Pixel-4 (CPU-ARMv8.2-A) | 900 | 899 | 2 |
| MobileBertSquad [fp32] (TensorFlow) full-inference with IREE-Vulkan @ Pixel-4 (GPU-Adreno-640) | 866 | 860 | 19 |