@reinsteam
Created February 23, 2019 04:57
Throughput analysis dump from IACA 2.3
Intel(R) Architecture Code Analyzer Version - 2.3 build:c151d5a (Thu, 6 Jul 2017 09:41:36 +0300)
Analyzed File - aosoa_packet.obj
Binary Format - 64Bit
Architecture - HSW
Analysis Type - Throughput
*******************************************************************
Intel(R) Architecture Code Analyzer Mark Number 1
*******************************************************************
Throughput Analysis Report
--------------------------
Block Throughput: 48.00 Cycles Throughput Bottleneck: Backend. Port5
Port Binding In Cycles Per Iteration:
---------------------------------------------------------------------------------------
| Port   | 0 - DV    | 1    | 2 - D     | 3 - D     | 4   | 5    | 6   | 7   |
---------------------------------------------------------------------------------------
| Cycles | 36.0  0.0 | 36.0 | 27.0 27.0 | 27.0 27.0 | 0.0 | 48.0 | 1.0 | 0.0 |
---------------------------------------------------------------------------------------
N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion happened
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256/AVX512 instruction, dozens of cycles penalty is expected
X - instruction not supported, was not accounted in Analysis
| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | 6 | 7 | |
---------------------------------------------------------------------------------
| 1 | | | | | | 1.0 | | | CP | vshufps xmm3, xmm15, xmm15, 0x0
| 1 | | | | | | 1.0 | | | CP | vshufps xmm5, xmm15, xmm15, 0x55
| 1 | | | | | | 1.0 | | | CP | vshufps xmm7, xmm15, xmm15, 0xaa
| 2^ | 1.0 | | 1.0 1.0 | | | | | | | vpand xmm2, xmm3, xmmword ptr [rip]
| 2^ | 1.0 | | | 1.0 1.0 | | | | | | vpand xmm4, xmm5, xmmword ptr [rip]
| 2^ | 1.0 | | 1.0 1.0 | | | | | | | vpand xmm6, xmm7, xmmword ptr [rip]
| 2 | | | | 1.0 1.0 | | 1.0 | | | CP | vxorps xmm2, xmm2, xmmword ptr [rsi+rcx*1]
| 2 | | | 1.0 1.0 | | | 1.0 | | | CP | vxorps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x20]
| 2 | | | | 1.0 1.0 | | 1.0 | | | CP | vxorps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x40]
| 2 | | 1.0 | 1.0 1.0 | | | | | | | vaddps xmm2, xmm2, xmmword ptr [rsi+rcx*1+0x10]
| 2 | | 1.0 | | 1.0 1.0 | | | | | | vaddps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x30]
| 2 | | 1.0 | 1.0 1.0 | | | | | | | vaddps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x50]
| 1 | 1.0 | | | | | | | | | vmulps xmm2, xmm2, xmm3
| 1 | 1.0 | | | | | | | | | vmulps xmm4, xmm4, xmm5
| 1 | 1.0 | | | | | | | | | vmulps xmm6, xmm6, xmm7
| 1 | | | | | | 1.0 | | | CP | vshufps xmm8, xmm15, xmm15, 0xff
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm4
| 1 | | 1.0 | | | | | | | | vaddps xmm6, xmm6, xmm8
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm6
| 1 | | | | | | 1.0 | | | CP | vxorps xmm9, xmm9, xmm2
| 1 | | | | | | 1.0 | | | CP | vshufps xmm3, xmm14, xmm14, 0x0
| 1 | | | | | | 1.0 | | | CP | vshufps xmm5, xmm14, xmm14, 0x55
| 1 | | | | | | 1.0 | | | CP | vshufps xmm7, xmm14, xmm14, 0xaa
| 2^ | 1.0 | | | 1.0 1.0 | | | | | | vpand xmm2, xmm3, xmmword ptr [rip]
| 2^ | 1.0 | | 1.0 1.0 | | | | | | | vpand xmm4, xmm5, xmmword ptr [rip]
| 2^ | 1.0 | | | 1.0 1.0 | | | | | | vpand xmm6, xmm7, xmmword ptr [rip]
| 2 | | | 1.0 1.0 | | | 1.0 | | | CP | vxorps xmm2, xmm2, xmmword ptr [rsi+rcx*1]
| 2 | | | | 1.0 1.0 | | 1.0 | | | CP | vxorps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x20]
| 2 | | | 1.0 1.0 | | | 1.0 | | | CP | vxorps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x40]
| 2 | | 1.0 | | 1.0 1.0 | | | | | | vaddps xmm2, xmm2, xmmword ptr [rsi+rcx*1+0x10]
| 2 | | 1.0 | 1.0 1.0 | | | | | | | vaddps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x30]
| 2 | | 1.0 | | 1.0 1.0 | | | | | | vaddps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x50]
| 1 | 1.0 | | | | | | | | | vmulps xmm2, xmm2, xmm3
| 1 | 1.0 | | | | | | | | | vmulps xmm4, xmm4, xmm5
| 1 | 1.0 | | | | | | | | | vmulps xmm6, xmm6, xmm7
| 1 | | | | | | 1.0 | | | CP | vshufps xmm8, xmm14, xmm14, 0xff
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm4
| 1 | | 1.0 | | | | | | | | vaddps xmm6, xmm6, xmm8
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm6
| 1 | | | | | | 1.0 | | | CP | vxorps xmm9, xmm9, xmm2
| 1 | | | | | | 1.0 | | | CP | vshufps xmm3, xmm13, xmm13, 0x0
| 1 | | | | | | 1.0 | | | CP | vshufps xmm5, xmm13, xmm13, 0x55
| 1 | | | | | | 1.0 | | | CP | vshufps xmm7, xmm13, xmm13, 0xaa
| 2^ | 1.0 | | 1.0 1.0 | | | | | | | vpand xmm2, xmm3, xmmword ptr [rip]
| 2^ | 1.0 | | | 1.0 1.0 | | | | | | vpand xmm4, xmm5, xmmword ptr [rip]
| 2^ | 1.0 | | 1.0 1.0 | | | | | | | vpand xmm6, xmm7, xmmword ptr [rip]
| 2 | | | | 1.0 1.0 | | 1.0 | | | CP | vxorps xmm2, xmm2, xmmword ptr [rsi+rcx*1]
| 2 | | | 1.0 1.0 | | | 1.0 | | | CP | vxorps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x20]
| 2 | | | | 1.0 1.0 | | 1.0 | | | CP | vxorps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x40]
| 2 | | 1.0 | 1.0 1.0 | | | | | | | vaddps xmm2, xmm2, xmmword ptr [rsi+rcx*1+0x10]
| 2 | | 1.0 | | 1.0 1.0 | | | | | | vaddps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x30]
| 2 | | 1.0 | 1.0 1.0 | | | | | | | vaddps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x50]
| 1 | 1.0 | | | | | | | | | vmulps xmm2, xmm2, xmm3
| 1 | 1.0 | | | | | | | | | vmulps xmm4, xmm4, xmm5
| 1 | 1.0 | | | | | | | | | vmulps xmm6, xmm6, xmm7
| 1 | | | | | | 1.0 | | | CP | vshufps xmm8, xmm13, xmm13, 0xff
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm4
| 1 | | 1.0 | | | | | | | | vaddps xmm6, xmm6, xmm8
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm6
| 1 | | | | | | 1.0 | | | CP | vxorps xmm9, xmm9, xmm2
| 1 | | | | | | 1.0 | | | CP | vshufps xmm3, xmm12, xmm12, 0x0
| 1 | | | | | | 1.0 | | | CP | vshufps xmm5, xmm12, xmm12, 0x55
| 1 | | | | | | 1.0 | | | CP | vshufps xmm7, xmm12, xmm12, 0xaa
| 2^ | 1.0 | | | 1.0 1.0 | | | | | | vpand xmm2, xmm3, xmmword ptr [rip]
| 2^ | 1.0 | | 1.0 1.0 | | | | | | | vpand xmm4, xmm5, xmmword ptr [rip]
| 2^ | 1.0 | | | 1.0 1.0 | | | | | | vpand xmm6, xmm7, xmmword ptr [rip]
| 2 | | | 1.0 1.0 | | | 1.0 | | | CP | vxorps xmm2, xmm2, xmmword ptr [rsi+rcx*1]
| 2 | | | | 1.0 1.0 | | 1.0 | | | CP | vxorps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x20]
| 2 | | | 1.0 1.0 | | | 1.0 | | | CP | vxorps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x40]
| 2 | | 1.0 | | 1.0 1.0 | | | | | | vaddps xmm2, xmm2, xmmword ptr [rsi+rcx*1+0x10]
| 2 | | 1.0 | 1.0 1.0 | | | | | | | vaddps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x30]
| 2 | | 1.0 | | 1.0 1.0 | | | | | | vaddps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x50]
| 1 | 1.0 | | | | | | | | | vmulps xmm2, xmm2, xmm3
| 1 | 1.0 | | | | | | | | | vmulps xmm4, xmm4, xmm5
| 1 | 1.0 | | | | | | | | | vmulps xmm6, xmm6, xmm7
| 1 | | | | | | 1.0 | | | CP | vshufps xmm8, xmm12, xmm12, 0xff
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm4
| 1 | | 1.0 | | | | | | | | vaddps xmm6, xmm6, xmm8
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm6
| 1 | | | | | | 1.0 | | | CP | vxorps xmm9, xmm9, xmm2
| 1 | | | | | | 1.0 | | | CP | vshufps xmm3, xmm11, xmm11, 0x0
| 1 | | | | | | 1.0 | | | CP | vshufps xmm5, xmm11, xmm11, 0x55
| 1 | | | | | | 1.0 | | | CP | vshufps xmm7, xmm11, xmm11, 0xaa
| 2^ | 1.0 | | 1.0 1.0 | | | | | | | vpand xmm2, xmm3, xmmword ptr [rip]
| 2^ | 1.0 | | | 1.0 1.0 | | | | | | vpand xmm4, xmm5, xmmword ptr [rip]
| 2^ | 1.0 | | 1.0 1.0 | | | | | | | vpand xmm6, xmm7, xmmword ptr [rip]
| 2 | | | | 1.0 1.0 | | 1.0 | | | CP | vxorps xmm2, xmm2, xmmword ptr [rsi+rcx*1]
| 2 | | | 1.0 1.0 | | | 1.0 | | | CP | vxorps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x20]
| 2 | | | | 1.0 1.0 | | 1.0 | | | CP | vxorps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x40]
| 2 | | 1.0 | 1.0 1.0 | | | | | | | vaddps xmm2, xmm2, xmmword ptr [rsi+rcx*1+0x10]
| 2 | | 1.0 | | 1.0 1.0 | | | | | | vaddps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x30]
| 2 | | 1.0 | 1.0 1.0 | | | | | | | vaddps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x50]
| 1 | 1.0 | | | | | | | | | vmulps xmm2, xmm2, xmm3
| 1 | 1.0 | | | | | | | | | vmulps xmm4, xmm4, xmm5
| 1 | 1.0 | | | | | | | | | vmulps xmm6, xmm6, xmm7
| 1 | | | | | | 1.0 | | | CP | vshufps xmm8, xmm11, xmm11, 0xff
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm4
| 1 | | 1.0 | | | | | | | | vaddps xmm6, xmm6, xmm8
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm6
| 1 | | | | | | 1.0 | | | CP | vxorps xmm9, xmm9, xmm2
| 1 | | | | | | 1.0 | | | CP | vshufps xmm3, xmm10, xmm10, 0x0
| 1 | | | | | | 1.0 | | | CP | vshufps xmm5, xmm10, xmm10, 0x55
| 1 | | | | | | 1.0 | | | CP | vshufps xmm7, xmm10, xmm10, 0xaa
| 2^ | 1.0 | | | 1.0 1.0 | | | | | | vpand xmm2, xmm3, xmmword ptr [rip]
| 2^ | 1.0 | | 1.0 1.0 | | | | | | | vpand xmm4, xmm5, xmmword ptr [rip]
| 2^ | 1.0 | | | 1.0 1.0 | | | | | | vpand xmm6, xmm7, xmmword ptr [rip]
| 2 | | | 1.0 1.0 | | | 1.0 | | | CP | vxorps xmm2, xmm2, xmmword ptr [rsi+rcx*1]
| 2 | | | | 1.0 1.0 | | 1.0 | | | CP | vxorps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x20]
| 2 | | | 1.0 1.0 | | | 1.0 | | | CP | vxorps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x40]
| 2 | | 1.0 | | 1.0 1.0 | | | | | | vaddps xmm2, xmm2, xmmword ptr [rsi+rcx*1+0x10]
| 2 | | 1.0 | 1.0 1.0 | | | | | | | vaddps xmm4, xmm4, xmmword ptr [rsi+rcx*1+0x30]
| 2 | | 1.0 | | 1.0 1.0 | | | | | | vaddps xmm6, xmm6, xmmword ptr [rsi+rcx*1+0x50]
| 1 | 1.0 | | | | | | | | | vmulps xmm2, xmm2, xmm3
| 1 | 1.0 | | | | | | | | | vmulps xmm4, xmm4, xmm5
| 1 | 1.0 | | | | | | | | | vmulps xmm6, xmm6, xmm7
| 1 | | | | | | 1.0 | | | CP | vshufps xmm8, xmm10, xmm10, 0xff
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm4
| 1 | | 1.0 | | | | | | | | vaddps xmm6, xmm6, xmm8
| 1 | | 1.0 | | | | | | | | vaddps xmm2, xmm2, xmm6
| 1 | | | | | | 1.0 | | | CP | vxorps xmm9, xmm9, xmm2
| 1 | | | | | | | 1.0 | | | add rcx, 0x60
| 0F | | | | | | | | | | jnz 0xfffffffffffffd54
Total Num Of Uops: 175
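
The summary figures can be cross-checked against the listing itself. The sketch below is not part of the IACA output. It assumes the loop body is six identical 20-instruction groups (one per source register, xmm15 down to xmm10) plus the `add`/`jnz` pair, and it takes the port assignments exactly as this report shows them rather than from a general Haswell port model. The names `GROUP_MIX`, `GROUPS`, and `tally` are made up for illustration.

```python
from collections import Counter

# (mnemonic, instructions per group, uops per instruction, port use per instruction)
# Counts and ports are read off one 20-instruction group of the listing above.
GROUP_MIX = [
    ("vshufps",      4, 1, {"p5": 1}),
    ("vpand  [mem]", 3, 2, {"p0": 1, "load": 1}),  # micro-fused, still 2 uops
    ("vxorps [mem]", 3, 2, {"p5": 1, "load": 1}),
    ("vaddps [mem]", 3, 2, {"p1": 1, "load": 1}),
    ("vmulps",       3, 1, {"p0": 1}),
    ("vaddps",       3, 1, {"p1": 1}),
    ("vxorps",       1, 1, {"p5": 1}),
]
GROUPS = 6  # assumption: one group per register xmm15..xmm10


def tally(groups=GROUPS):
    """Return (cycles per port, total uops) for the whole loop body."""
    pressure = Counter()
    total_uops = 0
    for _name, count, uops, ports in GROUP_MIX:
        total_uops += groups * count * uops
        for port, n in ports.items():
            pressure[port] += groups * count * n
    # Loop overhead: `add rcx, 0x60` is 1 uop on port 6;
    # `jnz` is listed with 0 uops because it macro-fuses with the add.
    pressure["p6"] += 1
    total_uops += 1
    per_port = {p: float(c) for p, c in pressure.items() if p != "load"}
    # Loads are shared by ports 2 and 3; the report shows an even split.
    per_port["p2"] = per_port["p3"] = pressure["load"] / 2
    return per_port, total_uops


per_port, total_uops = tally()
bottleneck = max(per_port, key=per_port.get)
print(per_port, total_uops, bottleneck)
```

Under these assumptions the tally gives 48 cycles on port 5, 36 each on ports 0 and 1, 27 each on ports 2 and 3, and 175 uops in total, which matches the report. The port 5 load comes from the eight shuffle and `vxorps` uops in each group, so that is where any rework of the kernel would need to start.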