@dsprenkels
Last active November 28, 2018 17:22
Pipeline analysis of radix-2^25.5 interleaved carry ripple modulo 2^255-19
Iterations: 100
Instructions: 4200
Total Cycles: 1406
Dispatch Width: 4
IPC: 2.99
Block RThroughput: 13.0
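The summary figures above are tied together by simple arithmetic. The sketch below recomputes them from the reported counts; the variable names are illustrative and not part of the tool output.

```python
# Figures copied from the summary above.
iterations = 100
instructions = 4200
total_cycles = 1406

# Instructions in one pass over the block.
instructions_per_iteration = instructions // iterations

# IPC is instructions retired divided by total cycles.
ipc = instructions / total_cycles

# Average cycles spent on one pass over the block.
cycles_per_iteration = total_cycles / iterations

print(instructions_per_iteration)
print(round(ipc, 2))
print(round(cycles_per_iteration, 2))
```

The simulated cost of about 14 cycles per iteration sits slightly above the reported Block RThroughput of 13.0.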
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[1] [2] [3] [4] [5] [6] Instructions:
1 1 1.00 vpsrlq $26, %ymm0, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm1, %ymm1
1 7 0.50 * vmovdqa (%rip), %ymm13
1 1 0.33 vpand %ymm13, %ymm0, %ymm0
1 1 1.00 vpsrlq $25, %ymm5, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm6, %ymm6
1 7 0.50 * vmovdqa (%rip), %ymm12
1 1 0.33 vpand %ymm12, %ymm5, %ymm5
1 1 1.00 vpsrlq $25, %ymm1, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm2, %ymm2
1 1 0.33 vpand %ymm12, %ymm1, %ymm1
1 1 1.00 vpsrlq $26, %ymm6, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm7, %ymm7
1 1 0.33 vpand %ymm13, %ymm6, %ymm6
1 1 1.00 vpsrlq $26, %ymm2, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm3, %ymm3
1 1 0.33 vpand %ymm13, %ymm2, %ymm2
1 1 1.00 vpsrlq $25, %ymm7, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm8, %ymm8
1 1 0.33 vpand %ymm12, %ymm7, %ymm7
1 1 1.00 vpsrlq $25, %ymm3, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm4, %ymm4
1 1 0.33 vpand %ymm12, %ymm3, %ymm3
1 1 1.00 vpsrlq $26, %ymm8, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm9, %ymm9
1 1 0.33 vpand %ymm13, %ymm8, %ymm8
1 1 1.00 vpsrlq $26, %ymm4, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm5, %ymm5
1 1 0.33 vpand %ymm13, %ymm4, %ymm4
1 1 1.00 vpsrlq $25, %ymm9, %ymm15
1 1 1.00 vpsllq $4, %ymm15, %ymm14
1 1 0.50 vpaddq %ymm14, %ymm0, %ymm0
1 1 0.50 vpaddq %ymm15, %ymm15, %ymm14
1 1 0.50 vpaddq %ymm15, %ymm14, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm0, %ymm0
1 1 0.33 vpand %ymm12, %ymm9, %ymm9
1 1 1.00 vpsrlq $25, %ymm5, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm6, %ymm6
1 1 0.33 vpand %ymm12, %ymm5, %ymm5
1 1 1.00 vpsrlq $26, %ymm0, %ymm15
1 1 0.50 vpaddq %ymm15, %ymm1, %ymm1
1 1 0.33 vpand %ymm13, %ymm0, %ymm0
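For readers who want to follow what the listing computes, here is a minimal scalar model of the same carry sequence for one 64-bit lane. It is a sketch under assumptions: the contents of %ymm13 and %ymm12 are not shown in the output, so they are assumed to be the masks 2^26-1 and 2^25-1, and register %ymmN is assumed to hold limb N. The function names are hypothetical, and Python integers do not wrap at 64 bits the way vector lanes do.

```python
P = 2**255 - 19

def limb_bits(i):
    # Radix 2^25.5: even limbs hold 26 bits, odd limbs hold 25 bits.
    return 26 if i % 2 == 0 else 25

def carry_step(h, i):
    bits = limb_bits(i)
    c = h[i] >> bits                 # vpsrlq $bits, limb_i, tmp
    h[i] &= (1 << bits) - 1          # vpand mask, limb_i, limb_i (mask value assumed)
    if i == 9:
        # 2^255 = 19 (mod P). The listing builds 19*c as (c << 4) + (c + c + c).
        h[0] += (c << 4) + (c + c + c)
    else:
        h[i + 1] += c                # vpaddq tmp, limb_{i+1}, limb_{i+1}

def carry_ripple(h):
    # Two chains interleaved in the order used by the listing:
    # 0 -> 1 -> 2 -> 3 -> 4 -> 5 and 5 -> 6 -> 7 -> 8 -> 9 -> 0.
    h = list(h)
    for i in (0, 5, 1, 6, 2, 7, 3, 8, 4, 9, 5, 0):
        carry_step(h, i)
    return h

def value(h):
    # Limb i has weight 2^ceil(25.5 * i).
    return sum(limb << ((51 * i + 1) // 2) for i, limb in enumerate(h))

h = [(1 << 60) - 1 - i for i in range(10)]
r = carry_ripple(h)
print(value(r) % P == value(h) % P)
```

Interleaving two chains gives the scheduler independent work to overlap, since each shift, add and mask triple depends on the previous triple in its own chain.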
Dynamic Dispatch Stall Cycles:
RAT - Register unavailable: 0
RCU - Retire tokens unavailable: 0
SCHEDQ - Scheduler full: 1145
LQ - Load queue full: 0
SQ - Store queue full: 0
GROUP - Static restrictions on the dispatch group: 0
Dispatch Logic - number of cycles where we saw N instructions dispatched:
[# dispatched], [# cycles]
0, 22 (1.6%)
2, 190 (13.5%)
3, 956 (68.0%)
4, 238 (16.9%)
Schedulers - number of cycles where we saw N instructions issued:
[# issued], [# cycles]
0, 3 (0.2%)
1, 1 (0.1%)
2, 204 (14.5%)
3, 1001 (71.2%)
4, 197 (14.0%)
Scheduler's queue usage:
SBPortAny, 54/54
Retire Control Unit - number of cycles where we saw N instructions retired:
[# retired], [# cycles]
0, 7 (0.5%)
1, 303 (21.6%)
2, 2 (0.1%)
3, 887 (63.1%)
4, 6 (0.4%)
5, 103 (7.3%)
7, 97 (6.9%)
14, 1 (0.1%)
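The dispatch, issue and retire histograms above can be cross-checked against the summary: the cycle counts should add up to Total Cycles and the weighted sums to Instructions. A small sketch, with the histogram values copied from the output and hypothetical names:

```python
# Histograms copied from the output above: {instructions per cycle: cycles}.
dispatched = {0: 22, 2: 190, 3: 956, 4: 238}
issued = {0: 3, 1: 1, 2: 204, 3: 1001, 4: 197}
retired = {0: 7, 1: 303, 2: 2, 3: 887, 4: 6, 5: 103, 7: 97, 14: 1}

def totals(hist):
    """Return (total cycles, total instructions) implied by a histogram."""
    cycles = sum(hist.values())
    instructions = sum(n * count for n, count in hist.items())
    return cycles, instructions

for name, hist in (("dispatched", dispatched), ("issued", issued), ("retired", retired)):
    print(name, totals(hist))
```

Every instruction in this block is a single uOp, so counting instructions and counting uOps give the same totals here.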
Register File statistics:
Total number of mappings created: 4200
Max number of mappings used: 67
Resources:
[0] - SBDivider
[1] - SBFPDivider
[2] - SBPort0
[3] - SBPort1
[4] - SBPort4
[5] - SBPort5
[6.0] - SBPort23
[6.1] - SBPort23
Resource pressure per iteration:
[0] [1] [2] [3] [4] [5] [6.0] [6.1]
- - 14.01 12.99 - 13.00 - 2.00
Resource pressure by instruction:
[0] [1] [2] [3] [4] [5] [6.0] [6.1] Instructions:
- - 1.00 - - - - - vpsrlq $26, %ymm0, %ymm15
- - - 0.96 - 0.04 - - vpaddq %ymm15, %ymm1, %ymm1
- - - - - - - 1.00 vmovdqa (%rip), %ymm13
- - - 0.03 - 0.97 - - vpand %ymm13, %ymm0, %ymm0
- - 1.00 - - - - - vpsrlq $25, %ymm5, %ymm15
- - - 0.03 - 0.97 - - vpaddq %ymm15, %ymm6, %ymm6
- - - - - - - 1.00 vmovdqa (%rip), %ymm12
- - 0.96 0.03 - 0.01 - - vpand %ymm12, %ymm5, %ymm5
- - 1.00 - - - - - vpsrlq $25, %ymm1, %ymm15
- - - 0.01 - 0.99 - - vpaddq %ymm15, %ymm2, %ymm2
- - 0.01 0.97 - 0.02 - - vpand %ymm12, %ymm1, %ymm1
- - 1.00 - - - - - vpsrlq $26, %ymm6, %ymm15
- - - 1.00 - - - - vpaddq %ymm15, %ymm7, %ymm7
- - - 0.02 - 0.98 - - vpand %ymm13, %ymm6, %ymm6
- - 1.00 - - - - - vpsrlq $26, %ymm2, %ymm15
- - - 0.97 - 0.03 - - vpaddq %ymm15, %ymm3, %ymm3
- - 0.01 0.01 - 0.98 - - vpand %ymm13, %ymm2, %ymm2
- - 1.00 - - - - - vpsrlq $25, %ymm7, %ymm15
- - - 0.99 - 0.01 - - vpaddq %ymm15, %ymm8, %ymm8
- - 0.01 0.02 - 0.97 - - vpand %ymm12, %ymm7, %ymm7
- - 1.00 - - - - - vpsrlq $25, %ymm3, %ymm15
- - - 0.97 - 0.03 - - vpaddq %ymm15, %ymm4, %ymm4
- - - 0.02 - 0.98 - - vpand %ymm12, %ymm3, %ymm3
- - 1.00 - - - - - vpsrlq $26, %ymm8, %ymm15
- - - 0.98 - 0.02 - - vpaddq %ymm15, %ymm9, %ymm9
- - - 0.03 - 0.97 - - vpand %ymm13, %ymm8, %ymm8
- - 1.00 - - - - - vpsrlq $26, %ymm4, %ymm15
- - - 0.97 - 0.03 - - vpaddq %ymm15, %ymm5, %ymm5
- - - 0.02 - 0.98 - - vpand %ymm13, %ymm4, %ymm4
- - 1.00 - - - - - vpsrlq $25, %ymm9, %ymm15
- - 1.00 - - - - - vpsllq $4, %ymm15, %ymm14
- - - 0.97 - 0.03 - - vpaddq %ymm14, %ymm0, %ymm0
- - - 0.98 - 0.02 - - vpaddq %ymm15, %ymm15, %ymm14
- - - 0.03 - 0.97 - - vpaddq %ymm15, %ymm14, %ymm15
- - - 0.97 - 0.03 - - vpaddq %ymm15, %ymm0, %ymm0
- - - 0.03 - 0.97 - - vpand %ymm12, %ymm9, %ymm9
- - 1.00 - - - - - vpsrlq $25, %ymm5, %ymm15
- - - 0.03 - 0.97 - - vpaddq %ymm15, %ymm6, %ymm6
- - 0.02 0.01 - 0.97 - - vpand %ymm12, %ymm5, %ymm5
- - 1.00 - - - - - vpsrlq $26, %ymm0, %ymm15
- - - 0.97 - 0.03 - - vpaddq %ymm15, %ymm1, %ymm1
- - - 0.97 - 0.03 - - vpand %ymm13, %ymm0, %ymm0
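The pressure tables suggest where the bottleneck is. In this output every vpsrlq and vpsllq lands on SBPort0, the vpaddq instructions split between SBPort1 and SBPort5, and vpand uses whichever of the three is free. The sketch below derives two simple lower bounds from the instruction mix; the counts are tallied from the listing and the names are illustrative.

```python
# Instruction mix per iteration, tallied from the listing above.
shifts = 13   # 12 vpsrlq + 1 vpsllq, all on SBPort0 in this output
adds = 15     # vpaddq
ands = 12     # vpand
loads = 2     # vmovdqa from memory, on SBPort23

alu_uops = shifts + adds + ands

# Port 0 must execute every shift, one per cycle.
port0_bound = shifts

# All ALU uOps share three ports (0, 1 and 5).
three_port_bound = alu_uops / 3

# Per-iteration pressure reported by the tool for those ports.
reported = {"SBPort0": 14.01, "SBPort1": 12.99, "SBPort5": 13.00}

print(port0_bound, round(three_port_bound, 2), round(sum(reported.values()), 2))
```

Both bounds are close to the simulated 14.06 cycles per iteration, which is consistent with the three vector ALU ports being nearly saturated and SBPort0 carrying the most work.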
Timeline view:
0123456789
Index 0123456789
[0,0] DeER . . . . vpsrlq $26, %ymm0, %ymm15
[0,1] D=eER. . . . vpaddq %ymm15, %ymm1, %ymm1
[0,2] DeeeeeeeER. . . vmovdqa (%rip), %ymm13
[0,3] D=======eER . . vpand %ymm13, %ymm0, %ymm0
[0,4] .DeE------R . . vpsrlq $25, %ymm5, %ymm15
[0,5] .D=eE-----R . . vpaddq %ymm15, %ymm6, %ymm6
[0,6] .DeeeeeeeER . . vmovdqa (%rip), %ymm12
[0,7] .D=======eER . . vpand %ymm12, %ymm5, %ymm5
[0,8] . DeE------R . . vpsrlq $25, %ymm1, %ymm15
[0,9] . D=eE-----R . . vpaddq %ymm15, %ymm2, %ymm2
[0,10] . D======eER . . vpand %ymm12, %ymm1, %ymm1
[0,11] . D=eE-----R . . vpsrlq $26, %ymm6, %ymm15
[0,12] . D=eE----R . . vpaddq %ymm15, %ymm7, %ymm7
[0,13] . D====eE-R . . vpand %ymm13, %ymm6, %ymm6
[0,14] . D=eE----R . . vpsrlq $26, %ymm2, %ymm15
[0,15] . D==eE---R . . vpaddq %ymm15, %ymm3, %ymm3
[0,16] . D===eE-R . . vpand %ymm13, %ymm2, %ymm2
[0,17] . D=eE---R . . vpsrlq $25, %ymm7, %ymm15
[0,18] . D==eE--R . . vpaddq %ymm15, %ymm8, %ymm8
[0,19] . D====eER . . vpand %ymm12, %ymm7, %ymm7
[0,20] . D=eE--R . . vpsrlq $25, %ymm3, %ymm15
[0,21] . D====eER . . vpaddq %ymm15, %ymm4, %ymm4
[0,22] . D====eER . . vpand %ymm12, %ymm3, %ymm3
[0,23] . D====eER . . vpsrlq $26, %ymm8, %ymm15
[0,24] . .D====eER . . vpaddq %ymm15, %ymm9, %ymm9
[0,25] . .D====eER . . vpand %ymm13, %ymm8, %ymm8
[0,26] . .D====eER . . vpsrlq $26, %ymm4, %ymm15
[0,27] . .D=====eER. . vpaddq %ymm15, %ymm5, %ymm5
[0,28] . . D====eER. . vpand %ymm13, %ymm4, %ymm4
[0,29] . . D====eER. . vpsrlq $25, %ymm9, %ymm15
[0,30] . . D=====eER . vpsllq $4, %ymm15, %ymm14
[0,31] . . D======eER . vpaddq %ymm14, %ymm0, %ymm0
[0,32] . . D====eE-R . vpaddq %ymm15, %ymm15, %ymm14
[0,33] . . D=====eER . vpaddq %ymm15, %ymm14, %ymm15
[0,34] . . D======eER . vpaddq %ymm15, %ymm0, %ymm0
[0,35] . . D====eE--R . vpand %ymm12, %ymm9, %ymm9
[0,36] . . D====eE-R . vpsrlq $25, %ymm5, %ymm15
[0,37] . . D=====eER . vpaddq %ymm15, %ymm6, %ymm6
[0,38] . . D=====eER . vpand %ymm12, %ymm5, %ymm5
[0,39] . . D======eER. vpsrlq $26, %ymm0, %ymm15
[0,40] . . D======eER vpaddq %ymm15, %ymm1, %ymm1
[0,41] . . D=====eE-R vpand %ymm13, %ymm0, %ymm0
Average Wait times (based on the timeline view):
[0]: Executions
[1]: Average time spent waiting in a scheduler's queue
[2]: Average time spent waiting in a scheduler's queue while ready
[3]: Average time elapsed from WB until retire stage
[0] [1] [2] [3]
0. 1 1.0 1.0 0.0 vpsrlq $26, %ymm0, %ymm15
1. 1 2.0 0.0 0.0 vpaddq %ymm15, %ymm1, %ymm1
2. 1 1.0 1.0 0.0 vmovdqa (%rip), %ymm13
3. 1 8.0 0.0 0.0 vpand %ymm13, %ymm0, %ymm0
4. 1 1.0 1.0 6.0 vpsrlq $25, %ymm5, %ymm15
5. 1 2.0 0.0 5.0 vpaddq %ymm15, %ymm6, %ymm6
6. 1 1.0 1.0 0.0 vmovdqa (%rip), %ymm12
7. 1 8.0 0.0 0.0 vpand %ymm12, %ymm5, %ymm5
8. 1 1.0 0.0 6.0 vpsrlq $25, %ymm1, %ymm15
9. 1 2.0 0.0 5.0 vpaddq %ymm15, %ymm2, %ymm2
10. 1 7.0 0.0 0.0 vpand %ymm12, %ymm1, %ymm1
11. 1 2.0 0.0 5.0 vpsrlq $26, %ymm6, %ymm15
12. 1 2.0 0.0 4.0 vpaddq %ymm15, %ymm7, %ymm7
13. 1 5.0 0.0 1.0 vpand %ymm13, %ymm6, %ymm6
14. 1 2.0 0.0 4.0 vpsrlq $26, %ymm2, %ymm15
15. 1 3.0 0.0 3.0 vpaddq %ymm15, %ymm3, %ymm3
16. 1 4.0 0.0 1.0 vpand %ymm13, %ymm2, %ymm2
17. 1 2.0 0.0 3.0 vpsrlq $25, %ymm7, %ymm15
18. 1 3.0 0.0 2.0 vpaddq %ymm15, %ymm8, %ymm8
19. 1 5.0 0.0 0.0 vpand %ymm12, %ymm7, %ymm7
20. 1 2.0 0.0 2.0 vpsrlq $25, %ymm3, %ymm15
21. 1 5.0 2.0 0.0 vpaddq %ymm15, %ymm4, %ymm4
22. 1 5.0 1.0 0.0 vpand %ymm12, %ymm3, %ymm3
23. 1 5.0 2.0 0.0 vpsrlq $26, %ymm8, %ymm15
24. 1 5.0 0.0 0.0 vpaddq %ymm15, %ymm9, %ymm9
25. 1 5.0 3.0 0.0 vpand %ymm13, %ymm8, %ymm8
26. 1 5.0 0.0 0.0 vpsrlq $26, %ymm4, %ymm15
27. 1 6.0 0.0 0.0 vpaddq %ymm15, %ymm5, %ymm5
28. 1 5.0 1.0 0.0 vpand %ymm13, %ymm4, %ymm4
29. 1 5.0 0.0 0.0 vpsrlq $25, %ymm9, %ymm15
30. 1 6.0 0.0 0.0 vpsllq $4, %ymm15, %ymm14
31. 1 7.0 0.0 0.0 vpaddq %ymm14, %ymm0, %ymm0
32. 1 5.0 0.0 1.0 vpaddq %ymm15, %ymm15, %ymm14
33. 1 6.0 0.0 0.0 vpaddq %ymm15, %ymm14, %ymm15
34. 1 7.0 0.0 0.0 vpaddq %ymm15, %ymm0, %ymm0
35. 1 5.0 1.0 2.0 vpand %ymm12, %ymm9, %ymm9
36. 1 5.0 1.0 1.0 vpsrlq $25, %ymm5, %ymm15
37. 1 6.0 0.0 0.0 vpaddq %ymm15, %ymm6, %ymm6
38. 1 6.0 2.0 0.0 vpand %ymm12, %ymm5, %ymm5
39. 1 7.0 0.0 0.0 vpsrlq $26, %ymm0, %ymm15
40. 1 7.0 0.0 0.0 vpaddq %ymm15, %ymm1, %ymm1
41. 1 6.0 0.0 1.0 vpand %ymm13, %ymm0, %ymm0
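The wait-time columns can be read off the timeline rows directly. In the timeline, `D` marks dispatch, `=` a cycle spent waiting in the scheduler, `e` a cycle of execution, `E` the end of execution, `-` a cycle spent waiting to retire, and `R` retirement. The helper below is a hypothetical sketch that counts those characters in the timeline field of one row.

```python
def wait_stats(timeline):
    """Count per-stage cycles in the timeline field of a single row."""
    return {
        "queued": timeline.count("="),       # waiting in the scheduler
        "executing": timeline.count("e"),    # lowercase only; 'E' marks completion
        "retire_wait": timeline.count("-"),  # executed, waiting to retire
    }

print(wait_stats("D=======eER"))   # row [0,3]
print(wait_stats("DeE------R"))    # row [0,4]
print(wait_stats("DeeeeeeeER"))    # row [0,2]
```

In the rows of this particular output, column [1] of the wait-time table equals the number of `=` characters plus one, and column [3] equals the number of `-` characters.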