Created
June 27, 2016 15:45
Running from cache:

lw $t0, 0($t1)
nop
takes 8 cycles

lw $t0, 0($t1)
lw $t1, 4($t1)
nop
takes 14 cycles

lw $t0, 0($t1)
lw $t0, 4($t1)
nop
takes 15 cycles

lw $t0, 0($t1)
lw $t1, 4($t1)
lw $t2, 8($t1)
nop
takes 20 cycles

lw $t0, 0($t1)
lw $t0, 4($t1)
lw $t0, 8($t1)
nop
takes 22 cycles

So each additional load costs one more cycle if it targets the same
register as the previous one (15 vs. 14 cycles for two loads, 22 vs. 20
for three). Maybe to cancel the previous load?

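The per-load cost can be read off the totals above by differencing consecutive measurements. A minimal Python sketch of that arithmetic; the dictionary and function names are made up for illustration, and the numbers are the cycle counts measured above (the single-load total of 8 is used as the shared baseline for both series):

```python
# Measured totals for N consecutive loads followed by one nop.
# Keys are the number of loads; the names are illustrative only.
distinct_targets = {1: 8, 2: 14, 3: 20}
same_target = {1: 8, 2: 15, 3: 22}

def increments(totals):
    """Cycles added by each extra load, from consecutive totals."""
    counts = sorted(totals)
    return [totals[b] - totals[a] for a, b in zip(counts, counts[1:])]

distinct_steps = increments(distinct_targets)
same_steps = increments(same_target)
# Extra cost per load when the target register is reused.
penalty = [s - d for s, d in zip(same_steps, distinct_steps)]
print(distinct_steps, same_steps, penalty)
```

With these numbers, each extra back-to-back load adds 6 cycles with distinct targets and 7 when the target is reused.
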
lw $t0, 0($t1)
nop
nop
nop
nop
nop
nop
takes 8 cycles

lw $t0, 0($t1)
nop
nop
nop
nop
nop
nop
nop
takes 9 cycles

So it seems that the LW itself takes 2 cycles, and the next 6 cycles
can overlap with other instructions, at least as long as they don't
use the loaded register.

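One way to check this reading against the numbers is a tiny model, sketched here in Python. The constant and function names are hypothetical, and it assumes each nop costs exactly 1 cycle when running from cache:

```python
LW_ISSUE = 2     # cycles the lw itself appears to occupy (inferred above)
LOAD_WINDOW = 6  # following cycles that can overlap with other work

def lw_then_nops(nops):
    """Predicted total for one lw followed by `nops` nops (1 cycle each, assumed)."""
    return max(LW_ISSUE + LOAD_WINDOW, LW_ISSUE + nops)

# nop count -> measured cycles, from the sequences above
measured = {1: 8, 6: 8, 7: 9}
for nops, cycles in measured.items():
    print(nops, lw_then_nops(nops), cycles)
```

The model only fits the three lw-plus-nops measurements above; it says nothing yet about instructions that touch the loaded register or memory.
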
lw $t0, 0($t1)
move $t4, $t0
nop
nop
nop
nop
nop
takes 8 cycles (the move is in the load delay slot, so it doesn't stall
the pipeline waiting for the load)

lw $t0, 0($t1)
nop
move $t4, $t0
nop
nop
nop
nop
takes 13 cycles

lw $t0, 0($t1)
nop
move $t0, $t4
nop
nop
nop
nop
takes 13 cycles as well, so you get a stall even if you overwrite the
target register

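The 8 and 13 cycle totals are consistent with the instruction that touches the register waiting until the load completes at cycle 8, unless it sits in the delay slot. A rough Python sketch of that interpretation; the names are hypothetical and the 1-cycle cost for move and nop is an assumption:

```python
LW_ISSUE = 2   # cycles for the lw itself (inferred from the nop tests)
LOAD_DONE = 8  # cycle at which the load appears to complete

def lw_then_access(nops_before, nops_after):
    """lw, some nops, one instruction reading or writing the loaded register, more nops."""
    t = LW_ISSUE + nops_before
    if nops_before > 0:
        # Past the delay slot: the access waits for the load to finish.
        t = max(t, LOAD_DONE)
    t += 1           # the move itself (1 cycle, assumed)
    t += nops_after  # trailing nops (1 cycle each, assumed)
    return max(t, LOAD_DONE)

print(lw_then_access(0, 5), lw_then_access(1, 4))
```

The same function covers both the read (`move $t4, $t0`) and the overwrite (`move $t0, $t4`) cases, since they measured the same.
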
lw $t0, 0($t1)
nop
lw $t2, 4($t1)
nop
takes 16 cycles

So any other memory load will stall until the previous one is complete,
even if it doesn't target the same address or register.

lw $t0, 0($t1)
nop
sw $t3, 16($s1)
nop
takes 10 cycles

lw $t0, 0($t1)
nop
nop
nop
nop
nop
nop
sw $t3, 16($s1)
nop
takes 10 cycles

lw $t0, 0($t1)
nop
sw $t3, 16($s1)
nop
nop
takes 11 cycles

So even memory stores force a stall, apparently.

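The load-after-nop and store measurements can be reproduced with a very small simulator in which any memory access waits for the pending load. This is only a sketch fitted to the numbers above, with a hypothetical `simulate` function and assumed costs (lw 2 cycles plus 6 overlappable cycles, sw and nop 1 cycle each):

```python
def simulate(ops):
    """Cycle count for a sequence of 'lw', 'sw' and 'nop' under the assumed model."""
    t = 0          # current cycle
    load_done = 0  # cycle at which the pending load completes
    for op in ops:
        if op in ("lw", "sw"):
            t = max(t, load_done)  # memory access waits for the pending load
        if op == "lw":
            t += 2
            load_done = t + 6
        else:
            t += 1
    return max(t, load_done)

print(simulate(["lw", "nop", "lw", "nop"]))         # measured: 16
print(simulate(["lw", "nop", "sw", "nop"]))         # measured: 10
print(simulate(["lw"] + ["nop"] * 6 + ["sw", "nop"]))  # measured: 10
print(simulate(["lw", "nop", "sw", "nop", "nop"]))  # measured: 11
```

The model does not cover back-to-back loads: it predicts 16 cycles for `lw, lw, nop`, while the measurement at the top is 14, so consecutive loads seem to be handled differently.
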
$0 is special-cased:

lw $0, 0($t1)
nop
move $t4, $0
nop
nop
nop
nop
takes 8 cycles

lw $0, 0($t1)
nop
lw $0, 4($t1)
nop
takes 16 cycles

lw $0, 0($t1)
nop
sw $0, 4($t1)
nop
nop
takes 11 cycles

So $0 is not considered for register dependencies; however, loads to $0
stall like loads to any other register.
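
Putting the $0 results next to the earlier ones makes the difference explicit. A short Python sketch; the table layout and names are illustrative, and the cycle counts are the measurements above (note that the earlier lw and sw sequences used $t2 and $t3 rather than the loaded register):

```python
# sequence shape -> (cycles with ordinary registers, cycles with $0)
results = {
    "lw, nop, move, 4 nops": (13, 8),
    "lw, nop, lw, nop": (16, 16),
    "lw, nop, sw, nop, nop": (11, 11),
}

# Cycles saved by using $0 for each sequence shape.
saved = {seq: normal - zero for seq, (normal, zero) in results.items()}
for seq, cycles in saved.items():
    print(f"{seq}: {cycles} cycles saved with $0")
```

Only the register-dependency stall (the move) disappears; the memory-access stalls are unchanged.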