I used to work on Rock. I'm getting around to re-reading some of the papers on it.
SST hardware dynamically extracts two threads of execution from a single sequential program. SST uses an "efficient" checkpointing mechanism to eliminate the need for renaming logic, reorder buffer, memory disambiguation, issue windows, etc.
SST uses a traditional multithreaded pipeline with an additional mechanism to checkpoint the register file.
SST implements two hardware threads (ahead and behind). The ahead thread speculatively executes under a cache miss and speculatively retires instructions out of order. The behind thread executes the instructions that depend on the cache miss.
There are N checkpoints per core (N = 2). There are also N deferred queues (DQs) that hold decoded instructions, along with their available operand values, that could not be executed because of a cache miss or another long-latency instruction.
Also, there are N speculative register files and a second working register file (WRF). Each of these registers has an NA bit, which is set if the value is "Not Available".
Every instruction issued is classified as either "deferrable" or "retirable". An instruction is deferrable if it is a long-latency instruction (cache miss, etc.) or if at least one of its operands is NA. Otherwise it is retirable.
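To make the classification rule concrete, here is a minimal Python sketch. The `Instr` record, the `classify` function and the dictionary of NA bits are my own illustrative names, not anything from the papers or the hardware. It only models the rule as stated above.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    # Hypothetical instruction record, for illustration only.
    dest: str                   # destination register name
    srcs: tuple = ()            # source register names
    long_latency: bool = False  # e.g. a load that misses the cache


def classify(instr, na_bits):
    """Return 'deferrable' or 'retirable' per the rule described above.

    na_bits maps a register name to True when its NA bit is set.
    """
    has_na_operand = any(na_bits.get(reg, False) for reg in instr.srcs)
    if instr.long_latency or has_na_operand:
        return "deferrable"
    return "retirable"


# r1 is waiting on an outstanding miss, so its NA bit is set.
na_bits = {"r1": True}
load_miss = Instr(dest="r1", long_latency=True)
dependent = Instr(dest="r2", srcs=("r1",))
independent = Instr(dest="r3", srcs=("r4",))
```

With these inputs, `load_miss` and `dependent` come out as deferrable, while `independent` is retirable because none of its operands is NA.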
The core starts execution in a nonspeculative phase. All instructions retire in order, and the ARF and WRF are used. The DQs and speculative register files are not used.
When the first deferrable instruction is encountered, the core takes a checkpoint (committed checkpoint) and starts a speculative phase.
The deferrable instruction is placed in a DQ and its destination register is marked as NA.
Subsequent retirable instructions are executed and speculatively retired.
The retirable instructions write their results to the WRF and a speculative register file, and they clear the NA bits of their destination registers.
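The transition from the nonspeculative phase into the speculative phase can be sketched as a toy model. This is only my simplification of the steps above: `ToyCore`, `issue` and the placeholder values are made-up names, and there is a single DQ and a single register dictionary standing in for the WRF. It does not model the behind thread, operand capture in the DQ, or the separate speculative register files.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    # Hypothetical instruction record, for illustration only.
    name: str
    dest: str
    srcs: tuple = ()
    long_latency: bool = False


class ToyCore:
    def __init__(self):
        self.regs = {}           # stands in for the working register file
        self.na = set()          # registers whose NA bit is set
        self.dq = []             # a single deferred queue
        self.checkpoint = None   # committed checkpoint (register snapshot)
        self.speculative = False

    def issue(self, instr):
        deferrable = instr.long_latency or any(s in self.na for s in instr.srcs)
        if deferrable:
            if not self.speculative:
                # First deferrable instruction: checkpoint, go speculative.
                self.checkpoint = dict(self.regs)
                self.speculative = True
            self.dq.append(instr)
            self.na.add(instr.dest)      # destination becomes NA
            return "deferred"
        # Retirable: write a placeholder result and clear the NA bit.
        self.regs[instr.dest] = f"value_of({instr.name})"
        self.na.discard(instr.dest)
        return "speculatively retired" if self.speculative else "retired"


core = ToyCore()
program = [
    Instr("add0", "r1"),
    Instr("load", "r2", long_latency=True),   # cache miss
    Instr("add1", "r3", srcs=("r2",)),        # depends on the miss
    Instr("add2", "r4", srcs=("r1",)),        # independent of the miss
    Instr("mov", "r2", srcs=("r1",)),         # overwrites r2, clears its NA bit
]
results = [core.issue(i) for i in program]
```

In this run the load triggers the checkpoint, the load and `add1` end up in the DQ, and `add2` and `mov` are speculatively retired past them. Only `r3` is still marked NA at the end, because `mov` wrote a fresh value into `r2`.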
[..]