Link to tech report
The Alpha 21264 has 15 FO4 delays per clock cycle. (An FO4 delay is the delay of an inverter driven by an inverter 4x smaller than itself and driving an inverter 4x bigger than itself.) BOOMv2 is 35 FO4.
BOOMv1 follows the 6-stage pipeline structure of the MIPS R10K: fetch, decode/rename, issue/register-read, execute, memory, and writeback.
Frontend fetches instructions for execution in the backend.
Fetching requires branch prediction techniques. The BTB maintains a set of tables mapping instruction addresses (PCs) to branch targets. The lookup address is checked against the BTB for a tag hit. On a hit, the BTB makes a prediction for the frontend. Hysteresis bits (T/NT) help guide the decision.
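A minimal behavioral sketch of the BTB lookup described above, in Python rather than Chisel. It assumes a direct-mapped table, 4-byte instructions, and a 2-bit hysteresis counter per entry; the names `DirectMappedBTB` and `BTBEntry` are made up for illustration and are not BOOM's.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BTBEntry:
    tag: int
    target: int
    counter: int = 2  # 2-bit hysteresis counter (0..3); new entries start weakly taken


class DirectMappedBTB:
    """Hypothetical direct-mapped BTB: PC -> predicted target on a tag hit."""

    def __init__(self, num_sets: int = 64) -> None:
        assert num_sets > 0 and num_sets & (num_sets - 1) == 0, "power of two"
        self.num_sets = num_sets
        self.index_bits = num_sets.bit_length() - 1
        self.entries: List[Optional[BTBEntry]] = [None] * num_sets

    def _split(self, pc: int) -> Tuple[int, int]:
        word = pc >> 2  # assumption: 4-byte aligned instructions
        return word & (self.num_sets - 1), word >> self.index_bits

    def predict(self, pc: int) -> Optional[int]:
        """Return the predicted target, or None on a miss or a not-taken prediction."""
        idx, tag = self._split(pc)
        entry = self.entries[idx]
        if entry is None or entry.tag != tag:
            return None  # tag miss: the BTB makes no prediction
        return entry.target if entry.counter >= 2 else None

    def update(self, pc: int, target: int, taken: bool) -> None:
        """Train the entry with the resolved branch outcome."""
        idx, tag = self._split(pc)
        entry = self.entries[idx]
        if entry is None or entry.tag != tag:
            if taken:  # only allocate entries for taken branches
                self.entries[idx] = BTBEntry(tag, target)
            return
        if taken:
            entry.target = target
            entry.counter = min(3, entry.counter + 1)
        else:
            entry.counter = max(0, entry.counter - 1)
```

The hysteresis counter keeps a single not-taken outcome from evicting a useful target: the entry stays in the table and only the direction of the prediction changes.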
The return address stack (RAS) predicts function returns. Jump-register instructions are difficult to predict because the target depends on a register value. The underlying structure of the RAS is a stack.
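As a rough illustration of the stack behavior, here is a small Python model. The fixed depth and the drop-oldest overflow policy are assumptions made for this sketch, not something taken from the BOOM source.

```python
from typing import List, Optional


class ReturnAddressStack:
    """Hypothetical RAS model: push on call, pop on return."""

    def __init__(self, depth: int = 8) -> None:
        self.depth = depth
        self.stack: List[int] = []

    def push(self, return_addr: int) -> None:
        """Called when a function call is fetched; records the return address."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # assumption: on overflow, the oldest entry is lost
        self.stack.append(return_addr)

    def pop(self) -> Optional[int]:
        """Called when a return is fetched; yields the predicted target, if any."""
        return self.stack.pop() if self.stack else None
```

A call at PC 0x100 would push 0x104 (assuming a 4-byte call instruction), and the matching return would pop 0x104 as its predicted target.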
The conditional branch predictor (CBP) maintains a set of prediction and hysteresis tables based on the lookup address. It only makes taken/not-taken predictions. Sometimes known as a global history predictor.
The issue window holds all in-flight, un-executed uops. Each issue port selects one of the available ready uops to issue. The larger the window, the more instructions the scheduler can attempt to re-order.
The register file was manually crafted: a register bit was built out of the foundry-provided standard cells, and the placer was left to route the wires automatically.
2bc-table.scala
class TwobcCounterTable
Provides a two-bit counter table for use by branch predictors; it is only updated at commit. The table contains P (prediction) and H (hysteresis) bits. Each entry is implemented as a saturating counter with four states: strongly not-taken, weakly not-taken, weakly taken, strongly taken. Different implementations of the tables are provided: single-port or dual-port memory.
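The four-state saturating counter can be sketched as a small Python model. The encoding 0..3 with the upper bit as the prediction and the lower bit as hysteresis is a common convention and an assumption here; the class name `TwoBitCounterTable` and the PC indexing are illustrative only.

```python
from typing import List

# Assumed encoding: 0 = strongly not-taken, 1 = weakly not-taken,
#                   2 = weakly taken,       3 = strongly taken.


class TwoBitCounterTable:
    """Hypothetical table of 2-bit saturating counters indexed by PC."""

    def __init__(self, num_entries: int = 1024) -> None:
        self.counters: List[int] = [1] * num_entries  # start weakly not-taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) % len(self.counters)  # assumption: 4-byte instructions

    def predict(self, pc: int) -> bool:
        """The upper (P) bit of the counter is the taken/not-taken prediction."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Move one step toward the outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Once a counter reaches strongly taken, one not-taken outcome only moves it to weakly taken, so the prediction does not flip on a single loop exit.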
base-only.scala
class BaseOnlyBrPredictor extends BrPredictor
Implements the base class. Does nothing except exercise the base predictor (BIM).
bim.scala
class BimodalTable extends BoomModule
BIM is a table of 2-bit counters. Control logic is split across three pipeline stages:
- S0: receive address to predict on
- S1: perform lookup
- S2: return the read data
Implemented as nBanks banks. Each bank has an update queue and a write queue.
bpd-pipeline.scala
Accesses BTB and BPD to feed predictions to the Fetch Unit.
- F0: select the next PC
- F1: access I$ and BTB RAMs. Perform BPD hashing.
- F2: access BPD RAMs. Begin decoding instruction bits.
class BranchPredictionStage extends BoomModule
Contains BTB (BoomBTB <-- abstract class) and BPD (BrPredictor <-- abstract class). RAS appears to be disabled.
brpredictor.scala
Abstract class for branch predictors. Covers stages F0-F4.
btb-sa.scala
Parameterizable set-associative branch target buffer.
class BTBsa extends BoomBTB
Contains nWays ways. Each way has valid registers, a tag memory, and a data memory. Can be configured to include a RAS module. The implementation appears straightforward.
btb.scala
Abstract base class for BTB. Includes implementation of RAS (return address stack).
abstract class BoomBTB extends BoomModule
dense-btb.scala
Dense BTB with RAS and BIM predictor.
class DenseBTB extends BoomBTB
gshare.scala
Implementation of a gshare branch predictor.
class GShareBrPredictor extends BrPredictor
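For reference, the general gshare idea (XOR the PC with the global branch history to index a table of 2-bit counters) can be sketched as below. This is a generic textbook model under assumed parameters, not a transcription of GShareBrPredictor; the class name `GShare` and the history length are illustrative.

```python
from typing import List


class GShare:
    """Hypothetical gshare model: index = PC bits XOR global history."""

    def __init__(self, history_bits: int = 10) -> None:
        self.mask = (1 << history_bits) - 1
        self.history = 0  # global history, most recent outcome in bit 0
        self.counters: List[int] = [1] * (1 << history_bits)  # weakly not-taken

    def _index(self, pc: int) -> int:
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Train the counter used for this prediction, then shift in the outcome."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

Because the history is part of the index, a branch that alternates taken/not-taken ends up using two different counters, one per history pattern, which a plain PC-indexed table cannot do.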
tage-table.scala
TAGE table (?), used by the TAGE branch predictor.
https://comparch.net/2013/06/30/why-tage-is-the-best/
tage.scala
Implementation of a TAGE-based branch predictor.
class TageBrPredictor extends BrPredictor
configs.scala
Contains diplomacy configurations.
consts.scala
List of constants.
microop.scala
Not sure where these uops are used. More reading will help.
class MicroOp extends BoomBundle
class CtrlSignals extends BoomBundle
parameters.scala
Main configuration object.
tile.scala
Interface between Boom and RocketChip. Instantiates core, HellaCache, ICacheFrontEnd, etc.
core.scala
Conceptual pipeline stages:
if0 - next-pc select
if1 - I$ access
if2 - instruction return
if3 - enqueue to fetch buffer
if4 - redirect from BPD
dec - decode
ren - rename
dis - dispatch
iss - issue
rrd - register read
exe - execute
mem - memory
sxt - sign extend
wb - writeback
com - commit
class BoomCore extends BoomModule
List of instantiated modules:
- FpPipeline
- DecodeUnit (decodeWidth instances)
- BranchMaskGeneration Logic
- RenameStage
- new boom.exu.IssueUnits (implies there are multiple, but not expanded here?)
- RegisterFile (incl. behavioral version!)
- Arbiter(new ExeUnitResp) - not sure what for?
- RegisterRead (believe this is its pipeline stage)
- DCacheShim (what does Shim mean here?)
- LoadStoreUnit
- Rob
- CsrFile
Contains hardware performance events
decode.scala
ChrisC. explains on the riscv-hw mailing list why the implementation looks the way it does.
class DecodeUnit extends BoomModule
execute.scala
Execution units. The issue window schedules uops onto a specific execution pipeline.
abstract class ExecutionUnit
class ALUExeUnit extends ExecutionUnit (also FPUExeUnit, MemExeUnit)
fudecode.scala
Generates the functional unit control signals from the micro-op opcodes.
functional_unit.scala
abstract class FunctionalUnit
ALUUnit extends PipelinedFunctionalUnit
MemAddrCalcUnit extends PipelinedFunctionalUnit
FPUUnit extends PipelinedFunctionalUnit
PipelinedMulUnit extends PipelinedFunctionalUnit
--> uses class IMul (below)
imul.scala
class IMul extends Module
For iterative or unpipelined units. These can only hold a single uop at a time.
abstract class IterativeFunctionalUnit
issue.scala
abstract class IssueUnit
Contains a table of issue slots; each entry is an IssueSlot module.
issue_ageordered.scala
class IssueUnitCollapsing extends IssueUnit
Dispatch entry logic -> find a slot to store new dispatched uops into.
Issue select logic
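A behavioral sketch of what an age-ordered, collapsing issue queue does, under simplifying assumptions: slots are kept oldest-first, select picks the oldest ready uops, and younger entries shift down to fill the holes. The names `CollapsingIssueQueue`, `dispatch`, `wakeup`, and `issue` are hypothetical and do not mirror the IssueUnitCollapsing interface.

```python
from typing import Dict, List


class CollapsingIssueQueue:
    """Hypothetical age-ordered issue queue; index 0 is the oldest uop."""

    def __init__(self, num_slots: int = 4) -> None:
        self.num_slots = num_slots
        self.slots: List[Dict] = []

    def dispatch(self, uop: str, ready: bool = False) -> bool:
        """Dispatch entry logic: place a new uop in the youngest free slot."""
        if len(self.slots) == self.num_slots:
            return False  # queue full, dispatch must stall
        self.slots.append({"uop": uop, "ready": ready})
        return True

    def wakeup(self, uop: str) -> None:
        """Mark a waiting uop as ready (its operands have become available)."""
        for slot in self.slots:
            if slot["uop"] == uop:
                slot["ready"] = True

    def issue(self, width: int = 1) -> List[str]:
        """Issue select logic: pick up to `width` ready uops, oldest first."""
        issued: List[str] = []
        remaining: List[Dict] = []
        for slot in self.slots:
            if slot["ready"] and len(issued) < width:
                issued.append(slot["uop"])
            else:
                remaining.append(slot)
        self.slots = remaining  # collapse: younger entries move toward index 0
        return issued
```

Keeping the queue age-ordered makes oldest-first selection a simple priority scan, at the cost of shifting entries every time a hole opens.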
issue_slot.scala
class IssueSlot extends BoomModule
issue_unordered.scala
class IssueUnitStatic extends IssueUnit
Contains an issue table made up of a number of issue slots.
regfile_custom.scala
class Rf6r3wBitModel <- not synthesizable
regfile.scala
abstract class RegisterFile
class RegisterFileBehavioral
Highly configurable.
registerread.scala
Handles register read and the bypass network for the OoO backend. Interfaces with the issue window on the enqueue side and the execution pipelines on the dequeue side.
class RegisterRead extends BoomModule
rename-busytable.scala
class BusyTableHelper extends BoomModule
Implements a busy bit per register?
rename-freelist.scala
Which physical registers are free.
class RenameFreeListHelper extends BoomModule
class RenameFreeList extends BoomModule
List of unallocated physical? logical? registers
rename-maptable.scala
Map between logical and physical registers
rename.scala
Rename logic. Instantiates the map table, free list, and busy table for both GPR and FPR.
class RenameStage extends BoomModule
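To make the interplay of the three rename structures concrete, here is a small Python model of generic register renaming. It is a sketch under assumptions (one destination per instruction, no branch snapshots, no special handling of x0), and the names `Renamer`, `rename`, `writeback`, and `commit` are illustrative rather than BOOM's.

```python
from typing import List, Tuple


class Renamer:
    """Hypothetical rename model: map table + free list + busy table."""

    def __init__(self, num_logical: int = 32, num_physical: int = 64) -> None:
        # Map table: logical register i initially maps to physical register i.
        self.map_table: List[int] = list(range(num_logical))
        # Free list: physical registers not currently allocated.
        self.free_list: List[int] = list(range(num_logical, num_physical))
        # Busy table: one bit per physical register, set until the value is written.
        self.busy: List[bool] = [False] * num_physical

    def rename(self, dst: int, srcs: List[int]) -> Tuple[int, List[int], int]:
        """Return (new physical dst, physical sources, stale physical dst)."""
        if not self.free_list:
            raise RuntimeError("no free physical register: rename must stall")
        psrcs = [self.map_table[s] for s in srcs]  # read sources before remapping
        pdst = self.free_list.pop(0)
        stale = self.map_table[dst]
        self.map_table[dst] = pdst
        self.busy[pdst] = True
        return pdst, psrcs, stale

    def writeback(self, pdst: int) -> None:
        """The value has been produced; dependents may now issue."""
        self.busy[pdst] = False

    def commit(self, stale: int) -> None:
        """At commit, the previous mapping of dst can be reused."""
        self.free_list.append(stale)
```

Sources are looked up before the destination is remapped, so an instruction like `add x1, x1, x2` reads the old mapping of x1 and writes a fresh physical register.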
rob.scala
Reorder buffer. Each "dispatch" group gets its own row of the ROB, and each instruction in the dispatch group goes to a different bank. Entries are added to the ROB when an instruction is dispatched.
class Rob extends BoomModule
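The row/bank organization can be illustrated with a short Python model. The ROB index formula (row * width + bank) and the whole-row commit are simplifying assumptions for this sketch; `BankedRob` is a made-up name.

```python
from typing import List, Optional


class BankedRob:
    """Hypothetical ROB model: one row per dispatch group, one bank per slot."""

    def __init__(self, num_rows: int = 8, width: int = 2) -> None:
        self.num_rows = num_rows
        self.width = width
        self.rows: List[List[Optional[str]]] = [[None] * width for _ in range(num_rows)]
        self.head = 0   # oldest row, next to commit
        self.tail = 0   # next row to allocate
        self.count = 0  # number of occupied rows

    def dispatch(self, group: List[str]) -> Optional[List[int]]:
        """Allocate one row for the group; return each uop's ROB index, or None if full."""
        assert 0 < len(group) <= self.width
        if self.count == self.num_rows:
            return None
        row = self.tail
        for bank, uop in enumerate(group):
            self.rows[row][bank] = uop
        self.tail = (self.tail + 1) % self.num_rows
        self.count += 1
        return [row * self.width + bank for bank in range(len(group))]

    def commit(self) -> List[str]:
        """Retire the oldest row (simplification: the whole row commits at once)."""
        if self.count == 0:
            return []
        uops = [u for u in self.rows[self.head] if u is not None]
        self.rows[self.head] = [None] * self.width
        self.head = (self.head + 1) % self.num_rows
        self.count -= 1
        return uops
```

A partially filled dispatch group still consumes a full row, which trades some ROB capacity for simpler allocation and indexing.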
branchchecker.scala
Verifies the BTB prediction. Catches JAL..... Used by the fetch unit.
class BranchChecker extends BoomModule
fetch.scala
- F0: next PC select
- F1: I$ access
- F2: I$ response/predecode
- F3: branch-check/verification/redirect
- F4: take redirect
class FetchControlUnit extends BoomModule
fetchbuffer.scala
Takes a FetchBundle and converts it into a vector of MicroOps.
lsu.scala
Contains the load address queue (LAQ), store address queue (SAQ), and store data queue (SDQ).
Stores are sent to memory at commit; loads are executed ASAP. If misspeculation is discovered, the pipeline is cleared. Loads that were "put to sleep" are retried. A load can receive its data by forwarding from the store data queue.
Loads are sent to memory ASAP. In parallel, the SAQ is searched associatively. On a hit in the SAQ, the memory request is killed on the next cycle.
If the store data is not present, is the load put "to sleep" in the LAQ?
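The load path described above can be sketched as a single function. This is a simplified model that assumes all older store addresses are already known and that addresses match exactly (no partial overlaps); the function name `execute_load` and the dictionary fields are hypothetical. It takes the "sleep" reading of the question in the notes as an assumption.

```python
from typing import Dict, List, Optional, Tuple


def execute_load(
    addr: int,
    older_stores: List[Dict],
    memory: Dict[int, int],
) -> Tuple[str, Optional[int]]:
    """Decide how a load gets its data.

    older_stores is in program order (oldest first); each entry has an
    "addr" (SAQ) and "data" (SDQ, None if the data is not yet available).
    """
    # Associative search of the SAQ, youngest older store first.
    for store in reversed(older_stores):
        if store["addr"] == addr:
            if store["data"] is None:
                # Assumption: the load sleeps and is retried later.
                return ("sleep", None)
            # SAQ hit with data present: forward it and kill the memory request.
            return ("forward", store["data"])
    # No matching older store: the memory request proceeds.
    return ("memory", memory.get(addr, 0))
```

Searching from the youngest older store backwards matters when several stores target the same address: the load must see the most recent value in program order.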