
@mbitsnbites
Last active September 14, 2018 14:36

Idea

Minimize the logic in each pipeline stage to minimize design complexity and maximize clock speed (roughly the same approach as MIPS, but more extreme).

  • No speculative branches.
    • Use branch delay slots.
  • No operand forwarding.
    • All instructions have the same latency.
    • I.e. every instruction has trailing "delay slots": the instructions that immediately follow it cannot yet use its result.
  • No data hazard resolution.
    • Exception: cache misses (if applicable) stall the entire pipeline.
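Because there is no forwarding and no interlock, software has to keep dependent instructions far enough apart. Below is a minimal sketch of that job. It assumes a hypothetical `Instr` record (destination plus source registers) and a `delay_slots` parameter for the fixed latency, and it conservatively pads with NOPs so that no instruction ever reads a register that is still in flight.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: assembly text, optional destination, source registers."""
    text: str
    dst: Optional[int] = None
    srcs: Tuple[int, ...] = ()


NOP = Instr("nop")


def pad_for_latency(prog: List[Instr], delay_slots: int) -> List[Instr]:
    """Insert NOPs so that no instruction reads a register written by any of the
    `delay_slots` instructions directly before it (conservative: never relies
    on reading the old value of a register that is still in flight)."""
    out: List[Instr] = []
    for ins in prog:
        while True:
            window = out[-delay_slots:] if delay_slots > 0 else []
            hazard = any(w.dst is not None and w.dst in ins.srcs for w in window)
            if not hazard:
                break
            out.append(NOP)  # no forwarding, no interlock: software must wait
        out.append(ins)
    return out


prog = [
    Instr("add r1, r2, r3", dst=1, srcs=(2, 3)),
    Instr("add r4, r1, r1", dst=4, srcs=(1, 1)),  # depends on r1
    Instr("sub r5, r2, r3", dst=5, srcs=(2, 3)),  # independent
]
for ins in pad_for_latency(prog, delay_slots=2):
    print(ins.text)  # add r1, nop, nop, add r4, sub r5
```

A real scheduler would fill the slots with independent work (here, moving `sub r5` up between `add r1` and `add r4`) rather than NOPs. That is where the trailing delay slots stop costing throughput.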

Possible optimization to reduce the number of delay slots: partition the execution part of the pipeline into several pipelines, each with its own register file (e.g. integer + float + fixed point). This would require a straightforward way to transfer data between register files.

Pipeline

       Branch                Write back
   _______________    _______________________
  /               \  /                       \
 v                 \v                         \
 PC -> IF -> ID -> RF -> EX1 -> ... -> EXn -> WB
       ^                         ^
       |                         v
    ICache                     DCache
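The number of delay slots follows from where things happen in the diagram. Branches leave from RF and results are written back in WB. The sketch below counts stage distances under some assumptions: a branch resolved in RF is fetched from its target in the next cycle, operands are read in RF, and the register file optionally writes early and reads late in the same cycle. The function names are illustrative only.

```python
def pipeline_stages(n_ex: int) -> list:
    """Stage names from the diagram (after the PC register), with n_ex execute stages."""
    return ["IF", "ID", "RF"] + [f"EX{i}" for i in range(1, n_ex + 1)] + ["WB"]


def branch_delay_slots(stages: list, resolve_stage: str = "RF") -> int:
    """Instructions already fetched behind a branch when it resolves: they sit in
    the stages between IF and `resolve_stage` and execute regardless."""
    return stages.index(resolve_stage) - stages.index("IF")


def data_delay_slots(stages: list, write_before_read: bool = True) -> int:
    """Instructions needed between a producer and a consumer when results are
    written in WB and operands are read in RF, with no forwarding.
    write_before_read: the register file writes early and reads late in a cycle,
    so a consumer in RF sees the value written by a producer in WB that cycle."""
    gap = stages.index("WB") - stages.index("RF")
    return gap - 1 if write_before_read else gap


stages = pipeline_stages(n_ex=3)
print(branch_delay_slots(stages))       # 2
print(data_delay_slots(stages))         # 3
print(data_delay_slots(stages, False))  # 4
```

Under these assumptions the branch delay stays at 2 regardless of n, while the data delay grows with the number of execute stages (n slots, or n + 1 without same-cycle write-then-read).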

Branch - 2 delay slots:

  • BN: branch to PC+immediate if register is negative
  • BP: branch to PC+immediate if register is positive
  • BA: branch always to PC+immediate
  • J: jump always to register address
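To make the two delay slots concrete, here is a small architectural interpreter. The op spellings, the tuple encoding and the `addi` helper op are hypothetical. Branch offsets are assumed to be relative to the branch's own address and counted in instructions. The model keeps a queue of already-fetched PCs, so the two instructions behind a branch always run, and a taken branch only changes what is fetched next. Data latency is ignored here.

```python
from collections import deque

DELAY_SLOTS = 2  # branches resolve two instructions behind the fetch point


def run(prog, regs, max_steps=1000):
    """Execute `prog` (list of tuples) with branch delay slots; return executed PCs.
    Hypothetical ops: addi rd, ra, imm | bn ra, imm | bp ra, imm | ba imm | j ra."""
    fetch = deque(range(DELAY_SLOTS + 1))  # current instruction + ones already fetched
    trace = []
    while 0 <= fetch[0] < len(prog) and len(trace) < max_steps:
        pc = fetch.popleft()
        op, *args = prog[pc]
        trace.append(pc)
        target = None
        if op == "addi":
            rd, ra, imm = args
            regs[rd] = regs[ra] + imm
        elif op == "bn":
            ra, imm = args
            if regs[ra] < 0:
                target = pc + imm
        elif op == "bp":
            ra, imm = args
            if regs[ra] > 0:
                target = pc + imm
        elif op == "ba":
            (imm,) = args
            target = pc + imm
        elif op == "j":
            (ra,) = args
            target = regs[ra]
        else:
            raise ValueError(f"unknown op {op!r}")
        # Queued instructions (the delay slots) run regardless; a taken branch
        # only redirects the next fetch.
        fetch.append(target if target is not None else fetch[-1] + 1)
    return trace


prog = [
    ("ba", 4),             # 0: branch to 0 + 4
    ("addi", 1, 1, 1),     # 1: delay slot 1, always executes
    ("addi", 1, 1, 10),    # 2: delay slot 2, always executes
    ("addi", 1, 1, 100),   # 3: skipped
    ("addi", 2, 1, 0),     # 4: branch target
]
regs = [0] * 16
print(run(prog, regs), regs[1], regs[2])  # [0, 1, 2, 4] 11 11
```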

Compare: set all bits of the destination register to 1 if the condition is true, or to 0 if it is false:

  • SEQ, SNE, SLT, SLTU, etc.
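A sketch of the all-ones/all-zeros compare semantics, assuming 32-bit registers (the notes leave the width open; `WIDTH` is a placeholder). Under two's complement an all-ones result reads as -1, so a BN on a compare result branches on "true". The mask form also allows a branch-free select with AND/OR, shown here as a hypothetical `select` helper.

```python
WIDTH = 32                   # assumed register width
MASK = (1 << WIDTH) - 1      # all bits set


def to_signed(x: int) -> int:
    """Interpret the low WIDTH bits of x as a two's complement value."""
    x &= MASK
    return x - (1 << WIDTH) if x >> (WIDTH - 1) else x


def flag(cond: bool) -> int:
    return MASK if cond else 0   # true -> all ones, false -> all zeros


def seq(a, b):  return flag((a & MASK) == (b & MASK))
def sne(a, b):  return flag((a & MASK) != (b & MASK))
def slt(a, b):  return flag(to_signed(a) < to_signed(b))   # signed compare
def sltu(a, b): return flag((a & MASK) < (b & MASK))       # unsigned compare


def select(mask: int, a: int, b: int) -> int:
    """Branch-free select: bits of a where mask is 1, bits of b where mask is 0."""
    return ((mask & a) | (~mask & b)) & MASK


print(hex(slt(0xFFFFFFFF, 1)))  # 0xffffffff: -1 < 1 as signed
print(sltu(0xFFFFFFFF, 1))      # 0: 0xFFFFFFFF > 1 as unsigned
print(select(slt(3, 7), 3, 7))  # 3 (branch-free minimum)
```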

Conditionals:

  • Conditional write-back is simple to implement.
  • E.g. "discard result of next instruction if not true".
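One way to picture the "discard result of next instruction" idea is shown below. The following instruction flows through the pipeline as usual, and only its write-back enable depends on the predicate. The sketch models a hypothetical `if rc` instruction on straight-line code, treating any non-zero register as true (compare results are all ones or all zeros). The op names and tuple encoding are assumptions.

```python
def run_predicated(prog, regs):
    """Straight-line model of a hypothetical `if rc` instruction: the next
    instruction executes normally, but its write-back is discarded unless
    regs[rc] is non-zero (compare results are all ones or all zeros)."""
    commit_next = True
    for op, *args in prog:
        if op == "if":
            (rc,) = args
            commit_next = regs[rc] != 0
            continue
        rd, ra, rb = args
        if op == "add":
            result = regs[ra] + regs[rb]
        elif op == "sub":
            result = regs[ra] - regs[rb]
        else:
            raise ValueError(f"unknown op {op!r}")
        if commit_next:        # WB stage: write the result or drop it
            regs[rd] = result
        commit_next = True     # the predicate covers only one instruction
    return regs


regs = [0] * 16
regs[1], regs[2] = 5, 3
regs[3], regs[4] = 0, -1      # false / true (all bits set)
run_predicated([("if", 3), ("add", 5, 1, 2),
                ("if", 4), ("sub", 6, 1, 2)], regs)
print(regs[5], regs[6])       # 0 2
```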

Registers:

  • 16 GP registers (per pipeline, e.g. integer + float?).
  • Possibly only SIMD registers?
  • Size? 32/64/more bits?

Instruction encoding

Use fixed-size 32-bit instruction words.

Pros:

  • Makes it easier to keep a constant stream of instructions (one per clock).

Cons:

  • Loading 32-bit or 64-bit immediate values is cumbersome without proper operand forwarding.
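To illustrate why wide immediates are awkward, consider a hypothetical two-instruction sequence in the style of MIPS `lui` + `addiu`: first load `hi << 16`, then add a sign-extended 16-bit `lo`. Because `lo` is sign-extended, `hi` must be bumped by one whenever bit 15 of `lo` is set. The second instruction also reads the register the first one writes, so with no forwarding it has to be placed after the data delay slots. The helper names below are illustrative.

```python
def sext16(x: int) -> int:
    """Sign-extend a 16-bit immediate."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def split_imm32(value: int):
    """Split a 32-bit constant into (hi, lo) for: rd = hi << 16; rd = rd + sext16(lo).
    hi is incremented when lo's top bit is set, to cancel the sign extension."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFFF
    hi = ((value >> 16) + (1 if lo & 0x8000 else 0)) & 0xFFFF
    return hi, lo


def rebuild(hi: int, lo: int) -> int:
    """What the two instructions compute, truncated to 32 bits."""
    return ((hi << 16) + sext16(lo)) & 0xFFFFFFFF


hi, lo = split_imm32(0xDEADBEEF)
print(hex(hi), hex(lo), hex(rebuild(hi, lo)))  # 0xdeae 0xbeef 0xdeadbeef
```

A 64-bit constant needs several such steps, each one waiting on the previous result.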

Suggestion:

 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+-----------+---+-------+-------+-------+-----------------------+
|Op         |1 1| Rd    | Ra    | Rb    | ? (shift/mask/func?)  | <- ALU
+-----------+---+-------+-------+-------+-----------------------+

+-----------+---+-------+-------+-------------------------------+
|Op         |1 0| Rd    | Ra    | Imm16                         | <- Load+ALU
+-----------+---+-------+-------+-------------------------------+

+-----------+---+-------+-------+-------+-----------------------+
|Op         |0 1| Imm4  | Ra    | Rb    | Imm12                 | <- Store
+-----------+---+-------+-------+-------+-----------------------+

+-----------+---+-------+-------+-------------------------------+
|Op         |0 0| Imm4  | Ra    | Imm16                         | <- Branch
+-----------+---+-------+-------+-------------------------------+
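The field boundaries in the tables can be captured in a small encoder/decoder. Counting the box widths gives Op = bits 31..26 (the 6 bits left over), format tag = bits 25..24, then 4-bit register/Imm4 fields and a 12- or 16-bit low field. The field names are made up for this sketch. Fields are kept raw: how Imm4 combines with Imm12 or Imm16 (e.g. as upper offset bits) is not specified above and is left open.

```python
# Hypothetical field table for the four suggested formats.
# Each field is (name, msb, width); Op is bits 31..26, the format tag bits 25..24.
FORMATS = {
    0b11: ("ALU",      [("rd", 23, 4), ("ra", 19, 4), ("rb", 15, 4), ("func", 11, 12)]),
    0b10: ("Load+ALU", [("rd", 23, 4), ("ra", 19, 4), ("imm16", 15, 16)]),
    0b01: ("Store",    [("imm4", 23, 4), ("ra", 19, 4), ("rb", 15, 4), ("imm12", 11, 12)]),
    0b00: ("Branch",   [("imm4", 23, 4), ("ra", 19, 4), ("imm16", 15, 16)]),
}


def encode(op: int, fmt: int, **fields: int) -> int:
    """Pack a 32-bit instruction word; raise ValueError if a field does not fit."""
    if not 0 <= op < 64:
        raise ValueError(f"op={op} does not fit in 6 bits")
    name, layout = FORMATS[fmt]
    word = (op << 26) | (fmt << 24)
    for fname, msb, width in layout:
        value = fields.pop(fname)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{fname}={value} does not fit in {width} bits")
        word |= value << (msb - width + 1)
    if fields:
        raise ValueError(f"unexpected fields for {name}: {sorted(fields)}")
    return word


def decode(word: int) -> dict:
    """Unpack a 32-bit instruction word into its raw (unsigned) fields."""
    fmt = (word >> 24) & 0b11
    name, layout = FORMATS[fmt]
    out = {"op": (word >> 26) & 0x3F, "format": name}
    for fname, msb, width in layout:
        out[fname] = (word >> (msb - width + 1)) & ((1 << width) - 1)
    return out


w = encode(0x3F, 0b00, imm4=0xF, ra=0xA, imm16=0xBEEF)
print(hex(w))     # 0xfcfabeef
print(decode(w))  # {'op': 63, 'format': 'Branch', 'imm4': 15, 'ra': 10, 'imm16': 48879}
```

Keeping Ra at bits 19..16 in every format means the register-read stage can start reading it before the format is decoded. The same would hold for Rb in the formats that have it.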

Also consider VLIW.
