Created May 7, 2013 17:18
Instruction dispatch variants
Okay, here are the different op splitting/fusing strategies for different cores, as far as I've been able to discern them:
Pentium: Complex instructions are U-pipe only but execute directly, they don't get split.
Atom (Bonnell/Saltwell): Certain complex instructions don't get split.
Pentium Pro/2/3: All ops get split into, tracked as, and executed as uOps.
Pentium 4: This never happened.
Pentium M/Core:
All ops get split into uOps.
Post-split, the core can fuse two types of multi-uOp sequences into a larger fused op used for tracking:
- For stores, address generation + actual store uOps can get fused.
- Read-modify (but not read-modify-write) fusion, aka "load-op" fusion.
This is what Intel calls "micro-op fusion". The fused uOps are what's used in the scheduler/ROB etc.
The complex ops are split into uOps for the purposes of execution, but the "accounting" in the core
is all in terms of fused ops.
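As a rough illustration of the split-then-fuse accounting described above, here is a toy Python model. The instruction shapes, the uOp names (`load`, `alu`, `sta`, `std`) and the `split`/`fuse`/`counts` helpers are all made up for this sketch and don't correspond to any real decoder tables. Read-modify-write is deliberately left out.

```python
# Toy model of micro-op fusion; every name here is an illustrative assumption.

# Hypothetical decode table: instruction shape -> uOps it splits into.
SPLIT_TABLE = {
    "reg_op":  ["alu"],          # e.g. add eax, ebx
    "load_op": ["load", "alu"],  # e.g. add eax, [mem]
    "store":   ["sta", "std"],   # e.g. mov [mem], eax (address + data)
}

# Adjacent uOp pairs that this model allows to fuse.
FUSIBLE_PAIRS = {("load", "alu"), ("sta", "std")}


def split(shape):
    """Split one instruction shape into its list of uOps."""
    return list(SPLIT_TABLE[shape])


def fuse(uops):
    """Greedily pair adjacent fusible uOps; each tuple is one tracked entry."""
    tracked = []
    i = 0
    while i < len(uops):
        if i + 1 < len(uops) and (uops[i], uops[i + 1]) in FUSIBLE_PAIRS:
            tracked.append((uops[i], uops[i + 1]))
            i += 2
        else:
            tracked.append((uops[i],))
            i += 1
    return tracked


def counts(shape):
    """Return (tracked entries, executed uOps) for one instruction shape."""
    uops = split(shape)
    return len(fuse(uops)), len(uops)


if __name__ == "__main__":
    for shape in SPLIT_TABLE:
        print(shape, counts(shape))
```

In this model a load-op or store instruction occupies one tracked entry but still executes as two uOps, which is the distinction between "accounting" and execution made above.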
Core2 and later:
Like Core, plus "macro-op fusion": Certain compare/arithmetic instructions followed by a conditional branch
(e.g. cmp or test followed by jcc) can get fused into a single arithmetic-then-branch op.
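A minimal sketch of the pairing idea, in Python. The set of fusible first instructions (`cmp`, `test`) is an assumption for illustration; the real set and the extra restrictions differ between microarchitectures. `macro_fuse` and `is_cond_branch` are hypothetical helper names.

```python
# Toy model of macro-op fusion over a list of mnemonics.
# FUSIBLE_FIRST is an assumption; real cores have their own rules.
FUSIBLE_FIRST = {"cmp", "test"}


def is_cond_branch(mnemonic):
    """Treat any j* mnemonic except the unconditional jmp as conditional."""
    return mnemonic.startswith("j") and mnemonic != "jmp"


def macro_fuse(mnemonics):
    """Merge each fusible instruction with a directly following conditional branch."""
    fused = []
    i = 0
    while i < len(mnemonics):
        cur = mnemonics[i]
        nxt = mnemonics[i + 1] if i + 1 < len(mnemonics) else None
        if cur in FUSIBLE_FIRST and nxt is not None and is_cond_branch(nxt):
            fused.append(cur + "+" + nxt)  # one op instead of two
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused


if __name__ == "__main__":
    print(macro_fuse(["mov", "cmp", "jne", "add", "jmp"]))
```

Unlike micro-op fusion, this merges two separate instructions, so it has to happen before or during decode rather than after the split.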
Original Athlon (K7):
Ops get decoded into "macro-ops", not uOps. Macro-ops can contain references to >2 source registers and memory references.
Any instruction that generates more than one macro-op is microcoded.
The macro-ops are then used for scheduling and dependency tracking. They are split into uOps right before
execution, just like in the Pentium M.
The difference from Pentium M is that AMD decodes directly into macro-ops while Intel first decodes to uOps then fuses.
Both Intel and AMD can thus treat the instruction "add eax, [mem]" as a single op for the purpose of scheduling, but
they arrive there in different ways.
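The two routes can be sketched side by side in Python. This is only a model of the description above, not of either vendor's hardware: `TrackedOp`, `pentium_m_style` and `k7_style` are hypothetical names, and `tmp` stands in for an internal temporary.

```python
from dataclasses import dataclass
from typing import List, Tuple

Uop = Tuple[str, str, str]  # (operation, destination, source)


@dataclass(frozen=True)
class TrackedOp:
    """One scheduler entry for a load-op instruction in this toy model."""
    operation: str
    dst: str
    mem_src: str

    def execution_uops(self) -> List[Uop]:
        # Split into simple uOps only when it is time to execute.
        return [("load", "tmp", self.mem_src),
                (self.operation, self.dst, "tmp")]


def pentium_m_style(operation: str, dst: str, mem_src: str) -> TrackedOp:
    """Decode to uOps first, then fuse the load with its consumer."""
    load: Uop = ("load", "tmp", mem_src)
    alu: Uop = (operation, dst, "tmp")
    assert load[1] == alu[2]  # the ALU uOp consumes the loaded value
    return TrackedOp(operation=alu[0], dst=alu[1], mem_src=load[2])


def k7_style(operation: str, dst: str, mem_src: str) -> TrackedOp:
    """Decode straight into a single macro-op with a memory operand."""
    return TrackedOp(operation=operation, dst=dst, mem_src=mem_src)


if __name__ == "__main__":
    a = pentium_m_style("add", "eax", "mem")
    b = k7_style("add", "eax", "mem")
    print(a == b, a.execution_uops())
```

Both functions end up with the same tracked entry for "add eax, [mem]", and both entries expand to the same two uOps at execution.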
Athlon64 (K8) and later:
Some instructions can now generate two macro-ops without going through the microcode path. Anything above 2 macro-ops
is still microcoded. The rest is fairly similar.
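The K7 and K8 rules as stated in these notes boil down to a threshold on the macro-op count. A small Python sketch, where `decode_path` and the labels `"direct"` and `"microcode"` are made-up names for illustration:

```python
# Maximum macro-ops an instruction may generate without going through
# microcode, per the notes above (K7: 1, K8: 2).
DIRECT_LIMIT = {"K7": 1, "K8": 2}


def decode_path(core, macro_ops):
    """Return "direct" or "microcode" for an instruction on the given core."""
    if macro_ops < 1:
        raise ValueError("an instruction generates at least one macro-op")
    return "direct" if macro_ops <= DIRECT_LIMIT[core] else "microcode"


if __name__ == "__main__":
    for core in ("K7", "K8"):
        print(core, [decode_path(core, n) for n in (1, 2, 3)])
```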
Atom (Silvermont):
From a decode/dispatch standpoint, this seems very similar to what the K7 did as far as I can tell.