- little endian vs big endian
- address space
- the clock is a periodic electrical signal that drives the processor; the clock period is the length of one cycle
- single-cycle: CPI of 1 (one cycle per instruction) with a long clock period
- if one instruction takes 10ns, another 20ns, and another 25ns, the clock period must be 25ns (the slowest instruction sets it)
- lw usually determines the clock period since it uses every step (IF, ID, EX, MEM, WB)
- single-cycle needs more hardware components (eg. a separate adder alongside the ALU) since a component cannot be reused within the same cycle
- eg. 5ns register file r/w, 10ns ALU, 15ns instruction mem. read, 20ns data memory read, 40ns data memory write, 2ns adder (incr. PC), 0ns other
- the PC can be incremented in parallel with the instruction read, so the 2ns adder latency does not affect the clock period
- jalr - IF (15ns) + RF (5ns) + RF (5ns) = 25ns
- lw - IF (15ns) + RF (5ns) + ALU (10ns) + MEM_R (20ns) + RF (5ns) = 55ns
- sw - IF (15ns) + RF (5ns) + ALU (10ns) + MEM_W (40ns) = 70ns
- add/nand - IF (15ns) + RF (5ns) + ALU (10ns) + RF (5ns) = 35ns
- beq - IF (15ns) + RF (5ns) + ALU (10ns) = 30ns
- noop/halt - IF (15ns) = 15ns
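
The per-instruction latencies above can be checked with a short script. This is a minimal sketch: the names `LATENCY` and `PATHS` are made up for illustration and simply transcribe the figures in these notes, and the PC adder is left out because it overlaps with the instruction read.

```python
# Component latencies in ns, copied from the example above.
LATENCY = {"IF": 15, "RF": 5, "ALU": 10, "MEM_R": 20, "MEM_W": 40}

# Components each instruction passes through, in order. RF appears twice
# when the register file is both read and written.
PATHS = {
    "jalr": ["IF", "RF", "RF"],
    "lw": ["IF", "RF", "ALU", "MEM_R", "RF"],
    "sw": ["IF", "RF", "ALU", "MEM_W"],
    "add/nand": ["IF", "RF", "ALU", "RF"],
    "beq": ["IF", "RF", "ALU"],
    "noop/halt": ["IF"],
}


def instruction_latency(name):
    """Sum the latencies of every component the instruction uses."""
    return sum(LATENCY[component] for component in PATHS[name])


latencies = {name: instruction_latency(name) for name in PATHS}

# A single-cycle clock must be long enough for the slowest instruction.
clock_period = max(latencies.values())
slowest = max(latencies, key=latencies.get)

for name, ns in latencies.items():
    print(f"{name}: {ns}ns")
print(f"single-cycle clock period: {clock_period}ns (set by {slowest})")
```

With these particular numbers sw (70ns) rather than lw (55ns) sets the clock period, because the data memory write is twice as slow as the read.
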
- multi-cycle divides an instruction into up to five steps (IF, ID, EX, MEM, WB), one step per cycle
- noop/halt use IF/ID, beq/sw use IF/ID/EX/MEM, add/nand use IF/ID/EX/WB, lw uses IF/ID/EX/MEM/WB
- eg. 2000 instructions with 40% add/nand, 25% beq, 30% lw, 5% sw
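
The instruction mix above can be turned into a cycle count using the number of steps each instruction needs. This is a sketch, not a worked exam answer: the names are illustrative, and the 40ns clock period is an assumption (the slowest single step if the earlier component latencies are reused), not something these notes state.

```python
# Steps (cycles) per instruction class, from the list above.
CYCLES = {"add/nand": 4, "beq": 4, "lw": 5, "sw": 4}

# Instruction mix in percent, from the example above.
MIX_PERCENT = {"add/nand": 40, "beq": 25, "lw": 30, "sw": 5}


def total_cycles(n_instructions, mix_percent=MIX_PERCENT, cycles=CYCLES):
    """Total cycles on a multi-cycle datapath for the given mix."""
    total = 0
    for name, pct in mix_percent.items():
        count = n_instructions * pct // 100  # divides evenly for this example
        total += count * cycles[name]
    return total


n = 2000
total = total_cycles(n)
cpi = total / n
print(f"total cycles: {total}")
print(f"average CPI: {cpi}")

# Assumption: the clock period equals the slowest single step
# (a 40ns data memory write). Change this if the datapath differs.
ASSUMED_PERIOD_NS = 40
print(f"execution time: {total * ASSUMED_PERIOD_NS}ns")
```

The mix gives 8600 cycles, an average CPI of 4.3. Under the assumed 40ns period that is 344000ns.
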
- methods to fix data hazards: avoid (don't write code with hazards), detect and stall (insert two noops), detect and forward (grab the value from a later stage, eg. ALU or MEM)
- the address of an lw is calculated in the ALU, but the data is only available after MEM, so a noop still needs to be inserted for a lw hazard even with forwarding
- forwarding needs to check the instructions ahead in the pipeline for a hazard (prioritize the most recent instruction if there are multiple hazards)
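
The priority rule for forwarding can be sketched as a lookup over the instructions still in the pipeline. The function name, the stage labels and the tuple layout are hypothetical, chosen only for illustration; the sketch also ignores the extra noop an lw needs.

```python
def forward_source(src_reg, in_flight):
    """Return the stage to forward src_reg from, or None to use the register file.

    in_flight lists (stage_name, dest_reg) for the instructions ahead in the
    pipeline, ordered from most recent to oldest. dest_reg is None for
    instructions that write no register (eg. beq, sw, noop).
    """
    for stage, dest_reg in in_flight:
        if dest_reg is not None and dest_reg == src_reg:
            return stage  # first match is the most recent writer
    return None


# Hypothetical pipeline state: two in-flight instructions write register 3.
in_flight = [("EX/MEM", 3), ("MEM/WB", 3), ("WB/END", 5)]

print(forward_source(3, in_flight))  # most recent writer of register 3
print(forward_source(5, in_flight))
print(forward_source(1, in_flight))  # no hazard, read the register file
```

Ordering the list from most recent to oldest is what implements the priority: the first match wins, so an older write to the same register is never forwarded by mistake.
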
- beq changes the PC after the following few instructions have been fetched from the old PC, so they need to be squashed
- detect and stall - insert noops after a beq
- speculate and squash - guess, and if you're wrong, squash the next few instructions
- 1-bit predictor - go with what happened last time (store taken or not taken in one bit per branch address)
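
A 1-bit predictor is small enough to simulate directly. The sketch below assumes an unseen branch is predicted not taken and that the table never runs out of entries; both are simplifying assumptions, and the function name is illustrative.

```python
def simulate_one_bit(branches, default_taken=False):
    """Count correct predictions of a 1-bit predictor.

    branches is a sequence of (address, taken) pairs in execution order.
    """
    last_outcome = {}  # one bit per branch address: the last outcome seen
    correct = 0
    for address, taken in branches:
        prediction = last_outcome.get(address, default_taken)
        if prediction == taken:
            correct += 1
        last_outcome[address] = taken  # remember what actually happened
    return correct


# A loop branch at address 8: taken three times, then not taken, run twice.
pattern = [True, True, True, False]
trace = [(8, taken) for taken in pattern * 2]

correct = simulate_one_bit(trace)
print(f"{correct} of {len(trace)} predictions correct")
```

On this trace the predictor gets 4 of 8 right: after the first pass it mispredicts twice per loop, once on the exit and once on re-entry.
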
- F13 Exam 2 question (Cache Simulation)
- Go over compulsory/capacity/conflict misses by simulating various types of caches
- Do it sequentially, instead of all at once
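
Walking through the accesses one at a time can be sketched as below. This is not the exam's solution, only an illustration of a common way to classify misses for a direct-mapped cache: a miss is compulsory if the block was never referenced before, capacity if a fully associative LRU cache of the same size would also miss, and conflict otherwise. The function name and the parameters are assumptions.

```python
from collections import OrderedDict


def classify_misses(addresses, num_blocks, block_size):
    """Label each access to a direct-mapped cache as a hit or a miss type."""
    direct = [None] * num_blocks  # direct-mapped: one block per line
    fully = OrderedDict()         # fully associative reference cache, LRU order
    seen = set()                  # blocks referenced so far
    results = []
    for addr in addresses:
        block = addr // block_size
        index = block % num_blocks

        # Update the fully associative LRU cache of the same size.
        fa_hit = block in fully
        if fa_hit:
            fully.move_to_end(block)
        else:
            if len(fully) == num_blocks:
                fully.popitem(last=False)  # evict the least recently used
            fully[block] = True

        if direct[index] == block:
            results.append("hit")
        elif block not in seen:
            results.append("compulsory")
        elif not fa_hit:
            results.append("capacity")
        else:
            results.append("conflict")

        direct[index] = block
        seen.add(block)
    return results


# Two lines of 4 bytes each; the addresses map to blocks 0, 2, 0, 1, 2, 0.
trace = [0, 8, 0, 4, 8, 0]
for addr, kind in zip(trace, classify_misses(trace, num_blocks=2, block_size=4)):
    print(addr, kind)
```

Processing one access at a time keeps the direct-mapped state and the reference cache in step, which is what makes the labels trustworthy.
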
- eg. two programs accessing the same memory address can result in undefined behavior