- Side-channel attacks have been around since the 1990s
- Speculative execution can run the program down a possibly incorrect path, but the state is rolled back before it becomes architecturally visible (at least that's the idea)
- There have been micro-architectural vulnerabilities before: cache timing, Rowhammer, branch target buffers
- Trick the CPU into executing instructions that should not have been executed
- Attacks demonstrated using native code, JavaScript, and eBPF
- An attack:
- Locate or introduce instructions which act as a covert channel
- Trick the CPU into speculatively executing those instructions
- Retrieve the secret over the covert channel
- Cache-based covert channels
- Flush+Reload
- Evict+Reload
- Affects AMD Ryzen, ARM-based mobile phones, and recent generations of Intel CPUs
- Direct calls and jumps: static or monotonic
- Indirect calls and jumps: monotonic or varying
- Conditional branches
- Branch Target Buffer (BTB)
- Return Stack Buffer (RSB)
- Not shared between cores
- The attacker mistrains the branch predictor to cause the CPU to execute instructions that should not have been executed
- Example (x contains attacker-controlled data)
if (x < array1_size)
    y = array2[array1[x] * 4096];
- Attacker can speculatively bypass the if statement
- First invoke with valid inputs to train the branch predictor
- During the exploit phase: pass an x value outside the bounds of array1
- The dependent load brings in a different cache line for each value of array1[x]
- The 4096 multiplier is large enough to avoid prefetching effects
- ROP gadget from victim's address space
- Influence the victim to execute gadget
- Does not rely on a vulnerability in the victim's code
- Train the Branch Target Buffer (BTB) to mispredict an indirect branch
- Results in speculative execution of gadget
- Gadget execution leaks state into cache
- Training is done from the attacker's address space, using indirect branches to the address of the gadget in the victim's address space
- It doesn't matter what is mapped at that address in the attacker's address space
- Find ASLR location by analyzing branch history buffer and branch target buffer leaks
- Find L3 set associativity
- Find physical memory map location info using a Spectre gadget
- The above initialization takes 10–30 minutes
- Then it leaks hypervisor memory from attacker-chosen addresses
- Uses Variant 2
- Why does the C example in the appendix use 512 as the offset, while the paper says 4096? Do these lines in fact calculate 4096-byte offsets?
addr = &array2[mix_i * 512];
And the flush in the appendix also uses 512 as the offset:
for (i = 0; i < 256; i++)
_mm_clflush(&array2[i * 512]); /* clflush */
- Is the 4096 offset used because the prefetcher can't cross page boundaries?
- What do static, monotonic and varying mean for the branch predictor?
- Do there exist processors that can track whether data was fetched due to speculative execution?
- Can Variant 1 work with the attacker in a separate process?