@jlevon
Last active January 22, 2018 19:44
PCID implementation
We'll have two main PCIDs: 0 (kernel) and 1 (user).
Brief summary of PCID
---------------------
PCID is enabled by %cr4.PCIDE. It's available in at least Sandy Bridge, but INVPCID came later: Haswell I think.
PCID lives in the low 12 bits of %cr3 (the MMU_PAGEOFFSET bits). A zero PCID is also used when %cr4.PCIDE is 0.
With a non-zero PCID in %cr3, TLB entries are tagged with the PCID, and only TLB entries matching the current PCID are used.
(Global TLB entries are apparently not tagged, so are used regardless.)
mov to %cr3 behaves differently when %cr4.PCIDE is 1. If the top bit (bit 63) of the source operand is 0, the PCID named by the *source* operand is invalidated - NOT the current %cr3 PCID. Other PCIDs are not invalidated. Thus, a mov to %cr3 that switches PCID will NOT invalidate any "current" mappings.
If the top bit is 1, nothing is invalidated.
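The mov-to-%cr3 rules above can be modelled in a few lines of user-space C. This is only a sketch: make_cr3() and cr3_load_invalidates() are hypothetical names invented for illustration, not existing kernel functions, and nothing here touches the real register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	CR3_PCID_MASK	0xfffULL	/* bits 11:0 hold the PCID */
#define	CR3_NOFLUSH	(1ULL << 63)	/* bit 63: invalidate nothing */

/*
 * Hypothetical helper: combine a 4K-aligned top-level page table
 * address with a PCID and the optional no-flush bit.
 */
static uint64_t
make_cr3(uint64_t pt_base, uint16_t pcid, bool noflush)
{
	assert((pt_base & (CR3_PCID_MASK | CR3_NOFLUSH)) == 0);
	assert(pcid <= CR3_PCID_MASK);
	return (pt_base | pcid | (noflush ? CR3_NOFLUSH : 0));
}

/*
 * Model of the PCIDE=1 behaviour: returns the PCID whose non-global
 * entries are invalidated by loading 'val', or -1 if none are.
 * Note it is the PCID in the value being loaded, not the old one.
 */
static int
cr3_load_invalidates(uint64_t val)
{
	if (val & CR3_NOFLUSH)
		return (-1);
	return ((int)(val & CR3_PCID_MASK));
}
```

For example, loading make_cr3(base, 2, false) while running on PCID 1 drops PCID 2's entries and leaves PCID 1's alone; with noflush set, neither is touched.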
INVPCID has 4 forms, similar to INVVPID. You can shoot down an individual mapping within a PCID, a whole PCID, all PCIDs except global entries, and everything (like a twiddle of %cr4.PGE, but supposedly faster).
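As a rough illustration of the four INVPCID forms, here is the type encoding and the 128-bit descriptor layout as I understand them from the Intel manual. The enum and function names are made up for this sketch, and it only builds the descriptor; it does not execute the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* INVPCID type, passed in the register operand (names are illustrative). */
enum invpcid_type {
	INVPCID_ADDR = 0,		/* one linear address in one PCID */
	INVPCID_ID = 1,			/* all non-global entries for one PCID */
	INVPCID_ALL_GLOBAL = 2,		/* all PCIDs, global entries included */
	INVPCID_ALL_NONGLOBAL = 3	/* all PCIDs, global entries kept */
};

/* The 128-bit in-memory descriptor operand. */
struct invpcid_desc {
	uint64_t id_pcid;	/* bits 11:0 PCID; bits 63:12 must be zero */
	uint64_t id_addr;	/* linear address, only used by type 0 */
};

/*
 * Hypothetical helper: fill in only the fields the given type uses.
 * Unused fields are zeroed, which is a conservative choice made here
 * rather than something the notes above require.
 */
static struct invpcid_desc
invpcid_mkdesc(enum invpcid_type type, uint16_t pcid, uint64_t addr)
{
	struct invpcid_desc d = { 0, 0 };

	assert(pcid <= 0xfff);
	if (type == INVPCID_ADDR || type == INVPCID_ID)
		d.id_pcid = pcid;
	if (type == INVPCID_ADDR)
		d.id_addr = addr;
	return (d);
}
```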
How we're planning to use it
----------------------------
With KPTI, we can no longer use PT_GLOBAL to keep our kernel TLB entries around, since when we return to userspace, we cannot allow it to speculate across KERNELBASE. In particular, on user->kernel and kernel->user, we have to switch %cr3, and hence dump all the TLB.
We want to use PCID to mitigate this somewhat: namely, that a switch into - and out of - the kernel will at least keep our userspace mappings in the TLB. (Presuming we haven't context switched to another process of course).
To do that we'll have PCID1 for the kernel, and PCID2 for userspace, ensuring that we don't flush PCID2 unless it's necessary.
FIXME: tlb shootdown/inval
FIXME: hat_switch
FIXME: lack of INVPCID
- PCIDE changes PAT behaviour; problem? FIXME
cr4 audit
---------
pat_sync() cr4.pge twiddle to flush all
flush_all_tlb_entries() cr4.pge twiddle
mmu_tlbflush_entry audit
-------------------------
flush a single page via invlpg
kboot_mmu - presume this is all pre-PCIDE. Should validate.
i86pc/vm/hat_kdi.c: used prior to kernel (bootload kmdb). Therefore
can't presume PCID. Always a kernel-only mapping, but maybe setup prior
to PCID (hat_kdi_init). Always flushed after use though, so this seems
OK.
x86pte_mapin - only if !kpm. ASSERT()?
x86pte_set - local flush. if > KERNELBASE flush kpcid, else upcid
INVPCID
x86pte_copy - only !kpm
hati_demap_func - shootdown handler. Can be a range. In this case we
need to make sure to flush depending on kernelbase/userlimit
INVPCID
If we're flushing everything, we will need to do a cr4.pge twiddle /
invpcid.
hat_flush_range - panic dump thing. only ever done on kas. But does this
imply we should always have kpcid == 0? Seems like it would make invals
easier, we never get confused between safe cr3 kpcid and ours.
hat_mempte_release/remap - for ppcopy(). Only flushed at cpu unconfigure time.
Strictly kas. !kpm only.
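The x86pte_set and hati_demap_func items above both come down to "which PCID(s) does this address or range belong to?". A minimal sketch of that decision follows. It assumes a single boundary (EX_KERNELBASE, a placeholder value, not the real KERNELBASE) and ignores the gap between userlimit and kernelbase; pcids_for_range() and the FLUSH_* flags are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder boundary for illustration only; not the real KERNELBASE. */
#define	EX_KERNELBASE	0xffff800000000000ULL

#define	FLUSH_UPCID	0x1	/* range touches user addresses */
#define	FLUSH_KPCID	0x2	/* range touches kernel addresses */

/*
 * Hypothetical helper: return which PCIDs need invalidating for the
 * range [va, va + len). A range straddling the boundary needs both.
 */
static unsigned int
pcids_for_range(uint64_t va, uint64_t len)
{
	unsigned int which = 0;
	uint64_t last;

	assert(len > 0);
	last = va + (len - 1);		/* last byte of the range */
	assert(last >= va);		/* range must not wrap */

	if (va < EX_KERNELBASE)
		which |= FLUSH_UPCID;
	if (last >= EX_KERNELBASE)
		which |= FLUSH_KPCID;
	return (which);
}
```

A caller would then issue one invalidation per flag set, or fall back to a full flush when no per-PCID invalidation is available.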
flush_all_tlb_entries audit
---------------------------
TLB_INVAL_ALL on tlb_service - way out of idle
Fallback for hat_flush_range.
cr3
---
fb_swtch_src - OK, since we twiddle pge first
kdi_idthdl.s: loads safe cr3. If we don't change, this'll keep user TLB
entries around (but flush kernel since that's our source operand pcid).
Presumably that's OK. Normal trap path will go through usual exit
routine.
kdi_master_entry - switches over to kernel cr3. Same as before.
set_pteval - does reload_cr3(), but maybe only 32-bit? Also, not used
for !xpv, except via hat_kern_alloc(); maybe early so OK.
hat_kern_setup - sets tss_cr3 to whatever getcr3() is at this point,
presumably early kas cr3. Again, fine if using PCID0 for kernel.
FIXME: not set for !boot CPUs?
tss_cr3 - presumably not relevant
kpti_safe_cr3 - should be 0 PCID.
kdi_flush_caches, a reload of cr3 used by KDI slaves.
kmdb_dpi_flush_slave_caches. Used for kmdb phys read/write. The meaning of
this is obscure. It's called *prior* to the read/write. But why?
i86pc/os/mp_pc.c: use of MAKECR3 here should be replaced with something
cleaner. Again would require PCID0.
kpti in, non-paranoid, from userspace: we need to do a non-flushing mov cr3 to pick up the
kernel cr3, leave userspace mappings in place.
kpti in, non-paranoid, from kernelspace: non-flushing mov cr3
kpti in, paranoid: should be same as above cases
kpti out to kernelspace: we don't modify cr3 here
kpti out to userspace: do a normal mov cr3: this should flush the (current)
kernelspace mappings iff PCID0. Perhaps assert PCID0 to verify??
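The five kpti cases above can be written down as a small lookup, which makes the intended policy easy to check at a glance. This only restates the notes; the enum names and kpti_cr3_action() are invented for the sketch, and the real trampoline code would be assembly.

```c
#include <assert.h>

/* The transitions listed above (names are illustrative). */
enum kpti_transition {
	KPTI_IN_FROM_USER,	/* non-paranoid entry from userspace */
	KPTI_IN_FROM_KERNEL,	/* non-paranoid entry from kernelspace */
	KPTI_IN_PARANOID,	/* paranoid entry */
	KPTI_OUT_TO_KERNEL,	/* return to kernelspace */
	KPTI_OUT_TO_USER	/* return to userspace */
};

enum cr3_action {
	CR3_UNCHANGED,		/* cr3 is not written at all */
	CR3_LOAD_NOFLUSH,	/* mov cr3 with bit 63 set */
	CR3_LOAD_FLUSH		/* normal mov cr3, flushes the loaded PCID */
};

/* Hypothetical helper: what to do with cr3 on each transition. */
static enum cr3_action
kpti_cr3_action(enum kpti_transition t)
{
	switch (t) {
	case KPTI_IN_FROM_USER:
	case KPTI_IN_FROM_KERNEL:
	case KPTI_IN_PARANOID:
		/* Pick up the kernel cr3 but keep user mappings cached. */
		return (CR3_LOAD_NOFLUSH);
	case KPTI_OUT_TO_USER:
		return (CR3_LOAD_FLUSH);
	case KPTI_OUT_TO_KERNEL:
	default:
		return (CR3_UNCHANGED);
	}
}
```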
hat_switch():
from user to kas
FIXME: who deleted the VLP entries in this case? Does it matter?
Are we lazy here?
- we moved off the user cr3 already, but kept the mappings
around. We can do a non-flushing mov cr3 (it's HAT_CR3-kernel to
kas cr3 reload). Is this bad that we'd keep old userspace
mappings? Probably? Should we invpcid(user)?
It seems like, since we remove ourselves from ->hat_cpus, that
we must be eager here, otherwise hati_demap_func won't know to flush us.
from user to (different) user
hat_vlp_update will trigger.
cr3 is still kernel though. Need explicit invpcid(user).
from kas to user
hat_vlp_update will trigger. Do we need an invalidate for moral
equivalent of link_ptp()? Possibly; check intel manual.
If we are lazy above (user->kas) then we must invpcid(user)
first.
link_ptp(): will tlb shootdown if HAT_VLP. Should be sufficient for us
given above (i.e. hati_demap_func/hat_vlp_update). Really, seems like
this is mostly about copying over the new VLP entries, although it's
possible a replacement of pte may need a flush?
unlink_ptp(): if VLP, will DEMAP_ALL_ADDR.
hat_tlb_inval_range - HAT_SHARED behaviour??