Last active
January 22, 2018 19:44
PCID implementation
We'll have two main PCIDs: 0 (kernel) and 1 (user).
Brief summary of PCID
---------------------
PCID is enabled by %cr4.PCIDE. It's available from at least Sandy Bridge, but INVPCID came later (Haswell, I think).
PCID lives in the low 12 bits of %cr3 (the MMU_PAGEOFFSET bits). A zero PCID is also used when %cr4.PCIDE is 0.
With a non-zero PCID in %cr3, TLB entries are tagged with the PCID, and only TLB entries matching the current PCID are used.
(Global TLB entries are apparently not tagged, so are used regardless.)
mov to %cr3 behaves differently when %cr4.PCIDE is 1. If the top bit (bit 63) of the source operand is 0, the PCID of the *source* operand is invalidated - NOT the current %cr3 PCID. Other PCIDs are not invalidated. Thus, a mov to %cr3 will NOT invalidate any "current" mappings unless the new value carries the same PCID.
If the top bit is 1, nothing is invalidated.
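To make the mov-to-%cr3 rules above concrete, here is a minimal user-space sketch of how a %cr3 value could be composed and which PCID a load of it would invalidate. The names (make_cr3_pcid, cr3_invalidated_pcid, EX_CR3_NOINVL_BIT) are made up for illustration and are not taken from the kernel source; the code only does bit arithmetic and never touches %cr3.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names only; none of these come from the kernel source. */
#define	EX_PCID_MASK		UINT64_C(0xfff)		/* %cr3 bits 11:0 */
#define	EX_CR3_NOINVL_BIT	(UINT64_C(1) << 63)	/* the "top bit" */

/* Compose a %cr3 value from a page-aligned top-level table address. */
static inline uint64_t
make_cr3_pcid(uint64_t table_pa, uint16_t pcid, int noinvl)
{
	assert((table_pa & EX_PCID_MASK) == 0);
	assert(pcid <= EX_PCID_MASK);
	return (table_pa | pcid | (noinvl ? EX_CR3_NOINVL_BIT : 0));
}

/*
 * With %cr4.PCIDE set, report which PCID a mov of this value to %cr3
 * would invalidate: the PCID of the source operand, or -1 for none.
 */
static inline int
cr3_invalidated_pcid(uint64_t cr3val)
{
	if (cr3val & EX_CR3_NOINVL_BIT)
		return (-1);
	return ((int)(cr3val & EX_PCID_MASK));
}
```

For example, loading a value built with PCID 1 and the top bit clear would drop PCID 1's entries even if the CPU was running on PCID 0 beforehand.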
INVPCID has 4 forms, similar to INVVPID. You can shoot down an individual mapping within a PCID, a whole PCID, all PCIDs except global entries, and everything including globals (like a - supposedly faster - twiddle of %cr4.PGE).
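As a rough sketch of the four INVPCID forms, the following lays out the instruction's type codes and its 128-bit memory descriptor (PCID in bits 11:0 of the first quadword, linear address in the second). The enum and struct names are assumptions chosen for this example. The instruction itself is privileged, so the sketch only builds the descriptor and does not issue it.

```c
#include <assert.h>
#include <stdint.h>

/* Type codes passed in the register operand; names are illustrative. */
enum ex_invpcid_type {
	EX_INVPCID_ADDR = 0,		/* one address within one PCID */
	EX_INVPCID_ID = 1,		/* one whole PCID, except globals */
	EX_INVPCID_ALL_GLOBAL = 2,	/* every PCID, including globals */
	EX_INVPCID_ALL_NONGLOBAL = 3	/* every PCID, except globals */
};

/* The 16-byte memory operand. */
struct ex_invpcid_desc {
	uint64_t ipd_pcid;	/* bits 11:0 PCID; bits 63:12 must be zero */
	uint64_t ipd_addr;	/* linear address; only used by type 0 */
};

/* Fill in only the fields that the given type actually consumes. */
static inline struct ex_invpcid_desc
ex_invpcid_make_desc(enum ex_invpcid_type type, uint16_t pcid, uint64_t va)
{
	struct ex_invpcid_desc d = { 0, 0 };

	assert(pcid < 4096);
	if (type == EX_INVPCID_ADDR || type == EX_INVPCID_ID)
		d.ipd_pcid = pcid;
	if (type == EX_INVPCID_ADDR)
		d.ipd_addr = va;
	return (d);
}
```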
How we're planning to use it | |
---------------------------- | |
With KPTI, we can no longer use PT_GLOBAL to keep our kernel TLB entries around, since when we return to userspace, we cannot allow it to speculate across KERNELBASE. In particular, on user->kernel and kernel->user, we have to switch %cr3, and hence dump all the TLB. | |
We want to use PCID to mitigate this somewhat: namely, that a switch into - and out of - the kernel will at least keep our userspace mappings in the TLB. (Presuming we haven't context switched to another process of course). | |
To do that we'll have PCID 0 for the kernel, and PCID 1 for userspace, ensuring that we don't flush the user PCID unless it's necessary.
FIXME: tlb shootdown/inval | |
FIXME: hat_switch | |
FIXME: lack of INVPCID
- PCIDE changes PAT behaviour; problem? FIXME
cr4 audit | |
--------- | |
pat_sync() cr4.pge twiddle to flush all | |
flush_all_tlb_entries() cr4.pge twiddle | |
mmu_tlbflush_entry audit | |
------------------------- | |
flush a single page via invlpg | |
kboot_mmu - presume this is all pre-PCIDE. Should validate. | |
i86pc/vm/hat_kdi.c: used prior to kernel (bootload kmdb). Therefore | |
can't presume PCID. Always a kernel-only mapping, but maybe setup prior | |
to PCID (hat_kdi_init). Always flushed after use though, so this seems | |
OK. | |
x86pte_mapin - only if !kpm. ASSERT()? | |
x86pte_set - local flush. if > KERNELBASE flush kpcid, else upcid | |
INVPCID | |
x86pte_copy - only !kpm | |
hati_demap_func - shootdown handler. Can be a range. In this case we | |
need to make sure to flush depending on kernelbase/userlimit | |
INVPCID | |
If we're flushing everything, we will need to do a cr4.pge twiddle or
invpcid.
hat_flush_range - panic dump thing. only ever done on kas. But does this | |
imply we should always have kpcid == 0? Seems like it would make invals | |
easier, we never get confused between safe cr3 kpcid and ours. | |
hat_mempte_release/remap - for ppcopy(). Only flushed at cpu unconfigure time. | |
Strictly kas. !kpm only. | |
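The recurring rule in this audit (kernel addresses flush the kernel PCID, everything else the user PCID) might be sketched as below. This is only an illustration under assumptions: EX_KERNELBASE is a placeholder value, the PCID numbers follow the 0/1 split from the top of these notes, and the helper names are invented. The comparison uses >= so that KERNELBASE itself counts as a kernel address.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration; not taken from the kernel headers. */
#define	EX_KERNELBASE	UINT64_C(0xfffffd8000000000)
#define	EX_PAGESIZE	UINT64_C(4096)
#define	EX_PCID_KERNEL	0
#define	EX_PCID_USER	1

/* Pick the PCID whose TLB entries cover this virtual address. */
static inline int
ex_pcid_for_va(uint64_t va)
{
	return (va >= EX_KERNELBASE ? EX_PCID_KERNEL : EX_PCID_USER);
}

struct ex_flush_split {
	uint64_t user_pages;	/* pages to flush under the user PCID */
	uint64_t kernel_pages;	/* pages to flush under the kernel PCID */
};

/*
 * For a ranged shootdown, count how many pages fall under each PCID.
 * A real handler would issue one per-page invalidation instead of counting.
 */
static inline struct ex_flush_split
ex_split_flush_range(uint64_t va, uint64_t npages)
{
	struct ex_flush_split s = { 0, 0 };

	for (uint64_t i = 0; i < npages; i++) {
		if (ex_pcid_for_va(va + i * EX_PAGESIZE) == EX_PCID_KERNEL)
			s.kernel_pages++;
		else
			s.user_pages++;
	}
	return (s);
}
```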
flush_all_tlb_entries audit | |
--------------------------- | |
TLB_INVAL_ALL on tlb_service - way out of idle | |
Fallback for hat_flush_range. | |
cr3 | |
--- | |
fb_swtch_src - OK, since we twiddle pge first | |
kdi_idthdl.s: loads safe cr3. If we don't change, this'll keep user TLB | |
entries around (but flush kernel since that's our source operand pcid). | |
Presumably that's OK. Normal trap path will go through usual exit | |
routine. | |
kdi_master_entry - switches over to kernel cr3. Same as before. | |
set_pteval - does reload_cr3(), but maybe only 32-bit? Also, not used | |
for !xpv, except via hat_kern_alloc(); maybe early so OK. | |
hat_kern_setup - sets tss_cr3 to whatever getcr3() is at this point, | |
presumably early kas cr3. Again, fine if using PCID0 for kernel. | |
FIXME: not set for !boot CPUs? | |
tss_cr3 - presumably not relevant | |
kpti_safe_cr3 - should be 0 PCID. | |
kdi_flush_caches - a reload of cr3 used by KDI slaves.
kmdb_dpi_flush_slave_caches - used for kmdb phys read/write. The meaning of
this is obscure. It's called *prior* to the read/write. But why?
i86pc/os/mp_pc.c: use of MAKECR3 here should be replaced with something | |
cleaner. Again would require PCID0. | |
kpti in, non-paranoid, from userspace: we need to do a non-flushing mov cr3 to pick up the | |
kernel cr3, leave userspace mappings in place. | |
kpti in, non-paranoid, from kernelspace: non-flushing mov cr3 | |
kpti in, paranoid: should be same as above cases | |
kpti out to kernelspace: we don't modify cr3 here | |
kpti out to userspace: do a normal mov cr3: this should flush the (current) | |
kernelspace mappings iff PCID0. Perhaps assert PCID0 to verify?? | |
hat_switch(): | |
from user to kas | |
FIXME: who deleted the VLP entries in this case? Does it matter? | |
Are we lazy here? | |
- we moved off the user cr3 already, but kept the mappings | |
around. We can do a non-flushing mov cr3 (it's HAT_CR3-kernel to | |
kas cr3 reload). Is this bad that we'd keep old userspace | |
mappings? Probably? Should we invpcid(user)? | |
It seems like, since we remove ourselves from ->hat_cpus, that | |
we must be eager here, otherwise hati_demap_func won't know to flush us. | |
from user to (different) user | |
hat_vlp_update will trigger. | |
cr3 is still kernel though. Need explicit invpcid(user). | |
from kas to user | |
hat_vlp_update will trigger. Do we need an invalidate for moral | |
equivalent of link_ptp()? Possibly; check intel manual. | |
If we are lazy above (user->kas) then we must invpcid(user) | |
first. | |
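The hat_switch() cases above could be summarised as a small decision function. This is a sketch of the options being weighed, not a settled design: the function name, the hat numbering and the lazy_kas flag are all invented for the example, and the open question about kas-to-user in the eager case is left as "no flush".

```c
#include <assert.h>
#include <stdbool.h>

#define	EX_HAT_KAS	0	/* hypothetical id for the kernel hat */

/*
 * Should this hat_switch() do an explicit invpcid(user)?
 * lazy_kas selects the "lazy" policy for user -> kas discussed above;
 * the notes lean towards eager, since we leave ->hat_cpus on that switch.
 */
static inline bool
ex_switch_needs_user_flush(int from_hat, int to_hat, bool lazy_kas)
{
	bool from_user = (from_hat != EX_HAT_KAS);
	bool to_user = (to_hat != EX_HAT_KAS);

	if (from_hat == to_hat)
		return (false);		/* nothing changes */
	if (from_user && to_user)
		return (true);		/* user -> different user */
	if (from_user)
		return (!lazy_kas);	/* user -> kas: flush now if eager */
	return (lazy_kas);		/* kas -> user: pay for earlier laziness */
}
```

Under either policy exactly one flush happens across a user -> kas -> user round trip; the policies only differ in which side of the trip pays for it.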
link_ptp(): will tlb shootdown if HAT_VLP. Should be sufficient for us | |
given above (i.e. hati_demap_func/hat_update_vlp). Really, seems like | |
this is mostly about copying over the new VLP entries, although it's | |
possible a replacement of pte may need a flush? | |
unlink_ptp(): if VLP, will DEMAP_ALL_ADDR. | |
hat_tlb_inval_range - HAT_SHARED behaviour?? |