These are just some notes on my current understanding of the subtleties of the AGX memory model and the TLB/caching issues I'm seeing.
The AGX MMU has 64 context slots, 63 of which are usable for user contexts. The page table base addresses for the slots are stored in a page in main memory, which Apple calls the "TTBAT". Each address space has a user (low) half and a kernel (high) half, but the high half can be ignored from the GPU perspective since it is only used in very specific cases. The page tables are basically ARM page tables, although the permission bits are funny and the same tables are shared with the ASC, which interprets them differently. Each context slot has an ASID (how many bits?), like a TTBR in ARM.
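
To make the slot layout concrete, here is a rough toy model of the TTBAT page. Only the slot count (64) and the low/high split come from the notes above. The 16-byte slot size (one 64-bit word per half), the little-endian packing, and the names Ttbat, set_slot and get_slot are assumptions made up for illustration. The ASID is left out entirely, since its width and position are still an open question.

```python
import struct

NUM_SLOTS = 64   # from the notes: 64 context slots
SLOT_SIZE = 16   # ASSUMPTION: two 64-bit words per slot (low half, high half)


class Ttbat:
    """Toy model of the in-memory page that holds the per-slot table bases."""

    def __init__(self):
        # One flat buffer standing in for the page in main memory.
        self.page = bytearray(NUM_SLOTS * SLOT_SIZE)

    def set_slot(self, slot, low_base, high_base=0):
        """Store the user (low) and kernel (high) table bases for a slot."""
        if not 0 <= slot < NUM_SLOTS:
            raise IndexError("context slot out of range")
        # ASSUMPTION: little-endian, low half first, then high half.
        struct.pack_into("<QQ", self.page, slot * SLOT_SIZE, low_base, high_base)

    def get_slot(self, slot):
        """Return (low_base, high_base) for a slot."""
        if not 0 <= slot < NUM_SLOTS:
            raise IndexError("context slot out of range")
        return struct.unpack_from("<QQ", self.page, slot * SLOT_SIZE)


ttbat = Ttbat()
ttbat.set_slot(3, 0x800004000)   # hypothetical page table base address
low, high = ttbat.get_slot(3)
print(hex(low), hex(high))
```

The high half defaults to zero here because, as noted above, it can mostly be ignored from the GPU side.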