AArch64 VA problems
Syscalls that allocate memory:
- mmap, mmap2 (doesn't exist on AArch64), mremap, shmat, ioctl
What is FEX-Emu?
FEX is an AArch64-only userspace emulator of 32-bit x86 and x86-64.
32-bit x86 code runs inside an AArch64 process, which future-proofs FEX for when ARM CPUs drop AArch32 support.
This adds VA problems on top of the x86-64-specific ones.
Host versus Guest?
- Host is everything inside of FEX's code
- Guest is the application being emulated
Thunks cause pain:
- What is a thunk?
  - A bridge library between the x86/x86-64 guest library and a true AArch64 host library.
FEX+64Bit:
- Common problems:
  - The guest cannot use memory above the 47-bit range that x86-64 provides, but a 48-bit VA host will hand it out
- Current workarounds:
  - Reserve the upper 128TB of the 48-bit VA space (everything at or above 2^47) on application startup
    - Takes 5-20ms, benchmarked on Apple M1. Cortex is slower.
    - Only needed with >= 48-bit VA. Anything set up with a smaller VA is spared this horror.
- Thunks Off:
  - FEX controls all guest syscalls
  - All *guest* memory allocation syscalls must return addresses below 2^47 to match x86-64
  - All *host* memory allocations are unrestricted and can go into the full 48-bit range
  - Problem examples:
    - Guest application loads a shared library with `mmap(nullptr, <size>, <prot>, <flags>, <fd>, <some offset>)`
      - This needs to return in the lower 47 bits
    - Guest application makes an IOCTL_DRM ioctl syscall, which allocates a buffer
      - This needs to return in the lower 47 bits
    - Guest application calls mmap with the MAP_32BIT flag
      - MAP_32BIT doesn't exist on AArch64
      - Use mmap_range to restrict the allocation, INSIDE the prctl range, to the range x86-64 uses for MAP_32BIT
        - Range is [0x4000'0000, 0x8000'0000)
    - FEX's internal allocator calls mmap to allocate some memory
      - This can return anywhere in the unrestricted 48-bit VA range.
  - Possible solutions:
    - struct va_limit { uint64_t lower_bound; uint64_t upper_bound; };
      - A lower bound is provided so other emulators can reuse it as a base_offset limit
    - prctl(PR_SET_VA_LIMITS, const struct va_limit *limit);
      - Sets the VA limits, clamped to the configured VA range (TASK_SIZE_64), so that mmap won't return bad values
      - Fixes mmap, mmap2, mremap, shmat and ioctl memory allocations so they fit inside the range
      - Does /NOT/ let FEX itself allocate freely outside the range
        - See the *_range syscalls below
    - prctl(PR_GET_VA_LIMITS, struct va_limit *limit);
      - Gets the currently set VA limits, for introspection and to confirm the restriction was applied
    - mmap_range(uint64_t begin_range, uint64_t end_range, size_t size, int prot, int flags, int fd, off_t offset);
    - mremap_range(void *old_address, size_t old_size, size_t new_size, int flags, uint64_t begin_range, uint64_t end_range);
      - Useful for MREMAP_MAYMOVE
    - shmat_range(int shmid, uint64_t begin_range, uint64_t end_range, int shmflg);
      - Restricts the attach address to the range provided
    - ioctl_range: *Nope*. Use the prctl to limit its allocation range.
    - For each of the syscalls that take a begin_range and end_range:
      - if begin_range < end_range
        - The allocation must fit fully within [begin_range, end_range) (end exclusive)
      - if begin_range == end_range
        - Behave like the non-ranged version
      - if begin_range > end_range
        - This should cause the range to wrap around
        - This allows the PR_SET_VA_LIMITS prctl to place the limit at a `lower_bound` offset greater than 0 (or 0x1'0000, since the first 64KB is protected) while still being able to allocate around that hole of memory
- Thunks On:
  - FEX no longer controls all syscalls.
    - Syscalls inside of the emulated space are still captured.
    - Syscalls from a thunk library (like libGL) are not captured.
  - All *guest AND thunk* memory allocation syscalls must return addresses below 2^47 to match x86-64
  - FEX itself can still allocate in the 48-bit range fine.
  - Problem examples:
    - AArch64 glibc loads a shared library thunk with `mmap(nullptr, <size>, <prot>, <flags>, <fd>, <some offset>)`
      - This needs to return in the lower 47 bits
      - AArch64 thunk libraries need to be mapped in the guest address space because they return pointers into their own memory back to the guest.
    - AArch64 thunked library makes an IOCTL_DRM ioctl syscall, which allocates a buffer
      - This needs to return in the lower 47 bits
    - FEX's internal allocator calls mmap to allocate some memory
      - This can return anywhere in the unrestricted 48-bit VA range.
  - Possible solutions:
    - Same solutions as Thunks Off
FEX+32Bit:
- Common problems:
  - The guest cannot use memory above 4GB, but the host will hand it out
- Current workarounds:
  - Reserve all VA space above 4GB: up to 256TB (minus 4GB) of VA space
    - Takes 50-100ms, benchmarked on Apple M1. Cortex is slower.
- Thunks Off:
  - FEX controls all guest syscalls
  - All *guest* memory allocation syscalls must return addresses below 4GB to match 32-bit x86
  - All *host* memory allocations are unrestricted and can go into the full 48-bit range
  - Problem examples:
    - Guest application loads a shared library with `mmap(nullptr, <size>, <prot>, <flags>, <fd>, <some offset>)`
      - This needs to return in the lower 4GB
    - Guest application makes an IOCTL_DRM ioctl syscall, which allocates a buffer
      - This needs to return in the lower 4GB
    - FEX's internal allocator calls mmap to allocate some memory
      - This can return anywhere in the unrestricted 48-bit VA range.
  - Possible solutions:
    - Same solutions as the 64-bit side, but restricting ranges to the lower 4GB instead of the lower 47 bits.
- Thunks On:
  - FEX no longer controls all syscalls.
    - Syscalls inside of the emulated space are still captured.
    - Syscalls from a thunk library (like libGL) are not captured.
  - All *guest AND thunk* memory allocation syscalls must return addresses below 4GB to match 32-bit x86
  - FEX itself can still allocate in the 48-bit range fine.
  - Problem examples:
    - AArch64 glibc loads a shared library thunk with `mmap(nullptr, <size>, <prot>, <flags>, <fd>, <some offset>)`
      - This needs to return in the lower 4GB
      - AArch64 thunk libraries need to be mapped in the guest address space because they return pointers into their own memory back to the guest.
    - AArch64 thunked library makes an IOCTL_DRM ioctl syscall, which allocates a buffer
      - This needs to return in the lower 4GB
    - FEX's internal allocator calls mmap to allocate some memory
      - This can return anywhere in the unrestricted 48-bit VA range.
  - Possible solutions:
    - Same solutions as Thunks Off
Possible pain points:
- A thunk library allocating memory might pick up FEX's internal memory allocator.
  - This can be fixed with time and symbol visibility fixes
  - For now FEX might leak /some/ data into the guest VA range when thunks are enabled
  - With thunks disabled there is no leak