A SIGBUS is happening in HotSpot when running on Rosetta 2:
oracle/graal#4314 (comment)
It happens around a very specific instruction sequence:
0x10c6a5833: 48 81 c3 48 01 00 00 addq $0x148, %rbx ; imm = 0x148
0x10c6a583a: b8 01 00 00 00 movl $0x1, %eax
0x10c6a583f: f0 lock # <== crash is happening here
0x10c6a5840: 0f c1 03 xaddl %eax, (%rbx)
Alas, we do not have more information (e.g. the value of %rbx in this case), but since the process received a SIGBUS it is fair to assume that the address was unaligned in some way.
I want to understand what Rosetta does with this code, so I started working on a minimal reproducer.
In c.c we have the following inline assembly that resembles the crash site mentioned above:
void please_fault(long *p) {
    /* [...] */
    __asm__ volatile (
        "mov %1, %%rbx; "
        "addq $0x148, %%rbx; "
        "movl $0x1, %%eax; "
        "lock; xaddl %%eax,(%%rbx); "
        "mov %%rbx, %0"
        : "=r" (p)          /* output, %0 */
        : "r" (p)           /* input, %1 */
        : "%rbx", "%rax"    /* clobbered regs */
    );
    /* [...] */
}
Run it like this:
$ make
clang -O2 -arch x86_64 -g c.c -o ua_x86_64
clang -O2 -arch arm64 -g c.c -o ua_aarch64
clang -arch arm64 runner.c -o runner
$ ./ua_x86_64
trying 0x10b534f36 with i=0
trying 0x10b53cf37 with i=1
trying 0x10b544f38 with i=2
[...]
trying 0x12b528f34 with i=16382
trying 0x12b530f35 with i=16383
done
Well, it doesn't crash. But let's not give up yet: what is Rosetta actually doing with that code?
It's not documented by Apple, but there is an amazing reverse engineering effort going on at https://ffri.github.io/ProjectChampollion/
Files translated by Rosetta 2 are SIP protected, so they can't be accessed without disabling SIP.
If you attach with lldb to an x86_64 process, you will see the x86_64 code.
However, they discovered a neat trick: if you start lldb with a helper program in arm64 context, which then calls execve on an x86_64 binary, you end up in its arm64 translated code. Woot!
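The trick can be sketched in a few lines of C. This is not the actual runner.c, only a minimal guess at its shape: the function names `set_hang_flag` and `exec_target` are made up for illustration, and the variable name PLEASE_HANG is taken from the lldb session shown in this write-up.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Export the flag that asks the target to spin so a debugger can interrupt it.
 * PLEASE_HANG is the name used in this write-up; the helper itself is hypothetical. */
int set_hang_flag(void) {
    return setenv("PLEASE_HANG", "1", 1 /* overwrite existing value */);
}

/* Replace the current (arm64) process image with the target binary.
 * target_argv[0] is used as the path. Only returns if something failed. */
int exec_target(char *const target_argv[]) {
    if (set_hang_flag() != 0) {
        perror("setenv");
        return -1;
    }
    execv(target_argv[0], target_argv); /* inherits the modified environment */
    perror("execv");
    return -1;
}
```

A `main` built around this would check `argc >= 2` and then call `exec_target(&argv[1])`, so that `./runner ./ua_x86_64` starts as arm64 and turns into the x86_64 binary while the debugger stays attached.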
See the helper in runner.c (it also sets an environment variable to help us set a "breakpoint"):
$ arch -arch arm64 lldb -- ./runner ./ua_x86_64
(lldb) target create "./runner"
Current executable set to '/Users/lewurm/tmp/atomic-rosetta/runner' (arm64).
(lldb) settings set -- target.run-args "./ua_x86_64"
(lldb) run
Process 84572 launched: '/Users/lewurm/tmp/atomic-rosetta/runner' (arm64)
Process 84572 stopped
* thread #2, stop reason = exec
frame #0: 0x00007ff7fffbba2c runtime`_mh_execute_header + 14892
runtime`_mh_execute_header:
-> 0x7ff7fffbba2c <+14892>: mov x19, sp
0x7ff7fffbba30 <+14896>: and sp, x19, #0xfffffffffffffff0
0x7ff7fffbba34 <+14900>: mov x29, sp
0x7ff7fffbba38 <+14904>: ldr x20, [x19, #0x20]
Target 0: (runtime) stopped.
(lldb) # we are in the rosetta runtime now
(lldb) continue
Process 84572 resuming
trying 0x108676f36 with i=0
# the program is hanging now due to PLEASE_HANG=1 being set, press CTRL+C to get back into lldb
Process 84572 stopped
* thread #2, stop reason = signal SIGSTOP
frame #0: 0x0000000100011168
-> 0x100011168: ldur w22, [x5, #-0x2c]
0x10001116c: cmp w22, #0x0 ; =0x0
0x100011170: b.ne 0x100011168
0x100011174: add w12, w12, #0x1 ; =0x1
Target 0: (runtime) stopped.
(lldb) x/23i $pc-0x50
0x100011118: 0xf81f8c98 str x24, [x4, #-0x8]!
0x10001111c: 0x94000033 bl 0x1000111e8
0x100011120: 0xaa0d03e3 mov x3, x13
0x100011124: 0x91052063 add x3, x3, #0x148 ; =0x148
0x100011128: 0x52800020 mov w0, #0x1
0x10001112c: 0xb8e00077 ldaddal w0, w23, [x3] # <-- this is the "lock; xaddl" instruction!
0x100011130: 0x2a1703e0 mov w0, w23
0x100011134: 0xaa0303e1 mov x1, x3
0x100011138: 0xaa0f03e7 mov x7, x15
0x10001113c: 0xd0ffff98 adrp x24, -14
0x100011140: 0x913b8718 add x24, x24, #0xee1 ; =0xee1
0x100011144: 0x10000099 adr x25, #0x10
0x100011148: 0xa9bf66b8 stp x24, x25, [x21, #-0x10]!
0x10001114c: 0xf81f8c98 str x24, [x4, #-0x8]!
0x100011150: 0x9400003a bl 0x100011238
0x100011154: 0x52800001 mov w1, #0x0
0x100011158: 0xea00001f tst x0, x0
0x10001115c: 0x9a9f07f6 cset x22, ne
0x100011160: 0xb3401ec1 bfxil x1, x22, #0, #8
0x100011164: 0xb81d40a1 stur w1, [x5, #-0x2c]
-> 0x100011168: 0xb85d40b6 ldur w22, [x5, #-0x2c]
0x10001116c: 0x710002df cmp w22, #0x0 ; =0x0
0x100011170: 0x54ffffc1 b.ne 0x100011168
So the instructions starting at 0x100011120 are the arm64 translation of the x86_64 assembly.
The takeaway here is that lock; xaddl %eax,(%rbx) gets implemented with the ldaddal arm64 instruction.
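For readers less familiar with either instruction: with %eax set to 1, lock; xaddl is an atomic fetch-and-add, which increments the memory operand and hands back the value it held before. The same operation in portable C11 looks roughly like the sketch below. The function name `fetch_add_one` is made up here, and the remark about instruction selection reflects typical compiler behavior as far as I know, not anything verified in this write-up.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add 1 to *counter and return the value it held before.
 * Compilers typically lower this to `lock xadd` on x86_64 and, when LSE
 * atomics are available, to `ldaddal` on arm64 (an assumption, not checked here). */
int32_t fetch_add_one(_Atomic int32_t *counter) {
    return atomic_fetch_add(counter, 1); /* sequentially consistent by default */
}
```

This matches the translated sequence above, where ldaddal puts the old value into w23 and the following mov copies it into w0, the register standing in for %eax.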
This is interesting, because ldaddal can fault if you try to access an unaligned address.
It indeed does, as the same repro with arm64 inline assembly proves:
$ ./ua_aarch64
trying 0x104f13f36 with i=0
zsh: bus error ./ua_aarch64
For what it's worth, it's a bit more subtle than this: it doesn't fault for unaligned accesses in general, but it seems to fault if:
- the access crosses a 0x80-byte cache line boundary
- neither cache line is hot, i.e. neither line is in any cache and both need to be fetched from memory.
That needs some stars to align, to say the least.
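The first condition is easy to check by hand. A small helper, sketched here with the made-up names `LINE_SIZE` and `crosses_line`, tells whether an access spans two 0x80-byte lines. The second condition (cache state) cannot be checked this way.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 0x80u /* boundary size named in the list above */

/* True if the `size` bytes starting at `addr` span two LINE_SIZE-byte lines. */
bool crosses_line(uint64_t addr, size_t size) {
    uint64_t first = addr / LINE_SIZE;              /* line of the first byte */
    uint64_t last = (addr + size - 1) / LINE_SIZE;  /* line of the last byte */
    return size > 0 && first != last;
}
```

Assuming the arm64 variant adds the same 0x148 offset as the x86_64 one, the faulting run above accessed 0x104f13f36 + 0x148 = 0x104f1407e, and `crosses_line(0x104f1407e, 4)` is true because the 4-byte access starts two bytes before a line boundary.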
This is weird, though: why does the "same" code fault as pure arm64 but not as code translated by Rosetta 2?
I tried quite hard to get it to crash, but no luck.
I think the reason it doesn't is the ACTLR_EL1_EnTSO flag.
It's a flag that tells the CPU to enable Total Store Order, the memory ordering required by x86_64.
This flag is only enabled for processes running Rosetta 2 code.
My best guess is that it also changes the semantics of unaligned atomic memory accesses.
There is a kernel module that would enable it for any arbitrary process, but I haven't gone the extra mile yet to verify whether my theory holds up: https://github.com/saagarjha/TSOEnabler
But even if that confirmed my theory, I'm not sure what to do with this result.
The only conclusion I can draw is that the HotSpot snippet from above should never SIGBUS when running with Rosetta 2.
Hum.