@lewurm
Last active December 8, 2022 11:52
Fun with Rosetta 2 and atomic instructions

Context

A SIGBUS is happening in HotSpot when running on Rosetta 2: oracle/graal#4314 (comment)

It happens around a very specific piece of code:

0x10c6a5833: 48 81 c3 48 01 00 00  addq   $0x148, %rbx    ; imm = 0x148
0x10c6a583a: b8 01 00 00 00        movl   $0x1, %eax
0x10c6a583f: f0                    lock                   # <== crash is happening here
0x10c6a5840: 0f c1 03              xaddl  %eax, (%rbx)

Alas, we do not have more information (e.g. the value of %rbx at the time of the crash), but since the signal is a SIGBUS it's fair to assume that the address is unaligned in some way. I want to understand what Rosetta does with this code, so I started working on a minimal reproducer.
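Since the missing piece is the faulting address, a reproducer could install a SIGBUS handler that prints it before dying. The following is a minimal sketch, not part of the gist: `on_sigbus` and `install_sigbus_reporter` are hypothetical names, and it assumes `si_addr` carries the faulting address as POSIX describes for SIGBUS. Whether Rosetta 2 reports a useful value there is an assumption I have not verified.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: prints the faulting address and exits.
 * Only async-signal-safe calls (write, _exit) are used here. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx) {
    (void)sig;
    (void)ctx;
    char msg[] = "SIGBUS at 0x0000000000000000\n";
    uintptr_t addr = (uintptr_t)info->si_addr;
    /* fill the 16 hex digits from right to left */
    for (int i = 0; i < 16; i++) {
        msg[12 + 15 - i] = "0123456789abcdef"[addr & 0xf];
        addr >>= 4;
    }
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1); /* returning would just re-execute the faulting instruction */
}

/* Hypothetical helper: returns 0 on success, -1 on failure (like sigaction). */
static int install_sigbus_reporter(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO; /* ask for the siginfo_t argument */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

Calling `install_sigbus_reporter()` at the start of `main` would be enough for a standalone reproducer. A JVM like HotSpot installs its own signal handlers, so this only helps in the small test program.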

x86_64 repro

In c.c we have the following inline assembly, which resembles the code from the crash above:

void please_fault(long *p) {
    /* [...] */
    __asm__ volatile (
            "mov %1, %%rbx; "
            "addq $0x148, %%rbx; "
            "movl $0x1, %%eax; "
            "lock; xaddl %%eax,(%%rbx); "
            "mov %%rbx, %0"
            : "=r" (p) /* output, %0 */
            : "r" (p)  /* input, %1 */
            : "%rbx", "%rax"  /* clobbered regs */
            );
    /* [...] */
}

Run it like this:

$ make
clang -O2 -arch x86_64 -g c.c -o ua_x86_64
clang -O2 -arch arm64 -g c.c -o ua_aarch64
clang -arch arm64 runner.c -o runner
$ ./ua_x86_64
trying 0x10b534f36 with i=0
trying 0x10b53cf37 with i=1
trying 0x10b544f38 with i=2
[...]
trying 0x12b528f34 with i=16382
trying 0x12b530f35 with i=16383
done

Well, it doesn't crash. But let's not give up yet: what is Rosetta actually doing with that code?

Messing around with Rosetta 2

Rosetta 2's internals are not documented by Apple, but there is an amazing reverse engineering effort going on at https://ffri.github.io/ProjectChampollion/

Files translated by Rosetta 2 are SIP protected, so they can't be accessed without disabling SIP. If you attach lldb to an x86_64 process, you will see the x86_64 code. However, they discovered a neat trick: if you start lldb with a helper program in arm64 context, which then calls execve on an x86_64 binary, you end up in its translated arm64 code. Woot!

See the helper in runner.c (it also sets an environment variable to help us set a "breakpoint"):

$ arch -arch arm64 lldb -- ./runner ./ua_x86_64
(lldb) target create "./runner"
Current executable set to '/Users/lewurm/tmp/atomic-rosetta/runner' (arm64).
(lldb) settings set -- target.run-args  "./ua_x86_64"
(lldb) run
Process 84572 launched: '/Users/lewurm/tmp/atomic-rosetta/runner' (arm64)
Process 84572 stopped
* thread #2, stop reason = exec
    frame #0: 0x00007ff7fffbba2c runtime`_mh_execute_header + 14892
runtime`_mh_execute_header:
->  0x7ff7fffbba2c <+14892>: mov    x19, sp
    0x7ff7fffbba30 <+14896>: and    sp, x19, #0xfffffffffffffff0
    0x7ff7fffbba34 <+14900>: mov    x29, sp
    0x7ff7fffbba38 <+14904>: ldr    x20, [x19, #0x20]
Target 0: (runtime) stopped.
(lldb) # we are in the rosetta runtime now
(lldb) continue
Process 84572 resuming
trying 0x108676f36 with i=0
# the program is hanging now due to PLEASE_HANG=1 being set, press CTRL+C to get back into lldb
Process 84572 stopped
* thread #2, stop reason = signal SIGSTOP
    frame #0: 0x0000000100011168
->  0x100011168: ldur   w22, [x5, #-0x2c]
    0x10001116c: cmp    w22, #0x0                 ; =0x0
    0x100011170: b.ne   0x100011168
    0x100011174: add    w12, w12, #0x1            ; =0x1
Target 0: (runtime) stopped.
(lldb) x/23i $pc-0x50
    0x100011118: 0xf81f8c98   str    x24, [x4, #-0x8]!
    0x10001111c: 0x94000033   bl     0x1000111e8
    0x100011120: 0xaa0d03e3   mov    x3, x13
    0x100011124: 0x91052063   add    x3, x3, #0x148            ; =0x148
    0x100011128: 0x52800020   mov    w0, #0x1
    0x10001112c: 0xb8e00077   ldaddal w0, w23, [x3]  # <-- this is the "lock; xaddl" instruction!
    0x100011130: 0x2a1703e0   mov    w0, w23
    0x100011134: 0xaa0303e1   mov    x1, x3
    0x100011138: 0xaa0f03e7   mov    x7, x15
    0x10001113c: 0xd0ffff98   adrp   x24, -14
    0x100011140: 0x913b8718   add    x24, x24, #0xee1          ; =0xee1
    0x100011144: 0x10000099   adr    x25, #0x10
    0x100011148: 0xa9bf66b8   stp    x24, x25, [x21, #-0x10]!
    0x10001114c: 0xf81f8c98   str    x24, [x4, #-0x8]!
    0x100011150: 0x9400003a   bl     0x100011238
    0x100011154: 0x52800001   mov    w1, #0x0
    0x100011158: 0xea00001f   tst    x0, x0
    0x10001115c: 0x9a9f07f6   cset   x22, ne
    0x100011160: 0xb3401ec1   bfxil  x1, x22, #0, #8
    0x100011164: 0xb81d40a1   stur   w1, [x5, #-0x2c]
->  0x100011168: 0xb85d40b6   ldur   w22, [x5, #-0x2c]
    0x10001116c: 0x710002df   cmp    w22, #0x0                 ; =0x0
    0x100011170: 0x54ffffc1   b.ne   0x100011168

So the instructions starting at 0x100011120 are the arm64 translation of the x86_64 assembly. The takeaway here is that lock; xaddl %eax,(%rbx) gets implemented with the arm64 instruction ldaddal.
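Both instructions are an atomic fetch-and-add that hands back the old value. Here is a minimal C sketch of the same operation, using the GCC/Clang `__atomic_fetch_add` builtin. `fetch_add_at_offset` is a hypothetical helper name, not something from the gist. With sequentially consistent ordering, compilers typically emit `lock xadd` on x86_64 and `ldaddal` on arm64 targets that have LSE atomics, but that depends on the compiler and its flags, so check the disassembly rather than trusting this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: atomically adds 1 to the 32-bit word at base + 0x148
 * and returns the value that was stored there before, like "lock; xaddl". */
static inline int32_t fetch_add_at_offset(void *base) {
    int32_t *counter = (int32_t *)((char *)base + 0x148);
    return __atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);
}
```

This only makes sense for properly aligned addresses: going through a misaligned `int32_t *` is undefined behavior in C, which is presumably why the reproducer uses inline assembly instead.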

Repro it with pure arm64?

This is interesting, because ldaddal can fault if you try to access an unaligned address. And it does, as the same repro with arm64 inline assembly shows:

$ ./ua_aarch64
trying 0x104f13f36 with i=0
zsh: bus error  ./ua_aarch64

For what it's worth, it's a bit more subtle than this. It doesn't fault for unaligned accesses in general, but it seems to fault if:

  1. the access crosses a 0x80-byte cache line boundary
  2. neither of the two cache lines is hot, i.e. neither line is in any cache and both need to be fetched from memory.

That needs quite a few stars to align, to say the least.
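The first condition is plain arithmetic and can be checked up front. This is a minimal sketch: `crosses_line` and `CACHE_LINE_SIZE` are hypothetical names, and the 0x80 line size is taken from the observation above rather than from any Apple documentation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 0x80 /* assumed line size, as observed above */

/* Hypothetical helper: true if the bytes [addr, addr + size) touch two
 * different cache lines. size must be at least 1. */
static bool crosses_line(uintptr_t addr, size_t size) {
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

For the 4-byte access in c.c, `crosses_line(0x17e, 4)` is true (bytes 0x17e to 0x181), while `crosses_line(0x17c, 4)` is false. Note that the loop in c.c also adds `i` to the pointer, so only some iterations land on an address that straddles a boundary.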

Everyone confused yet?

So this is weird: why does the "same" code fault on pure arm64 but not in the code translated by Rosetta 2? I tried quite hard to get it to crash, but no luck. I think the reason it doesn't is the ACTLR_EL1_EnTSO flag. It's a flag that makes the CPU use Total Store Order, the memory ordering required by x86_64. This flag is only enabled for processes running Rosetta 2 code. My best guess is that it also changes the semantics of unaligned atomic memory accesses.
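Because the theory hinges on whether code runs under Rosetta 2, a reproducer could log that fact next to each result. This sketch uses the `sysctl.proc_translated` sysctl that Apple documents for this purpose. `process_is_translated` is a hypothetical helper name, and the non-macOS branch only exists so the snippet compiles elsewhere. It says nothing about the TSO flag itself, only about translation.

```c
#include <errno.h>
#include <stddef.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: returns 1 if the current process is translated by
 * Rosetta, 0 if it runs natively, -1 if the query failed unexpectedly. */
static int process_is_translated(void) {
#ifdef __APPLE__
    int translated = 0;
    size_t size = sizeof(translated);
    if (sysctlbyname("sysctl.proc_translated", &translated, &size, NULL, 0) == -1) {
        /* ENOENT means the sysctl does not exist, so there is no Rosetta */
        return errno == ENOENT ? 0 : -1;
    }
    return translated;
#else
    return 0; /* not macOS, so never translated by Rosetta */
#endif
}
```

Printing the result at startup of ua_x86_64 and ua_aarch64 would make the logs self-describing when comparing runs.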

There is a kernel module that can enable it for an arbitrary process, but I haven't gone the extra mile yet to verify whether my theory holds up: https://github.com/saagarjha/TSOEnabler

But even if this confirmed my theory, I'm not sure what to do with the result. The only conclusion I can draw is that the HotSpot snippet from above should never SIGBUS when running with Rosetta 2.

Hum.

// c.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

void please_fault(long *p) {
#ifdef __aarch64__
    int out;
    __asm__ volatile (
            "mov x3, %1\n\t"
            "add x3, x3, #0x148\n\t"
            "mov w0, #0x1\n\t"
            "ldaddal w0, w23, [x3]\n\t"
            "mov %w0, w23\n\t"
            : "=r" (out) /* output, %0 */
            : "r" (p)    /* input, %1 */
            : "3", "0", "23" /* clobbered regs */
            );
#else
    __asm__ volatile (
            "mov %1, %%rbx; "
            "addq $0x148, %%rbx; "
            "movl $0x1, %%eax; "
            "lock; xaddl %%eax,(%%rbx); "
            "mov %%rbx, %0; "
            : "=r" (p) /* output, %0 */
            : "r" (p)  /* input, %1 */
            : "%rbx", "%rax" /* clobbered regs */
            );
#endif
    volatile int hang_in_loop = !!getenv("PLEASE_HANG");
    while (hang_in_loop);
}

int main(void) {
#define ITERS (0x40000 * 4)
    char **ptrs = (char **) malloc(sizeof(char *) * ITERS);
    for (int i = 0; i < ITERS; i++) {
        char *buf = mmap(NULL, 2 * ITERS, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        ptrs[i] = buf;
    }
    for (int i = 0; i < ITERS / 4; i++) {
        char *buf = ptrs[i];
        /* 0x36 + 0x148 = 0x17e , which crosses the cache line boundary of 0x80 for a 4 byte access */
        long *p = (long *)(buf + 0x36 - 0x100 + i);
        fprintf(stderr, "trying %p with i=%d\n", p, i);
        please_fault(p);
    }
    fprintf(stderr, "done\n");
}

# Makefile
SHELL:=bash

all: ua_x86_64 ua_aarch64 runner

ua_x86_64: c.c
	clang -O2 -arch x86_64 -g $< -o $@

ua_aarch64: c.c
	clang -O2 -arch arm64 -g $< -o $@

runner: runner.c
	clang -arch arm64 $< -o $@
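
// runner.c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    if (argc != 2) {
        return 1;
    }
    /* PLEASE_HANG makes the target spin after the atomic, acting as a "breakpoint" */
    char *env[] = { "PLEASE_HANG=1", NULL };
    /* hand the target a proper argv (its own path as argv[0]) instead of NULL */
    execve(argv[1], argv + 1, env);
    /* execve only returns on failure */
    perror("execve");
    return 1;
}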