@airMeng
Last active December 25, 2023 06:57
Debugging Xbyak via GDB.md

The oneDNN team suggests using Intel SDE to inspect JITTed code, as follows.

First, dump the JITTed kernel with the following C++ code:

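#include <cstdio>

// Write the raw machine code of a generated kernel to dump.bin
void dump(const void *code, size_t code_size)
{
    FILE *file = fopen("dump.bin", "wb+");
    if (file) {
        // fwrite returns the number of items written (1 on success here)
        if (fwrite(code, code_size, 1, file) != 1) perror("fwrite");
        fclose(file);
    }
}
// Call this after code generation; kernel is an instance of Xbyak::CodeGenerator
dump(kernel.getCode(), kernel.getSize());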

This produces a binary file named "dump.bin", which you can disassemble with the xed tool shipped with SDE:

path/to/sde/xed64 -64 -ir dump.bin >> assembly.txt
The whole assembly.txt:
XDIS 0: PUSH      BASE       53                       push rbx
XDIS 1: PUSH      BASE       55                       push rbp
XDIS 2: PUSH      BASE       4154                     push r12
XDIS 4: PUSH      BASE       4155                     push r13
XDIS 6: PUSH      BASE       4156                     push r14
XDIS 8: PUSH      BASE       4157                     push r15
XDIS a: DATAXFER  BASE       BD00040000               mov ebp, 0x400
XDIS f: DATAXFER  BASE       4C8B3F                   mov r15, qword ptr [rdi]
XDIS 12: DATAXFER  BASE       4C8B7708                 mov r14, qword ptr [rdi+0x8]
XDIS 16: DATAXFER  BASE       4C8B6F10                 mov r13, qword ptr [rdi+0x10]
XDIS 1a: LOGICAL   AVX512EVEX 62F17D48EFC0             vpxord zmm0, zmm0, zmm0
XDIS 20: LOGICAL   AVX512EVEX 62F15D48EFE4             vpxord zmm4, zmm4, zmm4
XDIS 26: LOGICAL   AVX512EVEX 62513D48EFC0             vpxord zmm8, zmm8, zmm8
XDIS 2c: LOGICAL   AVX512EVEX 62511D48EFE4             vpxord zmm12, zmm12, zmm12
XDIS 32: LOGICAL   AVX512EVEX 62F17548EFC9             vpxord zmm1, zmm1, zmm1
XDIS 38: LOGICAL   AVX512EVEX 62F15548EFED             vpxord zmm5, zmm5, zmm5
XDIS 3e: LOGICAL   AVX512EVEX 62513548EFC9             vpxord zmm9, zmm9, zmm9
XDIS 44: LOGICAL   AVX512EVEX 62511548EFED             vpxord zmm13, zmm13, zmm13
XDIS 4a: LOGICAL   AVX512EVEX 62F16D48EFD2             vpxord zmm2, zmm2, zmm2
XDIS 50: LOGICAL   AVX512EVEX 62F14D48EFF6             vpxord zmm6, zmm6, zmm6
XDIS 56: LOGICAL   AVX512EVEX 62512D48EFD2             vpxord zmm10, zmm10, zmm10
XDIS 5c: LOGICAL   AVX512EVEX 62510D48EFF6             vpxord zmm14, zmm14, zmm14
XDIS 62: DATAXFER  AVX512EVEX 62C17C481006             vmovups zmm16, zmmword ptr [r14]
XDIS 68: DATAXFER  AVX512EVEX 62C17C48104E01           vmovups zmm17, zmmword ptr [r14+0x40]
XDIS 6f: DATAXFER  AVX512EVEX 62C17C48105602           vmovups zmm18, zmmword ptr [r14+0x80]
XDIS 76: BROADCAST AVX512EVEX 62427D48183F             vbroadcastss zmm31, dword ptr [r15]
XDIS 7c: VFMA      AVX512EVEX 62927D40B8C7             vfmadd231ps zmm0, zmm16, zmm31
XDIS 82: VFMA      AVX512EVEX 62927540B8CF             vfmadd231ps zmm1, zmm17, zmm31
XDIS 88: VFMA      AVX512EVEX 62926D40B8D7             vfmadd231ps zmm2, zmm18, zmm31
XDIS 8e: BROADCAST AVX512EVEX 62427D48187F04           vbroadcastss zmm31, dword ptr [r15+0x10]
XDIS 95: VFMA      AVX512EVEX 62927D40B8E7             vfmadd231ps zmm4, zmm16, zmm31
XDIS 9b: VFMA      AVX512EVEX 62927540B8EF             vfmadd231ps zmm5, zmm17, zmm31
XDIS a1: VFMA      AVX512EVEX 62926D40B8F7             vfmadd231ps zmm6, zmm18, zmm31
XDIS a7: BROADCAST AVX512EVEX 62427D48187F08           vbroadcastss zmm31, dword ptr [r15+0x20]
XDIS ae: VFMA      AVX512EVEX 62127D40B8C7             vfmadd231ps zmm8, zmm16, zmm31
XDIS b4: VFMA      AVX512EVEX 62127540B8CF             vfmadd231ps zmm9, zmm17, zmm31
XDIS ba: VFMA      AVX512EVEX 62126D40B8D7             vfmadd231ps zmm10, zmm18, zmm31
XDIS c0: BROADCAST AVX512EVEX 62427D48187F0C           vbroadcastss zmm31, dword ptr [r15+0x30]
XDIS c7: VFMA      AVX512EVEX 62127D40B8E7             vfmadd231ps zmm12, zmm16, zmm31
XDIS cd: VFMA      AVX512EVEX 62127540B8EF             vfmadd231ps zmm13, zmm17, zmm31
XDIS d3: VFMA      AVX512EVEX 62126D40B8F7             vfmadd231ps zmm14, zmm18, zmm31
XDIS d9: DATAXFER  AVX512EVEX 62C17C48104603           vmovups zmm16, zmmword ptr [r14+0xc0]
XDIS e0: DATAXFER  AVX512EVEX 62C17C48104E04           vmovups zmm17, zmmword ptr [r14+0x100]
XDIS e7: DATAXFER  AVX512EVEX 62C17C48105605           vmovups zmm18, zmmword ptr [r14+0x140]
XDIS ee: BROADCAST AVX512EVEX 62427D48187F01           vbroadcastss zmm31, dword ptr [r15+0x4]
XDIS f5: VFMA      AVX512EVEX 62927D40B8C7             vfmadd231ps zmm0, zmm16, zmm31
XDIS fb: VFMA      AVX512EVEX 62927540B8CF             vfmadd231ps zmm1, zmm17, zmm31
XDIS 101: VFMA      AVX512EVEX 62926D40B8D7             vfmadd231ps zmm2, zmm18, zmm31
XDIS 107: BROADCAST AVX512EVEX 62427D48187F05           vbroadcastss zmm31, dword ptr [r15+0x14]
XDIS 10e: VFMA      AVX512EVEX 62927D40B8E7             vfmadd231ps zmm4, zmm16, zmm31
XDIS 114: VFMA      AVX512EVEX 62927540B8EF             vfmadd231ps zmm5, zmm17, zmm31
XDIS 11a: VFMA      AVX512EVEX 62926D40B8F7             vfmadd231ps zmm6, zmm18, zmm31
XDIS 120: BROADCAST AVX512EVEX 62427D48187F09           vbroadcastss zmm31, dword ptr [r15+0x24]
XDIS 127: VFMA      AVX512EVEX 62127D40B8C7             vfmadd231ps zmm8, zmm16, zmm31
XDIS 12d: VFMA      AVX512EVEX 62127540B8CF             vfmadd231ps zmm9, zmm17, zmm31
XDIS 133: VFMA      AVX512EVEX 62126D40B8D7             vfmadd231ps zmm10, zmm18, zmm31
XDIS 139: BROADCAST AVX512EVEX 62427D48187F0D           vbroadcastss zmm31, dword ptr [r15+0x34]
XDIS 140: VFMA      AVX512EVEX 62127D40B8E7             vfmadd231ps zmm12, zmm16, zmm31
XDIS 146: VFMA      AVX512EVEX 62127540B8EF             vfmadd231ps zmm13, zmm17, zmm31
XDIS 14c: VFMA      AVX512EVEX 62126D40B8F7             vfmadd231ps zmm14, zmm18, zmm31
XDIS 152: DATAXFER  AVX512EVEX 62C17C48104606           vmovups zmm16, zmmword ptr [r14+0x180]
XDIS 159: DATAXFER  AVX512EVEX 62C17C48104E07           vmovups zmm17, zmmword ptr [r14+0x1c0]
XDIS 160: DATAXFER  AVX512EVEX 62C17C48105608           vmovups zmm18, zmmword ptr [r14+0x200]
XDIS 167: BROADCAST AVX512EVEX 62427D48187F02           vbroadcastss zmm31, dword ptr [r15+0x8]
XDIS 16e: VFMA      AVX512EVEX 62927D40B8C7             vfmadd231ps zmm0, zmm16, zmm31
XDIS 174: VFMA      AVX512EVEX 62927540B8CF             vfmadd231ps zmm1, zmm17, zmm31
XDIS 17a: VFMA      AVX512EVEX 62926D40B8D7             vfmadd231ps zmm2, zmm18, zmm31
XDIS 180: BROADCAST AVX512EVEX 62427D48187F06           vbroadcastss zmm31, dword ptr [r15+0x18]
XDIS 187: VFMA      AVX512EVEX 62927D40B8E7             vfmadd231ps zmm4, zmm16, zmm31
XDIS 18d: VFMA      AVX512EVEX 62927540B8EF             vfmadd231ps zmm5, zmm17, zmm31
XDIS 193: VFMA      AVX512EVEX 62926D40B8F7             vfmadd231ps zmm6, zmm18, zmm31
XDIS 199: BROADCAST AVX512EVEX 62427D48187F0A           vbroadcastss zmm31, dword ptr [r15+0x28]
XDIS 1a0: VFMA      AVX512EVEX 62127D40B8C7             vfmadd231ps zmm8, zmm16, zmm31
XDIS 1a6: VFMA      AVX512EVEX 62127540B8CF             vfmadd231ps zmm9, zmm17, zmm31
XDIS 1ac: VFMA      AVX512EVEX 62126D40B8D7             vfmadd231ps zmm10, zmm18, zmm31
XDIS 1b2: BROADCAST AVX512EVEX 62427D48187F0E           vbroadcastss zmm31, dword ptr [r15+0x38]
XDIS 1b9: VFMA      AVX512EVEX 62127D40B8E7             vfmadd231ps zmm12, zmm16, zmm31
XDIS 1bf: VFMA      AVX512EVEX 62127540B8EF             vfmadd231ps zmm13, zmm17, zmm31
XDIS 1c5: VFMA      AVX512EVEX 62126D40B8F7             vfmadd231ps zmm14, zmm18, zmm31
XDIS 1cb: DATAXFER  AVX512EVEX 62C17C48104609           vmovups zmm16, zmmword ptr [r14+0x240]
XDIS 1d2: DATAXFER  AVX512EVEX 62C17C48104E0A           vmovups zmm17, zmmword ptr [r14+0x280]
XDIS 1d9: DATAXFER  AVX512EVEX 62C17C4810560B           vmovups zmm18, zmmword ptr [r14+0x2c0]
XDIS 1e0: BROADCAST AVX512EVEX 62427D48187F03           vbroadcastss zmm31, dword ptr [r15+0xc]
XDIS 1e7: VFMA      AVX512EVEX 62927D40B8C7             vfmadd231ps zmm0, zmm16, zmm31
XDIS 1ed: VFMA      AVX512EVEX 62927540B8CF             vfmadd231ps zmm1, zmm17, zmm31
XDIS 1f3: VFMA      AVX512EVEX 62926D40B8D7             vfmadd231ps zmm2, zmm18, zmm31
XDIS 1f9: BROADCAST AVX512EVEX 62427D48187F07           vbroadcastss zmm31, dword ptr [r15+0x1c]
XDIS 200: VFMA      AVX512EVEX 62927D40B8E7             vfmadd231ps zmm4, zmm16, zmm31
XDIS 206: VFMA      AVX512EVEX 62927540B8EF             vfmadd231ps zmm5, zmm17, zmm31
XDIS 20c: VFMA      AVX512EVEX 62926D40B8F7             vfmadd231ps zmm6, zmm18, zmm31
XDIS 212: BROADCAST AVX512EVEX 62427D48187F0B           vbroadcastss zmm31, dword ptr [r15+0x2c]
XDIS 219: VFMA      AVX512EVEX 62127D40B8C7             vfmadd231ps zmm8, zmm16, zmm31
XDIS 21f: VFMA      AVX512EVEX 62127540B8CF             vfmadd231ps zmm9, zmm17, zmm31
XDIS 225: VFMA      AVX512EVEX 62126D40B8D7             vfmadd231ps zmm10, zmm18, zmm31
XDIS 22b: BROADCAST AVX512EVEX 62427D48187F0F           vbroadcastss zmm31, dword ptr [r15+0x3c]
XDIS 232: VFMA      AVX512EVEX 62127D40B8E7             vfmadd231ps zmm12, zmm16, zmm31
XDIS 238: VFMA      AVX512EVEX 62127540B8EF             vfmadd231ps zmm13, zmm17, zmm31
XDIS 23e: VFMA      AVX512EVEX 62126D40B8F7             vfmadd231ps zmm14, zmm18, zmm31
XDIS 244: DATAXFER  AVX512EVEX 62D17C48114500           vmovups zmmword ptr [r13], zmm0
XDIS 24b: DATAXFER  AVX512EVEX 62D17C48116503           vmovups zmmword ptr [r13+0xc0], zmm4
XDIS 252: DATAXFER  AVX512EVEX 62517C48114506           vmovups zmmword ptr [r13+0x180], zmm8
XDIS 259: DATAXFER  AVX512EVEX 62517C48116509           vmovups zmmword ptr [r13+0x240], zmm12
XDIS 260: DATAXFER  AVX512EVEX 62D17C48114D01           vmovups zmmword ptr [r13+0x40], zmm1
XDIS 267: DATAXFER  AVX512EVEX 62D17C48116D04           vmovups zmmword ptr [r13+0x100], zmm5
XDIS 26e: DATAXFER  AVX512EVEX 62517C48114D07           vmovups zmmword ptr [r13+0x1c0], zmm9
XDIS 275: DATAXFER  AVX512EVEX 62517C48116D0A           vmovups zmmword ptr [r13+0x280], zmm13
XDIS 27c: DATAXFER  AVX512EVEX 62D17C48115502           vmovups zmmword ptr [r13+0x80], zmm2
XDIS 283: DATAXFER  AVX512EVEX 62D17C48117505           vmovups zmmword ptr [r13+0x140], zmm6
XDIS 28a: DATAXFER  AVX512EVEX 62517C48115508           vmovups zmmword ptr [r13+0x200], zmm10
XDIS 291: DATAXFER  AVX512EVEX 62517C4811750B           vmovups zmmword ptr [r13+0x2c0], zmm14
XDIS 298: POP       BASE       415F                     pop r15
XDIS 29a: POP       BASE       415E                     pop r14
XDIS 29c: POP       BASE       415D                     pop r13
XDIS 29e: POP       BASE       415C                     pop r12
XDIS 2a0: POP       BASE       5D                       pop rbp
XDIS 2a1: POP       BASE       5B                       pop rbx
XDIS 2a2: AVX       AVX        C5F877                   vzeroupper
XDIS 2a5: RET       BASE       C3                       ret
# end of text section.
# Errors: 0
#XED3 DECODE STATS
#Total DECODE cycles:        29220
#Total instructions DECODE: 68
#Total tail DECODE cycles:        236418
#Total tail instructions DECODE: 118
#Total cycles/instruction DECODE: 429.71
#Total tail cycles/instruction DECODE: 2003.54

However, this method requires a deep understanding of assembly, and beginners will find it hard to simulate the running code in their heads. Here I introduce how to debug Xbyak code using GDB, just like any C++ or Python program.

How to debug during JITTed kernel generation

Take a naive program as an example, and suppose the compiled binary is ./toy. Note: don't forget to pass -g when building.

  1 #include <xbyak/xbyak_util.h>
  2
  3 struct Code : public Xbyak::CodeGenerator {
  4     Code()
  5     {
  6         // xbyak also provides helpers such as StackFrame;
  7         // see xbyak/sample/sf_test.cpp for how to use its other parameters
  8         // Xbyak::util::StackFrame sf(this, 4);
  9         sub(rsp, 256);
 10         mov(eax, ptr[rdi + 4]);           // rdi holds the 1st argument (System V AMD64 ABI): a pointer, so load the int a[1] through it
 11         movsxd(rax, eax);                 // sign-extend the loaded 32-bit integer to 64 bits (mov rax, eax is not a valid encoding)
 12         add(rax, rsi);                    // rsi holds the 2nd argument
 13         add(rax, rdx);                    // rdx holds the 3rd argument
 14         mov(ptr[rcx], eax);               // rcx holds the 4th argument; store only 32 bits because res is an int
 15         add(rsp, 256);
 16         ret();
 17     }
 18 };
 19
 20 int main()
 21 {
 22     Code c;
 23     int* a = (int*) malloc(2 * sizeof(int));
 24     a[0] = 3;
 25     a[1] = 4;
 26     int res;
 27     void (*f)(int*, int, int, int*) = c.getCode<void(*) (int*, int, int, int*)>();
 28     f(a, 5, 2, &res);
 29     if (res == 4 + 5 + 2) {
 30         puts("ok");
 31     } else {
 32         printf("res = %d\n", res);
 33         puts("ng");
 34     }
 35 }

I suggest running GDB with the --tui option; you can build GDB from source or install it directly via conda.

gdb --tui ./toy
...
(gdb) b 11
(gdb) r
(gdb) x/2i this->top_

Here we set a breakpoint at line 11, right after the first two instructions of the constructor have been emitted. The key command is x/2i this->top_: top_ is the start address of the generated kernel, and x/2i disassembles the 2 instructions starting at that address. For details on x/, see the GDB documentation. Here we get:

(gdb) x/2i this->top_
+x/2i this->top_
   0x7ffff7ff9000:      sub    rsp,0x2000
   0x7ffff7ff9007:      mov    eax,DWORD PTR [rdi+0x4]

That is exactly what we want to generate!

How to debug while the JITTed kernel is running

As we know, Xbyak returns the JITTed kernel as a function pointer, so the question is where to find the address of this function and how to debug its assembly. Luckily we know where the function is called: line 28. We can dive into the assembly from there:

(gdb) b 28
(gdb) c
(gdb) layout asm

The output will be

[image: GDB TUI showing the disassembly of main() at the breakpoint on line 28]

From here we can continue debugging just like a C++ program, except that next becomes nexti and step becomes stepi at the assembly level. Moreover, since the JITTed kernel is a nameless function, it has to be invoked through an indirect call, and here it is: 0x40270f <main()+200> call r8. Let's go on:

(gdb) b *0x40270f
(gdb) c
(gdb) stepi

Now we are inside the JITTed kernel and can step through it one instruction at a time:

[image: GDB TUI showing the disassembly of the JITTed kernel]

Note that we can dump a register's value and watch how it changes:

(gdb) nexti
(gdb) i r rax
+i r rax
rax            0x4                 4