@dmpots
Created May 1, 2025 17:20
Notes on debugging flaky TestDAP_exception_cpp.py

Debugging a non-deterministic failure of TestDAP_exception_cpp.py

I am debugging a non-deterministic failure of the TestDAP_exception_cpp.py test. The tested binary throws an uncaught exception, which causes the C++ runtime to call std::terminate. That eventually calls abort, which then raises SIGABRT, which is caught by lldb-dap.

The test then makes a DAP request for information about the exception. It fails intermittently because the returned info is sometimes missing the "details" part of the record.

self.assertIsNotNone(exceptionInfo["details"])
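For context, here is a rough sketch of what that assertion checks. In the Debug Adapter Protocol, the body of an exceptionInfo response has required exceptionId and breakMode fields and optional description and details fields. The field values below are illustrative placeholders rather than output captured from lldb-dap, and has_details is a hypothetical helper that is not part of the test suite.

```python
# Illustrative exceptionInfo response bodies. Field names follow the DAP
# specification; the values are made-up placeholders, not real lldb-dap output.
passing_body = {
    "exceptionId": "signal",
    "breakMode": "always",
    "description": "signal SIGABRT",
    "details": {"message": "placeholder message"},
}

# Same stop, but the adapter could not produce the optional "details" object.
failing_body = {
    "exceptionId": "signal",
    "breakMode": "always",
    "description": "signal SIGABRT",
}


def has_details(body):
    """Return True when the optional "details" object is present and non-null."""
    return body.get("details") is not None


print(has_details(passing_body))
print(has_details(failing_body))
```

The failing runs correspond to the second shape: the stop is reported, but the optional details object is absent.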

The DAP exception info request eventually calls into ItaniumABILanguageRuntime::GetExceptionObjectForThread to get the details of the exception.

That function looks up the __cxa_current_exception_type function and calls ExecuteFunction to get the result.

modules.FindSymbolsWithNameAndType(
    ConstString("__cxa_current_exception_type"), eSymbolTypeCode, contexts);
...
func_call_ret = function_caller->ExecuteFunction(exe_ctx, nullptr, options,
                                                  diagnostics, results);

The __cxa_current_exception_type() function looks up the exception record in thread-local storage via the __cxa_get_globals function. Most of the time the call to __cxa_current_exception_type() returns a valid exception type. But occasionally it returns nullptr, which eventually causes exceptionInfo["details"] to be missing.

I was able to catch an instance of __cxa_current_exception_type() returning nullptr by debugging the test executable with lldb. The interesting thing is that the result depends on the frame in which I evaluate the function call: from some frames it returns nullptr, and from others it returns a valid exception type.

(lldb) bt
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
  * frame #0: 0x00007ffff7828760 libc.so.6`abort
    frame #1: 0x00007ffff7eaf006 libc++abi.so.1`__abort_message(format="terminating due to %s exception of type %s: %s") at abort_message.cpp:66:5
    frame #2: 0x00007ffff7e90a99 libc++abi.so.1`demangling_terminate_handler() at cxa_default_handlers.cpp:72:9
    frame #3: 0x00007ffff7eae2f3 libc++abi.so.1`std::__terminate(func=<unavailable>) at cxa_handlers.cpp:59:9
    frame #4: 0x00007ffff7eb0fb6 libc++abi.so.1`__cxxabiv1::failed_throw(exception_header=0x000055555556aeb0) at cxa_exception.cpp:152:5
    frame #5: 0x00007ffff7eb0fa0 libc++abi.so.1`__cxa_throw(thrown_object=0x000055555556af30, tinfo=0x00007ffff7eb6580, dest=<unavailable>) at cxa_exception.cpp:299:5
    frame #6: 0x00005555555551cf a.out`main(argc=1, argv=0x00007fffffffd638) at main.cpp:4:3
    frame #7: 0x00007ffff78295d0 libc.so.6`__libc_start_call_main + 128
    frame #8: 0x00007ffff7829680 libc.so.6`__libc_start_main@@GLIBC_2.34 + 128
    frame #9: 0x00005555555550b5 a.out`_start + 37

(lldb) frame select 0
frame #0: 0x00007ffff7828760 libc.so.6`abort
libc.so.6`abort:
->  0x7ffff7828760 <+0>:  endbr64
    0x7ffff7828764 <+4>:  pushq  %rbp
    0x7ffff7828765 <+5>:  leaq   0x1d2724(%rip), %rbp ; lock
    0x7ffff782876c <+12>: pushq  %rbx
(lldb) e __cxa_current_exception_type()
(std::type_info *) $16 = nullptr

(lldb) frame select 1
frame #1: 0x00007ffff7eaf006 libc++abi.so.1`__abort_message(format="terminating due to %s exception of type %s: %s") at abort_message.cpp:66:5
   63       closelog();
   64   #endif // __BIONIC__
   65
-> 66       abort();
   67   }
(lldb) e __cxa_current_exception_type()
(__cxxabiv1::__si_class_type_info *) $17 = 0x00007ffff7eb6580

I am trying to understand how the selected frame impacts the evaluation of __cxa_current_exception_type() and whether lldb is expected to handle TLS correctly.

dmpots commented May 3, 2025

Ok, I think I understand the problem now. Thanks to @clayborg for pointing me in the right direction!

The problem is that the __cxa_current_exception_type symbol is defined in two different modules. In the broken case lldb picks the "wrong" one for the expression evaluation (from libstdc++.so) and we get nullptr as the result. In the working case it picks the right one from libc++abi.so and we get the pointer to the exception type as expected.

The parallel module loading comes into play by changing the order in which we load the modules, which then results in lldb resolving the symbol to different locations.

The broken case looks like this:

(lldb) image lookup -n __cxa_current_exception_type
2 matches found in /lib64/libstdc++.so.6:
        Address: libstdc++.so.6[0x00000000000ad880] (libstdc++.so.6.PT_LOAD[1]..text + 50352)
        Summary: libstdc++.so.6`__cxa_current_exception_type
        Address: libstdc++.so.6[0x00000000000ad880] (libstdc++.so.6.PT_LOAD[1]..text + 50352)
        Summary: libstdc++.so.6`__cxa_current_exception_type
1 match found in /my/build/dir/libc++abi.so.1:
        Address: libc++abi.so.1[0x000000000003e0d0] (libc++abi.so.1.PT_LOAD[1]..text + 133216)
        Summary: libc++abi.so.1`::__cxa_current_exception_type() at cxa_exception.cpp:600

The working case looks like this:

(lldb) image lookup -n __cxa_current_exception_type
1 match found in /my/build/dir/libc++abi.so.1:
        Address: libc++abi.so.1[0x000000000003e0d0] (libc++abi.so.1.PT_LOAD[1]..text + 133216)
        Summary: libc++abi.so.1`::__cxa_current_exception_type() at cxa_exception.cpp:600
2 matches found in /lib64/libstdc++.so.6:
        Address: libstdc++.so.6[0x00000000000ad880] (libstdc++.so.6.PT_LOAD[1]..text + 50352)
        Summary: libstdc++.so.6`__cxa_current_exception_type
        Address: libstdc++.so.6[0x00000000000ad880] (libstdc++.so.6.PT_LOAD[1]..text + 50352)
        Summary: libstdc++.so.6`__cxa_current_exception_type

In the working case the symbol from the locally built libc++abi library comes first.

This also explains why changing the frame makes the expression evaluation work correctly. My understanding is that to resolve the symbol for expression evaluation lldb uses a few heuristics. It first looks for the symbol in the module of the frame where we are located. If it does not find it there it goes through the module list to look for a definition and takes the first one it finds.
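The heuristic described above can be modeled in a few lines. This is a toy sketch of my understanding, not lldb's actual implementation: resolve_symbol, the symbols table, and the positions of a.out and libc.so.6 in the module lists are all assumptions made for illustration. Only the relative order of libstdc++.so.6 and libc++abi.so.1 is taken from the image lookup output.

```python
def resolve_symbol(name, frame_module, module_list, symbols):
    """Toy model: prefer the selected frame's module, else first match in list order.

    symbols maps a module name to the set of symbol names it defines.
    """
    if name in symbols.get(frame_module, set()):
        return frame_module
    for module in module_list:
        if name in symbols.get(module, set()):
            return module
    return None


SYM = "__cxa_current_exception_type"
symbols = {
    "libstdc++.so.6": {SYM},
    "libc++abi.so.1": {SYM},
    "libc.so.6": {"abort"},
    "a.out": {"main"},
}

# Hypothetical module list orders; only the relative order of the two C++
# runtime libraries matters here.
broken_order = ["a.out", "libc.so.6", "libstdc++.so.6", "libc++abi.so.1"]
working_order = ["a.out", "libc.so.6", "libc++abi.so.1", "libstdc++.so.6"]

# Frame in libc.so.6, which has no definition: list order decides.
print(resolve_symbol(SYM, "libc.so.6", broken_order, symbols))   # libstdc++.so.6
print(resolve_symbol(SYM, "libc.so.6", working_order, symbols))  # libc++abi.so.1
# Frame in libc++abi.so.1: the frame's module wins regardless of list order.
print(resolve_symbol(SYM, "libc++abi.so.1", broken_order, symbols))  # libc++abi.so.1
```

In this model the result only depends on module order when the selected frame's module has no definition of its own, which matches what the lldb sessions show.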

Given our backtrace

* thread #1, name = 'a.out', stop reason = breakpoint 1.1
  * frame #0: 0x00007ffff7828760 libc.so.6`abort
    frame #1: 0x00007ffff7eaf006 libc++abi.so.1`__abort_message
    ...
    frame #9: 0x00005555555550b5 a.out`_start + 37

We see that we start in the libc.so module. So the expression evaluation picks the first symbol it finds for __cxa_current_exception_type, which happens to be in the libstdc++.so module in the broken case.

(lldb) e __cxa_current_exception_type()
(std::type_info *) $7 = nullptr

When we go up one frame, we land in the libc++abi.so module, so lldb picks the symbol it finds in that module:

(lldb) frame sel 1
frame #1: 0x00007ffff7eaf006 libc++abi.so.1`__abort_message
(lldb) e __cxa_current_exception_type()
(__cxxabiv1::__si_class_type_info *) $8 = 0x00007ffff7eb6580

But if we go higher up, out of that module, it again defaults to the first symbol in the module list:

(lldb) frame select 9
frame #9: 0x00005555555550b5 a.out`_start + 37
(lldb) e __cxa_current_exception_type()
(std::type_info *) $9 = nullptr

It looks bad that the program links two different C++ standard libraries, but it does seem to work correctly. I believe we could construct an example that simply links two different modules with private definitions of some foo function and see the same confusing behavior.

As for the lldb issue, I am not sure what the correct fix should be.

clayborg commented May 9, 2025

One question I have is: should we be loading the libraries in the exact order that the dynamic loader loads the libraries in the POSIX DYLD? This would be nice to ensure consistent ordering of modules.

Michael137 commented May 21, 2025

Slightly tangential, but I added a way to specify a "preferred module" during expression evaluation. See llvm/llvm-project#129733 (I haven't exposed it in the CLI yet, but that shouldn't be much work).

We were running into similar non-determinism in the way that ASAN libraries got loaded. So that might be a stop-gap if needed. The expression evaluator can't always know what function the user intended to call, especially if it's cross-module and there are multiple definitions. Explicitly specifying the module to look in seems like a reasonable option.

dmpots commented May 21, 2025

Slightly tangential, but I added a way to specify a "preferred module" during expression evaluation. See llvm/llvm-project#129733 (I haven't exposed it in the CLI yet, but that shouldn't be much work).

Oh, thanks for pointing that out. It looks like we could use something similar when doing the lookup for the __cxa_current_exception_type symbol.
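A minimal sketch of the "preferred module" idea applied to this lookup. This is not the API added in llvm/llvm-project#129733; pick_definition and its parameters are hypothetical and only model the behavior: candidates from a preferred module win, and otherwise the first candidate in module list order is used. The addresses are the file addresses from the image lookup output.

```python
def pick_definition(candidates, preferred_modules=()):
    """Return the first candidate from a preferred module, else the first candidate.

    candidates is a list of (module_name, file_address) in module list order.
    """
    for preferred in preferred_modules:
        for module, address in candidates:
            if module == preferred:
                return module, address
    return candidates[0] if candidates else None


# Candidates in the order seen in the broken case.
candidates = [
    ("libstdc++.so.6", 0xAD880),
    ("libc++abi.so.1", 0x3E0D0),
]

print(pick_definition(candidates))
print(pick_definition(candidates, preferred_modules=["libc++abi.so.1"]))
```

With a preference supplied, the result no longer depends on the order of the module list, but the caller still has to know which module is the right one.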

The problem is that I'm not sure how to pick the "right" module in this case. It also bothers me that we are loading two different C++ libraries here. We might be able to just clean up the linker command so that it links only one.

Also, to close the loop here a bit I created a post to discuss whether we should enforce a deterministic module order in the lldb modules list.
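To illustrate why parallel loading can reorder the list, and one way to keep the order stable, here is a small sketch. It does not reflect how lldb actually loads modules: load_module, the delays, and loader_order are assumptions for illustration. The first variant records modules as they finish loading, so the order depends on timing. The second still loads in parallel but collects results in submission order.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_module(name, delay):
    """Stand-in for loading a module; the delay mimics uneven load times."""
    time.sleep(delay)
    return name


# Hypothetical loader order and load times (the first module is the slowest).
loader_order = [("libc++abi.so.1", 0.2), ("libstdc++.so.6", 0.0)]

with ThreadPoolExecutor(max_workers=len(loader_order)) as pool:
    # Appending modules as they finish loading: the order depends on timing.
    futures = [pool.submit(load_module, name, delay) for name, delay in loader_order]
    completion_order = [f.result() for f in as_completed(futures)]

    # Loading in parallel but keeping the submission order: deterministic.
    futures = [pool.submit(load_module, name, delay) for name, delay in loader_order]
    stable_order = [f.result() for f in futures]

print(completion_order)
print(stable_order)
```

The second variant keeps the benefit of parallel loading while making the final list independent of which load happens to finish first.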
