I am debugging a non-deterministic failure of the TestDAP_exception_cpp.py test. The tested binary throws an uncaught exception, which causes the C++ runtime to call std::terminate. That eventually calls abort, which raises SIGABRT, which is then caught by lldb-dap.
The test then makes a DAP request to get information about the exception. The test fails intermittently because the returned info is sometimes missing the "details" part of the record, so this assertion fails:
self.assertIsNotNone(exceptionInfo["details"])
The DAP exception info request eventually calls into ItaniumABILanguageRuntime::GetExceptionObjectForThread to get the details of the exception.
That function looks up the __cxa_current_exception_type function and calls ExecuteFunction to get the result.
modules.FindSymbolsWithNameAndType(
ConstString("__cxa_current_exception_type"), eSymbolTypeCode, contexts);
...
func_call_ret = function_caller->ExecuteFunction(exe_ctx, nullptr, options,
diagnostics, results);
The __cxa_current_exception_type() function looks up the exception record in thread-local storage via the __cxa_get_globals function. Most of the time the call to __cxa_current_exception_type() returns a valid exception type. But occasionally it returns nullptr, which eventually causes exceptionInfo["details"] to be missing.
I was able to catch an instance of __cxa_current_exception_type() returning nullptr by debugging the test executable with lldb. The interesting thing is that the result depends on the selected frame: evaluated in frame #0 the call returns nullptr, but evaluated in frame #1 it returns a valid exception type.
(lldb) bt
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
* frame #0: 0x00007ffff7828760 libc.so.6`abort
frame #1: 0x00007ffff7eaf006 libc++abi.so.1`__abort_message(format="terminating due to %s exception of type %s: %s") at abort_message.cpp:66:5
frame #2: 0x00007ffff7e90a99 libc++abi.so.1`demangling_terminate_handler() at cxa_default_handlers.cpp:72:9
frame #3: 0x00007ffff7eae2f3 libc++abi.so.1`std::__terminate(func=<unavailable>) at cxa_handlers.cpp:59:9
frame #4: 0x00007ffff7eb0fb6 libc++abi.so.1`__cxxabiv1::failed_throw(exception_header=0x000055555556aeb0) at cxa_exception.cpp:152:5
frame #5: 0x00007ffff7eb0fa0 libc++abi.so.1`__cxa_throw(thrown_object=0x000055555556af30, tinfo=0x00007ffff7eb6580, dest=<unavailable>) at cxa_exception.cpp:299:5
frame #6: 0x00005555555551cf a.out`main(argc=1, argv=0x00007fffffffd638) at main.cpp:4:3
frame #7: 0x00007ffff78295d0 libc.so.6`__libc_start_call_main + 128
frame #8: 0x00007ffff7829680 libc.so.6`__libc_start_main@@GLIBC_2.34 + 128
frame #9: 0x00005555555550b5 a.out`_start + 37
(lldb) frame select 0
frame #0: 0x00007ffff7828760 libc.so.6`abort
libc.so.6`abort:
-> 0x7ffff7828760 <+0>: endbr64
0x7ffff7828764 <+4>: pushq %rbp
0x7ffff7828765 <+5>: leaq 0x1d2724(%rip), %rbp ; lock
0x7ffff782876c <+12>: pushq %rbx
(lldb) e __cxa_current_exception_type()
(std::type_info *) $16 = nullptr
(lldb) frame select 1
frame #1: 0x00007ffff7eaf006 libc++abi.so.1`__abort_message(format="terminating due to %s exception of type %s: %s") at abort_message.cpp:66:5
63 closelog();
64 #endif // __BIONIC__
65
-> 66 abort();
67 }
(lldb) e __cxa_current_exception_type()
(__cxxabiv1::__si_class_type_info *) $17 = 0x00007ffff7eb6580
I am trying to understand how the selected frame affects the evaluation of __cxa_current_exception_type() and whether lldb is expected to handle TLS correctly.
Ok, I think I understand the problem now. Thanks to @clayborg for pointing me in the right direction!
The problem is that the __cxa_current_exception_type symbol is defined in two different modules. In the broken case lldb picks the "wrong" one for the expression evaluation (from libstdc++.so) and we get nullptr as the result. In the working case it picks the right one from libc++abi.so and we get the pointer to the exception type as expected.
The parallel module loading comes into play by changing the order in which we load the modules, which then results in lldb resolving the symbol to different locations.
The broken case looks like this
The working case looks like this
In the working case the symbol from the locally built libc++abi library comes first.
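To check which definition comes first in a given session, one option is to ask the target for every match. The helper below is only a sketch for lldb's embedded Python interpreter. The name definitions_in_order is hypothetical, and it assumes that SBTarget.FindSymbols returns matches in the same order as the target's module list, which is not confirmed here.

```python
def definitions_in_order(target, name, symbol_type):
    """Return (module filename, load address) for every symbol called `name`.

    `target` is expected to be an lldb.SBTarget and `symbol_type` a value
    such as lldb.eSymbolTypeCode. The lldb module is deliberately not
    imported here, so the caller passes both in.
    """
    matches = target.FindSymbols(name, symbol_type)  # SBSymbolContextList
    found = []
    for i in range(matches.GetSize()):
        ctx = matches.GetContextAtIndex(i)
        module_name = ctx.GetModule().GetFileSpec().GetFilename()
        load_addr = ctx.GetSymbol().GetStartAddress().GetLoadAddress(target)
        found.append((module_name, load_addr))
    return found
```

From the lldb "script" prompt this could be called as definitions_in_order(lldb.target, "__cxa_current_exception_type", lldb.eSymbolTypeCode). If the assumption about ordering holds, the first entry is the definition that the fallback lookup would pick.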
This also explains why changing the frame makes the expression evaluation work correctly. My understanding is that to resolve the symbol for expression evaluation lldb uses a few heuristics. It first looks for the symbol in the module of the frame where we are located. If it does not find it there it goes through the module list to look for a definition and takes the first one it finds.
Given our backtrace
We see that we start in the libc.so module, which does not define the symbol. So the expression evaluation picks the first symbol it finds for __cxa_current_exception_type, which happens to be in the libstdc++.so module in the broken case.
When we go up one frame, we land in the libc++abi.so module, so lldb picks the symbol it finds in that module.
But if we go higher up, out of that module, it will again default to the first symbol.
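The lookup order described above can be modeled in a few lines of Python. This is a toy model of my understanding of the heuristic, not lldb's actual implementation. The function resolve_symbol and both module lists are hypothetical, and the real module lists are longer and may be ordered differently.

```python
SYMBOL = "__cxa_current_exception_type"

def resolve_symbol(name, frame_module, modules):
    """Pick the module whose definition of `name` would be used.

    `modules` is a list of (module name, set of defined symbols) in load
    order. The selected frame's module is preferred, otherwise the first
    module in the list that defines the symbol wins.
    """
    for module_name, symbols in modules:
        if module_name == frame_module and name in symbols:
            return module_name
    for module_name, symbols in modules:
        if name in symbols:
            return module_name
    return None

# Hypothetical load orders: only the relative position of the two
# libraries that define SYMBOL differs.
broken_order = [
    ("a.out", {"main"}),
    ("libstdc++.so", {SYMBOL}),
    ("libc++abi.so", {SYMBOL, "__cxa_throw"}),
    ("libc.so", {"abort"}),
]
working_order = [
    ("a.out", {"main"}),
    ("libc++abi.so", {SYMBOL, "__cxa_throw"}),
    ("libstdc++.so", {SYMBOL}),
    ("libc.so", {"abort"}),
]

# Walk the frames from the backtrace: abort (libc.so), __abort_message
# (libc++abi.so) and main (a.out).
for frame_module in ("libc.so", "libc++abi.so", "a.out"):
    print(frame_module, "->", resolve_symbol(SYMBOL, frame_module, broken_order))
```

In this model only the frames inside libc++abi.so resolve to the libc++abi.so definition in the broken order, while every frame resolves to it in the working order.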
It looks bad that the program is linking two different C++ standard libraries, but it does seem to work correctly. I believe we could construct an example that simply links two different modules with private definitions of some foo function and see the same confusing behavior.
As for the lldb issue, I am not sure what the correct fix should be.