dannas/debugging_optimized_code.md

## debugging_optimized_code.md

      
My background: I'm not an expert on debuggers or embedded systems.
Scope: the GNU toolchain. What code does GCC generate, and how does GDB interpret it?
End goal: an embedded developer should have an intuition for how the debugger presents call flow, control flow, and data flow.
Prior Art

Many people have described DWARF and how to debug optimized code, but I haven't found an article that gives
practical examples of debugging sessions with optimized code.

Michael J. Eager's Introduction to the DWARF Debugging Format
The DWARF Debugging Information Format Version 5
Li, Y., Ding, S., Zhang, Q., & Italiano, D. (2020, June). Debug information validation for optimized code.
In PLDI (pp. 1052-1065).
Hennessy, John. "Symbolic debugging of optimized code." ACM Transactions on Programming Languages and Systems (TOPLAS) 4.3 (1982): 323-344
Samy Al Bahra's !!Con 2016 presentation Debugging Debuggers
Eli Bendersky's 6-part series on debuggers
Djordje Todorovic's Triplefault presentation about recovering optimized-out variables by finding parameter values in the parent frame
Alexandre Oliva's GCC gOlogy: studying the impact of optimizations on debugging
Greg Law's CppCon presentation Under the hood of Linux C++ debugging tools
How to Update Debug Info: A Guide for LLVM Pass Authors

Approach


Use a common library such as newlib or libopencm3 and inspect various functions with
optimizations disabled and enabled.
Present the internals of DWARF in a readable format using Eli Bendersky's pyelftools.

Questions


What optimization level should I choose? Should I discuss anything besides -Os?
A great advantage of -Os is that the generated code is far easier for a human to read than with -O2.
Should I run the code on target (a Nucleo board), in Renode, or just in Compiler Explorer?
How much knowledge of assembly should I expect from readers?
What embedded-specific things are worth mentioning?

DWARF


What is the overall structure of the debug information in an ELF file?
A tree of Debugging Information Entries (DIEs)
Try not to get too bogged down in the internals of the file format.
Touched on in https://interrupt.memfault.com/blog/gnu-binutils#dumping-dwarf-information
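A toy model of that tree may help before looking at real dumps. This is a minimal sketch and not how pyelftools or GDB actually store DIEs. Each node keeps a tag, a name, and first-child/next-sibling links. That is a common in-memory representation; in the file itself, children simply follow their parent and the list of children ends with a null entry. The tag values are the ones defined by the DWARF specification. The `add.c` tree and the `dump_die` helper are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Tag values as defined by the DWARF specification. */
enum {
    DW_TAG_formal_parameter = 0x05,
    DW_TAG_compile_unit     = 0x11,
    DW_TAG_base_type        = 0x24,
    DW_TAG_subprogram       = 0x2e,
    DW_TAG_variable         = 0x34,
};

/* Toy DIE: a tag, a DW_AT_name-like string, and tree links. */
struct die {
    int tag;
    const char *name;
    const struct die *child;   /* first child, or NULL */
    const struct die *sibling; /* next sibling, or NULL */
};

static const char *tag_name(int tag)
{
    switch (tag) {
    case DW_TAG_formal_parameter: return "DW_TAG_formal_parameter";
    case DW_TAG_compile_unit:     return "DW_TAG_compile_unit";
    case DW_TAG_base_type:        return "DW_TAG_base_type";
    case DW_TAG_subprogram:       return "DW_TAG_subprogram";
    case DW_TAG_variable:         return "DW_TAG_variable";
    default:                      return "DW_TAG_unknown";
    }
}

/* Hypothetical tree for:  int add(int a) { int sum = a + 1; return sum; } */
static const struct die sum_var  = {DW_TAG_variable, "sum", NULL, NULL};
static const struct die a_param  = {DW_TAG_formal_parameter, "a", NULL, &sum_var};
static const struct die add_fn   = {DW_TAG_subprogram, "add", &a_param, NULL};
static const struct die int_type = {DW_TAG_base_type, "int", NULL, &add_fn};
static const struct die cu       = {DW_TAG_compile_unit, "add.c", &int_type, NULL};

/* Depth-first walk: print one indented line per DIE, return the DIE count. */
int dump_die(const struct die *d, int depth)
{
    int count = 0;
    for (; d != NULL; d = d->sibling) {
        printf("%*s%s \"%s\"\n", depth * 2, "", tag_name(d->tag), d->name);
        count += 1 + dump_die(d->child, depth + 1);
    }
    return count;
}
```

Calling `dump_die(&cu, 0)` prints an indented outline similar in spirit to `readelf --debug-dump=info`. The compile unit is at the root, and the parameter and the local variable are nested under the subprogram.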

How do I map my source code to the assembly and vice versa?


.debug_line for mapping PC to src line
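address->src is an N:1 mapping (several addresses can belong to the same source line)
src->address is a 1:N mapping (one source line can be spread over several address ranges)
Compiler Explorer uses the DWARF .debug_line data to color lines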
readelf -wL 
dwarfdump -l 
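The N:1 and 1:N mappings can be made concrete with a minimal sketch. It models an already-decoded .debug_line table as an array of (address, line) rows, roughly the information `readelf -wL` prints. The addresses and line numbers are made up. Real tables also carry file, column, `is_stmt`, and end-of-sequence information, which this ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded row of a hypothetical line table: code starting at `addr`
 * belongs to source line `line` until the next row's address. */
struct line_row {
    uint32_t addr;
    unsigned line;
};

/* Made-up table for a small function compiled with -Os. Line 12 appears
 * twice because the compiler interleaved its code with line 13. */
static const struct line_row table[] = {
    {0x08000100, 10},
    {0x08000104, 12},
    {0x08000108, 13},
    {0x0800010c, 12},
    {0x08000112, 14},
};
static const size_t table_len = sizeof table / sizeof table[0];

/* address -> line (N:1): use the last row whose address is <= pc.
 * Returns 0 if pc lies before the first row. Without an end-of-sequence
 * row, addresses past the end still map to the last line. */
unsigned line_for_addr(uint32_t pc)
{
    unsigned line = 0;
    for (size_t i = 0; i < table_len && table[i].addr <= pc; i++)
        line = table[i].line;
    return line;
}

/* line -> addresses (1:N): collect the start address of every row for
 * `line`. Returns how many addresses were written to `out` (at most max). */
size_t addrs_for_line(unsigned line, uint32_t *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < table_len; i++)
        if (table[i].line == line && n < max)
            out[n++] = table[i].addr;
    return n;
}
```

The 1:N direction explains why a breakpoint on a single source line may end up with several locations in GDB.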

How does a debugger generate a backtrace?


.debug_frame
.eh_frame
The entry point of a function is important: it is where you want to place a breakpoint
You want a mapping [addr_low, addr_high] => function name
There's backtrace(3), which relies on .dynsym symbols

can't use .eh_frame for unwinding
you need to export functions with -rdynamic
inlined functions have no stack frames
tail-call optimization replaces the caller's stack frame


So you need DWARF info for proper backtraces.
Many debuggers can link variables to their location on the stack, but I feel there should be better visualizations of the stack.
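Here is a minimal sketch of backtrace(3) for a hosted glibc system; it is not available with newlib on bare metal. The function names are hypothetical. Built without `-rdynamic`, `backtrace_symbols_fd` typically shows only the module and an offset for functions that are not in .dynsym, and with `-rdynamic` their names appear. The two `noinline` attributes keep both functions as real frames. `trace_outer` also demonstrates the tail-call point from the list above.

```c
#include <execinfo.h>
#include <unistd.h>

/* noinline keeps trace_inner as a frame of its own; an inlined copy would
 * leave no frame for the unwinder to report. */
__attribute__((noinline)) int trace_inner(void)
{
    void *frames[32];
    /* Store up to 32 return addresses of the current call chain. */
    int n = backtrace(frames, 32);
    /* One line per frame on stderr. Names only appear for symbols in
     * .dynsym, which is what linking with -rdynamic adds. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}

__attribute__((noinline)) int trace_outer(void)
{
    /* This is a tail call: at -O2 GCC may compile it as a plain jump,
     * so trace_outer can vanish from the printed backtrace. */
    return trace_inner();
}
```

Try building it with `gcc -O0`, `gcc -O2`, and `gcc -O2 -rdynamic`, and compare what ends up on stderr.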

How does a debugger know where a variable lives and how does it deal with relabelling?


.debug_info
Describe how the debug information records whether a variable is a constant, lives in memory, or lives in a register.
Local variables: DW_TAG_variable
Parameters: DW_TAG_formal_parameter
Segger Ozone shows whether a variable is in memory, in a register, or a compile-time constant.
It would be nice to have a tool that, when hovering over a variable, shows a marker in the left column where it was defined.
Maybe I can write something along those lines using pyelftools.
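The three cases can be compared with a small sketch. Compile it with `-Os -g` (for example with `arm-none-eabi-gcc -c`) and look at the DIEs with `readelf --debug-dump=info` or `dwarfdump`. Inside GDB, `info address scaled` or `info scope accumulate` report where each symbol lives. The function and variable names are hypothetical. The comments describe what GCC typically emits, but the exact output depends on compiler version and flags.

```c
#include <stdint.h>

/* Static storage: GCC normally gives this a fixed-address location
 * (DW_OP_addr) in its DW_AT_location attribute. */
static uint32_t total;

uint32_t accumulate(uint32_t x)
{
    /* A const local like this is usually folded away; GCC then tends to
     * describe it with DW_AT_const_value instead of a location. */
    const uint32_t factor = 3;

    /* At -Os this typically lives only in a register, and only for part of
     * the function, so its location may be a location list
     * (.debug_loc or .debug_loclists) rather than a single expression. */
    uint32_t scaled = x * factor;

    total += scaled;
    return total;
}
```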

Outlining

A compiler may split a function into a hot and a cold part. This happens a lot with JIT compilers,
and I guess with whole-program optimization. But is it worth bringing it up here?
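GCC can also do this ahead of time. The following sketch invites GCC to split a function; `checked_double` and `report_error` are hypothetical names. Whether the split actually happens depends on the GCC version, the target, and the optimization level. When it does, the unlikely path typically shows up under a separate symbol with a `.cold` suffix, far away from the rest of the function in the disassembly and in backtraces.

```c
#include <stdio.h>

/* cold tells GCC that calls to this function are unlikely; GCC typically
 * places cold functions together in a separate text subsection. */
static __attribute__((cold, noinline)) int report_error(int value)
{
    fprintf(stderr, "bad value: %d\n", value);
    return -1;
}

int checked_double(int value)
{
    /* __builtin_expect marks the error branch as unlikely. With -O2, GCC
     * may move this path into a separate symbol such as
     * checked_double.cold. */
    if (__builtin_expect(value < 0, 0))
        return report_error(value);
    return value * 2;
}
```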
Inlining


gdb pretends that the call site and the start of the inlined function are different instructions.
stepi and nexti always show the inlined body, though.
Setting breakpoints at the call site of an inlined function may not work.
gdb may fail to locate the return value of inlined calls after using the finish command.
Do different IDEs have different ways of displaying inlined functions?
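A small test case for the behaviors above; the names are illustrative. Build it with `-Og` and with `-O2` and step through `scale_percent` in GDB. With inlining, recent GDB versions resolve `break clamp` to the inlined copy rather than to a real function. `step` enters the inlined body as if it were a call, and `bt` lists `clamp` as a frame even though no call instruction was executed.

```c
/* static inline plus optimization makes it very likely that GCC inlines
 * clamp into its caller, leaving no call instruction behind. */
static inline int clamp(int v, int lo, int hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

int scale_percent(int raw)
{
    /* When inlined, the DWARF for scale_percent gets a
     * DW_TAG_inlined_subroutine child for this call. It refers back to
     * clamp through DW_AT_abstract_origin, and DW_AT_call_file and
     * DW_AT_call_line point at this line. */
    int pct = clamp(raw, 0, 100);
    return pct * 255 / 100;
}
```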

Reordering


If code has been reordered, you may find the debugger jumping back and forth.
Show simple transformations that compilers apply to loops, such as the canonical do-while form.
Are there any clever ways of visualizing that, which I haven't heard of?
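Here is the canonical do-while ("rotated") loop form written out by hand in C. Both functions compute the same result. The second mirrors the control flow compilers tend to produce from the first: one guard test before the loop, then the condition at the bottom of the body. When stepping through optimized code, the `while (i < n)` line therefore shows up in two places, which is one source of the debugger jumping back and forth. The function names are hypothetical.

```c
/* The loop as written: the condition is tested before every iteration. */
unsigned sum_while(const unsigned *v, unsigned n)
{
    unsigned sum = 0;
    unsigned i = 0;
    while (i < n) {
        sum += v[i];
        i++;
    }
    return sum;
}

/* The rotated shape: a single guard, then a do-while whose test runs
 * after the body. */
unsigned sum_rotated(const unsigned *v, unsigned n)
{
    unsigned sum = 0;
    unsigned i = 0;
    if (i < n) {            /* guard: skip the loop entirely when n == 0 */
        do {
            sum += v[i];
            i++;
        } while (i < n);    /* the loop condition, now at the bottom */
    }
    return sum;
}
```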

Volatile


Some things hinder the compiler from optimizing, such as pointers, volatile, and compiler barriers.
Discuss some tradeoffs.
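Here is a minimal sketch of volatile and a compiler barrier, assuming GCC or Clang (inline `asm` is a GNU extension). `irq_flag`, `wait_for_irq`, and `checksum` are hypothetical names. The tradeoff is the same in each case. The code becomes easier to follow in a debugger because the compiler must keep values in memory or keep memory accesses in order. It becomes slower for exactly the same reason.

```c
#include <stdint.h>

/* Hypothetical flag set from an interrupt handler. Without volatile, GCC
 * may load it once and spin forever on a register copy. */
volatile uint32_t irq_flag;

/* Busy-wait until the ISR sets the flag; volatile forces a fresh load on
 * every iteration. */
void wait_for_irq(void)
{
    while (irq_flag == 0) {
    }
}

uint32_t checksum(const uint8_t *buf, uint32_t len)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < len; i++) {
        acc += buf[i];
        /* Compiler barrier: emits no instructions, but GCC must assume any
         * memory may have changed, so loads of buf[] cannot be moved across
         * it. This may, for example, keep GCC from vectorizing the loop. */
        __asm__ volatile("" ::: "memory");
    }
    /* Debugging trick: a volatile local typically gets a stack slot, so
     * the debugger can read it, at the cost of an extra store and load. */
    volatile uint32_t debug_copy = acc;
    return debug_copy;
}
```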

C++ specific considerations


More aggressive inlining
More abstractions to see through
operator overloading
You may want to skip certain functions in your standard library when stepping
The joys of vtable overwrites (most likely out of scope)

Quality of Debuginfo


Clang and GCC differ.
Mention the differences between DWARF versions 3, 4, and 5?
-g vs -g3
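A tiny file for comparing `-g` with `-g3`; the file, macro, and function names are hypothetical. Only `-g3` records macro definitions in the debug info. So only with `-g3` can GDB answer `info macro BUF_LEN` or `macro expand SQUARE(side)` while stopped inside this file.

```c
/* Build twice and compare in GDB:
 *   gcc -O2 -g  -c macros.c
 *   gcc -O2 -g3 -c macros.c
 * Only the -g3 object carries the macro definitions below. */
#define BUF_LEN 64
#define SQUARE(x) ((x) * (x))

int area(int side)
{
    return SQUARE(side);
}

int buffer_len(void)
{
    return BUF_LEN;
}
```

The same file also works for the DWARF version question: GCC accepts `-gdwarf-4` and `-gdwarf-5` to select the version it emits.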

Tools


dwarfdump
readelf
pyelftools
Compare some IDEs?


| Section | Contents |
|---|---|
| .debug_abbrev | Abbreviations used in the .debug_info section |
| .debug_aranges | Lookup table for mapping addresses to compilation units |
| .debug_frame | Call frame information |
| .debug_info | The core DWARF information section |
| .debug_line | Line number information |
| .debug_loc | Location lists used in DW_AT_location attributes |
| .debug_macinfo | Macro information |
| .debug_pubnames | Lookup table for mapping object and function names to compilation units |
| .debug_pubtypes | Lookup table for mapping type names to compilation units |
| .debug_ranges | Address ranges used in DW_AT_ranges attributes |
| .debug_str | String table used in .debug_info |
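`readelf -S` lists these sections. The following minimal sketch shows how such a tool finds them. It assumes a Linux host whose `<elf.h>` (from glibc) provides the `Elf64_*` types, and it handles 64-bit ELF files only, so ELF32 objects such as Cortex-M binaries are rejected. `list_debug_sections` is a hypothetical helper name.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Print every section whose name starts with ".debug" and return how many
 * were found, or -1 if the file cannot be read as a 64-bit ELF file.
 * No byte swapping for foreign-endian files, no extended section numbers. */
int list_debug_sections(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    /* The ELF header gives the section header table offset and the index
     * of the section holding the section names (e_shstrndx). */
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(f);
        return -1;
    }

    Elf64_Shdr names;
    if (fseek(f, (long)(eh.e_shoff + (Elf64_Off)eh.e_shstrndx * eh.e_shentsize), SEEK_SET) != 0 ||
        fread(&names, sizeof names, 1, f) != 1) {
        fclose(f);
        return -1;
    }

    int count = 0;
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        if (fseek(f, (long)(eh.e_shoff + (Elf64_Off)i * eh.e_shentsize), SEEK_SET) != 0 ||
            fread(&sh, sizeof sh, 1, f) != 1)
            break;

        /* sh_name is an offset into the section-name string table. */
        char name[64] = {0};
        if (fseek(f, (long)(names.sh_offset + sh.sh_name), SEEK_SET) != 0 ||
            fread(name, 1, sizeof name - 1, f) == 0)
            continue;

        if (strncmp(name, ".debug", 6) == 0) {
            printf("%-16s %8lu bytes\n", name, (unsigned long)sh.sh_size);
            count++;
        }
    }

    fclose(f);
    return count;
}
```

Calling it on `/proc/self/exe` from a program built with `-g` should list that program's own .debug_info, .debug_line, and related sections. Without `-g` the count is typically 0.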