Skip to content

Instantly share code, notes, and snippets.

@simonhf
Created November 28, 2019 02:06
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save simonhf/489b1f3353a0ac095df01a1ddb6ad0e7 to your computer and use it in GitHub Desktop.
Save simonhf/489b1f3353a0ac095df01a1ddb6ad0e7 to your computer and use it in GitHub Desktop.
Which missing feature in gcc would make tracing function calls in userspace C/C++ applications easier

Which missing feature in gcc would make tracing function calls in userspace C/C++ applications easier

Background

Let's say we want to compile a larger C/C++ application with 80,000 unique functions so that at run-time the function call-tree is output for code comprehension and debugging purposes.

There are a couple of options to do this today which spring to mind:

Option 1: Manually markup the source code

  • This option only works if you've written all the code from scratch and marked up the source code as you've gone along.
  • It's going to be very tedious if you try to add instrumentation macros to 80,000 functions...
  • Plus, it makes the source code unnecessarily verbose to read.
  • Plus, it's sucky to have to add multiple leave instrumentation macros in case of multiple function returns.
void foo(void) {
    TRACE_ENTER(...);
    ...
    if (...) {
        ...
        TRACE_LEAVE(...);
        return;
    }
    ...
    TRACE_LEAVE(...);
}

Option 2: Automatically markup the source code using CastXML

  • This option automatically marks up the source code so in theory works easily with 80,000 functions or more.
  • It works by using CastXML to reveal the source file and line of every function which needs instrumentating.
  • The source file and line is then used to add an trace enter instrumentation macro to the very start of every function.
  • A trace leave instrumentation does not need to be added due to using gcc's attribute cleanup RAII mechanism.
  • The mechanism is described here [1], however disadvantages are:
  • It's necessary to generate a copy and mark up every source file at compile-time which slows the build.
  • CastXML must be run on every compiled file at compile-time which slows the build.
  • The attribute cleanup mechanism calls a cleanup function upon function exit, which is an unnecessary overhead.
  • The attribute cleanup mechanism pushes a pointer onto the local at each run-time, which is an unnecessary overhead.

[1] https://gist.github.com/simonhf/3d4da8597d878b1481eb77e38ca81c1b

Option 3: Automatically instrument the source code using systemtap

  • This option uses debug info from the application to automatically add systemtap probes to the enter and leave locations of each function.
  • The mechanism is described here [1] [2], however disadvantages are:
  • Instrumentation is automatic but uses USDT (Userspace Dtrace) probes which are a quick 'nop' until enabled, and then a hefty 'int3' with probe info getting piped via the kernel into a systemtap process in userspace, which is all an overhead.
  • Prior to the C/C++ application starting, systemtap creates a .ko kernel module on-the-fly which in my tests generate about 3.5 KB of source code for each function.
  • 80,000 functions * 3.5 KB is an enormous source file for gcc to compile and takes over 30 GB or RAM for gcc and over an hour to compile... if it doesn't run out of memory first.
  • On top of all these overheads, there appears to be no easy way to add your own data to the trace, such as how many times each function was called, or a dynamic per-function verbosity level, etc.

[1] https://sourceware.org/systemtap/examples/general/para-callgraph.txt [2] https://alex.dzyoba.com/blog/systemtap/

Option 4: Automatically instrument the source code using gcc

  • gcc has a command line option called -finstrument-functions [1] which automatically adds a call to these functions at the start and end of each function:
void __cyg_profile_func_enter (void *this_fn, void *call_site);
void __cyg_profile_func_exit  (void *this_fn, void *call_site);
  • There are many examples available of using this feature to trace function calls at run-time, e.g. [2].
  • And some tracing systems such as LTTng even have support for this mechanism [3].
  • However, there are disadvantages:
  • Taken from [2]: gcc embeds the functions calls which are always called. So there is the overhead of calling the function, and then deciding if the instrumentation is enabled, etc.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
  if (CYG_Enabled) {
    CYG_Store(TRUE, call_site, this_fn);
  }
}
  • In addtion, there's an extra overhead for figuring out the name of the original function, based upon the address passed to gcc embedded call to the enter or leave function. There are various ways to lookup the name, but they all involve various amounts of overhead at run-time.
  • In additon, because everything is done on the stack, there's no easy way to implement any type of static context info for a unique function, e.g. containing data such as the number of times the function has been called, or what the verbosity of the function is, etc.

[1] https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html [2] https://mcuoneclipse.com/2015/04/04/poor-mans-trace-free-of-charge-function-entryexit-trace-with-gnu-tools/ [3] https://lttng.org/man/3/lttng-ust-cyg-profile/v2.9/

Which missing feature in gcc could make this easier and more flexible?

  • The feature would be very similar to the current gcc command line option -finstrument-functions but called something like -finstrument-macros.
  • However, instead of embedding calls to generic global functions __cyg_profile_func_enter and __cyg_profile_func_exit, it would embed macros instead.
  • And the macros would be given a unique parameter which is unique to the function being instrumented.
  • For example, consider the following two function example:
$ cat foo.c
void foo(void) {
}

int main(void) {
    foo();
    foo();
}
  • Using the to be implemented command line option -finstrument-macros then gcc would end up compiling the equivalent source file to:
void foo(void) {
// e.g. #define __FUNCTION_UNIQUE__ trace_file_foo_class_std_function_foo_args_void
    TRACE_ENTER(__FUNCTION_UNIQUE__);
    TRACE_LEAVE(__FUNCTION_UNIQUE__);
}

int main(void) {
// e.g. #define __FUNCTION_UNIQUE__ trace_file_foo_class_std_function_main_args_void
    TRACE_ENTER(__FUNCTION_UNIQUE__);
    foo();
    foo();
    TRACE_LEAVE(__FUNCTION_UNIQUE__);
}
  • The macros TRACE_ENTER and TRACE_LEAVE can be force included on the gcc command line using the --include command line option, e.g.:
  • And trace.h can contain any particular user implementation including global variables for context (unavaible with -finstrument-functions), etc.
  • gcc will auto generate a unique global identifier for a function AKA __FUNCTION_UNIQUE__ such as trace_file_foo_class_std_function_foo_args_void.
  • Note: In this example printf() is used but this is purely up to the user. More advanced code can be used to efficently log to shared memory, or whatever.
  • Note: In this example if(UNIQUE_FUNCTION.verbosity >= trace_verbosity) is used to dynamically enable or disable instrumentation for this function, however, more efficient code can be used such as 'jump labels' [1] which would make disabled function instrumentation have a similar low overhead to dtrace probes, etc.
$ gcc -c -o foo.o foo.c --include trace.h
$ cat trace.h
#include <stdio.h>

extern int trace_verbosity;

typedef struct TRACE_DATA {
    int verbosity;
    int count;
} TRACE_DATA;

#define TRACE_ENTER(UNIQUE_FUNCTION) \
    extern TRACE_DATA UNIQUE_FUNCTION; \
    if(UNIQUE_FUNCTION.verbosity >= trace_verbosity){ printf("- %s() // %s\n", __FUNCTION__, #UNIQUE_FUNCTION); }

#define TRACE_LEAVE(UNIQUE_FUNCTION) \
    extern /* __attribute__((weak)) */ TRACE_DATA UNIQUE_FUNCTION; \
    if(UNIQUE_FUNCTION.verbosity >= trace_verbosity){ printf("  } // %s() #%d\n", __FUNCTION__, ++ UNIQUE_FUNCTION.count); }
  • Before linking then the global function contexts can be discovered and a special new source file trace.c is generated, e.g.:
  • Note: This trace.c generation code can also be used to define custom defaults on a function by function basis, e.g. auto enabling or disabling run-time function instrumentation defaults at compile time.
  • Note: This will scale to 80,000 functions without resulting in RAM or compilation time issues.
  • Note: Extra code can be added as desired to inspect, traverse, and manipulate the trace data at run-time.
  • For example, during debugging we can add code like push_trace_verbosity(debug) in order to show more instrumentation only in function foo() and further in the call-tree.
$ nm foo.o | egrep "U trace_file_" | perl -lane 'sub BEGIN{ printf qq[int trace_verbosity = 1;\n]; }if(m~(trace_file_[^\s]+)~){ $uf=$1; printf qq[TRACE_DATA %s = {1, 0};\n], $uf }' | tee trace.c
int trace_verbosity = 1;
TRACE_DATA trace_file_foo_class_std_function_foo_args_void = {1, 0};
TRACE_DATA trace_file_foo_class_std_function_main_args_void = {1, 0};
$ gcc -c -o trace.o trace.c --include trace.h
  • Finally everything is linked and run:
  • In this example we can see the number of times the instrumented function has been called getting displayed via the function leave macro.
$ gcc -o foo foo.o trace.o && ./foo
- main() // trace_file_foo_class_std_function_main_args_void
- foo() // trace_file_foo_class_std_function_foo_args_void
  } // foo() #1
- foo() // trace_file_foo_class_std_function_foo_args_void
  } // foo() #2
  } // main() #1
  • Ideally, as well, there would be 3rd macro TRACE_DEBUG() which can be scattered in the source code as permanent instrumentation, e.g.:
  • This macro can also use the to be implemented new gcc macro __FUNCTION_UNIQUE__ to access static function instrumentation context data if necessary.
void foo(void) {
    TRACE_DEBUG("hello world");
}

[1] https://github.com/computee/s-keys

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment