Skip to content

Instantly share code, notes, and snippets.

@simonhf
Last active October 29, 2019 19:25
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save simonhf/35a5d45d1498c0e451b579a0edd46768 to your computer and use it in GitHub Desktop.
Save simonhf/35a5d45d1498c0e451b579a0edd46768 to your computer and use it in GitHub Desktop.

Experiments with gcc auto instrumentration of C/C++ source code for run-time function call tree tracing

What are we trying to do?

Imagine a simple C project. Here is a very small one as an example:

#include <stdio.h>

extern int bar(int a);
static int foo(int a);

       int baz(int a) { return     a ; }
extern int bar(int a) { return baz(a); }
static int foo(int a) { return bar(a); }

int main() {
    printf("Hello World\n");
    int b = baz(1);
    return bar(foo(0));
}

The following comments illustrate example code you might add to output useful info at certain places in the code:

#include <stdio.h>

extern int bar(int a);
static int foo(int a);

       int baz(int a) { /* if (a) cwrap_log("- a=%d", a); */ return     a ; }
extern int bar(int a) {                                      return baz(a); }
static int foo(int a) {                                      return bar(a); }

int main() {
    printf("Hello World\n");
    int b = baz(1);
    // cwrap_log("- b=%d", b);
    return bar(foo(0));
}

With such a small project it's easy to see what's happening. Which function is calling which other function, etc. However, as the project gets bigger then it's more and more difficult to comprehend the source code by just reading it, or adding a few log lines here and there. Eventually with hundreds or thousands of source files it will be unpractical to quickly get into the source base by just reading the source code. Wouldn't it be great if we could compile a project and it would automatically generate a human friendly log at run-time showing us which functions called which other functions, e.g.:

T28106 0.000000s + baz() {
T28106 0.000080s   - a=1
T28106 0.000108s   } = 1 // baz() #1
T28106 0.000115s - b=1
T28106 0.000122s + foo() {
T28106 0.000123s   + bar() {
T28106 0.000130s     + baz() {} = 0 // #2
T28106 0.000143s     } = 0 // bar() #1
T28106 0.000150s   } = 0 // foo() #1
T28106 0.000157s + bar() {
T28106 0.000158s   + baz() {} = 0 // #3
T28106 0.000171s   } = 0 // bar() #2

Explanation of the above output log:

  • Each line starts with a (T)hread ID, in case the program is multi-threaded.
  • Next is the elapsed (s)econds since the start of the program.
  • Two log lines are generated for each function call; an entry line, and an exit line.
  • The exit line shows the integer result of the function.
  • The exit line shows the number of calls made to that function so far.
  • The entry and exit lines are collapsed to a single line if no log lines occur between them.

Other data could be added to the output too, such as human friendly function parameters, etc.

Possible implementation methods

gcc -finstrument-functions command line option

  • One way is to use gcc's -finstrument-functions command line switch [1].
  • This causes gcc to automatically call a generic entry and exit function for every function called at run-time.
  • The generic entry and exit functions are called __cyg_profile_func_enter() and __cyg_profile_func_exit().
  • This method is well documented with lots of examples that can be Googled, e.g. [2].
  • However, the main disadvantage is run-time speed:
  • Every function gets instrumented whether you like it or not, which comes with a performance penality due to extra code overhead at run-time.
  • Even if the instrumentation function does not output to the log, it likely still must run an if(verbosity) statement etc, which bloats and slows down the code at run-time.
  • There are gcc command line options to exclude files and functions from instrumentation, but it can then be annoying to have to re-compile every time instrumentation is changed.
  • Another disadvantage is that the generic instrumentation functions know nothing about the callers function parameters or return type.

[1] -finstrument-functions [2] etrace

gcc self generated wrapper functions invoked via indirect assembly language call

There are two ideas at play here:

  • Idea #1: Pretend to be gcc so when the makefile calls gcc it's actually running our script which implements extra business logic as well as running gcc :-)
  • In this way, we can potentially instrument a project without changing a single line of source code or makefile!
  • Idea #2: If we create a wrapper function for function foo(), and call foo_wrap() instead of foo(), then no matter how many return statements foo() has, control is always returned to foo_wrap() which foo() returns.
  • This means we need to somehow auto generate the wrapper functions which is easier said than done.
  • Idea #3: gcc generates assembly call statements to call function foo() but on Intel CPUs it's 'easy' to doctor these call statements into indirect call statements.
  • Yes, indirect call statements are slightly slower than direct call statements, but it's still much faster than the extra if(verbosity) statements and generally bloating code using the gcc built in intrumentation.
  • Because the indirect functions calls can be changed at run-time, this means many functions that we end up not being interested in, can be dynamically switched off without having to re-compile again.

This is the idea which we will experiment with further!

How to pretend to be gcc?

  • If the makefile correcty uses the environment variable $CC to invoke the compiler we can hijack it.
  • E.g. make CC="perl cwrap.pl gcc" will cause make to invoke perl cwrap.pl gcc whenever it would normally invoke gcc.
  • Any of the gcc line parameters normally passed to gcc will also be passed to the hijack script.
  • This means the hijack script can call the real gcc with the real command line parameters.
  • However, it can also do a bunch of other stuff like changing the command line parameters and/or changing the source code!

How to auto generate the wrapper functions?

  • One way to do this for C code is to use the gcc command line parameter -aux-info [1] which generates a list of all function prototypes.
  • The function prototypes can be used to create the wrapper functions.

[1] -aux-info

How to substitute direct assembly calls for indirect assembly calls?

  • By pretending to be gcc, instead of compiling .c files to .o files, we can compile .c files to .s assembly files, and then to .o object files.
  • After the .s assembly file is generated then we can change the Intel call foo assembly statements into call *indirect_foo(%rip) statements.
  • After the call statements have been doctored then the .o object files are created.

End to end example without instrumentation

$ make -f c-example.mak clean ; make                        -f c-example.mak && ./c-example ; ll c-example*
rm --force --verbose c-example c-example*cwrap.* c-example.o
cc -g -c c-example.c -o c-example.o
cc -g -o c-example c-example.o
Hello World
-rwxrwxr-x 1 simon simon 9992 Oct 26 15:21 c-example*
-rw-rw-r-- 1 simon simon  421 Oct 26 15:19 c-example.c
-rw-rw-r-- 1 simon simon  508 Oct 26 15:16 c-example.mak
-rw-rw-r-- 1 simon simon 4032 Oct 26 15:21 c-example.o

End to end example with instrumentation

  • Note: The same makefile and source file are used without any changes!
$ make -f c-example.mak clean ; make CC="perl cwrap.pl gcc" -f c-example.mak && ./c-example ; ll c-example* ; cat c-example.cwrap.out
rm --force --verbose c-example c-example*cwrap.* c-example.o
perl cwrap.pl gcc -g -c c-example.c -o c-example.o
Sat Oct 26 15:16:45 2019 0.056485 cwrap: gcc .. >                                                    c-example.o.cwrap.cc.out;    4 functions,    0 warnings
perl cwrap.pl gcc -g -o c-example c-example.o
Sat Oct 26 15:16:45 2019 0.070839 cwrap: gcc .. >                                                               .cwrap.cc.out;    1 objects  ,    0 libs
Hello World
-rwxrwxr-x 1 simon simon 18208 Oct 26 15:19 c-example*
-rw-rw-r-- 1 simon simon   421 Oct 26 15:19 c-example.c
-rw-rw-r-- 1 simon simon  8848 Oct 26 15:19 c-example.cwrap.1.aux
-rw-rw-r-- 1 simon simon  5776 Oct 26 15:19 c-example.cwrap.2.c
-rw-rw-r-- 1 simon simon 23401 Oct 26 15:19 c-example.cwrap.3.s
-rw-rw-r-- 1 simon simon 23567 Oct 26 15:19 c-example.cwrap.4.s
-rw-rw-r-- 1 simon simon  3793 Oct 26 15:19 c-example.cwrap.c
-rw-rw-r-- 1 simon simon  6792 Oct 26 15:19 c-example.cwrap.o
-rw-rw-r-- 1 simon simon   390 Oct 26 15:19 c-example.cwrap.out
-rw-rw-r-- 1 simon simon   508 Oct 26 15:16 c-example.mak
-rw-rw-r-- 1 simon simon 11584 Oct 26 15:19 c-example.o
-rw-rw-r-- 1 simon simon  2176 Oct 26 15:19 c-example.o.cwrap.cc.out
T03551 0.000000s + baz() {
T03551 0.000082s   - a=1
T03551 0.000108s   } = 1 // baz() #1
T03551 0.000117s - b=1
T03551 0.000125s + foo() {
T03551 0.000126s   + bar() {
T03551 0.000134s     + baz() {} = 0 // #2
T03551 0.000149s     } = 0 // bar() #1
T03551 0.000157s   } = 0 // foo() #1
T03551 0.000165s + bar() {
T03551 0.000166s   + baz() {} = 0 // #3
T03551 0.000181s   } = 0 // bar() #2

Examining c-example.cwrap.1.aux

  • We currently ignore the imported system API prototypes, but in theory these can also be instrumented via -aux-info.
$ cat c-example.cwrap.1.aux
/* compiled from: . */
/* /usr/include/libio.h:385:NC */ extern int __underflow (_IO_FILE *);
/* /usr/include/libio.h:386:NC */ extern int __uflow (_IO_FILE *);
/* /usr/include/libio.h:387:NC */ extern int __overflow (_IO_FILE *, int);
...
/* /usr/include/stdio.h:912:NC */ extern void flockfile (FILE *);
/* /usr/include/stdio.h:916:NC */ extern int ftrylockfile (FILE *);
/* /usr/include/stdio.h:919:NC */ extern void funlockfile (FILE *);
/* c-example.c:3:NC */ extern int bar (int);
/* c-example.c:4:NC */ static int foo (int);
/* c-example.c:6:NF */ extern int baz (int a); /* (a) int a; */
/* c-example.c:7:NF */ extern int bar (int a); /* (a) int a; */
/* c-example.c:8:NF */ static int foo (int a); /* (a) int a; */
/* c-example.c:10:OF */ extern int main (void); /* () */

Examining c-example.cwrap.2.c

  • Each source file gets appended with the instrumented functions.
  • Comments containing cwrap_log are automatically 'uncommented' :-)
  • And the function addresses of the original function, and wrapper function are also saved, together with the address which will actually be used at run-time.
  • In this example, instrumentation output is auto capped at 10,000 calls so that small, frequently called functions do not become log hogs :-)
  • There's lots of features which could be added on here in the future, e.g.:
    • Function parameters and return values if complicated types, e.g. by user provided helper functions?
    • Ability to call a mock function for dynamic run-time testing.
    • Keep statistics e.g. function call frequency and performance, and output these upon process exit.
$ cat c-example.cwrap.2.c
#include <stdio.h>

extern int bar(int a);
static int foo(int a);

       int baz(int a) {    if (a) cwrap_log("- a=%d", a);    return     a ; }
extern int bar(int a) {                                      return baz(a); }
static int foo(int a) {                                      return bar(a); }

int main() {
    printf("Hello World\n");
    int b = baz(1);
       cwrap_log("- b=%d", b);
    return bar(foo(0));
}


extern unsigned long long cwrap_count_call_baz;
extern int (* cwrap_indirect_orig_baz) (int a);
extern int (* cwrap_indirect_wrap_baz) (int a);
extern int (* cwrap_indirect_mock_baz) (int a);
static int cwrap_wrap_baz (int a)
{
    cwrap_count_call_baz ++;
    // todo: add per binary/file/function verbosity checking
    // todo: show function parameters for simple types, e.g. foo=123, bar=7
    if (cwrap_count_call_baz <= 10000) cwrap_log_push(1, 1, "+ baz() {");
    int lines = cwrap_log_get_lines();
    int cwrap_result = (* cwrap_indirect_orig_baz)(a); // call wrapped function
    if (cwrap_count_call_baz <= 10000) {
        if (0 == (cwrap_log_get_lines() - lines)) { cwrap_log_pop(-1, 0 /*    append */, "} = %d // "          "#%llu", cwrap_result, cwrap_count_call_baz); }
        else                                      { cwrap_log_pop(-1, 1 /* no append */, "} = %d // baz() #%llu", cwrap_result, cwrap_count_call_baz); }

    }
    // todo: show return value for simple types
    return cwrap_result;
}
int (* cwrap_indirect_orig_baz) (int a) =            baz;
int (* cwrap_indirect_wrap_baz) (int a) = cwrap_wrap_baz;
int (* cwrap_indirect_mock_baz) (int a) = cwrap_wrap_baz;
unsigned long long cwrap_count_call_baz = 0;

extern unsigned long long cwrap_count_call_bar;
extern int (* cwrap_indirect_orig_bar) (int a);
extern int (* cwrap_indirect_wrap_bar) (int a);
extern int (* cwrap_indirect_mock_bar) (int a);
static int cwrap_wrap_bar (int a)
{
    cwrap_count_call_bar ++;
    // todo: add per binary/file/function verbosity checking
    // todo: show function parameters for simple types, e.g. foo=123, bar=7
    if (cwrap_count_call_bar <= 10000) cwrap_log_push(1, 1, "+ bar() {");
    int lines = cwrap_log_get_lines();
    int cwrap_result = (* cwrap_indirect_orig_bar)(a); // call wrapped function
    if (cwrap_count_call_bar <= 10000) {
        if (0 == (cwrap_log_get_lines() - lines)) { cwrap_log_pop(-1, 0 /*    append */, "} = %d // "          "#%llu", cwrap_result, cwrap_count_call_bar); }
        else                                      { cwrap_log_pop(-1, 1 /* no append */, "} = %d // bar() #%llu", cwrap_result, cwrap_count_call_bar); }

    }
    // todo: show return value for simple types
    return cwrap_result;
}
int (* cwrap_indirect_orig_bar) (int a) =            bar;
int (* cwrap_indirect_wrap_bar) (int a) = cwrap_wrap_bar;
int (* cwrap_indirect_mock_bar) (int a) = cwrap_wrap_bar;
unsigned long long cwrap_count_call_bar = 0;

extern unsigned long long cwrap_count_call_foo_static_in_c_example_c;
extern int (* cwrap_indirect_orig_foo_static_in_c_example_c) (int a);
extern int (* cwrap_indirect_wrap_foo_static_in_c_example_c) (int a);
extern int (* cwrap_indirect_mock_foo_static_in_c_example_c) (int a);
static int cwrap_wrap_foo_static_in_c_example_c (int a)
{
    cwrap_count_call_foo_static_in_c_example_c ++;
    // todo: add per binary/file/function verbosity checking
    // todo: show function parameters for simple types, e.g. foo=123, bar=7
    if (cwrap_count_call_foo_static_in_c_example_c <= 10000) cwrap_log_push(1, 1, "+ foo() {");
    int lines = cwrap_log_get_lines();
    int cwrap_result = (* cwrap_indirect_orig_foo_static_in_c_example_c)(a); // call wrapped function
    if (cwrap_count_call_foo_static_in_c_example_c <= 10000) {
        if (0 == (cwrap_log_get_lines() - lines)) { cwrap_log_pop(-1, 0 /*    append */, "} = %d // "          "#%llu", cwrap_result, cwrap_count_call_foo_static_in_c_example_c); }
        else                                      { cwrap_log_pop(-1, 1 /* no append */, "} = %d // foo() #%llu", cwrap_result, cwrap_count_call_foo_static_in_c_example_c); }

    }
    // todo: show return value for simple types
    return cwrap_result;
}
int (* cwrap_indirect_orig_foo_static_in_c_example_c) (int a) =            foo;
int (* cwrap_indirect_wrap_foo_static_in_c_example_c) (int a) = cwrap_wrap_foo_static_in_c_example_c;
int (* cwrap_indirect_mock_foo_static_in_c_example_c) (int a) = cwrap_wrap_foo_static_in_c_example_c;
unsigned long long cwrap_count_call_foo_static_in_c_example_c = 0;

extern unsigned long long cwrap_count_call_main;
extern int (* cwrap_indirect_orig_main) (void);
extern int (* cwrap_indirect_wrap_main) (void);
extern int (* cwrap_indirect_mock_main) (void);
static int cwrap_wrap_main (void)
{
    cwrap_count_call_main ++;
    // todo: add per binary/file/function verbosity checking
    // todo: show function parameters for simple types, e.g. foo=123, bar=7
    if (cwrap_count_call_main <= 10000) cwrap_log_push(1, 1, "+ main() {");
    int lines = cwrap_log_get_lines();
    int cwrap_result = (* cwrap_indirect_orig_main)(); // call wrapped function
    if (cwrap_count_call_main <= 10000) {
        if (0 == (cwrap_log_get_lines() - lines)) { cwrap_log_pop(-1, 0 /*    append */, "} = %d // "          "#%llu", cwrap_result, cwrap_count_call_main); }
        else                                      { cwrap_log_pop(-1, 1 /* no append */, "} = %d // main() #%llu", cwrap_result, cwrap_count_call_main); }

    }
    // todo: show return value for simple types
    return cwrap_result;
}
int (* cwrap_indirect_orig_main) (void) =            main;
int (* cwrap_indirect_wrap_main) (void) = cwrap_wrap_main;
int (* cwrap_indirect_mock_main) (void) = cwrap_wrap_main;
unsigned long long cwrap_count_call_main = 0;

Examining c-example.cwrap.3.s and c-example.cwrap.4.s

  • The only changes in the assembly file are the type of call statements:
$ diff c-example.cwrap.3.s c-example.cwrap.4.s
56c56
<       call    baz@PLT
---
>       call     *cwrap_indirect_mock_baz@PLT(%rip)
78c78
<       call    bar@PLT
---
>       call     *cwrap_indirect_mock_bar@PLT(%rip)
108c108
<       call    baz@PLT
---
>       call     *cwrap_indirect_mock_baz@PLT(%rip)
118c118
<       call    foo
---
>       call     *cwrap_indirect_mock_foo_static_in_c_example_c@PLT(%rip)
120c120
<       call    bar@PLT
---
>       call     *cwrap_indirect_mock_bar@PLT(%rip)

Examining c-example.cwrap.c

  • This file is currently a minimalist, boilerplate logging function.
  • There's lots of features which could be added on here in the future, e.g.:
    • High resolution time stamps.
    • Multi-threaded logging to an in-memory buffer for faster output.
    • Ability for developer supplied log lines to be integrated with auto entry and exit instrumentation.
$ cat c-example.cwrap.c
#define _GNU_SOURCE         /* See feature_test_macros(7) */
#include <unistd.h>
#include <sys/syscall.h> /* for syscall(), SYS_gettid */
#include <stdio.h>
#include <stdarg.h>
#include <sys/time.h>

#define CWRAP_LOG_LINE_MAX (4096)

static          char     cwrap_log_spaces[] = "                                                                                                                                                                                                                                      ";
static __thread char     cwrap_log_line[CWRAP_LOG_LINE_MAX];
static __thread int      cwrap_log_line_pos = 0;
       __thread int      cwrap_log_lines = 0;
static __thread int      cwrap_log_indent = 0;
static          FILE   * cwrap_log_file = NULL;
static          int      cwrap_log_fd;
static          double   cwrap_log_time;

int cwrap_log_get_lines(void)
{
    return cwrap_log_lines;
}

double cwrap_get_time_in_seconds(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1.e-6 * (double)tv.tv_usec;
}

void cwrap_log_flush(void)
{
    fprintf(cwrap_log_file, "%s\n", &cwrap_log_line[0]);

    cwrap_log_line_pos = 0;
    fflush(cwrap_log_file);
    //fsync(cwrap_log_fd); // todo: fsync intermittantly?
}

void cwrap_log_push_v(int indent_direction, int no_append, const char * format, va_list argument_list)
{
    double time = cwrap_get_time_in_seconds();

    if (no_append && (cwrap_log_line_pos > 0)) {
        cwrap_log_flush();
    }

    if (NULL == cwrap_log_file) { // todo: make thread safe
        cwrap_log_file = fopen("c-example.cwrap.out","w+");
        cwrap_log_fd   = fileno(cwrap_log_file);
        cwrap_log_time = time;
    }

    double time_elapsed = time - cwrap_log_time;

    if (no_append) {
//  if (cwrap_log_line_pos < CWRAP_LOG_LINE_MAX) { cwrap_log_line_pos +=  snprintf(&cwrap_log_line[cwrap_log_line_pos], CWRAP_LOG_LINE_MAX - cwrap_log_line_pos, "%u ", cwrap_log_lines                            ); }
    if (cwrap_log_line_pos < CWRAP_LOG_LINE_MAX) { cwrap_log_line_pos +=  snprintf(&cwrap_log_line[cwrap_log_line_pos], CWRAP_LOG_LINE_MAX - cwrap_log_line_pos, "T%05ld ", syscall(SYS_gettid)                    ); } // linux; cat /proc/sys/kernel/pid_max
    if (cwrap_log_line_pos < CWRAP_LOG_LINE_MAX) { cwrap_log_line_pos +=  snprintf(&cwrap_log_line[cwrap_log_line_pos], CWRAP_LOG_LINE_MAX - cwrap_log_line_pos, "%fs ", time_elapsed                              ); }
    if (cwrap_log_line_pos < CWRAP_LOG_LINE_MAX) { cwrap_log_line_pos +=  snprintf(&cwrap_log_line[cwrap_log_line_pos], CWRAP_LOG_LINE_MAX - cwrap_log_line_pos, "%.*s", 2 * cwrap_log_indent, &cwrap_log_spaces[0]); } // note: % because of Perl
    }
    if (cwrap_log_line_pos < CWRAP_LOG_LINE_MAX) { cwrap_log_line_pos += vsnprintf(&cwrap_log_line[cwrap_log_line_pos], CWRAP_LOG_LINE_MAX - cwrap_log_line_pos, format, argument_list                              ); }

    cwrap_log_lines ++;
    cwrap_log_indent += indent_direction;
}

void cwrap_log_push(int indent_direction, int no_append, const char * format, ...)
{
    va_list argument_list;
    va_start(argument_list, format);

    cwrap_log_push_v(indent_direction, no_append, format, argument_list);

    va_end(argument_list);
}

void cwrap_log_pop(int indent_direction, int no_append, const char * format, ...)
{
    va_list argument_list;
    va_start(argument_list, format);

    cwrap_log_push_v(indent_direction, no_append, format, argument_list);
    cwrap_log_flush();

    va_end(argument_list);
}

void cwrap_log(const char * format, ...)
{
    va_list argument_list;
    va_start(argument_list, format);

    cwrap_log_push_v(0 /* no indent_direction */, 1 /* no append */, format, argument_list);
    cwrap_log_flush();

    va_end(argument_list);
}

Examining c-example.cwrap.c

  • During the build then extra commands are generated, ran, and logged.
  • For a 'cwrap' build then CWRAP is defined.
  • And the auto generated header file cwrap.h is included by force on the command line.
$ cat c-example.o.cwrap.cc.out
Sat Oct 26 15:22:40 2019 0.000246s cwrap: debug: arguments       : gcc -g -c c-example.c -o c-example.o
Sat Oct 26 15:22:40 2019 0.000310s cwrap: debug: deps_file_stem  :
Sat Oct 26 15:22:40 2019 0.000336s cwrap: debug: input_file      :c-example.c
Sat Oct 26 15:22:40 2019 0.000358s cwrap: debug: input_file_stem :c-example
Sat Oct 26 15:22:40 2019 0.000379s cwrap: debug: input_file_ext  :.c
Sat Oct 26 15:22:40 2019 0.000400s cwrap: debug: output_file     :c-example.o
Sat Oct 26 15:22:40 2019 0.000420s cwrap: debug: output_file_stem:c-example
Sat Oct 26 15:22:40 2019 0.000440s cwrap: debug: output_file_ext :.o
Sat Oct 26 15:22:40 2019 0.000526s cwrap: step 1: c->aux: running: gcc -g -c c-example.c -aux-info c-example.cwrap.1.aux -o /dev/null -DCWRAP -include .cwrap.h >> c-example.o.cwrap.cc.out 2>&1
Sat Oct 26 15:22:40 2019 0.021158s cwrap: step 2: c->c  : read  bytes of aux file: 8848 < c-example.cwrap.1.aux
Sat Oct 26 15:22:40 2019 0.021222s cwrap: step 2: c->c  : read  bytes of c   file: 421 < c-example.c
Sat Oct 26 15:22:40 2019 0.022171s cwrap: step 2: c->c  : wrote bytes to c   file: 5776 > c-example.cwrap.2.c
Sat Oct 26 15:22:40 2019 0.022197s cwrap: step 3: c->s  : running:  gcc -g -c c-example.cwrap.2.c -fPIC -S -o c-example.cwrap.3.s -DCWRAP -include .cwrap.h >> c-example.o.cwrap.cc.out 2>&1
Sat Oct 26 15:22:40 2019 0.045515s cwrap: step 4:  ->Tpo: skipping .Tpo
Sat Oct 26 15:22:40 2019 0.046002s cwrap: step 5: s->s  : read  bytes of s   file: 23401 bytes / 1333 lines < c-example.cwrap.3.s
Sat Oct 26 15:22:40 2019 0.046307s cwrap: step 5: s->s  : unique function ignored:   1 times puts
Sat Oct 26 15:22:40 2019 0.046343s cwrap: step 5: s->s  : found 6 call lines, replaced 5 calls with 3 unique functions, 1 unique functions ignored
Sat Oct 26 15:22:40 2019 0.046605s cwrap: step 5: s->s  : wrote bytes to s   file: 23567 > c-example.cwrap.4.s
Sat Oct 26 15:22:40 2019 0.046665s cwrap: step 6: s->o  : running: as c-example.cwrap.4.s -o c-example.o >> c-example.o.cwrap.cc.out 2>&1
Sat Oct 26 15:22:40 2019 0.052439s cwrap: gcc .. >                                                    c-example.o.cwrap.cc.out;    4 functions,    0 warnings

Examining .cwrap.h

$ cat .cwrap.h
#ifdef __cplusplus
extern "C" {
#endif
extern void cwrap_log(const char * format, ...);
extern void cwrap_log_pop(int indent_direction, int no_append, const char * format, ...);
extern void cwrap_log_push(int indent_direction, int no_append, const char * format, ...);
extern int  cwrap_log_get_lines(void);
#ifdef __cplusplus
}
#endif

What about auto instrumenting a C++ project?

  • Auto instrumenting C++ projects using the above technique is a little more difficult.
  • Todo: Get this working!

Example C++ source code to instrument

#include <iostream>

int baz(int a) { /* if (a) cwrap_log("- a=%d", a); */ return     a ; }
int bar(int a) {                                      return baz(a); }

class Foo
{
    private:
    int doitPrivate(int a) { return bar(a); }

    public:
    Foo() {
        std::cout << "constructing Foo\n";
    }
    int doitPublic(int a) { return doitPrivate(a); }
    ~Foo() {
        std::cout << "deconstructing Foo\n";
    }
};

int main()
{
    std::cout << "Hello World\n";
    int b = baz(1);
    // cwrap_log("- b=%d", b);
    Foo foo;
    return bar(foo.doitPublic(0));
}

Todo: An alternative to -aux-info?

  • Unfortunately the gcc command line parameter -aux-info only works with C and not C++ source code.
  • One possibility is to use ctags to discover the function prototypes, e.g.:
$ ctags -f - --format=2 --c++-kinds=+p --fields=+iaS --language-force=C c-example.c
bar     c-example.c     /^int bar(int a) { return baz(a); }$/;" f       signature:(int a)
baz     c-example.c     /^int baz(int a) { return     a ; }$/;" f       signature:(int a)
foo     c-example.c     /^int foo(int a) { return bar(a); }$/;" f       signature:(int a)
main    c-example.c     /^int main()$/;"        f

$ ctags -f - --format=2 --c++-kinds=+p --fields=+iaS --language-force=C++ cpp-example.cpp
Foo     cpp-example.cpp /^    Foo() {$/;"       f       class:Foo       access:public   signature:()
Foo     cpp-example.cpp /^class Foo$/;" c       file:
bar     cpp-example.cpp /^int bar(int a) { return baz(a); }$/;" f       signature:(int a)
baz     cpp-example.cpp /^int baz(int a) { return     a ; }$/;" f       signature:(int a)
doitPrivate     cpp-example.cpp /^    int doitPrivate(int a) { return bar(a); }$/;"     f       class:Foo       file:   access:private  signature:(int a)
doitPublic      cpp-example.cpp /^    int doitPublic(int a) { return doitPrivate(a); }$/;"      f       class:Foo       access:public   signature:(int a)
main    cpp-example.cpp /^int main()$/;"        f       signature:()
~Foo    cpp-example.cpp /^    ~Foo() {$/;"      f       class:Foo       access:public   signature:()

Todo: Add code to pretend gcc script handling gcc name mangling

  • Here are examples of names being mangled in C++, versus non-mangled names in C:
$ egrep "^\s+"call c-example.cwrap.3.s | egrep -i "(foo|bar|baz)"
        call    baz
        call    bar
        call    foo
        call    bar

$ egrep "^\s+"call cpp-example.cwrap.3.s | egrep -i "(foo|bar|baz)"
        call    _Z3bazi
        call    _Z3bari
        call    _ZN3Foo11doitPrivateEi
        call    _ZN3FooC1Ev
        call    _ZN3Foo10doitPublicEi
        call    _Z3bari
        call    _ZN3FooD1Ev
#include <stdio.h>
extern int bar(int a);
static int foo(int a);
int baz(int a) { /* if (a) cwrap_log("- a=%d", a); */ return a ; }
extern int bar(int a) { return baz(a); }
static int foo(int a) { return bar(a); }
int main() {
printf("Hello World\n");
int b = baz(1);
// cwrap_log("- b=%d", b);
return bar(foo(0));
}
# $ make -f c-example.mak clean ; make -f c-example.mak && ./c-example ; ll c-example*
# $ make -f c-example.mak clean ; make CC="perl cwrap.pl gcc" -f c-example.mak && ./c-example ; ll c-example* ; cat c-example.cwrap.out
TARGET = c-example
OBJECT = $(TARGET).o
SOURCE = $(TARGET).c
$(TARGET): $(OBJECT)
$(CC) -g -o $(TARGET) $(OBJECT)
$(OBJECT) : $(SOURCE)
$(CC) -g -c $(SOURCE) -o $(OBJECT)
all : $(TARGET)
clean :
rm --force --verbose $(TARGET) $(TARGET)*cwrap.* $(OBJECT)
#include <iostream>
int baz(int a) { /* if (a) cwrap_log("- a=%d", a); */ return a ; }
int bar(int a) { return baz(a); }
class Foo
{
private:
int doitPrivate(int a) { return bar(a); }
public:
Foo() {
std::cout << "constructing Foo\n";
}
int doitPublic(int a) { return doitPrivate(a); }
~Foo() {
std::cout << "deconstructing Foo\n";
}
};
int main()
{
std::cout << "Hello World\n";
int b = baz(1);
// cwrap_log("- b=%d", b);
Foo foo;
return bar(foo.doitPublic(0));
}
# $ make -f cpp-example.mak clean ; make -f cpp-example.mak && ./cpp-example ; ll cpp-example*
# $ make -f cpp-example.mak clean ; make CC="perl cwrap.pl g++" -f cpp-example.mak && ./cpp-example ; ll cpp-example* ; cat cpp-example.cwrap.out
TARGET = cpp-example
OBJECT = $(TARGET).o
SOURCE = $(TARGET).cpp
$(TARGET): $(OBJECT)
$(CC) -g -o $(TARGET) $(OBJECT) -lstdc++
$(OBJECT) : $(SOURCE)
$(CC) -g -c $(SOURCE) -o $(OBJECT)
all : $(TARGET)
clean :
rm --force --verbose $(TARGET) $(TARGET).cwrap.* $(OBJECT)
# Copyright (c) 2012 Simon Hardy-Francis.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
# http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
# http://www.mindfruit.co.uk/2012/06/relocations-relocations.html
# http://www.akkadia.org/drepper/dsohowto.pdf
# $ time make CC="perl ~/work/20121228-cwrap/cwrap.pl gcc"
# todo: compile with only auxinfo for fast but summarized build
# todo: compile c only once with auxinfo, create c2 with mock functions, munge s files together == faster
use strict;
use Time::HiRes;
use File::Basename;
$| ++;
sub get_stem_and_ext {
my ($path_file_exit) = @_;
#debug printf qq[debug: (?, ?, ?)=fileparse(%s)\n], $path_file_exit;
my ( $f, $d, $s ) = fileparse($path_file_exit, "", qr/\.[^.]*/); # suffix maybe nothing or .something
$d = '' if ($d eq './');
my $stem;
$stem = $d if ($d ne ''); ; # e.g. 'connection' in 'connection.o'
$stem .= $f ; # e.g. 'connection' in 'connection.o'
my $ext = $s ; # e.g. '.o' in 'connection.o'
#debug printf qq[debug: ($f, $d, $s)=fileparse(%s) // %s=stem %s=ext\n], $path_file_exit, $stem, $ext;
return ($stem, $ext);
}
my $start_time = Time::HiRes::time;
# e.g. -DHAVE_CONFIG_H -I. -I../.. -DSHARE_DATADIR="\"/usr/local/share\"" -DLOCALSTATEDIR="\"/usr/local/var\"" -DBINDIR="\"/usr/local/bin\"" -I../../src/common -g -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all -Wstack-protector -fwrapv - -param ssp-buffer-size=1 -fPIE -Wall -fno-strict-aliasing -MT connection.o -MD -MP -MF . deps/connection.Tpo -c -o connection.o connection.c
my $arguments = "";
for ( my $c = 0; $c < @ARGV; $c++ ) {
my $tmp = $ARGV[$c];
$tmp =~ s~"~\\"~g;
$arguments .= ' ' . $tmp;
}
my $sub_end_hint ;
my $c2s ;
my $c2s2 ;
my $deps_file_stem ;
my $input_file ;
my $input_file_stem ;
my $input_file_ext ;
my $output_file ;
my $output_file_stem;
my $output_file_ext ;
my $output_file_log ;
my $s2o ;
if ( $ARGV[0] eq 'ar' ) {
$c2s = $arguments; # e.g. ar cru libtor.a buffers.o circuitbuild.o circuitlist.o circuituse.o command.o config.o connection.o connection_edge.o connection_or.o control.o cpuworker.o directory.o dirserv.o dirvote.o dns.o dnsserv.o geoip.o hibernate.o main.o microdesc.o networkstatus.o nodelist.o onion.o transports.o policies.o reasons.o relay.o rendclient.o rendcommon.o rendmid.o rendservice.o rephist.o router.o routerlist.o routerparse.o status.o config_codedigest.o
( $output_file ) = $c2s =~ m~\s+([^ ]+\.a\b)~ ; # e.g. 'libtor.a' in '-o libtor.a'
( $output_file_stem, # e.g. 'libtor' in 'libtor.a'
$output_file_ext ) = get_stem_and_ext($output_file) ; # e.g. '.a' in 'libtor.a'
( $output_file_log ) = $input_file_stem . $output_file_ext . '.cwrap.ar.out'; # e.g. 'libtor.a.cwrap.ar.out'
$sub_end_hint = sprintf qq[; %4u objects], scalar $arguments =~ s~(\.o)\b~$1~g;
}
elsif ( $ARGV[0] =~ m~^(gcc|g\+\+)$~ ) {
# -aux-info filename
# Output to the given filename prototyped declarations for all functions declared and/or defined in a translation unit, including those in header files. This option is silently ignored in any language other than C.
# Besides declarations, the file indicates, in comments, the origin of each declaration (source file and line), whether the declaration was implicit, prototyped or unprototyped (`I', `N' for new or `O' for old, respectively, in the first character after the line number and the colon), and whether it came from a declaration or a definition (`C' or `F', respectively, in the following character). In the case of function definitions, a K&R-style list of arguments followed by their declarations is also provided, inside comments, after the declaration.
# $ ctags -f - --format=2 --c++-kinds=+p --fields=+iaS --language-force=C c-example.c
# bar c-example.c /^int bar(int a) { return baz(a); }$/;" f signature:(int a)
# baz c-example.c /^int baz(int a) { return a ; }$/;" f signature:(int a)
# foo c-example.c /^int foo(int a) { return bar(a); }$/;" f signature:(int a)
# main c-example.c /^int main()$/;" f
# $ egrep "^\s+"call c-example.cwrap.3.s | egrep -i "(foo|bar|baz)"
# call baz
# call bar
# call foo
# call bar
# Consider using ctags for C++? e.g.:
# $ ctags -f - --format=2 --c++-kinds=+p --fields=+iaS --language-force=C++ cpp-example.cpp
# Foo cpp-example.cpp /^ Foo() {$/;" f class:Foo access:public signature:()
# Foo cpp-example.cpp /^class Foo$/;" c file:
# bar cpp-example.cpp /^int bar(int a) { return baz(a); }$/;" f signature:(int a)
# baz cpp-example.cpp /^int baz(int a) { return a ; }$/;" f signature:(int a)
# doitPrivate cpp-example.cpp /^ int doitPrivate(int a) { return bar(a); }$/;" f class:Foo file: access:private signature:(int a)
# doitPublic cpp-example.cpp /^ int doitPublic(int a) { return doitPrivate(a); }$/;" f class:Foo access:public signature:(int a)
# main cpp-example.cpp /^int main()$/;" f signature:()
# ~Foo cpp-example.cpp /^ ~Foo() {$/;" f class:Foo access:public signature:()
# How to understand C++ marking up? e.g.:
# Note: https://en.wikipedia.org/wiki/Name_mangling
# Note: https://stackoverflow.com/questions/41524956/gcc-c-name-mangling-reference
# Note: http://www.int0x80.gr/papers/name_mangling.pdf
# Note: http://demangler.com/
# $ egrep "^\s+"call cpp-example.cwrap.3.s | egrep -i "(foo|bar|baz)"
# call _Z3bazi
# call _Z3bari
# call _ZN3Foo11doitPrivateEi
# call _ZN3FooC1Ev
# call _ZN3Foo10doitPublicEi
# call _Z3bari
# call _ZN3FooD1Ev
#our $gcc = `which clang`; chomp $gcc; # clang doesn't support -aux-info
#our $gcc = `which gcc`; chomp $gcc;
$c2s = $arguments;
$c2s2 = $arguments;
( $deps_file_stem ) = $arguments =~ m~-MF (.*\.deps/[^\.]+).Tpo~ ; # e.g. '.deps/util' in '-MF .deps/util.Tpo'
( $input_file ) = $c2s =~ m~\s+([^ ]+\.cp*)~ ; # e.g. 'connection.c'
( $input_file_stem, # e.g. 'connection' in 'connection.c'
$input_file_ext ) = get_stem_and_ext($input_file) if ($input_file ne '') ; # e.g. '.c' in 'connection.c'
( $output_file ) = $c2s =~ m~\-o\s+([^ ]+)~ ; # e.g. 'connection.o' in '-o connection.o'
( $output_file_stem, # e.g. 'connection' in 'connection.o'
$output_file_ext ) = get_stem_and_ext($output_file) ; # e.g. '.o' in 'connection.o'
( $output_file_log ) = $input_file_stem . $output_file_ext . '.cwrap.cc.out'; # e.g. 'connection.o.cwrap.compile.out'
die qq[ERROR: not -o <file> detected in arguments: ] . $arguments if ( not $output_file );
$c2s =~ s~\s+\-MT\s+[^ ]+~~; # remove -MT util.o
$c2s =~ s~\s+\-MD\b~~ ; # remove -MD -MP
$c2s =~ s~\s+\-MP\b~~ ; # remove -MD -MP
$c2s =~ s~\s+\-MF\s+[^ ]+~~; # remove -MF .deps/util.Tpo
$c2s =~ s~ -o\s+$output_file_stem\.o\b~ -aux-info $input_file_stem.cwrap.1.aux -o /dev/null~; # NOTE: precompiled headers appear to be incompatible with -aux-info option :-(
$c2s2 =~ s~ -o\s+$output_file_stem\.o\b~ -fPIC -S -o $input_file_stem.cwrap.3.s~; # todo: do not add in -fPIC if it's already present
$c2s2 =~ s~ $input_file\b~ $input_file_stem.cwrap.2.c~;
$s2o = sprintf qq[as $input_file_stem.cwrap.4.s -o $output_file_stem.o] if ( $output_file_stem ); # -al
$sub_end_hint = sprintf qq[; %4u objects , %4u libs], scalar $arguments =~ s~(\.o)\b~$1~g, scalar $arguments =~ s~(\.a)\b~$1~g;
}
else {
die qq[ERROR: unrecognized first argument:] . $arguments;
}
my $arguments_short = sprintf qq[%-3s .. > %75s], $ARGV[0], $output_file_log;
sub END {
printf qq[%s %f cwrap: %s%s\n], scalar localtime, Time::HiRes::time - $start_time, $arguments_short, $sub_end_hint;
open ( my $fd_output_file_log, ">>", $output_file_log ) || die qq[];
printf $fd_output_file_log qq[%s %fs cwrap: %s%s\n], scalar localtime, Time::HiRes::time - $start_time, $arguments_short, $sub_end_hint;
close $fd_output_file_log;
}
$c2s .= ' -DCWRAP -include .cwrap.h';
$c2s2 .= ' -DCWRAP -include .cwrap.h';
$c2s .= ' >> ' . $output_file_log . ' 2>&1';
$c2s2 .= ' >> ' . $output_file_log . ' 2>&1';
$s2o .= ' >> ' . $output_file_log . ' 2>&1';
my $cwrap_log_not_first_time;
sub cwrap_log {
my ( $printf_format ) = shift;
open ( my $fd_output_file_log, defined $cwrap_log_not_first_time ? '>>' : '>', $output_file_log ) || die sprintf qq[ERROR: cannot open log file: %s: $!], $output_file_log;
printf $fd_output_file_log qq[%s %fs cwrap: ] . $printf_format . qq[\n], scalar localtime, Time::HiRes::time - $start_time, @_;
close $fd_output_file_log;
$cwrap_log_not_first_time = 1;
}
cwrap_log qq[debug: arguments :%s], $arguments ;
cwrap_log qq[debug: deps_file_stem :%s], $deps_file_stem ;
cwrap_log qq[debug: input_file :%s], $input_file ;
cwrap_log qq[debug: input_file_stem :%s], $input_file_stem ;
cwrap_log qq[debug: input_file_ext :%s], $input_file_ext ;
cwrap_log qq[debug: output_file :%s], $output_file ;
cwrap_log qq[debug: output_file_stem:%s], $output_file_stem;
cwrap_log qq[debug: output_file_ext :%s], $output_file_ext ;
die sprintf qq[ERROR: regex sanity check failed: output_file parts mismatch: '%s'=output_file ne ('%s'=output_file_stem . '%s'=output_file_exit)], $output_file, $output_file_stem, $output_file_ext if($output_file ne ($output_file_stem . $output_file_ext));
open ( my $fd_h_file, ">", ".cwrap.h" ) || die qq[];
printf $fd_h_file <<EOL;
#ifdef __cplusplus
extern "C" {
#endif
extern void cwrap_log(const char * format, ...);
extern void cwrap_log_pop(int indent_direction, int no_append, const char * format, ...);
extern void cwrap_log_push(int indent_direction, int no_append, const char * format, ...);
extern int cwrap_log_get_lines(void);
#ifdef __cplusplus
}
#endif
EOL
close $fd_h_file;
if ( $output_file_ext eq '.o' ) {
cwrap_log qq[step 1: c->aux: running:%s], $c2s ;
}
elsif ( $output_file_ext eq '.a' ) {
cwrap_log qq[step 1: o->lib: running:%s], $c2s ;
}
else { # come here if linking exe
open ( my $fd_c_file, ">", $output_file_stem . '.cwrap.c' ) || die qq[];
printf $fd_c_file <<EOL;
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <unistd.h>
#include <sys/syscall.h> /* for syscall(), SYS_gettid */
#include <stdio.h>
#include <stdarg.h>
#include <sys/time.h>
#define CWRAP_LOG_LINE_MAX (4096)
static char cwrap_log_spaces[] = " ";
static __thread char cwrap_log_line[CWRAP_LOG_LINE_MAX];
static __thread int cwrap_log_line_pos = 0;
__thread int cwrap_log_lines = 0;
static __thread int cwrap_log_indent = 0;
static FILE * cwrap_log_file = NULL;
static int cwrap_log_fd;
static double cwrap_log_time;
int cwrap_log_get_lines(void)
{
return cwrap_log_lines;
}
double cwrap_get_time_in_seconds(void)
{
struct timeval tv;
gettimeofday(&tv, NULL);
return (double)tv.tv_sec + 1.e-6 * (double)tv.tv_usec;
}
void cwrap_log_flush(void)
{
fprintf(cwrap_log_file, "%%s\\n", &cwrap_log_line[0]);
cwrap_log_line_pos = 0;
fflush(cwrap_log_file);
//fsync(cwrap_log_fd); // todo: fsync intermittantly?
}
void cwrap_log_push_v(int indent_direction, int no_append, const char * format, va_list argument_list)
{
double time = cwrap_get_time_in_seconds();
if (no_append && (cwrap_log_line_pos > 0)) {
cwrap_log_flush();
}
if (NULL == cwrap_log_file) { // todo: make thread safe
cwrap_log_file = fopen("$output_file_stem.cwrap.out","w+");
cwrap_log_fd = fileno(cwrap_log_file);
cwrap_log_time = time;
}
double time_elapsed = time - cwrap_log_time;
if (no_append) {
// if (cwrap_log_line_pos < CWRAP_LOG_LINE_MAX) { cwrap_log_line_pos += snprintf(&cwrap_log_line[cwrap_log_line_pos], CWRAP_LOG_LINE_MAX - cwrap_log_line_pos, "%%u ", cwrap_log_lines ); }
if (cwrap_log_line_pos < CWRAP_LOG_LINE_MAX) { cwrap_log_line_pos += snprintf(&cwrap_log_line[cwrap_log_line_pos], CWRAP_LOG_LINE_MAX - cwrap_log_line_pos, "T%%05ld ", syscall(SYS_gettid) ); } // linux; cat /proc/sys/kernel/pid_max
if (cwrap_log_line_pos < CWRAP_LOG_LINE_MAX) { cwrap_log_line_pos += snprintf(&cwrap_log_line[cwrap_log_line_pos], CWRAP_LOG_LINE_MAX - cwrap_log_line_pos, "%%fs ", time_elapsed ); }
if (cwrap_log_line_pos < CWRAP_LOG_LINE_MAX) { cwrap_log_line_pos += snprintf(&cwrap_log_line[cwrap_log_line_pos], CWRAP_LOG_LINE_MAX - cwrap_log_line_pos, "%%.*s", 2 * cwrap_log_indent, &cwrap_log_spaces[0]); } // note: %% because of Perl
}
if (cwrap_log_line_pos < CWRAP_LOG_LINE_MAX) { cwrap_log_line_pos += vsnprintf(&cwrap_log_line[cwrap_log_line_pos], CWRAP_LOG_LINE_MAX - cwrap_log_line_pos, format, argument_list ); }
cwrap_log_lines ++;
cwrap_log_indent += indent_direction;
}
void cwrap_log_push(int indent_direction, int no_append, const char * format, ...)
{
va_list argument_list;
va_start(argument_list, format);
cwrap_log_push_v(indent_direction, no_append, format, argument_list);
va_end(argument_list);
}
void cwrap_log_pop(int indent_direction, int no_append, const char * format, ...)
{
va_list argument_list;
va_start(argument_list, format);
cwrap_log_push_v(indent_direction, no_append, format, argument_list);
cwrap_log_flush();
va_end(argument_list);
}
void cwrap_log(const char * format, ...)
{
va_list argument_list;
va_start(argument_list, format);
cwrap_log_push_v(0 /* no indent_direction */, 1 /* no append */, format, argument_list);
cwrap_log_flush();
va_end(argument_list);
}
EOL
close $fd_c_file;
my $compile_cwrap = sprintf qq[ gcc -O2 -fPIC -c -o %s.cwrap.o %s.cwrap.c >> %s 2>&1], $output_file_stem, $output_file_stem, $output_file_log;
cwrap_log qq[step 1: ->c : running:%s], $compile_cwrap;
system $compile_cwrap;
$c2s =~ s~(\s+)([^\.\s]+\.o\s+)~$1$output_file_stem.cwrap.o $2~; # insert brand new .o file containing logging code etc
cwrap_log qq[step 2: o->exe: running:%s], $c2s;
}
system $c2s;
if ($? == -1) {
print "failed to execute: $!\n";
}
elsif ($? & 127) {
printf "child died with signal %d, %s coredump\n", ($? & 127), ($? & 128) ? 'with' : 'without';
}
else {
my $error_code = $? >> 8;
if ( $error_code > 0 ) {
system 'cat ' . $output_file_log;
$sub_end_hint = sprintf qq[; ERROR: failed with error code: %u], $error_code;
exit ( $error_code );
}
}
# router.cwrap.1.aux:
# ...
# /* router.c:102:NF */ extern crypto_pk_t *get_onion_key (void); /* () */
# ...
# /* router.c:1461:NF */ extern extrainfo_t *router_get_my_extrainfo (void); /* () */
# ...
# /* router.c:1796:NF */ extern void check_descriptor_bandwidth_changed (time_t now); /* (now) time_t now; */
# /* router.c:1820:NF */ static void log_addr_has_changed (int severity, const tor_addr_t *prev, const tor_addr_t *cur, const char *source); /* (severity, prev, cur, source) int severity; const tor_addr_t *prev; const tor_addr_t *cur; const char *source; */
# /* router.c:1848:NF */ extern void check_descriptor_ipaddress_changed (time_t now); /* (now) time_t now; */
# /* router.c:1882:NF */ extern void router_new_address_suggestion (const char *suggestion, const dir_connection_t *d_conn); /* (suggestion, d_conn) const char *suggestion; const dir_connection_t *d_conn; */
# /* router.c:1942:NF */ static int router_guess_address_from_dir_headers (uint32_t *guess); /* (guess) uint32_t *guess; */
# /* router.c:1956:NF */ extern void get_platform_str (char *platform, size_t len); /* (platform, len) char *platform; size_t len; */
# ...
# /* router.c:2503:NF */ extern const char *format_node_description (char *buf, const char *id_digest, int is_named, const char *nickname, const tor_addr_t *addr, uint32_t addr32h); /* (buf, id_digest, is_named, nickname, addr, addr32h) char *buf; const char *id_digest; int is_named; const char *nickname; const tor_addr_t *addr; uint32_t addr32h; */
# ...
# const char *shadow_format_node_description (char *buf, const char *id_digest, int is_named, const char *nickname, const tor_addr_t *addr, uint32_t addr32h) {
# printf("format_node_description()\n");
# const char * result = format_node_description(buf, id_digest, is_named, nickname, addr, addr32h);
# printf("return\n");
# return result;
# }
# .globl shadow_router_dump_router_to_string
# .section .data.rel,"aw",@progbits
# .align 8
# .type shadow_router_dump_router_to_string, @object
# .size shadow_router_dump_router_to_string, 8
# shadow_router_dump_router_to_string:
# .quad router_dump_router_to_string
# 8164 4251 E8000000 call _tor_malloc@PLT
# 8164 00
#
# 8168 4259 E8000000 call get_server_identity_key
# 8168 00
#
# 8174 426c E8000000 call shadow_router_dump_router_to_string <-- this
# 8174 00
#
# 8174 426c FF142500 call *shadow_router_dump_router_to_string <-- or this!
# 8174 000000
#
# 12689 .globl shadow_router_dump_router_to_string
# 12690 .section .data.rel,"aw",@progbits
# 12691 .align 8
# 12692 .type shadow_router_dump_router_to_string, @object
# 12693 .size shadow_router_dump_router_to_string, 8
# 12694 shadow_router_dump_router_to_string:
# 12695 0000 00000000 .quad other_router_dump_router_to_string
# 12695 00000000
#
# 37303 cfd6 00000000 .quad shadow_router_dump_router_to_string
# 37303 00000000
if ( $output_file_ext eq '.o' ) {
my $functions;
my $function_by_name;
{
my $aux_file = $input_file_stem . '.cwrap.1.aux';
open ( my $fd_aux_file, "<", $aux_file ) || die qq[];
sysread ( $fd_aux_file, my $aux_info, 9_999_999 );
close $fd_aux_file;
cwrap_log qq[step 2: c->c : read bytes of aux file: %u < %s], length $aux_info, $aux_file;
open ( my $fd_c_file, "<", $input_file ) || die qq[ERROR: cannot open: ] . $input_file;
sysread ( $fd_c_file, my $c_file, 9_999_999 );
close $fd_c_file;
cwrap_log qq[step 2: c->c : read bytes of c file: %u < %s], length $c_file, $input_file;
while ($c_file =~ s~\/\*(.*cwrap_log.*)\*\/~ $1 ~gm) {} # uncomment lines that look like: /* cwrap_log("foo") */ <-- todo: come up with a better regex :-)
while ($c_file =~ s~\/\/(.*cwrap_log.*)~ $1~gm) {} # uncomment lines that look like: // cwrap_log("foo") <-- todo: come up with a better regex :-)
my $c2_file = $input_file_stem . '.cwrap.2.c';
open ( my $fd_c_file, ">", $c2_file ) || die qq[];
syswrite ( $fd_c_file, $c_file, length $c_file );
printf $fd_c_file qq[\n];
# printf $fd_c_file <<EOL;
#//extern int cwrap_log_get_lines(void);
#//extern void cwrap_log(int indent_direction, int no_append, const char * format, ...);
#//extern void cwrap_log_push(int indent_direction, int no_append, const char * format, ...);
#
#EOL
my $static_funcion_count = 0;
# /* /usr/include/x86_64-linux-gnu/sys/stat.h:493:NF */ extern int mknodat (int __fd, const char *__path, __mode_t __mode, __dev_t __dev); /* (__fd, __path, __mode, __dev) int __fd; const char *__path; __mode_t __mode; __dev_t __dev; */
# static int router_guess_address_from_dir_headers (uint32_t *); <no comment!!!> <-- special case!
while ( $aux_info =~ m~/\* ([^:]+:\d+:..) \*/ ([^\n\r]+)~gis ) { # /* ../../src/common/address.h:110:NF */ static sa_family_t tor_addr_family (const tor_addr_t *a); /* (a) const tor_addr_t *a; */
my ( $pre_line, $line_rest ) = ( $1, $2 ); # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
my ( $scope, $prototype, $comment ) = $line_rest =~ m~^(extern|static) ([^;]+);(.*)$~; # ^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <-- comment optional & only for 'NF' $decl_type
my ( $return, $function, $params ) = $prototype =~ m~^(.*?)\s*([a-z0-9_]+)\s*\((.*)\)$~i; # ^^^^^^^^^^^ ^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
my ( $vars ) = $comment =~ m~\(([^\)]+)\)~; # ^
my ( $file, $decl_type ) = $pre_line =~ m~^([^:]+):\d+:(..)~; # ^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^
#debug cwrap_log qq[- %15s %30s %-50s %s // ($vars)\n], $scope, $return, $function, $params;
#debug cwrap_log qq[- debug: fixme: $pre_line $line_rest\n] if ( $file =~ m~/stat.h~ );
next if ( $return =~ m~^$~ ); # ignore indirect functions for now e.g. /* libarchive/archive_pack_dev.c:325:NF */ extern pack_t (*pack_find (const char *name)); /* (name) const char *name; */
next if ( $file =~ m~^/~ ); # ignore system files for now, e.g. /* /usr/include/event2/util.h:288:NC */ extern int evutil_socketpair (int, int, int, int *);
next if ( $params =~ m~\.\.\.~ ); # todo: how to wrap va_list functions?
next if ( $params =~ m~\b__va_list_tag\b~ ); # todo: how to wrap __va_list_tag functions?
next if ( $function =~ m~^(memset|memmove|strlcpy|strlcat|evdns_shutdown|TO_EDGE_CONN)$~ ); # hack for Tor; TO_EDGE_CONN() is inline in or.h
# }
#
# while ( $aux_info =~ m~/\* ($o_file.c:\d+:.F) \*/ ([^\n\r]+)~gis ) { # /* router.c:1461:NF */ extern void routerstatus_get_verbose_nickname (char *buf, const routerstatus_t *router); /* (buf, router) char *buf; const routerstatus_t *router; */
# my ( $pre_line, $line_rest ) = ( $1, $2 ); # ^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# my ( $scope, $prototype, $comment ) = $line_rest =~ m~^(extern|static) ([^;]+);(.*)$~; # ^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# my ( $return, $function, $params ) = $prototype =~ m~^(.*?)\s*([a-z0-9_]+)\s*\((.*)\)$~i; # ^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# my ( $vars ) = $comment =~ m~\(([^\)]+)\)~; # ^^^^^^^^^^^
# my ( $file ) = $pre_line =~ m~^([^:]+):~; # ^^^^^^^^
if (( $file !~ m~^$input_file$~ ) # e.g. router.c
&& ( $decl_type =~ m~.F~ )) {
#debug cwrap_log qq[step 2: c->c : skipping aux line: %s %s // e.g. inline function defined in header file], $pre_line, $line_rest;
next;
}
my $function_static_postfix = $scope eq q[static] ? sprintf qq[_static_in_%s], $file : q[]; # static functions
$function_static_postfix =~ s~[^a-z0-9_]~_~gi;
$functions->{"$function$function_static_postfix"}{name} = $function;
$functions->{"$function$function_static_postfix"}{static_postfix} = $function_static_postfix;
$functions->{"$function$function_static_postfix"}{file} = $file;
$function_by_name->{$function}{mock} = "$function$function_static_postfix";
$function_by_name->{$function}{static_postfix} = $function_static_postfix;
$function_by_name->{$function}{file} = $file;
if (( $file !~ m~^$input_file$~ ) # e.g. router.c
|| ( $decl_type !~ m~.F~ )) {
#debug cwrap_log qq[step 2: c->c : skipping aux line: %s %s], $pre_line, $line_rest;
next; # only process further if we need to create the wrapper functions
}
printf $fd_c_file qq[\n];
printf $fd_c_file qq[extern unsigned long long cwrap_count_call_%s%s;\n], $function, $function_static_postfix;
printf $fd_c_file qq[extern %s (* cwrap_indirect_orig_%s%s) (%s);\n], $return, $function, $function_static_postfix, $params;
printf $fd_c_file qq[extern %s (* cwrap_indirect_wrap_%s%s) (%s);\n], $return, $function, $function_static_postfix, $params;
printf $fd_c_file qq[extern %s (* cwrap_indirect_mock_%s%s) (%s);\n], $return, $function, $function_static_postfix, $params;
my $not_void_return_assign = sprintf qq[%s cwrap_result =], $return if ( $return !~ m~\bvoid\b~ || $return =~ m~\*~ );
my $not_void_return_return = qq[return cwrap_result;] if ( $return !~ m~\bvoid\b~ || $return =~ m~\*~ );
my $function_exit;
if ( $return =~ m~\bvoid\b~ ) {
$function_exit = <<EOL;
if (0 == (cwrap_log_get_lines() - lines)) { cwrap_log_pop(-1, 0 /* append */, "} // " "#%%llu", cwrap_count_call_$function$function_static_postfix); }
else { cwrap_log_pop(-1, 1 /* no append */, "} // $function() #%%llu", cwrap_count_call_$function$function_static_postfix); }
EOL
}
elsif ( $return =~ m~\bint\b~ ) {
$function_exit = <<EOL;
if (0 == (cwrap_log_get_lines() - lines)) { cwrap_log_pop(-1, 0 /* append */, "} = %%d // " "#%%llu", cwrap_result, cwrap_count_call_$function$function_static_postfix); }
else { cwrap_log_pop(-1, 1 /* no append */, "} = %%d // $function() #%%llu", cwrap_result, cwrap_count_call_$function$function_static_postfix); }
EOL
}
else {
$function_exit = <<EOL;
if (0 == (cwrap_log_get_lines() - lines)) { cwrap_log_pop(-1, 0 /* append */, "} = ? // " "#%%llu", cwrap_count_call_$function$function_static_postfix); }
else { cwrap_log_pop(-1, 1 /* no append */, "} = ? // $function() #%%llu", cwrap_count_call_$function$function_static_postfix); }
EOL
}
printf $fd_c_file <<EOL;
static $return cwrap_wrap_$function$function_static_postfix ($params)
{
cwrap_count_call_$function$function_static_postfix ++;
// todo: add per binary/file/function verbosity checking
// todo: show function parameters for simple types, e.g. foo=123, bar=7
if (cwrap_count_call_$function$function_static_postfix <= 10000) cwrap_log_push(1, 1, "+ $function() {");
int lines = cwrap_log_get_lines();
$not_void_return_assign (* cwrap_indirect_orig_$function$function_static_postfix)($vars); // call wrapped function
if (cwrap_count_call_$function$function_static_postfix <= 10000) {
$function_exit
}
// todo: show return value for simple types
$not_void_return_return
}
EOL
#fixme: consider: __attribute__((used))
printf $fd_c_file qq[%s (* cwrap_indirect_orig_%s%s) (%s) = %s;\n] , $return, $function, $function_static_postfix, $params, $function ;
printf $fd_c_file qq[%s (* cwrap_indirect_wrap_%s%s) (%s) = cwrap_wrap_%s%s;\n], $return, $function, $function_static_postfix, $params, $function, $function_static_postfix;
printf $fd_c_file qq[%s (* cwrap_indirect_mock_%s%s) (%s) = cwrap_wrap_%s%s;\n], $return, $function, $function_static_postfix, $params, $function, $function_static_postfix; # point to wrapped function
printf $fd_c_file qq[unsigned long long cwrap_count_call_%s%s = 0;\n], $function, $function_static_postfix;
$static_funcion_count ++;
}
$sub_end_hint = sprintf qq[; %4u functions], $static_funcion_count;
close $fd_c_file;
cwrap_log qq[step 2: c->c : wrote bytes to c file: %u > %s], -s $c2_file, $c2_file;
cwrap_log qq[step 3: c->s : running: %s], $c2s2;
system $c2s2;
my $warnings = `egrep -c ": warning: " $output_file_log`;
$sub_end_hint .= sprintf ", %4u warnings", $warnings;
if ( $deps_file_stem ) {
my $po_file = $deps_file_stem . '.Tpo';
open ( my $fd_po_file, "<", $po_file ) || die qq[];
sysread ( $fd_po_file, my $po_file_bytes, 9_999_999 );
close $fd_po_file;
cwrap_log qq[step 4: ->Tpo: read bytes of Tpo file: %u < %s], length $po_file_bytes, $po_file;
$po_file_bytes =~ s~$input_file_stem.cwrap.2.c~$input_file_stem.c~gs; # change e.g. 'util.o: util.cwrap.2.c ../../orconfig.h /usr/include/fcntl.h \' to 'util.o: util.c ../../orconfig.h /usr/include/fcntl.h \'
$po_file_bytes =~ s~\.cwrap.h\:~~gs;
$po_file_bytes =~ s~\.cwrap.h\s*~~gs;
open ( my $fd_po_file, ">", $po_file ) || die qq[];
syswrite ( $fd_po_file, $po_file_bytes, length $po_file_bytes );
close $fd_po_file;
cwrap_log qq[step 4: ->Tpo: wrote bytes to Tpo file: %u > %s], length $po_file_bytes, $po_file;
}
else {
cwrap_log qq[step 4: ->Tpo: skipping .Tpo];
}
}
{
my $s3_file = $input_file_stem . '.cwrap.3.s';
open ( my $sfile, "<", $s3_file ) || die qq[ERROR: open failed: $!];
my @assembler_lines = <$sfile>;
close $sfile;
cwrap_log qq[step 5: s->s : read bytes of s file: %u bytes / %u lines < %s], -s $s3_file, scalar @assembler_lines, $s3_file;
my $call_lines;
my $call_line_replacements;
my $functions_ignored;
my $unique_functions_replaced;
if(0) {
foreach ( grep ( m~(call|jmp)[ \t]+([^ \t\@\r\n]+)(\@PLT|)~, @assembler_lines ) ) {
m~(call|jmp)[ \t]+([^ \t\@\r\n]+)(\@PLT|)~;
my ( $op_type, $function ) = ( $1, $2 );
next if ( $function =~ m~^cwrap_log~ );
next if ( $function =~ m~^\*cwrap_indirect_~ );
#next if ( $function =~ m~^__~ ); # e.g. __stack_chk_fail() or __fprintf_chk() or __errno_location()
next if ( $function =~ m~^\*~ ); # e.g. *40(%r14)
next if ( $function =~ m~\.part\.\d+$~ ); # e.g. tor_tls_get_by_ssl.part.6
if ( exists $function_by_name->{$function} ) {
#debug printf $fd_output_file_log qq[%s %fs cwrap: function: replace: %-50s in %s], scalar localtime, Time::HiRes::time - $start_time, $function, $_;
my $function_mock = $function_by_name->{$function}{mock};
my $function_static_postfix = $function_by_name->{$function}{static_postfix};
my $function_file = $function_by_name->{$function}{file};
# s~(call[ \t]+)($function)(\@PLT|)$~$1 *cwrap_indirect_mock_$2$function_static_postfix$3(\%rip)~;
if (s~(call[ \t]+)($function)(\@PLT)$~$1 *cwrap_indirect_mock_$2$function_static_postfix$3(\%rip)~) {}
elsif (s~(call[ \t]+)($function)$~$1 *cwrap_indirect_mock_$2$function_static_postfix\@PLT(\%rip)~) {} # https://stackoverflow.com/questions/48369771/relocation-r-x86-64-pc32-against-symbol-when-calling-function-from-inline-assemb
if (s~(jmp[ \t]+)($function)(\@PLT)$~$1 *cwrap_indirect_mock_$2$function_static_postfix$3(\%rip)~) {}
elsif (s~(jmp[ \t]+)($function)$~$1 *cwrap_indirect_mock_$2$function_static_postfix\@PLT(\%rip)~) {}
if ( not exists $unique_functions_replaced->{$function} ) {
#debug cwrap_log qq[step 5: s->s : replaced new unique function: %s // %s], $function, $function_by_name->{$function}{static_postfix};
}
$unique_functions_replaced->{$function} ++;
$call_line_replacements ++;
}
else {
$functions_ignored->{$function} ++;
#debug printf $fd_output_file_log qq[%s %fs cwrap: function: ignore : %-50s in %s], scalar localtime, Time::HiRes::time - $start_time, $function, $_;
}
$call_lines ++;
}
}
# e.g. "leaq client_read_proxy(%rip)," from C code like this "filter->read = client_read_proxy;"
foreach ( grep ( m~(call|jmp|leaq)[ \t]+([^ \t\@\(\r\n]+)(..rip|\@PLT|)~, @assembler_lines ) ) {
m~(call|jmp|leaq)[ \t]+([^ \t\@\(\r\n]+)(..rip|\@PLT|)~;
my ( $op_type, $function ) = ( $1, $2 ); # e.g. leaq, client_read_proxy
next if ( $function =~ m~^cwrap_log~ );
next if ( $function =~ m~^\*cwrap_indirect_~ );
#next if ( $function =~ m~^__~ ); # e.g. __stack_chk_fail() or __fprintf_chk() or __errno_location()
next if ( $function =~ m~^\*~ ); # e.g. *40(%r14)
next if ( $function =~ m~\.part\.\d+$~ ); # e.g. tor_tls_get_by_ssl.part.6
next if ( $function =~ m~^[\.0-9\-]~ ); # e.g. .L828 or 144 or -1
next if ( $function =~ m~\.~ ); # e.g. av.7393
if ( exists $function_by_name->{$function} ) {
#debug printf $fd_output_file_log qq[%s %fs cwrap: function: replace: %-50s in %s], scalar localtime, Time::HiRes::time - $start_time, $function, $_;
my $function_mock = $function_by_name->{$function}{mock};
my $function_static_postfix = $function_by_name->{$function}{static_postfix};
my $function_file = $function_by_name->{$function}{file};
# s~(call[ \t]+)($function)(\@PLT|)$~$1 *cwrap_indirect_mock_$2$function_static_postfix$3(\%rip)~;
if (s~(call[ \t]+)($function)(\@PLT)$~$1 *cwrap_indirect_mock_$2$function_static_postfix$3(\%rip)~) {}
elsif (s~(call[ \t]+)($function)$~$1 *cwrap_indirect_mock_$2$function_static_postfix\@PLT(\%rip)~) {} # https://stackoverflow.com/questions/48369771/relocation-r-x86-64-pc32-against-symbol-when-calling-function-from-inline-assemb
elsif (s~(jmp[ \t]+)($function)(\@PLT)$~$1 *cwrap_indirect_mock_$2$function_static_postfix$3(\%rip)~) {}
elsif (s~(jmp[ \t]+)($function)$~$1 *cwrap_indirect_mock_$2$function_static_postfix\@PLT(\%rip)~) {}
elsif (s~(leaq[ \t]+)($function)~$1 cwrap_wrap_$2$function_static_postfix~) {}
if ( not exists $unique_functions_replaced->{$function} ) {
#debug cwrap_log qq[step 5: s->s : replaced new unique function: %s // %s], $function, $function_by_name->{$function}{static_postfix};
}
$unique_functions_replaced->{$function} ++;
$call_line_replacements ++;
}
else {
$functions_ignored->{$function} ++;
my $assembler_line = $_;
$assembler_line =~ s~\s+$~~g;
cwrap_log qq[step 5: s->s : function: ignore : %-50s in %s], $function, $assembler_line;
}
$call_lines ++;
}
foreach ( sort keys %{ $functions_ignored } ) {
cwrap_log qq[step 5: s->s : unique function ignored: %3u times %s], $functions_ignored->{$_}, $_;
}
cwrap_log qq[step 5: s->s : found %u call lines, replaced %u calls with %u unique functions, %u unique functions ignored], $call_lines, $call_line_replacements, scalar keys %{ $unique_functions_replaced }, scalar keys %{ $functions_ignored };
my $s4_file = $input_file_stem . '.cwrap.4.s';
open ( my $sfile, ">", $s4_file ) || die qq[];
foreach (@assembler_lines) {
print $sfile $_;
}
close $sfile;
cwrap_log qq[step 5: s->s : wrote bytes to s file: %u > %s], -s $s4_file, $s4_file;
}
cwrap_log qq[step 6: s->o : running: %s], $s2o;
system $s2o;
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment