fitzgen/wasm-tracing.md

# Wasm Tracing

Here is a brain dump about an idea for a wasm tracing tool I have been thinking
about. I hope this is useful :)

## Inspiration

DTrace. It's lightweight and lets you make your own debugger or profiler. It is
not a complete profiler or debugger for some specific use case; instead, it is a
collection of Legos, or a toolkit, for building one-off debuggers and profilers.

* https://en.wikipedia.org/wiki/DTrace
* http://www.brendangregg.com/dtrace.html
* http://www.brendangregg.com/DTrace/dtrace_oneliners.txt

## Features

### Things to Trace


* Maintain a ring buffer of the N most recent function calls and returns
* Same as above, but only for functions whose names match a regex
* "strace" functionality by tracing calls to imported functions (which are the
  moral equivalent of syscalls in native code)
* Arguments to and values returned from specific functions
* The `grow_memory` instruction
* Traps
* Maintain a shadow stack in memory (by inserting a prologue and epilogue into
  every function) and then capture the current stack on various events (the
  things listed above).
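
A minimal TypeScript sketch of the first two items follows. It assumes the trace events have already been decoded into `{ kind, func }` objects. The `TraceEvent` shape and the `RingBuffer` class are hypothetical and are not part of any existing tool. In the instrumented binary, the buffer would live in linear memory, but the wrap-around logic would be the same:

```typescript
// Hypothetical decoded trace event; a real instrumenter would store compact
// integers (event kind, function index) in linear memory instead.
interface TraceEvent {
  kind: "call" | "return";
  func: string;
}

// Keeps only the `capacity` most recent events; older ones are overwritten.
class RingBuffer {
  private readonly capacity: number;
  private readonly filter: RegExp | undefined;
  private readonly slots: TraceEvent[];
  private next = 0; // slot the next event will be written to
  private count = 0; // number of valid events, never more than capacity

  // Pass a regex without the `g` flag: `test` on a global regex is stateful.
  constructor(capacity: number, filter?: RegExp) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.filter = filter;
    this.slots = new Array<TraceEvent>(capacity);
  }

  record(event: TraceEvent): void {
    // Filtered mode: ignore functions whose names don't match.
    if (this.filter !== undefined && !this.filter.test(event.func)) return;
    this.slots[this.next] = event;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // The retained events, oldest first.
  snapshot(): TraceEvent[] {
    const start = (this.next - this.count + this.capacity) % this.capacity;
    const out: TraceEvent[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.capacity]);
    }
    return out;
  }
}
```

The filter is applied at record time. As a result, calls to non-matching functions never evict matching ones from the buffer.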


### Ways to Aggregate or Display Traced Data


Easiest: as a flat log. For example, listing the last N calls to imported
functions:

```
query_selector(0x12345678) -> 0xbad0bad1
create_element(0xcafecafe) -> 0xdeaddead
append_child(0xbad0bad1, 0xdeaddead)
...
```
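
Rendering the flat log from decoded import calls is straightforward. This sketch assumes a hypothetical `ImportCall` record that holds the import's name, its raw `i32` arguments, and an optional return value. The return value is absent for imports that return nothing.

```typescript
// Hypothetical decoded record of one call to an imported function.
interface ImportCall {
  name: string;
  args: number[]; // raw i32 arguments
  ret?: number; // absent when the import returns nothing
}

// Print i32 values as unsigned, zero-padded hex (pointers read best this way).
function hex(value: number): string {
  return "0x" + (value >>> 0).toString(16).padStart(8, "0");
}

// The last `n` import calls, one per line, oldest first.
function formatFlatLog(calls: ImportCall[], n: number): string {
  return calls
    .slice(Math.max(0, calls.length - n))
    .map((call) => {
      const line = `${call.name}(${call.args.map((a) => hex(a)).join(", ")})`;
      return call.ret === undefined ? line : `${line} -> ${hex(call.ret)}`;
    })
    .join("\n");
}
```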


As a nested log, where each call increases the indentation and each return
decreases it. For example, show me the last N function calls that happened
before this bug:

```
call crate::mod::func(123, 456)
  call crate::mod::helper(0)
  return 42 from crate::mod::helper
  call crate::mod::another()
    call util::blah(986, 345)
    return from util::blah
  return 1 from crate::mod::another
...
```
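
The sketch below reproduces the nested log above from a hypothetical stream of decoded call and return events. The only state it keeps is a depth counter. The counter grows after each call is printed and shrinks before each return is printed.

```typescript
// Hypothetical decoded call/return events, in the order they happened.
type NestedEvent =
  | { kind: "call"; func: string; args: number[] }
  | { kind: "return"; func: string; value?: number };

function formatNestedLog(events: NestedEvent[]): string {
  const lines: string[] = [];
  let depth = 0;
  for (const e of events) {
    if (e.kind === "call") {
      lines.push(`${"  ".repeat(depth)}call ${e.func}(${e.args.join(", ")})`);
      depth++;
    } else {
      // A window of the last N events can start mid-stack, so clamp at zero
      // rather than going negative on unmatched returns.
      depth = Math.max(0, depth - 1);
      const value = e.value === undefined ? "" : ` ${e.value}`;
      lines.push(`${"  ".repeat(depth)}return${value} from ${e.func}`);
    }
  }
  return lines.join("\n");
}
```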


For a series of captured stacks: a call tree with counts (which can be inverted
too). For example, trace the stack whenever we call the `free` function, and
then aggregate the stacks into a call tree:

```
Total Count | Self Count | Stack Frame
------------+------------+----------------------------
        123 |          0 | do_tick
         67 |          0 | ├── physics
         67 |         67 | │   └── destroy_collision_node
         56 |          0 | └── render
         43 |         43 |     ├── finish_draw_rect
         13 |         13 |     └── finish_draw_circle
```
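
The next sketch shows how captured stacks could be folded into the table above. It assumes each stack is a root-first array of function names. `ShadowStack` is a hypothetical stand-in for the instrumented prologue (push) and epilogue (pop). The total count is the number of stacks that pass through a frame, and the self count is the number of stacks that end at that frame. Children are sorted by total, in descending order.

```typescript
// Hypothetical stand-in for the instrumented prologue/epilogue: each function
// pushes its name on entry and pops it on exit; capture() copies the stack.
class ShadowStack {
  private readonly frames: string[] = [];
  enter(func: string): void {
    this.frames.push(func);
  }
  exit(): void {
    this.frames.pop();
  }
  capture(): string[] {
    return this.frames.slice(); // root (outermost) frame first
  }
}

interface CallTreeNode {
  name: string;
  total: number; // captured stacks that pass through this frame
  self: number; // captured stacks whose innermost frame is this one
  children: Map<string, CallTreeNode>;
}

function buildCallTree(stacks: string[][]): Map<string, CallTreeNode> {
  const roots = new Map<string, CallTreeNode>();
  for (const stack of stacks) {
    let level = roots;
    for (let i = 0; i < stack.length; i++) {
      const name = stack[i];
      let node = level.get(name);
      if (node === undefined) {
        node = { name, total: 0, self: 0, children: new Map<string, CallTreeNode>() };
        level.set(name, node);
      }
      node.total++;
      if (i === stack.length - 1) node.self++;
      level = node.children;
    }
  }
  return roots;
}

// Biggest subtrees first; Array.prototype.sort is stable, so ties keep insertion order.
function byTotal(nodes: Map<string, CallTreeNode>): CallTreeNode[] {
  return Array.from(nodes.values()).sort((a, b) => b.total - a.total);
}

function renderCallTree(roots: Map<string, CallTreeNode>): string {
  const lines = [
    "Total Count | Self Count | Stack Frame",
    "------------+------------+----------------------------",
  ];
  // `label` prefixes this node's own line; `indent` prefixes its children's lines.
  const visit = (node: CallTreeNode, label: string, indent: string): void => {
    lines.push(`${String(node.total).padStart(11)} | ${String(node.self).padStart(10)} | ${label}${node.name}`);
    const kids = byTotal(node.children);
    kids.forEach((kid, i) => {
      const last = i === kids.length - 1;
      visit(kid, indent + (last ? "└── " : "├── "), indent + (last ? "    " : "│   "));
    });
  };
  for (const root of byTotal(roots)) visit(root, "", "");
  return lines.join("\n");
}
```

An inverted call tree would be the same construction applied to each stack reversed, with the innermost frame as the root.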


For any scalar data, i.e. the arguments to and values returned from some
functions, we could draw histograms. This would be neat combined with tracing
the requested sizes of allocations, for example:

```
value  ------------- Distribution ------------- count
   16 |                                         0
   32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       169
   64 |@@@                                      16
  128 |@@                                       10
  256 |                                         0
```
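
The example above uses power-of-two buckets, as DTrace's `quantize()` aggregation does. Below is a sketch of that bucketing and rendering. The 40-column bar width and the rounding down of bar lengths are assumptions.

```typescript
// Power-of-two bucket for a value: the largest power of two <= value.
// Values below 1 (including 0) land in bucket 0.
function bucketOf(value: number): number {
  if (value < 1) return 0;
  let bucket = 1;
  while (bucket * 2 <= value) bucket *= 2;
  return bucket;
}

function renderHistogram(values: number[], width = 40): string {
  const counts = new Map<number, number>();
  for (const v of values) {
    const b = bucketOf(v);
    counts.set(b, (counts.get(b) ?? 0) + 1);
  }
  if (counts.size === 0) return "";
  const keys = Array.from(counts.keys()).sort((a, b) => a - b);
  const smallest = keys[0];
  const largest = keys[keys.length - 1];
  // As in the example above, pad with one empty bucket on each side of the data.
  const first = smallest >= 2 ? smallest / 2 : 0;
  const last = largest === 0 ? 1 : largest * 2;
  const lines = ["value  ------------- Distribution ------------- count"];
  for (let b = first; b <= last; b = b === 0 ? 1 : b * 2) {
    const count = counts.get(b) ?? 0;
    // Bar length is proportional to this bucket's share of all values.
    const bar = "@".repeat(Math.floor((count / values.length) * width));
    lines.push(`${String(b).padStart(5)} |${bar.padEnd(width)} ${count}`);
  }
  return lines.join("\n");
}
```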


### Usability

This is the hardest feature to do well, but also the most important, in my
opinion! I want to use this tool when debugging or profiling, and I want to be
able to just apply it to my code without any ceremony. As much as possible, I
don't want to mess with build configurations, I don't want to manually add new
`<script>` tags, I don't want to have to change my own source code, and I don't
want to change the way I instantiate the wasm module. This is hard because we
are talking about introducing new JS code and potentially linking another wasm
object file into the instrumented wasm.

Ideally, to trace the first argument to the `malloc` function and display it as
a histogram, I would just do something like:

```
$ wasm-trace --trace "arg(0, malloc)" --display histogram path/to/module.wasm
```

And that is all. It would mutate the wasm binary in place, so my existing build
system and all that wouldn't have to understand this temporary build step.
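
The `arg(0, malloc)` syntax suggests a small probe language. Below is a sketch of a parser for it. Only the `arg(...)` form comes from the example command. The `ret(<function>)` form and the `Probe` type are hypothetical additions for illustration.

```typescript
// Hypothetical probe grammar. Only `arg(...)` appears in the example command;
// `ret(...)` is an illustrative extension.
//   arg(<index>, <function>)   trace argument <index> of <function>
//   ret(<function>)            trace the value returned by <function>
type Probe =
  | { kind: "arg"; index: number; func: string }
  | { kind: "ret"; func: string };

function parseProbe(spec: string): Probe {
  const m = /^\s*(\w+)\s*\((.*)\)\s*$/.exec(spec);
  if (m === null) throw new SyntaxError(`malformed probe: ${spec}`);
  const kind = m[1];
  const args = m[2].split(",").map((s) => s.trim()).filter((s) => s.length > 0);
  if (kind === "arg") {
    if (args.length !== 2 || !/^\d+$/.test(args[0])) {
      throw new SyntaxError(`expected arg(<index>, <function>), got: ${spec}`);
    }
    return { kind: "arg", index: Number(args[0]), func: args[1] };
  }
  if (kind === "ret") {
    if (args.length !== 1) throw new SyntaxError(`expected ret(<function>), got: ${spec}`);
    return { kind: "ret", func: args[0] };
  }
  throw new SyntaxError(`unknown probe kind "${kind}" in: ${spec}`);
}
```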

## Components

I foresee two main components: (1) the thing that does the instrumenting and the
code it inserts into the instrumented binary, and (2) the JS that extracts the
traced data and displays it.

### Instrumenter

Inserts new instructions into a wasm binary to capture and maintain tracing
information, and adds new data segments to store the traced info.

Do we want a ring buffer, where old data is overwritten when we wrap around, or
do we start summarizing data at that point (when applicable), or do we call a
well-known imported function from the JS displayer that knows how to drain all
the data? Maybe different approaches suit different situations.
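
Below is a sketch of the third option, in which a full buffer is handed to the JS side instead of being overwritten. `DrainingBuffer` and its `drain` callback are hypothetical. In the instrumented binary, the callback would correspond to a well-known import, with a name chosen by the instrumenter, that receives a pointer and a length into linear memory.

```typescript
// Hypothetical drain policy: when the buffer is full, hand the filled records
// to a callback instead of overwriting old ones.
class DrainingBuffer {
  private readonly records: Uint32Array;
  private readonly drain: (batch: Uint32Array) => void;
  private len = 0;

  constructor(capacity: number, drain: (batch: Uint32Array) => void) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.records = new Uint32Array(capacity);
    this.drain = drain;
  }

  push(record: number): void {
    if (this.len === this.records.length) this.flush();
    this.records[this.len++] = record; // Uint32Array stores it modulo 2^32
  }

  // Also call this once at exit so the final partial batch is delivered.
  flush(): void {
    if (this.len === 0) return;
    // slice() copies, so the consumer may keep the batch after we reuse the storage.
    this.drain(this.records.slice(0, this.len));
    this.len = 0;
  }
}
```

Unlike the ring buffer, this policy never loses data. The cost is a call out to JS every `capacity` records.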

Maybe the instrumenter could itself be compiled to wasm and the instrumentation
of a debuggee wasm binary could be applied just before wasm compilation inside a
webpage? yo_dawg.jpg
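
The following sketch shows what applying the instrumenter just before compilation could look like in JS. It assumes the instrumenter, whether compiled to wasm or not, is exposed as a function from bytes to bytes. `Instrumenter` and `instantiateTraced` are hypothetical names. `WebAssembly.Module` and `WebAssembly.Instance` are the standard synchronous JS API.

```typescript
// Hypothetical instrumenter interface: debuggee bytes in, instrumented bytes out.
type Instrumenter = (wasm: BufferSource) => BufferSource;

// Instruments the module just before compilation, then instantiates it with the
// caller's imports. Supplying any imports the inserted code needs is left to
// the caller in this sketch.
function instantiateTraced(
  bytes: BufferSource,
  imports: WebAssembly.Imports,
  instrument: Instrumenter,
): WebAssembly.Instance {
  const instrumented = instrument(bytes);
  const compiled = new WebAssembly.Module(instrumented);
  return new WebAssembly.Instance(compiled, imports);
}
```

Browsers may refuse to compile large modules synchronously on the main thread. A real version would therefore probably use the asynchronous `WebAssembly.instantiate`, but the shape would be the same.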

### JS Displayer

Some JavaScript module that collects the traced data from inside the wasm memory
and displays it with `console.log`, within some `<pre>`, or as a cool canvas
visualization or something.

It would be awesome if this worked with both Node.js and the web (or if there
were two versions).
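
Below is a sketch of the collecting half, which reads fixed-size records out of wasm linear memory with a `DataView`. The record layout (a `u32` count followed by 8-byte `{function index, argument}` records) and the `names` table are assumptions for illustration. Wasm linear memory is little-endian, which is why the `true` flags are passed.

```typescript
// Hypothetical layout written by the instrumenter at `base`:
//   u32 count, then `count` records of { u32 funcIndex, u32 arg }.
interface ArgRecord {
  func: string;
  arg: number;
}

function readArgRecords(buffer: ArrayBuffer, base: number, names: string[]): ArgRecord[] {
  const view = new DataView(buffer);
  const count = view.getUint32(base, true); // little-endian, like all wasm memory
  const records: ArgRecord[] = [];
  for (let i = 0; i < count; i++) {
    const at = base + 4 + i * 8;
    const funcIndex = view.getUint32(at, true);
    records.push({
      // Fall back to a numeric label when no name is known for this index.
      func: names[funcIndex] ?? `func[${funcIndex}]`,
      arg: view.getUint32(at + 4, true),
    });
  }
  return records;
}

// Simplest possible display: one record per console line.
function logRecords(records: ArgRecord[]): void {
  for (const r of records) console.log(`${r.func}(${r.arg})`);
}
```

In a real page, the buffer would be `memory.buffer` from the instance's exported memory. It should be re-read after any `grow_memory`, because growing memory detaches the previous `ArrayBuffer`.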

Design decision: do all aggregation in the instrumented wasm (via linking a
runtime into the instrumented code?) or post-process in this displayer JS? The
former is likely more performant, but the latter might be easier?

## Available Tools / Libraries

### binaryen

https://github.com/WebAssembly/binaryen

Framework for compilation passes over wasm, analyses, instrumentation,
etc. Written in C++, fairly mature.

### parity_wasm

https://crates.io/crates/parity-wasm

Wasm parser crate. Written in Rust and pretty solid. However, it is really
only a parser / builder for wasm, so anyone doing analyses on top of it
probably has to build out more infrastructure than with binaryen.