
@fitzgen
Created May 24, 2018 21:26
Brain dump of an instrumentation-based wasm tracing tool.

Wasm Tracing

Here is a brain dump about an idea for a wasm tracing tool I have been thinking about. I hope this is useful :)

Inspiration

DTrace. It is lightweight and lets you make your own debugger or profiler: rather than a complete profiler or debugger for some specific use case, it is a collection of Legos, a toolkit for building one-off debuggers and profilers.

Features

Things to Trace

  • Maintain a ring buffer of the N most recent function calls and returns

  • Same as above, but only for functions whose names match a regex

  • "strace" functionality by tracing calls to imported functions (which are the moral equivalent of syscalls in native code)

  • Arguments to and values returned from specific functions

  • The grow_memory instruction (now spelled memory.grow)

  • Traps

  • Maintain a shadow stack in memory (by inserting a prologue and an epilogue into every function) and capture the current stack whenever one of the events listed above occurs.
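A minimal sketch of the state behind the first two bullets and the shadow stack, assuming the instrumenter inserts calls to hooks named on_call and on_return (hypothetical names) into every function's prologue and epilogue. It models that runtime as ordinary Rust rather than as code injected into the wasm, and it takes a plain predicate closure where the list says regex, to avoid depending on a regex crate.

```rust
use std::collections::VecDeque;

/// One traced event: entering or leaving a function.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Call(String),
    Return(String),
}

/// Hypothetical model of the tracing runtime: a ring buffer holding the N most
/// recent (filtered) call/return events, plus a shadow stack of active frames.
pub struct Tracer {
    capacity: usize,
    events: VecDeque<Event>,
    shadow_stack: Vec<String>,
    filter: Box<dyn Fn(&str) -> bool>,
}

impl Tracer {
    pub fn new(capacity: usize, filter: impl Fn(&str) -> bool + 'static) -> Self {
        Tracer {
            capacity,
            events: VecDeque::with_capacity(capacity),
            shadow_stack: Vec::new(),
            filter: Box::new(filter),
        }
    }

    /// Appends an event, evicting the oldest one once the buffer is full.
    fn record(&mut self, event: Event) {
        if self.capacity == 0 {
            return;
        }
        if self.events.len() == self.capacity {
            self.events.pop_front();
        }
        self.events.push_back(event);
    }

    /// Hook called by the inserted prologue (hypothetical name).
    pub fn on_call(&mut self, name: &str) {
        self.shadow_stack.push(name.to_string());
        if (self.filter)(name) {
            self.record(Event::Call(name.to_string()));
        }
    }

    /// Hook called by the inserted epilogue (hypothetical name).
    pub fn on_return(&mut self, name: &str) {
        self.shadow_stack.pop();
        if (self.filter)(name) {
            self.record(Event::Return(name.to_string()));
        }
    }

    /// Copies the shadow stack, outermost frame first.
    pub fn capture_stack(&self) -> Vec<String> {
        self.shadow_stack.clone()
    }

    /// The buffered events, oldest first.
    pub fn recent_events(&self) -> Vec<Event> {
        self.events.iter().cloned().collect()
    }
}
```

The shadow stack is maintained for every function regardless of the filter, so capture_stack() can be called from whichever event (an imported call, a trap) should snapshot it.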

Ways to Aggregate or Display Traced Data

  • Easiest: as a flat log. For example, listing the last N calls to imported functions:

    query_selector(0x12345678) -> 0xbad0bad1
    create_element(0xcafecafe) -> 0xdeaddead
    append_child(0xbad0bad1, 0xdeaddead)
    ...
    
  • As a nested log, where each call increases the indentation and each return decreases it. For example, show me the last N function calls that happened before this bug:

    call crate::mod::func(123, 456)
      call crate::mod::helper(0)
      return 42 from crate::mod::helper
      call crate::mod::another()
        call util::blah(986, 345)
        return from util::blah
      return 1 from crate::mod::another
    ...
    
  • For a series of captured stacks: a call tree with counts (which can also be inverted). For example, capture the stack whenever the free function is called, and then aggregate those stacks into a call tree:

    Total Count | Self Count | Stack Frame
    ------------+------------+----------------------------
            123 |          0 | do_tick
             67 |          0 | ├── physics
             67 |         67 | │   └── destroy_collision_node
             56 |          0 | └── render
             43 |         43 |     ├── finish_draw_rect
             13 |         13 |     └── finish_draw_circle
    
  • For any scalar data, i.e. the arguments to and values returned from some functions, we could draw histograms. This would be neat combined with tracing the requested allocation sizes, for example:

    value  ------------- Distribution ------------- count
       16 |                                         0
       32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       169
       64 |@@@                                      16
      128 |@@                                       10
      256 |                                         0
    

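The call-tree and histogram displays above could be produced roughly as follows. A minimal sketch, assuming stacks arrive as lists of frame names (outermost first) and that histogram buckets are powers of two in the spirit of DTrace's quantize(); the names Node, pow2_histogram and render_histogram are hypothetical, and the output only approximates the sample layouts.

```rust
use std::collections::BTreeMap;

/// One frame in an aggregated call tree built from captured stacks.
#[derive(Default)]
pub struct Node {
    total: u64,      // stacks passing through this frame
    self_count: u64, // stacks that ended exactly at this frame
    children: BTreeMap<String, Node>,
}

impl Node {
    /// Adds one captured stack, outermost frame first.
    pub fn add_stack(&mut self, stack: &[&str]) {
        self.total += 1;
        match stack.split_first() {
            None => self.self_count += 1,
            Some((frame, rest)) => self
                .children
                .entry(frame.to_string())
                .or_default()
                .add_stack(rest),
        }
    }

    /// Children sorted by total count, largest first (ties broken by name).
    fn sorted_children(&self) -> Vec<(&String, &Node)> {
        let mut kids: Vec<_> = self.children.iter().collect();
        kids.sort_by(|a, b| b.1.total.cmp(&a.1.total).then(a.0.cmp(b.0)));
        kids
    }

    /// Renders the tree as "Total Count | Self Count | Stack Frame" rows.
    pub fn render(&self) -> String {
        let mut out = String::new();
        for (name, node) in self.sorted_children() {
            out.push_str(&format!("{:>11} | {:>10} | {}\n", node.total, node.self_count, name));
            node.render_into("", &mut out);
        }
        out
    }

    fn render_into(&self, prefix: &str, out: &mut String) {
        let kids = self.sorted_children();
        let last = kids.len().saturating_sub(1);
        for (i, (name, node)) in kids.into_iter().enumerate() {
            let (branch, cont) = if i == last { ("└── ", "    ") } else { ("├── ", "│   ") };
            out.push_str(&format!(
                "{:>11} | {:>10} | {}{}{}\n",
                node.total, node.self_count, prefix, branch, name
            ));
            node.render_into(&format!("{}{}", prefix, cont), out);
        }
    }
}

/// Buckets each value under the largest power of two <= value (0 stays 0).
pub fn pow2_histogram(values: &[u64]) -> BTreeMap<u64, u64> {
    let mut buckets = BTreeMap::new();
    for &v in values {
        let bucket = if v == 0 { 0 } else { 1u64 << (63 - v.leading_zeros()) };
        *buckets.entry(bucket).or_insert(0) += 1;
    }
    buckets
}

/// Draws one row per bucket, with a 40-column bar scaled to the total count.
pub fn render_histogram(buckets: &BTreeMap<u64, u64>) -> String {
    let total: u64 = buckets.values().sum::<u64>().max(1);
    let mut out = String::new();
    for (value, count) in buckets {
        let bar = "@".repeat((count * 40 / total) as usize);
        out.push_str(&format!("{:>9} |{:<40} {}\n", value, bar, count));
    }
    out
}
```

Feeding Node the stacks captured at each call to free yields rows shaped like the call-tree table, with siblings ordered by total count.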
Usability

This is the hardest feature to do well, but also the most important, in my opinion!

I want to use this tool when debugging or profiling, and I want to be able to just apply it to my code without any ceremony. As much as possible, I don't want to mess with build configurations, I don't want to manually add new <script> tags, I don't want to have to change my own source code, or the way I instantiate the wasm module. This is hard because we are talking about introducing new JS code and potentially linking another wasm object file into the instrumented wasm.

Ideally, to trace the first argument to the malloc function and display it as a histogram, I would just run something like:

$ wasm-trace --trace "arg(0, malloc)" --display histogram path/to/module.wasm

And that is all. It would mutate the wasm binary in place, so my existing build system wouldn't have to know anything about this temporary build step.
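The --trace argument implies a small probe language. A minimal sketch of parsing it, assuming a hypothetical grammar of the form kind(args...): arg(N, FUNC) is the form shown in the command above, while ret(FUNC) is an invented second kind added for illustration. Both the grammar and the Probe type are assumptions.

```rust
/// A parsed `--trace` probe (hypothetical mini-language).
#[derive(Debug, PartialEq)]
pub enum Probe {
    /// Record argument number `index` of every call to `func`.
    Arg { index: u32, func: String },
    /// Record the value returned by `func` (invented example kind).
    Ret { func: String },
}

/// Parses specs such as "arg(0, malloc)" or "ret(free)".
pub fn parse_probe(spec: &str) -> Result<Probe, String> {
    let spec = spec.trim();
    let open = spec.find('(').ok_or("missing '('")?;
    let inner = spec[open + 1..].strip_suffix(')').ok_or("missing ')'")?;
    let args: Vec<&str> = inner.split(',').map(str::trim).collect();
    match (&spec[..open], args.as_slice()) {
        ("arg", [index, func]) if !func.is_empty() => {
            let index = index
                .parse::<u32>()
                .map_err(|e| format!("bad argument index {:?}: {}", index, e))?;
            Ok(Probe::Arg { index, func: func.to_string() })
        }
        ("ret", [func]) if !func.is_empty() => Ok(Probe::Ret { func: func.to_string() }),
        (kind, _) => Err(format!("unknown probe kind or wrong arity: {:?}", kind)),
    }
}
```

The instrumenter would then map each Probe to the functions it needs to rewrite, and the --display flag to the matching aggregation on the displayer side.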

Components

I foresee two main components: (1) the thing that does the instrumenting and the code it inserts into the instrumented binary, and (2) the JS that extracts the traced data and displays it.

Instrumenter

Inserts new instructions into a wasm binary to capture and maintain tracing information, and adds new data segments in which to store the traced info.
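At the instruction level, inserting a prologue and epilogue could look roughly like this. A minimal sketch over a toy instruction enum rather than real wasm; enter_hook and exit_hook are hypothetical imported functions of type (i32) -> (). It ignores things a real pass must handle: adding imports shifts the indices of all defined functions (imports come first in the function index space), a branch to the function body's own label also exits the function, and traps skip the epilogue entirely.

```rust
/// A toy subset of wasm instructions (hypothetical; a real instrumenter would
/// operate on a parsed module, e.g. via binaryen or a wasm parsing crate).
#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    I32Const(i32),
    LocalGet(u32),
    I32Add,
    Call(u32),
    Return,
}

/// Instruments one function body (without its final `end`) so that it calls
/// `enter_hook(func_index)` on entry and `exit_hook(func_index)` before every
/// `return` and at the fall-through end of the body.
pub fn instrument(body: &[Instr], func_index: u32, enter_hook: u32, exit_hook: u32) -> Vec<Instr> {
    // Each hook call pushes one i32 and the call pops it again, so the operand
    // stack (including any pending return values) is left unchanged.
    let hook = |callee: u32| [Instr::I32Const(func_index as i32), Instr::Call(callee)];
    let mut out = Vec::with_capacity(body.len() + 4);
    out.extend(hook(enter_hook));
    for instr in body {
        if *instr == Instr::Return {
            out.extend(hook(exit_hook));
        }
        out.push(instr.clone());
    }
    // Exit hook for the implicit return at the end of the body.
    out.extend(hook(exit_hook));
    out
}
```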

Do we want a ring buffer, where old data is overwritten when we wrap around? Or should we start summarizing data at that point (when applicable)? Or should we call a well-known function, imported from the JS displayer, that knows how to drain all the data? Maybe different approaches suit different situations.
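A minimal sketch of two of these overflow options, assuming a fixed-capacity buffer of u32 entries: Overwrite behaves as a ring buffer, and Flush hands the full buffer to a callback that stands in for the hypothetical imported JS function. Summarizing (for example, folding entries into histogram counts) is left out. TraceBuffer and Overflow are hypothetical names.

```rust
/// What to do when the fixed-size trace buffer fills up (hypothetical API).
pub enum Overflow {
    /// Ring buffer: overwrite the oldest entry.
    Overwrite,
    /// Pass the full buffer to a callback, standing in for a well-known
    /// function imported from the JS displayer, then start over empty.
    Flush(Box<dyn FnMut(&[u32])>),
}

pub struct TraceBuffer {
    entries: Vec<u32>,
    capacity: usize,
    next: usize, // slot to overwrite next (Overwrite mode only)
    policy: Overflow,
}

impl TraceBuffer {
    pub fn new(capacity: usize, policy: Overflow) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        TraceBuffer { entries: Vec::with_capacity(capacity), capacity, next: 0, policy }
    }

    pub fn push(&mut self, value: u32) {
        if self.entries.len() < self.capacity {
            self.entries.push(value);
            return;
        }
        match &mut self.policy {
            Overflow::Overwrite => {
                self.entries[self.next] = value;
                self.next = (self.next + 1) % self.capacity;
            }
            Overflow::Flush(flush) => {
                (*flush)(self.entries.as_slice());
                self.entries.clear();
                self.entries.push(value);
            }
        }
    }

    /// Buffered entries, oldest first.
    pub fn snapshot(&self) -> Vec<u32> {
        let mut v = self.entries[self.next..].to_vec();
        v.extend_from_slice(&self.entries[..self.next]);
        v
    }
}
```

Overwrite never loses the most recent data but silently drops history; Flush keeps everything at the cost of a call out to the host each time the buffer fills.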

Maybe the instrumenter itself could be compiled to wasm, so that a debuggee wasm binary could be instrumented inside the web page just before it is compiled? yo_dawg.jpg

JS Displayer

A JavaScript module that collects the traced data from inside the wasm memory and displays it with console.log, inside some <pre> element, or as a cool canvas visualization.
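To read the data, the displayer and the instrumenter have to agree on a memory layout. A minimal sketch of the decoding side, assuming a hypothetical layout: a 12-byte header (next, len, capacity) followed by 8-byte slots of (kind, function index). It is written in Rust over a byte slice to keep one language for these examples; in JS the same reads would go through a DataView over the module's exported memory buffer, using getUint32(offset, true) for little-endian values.

```rust
/// One decoded trace record.
#[derive(Debug, PartialEq)]
pub enum Record {
    Call(u32),   // function index
    Return(u32), // function index
}

/// Reads a little-endian u32 (wasm linear memory is little-endian).
fn read_u32(mem: &[u8], offset: usize) -> Option<u32> {
    let b = mem.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Decodes a ring buffer stored at `base` using a hypothetical layout:
/// header = [next: u32, len: u32, capacity: u32], then `capacity` slots of
/// [kind: u32 (0 = call, 1 = return), func_index: u32]. Oldest record first.
pub fn decode_ring(mem: &[u8], base: usize) -> Option<Vec<Record>> {
    let next = read_u32(mem, base)? as usize;
    let len = read_u32(mem, base + 4)? as usize;
    let capacity = read_u32(mem, base + 8)? as usize;
    if capacity == 0 || len > capacity || next >= capacity {
        return None; // corrupt or uninitialized header
    }
    if mem.len() < base + 12 + capacity * 8 {
        return None; // slots would run past the end of memory
    }
    // Until the buffer wraps, records start at slot 0; afterwards the
    // oldest record is the one about to be overwritten, at `next`.
    let start = if len == capacity { next } else { 0 };
    let mut out = Vec::with_capacity(len);
    for i in 0..len {
        let slot_at = base + 12 + ((start + i) % capacity) * 8;
        let func = read_u32(mem, slot_at + 4)?;
        out.push(match read_u32(mem, slot_at)? {
            0 => Record::Call(func),
            1 => Record::Return(func),
            _ => return None, // unknown record kind
        });
    }
    Some(out)
}
```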

It would be awesome if this worked with both Node.js and the web (or if there were two versions).

Design decision: do all aggregation inside the instrumented wasm (by linking a runtime into the instrumented code?), or post-process in the displayer JS? The former is likely more performant, but the latter might be easier.

Available Tools / Libraries

binaryen

https://github.com/WebAssembly/binaryen

Framework for compilation passes over wasm, analyses, instrumentation, etc. Written in C++, fairly mature.

parity_wasm

https://crates.io/crates/parity-wasm

A wasm parser crate, written in Rust and pretty solid. However, it is really only a parser/builder for wasm, so anyone doing analyses on top of it probably has to build out more infrastructure than they would with binaryen.

@vshymanskyy

vshymanskyy commented Feb 24, 2020

Hi @fitzgen, thanks for sharing your thoughts!
While working on Wasm3, I found that having such a tool would be very useful.
I'm aware of existing tools like Wasabi and sliminality/wasm-trace.
But I decided to give it a try and develop our own version: https://github.com/wasm3/wasm-trace
For example, I would really like to decouple the instrumentation, execution, and visualization phases, so that we can get similar (and directly comparable) results from different wasm engines and runtimes.
We're already getting good results; see this thread on Twitter.

Would love to hear some feedback from you. Have a nice day 😃

@fitzgen
Author

fitzgen commented Feb 24, 2020

Nice! Excited to see progress in this space :)

> While working on Wasm3, I found that having such a tool would be very useful.
> I'm aware of existing tools like Wasabi and sliminality/wasm-trace.
> But I decided to give it a try and develop our own version: https://github.com/wasm3/wasm-trace

FWIW, it's nice to give a nod to existing projects that helped inspire a new project in the README or something. It's also a great place to explain to potential users what the differences between the projects are, and why the new project was created vs contributing to the existing ones.

@vshymanskyy

  • Wasabi, for example, needs dynamic JS code generation. That means the wasm runtime needs a JS engine to run the instrumented files. This is very unfortunate, as Wasmer, Wasmtime, Wasm3 (and almost all WASI engines) do not run in a JS environment by default. See danleh/wasabi#23
  • sliminality/wasm-trace is very Rust-centric and (AFAIK) does not allow instrumenting arbitrary wasm files.
