@captivus
Created May 28, 2026 19:31
Handy memory leak bugfix instrumentation

Mapping Handy's End-to-End Execution Flow

This document is a standalone, validated procedure for instrumenting Handy and producing a complete, cross-correlated trace of one or more transcription cycles — from process boot, through trigger, through the recording steady state, through VAD finalization and Whisper inference, through paste, back to idle.

It is written without assuming prior context. Every environment variable, flag, and structural decision in this document is one that has been validated empirically against Handy on Linux/WebKitGTK. Where a plausible alternative is wrong in a way that wastes time, the failure mode is spelled out inline so the reader can avoid it.

This document tells you how to capture and synthesize observations. It does not present observations about Handy's runtime behavior — those you will produce yourself by following the procedure.


Table of contents

  1. Definition of done
  2. Handy's runtime topology (what you are instrumenting)
  3. The four-tool methodology
  4. Pre-flight: environment and isolation
  5. Tool 1 — uftrace (Rust function tracing via mcount)
  6. Tool 2 — strace (syscall and IPC observation)
  7. Tool 3 — WebKit Remote Inspector (JS Timeline / ScriptProfiler / Heap)
  8. Tool 4 — source counters (compiled-in event counters)
  9. Building the instrumented binary
  10. The coverage audit (a priori static enumeration)
  11. The synchronized capture
  12. Post-hoc synthesis: producing the four deliverables
  13. Validation contracts (per tool and end-to-end)
  14. Common pitfalls and the failure modes they cause
  15. Gap audit: what this procedure does not capture
  16. Reference: flags, env vars, paths, commands

1. Definition of done

Before you start, agree on what a complete execution-flow map consists of. This procedure produces five things:

  1. A coverage audit (coverage-map.csv) — a static, a-priori enumeration of every code path that could fire during the lifecycle, with each path mapped to the lifecycle phase in which it is expected and to the instrumentation tool that could observe it.
  2. A synchronized capture — one continuous, wall-clock-anchored run with all chosen tools recording simultaneously through a real lifecycle. The raw artifacts:
    • uftrace.data/ — Rust function-call trace
    • strace.log — syscall + IPC trace
    • webinspector-recording_overlay.json — JS profile for the overlay webview
    • webinspector-settings.json — JS profile for the main webview
    • handy.stdout.log — application stdout, including any source-counter lines
    • phase-timeline.log — wall-clock timestamps for each lifecycle marker
  3. A synthesis of those artifacts into:
    • execution-trace.csv — every function / event / syscall observed, with phase assignment, tool source, and frequency
    • call-graph.csv — caller → callee edges from the function tracer
    • coverage.csv — for every row in the audit, whether it fired this cycle and (if not) why
    • execution-flow.mmd — a Mermaid sequence diagram across the architecture's lanes (user, Handy main, audio thread, overlay JS, external subprocess)
  4. A gap audit (gap-audit.md) — explicit enumeration of what this composition does not capture (see section 15 for the template).
  5. A validation pass (status-summary.md) — per-phase status with the evidence on which each phase's PASS rests.

If any of these is missing, the map is incomplete. Anything reported as "PASS" without an artifact whose size and content satisfy the validation contract in section 13 is a fraudulent PASS.
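A minimal sketch of how a harness might refuse to report PASS when a deliverable is absent. The artifact names come from the list above; the function name `missing_artifacts` and the one-byte minimum sizes are assumptions for illustration, and real thresholds belong to the validation contract in section 13.

```python
from pathlib import Path

# Artifact names from the definition of done. The minimum sizes are
# placeholders (assumption); substitute the section 13 contract values.
REQUIRED = {
    "coverage-map.csv": 1,
    "uftrace.data": 1,  # a directory; checked for being non-empty
    "strace.log": 1,
    "webinspector-recording_overlay.json": 1,
    "webinspector-settings.json": 1,
    "handy.stdout.log": 1,
    "phase-timeline.log": 1,
    "execution-trace.csv": 1,
    "call-graph.csv": 1,
    "coverage.csv": 1,
    "execution-flow.mmd": 1,
    "gap-audit.md": 1,
    "status-summary.md": 1,
}


def missing_artifacts(run_dir, required=REQUIRED):
    """Return the sorted names of artifacts that are absent, empty, or too small."""
    problems = []
    for name, min_bytes in required.items():
        path = Path(run_dir) / name
        if not path.exists():
            problems.append(name)
        elif path.is_dir():
            # A directory artifact counts only if it contains something.
            if not any(path.iterdir()):
                problems.append(name)
        elif path.stat().st_size < min_bytes:
            problems.append(name)
    return sorted(problems)
```

An empty return value is a necessary condition for PASS, not a sufficient one: it checks presence and size only, not content.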


2. Handy's runtime topology (what you are instrumenting)

You are instrumenting a Tauri 2.x desktop application with a Rust backend (src-tauri/) and a React/TypeScript frontend (src/). The runtime is multi-process and multi-threaded.

                            PROCESS: handy (Rust binary)
  ┌─────────────────────────────────────────────────────────────────────────┐
  │  main thread                                                            │
  │   └─ tauri::Builder runs the event loop                                 │
  │      ├─ command handlers (specta_builder.invoke_handler)                │
  │      ├─ event emitters (app_handle.emit, window.emit_to)                │
  │      ├─ tray, autostart, single-instance, updater                       │
  │      └─ custom URI scheme handlers                                      │
  │                                                                         │
  │  audio thread (cpal's internal stream callback thread)                  │
  │   └─ samples → mpsc::Sender<AudioChunk>                                 │
  │                                                                         │
  │  audio consumer thread (run_consumer worker)                            │
  │   ├─ FrameResampler                                                     │
  │   ├─ AudioVisualiser (FFT → level buckets)                              │
  │   │     └─ level callback fires here                                    │
  │   └─ Silero VAD                                                         │
  │                                                                         │
  │  input thread (rdev global listener)                                    │
  │  signal-handler thread (Unix: SIGUSR1/SIGUSR2/SIGTERM)                  │
  │                                                                         │
  │  ad-hoc threads:                                                        │
  │   ├─ lazy-close watchdog                                                │
  │   ├─ model download                                                     │
  │   ├─ inference (transcribe-rs, blocking)                                │
  │   └─ idle watcher                                                       │
  └─────────────────────────────────────────────────────────────────────────┘
                                       │
                          Tauri IPC bridge (wry-managed)
            • commands : invoke handler called from JS
            • events   : Rust → JS via wry InnerWebView::eval
            • schemes  : custom URI schemes registered in tauri::Builder
                                       │
                                       ▼
       ┌────────────────────────────┐       ┌────────────────────────────┐
       │ PROCESS: WebKitWebProcess  │       │ PROCESS: WebKitWebProcess  │
       │ window label "main"        │       │ window label "recording_   │
       │                            │       │   overlay"                 │
       │ src/main.tsx               │       │ src/overlay/main.tsx       │
       └────────────────────────────┘       └────────────────────────────┘
                                       │
                            PROCESS: WebKitNetworkProcess
                            (HTTP fetch isolation)

Key architectural facts you need to know to instrument correctly:

  • Two windows = two webviews = (typically) two WebKitWebProcess subprocesses on Linux. The window labels are "main" and "recording_overlay". Instrument both.
  • The Tauri event path is wry InnerWebView::eval, which crosses to the WebKit subprocess via webkit_web_view_run_javascript(). On Linux this serializes a JS source string over an unnamed SOCK_SEQPACKET Unix domain socketpair. The WebKit child process is invoked with argv [..., <eventfd_fd>, <ipc_socket_fd>]; argv[2] is the IPC socket FD inside the child. You can cross-reference /proc/<child>/fd/<argv[2]> to a socket inode in /proc/net/unix (type 0005 = SEQPACKET) and find its peer inode (which is the UIProcess-side FD on the Handy main process).
  • Trigger paths are not equivalent. A recording can be started by: global shortcut (rdev), CLI flag (--toggle-transcription, --cancel), or Unix signal (SIGUSR2). The CLI flag launches a second instance that initializes Whisper/Vulkan before tauri-plugin-single-instance hands off — under instrumentation overhead this can cascade into resource pressure that distorts the trace. Use SIGUSR2 for synchronized captures. The signal handler in signal_handle.rs invokes the same in-process trigger the shortcut would have driven; no second instance is spawned.
  • The single-instance plugin is bundle-ID-scoped, not display-scoped. A Handy you launch on Xvfb :103 will be blocked silently by a Handy the user has running on :0. Confirm no user Handy is running before launching any instrumented binary.
  • cpal on Linux routes through PipeWire/PulseAudio's ALSA emulation. Per-app input redirection via PULSE_SOURCE / PIPEWIRE_NODE env vars does NOT work for cpal — Handy will use the default source. Use pactl move-source-output on each stream after it appears.

3. The four-tool methodology

Handy's execution flow spans userspace function calls, syscalls (especially the IPC socketpair), the JS runtime inside the webview, and code paths the function tracer cannot see (closure bodies, FFI internals). No single tool covers the whole surface. Use all four:

| Tool | Sees | Cannot see |
|---|---|---|
| uftrace | Every Rust function entry/exit at the symbol level, with timestamps, thread IDs, and caller chain | Closure bodies without their own symbol; C/C++ dylib internals; subprocess argv |
| strace | Every syscall: execve (subprocess invocations), sendmsg/sendto/writev/write and recvmsg/recvfrom (IPC and I/O) | Pre-syscall in-process work; userspace-only function calls |
| WebKit Remote Inspector | JS Timeline events (EventDispatch, TimerFire, RenderingFrame), ScriptProfiler call samples, Heap.* events (GC, snapshots), console messages | Native code inside the WebKit subprocess; sub-sampler-resolution JS work |
| Source counters | Any path you statically add a counter to; useful where uftrace's mcount cannot reach because the symbol disappears under inlining or the path is a closure body | Anything you didn't add a counter for; carries source-edit risk (section 8) |

The execution-flow map is the union of what these four tools observe. Validate each independently, then synthesize.


4. Pre-flight: environment and isolation

These are non-negotiable. Skip any and your capture is contaminated.

4.1 No user Handy may be running

The single-instance plugin will silently block your launch otherwise. Confirm:

pgrep -x handy && echo "USER HANDY RUNNING -- quit it first"

In a controlled lab harness, also verify the exe path of any running handy to distinguish a user-installed instance (e.g., ~/.local/bin/handy, /usr/bin/handy, /opt/handy/handy, the AppImage) from a stale dev-tree process you can reap:

for pid in $(pgrep -x handy); do
  echo "$pid $(readlink /proc/$pid/exe 2>/dev/null)"
done

4.2 A dedicated Xvfb display

Run the instrumented binary on its own X display so window-manager events, input focus, and unrelated GTK traffic do not bleed into the trace. Pick a display number not in use by the user (:0 is the user's session; echo $DISPLAY confirms theirs):

Xvfb :103 -screen 0 1920x1080x24 -ac +extension RANDR -nolisten tcp \
  >/tmp/xvfb.log 2>&1 &
XVFB_PID=$!
# Poll (up to 10 s) until Xvfb accepts connections.
timeout 10 bash -c 'until xdpyinfo -display :103 >/dev/null 2>&1; do sleep 0.2; done'
export DISPLAY=:103

Tear down with kill $XVFB_PID at the end.

4.3 A PipeWire null sink for deterministic audio input

Driving the microphone path with a known audio file gives you reproducible timing and a way to assert that audio actually reached cpal. Do NOT modify the user's default PipeWire source (typically a real microphone). Instead, load a per-capture null sink and move Handy's source-output onto it after it appears.

# Load the null sink.
SINK=handycap_sink
MODID=$(pactl load-module module-null-sink "sink_name=$SINK" media.class=Audio/Sink)
pactl set-sink-volume "$SINK" 100%
pactl set-source-volume "${SINK}.monitor" 100%

# Confirm the user's default source is untouched:
pactl info | grep '^Default Source:'   # should be the user's actual mic

After triggering recording (section 11), find Handy's source-output and move it:

# Find Handy's PipeWire source-output id.
pactl list source-outputs | awk '
  /^Source Output #/ { id = substr($3, 2) }   # $3 is "#<id>"; drop the hash
  /PipeWire ALSA \[handy\]/ { print id }
'

# Move it onto the null sink's monitor.
pactl move-source-output <id> "${SINK}.monitor"

# Play the test audio into the null sink. paplay reads PCM WAV directly.
paplay --device="$SINK" --volume=65536 /path/to/test-audio.wav &

The audio path is: paplay → null sink → ${SINK}.monitor → Handy's source-output. This is the only Linux-validated method that reliably redirects cpal-ALSA-PipeWire input per app; env-var redirection does not work for this stack.

Tear the sink down in cleanup:

pactl unload-module "$MODID"

4.4 A coordination lock across instrumentation agents

If multiple instrumentation runs may launch concurrently (e.g., in a CI or multi-agent harness), serialize on a flock so only one Handy launches at a time:

exec 9>/tmp/handy-instr-lock
flock -x 9
# ...launch Handy under instrumentation...
flock -u 9

The lock is also a hedge against orphaned WebKitWebProcess children from prior failed runs — reap them while you hold the lock.

4.5 An emergency cleanup that runs on every exit

Loaded null sinks survive process death; WirePlumber's stream-restore cache can latch the user's installed Handy onto a sink that no longer exists and silently break their voice-to-text the moment they next try to use it. Register an unconditional cleanup that unloads any sink your run created and reaps any wrapper processes:

trap '
  pactl list short modules | grep "null-sink" | grep "$SINK" | cut -f1 |
    xargs -r -n1 pactl unload-module
  pgrep -f handycap-wrap | xargs -r kill -TERM
' EXIT

In a Python harness, register the same logic with atexit so it fires even on KeyboardInterrupt or AssertionError.
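One way the atexit variant could look, as a sketch rather than the harness's actual code. The sink name and the handycap-wrap pattern come from the shell trap above; the function names, and the assumption that `pactl list short modules` prints tab-separated `id`, `module name`, `arguments` columns, are mine.

```python
import atexit
import subprocess

SINK = "handycap_sink"  # same sink name the shell examples use


def null_sink_module_ids(listing, sink_name):
    """Pick our null sink's module ids out of `pactl list short modules` text."""
    ids = []
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or fields[1] != "module-null-sink":
            continue
        # Match the argument as an exact token so "handycap_sink2" is not hit.
        if "sink_name=" + sink_name in fields[2].split():
            ids.append(fields[0])
    return ids


def _run(argv):
    """Run a command, returning stdout; tolerate a missing executable."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return done.stdout


def cleanup():
    """Unload every null sink this run created and reap wrapper processes."""
    listing = _run(["pactl", "list", "short", "modules"])
    for module_id in null_sink_module_ids(listing, SINK):
        _run(["pactl", "unload-module", module_id])
    _run(["pkill", "-TERM", "-f", "handycap-wrap"])


def install_cleanup():
    """Call once at harness start, before loading the sink."""
    atexit.register(cleanup)
```

atexit handlers run on normal interpreter exit and after an unhandled exception, but not when the process is killed by a signal it does not handle, so a harness that can receive SIGTERM still needs its own signal handler.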

4.6 ptrace permission

strace uses ptrace to trace the target. On most distributions you can ptrace your own children with no extra permission. The Yama setting /proc/sys/kernel/yama/ptrace_scope governs this: at 1, attaching to a process that is not a descendant of the tracer (strace -p <pid>) fails; at 2, any use of ptrace requires CAP_SYS_PTRACE; at 3, ptrace is disabled. The procedure below launches strace as the outermost process and execs the target down the chain, so ptrace_scope=1 is sufficient and no sudo is required.
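A small pre-flight check for this, sketched under the assumption that the harness is Python. The level meanings follow the kernel's Yama documentation; the function names are hypothetical.

```python
from pathlib import Path

YAMA_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")


def read_ptrace_scope(path=YAMA_PATH):
    """Return the Yama ptrace_scope as an int, or None if Yama is absent."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def strace_launch_allowed(scope):
    """True if an unprivileged strace can trace a child it launched itself.

    0 = classic ptrace rules, 1 = descendants only: both allow this.
    2 = CAP_SYS_PTRACE required, 3 = ptrace disabled: both refuse it.
    None = Yama not present, so classic rules apply.
    """
    return scope is None or scope in (0, 1)
```

Calling `strace_launch_allowed(read_ptrace_scope())` before the capture turns an opaque strace failure into an explicit pre-flight error.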

4.7 Required tools

for t in uftrace strace pactl paplay xdpyinfo c++filt Xvfb objdump nm; do
  command -v $t >/dev/null || echo "MISSING: $t"
done

uftrace must be >= 0.15 for the mcount handling and dump --chrome output used by the synthesis step.


5. Tool 1 — uftrace (Rust function tracing via mcount)

uftrace records every Rust function entry and exit at the symbol level. It is the only tool in the four that gives you per-function call counts and a real call graph. It is the primary tool for execution-flow mapping.

5.1 What uftrace needs

The target binary must be compiled with the GCC/Clang mcount instrumentation hook — every function entry calls the mcount symbol, which uftrace's libmcount.so (LD_PRELOAD'd into the process) intercepts and records. For Rust this means a Cargo profile with -Z instrument-mcount set in RUSTFLAGS.

Nightly toolchain required. As of rustc 1.95 (stable), instrument-mcount is an unstable -Z flag and requires a nightly toolchain. The historical -C instrument-mcount form does NOT exist on stable rustc — the compiler rejects it. Use cargo +nightly for the instrumented build.

Add this to src-tauri/Cargo.toml (commit-quality, but tag the addition as instrumentation-only):

[profile.release-debug]
inherits = "release"
debug = "full"          # full DWARF for symbolication
strip = false           # do not strip symbols
lto = "off"             # uftrace cannot see inlined-then-stripped frames
codegen-units = 16
incremental = false

Build the binary with mcount enabled and into a non-default target dir so your normal cargo build is not contaminated:

cd src-tauri
CARGO_TARGET_DIR=target-uftrace \
RUSTFLAGS="-Z instrument-mcount" \
cargo +nightly build --profile release-debug --bin handy

The product is src-tauri/target-uftrace/release-debug/handy.

5.2 Confirm mcount and (optionally) devtools landed in the binary

BIN=src-tauri/target-uftrace/release-debug/handy

# mcount symbol must be undefined (it's imported from libmcount.so at runtime).
# The actual symbol on glibc-linked Rust binaries is `U mcount@GLIBC_X.Y.Z`
# (e.g., `U mcount@GLIBC_2.2.5`), NOT a bare `U mcount` — the strict regex
# `^ *U mcount$` matches nothing on a real instrumented build. Anchor on
# either `@` or end-of-line:
nm -D --undefined-only "$BIN" | grep -E '^ *U mcount(@|$)'

# Belt-and-suspenders: require the binary to also contain tens of thousands
# of `call mcount@plt` instructions. A successful Handy build shows >50k.
# A near-zero count means the flag silently no-op'd (most commonly because
# the build did not use a nightly toolchain — see 5.1).
CALL_COUNT=$(objdump -d "$BIN" | grep -c 'call.*<mcount@plt>')
[ "$CALL_COUNT" -gt 1000 ] || { echo "suspicious mcount call count: $CALL_COUNT"; exit 1; }

# If you also want Web Inspector in this same binary (recommended), the
# `devtools` feature of the tauri crate must be enabled at build time.
# Verify with:
objdump -d "$BIN" | grep -q 'webkit_settings_set_enable_developer_extras@' \
  && echo "devtools call site present"

To enable devtools, temporarily add "devtools" to the tauri feature list in src-tauri/Cargo.toml:

tauri = { version = "2.10.2", features = [
  "protocol-asset", "macos-private-api", "tray-icon", "image-png",
  "devtools",                # add for instrumentation builds only
] }

Restore the file when you are done so the change does not ride into a release build.

5.3 The Tauri resource-resolution gotcha

Tauri's resource-resolution code treats the binary as a "cargo development build" when the exe path's component at index len-3 is literally "target" — and only then does it resolve resources relative to the binary location. The path src-tauri/target-uftrace/release-debug/handy has target-uftrace at len-3, which fails the check; tray init will crash because the icon PNG cannot be resolved.

The fix is a wrapper directory that has target at len-3. Hardlink the binary and symlink resources next to it:

WRAP=/tmp/handycap-wrap/target/release-debug
mkdir -p "$WRAP"
ln -f src-tauri/target-uftrace/release-debug/handy "$WRAP/handy"
ln -snf "$PWD/src-tauri/target-uftrace/release-debug/resources" "$WRAP/resources"
touch "$WRAP/.cargo-lock"

# Verify the path component count.
python3 -c "
from pathlib import Path
p = Path('$WRAP/handy')
parts = p.parts
assert parts[-3] == 'target', f'wrong shape: {parts!r}'
print('OK', p)
"

Launch the binary from this wrapper path, not the original.

5.4 The smoke test (before recording starts)

Confirm the binary boots through resource resolution, database init, model preload, and global-shortcut registration before you wrap it in uftrace + strace, because composing all three layers can slow boot enough to mask a bad binary as "uftrace caused the timeout."

DISPLAY=:103 "$WRAP/handy" --start-hidden >/tmp/smoke.log 2>&1 &
SMOKE_PID=$!
# Wait up to 60 s for the boot-complete marker in stdout.
timeout 60 bash -c 'while ! grep -q "Shortcuts initialized" /tmp/smoke.log; do sleep 0.3; done'
echo $?    # 0 = boot succeeded
kill -TERM $SMOKE_PID

"Shortcuts initialized" is the canonical boot-complete marker emitted by shortcut/mod.rs during startup. The instrumented binary must reach it for any synchronized capture to be meaningful.

5.5 Running uftrace under the synchronized capture

uftrace is invoked as the inner process; strace is outermost. The launch line is built in section 11. The uftrace-specific args:

uftrace record \
  -d /path/to/uftrace.data \
  --no-libcall \
  -- \
  /tmp/handycap-wrap/target/release-debug/handy --start-hidden

  • -d: output directory.
  • --no-libcall: skip library function calls. Without this, uftrace records into libc, libgtk, libwebkit2gtk, libonnxruntime, libwhisper, etc., which inflates the trace by orders of magnitude and obscures Rust function behavior. The four-tool decomposition assigns dylib internals to a different (out-of-scope) instrumentation strategy (section 15).
  • Do not use -D (depth limit) or -F (filter) for the initial map. (-K limits kernel depth, not user-function depth.) Capture everything mcount sees; filter at synthesis.

5.6 Post-capture: report and dump

After the lifecycle, generate two views of the data:

# Per-function aggregate (calls, total time, self time). Pipe through
# c++filt to demangle any C++/Rust mangled symbols.
uftrace report -d /path/to/uftrace.data --no-libcall | c++filt \
  > uftrace-report.txt

# Chrome-trace-format dump for caller→callee edge extraction.
uftrace dump -d /path/to/uftrace.data --chrome | c++filt \
  > uftrace-dump-chrome.json.txt

The chrome dump is line-oriented JSON; each line is {"ts":..., "ph":"B"|"E", "name":..., "pid":..., "tid":...}. Build per-thread stacks from the B (begin) and E (end) events to reconstruct caller→callee edges. This is how call-graph.csv is produced (section 12).
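The stack reconstruction described above can be sketched as follows. This assumes one event object per line as described, tolerates a trailing comma and non-event wrapper lines, and falls back to `pid` when an event carries no `tid`; the function name `call_edges` and the `<root>` placeholder are illustrative, not part of uftrace.

```python
import json
from collections import Counter, defaultdict


def call_edges(lines):
    """Count caller -> callee edges from line-oriented B/E trace events."""
    stacks = defaultdict(list)  # (pid, tid) -> stack of open function names
    edges = Counter()           # (caller, callee) -> number of calls
    for raw in lines:
        line = raw.strip().rstrip(",")
        if not line.startswith("{") or '"ph"' not in line:
            continue  # array wrapper, metadata, or blank line
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        name = event.get("name")
        if name is None:
            continue
        stack = stacks[(event.get("pid"), event.get("tid", event.get("pid")))]
        if event.get("ph") == "B":
            caller = stack[-1] if stack else "<root>"
            edges[(caller, name)] += 1
            stack.append(name)
        elif event.get("ph") == "E" and name in stack:
            # Pop back to the matching begin; this also recovers from any
            # exit events lost when the trace was cut off mid-call.
            while stack.pop() != name:
                pass
    return edges
```

Writing `edges.items()` out as `caller,callee,count` rows gives the shape call-graph.csv needs. Keeping stacks per thread matters: a single global stack would invent edges between functions that merely interleave in time.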

5.7 What uftrace cannot see in this codebase

  • The cpal stream callback closure (built inside build_stream<T> in audio_toolkit/audio/recorder.rs). The closure has no Rust-source symbol; mcount only sees the function it is constructed in.
  • Level-callback closure bodies passed via with_level_callback. uftrace catches the parent function entry, not the closure body.
  • Any function whose symbol disappears under inlining. Set lto = "off" (as above) to keep most boundaries visible, but #[inline(always)] and aggressive optimization can still hide short helpers. For those, fall back to source counters (section 8).
  • Anything inside a dylib (libwebkit2gtk, libwhisper, libonnxruntime, libcuda, libasound, libgtk). Listed in the gap audit (section 15) with the tool that would reach each.

6. Tool 2 — strace (syscall and IPC observation)

strace records syscalls. Its job in this composition is to expose three things uftrace cannot:

  • The execve of every subprocess Handy spawns (the Linux paste path fans across multiple tools; clipboard helpers; system info queries).
  • The IPC socket traffic between the Handy main process and each WebKitWebProcess subprocess (event names appear as literal JSON in the payload).
  • The fork→exec chain of WebKit's UIProcess spawning new WebProcess / NetworkProcess children, so you can map a WebKit subprocess PID back to its IPC FD via the argv convention described in section 2.

6.1 Attach at exec, not after

The single biggest failure mode with strace is attaching to a Handy that is already running. strace -f -p <handy-pid> only follows forks from this moment forward; the long-lived WebKitWebProcess that was forked during boot retains its pre-attach state and most of its IPC traffic is invisible. Always launch under strace:

strace -f -yy -s 16384 \
  -e trace=execve,writev,write,sendmsg,sendto,recvmsg,recvfrom \
  -o /path/to/strace.log \
  -- <binary-or-inner-tool> <args>

6.2 The flags, justified

| Flag | Why |
|---|---|
| -f | Follow forks. Captures the WebKit subprocess(es) spawned during boot. Without this, you only see the main process. |
| -yy | Annotate FDs with file/peer info. sendmsg(24<UNIX:[12345->67890]> is how you confirm the IPC socketpair without manually cross-referencing /proc/net/unix. |
| -s 16384 | Maximum string size per syscall. Tauri's IPC payloads (especially boot-time JS source shipped via webkit_web_view_run_javascript and event JSON like {"event":"<name>","handler":<id>,...}) can run several KB. The strace default of -s 32 and even the commonly suggested -s 200 truncate these silently. 16 KB is comfortably above any single Tauri payload observed. |
| -e trace=execve,writev,write,sendmsg,sendto,recvmsg,recvfrom | The narrow set of syscalls relevant for IPC + subprocess invocations. The full syscall set explodes the log size during the audio loop (cpal poll calls run at audio-host cadence). Filter at strace level for a manageable artifact. |
| -o <file> | Output destination. Do not rely on stderr; multi-process strace output interleaves and is hard to post-process if mixed with the target's own stderr. |

Do NOT filter by PID. -f gives you the entire process tree; PID-level filtering at strace level loses cross-process events.
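To show what the -yy annotations buy at synthesis time, here is a sketch that tallies Unix-socket syscalls per (PID, syscall, local inode, peer inode). It assumes the PID-prefixed line layout that `-f -o` produces and the annotation shape from the -yy example in the table; the exact annotation text may vary between strace versions, so treat the regex as a starting point.

```python
import re
from collections import Counter

# "<pid> <syscall>(<fd><UNIX:[<local>-><peer>]>, ..."
# The optional "-XXX" suffix after UNIX is a guess at version variation.
UNIX_FD = re.compile(
    r"^(?P<pid>\d+)\s+(?P<syscall>\w+)\("
    r"(?P<fd>\d+)<UNIX(?:-[A-Z]+)?:\[(?P<local>\d+)(?:->(?P<peer>\d+))?[^\]]*\]>"
)


def unix_socket_traffic(lines):
    """Count syscalls on connected Unix sockets, keyed by endpoint pair."""
    counts = Counter()
    for line in lines:
        match = UNIX_FD.match(line)
        if match and match.group("peer"):
            key = (
                int(match.group("pid")),
                match.group("syscall"),
                int(match.group("local")),
                int(match.group("peer")),
            )
            counts[key] += 1
    return counts
```

A pair of keys whose local and peer inodes mirror each other across two PIDs is the two ends of one socketpair, which is why filtering by PID at capture time would lose half the picture.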

6.3 Post-capture: confirming the artifact contains real IPC

size=$(stat -c%s /path/to/strace.log)
echo "strace log: $size bytes"
[ "$size" -ge 50000 ] || echo "FAIL: too small"

# Count literal occurrences of known Tauri event names. The frontend's
# listen() registrations serialize the event name as a literal string
# in the IPC payload, so even a capture that did not include a recording
# cycle should surface several distinct event names from boot-time
# listener setup.
for ev in show-overlay hide-overlay model-state-changed loading_completed; do
  c=$(grep -c -- "$ev" /path/to/strace.log)
  echo "  $ev: $c"
done

If the count of all event names sums to zero, the capture is broken — most commonly because strace attached too late or -s was set too low. Re-run.

6.4 Mapping a WebKit subprocess PID to its IPC FD

While the WebKit children are still alive (their /proc entries disappear at teardown), identify WebKit child PIDs and their IPC sockets:

# Filter strace.log to execve lines for WebKit children:
grep 'execve.*WebKitWebProcess' /path/to/strace.log | head -5

# For each WebKitWebProcess PID, argv[2] is the IPC socket FD inside the child.
# Find the socket inode and peer:
for pid in $(pgrep -f WebKitWebProcess); do
  echo "=== $pid ==="
  ls -l /proc/$pid/fd/ | grep socket
done

The pair (WebKit FD, peer FD on Handy main) is the SOCK_SEQPACKET pair that carries every evaluate_script call and every Tauri IPC event.
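The inode lookup can also be done programmatically. The sketch below parses the text of /proc/net/unix and the target of a /proc/<pid>/fd/<n> symlink; it assumes the usual column order (Num, RefCount, Protocol, Flags, Type, St, Inode, optional Path), and the function names are hypothetical.

```python
import re

# Socket type codes as they appear in the Type column of /proc/net/unix.
SOCK_TYPES = {"0001": "STREAM", "0002": "DGRAM", "0005": "SEQPACKET"}


def socket_inode(fd_link):
    """Extract the inode from an fd symlink target like 'socket:[12345]'."""
    match = re.fullmatch(r"socket:\[(\d+)\]", fd_link)
    return int(match.group(1)) if match else None


def unix_socket_types(proc_net_unix):
    """Map inode -> socket type name from the text of /proc/net/unix."""
    types = {}
    for line in proc_net_unix.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 7 and fields[6].isdigit():
            types[int(fields[6])] = SOCK_TYPES.get(fields[4], fields[4])
    return types
```

In use, pass `os.readlink(f"/proc/{pid}/fd/{fd}")` to `socket_inode` and look the result up in `unix_socket_types(open("/proc/net/unix").read())`; a value of `SEQPACKET` is consistent with the IPC socket described in section 2.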


7. Tool 3 — WebKit Remote Inspector (JS Timeline / ScriptProfiler / Heap)

The WebKit Remote Inspector gives you the JavaScript-side view: which events JS handled, which timers fired, which functions consumed sampler time, when GCs happened. It is the only tool in the four that observes the webview side.

7.1 The single most common mistake: wrong env var

There are TWO different inspector env vars, and they do different things:

  • WEBKIT_INSPECTOR_SERVER=host:port — enables WebKit's internal inspector:// scheme protocol. The listening socket performs a binary handshake; HTTP clients (including curl /json) receive "empty reply from server."
  • WEBKIT_INSPECTOR_HTTP_SERVER=host:port — enables the HTTP/Chromium-style inspector. The listening socket serves an HTML page listing inspectable targets and accepts WebSocket connections on per-target paths.

Use WEBKIT_INSPECTOR_HTTP_SERVER. Anything else will silently fail to connect.

export WEBKIT_INSPECTOR_HTTP_SERVER=127.0.0.1:9230

Reference: WebKit's own documentation, https://people.igalia.com/aperez/Documentation/wpe-webkit/remote-inspector.html.

7.2 The inspector's response shape on WebKitGTK is not /json

Many tutorials assume the Chromium DevTools Protocol shape: GET /json returns a JSON array of targets. WebKitGTK does not implement that endpoint. GET / returns an HTML page listing targets in a <table>, with WebSocket paths embedded in each row's onclick handler:

onclick="window.open('Main.html?ws=' + window.location.host +
                     '/socket/1/N/WebPage', ...)"

Parse the HTML for targets:

import re, urllib.request

html = urllib.request.urlopen("http://127.0.0.1:9230/").read().decode()
targets = re.findall(
    r'<div class="targetname">([^<]+)</div>.*?(/socket/1/\d+/WebPage)',
    html, re.DOTALL,
)
# targets is a list of (name, socket_path) tuples.
# Handy presents two targets during a normal run: the main settings
# webview and the recording_overlay webview.

Then open a WebSocket to each target at ws://127.0.0.1:9230<socket_path>.

<div class="targetname"> is the window TITLE, not the Tauri window LABEL. This is a subtle gotcha that will silently break filter-substring matching. Tauri's WebviewWindowBuilder::new(app, "recording_overlay", url).title("Recording") produces a window whose internal label is recording_overlay and whose inspector target name is the title-derived string (which may differ from the label and may have framework-added suffixes — observed Recording Overlay on WebKitGTK). Filter substrings on the title side, not the label side, and normalize whitespace and case before comparing — e.g., lowercase both sides and replace _ with spaces:

def matches(target_name: str, want: str) -> bool:
    norm = lambda s: s.lower().replace("_", " ").strip()
    return norm(want) in norm(target_name)

If you filter on the literal Tauri window label (recording_overlay) you will get zero matches and a quiet empty-records JSON output, which looks exactly like a different failure mode (lazy target registration, WS connection failure, etc.). Always normalize.

7.3 The Target.* multiplexer wrapping

WebKitGTK's inspector WebSocket protocol wraps every command and response in a Target.* envelope:

  • To send Inspector.enable to a target, you must wrap it:

    {
      "id": 2,
      "method": "Target.sendMessageToTarget",
      "params": {
        "targetId": "<targetId discovered earlier>",
        "message": "{\"id\":1,\"method\":\"Inspector.enable\",\"params\":{}}"
      }
    }
  • Responses and events from the target arrive wrapped as:

    {
      "method": "Target.dispatchMessageFromTarget",
      "params": {
        "targetId": "...",
        "message": "{\"method\":\"Timeline.eventRecorded\", ... }"
      }
    }

    Unwrap by parsing the inner message field as JSON.

To discover the targetId, listen for an initial Target.targetCreated event after connecting and use the targetInfo.targetId from that event.
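The envelope handling above can be sketched as two small helpers. The method names come from the JSON shown; the helper names `wrap` and `unwrap` and the separate outer/inner id arguments are assumptions for illustration.

```python
import json


def wrap(outer_id, target_id, inner_id, method, params=None):
    """Build a Target.sendMessageToTarget frame carrying one inner command."""
    inner = {"id": inner_id, "method": method, "params": params or {}}
    outer = {
        "id": outer_id,
        "method": "Target.sendMessageToTarget",
        "params": {
            "targetId": target_id,
            # The inner command travels as a JSON *string*, not an object.
            "message": json.dumps(inner),
        },
    }
    return json.dumps(outer)


def unwrap(frame):
    """Return the inner message of a dispatchMessageFromTarget frame, else None."""
    outer = json.loads(frame)
    if outer.get("method") != "Target.dispatchMessageFromTarget":
        return None
    return json.loads(outer["params"]["message"])
```

Frames for which `unwrap` returns `None` are outer-level traffic (for example Target.targetCreated, or the direct reply to an outer command) and need to be handled separately.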

7.4 The recording sequence

For each target's WebSocket, after the initial Target.targetCreated:

  1. Target.setPauseOnStart with {"pauseOnStart": false} (outer, not wrapped).
  2. Inside-target (wrapped via Target.sendMessageToTarget):
    • Inspector.enable {}
    • Timeline.start {"maxCallStackDepth": 5}
    • ScriptProfiler.startTracking {"includeSamples": true}
    • Heap.enable {}
  3. Drain all messages, accumulating into a list, until you decide to stop.
  4. Send the corresponding stop commands (wrapped):
    • ScriptProfiler.stopTracking {}
    • Heap.disable {}
    • Timeline.stop {}
  5. Drain a final batch of straggler messages, then close the WebSocket.

Persist the accumulated records to a JSON file per target with metadata including target name, socket path, and capture timestamp. A complete record file from a single recording cycle's overlay target typically runs to thousands of records (Timeline events dominate); the settings webview's record count is much lower (it is idle during a recording cycle if no UI interaction happens). Use this as a coarse sanity check on capture success.
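The ordering of the start and stop commands is easy to get wrong, so here is a sketch that generates the frames in the sequence listed above. The command names and parameters are those from the steps; the class name, the use of independent id counters for outer and inner messages, and the inclusion of an empty `params` object are assumptions. Sending the frames and draining replies is left to whichever WebSocket client the harness uses.

```python
import itertools
import json

START = [
    ("Inspector.enable", {}),
    ("Timeline.start", {"maxCallStackDepth": 5}),
    ("ScriptProfiler.startTracking", {"includeSamples": True}),
    ("Heap.enable", {}),
]
STOP = [
    ("ScriptProfiler.stopTracking", {}),
    ("Heap.disable", {}),
    ("Timeline.stop", {}),
]


class CommandPlan:
    """Generate the JSON frames for one target's recording session."""

    def __init__(self, target_id):
        self.target_id = target_id
        self._outer_ids = itertools.count(1)
        self._inner_ids = itertools.count(1)

    def _outer(self, method, params):
        return json.dumps(
            {"id": next(self._outer_ids), "method": method, "params": params}
        )

    def _inner(self, method, params):
        message = json.dumps(
            {"id": next(self._inner_ids), "method": method, "params": params}
        )
        return self._outer(
            "Target.sendMessageToTarget",
            {"targetId": self.target_id, "message": message},
        )

    def start_frames(self):
        """Step 1 (outer) followed by the step 2 commands (wrapped)."""
        first = self._outer("Target.setPauseOnStart", {"pauseOnStart": False})
        return [first] + [self._inner(m, p) for m, p in START]

    def stop_frames(self):
        """The step 4 commands (wrapped), in the listed order."""
        return [self._inner(m, p) for m, p in STOP]
```

One `CommandPlan` per target keeps the id sequences independent, which makes it straightforward to match replies to commands when both webviews are recorded at once.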

7.5 The devtools Cargo feature

tauri = { features = ["devtools", ...] } must be enabled at build time for the Web Inspector to be accessible on a release-profile binary. In dev builds (debug_assertions true) it is automatic. Section 5.2 shows how to enable the feature for the instrumented build and how to verify it with the objdump check.


8. Tool 4 — source counters (compiled-in event counters)

uftrace cannot see closure bodies or inlined helpers; strace cannot see userspace-only call boundaries; the Web Inspector sees only JS. For paths that fall through all three, the remaining option is to compile a counter into the source.

This is the most-intrusive of the four tools. Use it sparingly. A prior attempt to bulk-inject counter calls at 56 paths via anchor-based regex edits produced 25 compile errors because anchors matched the middle of multi-line function signatures and turned &self, parameters into free arguments. Do not repeat that pattern.

8.1 When to add a source counter

Only when all of the following are true:

  • The path is not visible to uftrace (closure body without a symbol, or inlined-away helper).
  • It is not visible to strace (no syscall on the path).
  • It is not visible to the Web Inspector (it is Rust-side, not JS).
  • It fires at a rate that matters for the analysis you are doing.

Otherwise the path goes in the gap audit (section 15) without a counter.

8.2 How to add a counter safely

Edit the source file by hand, with the surrounding function signature fully visible. Do not use regex sed/awk replacement across the codebase.

A minimal counter primitive (place in src-tauri/src/instr.rs, behind a build feature so it is removable):

// Lightweight per-path counter. Increments are wait-free; periodic
// dumps go to stdout on a fixed cadence from a background thread.
#[cfg(feature = "instr-counters")]
pub mod counters {
    use std::sync::atomic::{AtomicU64, Ordering};

    macro_rules! counter {
        ($name:ident) => {
            pub static $name: AtomicU64 = AtomicU64::new(0);
        };
    }

    // Declare one counter per path of interest.
    counter!(LEVEL_CALLBACK_INVOCATIONS);
    counter!(SCHEME_HANDLER_HITS);
    // ...etc...

    pub fn bump(c: &'static AtomicU64) -> u64 {
        c.fetch_add(1, Ordering::Relaxed)
    }

    pub fn start_reporter() {
        std::thread::spawn(|| loop {
            std::thread::sleep(std::time::Duration::from_secs(1));
            let now = std::time::SystemTime::now()
                .duration_since(std::time::UNIX_EPOCH)
                .unwrap();
            // Emit one line per counter, prefixed with a fixed tag so
            // your harness can grep for it.
            println!(
                "[instr-counter] t={}.{:06} {} = {}",
                now.as_secs(),
                now.subsec_micros(),
                "LEVEL_CALLBACK_INVOCATIONS",
                LEVEL_CALLBACK_INVOCATIONS.load(Ordering::Relaxed)
            );
            // ...etc...
        });
    }
}

At the path you want to count (and only inside that closure / inlined helper):

#[cfg(feature = "instr-counters")]
crate::instr::counters::bump(&crate::instr::counters::LEVEL_CALLBACK_INVOCATIONS);

Build with --features instr-counters for instrumented runs; the feature gate keeps shipping builds clean.

8.3 What "good" counter output looks like

A counter that emits a wall-clock-timestamped line per second can be post-joined with the phase timeline (section 11.4) to produce per-phase rates without any uftrace involvement. The harness greps for the [instr-counter] tag in handy.stdout.log and emits a CSV row per (phase, counter) pair.
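One way the post-join could look, as a sketch. It assumes the line format emitted by the reporter in section 8.2 and that samples appear in the log in chronological order; parse_counter_lines and rate_in_window are hypothetical helper names. Because the counters are cumulative, the per-phase rate is a difference of two readings, not a sum of lines.

```python
import re

# Matches lines such as:
#   [instr-counter] t=1700000000.000123 LEVEL_CALLBACK_INVOCATIONS = 42
COUNTER_RE = re.compile(r"\[instr-counter\] t=(\d+\.\d+) (\w+) = (\d+)")

def parse_counter_lines(lines):
    """Return (unix_ts, counter_name, cumulative_value) tuples in log order."""
    samples = []
    for line in lines:
        m = COUNTER_RE.search(line)
        if m:
            samples.append((float(m.group(1)), m.group(2), int(m.group(3))))
    return samples

def rate_in_window(samples, name, t_begin, t_end):
    """Events per second for one counter between two wall-clock anchors.
    Uses the last sample at or before each window edge."""
    def value_at(t):
        vals = [v for (ts, n, v) in samples if n == name and ts <= t]
        return vals[-1] if vals else 0
    return (value_at(t_end) - value_at(t_begin)) / (t_end - t_begin)
```

With a one-second reporter cadence the window edges are only accurate to about a second, so rates over short phases are approximate.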

8.4 The failure mode to avoid

A counter installed in the wrong file scope, missing the feature gate, or with a typo in the path name produces silently-zero readings. Before trusting a counter, verify it produces non-zero output during a smoke test that you know fires the path.


9. Building the instrumented binary

Assembling the build from sections 5.1, 5.2, and 8 into one procedure:

cd src-tauri

# 1) Add the [profile.release-debug] block to Cargo.toml (section 5.1) and
#    temporarily add "devtools" to the tauri feature list (section 5.2).
#    Record the file's pre-edit sha256 so you can detect drift later.
sha256sum Cargo.toml > /tmp/cargo-toml-baseline.sha

# 2) Build with mcount + (optional) instr-counters.
#    NOTE: -Z instrument-mcount is unstable; requires `cargo +nightly`.
CARGO_TARGET_DIR=target-uftrace \
RUSTFLAGS="-Z instrument-mcount" \
cargo +nightly build \
  --profile release-debug \
  --bin handy \
  --features instr-counters       # omit if you are not using source counters

# 3) Verify the binary.
#    NOTE on the mcount check: under `set -uo pipefail`, the bare
#    `nm | grep -q ...` form is unsafe — once grep finds its match it
#    closes stdin, nm receives SIGPIPE, and pipefail surfaces that as a
#    pipeline failure (see section 14 pitfall 15). Use a counted form
#    instead, which avoids the SIGPIPE entirely:
BIN=target-uftrace/release-debug/handy
test -f "$BIN" || { echo "build failed"; exit 1; }
test "$(stat -c%s "$BIN")" -gt $((100*1024*1024)) || { echo "binary suspiciously small"; exit 1; }
# The actual symbol is `U mcount@GLIBC_X.Y.Z`; match on @-or-end-of-line.
MCOUNT_HITS=$(nm -D --undefined-only "$BIN" | grep -cE '^ *U mcount(@|$)')
[ "$MCOUNT_HITS" -gt 0 ] || { echo "no mcount symbol — flag silently no-op'd? Check nightly toolchain"; exit 1; }
# Real instrumented Handy builds show tens of thousands of mcount call sites.
CALL_COUNT=$(objdump -d "$BIN" | grep -c 'call.*<mcount@plt>')
[ "$CALL_COUNT" -gt 1000 ] || { echo "suspicious mcount call count: $CALL_COUNT"; exit 1; }
DEVTOOLS_HITS=$(objdump -d "$BIN" | grep -c 'webkit_settings_set_enable_developer_extras@')
[ "$DEVTOOLS_HITS" -gt 0 ] || { echo "no devtools call site"; exit 1; }

# 4) Build the Tauri-compatible wrapper layout (section 5.3).
WRAP=/tmp/handycap-wrap/target/release-debug
mkdir -p "$WRAP"
ln -f "$PWD/$BIN" "$WRAP/handy"
ln -snf "$PWD/target-uftrace/release-debug/resources" "$WRAP/resources"
touch "$WRAP/.cargo-lock"

# 5) Restore Cargo.toml after the build (the devtools edit should not
#    ride into a release build). Verify the restored sha256 matches the
#    baseline you saved in step 1.

The wrapper at $WRAP/handy is the binary to launch under uftrace + strace.

10. The coverage audit (a priori static enumeration)

Before you instrument, walk the source and enumerate every code path you expect to fire across the lifecycle. This produces coverage-map.csv, the static baseline against which the synchronized capture's coverage.csv is computed in synthesis.

10.1 Why do this first

A capture-only methodology suffers a fatal blind spot: paths that did not fire during a single cycle disappear from the trace, and you cannot tell the difference between "this path never fires in this configuration" (the alternative branch was taken; the feature is off) and "this path should have fired but didn't" (your trigger missed it; an upstream condition failed). The audit makes that distinction visible.

It also surfaces tool-coverage gaps before you capture: if a path is flagged "needs source counter" and you have not added one, you will know to either add the counter or accept the gap and document it in the gap audit.

10.2 The files to walk

Walk every Rust source file in src-tauri/src/. The set as of v0.8.3:

main.rs                      lib.rs                       overlay.rs
settings.rs                  utils.rs                     tray.rs
tray_i18n.rs                 input.rs                     clipboard.rs
actions.rs                   transcription_coordinator.rs audio_feedback.rs
signal_handle.rs             cli.rs                       portable.rs
llm_client.rs                apple_intelligence.rs

managers/audio.rs            managers/model.rs            managers/transcription.rs
managers/history.rs          managers/mod.rs              managers/transcription_mock.rs

audio_toolkit/mod.rs         audio_toolkit/constants.rs
audio_toolkit/text.rs        audio_toolkit/utils.rs
audio_toolkit/audio/device.rs    audio_toolkit/audio/recorder.rs
audio_toolkit/audio/resampler.rs audio_toolkit/audio/utils.rs
audio_toolkit/audio/visualizer.rs audio_toolkit/audio/mod.rs
audio_toolkit/vad/silero.rs  audio_toolkit/vad/smoothed.rs
audio_toolkit/vad/mod.rs

commands/audio.rs            commands/transcription.rs
commands/history.rs          commands/models.rs
commands/mod.rs

shortcut/mod.rs              shortcut/handy_keys.rs
helpers/clamshell.rs

And every TS file in src/:

main.tsx                     App.tsx                      bindings.ts
overlay/main.tsx             overlay/RecordingOverlay.tsx
stores/modelStore.ts         stores/settingsStore.ts
hooks/useSettings.ts         i18n/index.ts
components/...               (every component used during a recording lifecycle)

10.3 Cross-cutting sites to enumerate exhaustively

Independent of file walking, find and enumerate every:

  • tokio::spawn, std::thread::spawn, tauri::async_runtime::spawn, spawn_blocking (Rust thread/task creation)
  • .emit(, .emit_to(, .listen( (Rust → JS / JS → Rust events)
  • register_uri_scheme_protocol (custom schemes)
  • #[tauri::command] (commands)
  • Command::new (subprocess invocations)
  • invoke(, listen(, setInterval, setTimeout, requestAnimationFrame (TypeScript)

Re-derive these with grep so the list reflects current code, not a stale snapshot:

grep -rn "#\[tauri::command\]" src-tauri/src --include="*.rs"
grep -rn "\.emit(\|\.emit_to(\|\.listen(" src-tauri/src --include="*.rs"
grep -rn "register_uri_scheme_protocol\|Command::new" src-tauri/src --include="*.rs"
grep -rn "tokio::spawn\|thread::spawn\|async_runtime::spawn\|spawn_blocking" \
     src-tauri/src --include="*.rs"
grep -rn "invoke(\|listen(\|setInterval\|setTimeout\|requestAnimationFrame" \
     src --include="*.ts" --include="*.tsx"

10.4 Phase assignment

Every enumerated path is assigned to one or more lifecycle phases. The canonical phase set:

Phase | Definition
boot | Called during run() setup, Tauri builder, plugin init, initialize_core_logic
idle_pre_record | After boot, before first trigger
trigger_start | Direct result of the start trigger (signal, shortcut, CLI flag)
recording_init | One-time per-cycle setup of audio (stream open, VAD load, visualizer construct)
recording_steady | Fires per audio frame / poll tick during active recording
trigger_stop | Direct result of the stop trigger
vad_finalize | Resampler drain, VAD finalization at recording end
whisper_init | Per-cycle model load (if not already loaded)
whisper_inference | The blocking transcription call
transcription_postprocess | Output filtering, custom-word substitution, history save, event emit
paste_invoke | Clipboard write + the platform-specific paste path
post_paste | Tray icon update, unload-watcher arming, overlay hide
idle_return | After the cycle completes; idle-watcher polls; settings UI commands fired by user during idle

A function can fire in multiple phases (e.g., get_settings is called from many places). Record all applicable phases per row.

10.5 Tool assignment

For each row, indicate which tool(s) could observe it:

  • uftrace if the function has a stable Rust symbol (most non-closure functions).
  • strace if the function makes a syscall on the trace= list.
  • webinspector if it is a JS path.
  • source-counter if uftrace's symbol disappears (closure body, inlined helper). Mark the path explicitly; add or skip the counter per the policy in section 8.

10.6 The output schema

coverage-map.csv columns:

path,                  # fully qualified function name or event/scheme/syscall id
file,                  # source file relative to repo root
line,                  # line number of the definition
phase,                 # one phase from the table; multiple rows if multi-phase
observation_tool,      # uftrace | strace | webinspector | source-counter
notes                  # free-text; e.g. "closure body; uftrace sees parent only"

Save it at coverage-map.csv in your run directory. Hash it; the SHA256 will be part of the Phase 0 hard gate (section 11.1).
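Before hashing the map, it may be worth linting it against the schema. The sketch below assumes a header row with exactly the six column names above and one tool per row; validate_coverage_map is a hypothetical helper, and the phase and tool sets are copied from sections 10.4 and 10.5.

```python
import csv
import io

PHASES = {
    "boot", "idle_pre_record", "trigger_start", "recording_init",
    "recording_steady", "trigger_stop", "vad_finalize", "whisper_init",
    "whisper_inference", "transcription_postprocess", "paste_invoke",
    "post_paste", "idle_return",
}
TOOLS = {"uftrace", "strace", "webinspector", "source-counter"}
COLUMNS = ["path", "file", "line", "phase", "observation_tool", "notes"]

def validate_coverage_map(text):
    """Return a list of (row_number, problem) tuples; empty means clean."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != COLUMNS:
        return [(1, f"unexpected header: {reader.fieldnames}")]
    problems = []
    for n, row in enumerate(reader, start=2):
        if row["phase"] not in PHASES:
            problems.append((n, f"unknown phase {row['phase']!r}"))
        if row["observation_tool"] not in TOOLS:
            problems.append((n, f"unknown tool {row['observation_tool']!r}"))
        if not (row["line"] or "").isdigit():
            problems.append((n, f"line is not a number: {row['line']!r}"))
    return problems
```

Catching a misspelled phase here is cheaper than discovering during synthesis that a row joined to nothing.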


11. The synchronized capture

One Handy launch. All four tools recording simultaneously. Real audio. Real trigger. Real paste. Wall-clock anchors for every phase boundary.

11.1 The Phase 0 hard gate

Before launching anything, verify every input artifact exists at its expected path with its expected SHA256. This includes:

  • The instrumented binary (sha256 of the build output).
  • The wrapper-layout binary (same content, different path; the hardlink preserves the sha).
  • The coverage map you just produced.
  • The test audio file (a WAV of known content; PCM; a few seconds long with intelligible speech is enough to exercise VAD and Whisper).

Halt on any mismatch. A capture that proceeds past a failed hard gate is producing a measurement of a system you cannot identify.

expected_bin_sha=$(cat /path/to/expected-bin.sha)
actual_bin_sha=$(sha256sum /tmp/handycap-wrap/target/release-debug/handy | cut -d' ' -f1)
[ "$expected_bin_sha" = "$actual_bin_sha" ] || { echo "bin sha mismatch"; exit 1; }

expected_cov_sha=$(cat /path/to/expected-cov.sha)
actual_cov_sha=$(sha256sum coverage-map.csv | cut -d' ' -f1)
[ "$expected_cov_sha" = "$actual_cov_sha" ] || { echo "coverage map sha mismatch"; exit 1; }

# ...etc for every other input...
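In a Python harness the same gate can be driven from a single manifest, which keeps the "every other input" part from being forgotten. This is a sketch: the manifest shape (path mapped to expected hex digest) and the names sha256_of and hard_gate are assumptions for illustration.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large binaries are not read at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def hard_gate(manifest):
    """manifest maps path -> expected hex digest.
    Returns a list of (path, problem); an empty list means the gate passes."""
    failures = []
    for path, expected in manifest.items():
        try:
            actual = sha256_of(path)
        except OSError as e:
            failures.append((path, f"unreadable: {e}"))
            continue
        if actual != expected:
            failures.append((path, f"sha256 mismatch: got {actual}"))
    return failures
```

The caller should halt when the returned list is non-empty and log every entry, so that a single run reports all mismatches instead of only the first.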

11.2 The launch nesting

The composed launch line is:

strace -f -yy -s 16384 \
       -e trace=execve,writev,write,sendmsg,sendto,recvmsg,recvfrom \
       -o /path/to/strace.log \
  uftrace record \
       -d /path/to/uftrace.data \
       --no-libcall \
       -- \
       /tmp/handycap-wrap/target/release-debug/handy --start-hidden

Why strace is outermost: if you reverse the nesting (uftrace record -- strace -- handy), uftrace inspects its first argv'd executable for mcount, sees strace (which has no mcount), and fails the instrumentation check rather than following the exec chain down to handy. With strace outermost, strace ptrace-attaches its direct child (uftrace), follows that exec chain via -f, and ends up tracing the syscalls of the final handy exec. uftrace, meanwhile, inspects the handy it directly invokes, finds the mcount symbol, and records via in-process LD_PRELOAD libmcount.so. The two layers do not interact — uftrace does not use ptrace, and strace does not intercept libmcount.

Environment for the launch:

export DISPLAY=:103
export WEBKIT_INSPECTOR_HTTP_SERVER=127.0.0.1:9230
# (Plus any others you would normally pass to Handy: RUST_LOG, etc.)

Launch:

nohup <composed-command> >/path/to/handy.stdout.log 2>&1 &
LAUNCH_PID=$!

11.3 The boot-complete wait

Do not start the cycle before the binary has finished booting. The canonical marker is "Shortcuts initialized" in stdout. Poll for it with a generous timeout (under composed instrumentation, boot can take 30–60 s on a slow host):

timeout 180 bash -c '
  while ! grep -q "Shortcuts initialized" /path/to/handy.stdout.log 2>/dev/null; do
    sleep 0.5
  done
'
phase_mark "shortcuts_initialized"

If this times out, the binary is broken (or your composition is introducing more overhead than the boot can absorb). Inspect handy.stdout.log for panics or unresolved permissions; do not proceed.

11.4 The phase marker log

Maintain a wall-clock-anchored log of phase boundaries. Every phase boundary writes one line <unix_ts>\t<iso_local>\t<label> to phase-timeline.log. The labels are boundary markers; consecutive markers bracket the phases in section 10.4.

phase_mark() {
  ts=$(date +%s.%6N)
  iso=$(date -Iseconds)
  printf "%s\t%s\t%s\n" "$ts" "$iso" "$1" >> /path/to/phase-timeline.log
}

phase_mark "phase2-start"
phase_mark "launch"
# ...after boot wait...
phase_mark "shortcuts_initialized"

In a Python harness, the equivalent:

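import time

# Append-mode handle shared by every phase_mark call in the harness.
phase_log = open("/path/to/phase-timeline.log", "a")

def phase_mark(label):
    ts = time.time()
    iso = time.strftime("%Y-%m-%dT%H:%M:%S", time.localtime(ts))
    phase_log.write(f"{ts:.6f}\t{iso}\t{label}\n")
    phase_log.flush()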

11.5 Finding the main Handy PID

After boot, the Handy process is several levels down the strace → uftrace → wrapper-binary exec chain. Identify it by matching /proc/<pid>/exe:

HANDY_PID=$(
  for pid in $(pgrep -x handy); do
    exe=$(readlink /proc/$pid/exe 2>/dev/null)
    [ "$exe" = "/tmp/handycap-wrap/target/release-debug/handy" ] && echo $pid
  done | head -1
)
echo "HANDY_PID=$HANDY_PID"

You will need this PID for the SIGUSR2 trigger and the PipeWire source-output move. The Web Inspector connection does not need it; that connects via the HTTP port.

11.6 Connecting Web Inspector

In the harness (Python is convenient here for the WebSocket client):

  1. GET http://127.0.0.1:9230/, parse the HTML for targets (section 7.2).
  2. For each target, connect a WebSocket, wait for Target.targetCreated, extract targetId, and start Timeline + ScriptProfiler + Heap (section 7.4).
  3. Spin a background thread per session that accumulates records until the orchestrator signals stop.
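The per-session background thread from step 3 can be kept independent of any particular WebSocket library. In this sketch, recv is a hypothetical callable supplied by the harness that returns one decoded message, or None if nothing arrived within the timeout; RecordCollector is an illustrative name.

```python
import threading

class RecordCollector(threading.Thread):
    """Accumulates inspector messages until the orchestrator calls stop()."""

    def __init__(self, recv, poll_timeout=0.5):
        super().__init__(daemon=True)
        self._recv = recv                  # hypothetical: recv(timeout) -> dict | None
        self._poll_timeout = poll_timeout
        self._stop_event = threading.Event()
        self.records = []

    def run(self):
        # A short poll timeout keeps the loop responsive to the stop signal.
        while not self._stop_event.is_set():
            msg = self._recv(self._poll_timeout)
            if msg is not None:
                self.records.append(msg)

    def stop(self):
        """Signal the loop to exit, wait for it, and hand back the records."""
        self._stop_event.set()
        self.join()
        return self.records
```

The stop commands and the final straggler drain from section 7.4 would still be issued by the harness after stop() returns, on the same connection.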

11.7 Audio injection (PipeWire null sink)

Already covered in section 4.3. Two operational notes specific to the synchronized capture:

  • Start paplay BEFORE the SIGUSR2 trigger so a buffer exists at the moment recording begins.
  • The source-output move must happen AFTER recording starts (the source-output does not exist until Handy opens its input stream). Poll for it with a 10-second deadline.

11.8 The trigger sequence

phase_mark "paplay_start"
paplay --device="$SINK" --volume=65536 /path/to/test.wav \
  >/path/to/paplay.log 2>&1 &
sleep 2

phase_mark "sigusr2_start"
kill -USR2 $HANDY_PID

# Move Handy's source-output to the null-sink monitor (poll for it).
for i in {1..50}; do
  # $3 is "#<index>"; strip the leading "#" because pactl expects a bare index.
  SO_ID=$(pactl list source-outputs |
          awk '/^Source Output #/{id=substr($3,2)} /PipeWire ALSA \[handy\]/{print id; exit}')
  [ -n "$SO_ID" ] && break
  sleep 0.2
done
[ -n "$SO_ID" ] && pactl move-source-output "$SO_ID" "${SINK}.monitor"

phase_mark "recording_steady_window_begin"
sleep 10                # the recording-steady window
phase_mark "recording_steady_window_end"

# Capture a baseline count of paste-method log lines BEFORE sigusr2_stop.
# A naive "count > 0" check breaks for multi-cycle runs: cycle 1's marker
# remains in stdout, so cycles 2 and 3 see count > 0 immediately and skip
# the actual wait. Always track per-cycle increments by comparing the
# post-stop count to the pre-stop baseline.
# NOTE: grep -c prints "0" AND exits non-zero when nothing matches, so a
# trailing `|| echo 0` would produce "0<newline>0" and break the integer
# comparison below. Swallow the exit status and default only an empty result.
BASELINE_HITS=$(grep -aciE 'paste method' /path/to/handy.stdout.log 2>/dev/null || true)
BASELINE_HITS=${BASELINE_HITS:-0}

phase_mark "sigusr2_stop"
kill -USR2 $HANDY_PID

# Wait for THIS cycle's transcription completion. The wait condition is
# that the paste-method line count has INCREASED past the baseline, not
# that it is simply nonzero.
#
# Avoid the `grep -q | pipefail` form (see section 14 pitfall 15); use a
# count-and-compare form which is SIGPIPE-safe.
timeout 60 bash -c '
  while :; do
    N=$(grep -aciE "paste method" /path/to/handy.stdout.log 2>/dev/null || true)
    [ "${N:-0}" -gt "'$BASELINE_HITS'" ] && break
    sleep 1
  done
'
phase_mark "post_transcribe_paste"

# Trailing slack: Whisper inference + paste pipeline can run ~17 s on
# desktop CPUs/GPUs after the LAST cycle's sigusr2_stop. The historical
# `sleep 5` here is too tight for the final cycle. Use ≥ 20 s to ensure
# the last paste line is flushed to stdout before phase2-end.
sleep 20                # capture any straggler events including final paste
phase_mark "idle_end"

SIGUSR2 is a deliberate choice. The Unix signal handler invokes the in-process trigger function directly; no second Handy instance is launched. The CLI --toggle-transcription would launch a second process that initializes Whisper/Vulkan before handing off via single-instance — under instrumentation overhead, this can cascade into resource pressure that distorts the trace.

11.9 Teardown

# Stop paplay.
pkill -f "paplay.*$SINK" 2>/dev/null

# Signal stop to the Web Inspector sessions; harness joins them and writes
# per-target JSON to disk.

# SIGTERM the launch process tree to let uftrace flush its buffers.
kill -TERM $LAUNCH_PID
wait $LAUNCH_PID 2>/dev/null

# Reap any remaining wrapper-binary children.
pgrep -f handycap-wrap | xargs -r kill -TERM
sleep 2
pgrep -f handycap-wrap | xargs -r kill -KILL

# Unload the null sink.
pactl list short modules | awk -v sink="$SINK" '$0 ~ sink {print $1}' \
  | xargs -r -n1 pactl unload-module

# Stop Xvfb.
kill -TERM $XVFB_PID

phase_mark "phase2-end"

12. Post-hoc synthesis: producing the four deliverables

Now convert the raw artifacts into the four CSV / Mermaid deliverables.

12.1 Reading the phase timeline

import csv
phase_anchors = {}
with open("phase-timeline.log") as f:
    for line in f:
        ts, _iso, label = line.rstrip("\n").split("\t")
        if label not in phase_anchors:           # first occurrence wins
            phase_anchors[label] = float(ts)

t0 = phase_anchors.get("launch", phase_anchors["phase2-start"])
phase_offsets = {l: ts - t0 for l, ts in phase_anchors.items()}

phase_offsets is the map from label → seconds-since-launch you will use to assign observations to phases.
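Assigning an observation to a phase then reduces to finding the most recent marker at or before its timestamp. A small sketch using bisect; make_phase_lookup is a hypothetical helper, and it assumes the observation's timestamp has already been converted to seconds since launch.

```python
import bisect

def make_phase_lookup(phase_offsets):
    """Build a function mapping seconds-since-launch to the latest marker label."""
    ordered = sorted(phase_offsets.items(), key=lambda kv: kv[1])
    starts = [offset for _label, offset in ordered]
    labels = [label for label, _offset in ordered]

    def lookup(t):
        # Index of the last marker whose offset is <= t, or -1 if none.
        i = bisect.bisect_right(starts, t) - 1
        return labels[i] if i >= 0 else None

    return lookup
```

For example, with markers at 0.0 (launch), 12.5 (shortcuts_initialized) and 20.0 (sigusr2_start), an observation at 15.0 s is attributed to shortcuts_initialized.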

12.2 Building execution-trace.csv from uftrace

# Parse uftrace report (one line per function, columns: total_v total_u
# self_v self_u calls function).
import re

# Unit suffixes used by uftrace report, mapped to seconds.
UNIT_TO_S = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1, "m": 60, "h": 3600}

def to_s(v, u):
    # Convert (value, unit) → seconds; an unrecognized unit yields 0.
    return float(v) * UNIT_TO_S.get(u, 0)

uftrace_rows = []
with open("uftrace-report.txt") as f:
    for ln in f:
        m = re.match(r"\s*([\d.]+)\s*(\w+)\s+([\d.]+)\s*(\w+)\s+(\d+)\s+(.*)$", ln)
        if not m:
            continue
        total_v, total_u, self_v, self_u, calls, fn = m.groups()
        uftrace_rows.append({
            "tool": "uftrace",
            "name": fn.strip(),
            "calls": int(calls),
            "total_s": to_s(total_v, total_u),
            "self_s": to_s(self_v, self_u),
        })

For each row, assign a phase using the chrome-dump timestamps (next section) or — coarser — by joining on the function's expected phase from coverage-map.csv. Add columns: phase, frequency_hz (calls / phase_duration_s).
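The coarser join could be sketched as follows. It assumes phase_of is a function-name to phase mapping derived from coverage-map.csv and phase_durations maps each phase to its length in seconds; both names, and add_frequency itself, are hypothetical. Note that uftrace report call counts are whole-run totals, so this join attributes every call of a function to its single expected phase.

```python
def add_frequency(rows, phase_of, phase_durations):
    """Annotate uftrace rows with phase and frequency_hz.

    rows:            dicts with at least "name" and "calls"
    phase_of:        function name -> phase label (assumed, from the audit)
    phase_durations: phase label -> duration in seconds
    """
    out = []
    for row in rows:
        phase = phase_of.get(row["name"])
        duration = phase_durations.get(phase, 0.0)
        # Leave the rate empty when the phase is unknown or has no duration.
        hz = row["calls"] / duration if duration > 0 else None
        out.append({**row, "phase": phase, "frequency_hz": hz})
    return out
```

Functions that fire in several phases need the chrome-dump timestamps instead; the coarse join will overstate their rate in whichever phase they are assigned to.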

12.3 Building call-graph.csv from the chrome dump

import csv
import json
from collections import Counter, defaultdict

edges = Counter()
per_thread_stack = defaultdict(list)
with open("uftrace-dump-chrome.json.txt") as f:
    for line in f:
        line = line.strip().rstrip(",")
        if not line.startswith("{"): continue
        try: ev = json.loads(line)
        except json.JSONDecodeError: continue
        ph, name, tid = ev.get("ph"), ev.get("name",""), ev.get("tid")
        if ph == "B":
            stack = per_thread_stack[tid]
            if stack:
                edges[(stack[-1], name)] += 1
            stack.append(name)
        elif ph == "E":
            stack = per_thread_stack.get(tid, [])
            if stack: stack.pop()

with open("call-graph.csv", "w") as out:
    w = csv.writer(out)
    w.writerow(["caller", "callee", "count"])
    for (caller, callee), c in edges.most_common():
        w.writerow([caller, callee, c])

12.4 Adding strace and Web Inspector rows to the execution trace

# strace: count syscall occurrences per phase using PID-level filtering
# (the WebKit subprocesses appear as distinct PIDs).
# Web Inspector: from each target's records list, count by record["method"]
# (Timeline.eventRecorded, ScriptProfiler.events, Heap.garbageCollected, etc.)
# and aggregate by inner record["params"]["record"]["type"] for the
# Timeline events.

# Append rows to execution-trace.csv with tool="strace" or tool="webinspector"
# and an appropriate name (e.g., "syscall:sendmsg", "Timeline.eventRecorded:EventDispatch").
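A sketch of the strace half, assuming the log was written with -f and -o so that every line begins with a PID. The optional timestamp group only matches if a timestamp flag such as -ttt was also passed, which the launch line shown in section 11.2 does not include; count_syscalls is a hypothetical helper.

```python
import re
from collections import Counter

# "<pid> [<unix_ts>] <syscall>(" at the start of a line. Lines such as
# "<... sendmsg resumed>", "--- SIGUSR2 ... ---" and "+++ exited +++"
# do not match, so each call is counted once, at its entry line.
LINE_RE = re.compile(r"^(\d+)\s+(?:(\d+\.\d+)\s+)?(\w+)\(")

def count_syscalls(lines):
    """Return a Counter keyed by (pid, syscall_name)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            pid, _ts, syscall = m.groups()
            counts[(int(pid), syscall)] += 1
    return counts
```

Each (pid, syscall) pair then becomes one execution-trace.csv row named in the "syscall:sendmsg" style, with the PID used to separate Handy from the WebKit subprocesses.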

12.5 Building coverage.csv from coverage-map.csv

# Read coverage-map.csv. For each row, determine whether it fired:
#  - uftrace rows: look up function name in execution-trace.csv
#  - strace rows: look up syscall / event name occurrences in strace.log
#  - webinspector rows: look up the event in the JSON record list
#  - source-counter rows: look up the counter name in handy.stdout.log
#
# Output coverage.csv with: path, file, line, phase, observation_tool,
# fired (bool), fire_count, reason (if not fired):
#   - "did not fire in this cycle (alternative branch / feature off)"
#   - "needs source-counter; not instrumented in this run"
#   - "expected only in <other phase>; phase not covered"
#   - "tool cannot observe (FFI internal)"
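The join itself can be small once the per-tool lookups have been reduced to a single table of fire counts. In this sketch, fire_counts is an assumed intermediate keyed by (observation_tool, path), and the automatically chosen reason strings are defaults that still need a human pass; the "other phase" and "FFI internal" reasons cannot be inferred mechanically and are left out.

```python
def build_coverage(map_rows, fire_counts):
    """Produce coverage.csv rows from coverage-map rows.

    map_rows:    dicts read from coverage-map.csv
    fire_counts: {(observation_tool, path): count}, assumed to be gathered
                 beforehand from the four tools' artifacts
    """
    out = []
    for row in map_rows:
        count = fire_counts.get((row["observation_tool"], row["path"]), 0)
        fired = count > 0
        if fired:
            reason = ""
        elif row["observation_tool"] == "source-counter":
            reason = "needs source-counter; not instrumented in this run"
        else:
            reason = "did not fire in this cycle (alternative branch / feature off)"
        out.append({**row, "fired": fired, "fire_count": count, "reason": reason})
    return out
```

A source-counter row that was instrumented but still reads zero deserves the section 8.4 smoke test before it is accepted as "did not fire".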

12.6 Building execution-flow.mmd

The Mermaid sequence diagram is the most opinionated of the four deliverables; it is the one a human will read first. Use this template and fill it from the empirical observations:

sequenceDiagram
    autonumber
    participant U as User
    participant H as Handy main<br/>(Rust)
    participant A as Audio thread<br/>(cpal)
    participant J as Overlay JS<br/>(WebKit subprocess)
    participant X as External<br/>subprocess

    Note over U,X: BOOT (<duration>s)
    U->>H: launch
    H->>H: <observed boot work>
    H->>J: spawn WebKitWebProcess + load webviews
    J-->>H: listen(...) registrations via Tauri IPC

    Note over U,X: TRIGGER START (<t>s)
    U->>H: SIGUSR2 (start)
    H->>A: <observed audio-start work>
    H->>J: emit("show-overlay") via wry InnerWebView::eval

    Note over A,J: RECORDING STEADY STATE (<window>s)
    rect rgb(245, 245, 235)
        loop <observed Hz> audio callback
            A->>A: <observed audio work>
            A->>H: <observed level path>
        end
        loop <observed Hz> JS work
            J->>J: <observed JS work>
            J->>H: <observed JS→Rust call, if any>
            H-->>J: <observed Rust→JS response, if any>
        end
    end

    Note over U,X: TRIGGER STOP (<t>s)
    U->>H: SIGUSR2 (stop)
    H->>A: <observed audio-stop work>
    H->>J: emit("hide-overlay") via wry InnerWebView::eval

    Note over H,X: TRANSCRIBE + PASTE (<duration>s)
    H->>H: <observed VAD finalize + inference work>
    H->>H: clipboard write
    H->>X: spawn <observed paste tool>
    X-->>H: subprocess exit

    Note over H,X: IDLE RETURN
    H->>H: idle-watcher polls

Fill the angle-bracketed placeholders from your execution-trace.csv and phase-timeline.log. Do not invent observations to fit the diagram.


13. Validation contracts (per tool and end-to-end)

Each artifact has a quantitative validation contract. A run that does not satisfy them is not a valid capture, regardless of how complete the synthesis CSVs look.

13.1 Per tool

Artifact | Validation
uftrace.data/ | Size >= 30 MB after a 10-second recording cycle; uftrace report produces a non-empty table; >= 100 distinct Rust symbols appear
strace.log | Size >= 50 KB; at least one literal occurrence of every known event name registered by the frontend's listen() calls (a per-listener registration line appears in the boot IPC traffic); sendmsg calls annotated with UNIX:[...] confirm the SOCK_SEQPACKET socketpair
webinspector-recording_overlay.json | >= 100 records; at least one Timeline.eventRecorded record; at least one Heap.garbageCollected or equivalent confirms the Heap domain accepted commands
webinspector-settings.json | >= 50 records; the smaller threshold reflects an idle main webview during a recording cycle (no UI interaction triggered)
handy.stdout.log | Contains "Shortcuts initialized" (boot complete); contains transcription-complete and paste-complete markers; if source counters were enabled, contains [instr-counter] lines for each declared counter
phase-timeline.log | One row per phase marker; monotonically increasing timestamps; recording_steady_window_end - recording_steady_window_begin matches the intended recording duration within ±0.5 s
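As one example of turning a contract into a check, the phase-timeline.log row could be verified along these lines. This is a sketch: validate_phase_timeline is a hypothetical helper, it takes the first occurrence of each marker (matching the synthesis step), and it treats equal consecutive timestamps as acceptable.

```python
def validate_phase_timeline(lines, intended_window_s, tolerance_s=0.5):
    """Return a list of problem strings; an empty list means the log passes."""
    problems = []
    stamps = []
    first_seen = {}
    for n, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            problems.append(f"line {n}: expected 3 tab-separated fields")
            continue
        ts = float(parts[0])
        stamps.append(ts)
        first_seen.setdefault(parts[2], ts)   # first occurrence wins
    if any(b < a for a, b in zip(stamps, stamps[1:])):
        problems.append("timestamps go backwards")
    begin = first_seen.get("recording_steady_window_begin")
    end = first_seen.get("recording_steady_window_end")
    if begin is None or end is None:
        problems.append("recording_steady window markers missing")
    elif abs((end - begin) - intended_window_s) > tolerance_s:
        problems.append(f"window was {end - begin:.3f} s, "
                        f"intended {intended_window_s} s")
    return problems
```

The returned strings double as the measured-property evidence for the phase: an empty list plus the computed window length is what gets recorded alongside a PASS.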

13.2 End-to-end

# A capture is end-to-end-valid iff:
#  1. Every artifact above passes its individual validation.
#  2. The cycle includes a transcription. handy.stdout.log contains
#     evidence of the transcription text being written.
#  3. Coverage of paths flagged "expected in recording_steady" is
#     >= 80%. (If you tightened the audit, raise this threshold.
#     If your audit is loose, lower it and document why.)
#  4. There is non-zero IPC traffic in the strace log during the
#     recording_steady window (filter by timestamp).
#  5. There is non-zero JS Timeline activity in the overlay webview
#     during the recording_steady window.

13.3 What "PASS" must mean

A phase is PASS only when:

  • The phase reached its terminal action without an exception.
  • The artifact it produces satisfies the validation contract above.
  • The reason string explains WHY it passed in terms of measured properties of the artifact, not in terms of "no errors were encountered."

A run that records "PASS" without a measured property is not really PASS; mark it FAIL and reproduce.


14. Common pitfalls and the failure modes they cause

These are time-saving warnings. Each is a real way the procedure can silently produce useless data.

  1. WEBKIT_INSPECTOR_SERVER instead of WEBKIT_INSPECTOR_HTTP_SERVER. The former opens a binary-handshake socket that nothing HTTP-shaped can connect to. The symptom is "empty reply from server", and it is easy to lose an hour debugging it before noticing the variable name.
  2. --toggle-transcription instead of SIGUSR2. The CLI flag launches a second instance that performs heavy initialization before handing off via single-instance. Under instrumentation overhead this can cause an OOM cascade that kills your capture mid-cycle.
  3. strace -s set too low. The default truncates Tauri event payloads. You will see syscalls but not the event names that make them recognizable.
  4. strace attached after boot. strace -p only follows forks that happen after the attach; WebKit subprocesses that already exist at attach time are not traced, and the boot-time IPC is lost. Always launch the target under strace.
  5. uftrace as outermost rather than innermost. Reverses the exec chain inspection; uftrace inspects strace (no mcount) instead of handy (has mcount) and fails.
  6. lto = true or strip = true in the release-debug profile. Inlined symbols disappear from the trace; backtraces show ??.
  7. devtools feature not enabled in the tauri Cargo dep. The Web Inspector port opens but no targets are listed because webkit_settings_set_enable_developer_extras is never called.
  8. Tauri's target/parts[len-3] resource-resolution quirk. A binary at target-uftrace/release-debug/handy will fail in tray init with an unhelpful error. The hardlink-into-wrapper fix (section 5.3) is the only path forward.
  9. bun run tauri dev for a capture run. Loads JS over Vite HMR port 1420 instead of the bundled dist/. The JS that runs is not the JS that ships. Always build via bun run build first and launch the binary directly.
  10. A still-running user-installed Handy. The single-instance plugin will block your launch silently. Always check first.
  11. Env-var-based per-app PipeWire/PulseAudio redirection. Does not work for cpal-ALSA-PipeWire on Handy. The only validated method is pactl move-source-output per stream cycle.
  12. Loaded null sinks survive process death. WirePlumber's stream-restore cache then binds the user's installed Handy to a sink that no longer exists, breaking their voice-to-text silently the next time they use it. Always register an unconditional cleanup that runs on every exit.
  13. Anchor-regex bulk source-counter injection. Anchors match inside multi-line function signatures, landing counter calls before &self, and producing 20+ compile errors. Edit source by hand, with the full signature visible.
  14. Trusting "no errors" as the success criterion. A capture that records zero observations is "error-free." Validate by measured properties of the artifact (size, content match), not by exception-absence.
  15. set -uo pipefail + grep -q after a pipe → SIGPIPE → false-fatal. Once grep -q finds its first match it closes stdin; the upstream command (nm, objdump, cat, anything streaming through the pipe) then writes to a closed pipe and gets SIGPIPE. Under set -uo pipefail the script treats that as a pipeline failure and bails, even though the grep succeeded. Symptom: build-verify gates fail on builds you can manually confirm are correct. Fix: replace every nm | grep -q ... / objdump | grep -q ... with a counted form (COUNT=$(nm | grep -c ...); [ "$COUNT" -gt 0 ]). The count form drains the pipe fully so no SIGPIPE happens.
  16. grep -q cumulative-count wait logic for multi-cycle runs. Writing a per-cycle "wait for transcription marker" loop as while ! grep -q PATTERN handy.stdout.log works for the first cycle but breaks for cycles 2+: cycle 1's marker is still in stdout, so the condition is already true at cycle 2's wait. The script proceeds immediately and the actual transcription may finish AFTER the script tears down — losing the last cycle's paste / transcription stdout. Fix: capture a baseline count BEFORE the trigger-stop signal, then wait for the count to increase past it. (Section 11.8 shows the corrected form.)
  17. Filtering Web Inspector targets by Tauri window LABEL instead of TITLE. <div class="targetname"> is the window title, not Tauri's internal label. A recording_overlay-labeled window may have an inspector target name of Recording Overlay. Filter on the title side and normalize case and underscores versus spaces; otherwise the harness silently captures zero records and you debug the wrong failure mode (lazy load, WS connection, etc.). See section 7.2.
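For pitfall 17, the normalization can be as small as the sketch below. The helper names are hypothetical, and the rule (lowercase, underscores treated as spaces) is an assumption based on the recording_overlay / Recording Overlay example; adjust it if your window titles differ from their labels in other ways.

```python
def normalize_target_name(name):
    """Fold case and treat underscores as spaces so labels and titles compare equal."""
    return name.strip().lower().replace("_", " ")

def match_targets(wanted_labels, target_titles):
    """Map each wanted Tauri label to the inspector target title it matches."""
    wanted = {normalize_target_name(w): w for w in wanted_labels}
    matched = {}
    for title in target_titles:
        key = normalize_target_name(title)
        if key in wanted:
            matched[wanted[key]] = title
    return matched
```

A harness can then fail fast when a wanted label is absent from the result, instead of proceeding to record nothing for that target.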

15. Gap audit: what this procedure does not capture

A complete instrumentation report documents not just what it captured but what it could not. Template:

| # | Out-of-reach surface | What would reach it |
|---|---|---|
| 1 | C/C++ internals of dylibs (libwhisper, libwebkit2gtk, libgtk, libonnxruntime, libcuda, libasound) | A custom build of each dylib with -fno-omit-frame-pointer and either uftrace --libcall or a heaptrack run |
| 2 | WebKit subprocess internals (WebKitWebProcess, WebKitNetworkProcess) beyond the JS layer | A heaptrack attach to the WebKit subprocess PID, or the WebKit Inspector connected to the WebProcess directly (vs. through the UIProcess proxy) |
| 3 | Sub-millisecond JS work below ScriptProfiler's sampling resolution | Explicit performance.now() instrumentation in JS, or Heap.startSampling at a higher rate |
| 4 | React reconciler "why did this render" semantics | React DevTools Profiler attached at the React commit phase boundary (visible here only as react-dom stack samples) |
| 5 | Kernel activity below the syscall boundary | eBPF / bpftrace. Note that some systems disable unprivileged BPF (unprivileged_bpf_disabled=2); if so, this is unreachable without root |
| 6 | cpal-PortAudio internals and the ALSA hw-mmap path | uftrace --libcall against a debug-built libasound, plus strace -e read on the snd_pcm_mmap FD |
| 7 | ONNX Runtime kernels invoked by the Silero VAD model | The ONNX Runtime profiler API (SessionOptions.EnableProfiling), a build-time switch |
| 8 | Source-counter closures not instrumented in this run | Per section 8: add hand-edited counters at the specific closure body, build with --features instr-counters, re-run |

This list is a template; expand or contract it based on what you decided to include in your composition. The point is to make the surface area explicit.


16. Reference: flags, env vars, paths, commands

16.1 The launch command, in full

strace -f -yy -s 16384 \
       -e trace=execve,writev,write,sendmsg,sendto,recvmsg,recvfrom \
       -o "$RUN_DIR/strace.log" \
  uftrace record \
       -d "$RUN_DIR/uftrace.data" \
       --no-libcall \
       -- \
       "$WRAP/handy" --start-hidden

With environment:

DISPLAY=:103
WEBKIT_INSPECTOR_HTTP_SERVER=127.0.0.1:9230
RUST_LOG=handy_app_lib=trace,handy=trace        # optional

16.2 Build commands

# Cargo.toml temporary edits: add [profile.release-debug] and the
# "devtools" feature on the tauri dep. Remember to revert.

CARGO_TARGET_DIR=target-uftrace \
RUSTFLAGS="-C instrument-mcount" \
cargo build --profile release-debug --bin handy \
  --features instr-counters

# Wrapper layout (Tauri target/parts[len-3] gotcha):
WRAP=/tmp/handycap-wrap/target/release-debug
mkdir -p "$WRAP"
ln -f src-tauri/target-uftrace/release-debug/handy "$WRAP/handy"
ln -snf "$PWD/src-tauri/target-uftrace/release-debug/resources" "$WRAP/resources"
touch "$WRAP/.cargo-lock"

16.3 Phase markers (the canonical set)

phase2-start
launch
shortcuts_initialized
paplay_start
sigusr2_start
recording_steady_window_begin
recording_steady_window_end
sigusr2_stop
post_transcribe_paste
idle_end
phase2-end
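
A small ordering check over the canonical markers can catch a capture whose phases were recorded out of sequence or not at all. The sketch below is only an illustration: it assumes the capture script yields the observed marker names for a single cycle as a plain list (how markers are stored is not specified in this section), and the function name is hypothetical.

```python
CANONICAL_MARKERS = [
    "phase2-start", "launch", "shortcuts_initialized", "paplay_start",
    "sigusr2_start", "recording_steady_window_begin",
    "recording_steady_window_end", "sigusr2_stop",
    "post_transcribe_paste", "idle_end", "phase2-end",
]


def check_marker_order(observed):
    """Return (missing, out_of_order) for ONE cycle's observed marker names.

    missing      -- canonical markers that never appeared
    out_of_order -- True if the known markers did not appear in canonical
                    order (a repeated marker also counts as out of order,
                    because this sketch assumes a single cycle)
    """
    missing = [m for m in CANONICAL_MARKERS if m not in observed]
    # Keep only known markers, in the order they were observed.
    seen = [m for m in observed if m in CANONICAL_MARKERS]
    # The same set of markers, in canonical order, without repeats.
    expected = [m for m in CANONICAL_MARKERS if m in seen]
    return missing, seen != expected
```

For multi-cycle runs, the list would need to be split at each cycle boundary before applying this check.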

16.4 Synthesis output schema

execution-trace.csv:

phase, tool, name, file, line, calls, total_s, self_s, frequency_hz

call-graph.csv:

caller, callee, count

coverage.csv:

path, file, line, phase, observation_tool, fired, fire_count, reason

status-summary.md: per-phase status + the measured properties on which each PASS rests.
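
As an illustration of the coverage.csv schema, the following sketch writes rows with the column order listed above and reads back the paths that never fired. The helper names and the row values used to exercise it are hypothetical; it also assumes fire_count is always a base-10 integer, which this section does not state.

```python
import csv
import io

COVERAGE_COLUMNS = ["path", "file", "line", "phase",
                    "observation_tool", "fired", "fire_count", "reason"]


def write_coverage(rows, out):
    """Write coverage rows (dicts keyed by COVERAGE_COLUMNS) to a text stream."""
    writer = csv.DictWriter(out, fieldnames=COVERAGE_COLUMNS)
    writer.writeheader()
    for row in rows:
        # DictWriter raises ValueError on keys outside the schema,
        # which keeps ad-hoc columns from creeping in.
        writer.writerow(row)


def unfired_paths(csv_text):
    """Paths enumerated by the audit that recorded zero observations."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [r["path"] for r in reader if int(r["fire_count"]) == 0]
```

Listing the unfired paths explicitly is what lets the reason column be reviewed rather than assumed.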

16.5 Validation thresholds (revisit per-host)

These are starting points; tune to your hardware:

  • uftrace.data size: >= 30 MB for a 10-second cycle on a desktop-class CPU. Expect the size to grow roughly linearly with cycle length and with the number of instrumented calls.
  • strace.log size: >= 50 KB as a minimum; expect 10–100 MB for a full cycle.
  • webinspector overlay records: >= 500 for a cycle with an active level meter; >= 50 for an idle webview.
  • handy.stdout.log: presence of "Shortcuts initialized" is the boot marker; transcription / paste markers depend on app version.
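
The size and boot-marker thresholds above can be checked mechanically. The following is a minimal sketch, assuming the artifact names used in this section (uftrace.data, strace.log, handy.stdout.log) all sit directly under the run directory and that MB and KB mean binary multiples. The function names are hypothetical. The Web Inspector record count is left out because its artifact name and format are not given here.

```python
import os

KB = 1024
MB = 1024 * KB  # assumption: binary units


def path_size_bytes(path):
    """Size of a file, or the summed size of all files under a directory."""
    if os.path.isdir(path):
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        return total
    return os.path.getsize(path)


def check_artifacts(run_dir, min_uftrace=30 * MB, min_strace=50 * KB,
                    boot_marker="Shortcuts initialized"):
    """Return a list of human-readable failures; an empty list means PASS."""
    failures = []
    for name, minimum in (("uftrace.data", min_uftrace),
                          ("strace.log", min_strace)):
        path = os.path.join(run_dir, name)
        # A missing artifact is measured as zero bytes, so it fails the check.
        size = path_size_bytes(path) if os.path.exists(path) else 0
        if size < minimum:
            failures.append(f"{name}: {size} bytes < {minimum} bytes")
    try:
        with open(os.path.join(run_dir, "handy.stdout.log"),
                  errors="replace") as fh:
            booted = any(boot_marker in line for line in fh)
    except FileNotFoundError:
        booted = False
    if not booted:
        failures.append(f"handy.stdout.log: boot marker {boot_marker!r} not found")
    return failures
```

Returning the measured values in each failure message keeps the result tied to properties of the artifact rather than to the absence of exceptions.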

16.6 Tools you need installed

uftrace >= 0.15
strace
pactl, paplay        (pulseaudio/pipewire utils)
Xvfb, xdpyinfo       (X virtual framebuffer)
nm, objdump, c++filt (binutils)
python3              (for the synthesis harness)

Optional:

heaptrack            (for the gap-audit follow-ups in section 15)
gdb                  (for the rare interactive forensics)

16.7 What you do NOT need

  • sudo. The entire procedure runs as the unprivileged user.
  • A modified kernel. uftrace uses LD_PRELOAD; strace uses ptrace, which works with kernel.yama.ptrace_scope <= 1; the Web Inspector is a localhost HTTP server.
  • A custom WebKit / wry / Tauri build. Stock libwebkit2gtk and the patched-but-released Tauri runtime in this repo are sufficient.

Closing

The procedure above produces a complete, validated execution-flow map of one transcription lifecycle. To extend to multiple cycles, wrap the trigger sequence (section 11.8) in a loop, phase-mark each cycle boundary, and re-run synthesis. The audit, the per-tool validation, and the gap audit remain unchanged.
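
When looping over cycles, the per-cycle wait has to compare against a baseline count rather than test for mere presence of the marker (pitfall 16). The sketch below is a Python rendering of that idea for a harness, not the form shown in section 11.8. The function names are hypothetical, and the marker pattern must be supplied by the caller because it depends on the app version.

```python
import re
import time


def count_matches(path, pattern):
    """Number of log lines matching the pattern (0 if the log does not exist yet)."""
    regex = re.compile(pattern)
    try:
        with open(path, errors="replace") as fh:
            return sum(1 for line in fh if regex.search(line))
    except FileNotFoundError:
        return 0


def wait_for_new_match(path, pattern, baseline, timeout_s=60.0, poll_s=0.25):
    """Block until the match count exceeds `baseline`; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if count_matches(path, pattern) > baseline:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Intended per-cycle usage (the trigger itself is outside this sketch):
#   baseline = count_matches(log, PATTERN)   # BEFORE sending the stop signal
#   ... send the trigger-stop signal ...
#   if not wait_for_new_match(log, PATTERN, baseline):
#       raise RuntimeError("cycle did not produce a new marker before timeout")
```

Taking the baseline before the stop signal is what prevents markers from earlier cycles from satisfying the wait.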

When in doubt: instrument less, measure twice, validate every artifact before drawing any conclusion. A trace that you trust is worth a hundred traces you don't.
