latentflip/run-time-debugger-and-loupe.md

# Run-time Debugger

This week I've been hacking around with some ideas about a runtime debugger for node (and probably browser JS too). These build upon some of the ideas in loupe.

"Run-time debugger" is probably the wrong phrase, but I can't yet think of a better one. Essentially what I mean is that all the code for instrumenting/visualising/debugging would be written in "userland" JS, rather than embedded in the JS engine itself.

This would obviously not be as powerful as a native engine debugger with full access to the internals of v8 (or whichever engine), but I think the potential for hackability, and for building things on top of it, is kinda interesting.

## Loupe

Loupe had a few hurdles to clear to get it to work:

### Instrumenting your code

Adding the hooks/events that the visualisation is built from involves:

1. Parsing the code with esprima into an AST.
2. Walking through the AST, finding the things we are interested in (in loupe's case, mainly function calls), and modifying the AST to inject some code that will notify our visualisation when those things happen at runtime.
3. Turning the AST back into code that can then be run.
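A minimal sketch of the "walk and modify" step, assuming an ESTree-shaped AST such as the one esprima's `parse` returns. Parsing and code generation are left out so the example runs without dependencies: the AST for `foo(bar());` is written out by hand, and the `__enter`/`__exit` hook names are hypothetical, not loupe's actual ones. Each call is rewritten into a sequence expression that notifies before the call and after it returns, while still evaluating to the call's result:

```javascript
// Recursively copy an ESTree node, letting `transform` replace each node
// after its children have been visited (post-order).
function walk(node, transform) {
  if (Array.isArray(node)) {
    return node.map((child) => walk(child, transform));
  }
  // Leave primitives and non-node objects (e.g. regex literal values) alone.
  if (node === null || typeof node !== "object" || typeof node.type !== "string") {
    return node;
  }
  const copy = {};
  for (const [key, value] of Object.entries(node)) {
    copy[key] = walk(value, transform);
  }
  return transform(copy);
}

// Best-effort readable name for the thing being called.
function calleeName(callee) {
  if (callee.type === "Identifier") return callee.name;
  if (callee.type === "MemberExpression" && !callee.computed) {
    return `${calleeName(callee.object)}.${callee.property.name}`;
  }
  return "<anonymous>";
}

const identifier = (name) => ({ type: "Identifier", name });
const literal = (value) => ({ type: "Literal", value, raw: JSON.stringify(value) });
const callHook = (hook, args) => ({ type: "CallExpression", callee: identifier(hook), arguments: args });

// Rewrite every `f(args)` into `(__enter("f"), __exit("f", f(args)))`.
// __enter/__exit are hypothetical runtime hooks; __exit must return its second
// argument so the expression still evaluates to the original call's result.
function instrumentCalls(ast) {
  return walk(ast, (node) => {
    if (node.type !== "CallExpression") return node;
    const name = calleeName(node.callee);
    return {
      type: "SequenceExpression",
      expressions: [callHook("__enter", [literal(name)]), callHook("__exit", [literal(name), node])],
    };
  });
}

// Hand-written equivalent of esprima's AST for: foo(bar());
const ast = {
  type: "Program",
  sourceType: "script",
  body: [{
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: identifier("foo"),
      arguments: [{ type: "CallExpression", callee: identifier("bar"), arguments: [] }],
    },
  }],
};

const instrumented = instrumentCalls(ast);
```

Because the walk is post-order, the inner `bar()` is wrapped before `foo(...)`, and the input AST is copied rather than mutated. Wrapping the whole call expression (instead of splitting callee and arguments) keeps `this` intact for method calls like `obj.method()`. Step 3 would then hand the result to a code generator; escodegen is one commonly paired with esprima ASTs.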


### Slowing down the code as it runs

So that we can see what's happening, the code has to run slowly. This basically involved injecting the following between every statement when we modified the AST. We can't inject setTimeouts, as we need to actually block the running code:

```javascript
// Busy-wait for `delay` ms: unlike setTimeout, this blocks the thread
var start = Date.now();
while (Date.now() < start + delay) { /* do nothing */ }
```
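A sketch of how that injection could be automated, again assuming an ESTree-shaped AST. A call to a hypothetical `__delay(id)` hook is placed before every statement in each `Program` and `BlockStatement` body, with a numeric id per call site so the running code can report where it is; the busy-wait itself lives in a small runtime helper:

```javascript
// Build the AST for the statement `__delay(<id>);` (__delay is a hypothetical hook).
function delayStatement(id) {
  return {
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: { type: "Identifier", name: "__delay" },
      arguments: [{ type: "Literal", value: id, raw: String(id) }],
    },
  };
}

// Copy the AST, inserting a delay before every statement in Program/BlockStatement
// bodies. Ids come from `counter`, in the order the bodies finish being visited.
function injectDelays(node, counter = { next: 0 }) {
  if (Array.isArray(node)) return node.map((child) => injectDelays(child, counter));
  if (node === null || typeof node !== "object" || typeof node.type !== "string") return node;
  const copy = {};
  for (const [key, value] of Object.entries(node)) {
    copy[key] = injectDelays(value, counter);
  }
  if (copy.type === "Program" || copy.type === "BlockStatement") {
    copy.body = copy.body.flatMap((statement) => [delayStatement(counter.next++), statement]);
  }
  return copy;
}

// Runtime helper the instrumented code calls: report the id, then block.
function makeDelay(delayMs, report) {
  return function __delay(id) {
    report(id);
    const start = Date.now();
    while (Date.now() < start + delayMs) { /* do nothing: block the thread */ }
  };
}

// Hand-written AST for: a(); b();
const program = {
  type: "Program",
  sourceType: "script",
  body: ["a", "b"].map((name) => ({
    type: "ExpressionStatement",
    expression: { type: "CallExpression", callee: { type: "Identifier", name }, arguments: [] },
  })),
};
const slowed = injectDelays(program);
```

A fuller version would also have to skip directive prologues such as `"use strict"`, which stop being directives once another statement precedes them, and handle other statement lists like `switch` cases.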


### Getting your code to run in a web worker

Loupe essentially needs two separate, but communicating, runtimes to work: one to run the instrumented code in, and the other to run the visualisation in. All those while loops injected into the running code will completely lock up the runtime that the instrumented code is running in, so we need a separate one to render the visualisation.

The only way to achieve this in a single webpage is web workers (even iframes share the JS runtime under the hood). Given that the visualisation needs to touch the DOM, it has to be in the main runtime (colloquially, if inaccurately, "the UI thread"), meaning the running code must be in the web worker.

Actually running the code in a web worker is its own hurdle: since it's custom code (code the user wrote into the editor, which has been instrumented), we basically need to eval the code in a web worker and communicate with it via postMessage. I wrote weevil, an npm module, to help with this.

In addition to all the instrumentation, we need to shim certain extra calls so that they work in a web worker, or so that we get information from them.
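A minimal sketch of the worker-side runner, assuming the instrumented code calls hypothetical `__enter`, `__exit` and `__delay` hooks; this is not weevil's API. `post` stands in for the worker's `self.postMessage` so the logic can also run outside a browser, and `new Function` plays the role of eval, receiving the hooks as parameters rather than as globals:

```javascript
// Run instrumented source, turning each (hypothetical) hook call into a message
// for the UI thread. Inside a real web worker, `post` would be
// `(message) => self.postMessage(message)`.
function runInstrumented(code, post) {
  const hooks = {
    __enter: (name) => post({ type: "enter", name }),
    __exit: (name, result) => {
      post({ type: "exit", name });
      return result; // keep the original call's value flowing through
    },
    __delay: (id) => post({ type: "delay", id }), // busy-wait omitted in this sketch
  };
  // new Function compiles `code` in the global scope, with the hooks as parameters.
  const program = new Function(...Object.keys(hooks), code);
  try {
    program(...Object.values(hooks));
    post({ type: "done" });
  } catch (error) {
    post({ type: "error", message: String(error) });
  }
}
```

On the UI thread, the matching half would listen with `worker.onmessage = (event) => ...` and render each `event.data`.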


### Shimming a bunch of stuff in the running code

This basically involved writing code that runs in both the web worker and the UI thread, with the two halves communicating over postMessage, so we get the behaviour/visualisation we need:


- `console.log`: output from a web worker doesn't show up in the visualisation, so we override it in the web worker code and replace it with a function that posts the data back to the visualisation with postMessage.


- `setTimeout`: we don't have to shim this, but if we want to know when timers are created, so we can render a visualisation of when they are "done" and queued in the callback queue, we have to add some extra hooks to it.


- DOM querying: web workers don't have access to the DOM, so you wouldn't be able to write code that attaches DOM events. To handle this, I provide `$.on`, a simple jQuery-like shim that passes messages back and forth with postMessage to attach the event listeners in the UI thread, but get the right callback called inside your code. So you can attach handlers like:

  ```javascript
  $.on("button", "click", function () { console.log("hello"); });
  ```


- ajax/other async things: I never quite got round to this. You wouldn't have to shim ajax calls to get them to work, but if you wanted to render a visualisation of when the call was started/completed/the callback was executed, you would have to.
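Here is a rough sketch of both halves of those shims. The message shapes (`log`, `timer-created`, `dom-listen`, `dom-event`, ...) and the `installShims`/`listenInUiThread` names are hypothetical, not loupe's actual protocol, and `post` again stands in for `self.postMessage`:

```javascript
// Worker side (hypothetical helper): patch `scope`, the worker's global object
// in a real setup. Returns a function to call with messages from the UI thread.
function installShims(scope, post) {
  // console.log: forward the arguments to the visualisation instead of printing.
  scope.console.log = (...args) => post({ type: "log", args: args.map(String) });

  // setTimeout: still schedule the real timer, but report its creation and the
  // moment its callback actually starts running.
  const nativeSetTimeout = scope.setTimeout;
  let nextTimerId = 0;
  scope.setTimeout = (callback, ms, ...args) => {
    const timerId = nextTimerId++;
    post({ type: "timer-created", timerId, ms });
    return nativeSetTimeout(() => {
      post({ type: "timer-callback", timerId });
      callback(...args);
    }, ms);
  };

  // $.on: keep the handler locally and ask the UI thread to attach a real listener.
  const handlers = new Map();
  let nextHandlerId = 0;
  scope.$ = {
    on(selector, eventName, handler) {
      const handlerId = nextHandlerId++;
      handlers.set(handlerId, handler);
      post({ type: "dom-listen", selector, eventName, handlerId });
    },
  };

  // When the UI thread reports that a DOM event fired, run the matching handler.
  return function dispatch(message) {
    if (message.type === "dom-event" && handlers.has(message.handlerId)) {
      handlers.get(message.handlerId)();
    }
  };
}

// UI-thread side (browser only, hypothetical helper): attach the real listener
// for a "dom-listen" message and relay each event back to the worker.
function listenInUiThread(doc, worker, message) {
  for (const element of doc.querySelectorAll(message.selector)) {
    element.addEventListener(message.eventName, () => {
      worker.postMessage({ type: "dom-event", handlerId: message.handlerId });
    });
  }
}
```

Handlers are called without an event object in this sketch, and a complete `setTimeout` shim would also need to cover `clearTimeout`.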


### Handling pause/resume/changing speed

This was something I hacked in at the last minute. In loupe's architecture there's no way to actually pause or change the speed of the running code. Doing so would require changing the delay inserted between statements while the code is running (a pause is just a very long delay), which in turn would require synchronous access to the code running in the web worker. That isn't possible: messages sent with postMessage are only handled once the worker gets back to its event loop, which a worker stuck in busy-wait loops doesn't do until the code has finished.

So I faked it: when you hit pause, the code in the web worker continues to run, but the visualisation ignores any events after you hit pause and records the exact delay you paused on (the web worker code reports each delay id to the UI thread). When you hit resume, the visualisation restarts the web worker, telling it to fast-forward (effectively skipping all delays) until it gets to the one it was paused on.

This is even messier and more complex than it sounds, as you also have to re-inject any DOM events the user triggered (clicking stuff) as the code re-runs, so that the states match up when you resume.


Changing speed is effectively a pause-restart-resume, but resuming with a different speed.
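A sketch of that fake pause/resume under two assumptions: the worker counts every delay it executes (a step number) rather than relying only on the static delay id, since a delay inside a loop runs many times; and replaying the user's DOM events is left out. `makeDelayController` and `Playback` are hypothetical names:

```javascript
// Worker side (hypothetical): a __delay hook that skips reporting and waiting
// until it reaches step `fastForwardTo`, then behaves normally.
function makeDelayController({ delayMs, fastForwardTo = -1, report }) {
  let step = 0;
  return function __delay(id) {
    const current = step++;
    if (current < fastForwardTo) return; // replaying: no reporting, no waiting
    report({ type: "delay", id, step: current });
    const start = Date.now();
    while (Date.now() < start + delayMs) { /* block */ }
  };
}

// UI side (hypothetical): ignore events after pause, remember where we were,
// and build the options for the restarted worker.
class Playback {
  constructor() {
    this.paused = false;
    this.lastStep = -1;
  }

  onMessage(message, render) {
    if (this.paused) return; // the old worker keeps running; drop its events
    if (message.type === "delay") this.lastStep = message.step;
    render(message);
  }

  pause() {
    this.paused = true;
    return this.lastStep;
  }

  // Resuming and changing speed are the same operation with a different delayMs.
  resume(delayMs) {
    this.paused = false;
    return { delayMs, fastForwardTo: this.lastStep };
  }
}
```

In a browser, resuming would also stop the old worker with `worker.terminate()` before starting a new one with the options returned by `resume`.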


## Improving on Loupe

There are a few things that I always wanted to be able to do/fix with loupe, but couldn't see how to at the time. Ultimately I'd love to be able to run loupe, or something like it, on an actual codebase, instead of just on demo code.

- Improved AST mangling: the combination of tooling I used to hack the AST was okay for a single file, but would never work effectively across multiple files in a "real codebase". It'd also be great to be able to track, e.g., the values of variables in scope at various points.
- Make it work for node code, not just browser code: this would require the visualisation running in a browser and the node code running in, well, node, with the two somehow communicating.
- Remove the need to shim everything: having to shim/fake e.g. DOM code means that the visualisation cannot be run on arbitrary code.

  It's not clear whether we'd be able to get the same granularity of visualisation without shimming (for example, you can't know that a timer has finished and its callback has been "queued" until it's actually been dequeued and run, so it's hard to render the callback "queue").


- Implement proper pause/resume: if only there were a way to actually implement pause/resume without having to track state everywhere and restart the whole codebase every time (which would be impossible on any real codebase).
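On the node side, one possible alternative to shimming timers (not something loupe does) is Node's built-in `async_hooks` module: its `init` callback fires synchronously when an async resource such as a `Timeout` is created, and `before`/`after` fire around its callback. It has the same blind spot noted under shimming: nothing fires when a timer merely becomes due, only when its callback is dequeued and run. The Node docs also mark `createHook` as experimental and discourage new use, so treat this as a sketch; `traceTimers` is a hypothetical helper:

```javascript
const asyncHooks = require("node:async_hooks");

// Hypothetical helper: record when timers are created and when their callbacks
// start/finish, without patching setTimeout. It only pushes to an array, since
// doing I/O such as console.log inside these callbacks would re-trigger the hooks.
function traceTimers(events) {
  const timerIds = new Set();
  const hook = asyncHooks.createHook({
    init(asyncId, type) {
      if (type !== "Timeout") return;
      timerIds.add(asyncId);
      events.push({ phase: "created", asyncId });
    },
    before(asyncId) {
      if (timerIds.has(asyncId)) events.push({ phase: "callback-start", asyncId });
    },
    after(asyncId) {
      if (timerIds.has(asyncId)) events.push({ phase: "callback-end", asyncId });
    },
  });
  return hook.enable(); // enable() returns the hook, so callers can disable() it later
}
```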

I'm not sure yet whether I can achieve all of the above while keeping the same level of detail in the visualisations and getting it all to run in the browser, but if I could write a loupe-for-node that was more powerful, that would make me pretty happy.