@groundwater
Last active August 29, 2015 14:05

Tracing Summit

On Aug 8, 2014 the following group gathered in Vancouver, BC to discuss the state and future of node.js tracing. The goal of this group was to summarize their current approaches to tracing, their respective goals in tracing, and which points are particularly painful today in node.js versions 0.10/0.11.

Problems

Many users of node have both performance and business questions about their applications. Several companies wish to provide insight into node applications, but node.js makes several important questions particularly difficult to answer without a great deal of dynamic monkey-patching, source-rewriting, or native (C++) modules.

The current methods can be error-prone. For example, monkey-patching is not necessarily stable across module updates. Low-level tracing also has a performance impact, and not all environments can host native modules easily.

  • A user schedules a callback with setTimeout. If an error occurs in the callback, the user would like to know details about how the callback was scheduled. This is similar to the problem solved by long stack traces.
  • A user has an http server, and an analytics module would like to transparently measure the request/response latency of each connection.
  • An incoming http connection kicks off a series of background tasks. The http response is complete, but we would like to know when all the spawned background tasks have also completed.
  • We want to measure the latency of a web request that traverses a redis connection pool.
  • An event emitter emits an error. We want to capture and record the error without altering the behavior of the application.
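
The first scenario above (knowing how a failing setTimeout callback was scheduled) is the kind of question that today usually gets answered by monkey-patching. The following is a minimal sketch of that approach, not a recommended implementation. The names `withSchedulingStack`, `patchSetTimeout`, and the `scheduledAt` property are made up for illustration.

```javascript
// Hypothetical helper: wrap a callback so that any error it throws
// carries the stack captured when the callback was scheduled.
function withSchedulingStack(callback) {
  // Captured now, at scheduling time, not when the callback runs.
  const scheduledAt = new Error('scheduled here').stack;
  return function wrapped() {
    try {
      return callback.apply(this, arguments);
    } catch (err) {
      // Attach the scheduling context as extra metadata, then rethrow
      // so the application's own error handling is unchanged.
      if (err && typeof err === 'object') err.scheduledAt = scheduledAt;
      throw err;
    }
  };
}

// Hypothetical monkey-patch of the global timer function.
// Returns a function that undoes the patch.
function patchSetTimeout() {
  const original = global.setTimeout;
  global.setTimeout = function patchedSetTimeout(callback, ...args) {
    const cb = typeof callback === 'function'
      ? withSchedulingStack(callback)
      : callback;
    return original.call(this, cb, ...args);
  };
  return function restore() {
    global.setTimeout = original;
  };
}
```

Even this small patch shows the costs listed in this document: every timer now allocates an Error to capture a stack, and any property attached to the original setTimeout function is no longer visible on the replacement.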

General Requirements

  1. Track events related to I/O resources.

    The async nature of node makes observing cause-and-effect relationships difficult over time. For example, a web service may kick off a number of external requests and background tasks due to an incoming request. There is a real need to associate those actions together, and to know when those actions have all completed.

  2. Loosely coupled, but structurable data.

    Many tracing modules attempt to build a structured representation of the application, where async callbacks are associated with their initiating contexts. These boundaries, however, can be fluid, and are often not well-defined. Each tracing module needs the freedom to structure data, and draw boundaries as it sees fit.

  3. Dynamic capture of arbitrary metadata.

    During any set of events, we wish to gather arbitrary metadata about the current program. The type of data is highly dependent on the goals of the module doing the tracing, and may include request-specific information such as SQL queries and POST parameters, or aggregate information like CPU level or memory usage.

  4. User-facing API.

    We would like both JavaScript and C++ modules to be able to emit events into a unified API.
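
To make the requirements above more concrete, here is a rough sketch of what a unified event API could look like from the JavaScript side. Nothing like this exists in node today. The `traceBus` emitter, the `trace` function, the `task:start` / `task:end` event types, and the `requestId` field are all assumptions made for this example. Producers emit events with free-form metadata, and one possible consumer counts open tasks per request to learn when all of the work spawned by a request has completed.

```javascript
const { EventEmitter } = require('events');

// Hypothetical shared bus that instrumented modules emit into.
const traceBus = new EventEmitter();

// Producer side: report an event with arbitrary metadata attached.
function trace(type, requestId, metadata) {
  traceBus.emit('trace', {
    type,
    requestId,
    time: Date.now(),
    metadata: metadata || {},
  });
}

// Consumer side: one possible tracing module. It counts open tasks per
// request and calls onRequestIdle when the count drops back to zero.
function watchRequests(onRequestIdle) {
  const pending = new Map();
  function listener(event) {
    const count = pending.get(event.requestId) || 0;
    if (event.type === 'task:start') {
      pending.set(event.requestId, count + 1);
    } else if (event.type === 'task:end') {
      if (count === 0) return; // unmatched end event, ignore it
      if (count === 1) {
        pending.delete(event.requestId);
        onRequestIdle(event.requestId);
      } else {
        pending.set(event.requestId, count - 1);
      }
    }
  }
  traceBus.on('trace', listener);
  return function stop() {
    traceBus.removeListener('trace', listener);
  };
}

const stop = watchRequests((id) => console.log('all work finished for', id));
trace('task:start', 'req-1', { kind: 'http', url: '/users' });
trace('task:start', 'req-1', { kind: 'sql', query: 'SELECT 1' });
trace('task:end', 'req-1'); // http response sent, sql task still open
trace('task:end', 'req-1'); // logs: all work finished for req-1
stop();
```

The consumer, not the producer, decides what counts as a request boundary here, which is in the spirit of requirement 2.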

Problems

  1. Monkey-patching slows execution
  2. Async back-traces must be exposed through monkey-patching
  3. Breaking continuations by multiplexing async activity (e.g. redis connection pipeline)
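
The third problem can be illustrated with a toy model. The sketch below assumes a very simple "current context" variable and a made-up `Pipeline` class that batches commands the way a multiplexed connection might. It is not how any real redis client is implemented. Callbacks queued by one request are invoked during a flush triggered by another request, so a tracer that relies on the context active at call time attributes them to the wrong request unless the callback was bound when it was queued.

```javascript
// Hypothetical continuation context: whatever request is "current".
let currentContext = null;

function runInContext(context, fn) {
  const previous = currentContext;
  currentContext = context;
  try {
    return fn();
  } finally {
    currentContext = previous;
  }
}

// Capture the context active now and restore it when fn is later called.
function bindToCurrentContext(fn) {
  const captured = currentContext;
  return function bound() {
    return runInContext(captured, () => fn.apply(this, arguments));
  };
}

// Toy stand-in for a multiplexed connection: commands from many requests
// share one queue and are answered together in a single flush.
class Pipeline {
  constructor() {
    this.queue = [];
  }
  send(command, callback) {
    this.queue.push({ command, callback });
  }
  flush() {
    const batch = this.queue;
    this.queue = [];
    for (const { command, callback } of batch) {
      callback(null, 'reply:' + command);
    }
  }
}

const pipeline = new Pipeline();
const seen = [];

runInContext('request-A', () => {
  pipeline.send('GET a', () => seen.push(['naive', currentContext]));
  pipeline.send('GET b', bindToCurrentContext(
    () => seen.push(['bound', currentContext])
  ));
});

// The flush happens while a different request is being handled.
runInContext('request-B', () => pipeline.flush());

console.log(seen);
// [ [ 'naive', 'request-B' ], [ 'bound', 'request-A' ] ]
```
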
@brycebaril

In terms of Problems: it is also almost impossible to fully maintain all possible API constructs with monkey-patching. E.g. maintaining function identity (function.length) or comment-based dependency injection, or content checks/checksums, etc.
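
A small demonstration of the function identity point, using a made-up `handler` function and two illustrative wrappers. `carefulWrap` relies on ES2015 semantics, where a function's `length` and `name` properties are configurable, so treat it as a sketch for newer engines only.

```javascript
function handler(req, res, next) {
  return 'ok';
}

// The usual monkey-patching wrapper: behavior is preserved, identity is not.
function naiveWrap(fn) {
  return function () {
    return fn.apply(this, arguments);
  };
}

// A more careful wrapper that copies length and name across.
// The source text returned by toString() still belongs to the wrapper.
function carefulWrap(fn) {
  const wrapper = function () {
    return fn.apply(this, arguments);
  };
  Object.defineProperty(wrapper, 'length', { value: fn.length });
  Object.defineProperty(wrapper, 'name', { value: fn.name });
  return wrapper;
}

console.log(handler.length);              // 3
console.log(naiveWrap(handler).length);   // 0
console.log(carefulWrap(handler).length); // 3
console.log(carefulWrap(handler).toString() === handler.toString()); // false
```

Anything that inspects the function source, such as comment-based dependency injection or a checksum, still sees the wrapper rather than the original.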

@Qard

Qard commented Aug 19, 2014

For the sake of having everything in one place, here's my own prototype I put together before the meeting: https://github.com/Qard/stacks-concept

@brycebaril I like the state transition ideas in there. I'd like to see a bit more explicit conventions around how to handle continuation from one transaction into another. You could simply pass the parent id into the metadata of the child, but I feel like it might need to be standardized a little more to be something the community can pick up and run with without needing to coordinate.

I'd argue that, considering uses like zones, giving the decision of where a transaction starts and ends to module writers might not work too well. I'd prefer an event emitter that I can read event metadata from to derive meaning and build structure.
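
As a rough illustration of deriving structure from event metadata, the sketch below takes a flat list of events and builds a tree from a `parentId` field carried in each event's metadata. The event shape (`id`, `name`, `metadata.parentId`) and the `buildTree` function are assumptions for this example, not part of any existing module.

```javascript
// Build a forest of transactions from a flat list of events.
// Events whose parent is missing or unknown become roots.
function buildTree(events) {
  const nodes = new Map();
  for (const event of events) {
    nodes.set(event.id, { id: event.id, name: event.name, children: [] });
  }

  const roots = [];
  for (const event of events) {
    const node = nodes.get(event.id);
    const parentId = event.metadata ? event.metadata.parentId : undefined;
    const parent = nodes.get(parentId);
    if (parent && parent !== node) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

const roots = buildTree([
  { id: 1, name: 'http request', metadata: {} },
  { id: 2, name: 'redis query', metadata: { parentId: 1 } },
  { id: 3, name: 'background job', metadata: { parentId: 1 } },
]);
console.log(JSON.stringify(roots, null, 2));
```

A different consumer could read the same events and group them by some other key, which is what leaving the structure to the reader of the events buys.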
