Skip to content

Instantly share code, notes, and snippets.

What would you like to do?

Tracing Spec

Tracing is about tracing the structure of an IO computation. Basically, giving some insight into the graph formed by the IO. Tracing is accumulated as the graph is constructed (e.g. by instantiating an IO.Bind node) and reified at the following points:

  • Whenever requested by backtrace: IO[Trace] or printBacktrace: IO[Unit]
  • Whenever an error case is instantiated (IO.Error)

Whenever an exception is rethrown by the interpreter (most commonly in unsafeRunSync()), the associated trace information for that exception should be put into a TracedException wrapper (with cause = the original exception) and that exception should be thrown. This allows backtrace information to participate in side-effectful error recovery and reporting mechanisms. We should consider whether or not we want to also do this with unsafeRunAsync. I'm leaning toward "yes". We should not wrap in TracedException on attempt, raiseError, or any of the other pure error reification mechanisms. Not only would that not be lawful, it would be immensely annoying for users.

Tracing should have three modes:

  • slug
    • Disables caching and attempts to gather more information from stack traces (e.g. skipping over monad transformers and abstracted implementations, such as FlatMap#productR)
  • rabbit
    • Fast convergent global identity cache with nominal amortized cost. When there's a cache miss, walks the stack trace from new Throwable().getStackTrace and attempts to find the "best fit" stack frame for the type of the closure that it was passed (will probably require unmunging the class name of the thunk a bit). For example, something like org.http4s.StarterKit$setup$$anonfun$$Lambda$427 should be a best fit to the first stack frame involving StarterKit#setup if at all possible, but if there's an earlier stack frame involving StarterKit, then we should probably take that instead. Alternatively, if we fail to find StarterKit#setup or even StarterKit anywhere in the stack trace, then we should take the highest element involving the package, hierarchically (i.e. highest element involving http4s, and failing that, the highest element involving org). I realize this heuristic is in conflict with the one involving the containing type name, which takes precedence when it's higher. We can resolve this by relying on capitalization as a convention, or we can just unify the heuristics and look for setup first, and failing to find that, go with the "highest hierarchical fit" answer.
  • disabled
    • No tracing at all. Doesn't touch the cache. Skips all code. Should be a single boolean conditional jump at each call site, for better pipeline analysis.

Tracing should be configured by functions on IO. For example, ioa.debugTrace: IO[A], ioa.fastTrace: IO[A] (the default), ioa.noTrace: IO[A]. This should be threaded through the run loop, so it is in fact lexical around the entire execution. Inner-most will naturally win in this case in the event of conflicts (e.g. ioa.fastTrace.noTrace will have rabbit mode tracing enabled). I expect normal usage of these functions would be to call them immediately prior to evaluation via an unsafe function, or equivalently, immediately prior to handing off an effect to some enclosing framework (such as via an http4s Response).

It occurs to me that the definition sites for flatMap and map don't have access to IORunLoop state. Hmm. Maybe this has to be a global? We need to think about this more. If it has to be a global then it should be IO.fastTrace: IO[Unit], IO.noTrace: IO[Unit] etc, since users would need to sequence the effect of mutating the global. I feel like lexical scoping is way saner though, so perhaps the right approach is a mixture of the two:

  • A system property to set the default global trace mode (we probably need to cache this, since I think system property access is relatively slow)
  • A lexical property exposed in some way via the IORunLoop. The easy answer here is a ThreadLocal, but that's going to be very slow. I'm not 100% sure what to do about this.

Explicit Trace Frames

We could support something like IO.recordTrace: IO[Unit], which takes a source position implicit macro thingy (probably copy/pasted to avoid dependencies), and which explicitly records a location in the trace. That would probably be cool.

A Note on Map Fusion

We probably do not want to have it at all. We should try a quick and dirty fs2 benchmark with and without it enabled just to see, but I'm willing to bet that it has almost no material impact, and obviously it makes this sort of stuff slower and more annoying because it forces us to contain a List[Trace] in the IO.Map node. If it turns out to be a non-impactful optimization (as I suspect), then we should remove it along with adding the tracing.

Supported Entry Points

The following major functions (and equivalents) should be traced in rabbit mode:

  • delay/apply
  • async/cancelable
  • map
  • flatMap
  • …other thunked functions, such as >> (note that we can't do *> because it's eager)

In slug mode, we should trace all of the above, in addition to:

  • *> / <*
  • raiseError (note that we might be able to do this in rabbit mode, too, if we allow the cache key to be either a IdentityContainer[Throwable] or a Class[_])
  • start?
  • ...other eager structural functions
Copy link

RaasAhsan commented Apr 26, 2020

I took a stab at trying to implement this with ThreadLocal for lexically-scoped tracing, here's how far I got: . To put it bluntly, it doesn't seem like it's going to be easy or feasible.


  • Performance: if we want to be able to capture stack traces at every IO instruction, there's obviously going to be a cost. ThreadLocal.get is expensive, so we could protect it by a global boolean flag like you mentioned. But if we only want to trace a small portion of our application, most instructions are going to have to test two conditionals before determining whether or not it's going to capture a stack trace. So however you slice it, its not going to be very performant (though I haven't personally benchmarked it), regardless of whether you want it or not.
  • Cancellation: tracing status is kept in ThreadLocal, which doesn't work well with cancellation. If an actively traced fiber is concurrently cancelled, the thread it is bound to needs to reset its tracing status before interpreting the next one. Unfortunately, the canceller invokes the finalizer rather than the cancellee, so that thread is left in an inconsistent state. My solution to this was to have each thread reset its tracing status before it begins interpreting a new IO, but there is no well-defined entrypoint for a brand new IO.
  • Bracket: bracket is expressed in terms of IO.Async, which is a bit problematic because it means that for fiber locality applications, asynchronous operations should be able to manipulate run-loop state (in this case, the trace storage). There needs to be some way to let IOBracket pass back the new state to the run-loop. I tried to augment the callback signatures a bit but it's a really invasive change. Note that the fiber-local state PoC I have open has this same problem, which I'm trying to address now. I could use a concurrent collection to make it work, but that's admitting defeat.
  • Asynchronous boundaries: the run-loop inserts asynchronous boundaries at many points, we have to be careful to update the ThreadLocal accordingly.
  • JavaScript: not sure if ThreadLocal breaks scala.js builds


  • Macros/bytecode manipulation: inject code at well-defined points to capture a stack trace or an implicit source location like you mentioned. Not sure how to get the lexically-scoped property here.
  • Bytecode analysis: Given an instance of a lambda, using reflection, its straightforward to find out where in a class it is invoked. This strategy is implemented in Akka for some applications, and ZIO ported it for fiber tracing as well. One limitation is that we can only trace instructions that accept a lambda (Map/Async/FlatMap), and we also don't have much detail around what which combinator is actually being used. e.g. we couldn't find the source location for a start call (but the interpreter could certainly mention that it happened).
  • Decorating/subclassing IO: Something like SyncIO for a TracedIO, and have some switch between the two
  • Netty has a FastThreadLocal type that supposedly improves the performance of gets. However it requires a custom Thread type, so it may not be used if users supply a custom ExecutionContext-backed ContextShift

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment