Skip to content

Instantly share code, notes, and snippets.

View djspiewak's full-sized avatar

Daniel Spiewak djspiewak

View GitHub Profile

Thread Pools

Thread pools on the JVM should usually be divided into the following three categories:

  1. CPU-bound
  2. Blocking IO
  3. Non-blocking IO polling

Each of these categories has a different optimal configuration and usage pattern.

Quick Tips for Fast Code on the JVM

I was talking to a coworker recently about general techniques that almost always form the core of any effort to write very fast, down-to-the-metal hot path code on the JVM, and they pointed out that there really isn't a particularly good place to go for this information. It occurred to me that, really, I had more or less picked up all of it by word of mouth and experience, and there just aren't any good reference sources on the topic. So… here's my word of mouth.

This is by no means a comprehensive gist. It's also important to understand that the techniques that I outline in here are not 100% absolute either. Performance on the JVM is an incredibly complicated subject, and while there are rules that almost always hold true, the "almost" remains very salient. Also, for many or even most applications, there will be other techniques that I'm not mentioning which will have a greater impact. JMH, Java Flight Recorder, and a good profiler are your very best friend! Mea

Understanding Comparative Benchmarks

I'm going to do something that I don't normally do, which is to say I'm going to talk about comparative benchmarks. In general, I try to confine performance discussion to absolute metrics as much as possible, or comparisons to other well-defined neutral reference points. This is precisely why Cats Effect's readme mentions a comparison to a fixed thread pool, rather doing comparisons with other asynchronous runtimes like Akka or ZIO. Comparisons in general devolve very quickly into emotional marketing.

But, just once, today we're going to talk about the emotional marketing. In particular, we're going to look at Cats Effect 3 and ZIO 2. Now, for context, as of this writing ZIO 2 has released their first milestone; they have not released a final 2.0 version. This implies straight off the bat that we're comparing apples to oranges a bit, since Cats Effect 3 has been out and in production for months. However, there has been a post going around which cites various compar

Git DMZ Flow

I've been asked a few times over the last few months to put together a full write-up of the Git workflow we use at RichRelevance (and at Precog before), since I have referenced it in passing quite a few times in tweets and in person. The workflow is appreciably different from GitFlow and its derivatives, and thus it brings with it a different set of tradeoffs and optimizations. To that end, it would probably be helpful to go over exactly what workflow benefits I find to be beneficial or even necessary.

  • Two developers working on independent features must never be blocked by each other
    • No code freeze! Ever! For any reason!
  • A developer must be able to base derivative work on another developer's work, without waiting for any third party
  • Two developers working on inter-dependent features (or even the same feature) must be able to do so without interference from (or interfering with) any other parties
  • Developers must be able to work on multiple features simultaneously, or at lea

Schedulers

As Cats Effect is a runtime system, it ultimately must deal with the problem of how best to execute the programs which are defined using its concrete implementation (IO). Fibers are an incredibly powerful model, but they don't map 1:1 or even 1:n with any JVM or JavaScript construct, which means that some interpretation is required. The fashion in which this is achieved has a profound impact on the performance and elasticity of programs written using IO.

This is true across both the JVM and JavaScript, and while it seems intuitive that JavaScript scheduling would be a simpler problem (due to its single-threaded nature), there are still some significant subtleties which become relevant in real-world applications.

JVM

IO programs and fibers are ultimately executed on JVM threads, which are themselves mapped directly to kernel threads and, ultimately (when scheduled), to processors. Determining the optimal method of mapping a real-world, concurrent application down to kernel-level thr

Fibers

Fibers are an abstraction over sequential computation, similar to threads but at a higher level. There are two ways to think about this model: by example, and abstractly from first principles. We'll start with the example.

(credit here is very much due to Fabio Labella, who's incredible Scala World talk describes these ideas far better than I can)

Callback Sequentialization

Consider the following three functions

A Study in Multi-Point Deadlocks

Deadlocks are extremely difficult to reason about sometimes. We are used to thinking about them in terms of contention over shared resources, with the pair of exclusive locks being a good and relatively canonical example of this phenomenon. However, sometimes you can find yourself in deadlock scenarios which are caused not so much by an improper sequencing of exclusivity, but rather by insufficient buffer capacity!

These kinds of scenarios are a lot rarer and much more difficult to diagnose and describe, which is why I found this particular puzzle so incredibly fascinating. The following is a screenshot from Cities Skylines (I added textual markers and arrows to make things easier to follow). All vehicles pictured are stationary and unable to move, indefinitely:

Do you see the deadlock? It took me a bit to understand it, but this situation can and does arise in software resource contention where it is dramatically harder to conc

What follows are some of my (very) rough thoughts on what we can and should do with respect to CPS transformation in Scala at the language level. I'll try to start with some motivation behind my thinking, as well as some rambling observations on the nature of the problem space, but don't expect too much coherence here. :-)

The Problem

Async programming is hard.

Okay let's actually be more specific than that. High-performance I/O is hard. Signal multiplexing is a powerful technique for achieving high(er) performance I/O, particularly network I/O, but the tradeoff is that, in order to utilize it, the user-space programming model must allow for suspension and resumption of sequential continuations (often called "fibers" or "coroutines"). Achieving this type of programming model without significant tradeoffs in usability is what is exceptionally hard.

If that wasn't bad enough though, these problems are inextricably conflated with another set of problem spaces which are, themselves, very difficult. In

/*
* There are two fundamental modes here: sequential and parallel. There is very little overlap
* in semantics between the two apart from the submission side. The whole thing is split up into
* a submission queue with impure enqueue and cancel functions which is drained by the `Worker` and an
* internal execution protocol which also involves a queue. The `Worker` encapsulates all of the
* race conditions and negotiations with impure code, while the `Executor` manages running the
* tasks with appropriate semantics. In parallel mode, we shard the `Worker`s according to the
* number of CPUs and select a random queue (in impure code) as a target. This reduces contention
* at the cost of ordering, which is not guaranteed in parallel mode. With sequential mode, there
* is only a single worker.
@State(Scope.Thread)
class FilesBenchmark {
private[this] var target: Path = _
@Setup
def setup() = {
val file = File.createTempFile("fs2-benchmarks-", "-walk")
file.delete()
file.mkdir()