Skip to content

Instantly share code, notes, and snippets.

@djspiewak
Created December 12, 2020 19:21
Show Gist options
  • Star 9 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save djspiewak/78e7b607502b18e04365e0abd558101b to your computer and use it in GitHub Desktop.
Save djspiewak/78e7b607502b18e04365e0abd558101b to your computer and use it in GitHub Desktop.

A Study in Multi-Point Deadlocks

Deadlocks are extremely difficult to reason about sometimes. We are used to thinking about them in terms of contention over shared resources, with the pair of exclusive locks being a good and relatively canonical example of this phenomenon. However, sometimes you can find yourself in deadlock scenarios which are caused not so much by an improper sequencing of exclusivity, but rather by insufficient buffer capacity!

These kinds of scenarios are a lot rarer and much more difficult to diagnose and describe, which is why I found this particular puzzle so incredibly fascinating. The following is a screenshot from Cities Skylines (I added textual markers and arrows to make things easier to follow). All vehicles pictured are stationary and unable to move, indefinitely:

Do you see the deadlock? It took me a bit to understand it, but this situation can and does arise in software resource contention where it is dramatically harder to conceptualize and a lot less visual. Let's walk through this...

All of the vehicles involved in the deadlock are denoted with red letters. The ones which are critical to the deadlock are A through D, while two other vehicles affected by the deadlock but not participating in it are X and Y (note that I'm considering all of the cars to be a single vehicle, since they mostly behave as such). For the purposes of software analogies, you can think of each vehicle as an independent fiber.

The destinations for each of the relevant vehicles are marked with cyan numbers 1 through 5, with 1 through 3 being the most significant destinations for our problem space.

Hopefully the junctures are self-explanatory, and I didn't mark them. The most relevant ones in this scene are between 4 and 3, as well as the one between 1, 2, and 5. Also note the distances involved, as each train is a set length (similar to an atomic section of a fiber). Also for context, all railroad tracks pictured here are bidirectional and right-hand drive. In other words, all of these resources are asymmetrically biased depending on pairing of pathways.

With all of that context out of the way, let's look at the exact deadlock…

Scenario

Note that this is (intentionally) not minimal. I could have engineered something smaller, but it's fun to see the ancillary chaos which ensues.

  • Train A is attempting to go to destination 1
  • Train B is attempting to go to destination 2
  • Train C is attempting to go to destination 3
  • Train D is attempting to go to destination 2 (but it really doesn't matter)
  • Cars X are attempting to escape toward destination 5
  • Train Y is attempting to go to destination 4

Note that destinations 4 and 5 are, themselves, entirely free of contention! This whole mess is being caused by the contention on destination 3 and, unintuitively enough, the distance between the two railroad junctions. It is this last property that I find fascinating.

Analysis

The exact dependency chain here is as follows:

  • Train A is blocking train B
  • Train B is blocking train D
  • Train D is blocking train C
  • Train C is blocking train A (thus, system deadlock)
  • Train C is also blocking cars X and train Y

Clearly train C is the troublemaker here, but what is absolutely fascinating is the fact that we could fix this not by removing train C, but rather by lengthening the distance between the two railroad junctions!

Trains in Cities Skylines are a fixed length. This means that trains sitting at 3 (such as D) will always use a specific amount of space, and trains waiting to enter 3 (such as C) will behave similarly. 3 is a scarce resource since it can only unload a single train at a time, and thus we can conceptualize C as sitting in a buffer awaiting access to that resource. The buffer itself is a shared resource, since train Y would similarly need to control it in order to reach destination 4, but the more relevant thing is that the buffer is too small to accommodate bidirectional usage, as train A demonstrates.

The fact that the distance between the junctures is too short to accommodate the buffered train C means that A is unable to make forward progress, despite itself having no meaningful need for the resource represented by destination 3. This also amplifies the problems with the cars at X.

The Lesson

Of course, Cities Skylines is just a game with its own internal rules and structures, but this scenario is a great way to think about resource scarcity and starvation leading to deadlocks in real applications. It also points to how, sometimes, you can resolve a resource contention issue just by inserting a larger buffer (usually in the form of a Queue).

A simple example of this can be seen in the following program:

IO {
  IO.sleep(100.millis).unsafeRunSync()
}

This program demonstrates exclusive control of a scarce resource where, if the resource is sufficiently scarce, the result will be deadlock despite the fact that the program itself has an obvious linearization.

The resource in question here is the underlying Thread(s). Specifically, if this program is evaluated on a single-threaded IORuntime (such as ScalaJS), it will immediately deadlock since the inner unsafeRunSync() is holding the only thread, meaning that the sleep is unable to wake up the continuation necessary to complete the fiber and, thus, release the thread.

Much like how the distance between the railroad junctions is too short to accommodate more than one train at a time, so too is the number of threads in a JavaScript runtime too low to accommodate this blockage. In a very real sense, these kinds of situations are why we do asynchronous-style programs in the first place: threads are a very scarce resource on all runtimes, and we simply can't have enough of them to allow any meaningful amount of blocking.

We can abstract away from this lesson though and think about larger and more complicated situations (such as the railroad deadlock from my screenshot). My takeaway from this is that right-sizing your queues is incredibly important, and a properly sized network of buffers guarding your scarce resources can save you from deadlock in scenarios which would have halted all forward progress otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment