@elliot42
Last active April 28, 2016 08:37

Managing side effects and coupling

Any system that's not a pure calculation has effects: one thing causes another thing to happen, in causal temporal sequence.

There are four broad categories of how to do this:

  1. Synchronously and tightly coupled
  2. Synchronously and loosely coupled
  3. Asynchronously and tightly coupled
  4. Asynchronously and loosely coupled

Synchronous, tightly coupled effects

This is calling something directly, so that foo directly causes bar by calling it.

def foo()
  puts "0"
  bar()
end

def bar()
  puts "1"
end

This is very simple but tightly coupled code:

  1. foo necessarily needs to know about bar
  2. bar needs to happen directly after foo; foo cannot return until bar completes.

Code coupling is an obvious and well-known problem: in many cases foo should not know about or include any references to the concept of bar. Still, code often becomes tightly coupled like this because foo needs to cause bar, and the easiest way is to hardcode the call.

A more complex problem that occurs in synchronous, tightly coupled side-effecting code is that you end up with deeply connected call chains.

A deeply connected call chain looks like this:

def foo()
  puts "0"
  bar()
end

def bar()
  puts "1"
  baz()
end

def baz()
  puts "2"
end

This adds the complexity of following a deep chain of calls. Remember that in this example bar and baz are not part of foo; they're just other separate things that are supposed to happen after foo. But foo itself cannot stand alone: it cannot return until its whole chain of effects completes.

This becomes especially significant when you consider failures and interruptions. Exceptions in bar and baz can break foo.
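As a minimal sketch of that failure mode, the variant below makes baz raise. The error message and the `:foo_done` return value are illustrative assumptions, not part of the original example.

```ruby
# Hypothetical variant of the chain above in which the last step fails.
def baz
  raise "baz failed"
end

def bar
  baz
end

def foo
  puts "0"
  bar
  :foo_done # never reached: the error from baz unwinds through bar and foo
end

begin
  foo
rescue RuntimeError => e
  puts "foo did not complete: #{e.message}"
end
```

Even though foo's own work (printing "0") succeeded, the caller only sees a failed foo, because the error raised two levels down travels back up the same call stack.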

The fundamental vulnerability in this situation is that foo needs to directly reference bar, which can be both conceptually wrong and systemically a source of complexity and bugs. Synchronous decoupling mechanisms can solve part of this problem.

Synchronous, loosely coupled effects

Suppose you want to have foo happen, and then have bar happen. There are at least two ways to do this in a decoupled manner, i.e. where foo and then bar happen, but without foo knowing about bar. Both boil down to the caller triggering the secondary bar effect through a layer of indirection that isolates foo from bar.

Managers

The first way to get foo then bar, without foo knowing about bar, is to have a higher-level party coordinate their ordering:

def foo()
  puts "0"
end

def bar()
  puts "1"
end

def manager()
  foo()
  bar()
end

This is actually fairly clean, but requires defining a wrapper manager around everything that you want to compose. Furthermore, manager becomes an ill-defined concept because it's just "everything that happens in the sequence". Imagine the test cases for manager: they're just the combination of all the cases from foo and bar, which is often a big unrelated mess of effects.

Also, technically speaking, this is sort of just moving the coupling into manager, which must know about everything (even though foo does not).
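One consequence of moving the coupling into manager is that decisions about what happens when a step fails can live there too. The following is only a sketch under that assumption: the functions return strings instead of printing so the result can be inspected, and the "skip bar on failure" policy is a hypothetical choice.

```ruby
def foo
  "0"
end

# Hypothetical failing step, to show where the failure gets handled.
def bar
  raise "bar failed"
end

# The manager owns both the ordering and the failure policy;
# foo still knows nothing about bar.
def manager
  results = [foo]
  begin
    results << bar
  rescue StandardError => e
    results << "bar skipped: #{e.message}"
  end
  results
end

p manager
```

Here foo completes and its result is kept regardless of what bar does, at the cost of manager knowing about both steps and about how they can fail.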

Observers/callbacks/events

The second major way to get synchronous, decoupled side effects is with a pattern that is called observers, callbacks or events.

class MyCallback
  def call
    puts "1"
  end
end


# style 1: callbacks internal to `foo`
def foo(callbacks)
  puts "0"

  callbacks.each do |callback|
    callback.call()
  end
end

foo([MyCallback.new])


# style 2: callbacks external to `foo`
def foo()
  puts "0"
end

def baz(callbacks)
  foo()
  
  callbacks.each do |callback|
    callback.call()
  end
end

baz([MyCallback.new])

Although these look extremely similar, you can argue that style 1, where foo calls its effects even though it doesn't know what they are, is "dependency inversion": foo knows it has dependencies, but it calls them in a generic manner without being hard-coupled to how exactly they work.

Style 2, on the other hand, could arguably be "inversion of control", where foo does not even know it has effects; both foo and the effects are triggered by the "framework" (baz in this case). In other words, the functions in style 2 do not call each other. They are all passively called by other code that is in control (which actually makes this structurally similar to a "Manager"), and it's this other code taking responsibility for the coordination that allows the callees themselves to not know about each other.

In effect, all of the decoupling mechanisms above are the same: all of the callbacks run without foo knowing anything about what they are or how they work.
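The "events" flavor of the same idea can be sketched with a small registry that maps event names to handlers. `EventBus`, `subscribe`, `publish` and the `:foo_happened` event name are all hypothetical names made up for this illustration, not an existing library's API.

```ruby
# Hypothetical minimal event registry; not a real library API.
class EventBus
  def initialize
    @handlers = Hash.new { |hash, name| hash[name] = [] }
  end

  # Register a block to run whenever `name` is published.
  def subscribe(name, &handler)
    @handlers[name] << handler
  end

  # Run every handler registered for `name`, in registration order,
  # synchronously on the caller's stack.
  def publish(name, payload = nil)
    @handlers[name].each { |handler| handler.call(payload) }
  end
end

# foo announces that it happened, without knowing who (if anyone) listens.
def foo(bus, out)
  out << "0"
  bus.publish(:foo_happened)
end

bus = EventBus.new
out = []
bus.subscribe(:foo_happened) { out << "1" }
foo(bus, out)
p out
```

This is still synchronous: every handler runs before foo returns, so an exception raised in a handler would still propagate into foo unless publish rescues it.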

Synchronous decoupling gets you very far

As we can see, there are some very straightforward and effective ways to sequence effects happening one after another, without each of the effects needing to know about or be defined in terms of each other. Where does this break down, or where would you ever need anything different?

The primary answer is in synchronicity. In all of the synchronous examples above, the chain of effects starts because some caller knows to call right now and set off the effects right now, essentially within the same call stack.

However, not all effects happen in the same known call stack as an existing caller. This is where we can shift to asynchronous code, and then see how that code can be tightly or loosely coupled.

Asynchronous, tightly coupled effects

Asynchronous code is not intrinsically decoupled code. A simple example is a background job, written here with ActiveJob (which can run on a backend such as Sidekiq):

class BarJob < ActiveJob::Base
  def perform
    puts "1"
  end
end

def foo
  BarJob.perform_later
end

Despite BarJob happening at a totally separate time, on potentially a totally separate machine, nevertheless foo had to directly know about it and call it.

Other types of async message-passing code can also have this coupling, e.g. an AJAX POST call, despite being completely asynchronous, still has to know and explicitly depend upon its callee.
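The same shape can be reproduced without Rails, using only a thread and Ruby's built-in `Queue`. This is a rough stand-in and not how ActiveJob or Sidekiq work internally; `BarJob` here is a plain class and `start_worker` is a hypothetical helper.

```ruby
# Stand-in for a job class; a plain Ruby object, not ActiveJob or Sidekiq.
class BarJob
  def perform
    "1"
  end
end

# foo returns immediately, but it still names BarJob explicitly:
# asynchronous, yet tightly coupled.
def foo(jobs)
  jobs << BarJob.new
  "0"
end

# Hypothetical worker: runs jobs on another thread until the queue is closed.
def start_worker(jobs, results)
  Thread.new do
    while (job = jobs.pop) # pop returns nil once the queue is closed and empty
      results << job.perform
    end
  end
end

jobs = Queue.new
results = []
worker = start_worker(jobs, results)
puts foo(jobs)
jobs.close
worker.join
puts results
```

The job runs at a different time and on a different thread, but replacing or removing bar's behavior still means editing foo.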

Asynchronous, loosely coupled effects

Asynchronous code can be decoupled just by keeping a single constraint in mind: bar happens in a completely separate process at a later time, and thus needs to be told or discover that foo happened.

This sounds a little general or mysterious, but in practice the simplest solution is for foo to make a durable record of itself that bar will discover at a later time:

def foo
  puts "0"
  FooRecord.create!()
end

# style 1: "reactive" code receives messages in a real-time
# evented/streaming fashion.  Assume you had some server that
# would receive and then respond to messages POSTed from `foo`.
# `receive_message` is a placeholder for that server's API.
def bar
  receive_message do |foo|
    puts "1"
  end
end


# style 2: "log"-style code handles log entries indefinitely later,
# after they were written.
# Assume you had some async job that would fire up every once in a while
# and read off the log of what had previously been written.
def bar
  # assuming FooRecord is an ActiveRecord model; `find_each` iterates in batches
  FooRecord.find_each do |foo|
    puts "1"
  end
end

Note that you can do either style with an external framework to further decouple bar from how bar is triggered:

def foo
  puts "0"
  FooRecord.create!()
end

def bar
  puts "1"
end

def baz
  # assuming FooRecord is an ActiveRecord model; `find_each` iterates in batches
  FooRecord.find_each do |foo|
    bar()
  end
end

You'll notice that in the async case, it's not that there's no coupling between foo and bar. Rather than foo knowing to cause bar, bar operates on its own and either waits for evidence that foo occurred, or periodically looks up evidence about whether foo has occurred. Across process boundaries you do seem to need these bits of communication/coordination; with no coordination at all, it's impossible for foo to cause bar directly or indirectly.

In a certain sense, asynchronous decoupling looks quite similar to synchronous decoupling, when the decoupling is done through inversion of control in both cases. In both sync decoupled and async decoupled, some other party/framework handles linking foo and bar, without them being hard-coded to cause each other. In the asynchronous case, additional machinery (the durable log entry) is required to reliably coordinate across processes, time and machines--the DB or disk storing the log serves as the coordinator between two otherwise completely separate processes.

It's possible to make logs in a variety of different ways, from a very specific SQL-table-as-log, to a generic log platform, etc., but fundamentally they're all just using storage to durably communicate and coordinate across time and process boundaries.
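As one concrete and deliberately simplistic possibility, the log can be a file with one JSON entry per line, with the consumer remembering how far it has read. `FileLog`, its methods, the entry format and the offset scheme are assumptions made up for this sketch; a real system would also need to persist the offset and handle concurrent writers.

```ruby
require "json"
require "tmpdir"

# Hypothetical append-only log stored as one JSON object per line.
class FileLog
  def initialize(path)
    @path = path
  end

  # Append one entry; called by the producer (foo).
  def append(entry)
    File.open(@path, "a") { |file| file.puts(JSON.generate(entry)) }
  end

  # Return the entries at index `offset` and beyond; called by the consumer.
  def read_from(offset)
    return [] unless File.exist?(@path)
    File.readlines(@path).drop(offset).map { |line| JSON.parse(line) }
  end
end

# foo only records that it happened; it does not know about bar.
def foo(log)
  log.append("event" => "foo_happened")
  "0"
end

# bar processes whatever it has not seen yet and returns the new offset,
# which the caller is assumed to store between runs.
def bar(log, offset)
  entries = log.read_from(offset)
  entries.each { |entry| puts "1 (saw #{entry["event"]})" }
  offset + entries.size
end

Dir.mktmpdir do |dir|
  log = FileLog.new(File.join(dir, "foo.log"))
  foo(log)
  foo(log)
  offset = bar(log, 0)      # handles both entries written so far
  foo(log)
  offset = bar(log, offset) # handles only the new entry
  puts offset
end
```

Because the file outlives any single process, bar could run minutes later in a different process and still discover every foo it has not yet handled.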

Asynchronous vs. synchronous

So if you can decouple both synchronous and asynchronous code, what's the difference between synchronous and asynchronous? For better or worse:

  • Asynchronous

    • More general; it can cover both synchronous and asynchronous use cases
    • Albeit at a higher machinery cost (although this machinery can largely be written once and reused--it just consists of writing a log entry, and iterating through log entries)
    • Perhaps more naturally resilient in a large distributed system since cross-process communication will naturally want to be durable and restart-proof
  • Synchronous

    • Simpler, and by far the default
    • Less machinery
    • Potentially more vulnerable to crashes if it's not constantly storing state and recovering from crashes

Finally, perhaps the most central difference is literally that some code cannot be written synchronously/"proactively"; it must be written asynchronously/reactively, e.g. a server waiting to receive messages, because the reaction effects cannot occur straightforwardly in the same call stack as the caller.

In whatever these required cases may be, one would clearly have to switch over to writing the best async code one could. In other cases the safer default would perhaps be to stick with synchronous code, unless for example one was in a programming environment that was pervasively async and concurrent by default. Go (CSP-style channels) and Erlang (actor-style message passing) are certainly built for first-class idiomatic usage of concurrency as the standard programming paradigm. Ruby of course is not.

The cases above should broadly cover the landscape of common possibilities, so hopefully this makes the decision matrix a little simpler moving forward.
