Any system that's not a pure calculation has effects: one thing causes another thing to happen, in causal temporal sequence.
There are four broad categories of how to do this:
- Synchronously and tightly coupled
- Synchronously and loosely coupled
- Asynchronously and tightly coupled
- Asynchronously and loosely coupled
The first category, synchronous and tightly coupled, is calling something directly, so that `foo` directly causes `bar` by calling it.
def foo()
  puts "0"
  bar()
end

def bar()
  puts "1"
end
This is very simple but tightly coupled code:
- `foo` necessarily needs to know about `bar`
- `bar` needs to happen directly after `foo`; `foo` cannot return until `bar` completes.
Code coupling is an obvious and well-known problem: in many cases `foo` should not know about, or include any references to, the concept of `bar`. But code still often becomes tightly coupled like this, because `foo` needs to cause `bar`, and the easiest way is to hardcode it in there.
A more complex problem that occurs in synchronous, tightly coupled side-effecting code is that you end up with deeply connected call chains.
A deeply connected call chain looks like this:
def foo()
  puts "0"
  bar()
end

def bar()
  puts "1"
  baz()
end

def baz()
  puts "2"
end
This adds the complexity of jumping through many layers of code. Remember that in this example `bar` and `baz` are not part of `foo`; they're just other separate things that are supposed to happen after `foo`. But `foo` itself cannot stand alone and return until its whole chain of effects completes.
This becomes especially significant when you consider failures and interruptions. Exceptions in `bar` and `baz` can break `foo`.
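To make the failure mode concrete, here is a minimal sketch, assuming `baz` fails with a hypothetical `RuntimeError`. The `log` array parameter is an addition for illustration (it is not part of the example above) so that the order of events can be inspected. The error surfaces inside `foo`, and anything `foo` or `bar` intended to do after the nested call never runs.

```ruby
# Sketch: an error deep in the chain propagates up through every caller.
# The `log` parameter is a hypothetical addition used to record events.
def baz(log)
  log << "2 (about to fail)"
  raise "baz failed" # hypothetical failure; raises a RuntimeError
end

def bar(log)
  log << "1"
  baz(log)
  log << "bar finished" # never reached
end

def foo(log)
  log << "0"
  bar(log)
  log << "foo finished" # never reached
end

def run_chain
  log = []
  begin
    foo(log)
  rescue RuntimeError => e
    log << "foo aborted: #{e.message}"
  end
  log
end

puts run_chain
```

Note that the caller of `foo` has to handle a failure that originated two levels down, in code that `foo` conceptually has nothing to do with.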
The fundamental vulnerability in this situation is that `foo` needs to directly reference `bar`, which can be both conceptually wrong and systemically a source of complexity and bugs. Synchronous decoupling mechanisms can solve part of this problem.
Suppose you want to have `foo` happen, and then have `bar` happen. There are at least two ways to do this in a decoupled manner, i.e. where `foo` then `bar` happens, but without `foo` knowing about it.
They both boil down to the caller triggering the secondary `bar` effect, but through a layer of indirection that isolates `foo` from `bar`.
The first way to get `foo` then `bar`, without `foo` knowing about `bar`, is to have a higher-level party coordinate their ordering:
def foo()
  puts "0"
end

def bar()
  puts "1"
end

def manager()
  foo()
  bar()
end
This is actually fairly clean, but requires defining a wrapper `manager` around everything that you want to compose. Furthermore, `manager` becomes an ill-defined concept because it's just "everything that happens in the sequence"--imagine the test cases for `manager`: they're just the combination of all the cases from `foo` and `bar`, which is often a big unrelated mess of effects.
Also, technically speaking, this is sort of just moving the coupling into `manager`, which must know about everything (even though `foo` does not).
The second major way to get synchronous, decoupled side effects is with a pattern variously called observers, callbacks or effects:
class MyCallback
  def call
    puts "1"
  end
end

# style 1: callbacks internal to `foo`
def foo(callbacks)
  puts "0"
  callbacks.each do |callback|
    callback.call()
  end
end

foo([MyCallback.new])

# style 2: callbacks external to `foo`
def foo()
  puts "0"
end

def baz(callbacks)
  foo()
  callbacks.each do |callback|
    callback.call()
  end
end

baz([MyCallback.new])
Although these look extremely similar, you can argue that style 1, where `foo` calls its effects even though it doesn't know what they are, is "dependency inversion": `foo` knows it has dependencies, but it calls them in a generic manner without being hard-coupled to how exactly they work.
Style 2, on the other hand, could arguably be "inversion of control," where `foo` does not even know it has effects; both `foo` and the effects are triggered by the "framework" (`baz` in this case).
In other words, the functions in style 2 do not call each other;
they are all passively called by other code that is in control
(which actually makes this structurally similar to a "Manager"),
and it's this other code taking responsibility for the coordination
that allows the callees themselves to not know about each other.
In effect, all of the decoupling mechanisms above are the same: all of the callbacks run without `foo` knowing anything about what they are or how they work.
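Because the callbacks only need to respond to `call`, Ruby lambdas work just as well as dedicated classes. The sketch below is a variation on style 1. The `events` array and the `run_callbacks_demo` helper are hypothetical additions, used only to make the ordering observable.

```ruby
# Variation on style 1: `foo` runs whatever callables it is handed.
# `events` is a hypothetical array that records what happened, in order.
class MyCallback
  def initialize(events)
    @events = events
  end

  def call
    @events << "1 (object)"
  end
end

def foo(events, callbacks)
  events << "0"
  # `foo` relies only on each callback responding to `call`.
  callbacks.each(&:call)
  events
end

def run_callbacks_demo
  events = []
  callbacks = [
    MyCallback.new(events),       # an object with a `call` method
    -> { events << "1 (lambda)" } # a lambda also responds to `call`
  ]
  foo(events, callbacks)
end

puts run_callbacks_demo
```

Since the calls are still synchronous, a callback that raises would still interrupt `foo` here; the decoupling is about what `foo` knows, not about isolating failures.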
As we can see, there are some very straightforward and effective ways to sequence effects happening one after another, without each of the effects needing to know about or be defined in terms of each other. Where does this break down, or where would you ever need anything different?
The primary answer is in synchronicity. In all of the synchronous examples above, the chain of effects starts because some caller knows to call right now and set off the effects right now, essentially within the same call stack.
However, not all effects happen in the same known call stack as an existing caller. This is where we can shift to asynchronous code, and then see how that code can be tightly or loosely coupled.
Asynchronous code is not intrinsically decoupled code. A simple example is background jobs, e.g. Sidekiq jobs enqueued through ActiveJob:
class BarJob < ActiveJob::Base
  def perform
    puts "1"
  end
end

def foo
  BarJob.perform_later
end
Despite `BarJob` happening at a totally separate time, on potentially a totally separate machine, `foo` nevertheless had to directly know about it and call it.
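The same coupling can be reproduced without any job framework. The following is a minimal sketch that uses a plain Ruby `Thread` as a stand-in for the job runner (an assumption for illustration; a real runner such as Sidekiq would instead serialize the job and execute it in a separate worker process). The `log` queue and `run_async_coupled` helper are hypothetical additions for inspecting the result.

```ruby
# Sketch: asynchronous but still tightly coupled.
# A Thread stands in for a background job runner (illustration only).
def bar(log)
  log << "1"
end

def foo(log)
  log << "0"
  # `foo` does not wait for `bar`, but it still references `bar` by name.
  Thread.new { bar(log) }
end

def run_async_coupled
  log = Queue.new # thread-safe container used to record events
  worker = foo(log)
  worker.join # wait only so the demo can inspect the result
  Array.new(log.size) { log.pop }
end

puts run_async_coupled
```

Renaming or removing `bar` would still break `foo`, even though the two never share a call stack.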
Other types of async message-passing code can also have this coupling: an AJAX POST call, for example, despite being completely asynchronous, still has to know and explicitly depend upon its callee.
Asynchronous code can be decoupled just by keeping a single constraint in mind: `bar` happens in a completely separate process at a later time, and thus needs to be told or discover that `foo` happened.
This sounds a little general or mysterious, but it has a simple solution in practice: the simplest approach is for `foo` to make a durable record of itself that `bar` will discover at a later time:
def foo
  puts "0"
  FooRecord.create!()
end

# style 1: "reactive"-code receives messages in a real-time
# evented/streaming fashion. Assume you had some server that
# would receive and then respond to messages POSTed from `foo`
# (`receive_message` is a hypothetical helper provided by that server).
def bar
  receive_message do |foo|
    puts "1"
  end
end

# style 2: "log"-style code handles log entries indefinitely later
# after they were written
# Assume you had some async job that would fire up every once in a while
# and read off the log of what had previously been written.
def bar
  FooRecord.each do |foo|
    puts "1"
  end
end
Note that you can do either style with an external framework to further decouple `bar` from how `bar` is triggered:
def foo
  puts "0"
  FooRecord.create!()
end

def bar
  puts "1"
end

def baz
  FooRecord.each do |foo|
    bar()
  end
end
You'll notice that in the async case, it's not that there's no coupling between `foo` and `bar`; rather, instead of `foo` knowing to cause `bar`, `bar` operates on its own and either waits for evidence that `foo` occurred, or periodically looks up evidence about whether `foo` has occurred. Across process boundaries, you do seem to need these bits of communication/coordination; with no coordination at all, it's impossible for `foo` to cause `bar`, directly or indirectly.
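The "waits for evidence" variant can be sketched inside a single process. This is a minimal illustration only: a `Thread` and an in-memory `Queue` stand in for a separate process and a message channel, so nothing here is durable. The `:done` sentinel, the `handled` array, and the `run_reactive_demo` helper are hypothetical additions so the demo can terminate and be inspected.

```ruby
# Sketch of the "reactive" style: `bar` blocks until evidence of `foo` arrives.
# Thread + Queue stand in for a separate process and a message channel
# (assumptions for illustration only; an in-memory queue is not durable).
def foo(channel)
  # `foo` only records that it happened; it never mentions `bar`.
  channel << :foo_happened
end

def bar(channel, handled)
  # Block on the channel until a message arrives; stop on the sentinel.
  while (message = channel.pop) != :done
    handled << "1 (reacting to #{message})"
  end
end

def run_reactive_demo
  channel = Queue.new
  handled = []
  listener = Thread.new { bar(channel, handled) }

  2.times { foo(channel) }
  channel << :done # hypothetical sentinel so the demo can exit
  listener.join
  handled
end

puts run_reactive_demo
```

The only thing `foo` and `bar` share is the channel, which plays the role of the coordination point described above.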
In a certain sense, asynchronous decoupling looks quite similar to synchronous decoupling, when the decoupling is done through inversion of control in both cases. In both sync decoupled and async decoupled, some other party/framework handles linking `foo` and `bar`, without them being hard-coded to cause each other. In the asynchronous case, additional machinery (the durable log entry) is required to reliably coordinate across processes, time and machines--the DB or disk storing the log serves as the coordinator between two otherwise completely separate processes.
It's possible to make logs in a variety of different ways, from a very specific SQL-table-as-log, to a generic log platform, etc., but fundamentally they're all just using storage to durably communicate and coordinate across time and process boundaries.
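As a minimal sketch of the storage-as-coordinator idea, the following uses a JSON-lines file on disk as the log. The file layout, the `offset` cursor, and the function signatures are all assumptions for illustration, not a prescribed design; a database table would play the same role.

```ruby
require "json"
require "tempfile"

# Sketch: a JSON-lines file as the durable log between `foo` and `bar`.
# The file format and the `offset` cursor are assumptions for illustration.
def foo(log_path)
  # Append a durable record that `foo` happened; no reference to `bar`.
  File.open(log_path, "a") { |f| f.puts(JSON.generate("event" => "foo")) }
end

# Handle every entry written since `offset` (a count of lines already
# processed) and return the results along with the new offset.
def bar(log_path, offset)
  entries = File.readlines(log_path, chomp: true).drop(offset)
  results = entries.map { |line| "1 (saw #{JSON.parse(line)["event"]})" }
  [results, offset + entries.size]
end

def run_log_demo
  log = Tempfile.new("foo_log")
  foo(log.path)
  foo(log.path)
  first_pass, offset = bar(log.path, 0)       # picks up both entries
  foo(log.path)
  second_pass, offset = bar(log.path, offset) # picks up only the new one
  [first_pass, second_pass, offset]
ensure
  log&.close! # remove the temporary log file
end

p run_log_demo
```

In a real system the offset would itself need to be stored durably, so that `bar` can resume from the right place after a restart.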
So if you can decouple both synchronous and asynchronous code, what's the difference between synchronous and asynchronous? For better or worse:
- Asynchronous
  - More general; it can cover both synchronous and asynchronous use cases
  - Albeit at a higher machinery cost (although this machinery can largely be written once and reused--it just consists of writing a log entry, and iterating through log entries)
  - Perhaps more naturally resilient in a large distributed system, since cross-process communication will naturally want to be durable and restart-proof
- Synchronous
  - Simpler, and the default by far
  - Less machinery
  - Potentially more vulnerable to crashes if it's not constantly storing state and recovering from crashes
Finally, perhaps the most central difference is literally that some code cannot be written synchronously/"proactively"; it must be written asynchronously/reactively (e.g. a server waiting to receive messages), because the reaction effects cannot occur straightforwardly in the same call stack as the caller.
In whatever these required cases may be, one would clearly have to switch over to writing the best async code one could. In other cases, the safer default would perhaps be to stick with synchronous code, unless for example one was in a programming environment that was pervasively async and concurrent by default--Go (CSP-style goroutines and channels) and Erlang (actor-style message passing between processes) are certainly built for first-class idiomatic concurrency as the standard programming paradigm. Ruby of course is not.
The cases above should broadly cover the landscape of common possibilities, so hopefully this makes the decision matrix a little simpler moving forward.