Some thoughts on building software
Lately I have been busy reading some new books on Domain Driven Design (DDD) and software architecture -- including a short yet eye-opening one in Python and a great one in F#. At the same time, it seems that more people in the Functional Programming world are looking at more formal approaches to modelling -- some examples here. This has brought back some thoughts from the back of my mind about how we should model, organize, and architect software using the lessons we've learnt from functional programming.
Before moving on, let me be clear about this being just a dump of some thoughts, not always well-defined, definitely not always right. My goal is to get feedback from our broad community, spawn discussion, and hopefully end up with a bit more knowledge than before. So feel free to comment below!
Whenever I read about DDD, I feel like it is a great match for functional programming. Ideas like bounded contexts bear a strong resemblance to domain-specific languages, that is, the idea that instead of solving a problem directly, you should create a small language which makes it easier to talk about and specify your solution. Racketeers have even coined the term language-oriented programming!
DDD also teaches us that we should distinguish between entities and value objects: the former have an inherent identity, whereas the latter do not. Value objects are immutable pieces of data, and we should consider any two value objects with the same data as equal. This sounds pretty much like the kind of data types we usually define in FP languages, right? In fact, I find it quite interesting that in OOP languages entities are often easier to define -- reference comparison comes for free, after all -- and one needs enough willpower to use value objects correctly; whereas in languages like Haskell value objects are the default, and entities are harder to implement. I would argue that the latter option is better, since once databases enter the game entities need that special handling anyway.
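To make the distinction concrete, here is a minimal sketch of how it could look in Haskell. All the names (`Money`, `CustomerId`, `Customer`, `sameEntity`) are made up for illustration. The value object gets structural equality by deriving `Eq`, whereas the entity carries an explicit identifier and is compared through it.

```haskell
-- A value object: two values with the same data are equal,
-- so the derived structural equality is exactly what we want.
data Money = Money
  { amount   :: Integer
  , currency :: String
  } deriving (Eq, Show)

-- An entity needs an explicit notion of identity.
-- CustomerId is a hypothetical identifier, e.g. a database key.
newtype CustomerId = CustomerId Int
  deriving (Eq, Ord, Show)

data Customer = Customer
  { customerId   :: CustomerId
  , customerName :: String
  , balance      :: Money
  } deriving Show

-- Two records describe the same entity when their identifiers match,
-- regardless of how the rest of the data has changed over time.
sameEntity :: Customer -> Customer -> Bool
sameEntity a b = customerId a == customerId b
```

Note that `Customer` deliberately has no derived `Eq` instance here: this forces every comparison to say whether it means "same identity" or "same data".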
At this point I always ask myself: OK, where do my awesome sum types enter the game? Since most books assume an OOP-like language, where they are not as directly available as in Haskell or OCaml, we have few examples of how to model using them. However, many value objects lend themselves to such a description; my favorite example is the set of events or commands that may arise in a React-like application, which are modelled as a big data type in Elm. Right now, this is where I think we should stop: modelling entities as sums seems somehow wrong, even though I cannot really express why.
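As a rough sketch of the Elm-style idea, transliterated to Haskell: the messages of a tiny counter application form one sum type, and a single `update` function handles every case. The names `Msg`, `Model`, `update`, and `replay` are illustrative and not taken from any particular library.

```haskell
-- Every event the application can react to is a constructor of one sum type.
data Msg
  = Increment
  | Decrement
  | SetTo Int
  | Reset
  deriving (Eq, Show)

-- The whole application state, kept deliberately small.
newtype Model = Model { count :: Int }
  deriving (Eq, Show)

-- The compiler can warn us if a case is missing (with -Wall),
-- which is a big part of the appeal of modelling events as a sum.
update :: Msg -> Model -> Model
update Increment (Model n) = Model (n + 1)
update Decrement (Model n) = Model (n - 1)
update (SetTo k) _         = Model k
update Reset     _         = Model 0

-- Replaying a list of messages from the initial state.
replay :: [Msg] -> Model
replay = foldl (flip update) (Model 0)
```

Since messages are plain values, they can be logged, stored, and replayed, which is what `replay` does.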
Another important lesson from DDD is that we should think about our integrity boundaries: aggregates force us to define which objects change together, and the unit of work (UoW) pattern brings the idea of transactions to our models. UoW seems to be a contested topic in DDD, but the essence is that we should think hard about the guarantees we should have at each moment, and how to handle different consistency models when our systems become distributed.
Here is where I think that formal modelling could shed some light. The current state of affairs is that we develop models mostly on whiteboards, but never really explore or formalize them. Tools like Alloy are great for documenting those invariants and for finding scenarios we hadn't thought of. You might think "hey, are you proposing to go back to waterfall?" Not at all! The fact that the model is documented means that we can update it whenever our understanding of the domain changes, and get clues about where our actual software needs to be updated. If your system works in a distributed fashion, TLA+ can help you detect possible race conditions, deadlocks, or violations of eventual consistency. These two tools are examples of lightweight formal methods, which do not require a big learning investment.
Up to now I've discussed how DDD and FP have many things in common. Something which I feel is unique to the (typed) FP community is the treatment of effects, that is, the idea that we should not only care about properties of values but also of computations. The sharpest distinction can be found in Haskell, where pure and side-effectful values take completely different types (and even get different syntax!), but even there we often talk about making a more fine-grained hierarchy of effects. How should we translate this idea to our modelling table? Or is this something which does not belong to the model at all?
Modelling and coding
The role of models is to help us better understand the domains we are talking about. The end goal, however, is to produce a (working) software artifact. I firmly believe that you should choose a language which allows you to translate as many of the invariants as possible from your model into proper checks in your code.
For this purpose, our FP community has come up with several powerful techniques:
- I have already mentioned Racket as part of the Lisp tradition of creating linguistic abstractions to develop software.
- Clojure bundles a spec module to specify the structure of data and its invariants.
- Strong static types, as found in Haskell. Those can be taken even further with refinement types.
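As a small illustration of the last point, here is a sketch of turning a model invariant ("a quantity is always positive") into a type. This uses a plain smart constructor, which is a poor man's approximation of a refinement type: the check happens at run time, at the boundary, rather than being proven by the compiler. The names `Quantity`, `mkQuantity`, and `addQuantity` are hypothetical.

```haskell
-- In a real module we would export the type and mkQuantity, but not the
-- Quantity constructor, so that every Quantity has passed the check.
newtype Quantity = Quantity Int
  deriving (Eq, Ord, Show)

-- The only sanctioned way to build a Quantity: validate once, at the boundary.
mkQuantity :: Int -> Either String Quantity
mkQuantity n
  | n <= 0    = Left ("quantity must be positive, got " ++ show n)
  | otherwise = Right (Quantity n)

-- Operations that preserve the invariant need no further checks:
-- the sum of two positive numbers is positive (ignoring Int overflow).
addQuantity :: Quantity -> Quantity -> Quantity
addQuantity (Quantity a) (Quantity b) = Quantity (a + b)
```

A refinement type checker would let us state the same invariant as part of the type and have it verified statically, instead of relying on the discipline of hiding the constructor.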
Even though in some communities we stress the importance of some abstractions like functors and monads, in the grand scheme of things those are one particular way to ensure effect tracking and the integrity of our data. For example, my colleagues at 47 Degrees working on Arrow use a completely different approach towards the same goal.
If you dive further into DDD, you will surely end up reading about hexagonal, onion, and ports and adapters architectures (spoiler: they are all variations on the same theme). On the other side we find the functional core, imperative shell (FCIS) architecture. If you are like me, you'll be very confused.
This is another place where the language and techniques of the typed FP community can help us understand what is going on, by talking about initial and final encodings. In a nutshell, we can represent both data and computation as data types we construct and manipulate (initial encoding):
    data Tree a = Leaf a | Node (Tree a) (Tree a)

    data StatefulComputation s a
      = Get (s -> StatefulComputation s a)
      | Set s (StatefulComputation s a)
      | Return a
or as a set of methods we can call (final encoding):
    class Tree t where
      leaf :: a -> t a
      node :: t a -> t a -> t a

    class Monad m => Stateful s m where
      get :: m s
      set :: s -> m ()
It is quite common to use the initial style when thinking about data, and the final style when thinking about computations.
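To see the two styles side by side, here is a sketch that writes the same tiny computation ("return the counter and increment it") in both. It comes with several assumptions of mine: the interpreter `runInitial`, the hand-rolled `State` newtype (to avoid depending on a library), the `tick` functions, and the functional dependency `m -> s` on the class, which I added only to help type inference.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
{-# LANGUAGE FlexibleInstances, FlexibleContexts #-}

-- Initial encoding: a computation is a data structure...
data StatefulComputation s a
  = Get (s -> StatefulComputation s a)
  | Set s (StatefulComputation s a)
  | Return a

-- ...which only gets a meaning once we write an interpreter for it.
runInitial :: StatefulComputation s a -> s -> (a, s)
runInitial (Get k)    s = runInitial (k s) s
runInitial (Set s' k) _ = runInitial k s'
runInitial (Return a) s = (a, s)

tickInitial :: StatefulComputation Int Int
tickInitial = Get (\n -> Set (n + 1) (Return n))

-- Final encoding: a computation is written against an interface...
class Monad m => Stateful s m | m -> s where
  get :: m s
  set :: s -> m ()

tickFinal :: Stateful Int m => m Int
tickFinal = do
  n <- get
  set (n + 1)
  return n

-- ...and gets a meaning by choosing an instance. Here: a minimal state monad.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= f = State $ \s -> let (a, s') = g s in runState (f a) s'

instance Stateful s (State s) where
  get   = State $ \s -> (s, s)
  set s = State $ \_ -> ((), s)
```

Both `runInitial tickInitial 41` and `runState tickFinal 41` give back the old counter together with the incremented state. The difference is where the flexibility lives: a new interpreter in the first case, a new instance in the second.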
Architecting a whole software system using the initial approach looks pretty much like the FCIS architecture. Stealing an example from Architecture Patterns with Python, if we need to develop a system which performs some operations over our file system, we can divide the task into the logic which decides what to do and the part which actually performs the operations. To bridge those parts we have to reify the operations as data:
    data FSOperation
      = Move Path Path
      | Copy Path Path
      | NewFolder Path
This data type is an initial encoding in disguise. The second part which performs the operations can be thought of as an interpreter for that data type.
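Here is a sketch of how the two halves could fit together. The decision rule in `planBackup` (copy every source file which is missing at the destination) is invented for the example, `Path` is assumed to be a synonym for `FilePath`, and `simulate` is a fake interpreter which models the file system as the list of paths that exist.

```haskell
type Path = FilePath

data FSOperation
  = Move Path Path
  | Copy Path Path
  | NewFolder Path
  deriving (Eq, Show)

-- Functional core: decides WHAT to do, without touching the disk.
-- Given the file names present on each side, copy the ones missing in dst.
planBackup :: Path -> Path -> [Path] -> [Path] -> [FSOperation]
planBackup src dst srcFiles dstFiles =
  [ Copy (src ++ "/" ++ f) (dst ++ "/" ++ f)
  | f <- srcFiles
  , f `notElem` dstFiles
  ]

-- One possible interpreter: a pure simulation, handy for tests.
-- The "file system" is simply the list of existing paths.
simulate :: [Path] -> FSOperation -> [Path]
simulate fs (Move from to) = to : filter (/= from) fs
simulate fs (Copy _ to)    = to : fs
simulate fs (NewFolder p)  = p : fs

-- Another interpreter: a human-readable description, e.g. for a dry run.
describe :: FSOperation -> String
describe (Move from to) = "move " ++ from ++ " to " ++ to
describe (Copy from to) = "copy " ++ from ++ " to " ++ to
describe (NewFolder p)  = "create folder " ++ p
```

The imperative shell would be a third interpreter of type `FSOperation -> IO ()`, for instance one built on `renameFile`, `copyFile`, and `createDirectory` from `System.Directory`. The core never needs to know which one is in use.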
Final encoding is related to dependency injection (DI). Passing a bunch of functions as arguments could be considered as a very primitive form of DI; the type class mechanism alleviates the need for manual handling and makes the functionality available wherever it's required.
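A sketch of this DI-by-type-class idea, again with made-up names (`MonadFS`, `copy`, `newFolder`, `backup`). The logic in `backup` only knows the interface. One instance plugs in `IO` (here it merely prints, to keep the example harmless), and another one records the calls in a log, using the fact that pairs with a monoid in the first component already form a monad in `base`.

```haskell
{-# LANGUAGE FlexibleInstances #-}

type Path = FilePath

-- The "port": the operations our logic depends on.
class Monad m => MonadFS m where
  copy      :: Path -> Path -> m ()
  newFolder :: Path -> m ()

-- Logic written only against the interface; the instance is "injected"
-- by whoever chooses the concrete monad m.
backup :: MonadFS m => Path -> [Path] -> m Int
backup dst files = do
  newFolder dst
  mapM_ (\f -> copy f (dst ++ "/" ++ f)) files
  return (length files)

-- Fake instance for tests: each call is appended to a log of strings.
-- ((,) [String]) is a Monad because [String] is a Monoid.
instance MonadFS ((,) [String]) where
  copy from to = (["copy " ++ from ++ " " ++ to], ())
  newFolder p  = (["mkdir " ++ p], ())

-- Stand-in for a production instance: prints instead of touching the disk.
instance MonadFS IO where
  copy from to = putStrLn ("copy " ++ from ++ " -> " ++ to)
  newFolder p  = putStrLn ("mkdir " ++ p)
```

Choosing the fake is just a matter of a type annotation: `backup "bak" ["a", "b"] :: ([String], Int)` gives back the recorded calls together with the result, with no manual plumbing of handlers.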
Correctly used, both techniques lead to good modularity -- swapping means writing another interpreter or another instance -- which in turn leads to good testability -- you can easily create fake handlers which check that the behavior is correct. Unfortunately, apart from the technical details, I am not aware of guidelines on when to use one approach over the other.
We need to talk about how to build software using "native FP" approaches -- most books and models seem to follow the older OOP tradition. Let's do it!