sviperll/anti-if-case.markdown

## anti-if-case.markdown

      
On layered architecture and naturalness of internal data-types

After reading the Make Everything The Same
post by Sandi Metz, I started thinking about one of my own experiences with software design.
While evolving one system, I faced the challenge of extending its billing subsystem.
I actually spent a couple of days thinking about the problem, and in hindsight the process seems very similar
to the one described by Sandi.
At first everything about the problem seemed clear, but I hated the conditionals that sprang to life.
This rise of conditionals seemed both inevitable and somehow artificially inflicted.
I felt that I was missing something.
I'd like to stress that my problem is totally different from outputting Roman numerals.
And, as mentioned in
one of the reddit comments,
Roman numerals are not a really interesting problem in themselves.
But I still see striking similarities between our thinking processes.
I planned to write about it a long time ago, but was never comfortable with the moral of the story.
I felt that there should be some general principle or lesson that could be inferred,
but I was never satisfied with the explanations I was able to come up with.
With Sandi's example as context, I'd like to describe my encounter with this design idea
and reflect on possible interpretations of it and the general rules that can be inferred.
My Case

Here is my problem. We had a customer-facing system with billing.
The system ran in pre-paid mode.
Every customer purchased a package of internal expendable units.
The price of a unit was determined at the time of purchase and actually depended on the package size.
After the purchase, the customer spends internal units in the application to get the required functionality.
Each act of spending some quantity of internal units was written to the database as a single expense record.
The corresponding table was defined like this:
    CREATE TABLE expenses (
        id INTEGER,
        user_id INTEGER,
        ...
        n_units INTEGER,
        price_in_cents INTEGER,
        ...
        created_at DATETIME
        ...
    )
price_in_cents in the above declaration is the known monetary price of a single unit.
With such records we can visualize a user's expenses per calendar period and create the corresponding financial documents.
Then one of our big customers demanded a post-paid mode of operation.
They can spend a reasonably limited amount of internal units during the month and are billed at the end of the month.
The takeaway from these requirements is that the price of an internal unit is not known at the time of spending and
is determined after the fact, at the end of the month.
Now the question is how to modify the table above to store such expenses.
At first I thought about adding a flag to mark expenses as having either a known or an unknown price, like this:
    CREATE TABLE expenses (
        id INTEGER,
        user_id INTEGER,
        ...
        n_units INTEGER,
        price_in_cents INTEGER,
        ...
        created_at DATETIME,
        ...
        is_price_known INTEGER
    )
Another option is to store some special price, like -1, to mark records with an unknown price.
But either addition introduces conditionals into the code.
A special price value is especially treacherous, since it is easy to miss places where the special value is not handled properly.
But the solution that I came up with is different.
Instead of introducing some predicate on the expense record, which results in conditionals springing up in the code,
I added another quantity column. Here is the table that I finally used:
    CREATE TABLE expenses (
        id INTEGER,
        user_id INTEGER,
        ...
        known_price_n_units INTEGER,
        price_in_cents INTEGER,
        ...
        created_at DATETIME,
        ...
        unknown_price_n_units INTEGER
    )
The cool thing about this solution is that the processing of expense records is extended, but not
split between two modes of operation.
Before adding the post-payment feature, you could calculate per-month expenses like this:
    SELECT
        user_id,
        SUM(n_units * price_in_cents) AS sum,
        SUM(n_units) AS n_units,
        AVG(price_in_cents) AS price_in_cents
    FROM expenses
    WHERE user_id = <user_id>
    GROUP BY YEAR(created_at), MONTH(created_at)
After the change, the query becomes:
    SELECT
        user_id,
        SUM(known_price_n_units * price_in_cents + unknown_price_n_units * <predicted_price>) AS sum,
        SUM(known_price_n_units) AS known_price_n_units,
        AVG(price_in_cents) AS price_in_cents,
        SUM(unknown_price_n_units) AS unknown_price_n_units
    FROM expenses
    WHERE user_id = <user_id>
    GROUP BY YEAR(created_at), MONTH(created_at)
With a flag, you would need to write CASE expressions to extract the relevant information from each record.
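To make the difference concrete, here is a minimal sketch that runs both variants side by side. It is an illustration, not the production code: it uses Python's built-in sqlite3 module as a stand-in database (so `strftime` replaces `YEAR`/`MONTH`), the table names `expenses_flag` and `expenses_split` are hypothetical, and the sample rows and the predicted price of 5 cents are made up.

```python
import sqlite3

# Hypothetical estimate, in cents, for units whose price is not yet known.
PREDICTED_PRICE = 5

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flag-based variant: one quantity column plus a predicate.
    CREATE TABLE expenses_flag (
        user_id INTEGER,
        n_units INTEGER,
        price_in_cents INTEGER,
        is_price_known INTEGER,
        created_at TEXT
    );
    -- Two-quantity variant: no predicate, the unused quantity is simply 0.
    CREATE TABLE expenses_split (
        user_id INTEGER,
        known_price_n_units INTEGER,
        price_in_cents INTEGER,
        unknown_price_n_units INTEGER,
        created_at TEXT
    );
    -- Made-up sample data: one pre-paid and one post-paid expense.
    INSERT INTO expenses_flag VALUES
        (1, 10, 7, 1, '2015-03-02'),
        (1,  4, 0, 0, '2015-03-15');
    INSERT INTO expenses_split VALUES
        (1, 10, 7, 0, '2015-03-02'),
        (1,  0, 0, 4, '2015-03-15');
""")
params = {"predicted": PREDICTED_PRICE, "user_id": 1}

# With a flag, every aggregate has to branch on the kind of record.
flag_total = conn.execute("""
    SELECT SUM(CASE WHEN is_price_known = 1
                    THEN n_units * price_in_cents
                    ELSE n_units * :predicted END)
    FROM expenses_flag
    WHERE user_id = :user_id
    GROUP BY strftime('%Y-%m', created_at)
""", params).fetchone()[0]

# With two quantity columns, the same total is plain arithmetic.
split_total = conn.execute("""
    SELECT SUM(known_price_n_units * price_in_cents
               + unknown_price_n_units * :predicted)
    FROM expenses_split
    WHERE user_id = :user_id
    GROUP BY strftime('%Y-%m', created_at)
""", params).fetchone()[0]
```

Both queries give the same monthly total for the sample rows, but only the flag-based one needs a CASE expression, and it would need another one for every further aggregate.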
Analysis

There is another recent blog post
about avoiding conditional statements and the problems that arise with them.
And there is a lot of wisdom in being suspicious of conditionals.
But I'd like to point out another aspect of these design problems.
I think this aspect is the reason why such solutions seem surprising or unnatural
and the reason why they give birth to blog posts.
The solution I adopted has something unnatural about it.
The unnatural thing is that every record will have only one of the two n_units attributes set to a nonzero value.
An expense is either pre-paid or post-paid, but never mixed.
However, the data-type as defined allows mixed (both pre-paid and post-paid) records, and
this "feature" is never actually used. And this seems unnatural.
The same thing can be said about Sandi's Roman numerals example.
Sandi's code copes with numerals like VIIII, which is a "wrong" Roman numeral, and that's why this solution seems
unnatural and somehow hard to arrive at.
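The role of VIIII can be shown with a small sketch. This is not Sandi's code, only an illustration in the same spirit: the function names `to_additive` and `to_roman` and the two tables are my own assumptions. The lower layer produces purely additive numerals without any special cases, and the upper layer restricts them to the conventional form by rewriting.

```python
# Additive symbols only; "VIIII" is a perfectly valid value in this form.
ADDITIVE = [(1000, "M"), (500, "D"), (100, "C"), (50, "L"),
            (10, "X"), (5, "V"), (1, "I")]

# Rewrites that restrict the additive form to conventional numerals.
# Order matters: a longer pattern must come before the shorter one it contains.
SUBTRACTIVE = [
    ("DCCCC", "CM"), ("CCCC", "CD"),
    ("LXXXX", "XC"), ("XXXX", "XL"),
    ("VIIII", "IX"), ("IIII", "IV"),
]


def to_additive(n):
    """Lower layer: additive numerals, no conditionals on special values."""
    parts = []
    for value, symbol in ADDITIVE:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def to_roman(n):
    """Upper layer: rewrite the additive form into the conventional one."""
    numeral = to_additive(n)
    for additive_form, subtractive_form in SUBTRACTIVE:
        numeral = numeral.replace(additive_form, subtractive_form)
    return numeral
```

Here `to_additive(9)` yields the "wrong" numeral VIIII, and `to_roman(9)` turns it into IX. The additive form is never shown to anyone, yet it is what keeps both functions free of branching.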
I think the first time I encountered such a design was in programming language compilers.
Most compilers employ some kind of intermediate language.
The intermediate language is usually used during the optimization phase.
A number of transformations are performed on the intermediate language before
the actual machine code is generated.
And the thing about an intermediate language is that it can express programs that are inexpressible in
the surface language. This seems similarly unnatural at first.
But it's turtles all the way down.

But if we stick with the programming language example, we can see that
every layer allows programs that are inexpressible in the layer above.

Some untyped programs are not allowed by the type-checker.
Some programs in the intermediate language are not expressible in the surface language.
You cannot generate all possible machine-code programs from the intermediate language.

What makes the intermediate language special here is that the surface language and the machine-code language
are both rigorously specified, while the intermediate language is mostly an arbitrary construction
suited to the compiler's internal needs, and this is the only difference.
"Wrong" Roman numerals like VIIII can be considered an intermediate language suited to Roman numeral code.
Mixed pre-paid/post-paid expense records can likewise be considered an intermediate language for expense-processing code.
And here, I think, lies an actual pattern.
Here is my take on a rule of layered architecture.

Every layer should be more powerful than the layer above it demands

or, inversely,

Every layer should restrict the functionality provided by the layer below it

Above is a list of examples of this rule at work. But why should it be like this?
The point of a layered architecture is to simplify implementation.
And the way this is done is by relaxing some requirements and/or assumptions to make a layer's implementation simpler.
If we relax some requirements, we arrive at an implementation of more general rules than required.
To implement the actual requirements, we restrict the functionality provided by the layer below so that it follows the missing rules.
We can iterate this process to get the right number of layers and keep each layer's implementation
reasonably simple.
If we do not restrict/generalize from layer to layer, then we can't simplify the lower-layer implementations.
In that case the layer above is just a facade for the lower layer that does nothing of its own, and
all the complexity creeps into the lower layer.
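The restrict/generalize relationship between two layers can be sketched with the billing example. This is a hedged illustration rather than the actual system: the class `Expense` and the functions `total_cents`, `prepaid` and `postpaid` are hypothetical names introduced here.

```python
from dataclasses import dataclass


# Lower layer: more general than required. Any mix of quantities is valid,
# so the arithmetic needs no branching on the kind of expense.
@dataclass(frozen=True)
class Expense:
    known_price_n_units: int
    price_in_cents: int
    unknown_price_n_units: int


def total_cents(expenses, predicted_price_in_cents):
    """Sum known-price and predicted-price amounts over all records."""
    return sum(
        e.known_price_n_units * e.price_in_cents
        + e.unknown_price_n_units * predicted_price_in_cents
        for e in expenses
    )


# Upper layer: restricts the lower one. If the rest of the application only
# builds records through these two functions, mixed records never appear.
def prepaid(n_units, price_in_cents):
    return Expense(n_units, price_in_cents, 0)


def postpaid(n_units):
    return Expense(0, 0, n_units)


# Made-up data: one pre-paid and one post-paid expense in the same month.
month = [prepaid(10, 7), postpaid(4)]
month_total = total_cents(month, predicted_price_in_cents=5)
```

The missing rule ("never mixed") lives entirely in the two small constructors, while `total_cents` stays simple because it implements the more general rule.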
Here is a good blog post about multiple levels of abstraction
that presents some of the above ideas.
Throughout my career I've encountered many unnatural-looking manifestations of this rule.
I've implemented a social network whose engine can express social relationships that are not available through the UI presented to users.
And this again made the engine's implementation simpler.
Another way to look at this process is through mathematics.
What mathematics does is look at some messy real-world "implementation" and
try to identify rules that can be relaxed or omitted to obtain a simple formal system (the lower layer).
The next step for mathematics is to take a known formal system, impose a set of rules different from those found in the real world,
and see what can be achieved.
Having the same principle underlie your layered architecture allows you to do the same.
You can easily iterate on the layer above to prototype ideas.
Making the social network engine more powerful than its GUI allows easy iteration on the GUI.
What should the original two examples teach us here?
I think we should get comfortable with custom internal data-types and
make internal data-types suit our implementation.
Internal data-types and intermediate languages serve as the interface between layers, and as such they
should follow the same restrict/generalize dynamics.
To paraphrase someone:

There is nothing natural about natural numbers. It's all construction.