@olivergeorge
Last active March 23, 2022 15:32

This is a bit of a thought exercise. I doubt it’s perfect, and I’m hoping for opinions and corrections with the goal of a well-reasoned, practical approach.

Motivation...

One way to look at type declarations in a static language is as a test which picks up potentially incompatible code paths, e.g. data being passed that is incompatible with the code receiving it.

In static languages the effort to write the test is reduced by virtue of being declared inline with the code, and inference allows a few annotations to permeate. Having said that, we can achieve similar results in Clojure.

Quick scan of tools at hand...

  • compiler analysis
  • pre/post function assertions
  • code assertions
  • generative testing
  • spec assertions
  • function specs
  • function instrumentation
  • spec based generative testing

The compilation process will emit warnings in some cases (for example, the ClojureScript compiler warns about wrong arities and undeclared vars).

Coding in pre/post conditions and asserts has always been an option. It doesn’t help with writing tests to exercise the function, but it does pick up cases where the code is exposed to something it isn’t intended to handle. Assert errors aren’t very informative in themselves, but they do show where it hurts.
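For example, a minimal sketch of pre/post conditions on a hypothetical `transfer` function:

```clojure
(defn transfer
  "Deduct amount from balance."
  [balance amount]
  {:pre  [(number? balance) (pos? amount) (<= amount balance)]
   :post [(not (neg? %))]}   ; % is the return value
  (- balance amount))

;; (transfer 100 150)
;; => AssertionError Assert failed: (<= amount balance)
;; Terse, but it shows where it hurts.
```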

The test.check library has been around for a while. It provides a way to generate random data for use in testing, and where a test fails it attempts to find the simplest input which caused the error (shrinking). There’s effort required to write tests which cover a full range of interesting inputs, and it can be fiddly to ensure good code coverage. Testing is computationally intensive due to the work required to generate data and the number of times the code is executed.
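A minimal sketch of a test.check property (the property and generator here are just illustrative):

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; Property: sorting is idempotent for any vector of ints.
(def sort-idempotent
  (prop/for-all [v (gen/vector gen/int)]
    (= (sort v) (sort (sort v)))))

;; Runs 100 trials with random data; on failure test.check
;; shrinks towards the smallest failing input.
(tc/quick-check 100 sort-idempotent)
;; => {:result true, :num-tests 100, :seed ...}
```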

Clojure spec builds on these ideas. It provides a way to describe data, which can then be used to assert that data is valid, like pre/post assertions but with more informative errors (via instrumentation or s/assert). It can generate data useful for testing; in fact it builds on test.check. It provides clojure.spec.test/check to exercise a function with generated arguments as part of testing. It can also replace functions, allowing you to stub out side-effecting code to isolate the code under test. All of this is implemented with reuse in mind: once we describe our data and functions with specs, we have a range of tools available to us.
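A minimal sketch of the data side (the ::account spec is a made-up example):

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as sgen])

(s/def ::account-id pos-int?)
(s/def ::balance nat-int?)
(s/def ::account (s/keys :req [::account-id ::balance]))

(s/valid? ::account {::account-id 1 ::balance 100})  ;=> true
(s/assert ::account {::account-id 1 ::balance -5})   ; throws with explain data
                                                     ; (when check-asserts is on)
(sgen/generate (s/gen ::account))                    ; random conforming value
```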

So with these tools how should we write our “type” tests?

Goals...

  • Ensure data passed between functions is compatible
  • Ensure functions return expected valid data

First implementation attempt...

  • Write specs describing the inputs and outputs of our functions
  • Use clojure.spec.test/check to look for bugs

Without instrumentation we don’t get checking and errors when passing bad data. Tests will fail only if the generated data causes an exception or an invalid return value is produced.
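For example, a sketch using a hypothetical `debit` function:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

(defn debit [balance amount]
  (- balance amount))

;; :args constrains generated inputs; :ret is verified on every call
;; that clojure.spec.test/check makes.
(s/fdef debit
  :args (s/and (s/cat :balance nat-int? :amount nat-int?)
               (fn [{:keys [balance amount]}] (<= amount balance)))
  :ret  nat-int?)

(stest/check `debit)
```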

Second attempt...

  • Write specs describing the inputs and outputs of our functions
  • Turn on instrumentation
  • Use clojure.spec.test/check to look for bugs

Now our calls are checked and reported. We still have challenges getting coverage of all code paths, and potentially a lot of code is executed aside from the function being tested. Any side-effecting code complicates setting up tests and getting repeatable errors.
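Continuing the sketch above, turning instrumentation on:

```clojure
;; Instrumenting wraps debit so every call validates its :args spec,
;; producing an informative spec error at the call site. Note that
;; instrument checks :args only; :ret is exercised by stest/check.
(stest/instrument `debit)

(debit 100 150)
;; => ExceptionInfo: Call to user/debit did not conform to spec.
```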

Third attempt...

  • Write specs describing the inputs and outputs of our functions
  • Turn on instrumentation and stub all side-effecting functions
  • Use clojure.spec.test/check to look for bugs

Now data passed to other functions is checked, but side-effecting code is not executed. Instead a random return value conforming to the function’s :ret spec is generated in place of those calls. This avoids the complications associated with side-effecting code.
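A sketch of stubbing, reusing the ::account specs from earlier and assuming a hypothetical side-effecting `fetch-account!`:

```clojure
(defn fetch-account! [account-id]
  ;; imagine a real database or HTTP call here
  {::account-id account-id ::balance 100})

(s/fdef fetch-account!
  :args (s/cat :account-id ::account-id)
  :ret  ::account)

;; :stub swaps the real implementation for one that returns data
;; generated from the :ret spec, so no side effects run during checking.
(stest/instrument `fetch-account! {:stub #{`fetch-account!}})

;; ...then stest/check the function under test, which calls fetch-account!
```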

Additional challenges...

  • Some data types are hard to generate (computationally intensive).
  • Some data types are hard to express in code.

Notes...

Calling side-effecting functions: defining a spec and stubbing them out works. If it is third-party code, then consider adding an interop namespace to isolate the calls and provide a place to hook up specs.
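For example, a sketch of such an interop namespace (all names are hypothetical):

```clojure
(ns myapp.interop.storage
  "Isolates third-party side effects; one place to hook up specs and stubs."
  (:require [clojure.spec.alpha :as s]))

(defn put-object!
  "Thin wrapper around the vendor SDK call."
  [bucket key bytes]
  ;; (vendor.sdk/put bucket key bytes) ; the only place interop happens
  nil)

(s/fdef put-object!
  :args (s/cat :bucket string? :key string? :bytes bytes?)
  :ret  nil?)
```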

Working with higher-order functions: passing immutable data around is easy, but passing functions is trickier. There are spec features for describing anonymous functions (s/fspec), but generating them is a bit limited (in my limited experience).
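A sketch with s/fspec (the `apply-discount` function is hypothetical):

```clojure
(defn apply-discount [f price]
  (f price))

;; s/fspec describes the function-valued argument; during generative
;; testing spec can generate conforming functions for it, though
;; validating one involves calling it with generated arguments.
(s/fdef apply-discount
  :args (s/cat :f (s/fspec :args (s/cat :price number?)
                           :ret  number?)
               :price number?)
  :ret  number?)
```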

What would be cool...

A service which tracked what tests have already been run, and a way to only run generative tests for the bits which might have changed. This would be more efficient, and it opens up the idea of pushing testing cycles to other resources (not my laptop).

Ways to make generators smarter, the goal being to ensure a function is tested with a good range of data and with good code coverage.

A way to check code coverage as part of generative testing.

Being able to stub branch statements like “if” so that both paths can be exercised without “getting lucky”... some branches require very specific data to be generated.

Using specs in static analysis to pick up problems without needing to generate examples. Implies fancy inference. Requires someone willing to take pure type inference ideas and adapt them to an impure predicate-based world - statistical or imperative type soundness? I’m guessing. Seems like there is a PhD in this, but I am not an academic.

IDE features which use specs to guide the developer: warn when arguments violate a function’s :args spec, hover over a symbol to see its spec, suggest specs for functions...

Efficient data generation for ClojureScript. Complex specs crash my tests with “Maximum call stack size exceeded” errors (in my limited experience).

IDE affordances. Since specs are intentionally decoupled from function implementations, it’s harder to see the code and its spec at the same time or to work on both. If you don’t have tests running, specs can easily fall out of date. No doubt discipline helps, but...

@awkay commented Jul 28, 2020

FYI, in regard to:

Using specs in static analysis to pick up problems without needing to generate examples. Implies fancy inference. Requires someone willing to take pure type inference ideas and adapt them to an impure predicate-based world - statistical or imperative type soundness? I’m guessing. Seems like there is a PhD in this, but I am not an academic.

I'm working on a version of Guardrails called Guardrails Pro (it will be an inexpensive commercial extension of Guardrails) that is doing exactly this.

@olivergeorge (Author)

Sounds interesting!
