bhb/blockchain-w-spec.md

# Building a blockchain, assisted by Clojure spec

In an effort to gain at least a superficial understanding of the technical implementation of cryptocurrencies, I recently worked my way through "Learn Blockchains by Building One" using Clojure.
This was a good chance to experiment with using spec in new ways. At work, we primarily use spec to validate our global re-frame state and to validate data at system boundaries. For this project, I experimented with using instrumentation much more pervasively than I had done elsewhere.
This is not a guide to spec (there are already many excellent resources for this). Rather, it's an experience report exploring what went well, what is still missing, and quite a few unanswered questions for future research. If you have solutions for any of the problems I've presented, please let me know!
You don't need to know or care about blockchains to understand the code below - I hope the general lessons apply regardless of the specifics of my implementation. The full source code is available on GitHub.
Beyond using spec instrumentation, I attempted to keep my development workflow pretty much the same. I've written about my workflow in detail elsewhere, but the summary is: I do REPL-driven development in Emacs+CIDER, and then separately write a mix of example-based and property-based tests that run whenever I save my files.
Libraries of note:

- org.clojure/spec.alpha, of course
- com.jakemccrary/lein-test-refresh to auto-run tests
- io.aviso/pretty for pretty-printing of test failures
- com.gfredericks/test.chuck for the excellent generative testing macros and clojure.test integration
- orchestra to instrument args and return types (clojure.spec only checks args by default during instrumentation; I've written previously about why I prefer to enable "complete" instrumentation)
- expound to pretty-print spec errors

## Writing specs

In my implementation, a blockchain is a map containing nodes, transactions, and a sequence of blocks called the "chain". Don't worry about the specifics; the general shape of this data is a map with required keys:
```clojure
(s/def :bc/chain (s/coll-of :bc/block :kind vector?))

(s/def :bc/node :bc/sha)
(s/def :bc/nodes (s/coll-of :bc/node :kind set?))

(s/def :bc/bc (s/keys :req [:bc/transactions
                            :bc/nodes
                            :bc/chain]))
```
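To make the shape concrete, here is a minimal, self-contained sketch that validates a small blockchain map against `:bc/bc`. The post doesn't show `:bc/sha`, `:bc/block` or `:bc/transactions`, so those three definitions are hypothetical stand-ins; the remaining specs mirror the ones above.

```clojure
(ns bc.example.shape
  (:require [clojure.spec.alpha :as s]))

;; Hypothetical stand-ins: the real definitions aren't shown in the post
(s/def :bc/sha (s/and string? #(re-matches #"[0-9a-f]{64}" %)))
(s/def :bc/block map?)
(s/def :bc/transactions vector?)

;; The specs from the post
(s/def :bc/chain (s/coll-of :bc/block :kind vector?))
(s/def :bc/node :bc/sha)
(s/def :bc/nodes (s/coll-of :bc/node :kind set?))
(s/def :bc/bc (s/keys :req [:bc/transactions
                            :bc/nodes
                            :bc/chain]))

(def example-bc
  {:bc/transactions []
   :bc/nodes        #{}
   :bc/chain        [{}]})

(s/valid? :bc/bc example-bc)                      ;=> true
(s/valid? :bc/bc (dissoc example-bc :bc/nodes))   ;=> false: a required key is missing
(s/valid? :bc/bc (assoc example-bc :bc/nodes [])) ;=> false: :bc/nodes must be a set
```

Because `s/keys` checks the value under each namespaced key against that key's registered spec, one `:bc/bc` check also covers the nested collections.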
I'm still experimenting with how to best name and namespace specs. There's a tension between long names which are unique and descriptive but tedious to type and to read when debugging vs short names which have the opposite properties (as you can see, I've gone with succinct abbreviations like "bc" instead of "blockchain"). I could use keywords like ::bc to put them in the current namespace, but that doesn't really help with the verbosity when printing them out. In any case, I find myself manually creating namespaced keywords to avoid tying the specs to the code namespaces, which have an independent set of considerations for naming.
Leaving aside the question of whether I'm being too concise with the names, I'm also not sure about how best to name related properties and the entity itself. Spec names like :bc/nodes and :bc/transactions seem clear to me, but how to name the blockchain itself? I went with :bc/bc here, but it's redundant. Would :bc/entity be better? :bc/ent? Or should I introduce another namespace (e.g. foo) so the entity is less nested than its components?
```clojure
(s/def :foo/bc (s/keys :req [:foo.bc/transactions
                             :foo.bc/nodes
                             :foo.bc/chain]))
```
Collections are a bit awkward in spec. Several of my coll-of specs ended up actually being specs for sets or vectors. When reading the spec, I found (s/coll-of :bc/node :kind set?) fairly verbose, and you have to scan the entire line to see what kind of collection it is. Special macros for vec-of and set-of would read much more clearly, IMO. I briefly attempted to write the macros, but it doesn't appear possible without relying on spec internals.
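For illustration, a naive version might look like the sketch below: hypothetical `vec-of` and `set-of` macros that simply expand into `s/coll-of`. Validation and pass-through options such as `:min-count` work, but the resulting spec only remembers the expanded form, so `s/form` and `s/describe` still report `coll-of`. Presumably, getting them to report `vec-of` is where spec internals would come in.

```clojure
(ns bc.example.vec-of
  (:require [clojure.spec.alpha :as s]))

;; Hypothetical helpers (not part of spec): thin wrappers over s/coll-of.
;; Any extra options are passed through unchanged.
(defmacro vec-of [pred & opts]
  `(s/coll-of ~pred :kind vector? ~@opts))

(defmacro set-of [pred & opts]
  `(s/coll-of ~pred :kind set? ~@opts))

(s/def ::ints (vec-of int?))
(s/def ::names (set-of string? :min-count 1))

(s/valid? ::ints [1 2 3])  ;=> true
(s/valid? ::ints '(1 2 3)) ;=> false: a list is not a vector
(s/valid? ::names #{})     ;=> false: violates :min-count

;; The catch: the registered spec describes itself as the expanded
;; coll-of form, not as the vec-of call that produced it.
(comment
  (s/form ::ints))
```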
## fdef

Writing the fdef directly above the function made it simple to locate the spec and was useful documentation.
```clojure
(s/fdef valid?
        :args (s/cat :bc :bc/bc)
        :ret boolean?)
(defn valid? [bc]
  ,,,)
```
I like the power and flexibility of fdef, especially that you can spec functions you did not write. However, when speccing my own functions, most of the time I had a simple list of args with no optional args. fdef seemed unnecessarily verbose here; plus, I needed to eval the fdef form, eval the defn form, and invoke (st/instrument) just to get an instrumented function I could try. In the future, I'll try to write a more succinct macro for this common case. Perhaps something like:
```clojure
(fdefn valid? [:bc/bc] boolean?
  ,,,)
```
Or perhaps borrow some syntax from Schema's defn?
```clojure
(fdefn foo :- boolean?
 [bc :- :bc/bc]
 ,,,)
```
I suspect that this will be a fairly common request, and I do worry a bit that without a concise macro in the spec library for this common case, we'll see several competing versions of essentially the same macro.
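As one possible starting point, here is a minimal sketch of such a macro. The name `fdefn` and its argument syntax (alternating parameter names and specs, loosely following the Schema-style example) are hypothetical and not part of spec. It handles only a single arity with positional args, and it expands into an ordinary `s/fdef` plus `defn`.

```clojure
(ns bc.example.fdefn
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.test.alpha :as stest]))

;; Hypothetical macro: (fdefn name [arg spec ...] ret-spec & body)
(defmacro fdefn
  [fname args ret-spec & body]
  (let [pairs     (partition 2 args)
        arg-names (mapv first pairs)
        ;; s/cat needs a keyword tag per argument; reuse the arg names
        cat-parts (mapcat (fn [[arg spec]] [(keyword (name arg)) spec]) pairs)]
    `(do
       (s/fdef ~fname
         :args (s/cat ~@cat-parts)
         :ret ~ret-spec)
       (defn ~fname ~arg-names ~@body))))

;; Equivalent to an fdef with :args (s/cat :n int?) :ret int?, plus the defn
(fdefn add-one [n int?] int?
  (inc n))

(add-one 1) ;=> 2

;; Instrumentation still has to be (re)applied after evaluating the form
(stest/instrument `add-one)
;; (add-one "1") now throws an ExceptionInfo describing the bad argument
```

Expanding into a plain `s/fdef` keeps the spec visible to tools that already understand fdef, such as instrument and check; optional or variadic args would still need a hand-written fdef.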
## Navigation

I'm looking forward to editors becoming more spec-aware. I understand that Cursive can already "jump to definition" for fully-qualified keywords (and therefore specs), which would have been really helpful. Also, showing the spec definition when I place my cursor on a keyword will be really useful. I'm on a slightly old version of CIDER, so perhaps some of this already works.
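Until editors catch up, the REPL can at least show a spec's definition on demand. A small sketch: `s/form` returns the fully qualified form as data, `s/describe` returns an abbreviated version, and in Clojure 1.9 and later `clojure.repl/doc` should also accept a registered spec keyword. The `:bc/node` definition here is a hypothetical stand-in.

```clojure
(ns bc.example.lookup
  (:require [clojure.spec.alpha :as s]
            [clojure.repl :refer [doc]]))

(s/def :bc/node string?) ; hypothetical stand-in
(s/def :bc/nodes (s/coll-of :bc/node :kind set?))

(comment
  ;; Fully qualified form, as data
  (s/form :bc/nodes)
  ;; Same form with namespaces stripped from symbols
  (s/describe :bc/nodes)
  ;; Prints the spec (Clojure 1.9+)
  (doc :bc/nodes))
```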
## Instrumentation

In core.clj, I added a comment block:
```clojure
(comment
  (require '[orchestra.spec.test :as st])
  (s/check-asserts true)
  (set! s/*explain-out* expound/printer)
  (st/instrument))
```
When I booted my REPL, I'd evaluate the first three forms. However, every time I added or modified an fdef, I needed to re-invoke (st/instrument), which was somewhat tedious (and I forgot a few times as well). Perhaps an editor shortcut could improve this part of my workflow?
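Re-evaluating a changed defn has the same effect as changing an fdef: instrument works by replacing the var's root value with a checking wrapper, and `defn` installs a fresh, unwrapped function. The sketch below shows this with `clojure.spec.test.alpha/instrument` (orchestra's `st/instrument` is used the same way and additionally checks `:ret`); `double-it` and `rejects-string?` are hypothetical toy functions.

```clojure
(ns bc.example.reinstrument
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.test.alpha :as stest]))

;; Hypothetical toy function
(s/fdef double-it
  :args (s/cat :n int?)
  :ret int?)
(defn double-it [n] (* 2 n))

(defn rejects-string?
  "True if calling double-it with a string fails instrumentation
  (ExceptionInfo) rather than failing inside the function."
  []
  (try
    (double-it "2")
    false
    (catch clojure.lang.ExceptionInfo _ true)
    (catch Exception _ false)))

(stest/instrument `double-it)
(rejects-string?) ;=> true

;; Re-evaluating the defn (e.g. from the editor) replaces the instrumented
;; var root with a fresh, uninstrumented function...
(defn double-it [n] (* 2 n))
(rejects-string?) ;=> false: (* 2 "2") throws a ClassCastException instead

;; ...so instrument has to be called again
(stest/instrument `double-it)
(rejects-string?) ;=> true
```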
For the tests, I set up a fixture:
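```clojure
;; s = clojure.spec.alpha, st = orchestra.spec.test, expound = expound.alpha
(defn instrument [f]
  ;; `set!` throws if *explain-out* has no thread-local binding (which depends
  ;; on how the tests are launched); `binding` works either way
  (binding [s/*explain-out* expound/printer]
    (st/instrument)
    (f)))

(use-fixtures :once instrument)
```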
Unsurprisingly, instrumentation did what I expected: it helped me quickly identify errors I'd made, for instance when I accidentally passed a function instead of the result of calling that function (whoops). This was useful at the REPL, but even more so for tests, since often an apparently unrelated test would break, and instrumentation errors would help me quickly identify the issue.
On the other hand, instrumentation sometimes caused tests to fail that would otherwise have passed. For instance, many of my functions took the blockchain map as their first parameter. When I added a new required key, all the old tests broke, even if the functions didn't use the new key.
Several possible solutions:

- Update the tests (which is what I did). Arguably the tests now match the real world more closely.
- Stop using the common blockchain spec for each function. Rather, declare the subset of necessary keys for each function.
- Define some common required set of keys, then use s/merge to add new required keys for functions that need them.
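The last two options might look roughly like this. The component specs, `chain-length` and `add-node` are hypothetical stand-ins, not the post's real code; the point is that an "old" fixture without `:bc/nodes` still satisfies functions that don't need that key.

```clojure
(ns bc.example.required-keys
  (:require [clojure.spec.alpha :as s]))

;; Hypothetical stand-ins for the component specs
(s/def :bc/transactions vector?)
(s/def :bc/chain vector?)
(s/def :bc/nodes (s/coll-of string? :kind set?))

;; Declare only the keys a function actually reads
(s/fdef chain-length
  :args (s/cat :bc (s/keys :req [:bc/chain]))
  :ret nat-int?)
(defn chain-length [bc]
  (count (:bc/chain bc)))

;; Or: a shared core of required keys, extended with s/merge
;; only for the functions that need more
(s/def :bc/core (s/keys :req [:bc/transactions :bc/chain]))
(s/def :bc/with-nodes (s/merge :bc/core (s/keys :req [:bc/nodes])))

(s/fdef add-node
  :args (s/cat :bc :bc/with-nodes :node string?)
  :ret :bc/with-nodes)
(defn add-node [bc node]
  (update bc :bc/nodes conj node))

;; An "old" test fixture written before :bc/nodes existed
(def old-fixture {:bc/transactions [] :bc/chain []})

(s/valid? :bc/core old-fixture)       ;=> true
(s/valid? :bc/with-nodes old-fixture) ;=> false: :bc/nodes is missing
```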

## Generative tests

I didn't use check for this project, for a few reasons:

- I haven't yet come up with a way to integrate check into clojure.test in a way I like.
- I haven't started using check during REPL-driven development to quickly try out functions.
- It's tricky to generate valid blockchains, so I think check would mostly amount to fuzz testing: just sending bad data to functions to see if they would error, and I wasn't worried about making this toy implementation bulletproof. Then again, this may be a case of an "unknown unknown"; perhaps there are really interesting bugs that check would have discovered, if I had tried.
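One possible way to run check from clojure.test (a sketch, not something the post settles on) is to wrap it in a `deftest` and assert that no result carries a `:failure` key. It needs org.clojure/test.check on the classpath; `add-node` here is a hypothetical toy function with an easy-to-generate argument spec.

```clojure
(ns bc.example.check-test
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.test.alpha :as stest]
            [clojure.test :refer [deftest is]]))

;; Hypothetical toy function, not from the post
(s/fdef add-node
  :args (s/cat :nodes (s/coll-of string? :kind set?)
               :node string?)
  :ret (s/coll-of string? :kind set?)
  ;; :fn relates the (conformed) return value to the (conformed) args
  :fn #(contains? (:ret %) (-> % :args :node)))

(defn add-node [nodes node]
  (conj nodes node))

(deftest add-node-conforms-to-its-fdef
  (let [results (stest/check `add-node
                             {:clojure.spec.test.check/opts {:num-tests 50}})]
    ;; A check result only carries :failure when something went wrong
    (is (not-any? :failure results)
        (->> results (filter :failure) (map stest/abbrev-result) pr-str))))
```

Printing only the abbreviated failing results keeps the clojure.test report readable when a check does fail.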

Nonetheless, instrumentation enabled a different kind of generative test, one in which I could run arbitrary sequences of operations on a blockchain and then assert properties about the result. For instance:
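```clojure
;; Assumed aliases (not shown here): s = clojure.spec.alpha,
;; gen = clojure.test.check.generators, expound = expound.alpha,
;; checking from com.gfredericks.test.chuck.clojure-test, bc = the blockchain ns
(s/def :bc/ops #{`bc/add-tx `bc/mine-fast `bc/add-node})

(deftest test-block-ops
  (checking
   "all ops result in valid blockchain"
   10
   [;; Generate a random sequence of operations
    ops (s/gen (s/coll-of :bc/ops))
    ;; Grab the ':args' spec from each op's `fdef`, generate random args
    op-args (apply gen/tuple (map #(s/gen (:args (s/get-spec %))) ops))
    :let [op+args (map vector ops op-args)
          ;; Apply all operations, threading the blockchain through.
          ;; The generated first arg is dropped in favour of the
          ;; accumulated blockchain.
          result (reduce
                  (fn [bc [op args]]
                    (apply @(resolve op)
                           bc
                           (rest args)))
                  (blockchain)
                  op+args)]]
   ;; Check that resulting blockchain is valid
   (is (s/valid? :bc/bc result)
       (expound/expound-str :bc/bc result))))
```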
(Using s/gen to generate data from a spec is really powerful and, to my eyes, often more readable than the equivalent test.check generator)
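A small side-by-side sketch of that point, with hypothetical keyword op names standing in for the real functions. For a flat set like this the difference is modest; the larger win is that the spec-derived generator comes for free from a spec you already maintain and follows it as the spec changes. It needs org.clojure/test.check on the classpath.

```clojure
(ns bc.example.gen-compare
  (:require [clojure.spec.alpha :as s]
            [clojure.spec.gen.alpha :as sgen]
            [clojure.test.check.generators :as gen]))

;; Hypothetical op names standing in for the real blockchain functions
(s/def ::op #{:add-tx :mine-fast :add-node})

;; Generator derived from the spec
(def ops-from-spec (s/gen (s/coll-of ::op)))

;; A roughly equivalent hand-written test.check generator
(def ops-by-hand (gen/vector (gen/elements [:add-tx :mine-fast :add-node])))

(comment
  (sgen/sample ops-from-spec 3)
  (gen/sample ops-by-hand 3))
```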
I wrote a few generative tests with the same pattern:

- for all op sequences, the result is a valid blockchain (see above)
- for all op sequences, the money balances
- for any sequence of ops applied to a set of blockchain nodes, you can reach a consistent resolution by "connecting" all nodes (unless two blockchains happen to be of equal length)
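The post doesn't show the "money balances" test, so here is a self-contained sketch of the same idea on a hypothetical toy ledger (an account-to-balance map, not the author's blockchain), written with test.check directly: for any generated sequence of transfers, the total amount of money never changes.

```clojure
(ns bc.example.money-balances
  (:require [clojure.spec.alpha :as s]
            [clojure.test.check :as tc]
            [clojure.test.check.properties :as prop]))

;; Hypothetical toy domain: a transfer is [from to amount]
(s/def ::account #{:alice :bob :carol})
(s/def ::amount (s/int-in 0 100))
(s/def ::transfer (s/tuple ::account ::account ::amount))

(defn transfer
  "Moves amount from `from` to `to` if `from` can afford it; otherwise a no-op."
  [ledger [from to amount]]
  (if (>= (get ledger from 0) amount)
    (-> ledger
        (update from (fnil - 0) amount)
        (update to (fnil + 0) amount))
    ledger))

(def initial-ledger {:alice 100 :bob 50 :carol 0})

(defn total [ledger]
  (reduce + (vals ledger)))

;; Property: applying any sequence of transfers preserves the total
(def money-balances
  (prop/for-all [ops (s/gen (s/coll-of ::transfer))]
    (= (total initial-ledger)
       (total (reduce transfer initial-ledger ops)))))

(comment
  (tc/quick-check 100 money-balances))
```

Making an unaffordable transfer a no-op is a choice of this sketch: every generated sequence stays valid, so the property only has to check conservation.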

My generative tests are still very hard to read, and when they fail, it's often hard to understand why. Just like example-based tests, writing good generative tests takes practice, but I think that spec can encourage the community as a whole to invest more time to become skilled at generative testing.
## Summary

I learned quite a bit with this exercise. Spec provides powerful tools with instrumentation and generative testing. I suspect the community will continue to build on these tools with additional best practices, libraries, and editor support, so I'm quite excited about the future.