The core of what the system is about is:
- Having rough/flexible data structures
- Performing transformations on those data structures
- Finding the core "must be correct" structures we want to reason about
- Defining and naming those structures with relative precision,
hardening them into definite declarations
- Validating and verifying that input and output conform to those
now-hardened structures.
As the system started developing, it was all about pervasively flexible
maps, with varying degrees of requirements for what each map had to
contain.
Now that we've got a somewhat-maturing system, one we want to be able
to apply rather large-scale restructuring to, and one we want
relatively reliable rule-based reasoning about, we're coming up against
the problem of defining and validating those core data structures
and their transformations.
We have a few tools to fulfill these roles:
- maps are the basic, embryonic structures that are fast and
flexible to work with
- records start to reify dynamically typed tuples with definite
field sets that support abstract protocols
- prismatic/schema describes compound data structures at runtime
and lets you define functions and validators in terms of those
structures (see the sketch after this list). It is not "pervasive,"
in the sense that you don't need to annotate an entire namespace,
the validation is not always active, and the validation is always
in terms of "does this one given value conform to the schema?"
Another way to think about this is as a specific type of
runtime contracts library, where the contracts are structural.
- core.typed describes values and functions at compile time. It
is pervasive in the senses that a) you need to annotate an entire
namespace at a time, and its dependencies, b) the reasoning that
core.typed attempts to do is, like traditional static type checking,
attempting to reason _for all possible values of the given types_.
This provides stronger guarantees earlier, but in practice it is
harder to build out, especially if your namespace is still changing
or if it deals with flexible manipulations of complex maps.
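To make the contrast concrete, here is a minimal sketch using a
hypothetical Report shape; the field names and the schema/alias names
are illustrative assumptions, not the system's real structures.

```clojure
;; prismatic/schema: a runtime, per-value structural contract.
(require '[schema.core :as s])

(s/defschema ReportSchema
  {:id s/Int, :title s/Str})

(s/validate ReportSchema {:id 1, :title "ok"})  ; returns the value
(s/check ReportSchema {:id "oops"})             ; returns an error description, no throw

;; core.typed: roughly the same shape as a compile-time type, checked
;; for all values once the namespace (and its dependencies) are
;; annotated and check-ns is run.
(comment
  (require '[clojure.core.typed :as t])
  (t/defalias ReportType
    (t/HMap :mandatory {:id t/Int, :title t/Str})))
```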
In our system, we're applying these tools to:
- the backend record structures, which have been around for a little
while
- the web API, for which we want both:
  - validation that incoming EDN to the API is well structured
    (a rough sketch of this follows the list)
  - testing that outgoing EDN is well structured
- and in general, assurance that the functional transformations we expect
are indeed still happening the way we want them to, even in the face of
code change (e.g. compile-time checking and a test suite)
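As a rough sketch of the incoming-EDN side, assuming a Ring-style
handler and a request body already parsed into a Clojure map (both
assumptions, as are the field names):

```clojure
(require '[schema.core :as s])

(def IncomingReport
  {:title s/Str})

(defn create-report
  "Reject malformed incoming EDN with a 400; otherwise pretend to create it."
  [body]
  (if-let [err (s/check IncomingReport body)]
    {:status 400, :body (pr-str err)}
    {:status 201, :body (pr-str (assoc body :id 1))}))

(create-report {:title "Q2 report"})  ; => {:status 201, ...}
(create-report {:totle "typo"})       ; => {:status 400, ...}
```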
We've previously been accumulating a disorganized mishmash of concepts
and usage. This commit starts to rectify the situation:
1. It roughly reconciles the prismatic/schema entries with the core
defrecord entries in the `report` namespace. schema provides
a macro to literally define both in the same declaration (sketched
after this list).
2. It attempts to express the data transformations to/from the web API
in terms of these records and schemas.
3. It starts adding some basic tests asserting that the transformation
functions work in terms of these structures.
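As a sketch of the shape of this reconciliation (with hypothetical
field names; the real `report` namespace differs):

```clojure
(ns report
  (:require [schema.core :as s]))

;; One declaration yields both the defrecord and its schema.
(s/defrecord Report
    [id    :- s/Int
     title :- s/Str])

;; The web API's wire format as a plain-map schema.
(s/defschema ReportEdn
  {:id s/Int, :title s/Str})

;; Transformation declared in terms of the record and the schema.
(s/defn report->edn :- ReportEdn
  [r :- Report]
  {:id (:id r), :title (:title r)})

;; A basic test asserting the transformation conforms. Note that s/defn
;; argument/return validation is only active inside s/with-fn-validation
;; (or after s/set-fn-validation!).
(comment
  (require '[clojure.test :refer [deftest is]])
  (deftest report->edn-conforms
    (s/with-fn-validation
      (is (= {:id 1, :title "q2"}
             (report->edn (->Report 1 "q2")))))))
```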
This should give us a much stronger foundation to stand on in terms
of solid data structures and data validation.
Furthermore, a few hard-won understandings came out of the process:
1. Nothing is more concise or more flexible than Clojure's built-in
map support. Both schema and core.typed add non-trivial structural
scaffolding that, while useful for testing that things are still
the way you want them, is time-consuming and annoying to lay down for
everything, *especially* if you are trying to rapidly experiment
with things, try them out, and move them around.
Everything should begin as maps, and persist that way for quite
a while, until you're *sure* that you want to solidify a structure.
2. schema is nontrivially easier and faster to use than core.typed,
and more flexible for common cases (e.g. structuring plain maps).
It is also, I think, more clearly written and documented. It also
allows for anonymous schemas, which can come in handy for, say,
unit tests, without having to fully reify a named schema.
3. However, core.typed delivers a more complete reasoning structure
about the code AND, importantly, runs at compile time AND reinforces
the important point that *the correctness of the code should be able
to be reasoned about at compile time*. It is pretty good for
most data types, with the most complex case being complex manipulations
of heterogeneous maps, which unfortunately make up a lot of
common Clojure code before the maps get reified into records.
4. In terms of laying down initial security layers, I would recommend
small sets of plain unit tests and then simple prismatic/schema
validators. Both of these only check *small sets of cases*, but they
do it easily. schema gives you a more clearly structured way to
define and validate a structure (reusable for a number of cases),
which is very good, BUT it does require that you reify a structure,
which quickly leads to a proliferating number of closely related
schemas for non-essential data structures. This is bad. You want
to keep the number of named concepts low and powerful.
5. The new generative testing tools (test.generative and test.check)
may be useful in conjunction with schema--since they will generate
a lot of domain check data, and then you can run all of those
cases through the schema (a rough sketch follows this list). This is
still "check by case" technically, but it blankets a lot more cases.
6. Finally, core.typed is very powerful but very slow and hard to
change, and the reasoning and debugging must be very careful.
I would only recommend it for the most fixed and immutable
parts of the codebase.
In addition, core.typed may be able to play a useful role
in eliminating the need for some mocking and stubbing, if you
just want to assert through inference that the proper types of
output come from the proper types of input, without manually
trying to shove mock objects in as inputs (also sketched after
this list).
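A rough sketch of the schema-plus-generative idea (and of an anonymous
schema along the way); the generators and the map shape are
illustrative assumptions:

```clojure
(require '[schema.core :as s]
         '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; Generate many candidate maps, then push each one through an
;; anonymous, inline schema check -- still "check by case", but
;; blanketing many more cases.
(def conforms-to-schema
  (prop/for-all [m (gen/hash-map :id    gen/int
                                 :title gen/string-alphanumeric)]
    (nil? (s/check {:id s/Int, :title s/Str} m))))

(tc/quick-check 100 conforms-to-schema)
;; => {:result true, :num-tests 100, ...}
```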
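And a rough sketch of the "inference instead of mocks" point: annotate
the input and output shapes and let check-ns verify the data flow,
without constructing stand-in inputs. The namespace, alias, and
function names here are hypothetical.

```clojure
(ns report.summary
  (:require [clojure.core.typed :as t]))

(t/defalias Report
  (t/HMap :mandatory {:id t/Int, :title t/Str}))

(t/ann summarize [Report -> (t/HMap :mandatory {:id t/Int, :headline t/Str})])
(defn summarize [r]
  {:id       (:id r)
   :headline (str "Report " (:id r) ": " (:title r))})

(comment
  ;; Run at the REPL or in CI; checks the whole namespace at "compile time".
  (t/check-ns 'report.summary))
```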
To recap: start and persist with maps and mild unit tests as far as they
can get you. Then consider moving up to schema, and possibly generative
testing, if necessary. Only move up to core.typed when you're really
sure it'll be worth the time and the security, EXCEPT PERHAPS if your
namespace contains low-hanging fruit--e.g. code that is both important
and simple enough to cover with core.typed efficiently.