aphyr/gist:5198654

## gistfile1.md

      
Some people think that configuration should not be "Turing complete"; others
think the configuration language should be as expressive as possible--so much
so that it amounts to a programming language itself.
I assert that configuration files which are not Turing complete (or, perhaps a
more useful distinction, are not "sufficiently complex languages") are in
practice extended to be that complex; and when that happens, the result is
often more difficult to understand than if the language were more powerful to
start with.
What do I mean?
To start, let's rule out easy problems. If the problem you're solving is simple
(relative to the domain of the program), your configuration can also be
simple. It won't really matter whether the configuration is expressed in a
powerful or basic language, because it just won't be big enough to make life
difficult. In this post, I'm talking about big problems. Nagios configs with
thousands of nodes and millions of node/service combinations, for instance.
I argue that the problem of managing configuration complexity is very much akin
to managing code complexity. In both configuration and code you're specifying
the behavior of a complex program. Both are concerned with names, values,
identities, and repetition.
In code, we've developed techniques to manage this complexity:

- Divide problems into isolated parts
- Re-combine those parts to solve large problems
- Bind expressions to names to make the relationships between parts clear
- Condense low-entropy expressions with higher-level language constructs

One of my goals in writing Riemann was to make these techniques available to
both programmers and operations engineers. Let's say you write a bunch of
configuration relating to monitoring your AMQP systems. In Riemann, not only
can you break up configuration into multiple files...
```clojure
(include "amqp.clj")
```
... but you can organize your configuration into Clojure namespaces,
protocols, and functions:
```clojure
(ns foo-corp.amqp)

(defn monitor-queues
  "Returns a stream which monitors AMQP queues for signs of stress, and emits
  warning events to an alerting stream."
  [alerts]
  (where (service "amqp")
    ...))
```
And we can use Clojure's existing facilities to organize and refer to those
functions--by importing an entire namespace, only particular functions, or
using fully-qualified names.
```clojure
(use '[foo-corp.amqp :only [monitor-queues]])

(streams
  (monitor-queues (email "queues@ops.net"))
  (foo-corp.www/check-haproxy)
  (graph-things (graphite "1.2.3.4")))
```
Having access to variables, functions, namespaces, and so forth not only makes
configuration smaller, but makes it easier to organize and reason about.
Not only is this configuration composable, but it has documentation built in.
We can introspect the config at the REPL. We can generate HTML documentation.
We can learn the language by reading widely available documentation, instead of
learning the syntax and scope rules of a particular software tool. In short, we
have access to all the tooling provided by a mature language, for free.

## Simple configuration

By "simple", I mean "less powerful". Clojure is arguably simple, but it is also
very complex. I'm using the word "simple" to refer to languages which are
limited to basic constructs, like key-value lists, trees, etc., with no
facility for higher-level expression.
In order to be simple, many configuration languages make these strategies for
code organization impossible. Everything is concrete. Others embed specific
hierarchical constructs like hostgroups or ACL roles, which makes it possible
to move up a level in the complexity hierarchy, but because these constructs
are not first class elements of the language, you can't abstract any further.
Constructs to reduce complexity in config files come with their own syntax and
their own rules, which means the poor woman trying to maintain these systems
has to learn how these constructs work--and sometimes, they work in unexpected
ways.
This has been the situation for configuring software for many years. And yet,
people need to solve complex problems (again, relative to the domain of the
software tool). We know this because they take one of two paths:

1. Invent a new tool which more closely represents their problem.
2. Embed the configuration language in another, more powerful language.

The first strategy, to write a new program, is often a good choice for internal
tools. Maybe your arrangement of hosts and users looks really different from
how other users have arranged their systems. A homegrown system solves your
problem, using your language, and often embeds constructs specific to your
domain, like customers, accounts, operational groups, etc. The configuration
language for this system can be simple, because the software itself maps
closely to your problem.
The downside to writing your own is that this is the same process which created
the software you couldn't use in the first place. If your problem is shaped
the same way as everybody else's, this works great and your tool is broadly
applicable. If your problem is idiosyncratic (which tends to happen as
organizations grow larger), the software may not generalize well to other
problems.
Option 2 is also widespread. I assert that almost every ops group has already
converted their configuration language into a Turing-complete system. Chef, for
example, embeds your configuration files in ERB, which in turn embeds Ruby.
```ruby
node[:fqdn] = "latte"
template "/tmp/foo" do
  source "foo.erb"
  variables({
    :services => ["www", "mysql"]
  })
end
```
```erb
<node>
  <name><%= node[:fqdn] %></name>
  <services><% for service in @services %>
     <service><%= service %></service><% end %>
  </services>
</node>
```

Now we have three interacting languages: the config language (XML), the
templating language (ERB), and the generating language (Ruby). At every
boundary between these languages, there is an impedance mismatch. For example:

- XML uses < and > to denote tags. ERB uses <% and %> to denote
  escaping. Your editor may not understand the difference in context between
  these two, which makes it easy to make syntax errors.
- Ruby serializes objects to strings in one way. XML has a very different set
  of rules about strings. For instance, a string containing a bare ampersand
  is perfectly valid in Ruby. It's not valid text in XML.
  - Unless you're inside CDATA or a comment. ERB has no syntactic awareness of
    these contexts, so it's up to the user to pick the escaping function that
    correctly translates between these types.
  - Unless you want to actually emit a tag in XML. XML actually has two levels
    of language: the literal text, and the logical node tree. You might want to
    interact with both levels.
  - Unless an ERB tag crosses a contextual boundary in XML, like generating a
    tag with attributes. Now it's the responsibility of distant Ruby code to
    correctly manage the XML syntax.
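
To make the ampersand case concrete, here is a minimal sketch using Ruby's standard `erb` library. The service names and the inline templates are made up for illustration, and Chef is left out entirely; the only point is that the two templates differ by one easily forgotten call to `ERB::Util.html_escape`.

```ruby
require "erb"

# Hypothetical service names; the second contains a character that XML
# treats specially.
SERVICES = ["www", "mysql & friends"]

# Naive template: interpolates each Ruby string verbatim.
NAIVE = ERB.new(
  "<services><% services.each do |s| %>" \
  "<service><%= s %></service><% end %></services>"
)

# Escaped template: identical, except the author remembered to escape.
ESCAPED = ERB.new(
  "<services><% services.each do |s| %>" \
  "<service><%= ERB::Util.html_escape(s) %></service><% end %></services>"
)

# Renders a template with `services` bound as a local variable.
def render(template, services)
  template.result_with_hash(services: services)
end

puts render(NAIVE, SERVICES)   # contains a bare "&", which is not valid XML text
puts render(ESCAPED, SERVICES) # the same data, with "&" written as "&amp;"
```

Nothing in ERB flags the first template as wrong. Both render without complaint, and the difference only shows up when something downstream tries to parse the output.
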

These problems are not specific to XML, Ruby, and ERB. They apply to almost any
templated environment. JSON, YAML, DNS zone files, Apache configs, Nginx
configs, HAProxy configs, Scala files, EDN, property lists... all have their
own syntactic rules which require translation. Writing any one of these files
requires a careful understanding of the interaction between the two languages.
-1.23e-15 is a perfectly valid Ruby number. Is it a valid number in your config
language?
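
As a small illustration of the number question, the sketch below compares Ruby's default float rendering with an explicit fixed-point format. The helper names are hypothetical, and whether either rendering is acceptable depends entirely on the grammar of the config format you are generating.

```ruby
# What string interpolation or <%= %> would emit for a Float by default.
def default_rendering(value)
  value.to_s
end

# An explicit format string pins the notation instead of leaving it to
# Ruby's defaults. The number of digits is an arbitrary choice here.
def fixed_rendering(value, digits = 17)
  format("%.#{digits}f", value)
end

puts default_rendering(-1.23e-15) # scientific notation
puts fixed_rendering(-1.23e-15)   # plain decimal notation
```

If the target language only accepts plain decimals, the default rendering is a latent bug that stays hidden until a value small or large enough to trigger scientific notation shows up.
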
"Well, we don't use unusual strings in this variable, so this isn't a
problem." I say this to myself all the time, when writing a templated config,
and I use it to justify not writing the correct escaping code. A few weeks
later someone (likely me) copies that line into a new context without thinking,
or adds a property with a quote in it, or a password with an ampersand, and
things explode. In my experience, making errors at these boundaries is common
in config management.
Moreover, we've split the config file into two tightly-coupled pieces in
different files. The behavior of nginx now depends on the template and the
structure of a hashmap in a Chef template resource--created by arbitrary
code--which in turn depends on the structure of a node and all sorts of other
implicit variables brought into existence by Chef. Maybe there are databags or
LDAP involved. You've got all sorts of interacting components involved in the
behavior of a system. This is not reduced complexity. In order to reason about
what nginx does, you have to reason about the connections between all these
moving parts.

## Static Verification

The advantage, one could argue, of simple configuration files is that they are,
in theory, statically verifiable. I can run nginx -t on a config file, and
it'll tell me whether I got it right.
Well, not quite. Nginx will tell me whether the file appears syntactically
consistent. It may also tell me, to some extent, whether the file represents a
logically consistent configuration. It cannot (just like all static analysis)
tell me whether the config is isomorphic to the ideas in my head, or in my
boss' head.
Just like programming languages, configuration languages are statically
verifiable to different degrees. One could imagine a configuration language in
Haskell, where the type system could statically verify the syntactic, and to a
large extent the semantic, correctness of the config before running it. Riemann
uses Clojure for its configuration language, and it will catch syntactic and
logical errors in its test phase as well. For instance:
```clojure
(let [graph (graphite {:host "1.2.3.4"})]
  (streams
    (where (service "foo")
      grap)))
```
This is a syntactically valid Clojure file. Riemann will nonetheless refuse to
load it: if you kill -HUP Riemann and it looks at this file, it'll tell you "I
won't apply these changes, because there is no such variable 'grap' on this
line." But Riemann can't prevent you from making all sorts of errors, like:
```clojure
(where (service "foo")
  (where (service "bar")
    prn))
```
This is a valid program. It's also wrong: no events will ever reach prn. Every
system has this boundary, to varying degrees. Nginx will happily proxy to the
wrong server. Grep will search for a mis-spelled word. In programming, we need
both static checks and runtime tests to verify the program's correctness. The
same applies to configuration.
Many operations folks call this a staging environment--and I think gradually
introducing a new configuration (just like new code) into production systems is
important. But there might also be a role for an in-between verification:
automated tests for configuration. This... isn't something I think is
commonly addressed, insofar as most programs make it difficult to test their
behavior in isolation. Testing an nginx config means actually running nginx.
One interesting consequence of generating your config from a configuration
management system is that your test can no longer operate on either of those
systems in isolation. You need to run the full toolchain to find a mistake.
If there's something we've learned from software development, I think it's that
end-to-end testing is an important part of a balanced breakfast, but it's not
the be-all and end-all of testing. We need a way to test components by
themselves, and to compose those components into reliable systems as a
whole.
To that end, I think expressing your configuration in a first-class language
with constructs like variables, namespaces, and functions is really quite
promising. Not only can we use these techniques to break up and manage
complexity in the configuration--but those same divisions can be tested in
isolation. Since Riemann streams are functions, you can test a stream you've
written in your configuration by calling it. Testing streams at multiple
levels, as they compose, makes it possible to gradually verify larger and
larger parts of your config, without having to reason about the entire system
at once.
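
Here is a rough sketch of that idea in Ruby rather than Clojure. The `where` below is a hypothetical stand-in that only mimics the shape of a filtering stream. It is not Riemann's API, and the event hashes are invented. What it shows is that when a piece of config is just a function, a test can call it with sample events and inspect what comes out the other end.

```ruby
# A stream is any callable that takes an event (a Hash here). This `where`
# forwards an event to its children only when the service name matches.
def where(service, *children)
  lambda do |event|
    children.each { |child| child.call(event) } if event[:service] == service
  end
end

# Test helper: builds a stream around a recording sink (via the block),
# feeds it the given events, and returns whatever reached the sink.
def events_reaching_sink(events)
  seen = []
  stream = yield(->(event) { seen << event })
  events.each { |event| stream.call(event) }
  seen
end

# Made-up sample events.
EVENTS = [{ service: "foo", metric: 1 }, { service: "bar", metric: 2 }]

# A single filter passes the matching event through.
flat = events_reaching_sink(EVENTS) { |sink| where("foo", sink) }

# The nested filter from the earlier example: no event is both "foo" and "bar".
nested = events_reaching_sink(EVENTS) { |sink| where("foo", where("bar", sink)) }

puts flat.inspect
puts nested.inspect
```

A test like this catches the nested-filter mistake without starting a server, because an assertion that the sink received something fails immediately.
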
This is not something which is common. This is not something Riemann is
particularly good at. But it's something I think we can, as developers of
configurable systems, provide. I've put a ton of work into designing simple,
reproducible automated tests for Riemann's streams, and I think that work can
be adapted to test configuration as well.