@aphyr
Created March 19, 2013 18:17
Thoughts on configuration file complexity.

Some people think that configuration should not be "Turing complete"; others think the configuration language should be as expressive as possible--so much so that it amounts to a programming language itself.

I assert that configuration files which are not Turing complete (or, perhaps a more useful distinction, are not "sufficiently complex languages") are in practice extended to be that complex; and when that happens, the result is often more difficult to understand than if the language were more powerful to start with.

What do I mean?

To start, let's rule out easy problems. If the problem you're solving is simple (relative to the domain of the program), your configuration can also be simple. It won't really matter whether the configuration is expressed in a powerful or basic language, because it just won't be big enough to make life difficult. In this post, I'm talking about big problems. Nagios configs with thousands of nodes and millions of node/service combinations, for instance.

I argue that the problem of managing configuration complexity is very much akin to managing code complexity. In both configuration and code you're specifying the behavior of a complex program. Both are concerned with names, values, identities, and repetition.

In code, we've developed techniques to manage this complexity:

  • Divide problems into isolated parts
  • Re-combine those parts to solve large problems
  • Bind expressions to names to make the relationships between parts clear
  • Condense low-entropy expressions with higher-level language constructs

One of my goals in writing Riemann was to make these techniques available to both programmers and operations engineers. Let's say you write a bunch of configuration relating to monitoring your AMQP systems. In Riemann, not only can you break up configuration into multiple files...

(include "amqp.clj")

... but you can organize your configuration into Clojure namespaces, protocols, and functions:

(ns foo-corp.amqp)

(defn monitor-queues
  "Returns a stream which monitors AMQP queues for signs of stress, and emits
warning events to an alerting stream."
  [alerts]
  (where (service "amqp")
    ...))

And we can use Clojure's existing facilities to organize and refer to those functions--by importing an entire namespace, only particular functions, or using fully-qualified names.

(use '[foo-corp.amqp :only [monitor-queues]])

(streams
  (monitor-queues (email "queues@ops.net"))
  (foo-corp.www/check-haproxy)
  (graph-things (graphite {:host "1.2.3.4"})))

Having access to variables, functions, namespaces, and so forth not only makes configuration smaller, but makes it easier to organize and reason about. Not only is this configuration composable, but it has documentation built in. We can introspect the config at the REPL. We can generate HTML documentation. We can learn the language by reading widely available documentation, instead of learning the syntax and scope rules of a particular software tool. In short, we have access to all the tooling provided by a mature language, for free.

Simple configuration

By "simple", I mean "less powerful". Clojure is arguably simple, but it is also very powerful. I'm using the word "simple" to refer to languages which are limited to basic constructs, like key-value lists, trees, etc., with no facility for higher-level expression.

In order to be simple, many configuration languages make these strategies for code organization impossible. Everything is concrete. Others embed specific hierarchical constructs like hostgroups or ACL roles, which makes it possible to move up a level in the complexity hierarchy, but because these constructs are not first class elements of the language, you can't abstract any further. Constructs to reduce complexity in config files come with their own syntax and their own rules, which means the poor woman trying to maintain these systems has to learn how these constructs work--and sometimes, they work in unexpected ways.

This has been the situation for configuring software for many years. And yet, people need to solve complex problems (again, relative to the domain of the software tool). We know this because they take one of two paths:

  1. Invent a new tool which more closely represents their problem.
  2. Embed the configuration language in another, more powerful language.

The first strategy, to write a new program, is often a good choice for internal tools. Maybe your arrangement of hosts and users looks really different from how other users have arranged their systems. A homegrown system solves your problem, using your language, and often embeds constructs specific to your domain, like customers, accounts, operational groups, etc. The configuration language for this system can be simple, because the software itself maps closely to your problem.

The downside to writing your own is that this is the same process which created the software you couldn't use in the first place. If your problem is shaped the same way as everybody else's, this works great and your tool is broadly applicable. If your problem is idiosyncratic (which tends to happen as organizations grow larger), the software may not generalize well to other problems.

Option 2 is also widespread. I assert that almost every ops group has already converted their configuration language into a Turing-complete system.

Chef, for example, embeds your configuration files in ERB, which in turn embeds Ruby.

node[:fqdn] = "latte"
template "/tmp/foo" do
  source "foo.erb"
  variables({
    :services => ["www", "mysql"]
  })
end

<node>
  <name><%= node[:fqdn] %></name>
  <services><% for service in @services %>
     <service><%= service %></service><% end %>
  </services>
</node>

Now we have three interacting languages: the config language (XML), the templating language (ERB), and the generating language (Ruby). At every boundary between these languages, there is an impedance mismatch. For example:

  • XML uses < and > to denote tags. ERB uses <% and %> to denote escaping. Your editor may not understand the difference in context between these two, which makes it easy to make syntax errors.
  • Ruby serializes objects to strings in one way. XML has a very different set of rules about strings. For instance, an ampersand is a perfectly valid string in Ruby. It's not a valid string in XML.
  • Unless you're inside CDATA or a comment. ERB has no syntactic awareness of these contexts, so it's up to the user to pick the escaping function that correctly translates between these types.
  • Unless you want to actually emit a tag in XML. XML actually has two levels of language: the literal text, and the logical node tree. You might want to interact with both levels.
  • Unless an ERB tag crosses a contextual boundary in XML, like generating a tag with attributes. Now it's the responsibility of distant Ruby code to correctly manage the XML syntax.
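The first two mismatches can be reproduced in a few lines of plain Ruby. This is a minimal sketch using only the standard library's ERB, outside of Chef; the template string, the `password` variable, and its value are made up for illustration. It only covers text inside an element: as the list above notes, CDATA sections, comments, and attributes each need different handling.

```ruby
require 'erb'

# Hypothetical template: one XML element filled in from a Ruby value.
raw_template = ERB.new('<password><%= password %></password>')

# Same template, but the value passes through ERB's escaping helper first.
safe_template = ERB.new('<password><%= ERB::Util.html_escape(password) %></password>')

# A perfectly ordinary Ruby string that happens to contain XML metacharacters.
password = 'p&ss<word'

raw  = raw_template.result_with_hash(password: password)
safe = safe_template.result_with_hash(password: password)

puts raw   # bare & and < end up inside the element, so the XML is not well-formed
puts safe  # & and < are emitted as entity references instead
```

Nothing in the unescaped template fails at render time; the error only shows up later, when some XML parser reads the generated file.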

These problems are not specific to XML, Ruby, and ERB. They apply to almost any templated environment. JSON, YAML, DNS zone files, Apache configs, Nginx configs, Haproxy configs, Scala files, EDN, property lists... all have their own syntactic rules which require translation. Writing any one of these files requires a careful understanding of the interaction between the two languages. -1.23e-15 is a perfectly valid Ruby number. Is it a valid number in your config language?
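As a rough illustration of the number question, suppose the target config language only accepts plain decimals such as 10, -3, or 0.25. That grammar, and the `PLAIN_DECIMAL` regex standing in for it, are assumptions made up for this sketch and do not describe any particular tool.

```ruby
# Hypothetical grammar for a config language that only accepts plain decimals.
PLAIN_DECIMAL = /\A-?\d+(\.\d+)?\z/

value = -1.23e-15

# A template's <%= value %> emits the value's to_s form.
rendered = value.to_s

# Rendering with an explicit fixed-point format instead of relying on to_s.
fixed = format('%.17f', value)

puts rendered
puts PLAIN_DECIMAL.match?(rendered)  # scientific notation is rejected by this grammar
puts PLAIN_DECIMAL.match?(fixed)     # the fixed-point rendering is accepted
```

The translation rule lives in the template author's head, not in either language, which is why it is so easy to forget.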

"Well, we don't use unusual strings in this variable, so this isn't a problem." I say this to myself all the time, when writing a templated config, and I use it to justify not writing the correct escaping code. A few weeks later someone (likely me) copies that line into a new context without thinking, or adds a property with a quote in it, or a password with an ampersand, and things explode. In my experience, making errors at these boundaries is common in config management.

Moreover, we've split the config file into two tightly-coupled pieces in different files. The behavior of nginx now depends on the template and the structure of a hashmap in a Chef template resource--created by arbitrary code--which in turn depends on the structure of a node and all sorts of other implicit variables brought into existence by Chef. Maybe there are databags or LDAP involved. You've got all sorts of interacting components involved in the behavior of a system. This is not reduced complexity. In order to reason about what nginx does, you have to reason about the connections between all these moving parts.

Static Verification

The advantage, one could argue, of simple configuration files is that they are, in theory, statically verifiable. I can run nginx -t on a config file, and it'll tell me whether I got it right.

Well, not quite. Nginx will tell me whether the file appears syntactically consistent. It may also tell me, to some extent, whether the file represents a logically consistent configuration. It cannot (just like all static analysis) tell me whether the config is isomorphic to the ideas in my head, or in my boss' head.

Just like programming languages, configuration languages are statically verifiable to different degrees. One could imagine a configuration language in Haskell, where the type system could statically verify the syntactic, and to a large extent the semantic, correctness of the config before running it. Riemann uses Clojure for its configuration language, and it will catch syntactic and logical errors in its test phase as well. For instance:

(let [graph (graphite {:host "1.2.3.4"})]
  (streams
    (where (service "foo")
      grap)))

This is a syntactically valid Clojure file, but Riemann will still refuse to load it: if you kill -HUP Riemann and it looks at this file, it'll tell you "I won't apply these changes, because there is no such variable 'grap' on this line."

But Riemann can't prevent you from making all sorts of errors, like

(where (service "foo")
  (where (service "bar")
    prn))

This is a valid program. It's also wrong: no events will ever reach prn. Every system has this boundary, to varying degrees. Nginx will happily proxy to the wrong server. Grep will search for a mis-spelled word. In programming, we need both static checks and runtime tests to verify the program's correctness. The same applies to configuration.

Many operations folks call this a staging environment--and I think gradually introducing a new configuration (just like new code) into production systems is important. But there might also be a role for an in-between verification: automated tests for configuration. This... isn't something I think is commonly addressed, insofar as most programs make it difficult to test their behavior in isolation. Testing an nginx config means actually running nginx.

One interesting consequence of generating your config from a configuration management system is that your test can no longer operate on either of those systems in isolation. You need to run the full toolchain to find a mistake.

If there's something we've learned from software development, I think it's that end-to-end testing is an important part of a balanced breakfast, but it's not the end-all-be-all of testing. We need a way to test components by themselves, and to compose those components into reliable systems as a whole.

To that end, I think expressing your configuration in a first-class language with constructs like variables, namespaces, and functions is really quite promising. Not only can we use these techniques to break up and manage complexity in the configuration--but those same divisions can be tested in isolation. Since Riemann streams are functions, you can test a stream you've written in your configuration by calling it. Testing streams at multiple levels, as they compose, makes it possible to gradually verify larger and larger parts of your config, without having to reason about the entire system at once.
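To make the idea concrete without depending on Riemann itself, here is a toy model in Ruby, chosen only because Ruby already appears in the Chef example. A stream is modeled as a lambda that takes an event hash. `where_service` and `collector` are hypothetical helpers invented for this sketch, not Riemann's API. Because each piece is an ordinary function, a test can call it directly, and the same kind of test exposes the nested `where` mistake from the previous section.

```ruby
# Toy model: a "stream" is any callable that accepts an event (a Hash).
# where_service and collector are hypothetical helpers, not Riemann's API.

# Returns a stream that forwards an event to its children only when the
# event's :service equals the given name.
def where_service(name, *children)
  lambda do |event|
    children.each { |child| child.call(event) } if event[:service] == name
  end
end

# Returns a terminal stream plus the array it records events into,
# so a test can inspect what actually arrived.
def collector
  seen = []
  [->(event) { seen << event }, seen]
end

events = [{ service: 'foo' }, { service: 'bar' }]

# A single filter, tested in isolation simply by calling it.
sink, seen = collector
stream = where_service('foo', sink)
events.each { |event| stream.call(event) }
puts seen.inspect

# The nested mistake: no event has a service of both "foo" and "bar".
nested_sink, seen_nested = collector
nested = where_service('foo', where_service('bar', nested_sink))
events.each { |event| nested.call(event) }
puts seen_nested.inspect
```

The design choice that matters here is that filters and sinks share one calling convention, so small pieces and their compositions can be exercised by the same test harness.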

This is not something which is common. This is not something Riemann is particularly good at. But it's something I think we can, as developers of configurable systems, provide. I've put a ton of work into designing simple, reproducible automated tests for Riemann's streams, and I think that work can be adapted to test configuration as well.

@pims
pims commented Mar 20, 2013

Does “configuration as code” work well with multiple environments? Would I need to package different jars/wars for each environment because the configuration changes? Or do you recommend handling the difference within the code itself?
