@danielsdeleo
Last active June 17, 2017 01:20
Alternative Implementation of Environments Discussion

Rethinking Environments

So, I didn't think of any of this until just about the time that we released the environments feature, which was way too late to do anything about it. Anyway, here's my alternate conception of environments.

Basic Interactions

  • Cookbooks always get their version from a checksum of their contents. The way the checksum is generated isn't particularly important, but it's probably a checksum of all the individual files' checksums, like a Merkle tree without the tree.
  • knife cookbook upload always uploads to a particular environment. The command line invocation can be the same as it is now, or more git-like, e.g., knife cookbook upload ENV COOKBOOK. The "_default" environment is assumed if no environment is given.
  • knife cookbook upload always constrains the environment to that exact version of the cookbook.
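The interactions above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`cookbook_checksum`, `upload`), the use of SHA-256, and the in-memory `environments` dict are all hypothetical and are not part of Chef or knife.

```python
import hashlib
from typing import Dict


def file_checksum(content: bytes) -> str:
    """Checksum of a single file's contents (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def cookbook_checksum(files: Dict[str, bytes]) -> str:
    """Checksum of all the individual files' checksums.

    Paths are sorted so the result does not depend on iteration order,
    and each path is hashed too so that renaming a file changes the version.
    """
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(file_checksum(files[path]).encode("ascii"))
        digest.update(b"\n")
    return digest.hexdigest()


def upload(environments: Dict[str, Dict[str, str]],
           cookbook_name: str,
           files: Dict[str, bytes],
           env: str = "_default") -> str:
    """Hypothetical upload: pin `env` to exactly this cookbook checksum."""
    version = cookbook_checksum(files)
    environments.setdefault(env, {})[cookbook_name] = version
    return version
```

Uploading the same files twice yields the same version, while any change to a file's contents or path produces a new one; that property is what lets the environment pin an exact version.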

What You Lose

  • You can't have two active versions of a cookbook in a single environment. If cookbook A needs cookbook Z at version 1 and cookbook B needs cookbook Z at version 2, and you need both A and B to run your infrastructure, then you need to modify one or more of the cookbooks to fix the incompatibility.
  • The x.y.z version field of cookbook data may no longer have meaning to the chef-server, as environments would only care about the cookbook's checksum version. See next bullet point for discussion of how version constraints might be handled.
  • Versioned dependencies in cookbook metadata must either be ignored, enforced by chef-client after fetching cookbook updates, or verified by the server upon upload. Each of these approaches has unfavorable trade-offs. Enforcing version constraints on the client side adds considerable delay between when a user causes an error and when they detect it; enforcing version constraints on the server may make it annoying to upload multiple cookbooks with "interlocking" version constraints. Ignoring version constraints entirely will likely confuse users who expect version constraints to be respected and potentially allow conflicting cookbook versions to be used together.
  • Environments become less versionable (i.e., it's more difficult to track them as files in git); most likely, you'd upload environment attributes separately from version restrictions, which would in many cases be changing pretty frequently.
  • Environments always constrain a cookbook to exactly one version. There's no greater than/less than/pessimistic greater than.
  • Compared to tracking an environment as a file in a revision controlled repo, it's trickier to snapshot what version of everything is in use at a given time. Some tooling could help, for example by tagging a git repo with a cookbook's checksum at a particular commit.

What You Get

  • Every cookbook is always frozen, since if it changes it'll have a different checksum. Since cookbooks are individually added to environments, there's much less chance of a dev cookbook being pushed to production.
  • Workflow is much simpler. You push a cookbook to the environment where you want it to go. Compare that to editing environment files or JSON, or setting up CI to compile an environment. Even if you use -E and --freeze with knife cookbook upload, you still have to fiddle with the cookbook's version numbers by hand.
  • More efficient on the server-side: the current implementation requires chef-server to load every version of every cookbook (including dependency information) to compute the correct solution to the various version constraints. This uses lots of database IO and memory, and the resource requirements increase as you have more cookbook versions.
  • It's easy to understand what version of a cookbook will be used on a node. If you add a dependency "B" to cookbook "A", then upload a new version of "A" without uploading any version of "B", chef-server will act as if the new version of "A" doesn't exist, because its dependencies can't be satisfied. Though knife cookbook upload guards against this particular case, I've observed similar "why isn't the new version of cookbook X being used" issues that could never be traced to a particular cause.
  • It's efficient to add a concept of promotion of cookbooks between environments, since you're simply copying one cookbook's checksum to a different environment's constraints.
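As a rough sketch of why promotion is cheap under this model: promoting a cookbook amounts to copying one pinned checksum from one environment's constraints to another's. The `promote` function and the dict layout below are hypothetical and chosen only for illustration; they are not an existing Chef API.

```python
from typing import Dict

# Assumed layout: environment name -> {cookbook name -> pinned checksum}
Environments = Dict[str, Dict[str, str]]


def promote(environments: Environments,
            cookbook: str,
            source: str,
            target: str) -> str:
    """Copy the checksum pinned in `source` into `target`'s constraints.

    Raises KeyError if the cookbook is not pinned in the source environment,
    so something that was never uploaded there cannot be promoted.
    """
    checksum = environments[source][cookbook]
    environments.setdefault(target, {})[cookbook] = checksum
    return checksum
```

No cookbook content moves and no version numbers are edited; only the target environment's pin changes.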

Computation of Version Constraints

There are two possibilities for selecting the final set of version constraints in an environment. The first option is an overlay model, where any cookbooks without explicit constraints fall back to using the "_default" environment's constraints. The second is to explicitly version every cookbook in every environment.

In the first model (fallback to default), the version constraints of the "_default" environment are mutable, but only by the Chef server; the Chef server will set them whenever a cookbook is uploaded with no explicit environment.

The set of version constraints is then computed by overlaying the constraints of the node's environment, e.g., "production" on top of the "_default" environment's constraints.

The downside to this model is that the user could introduce a cookbook that should be version constrained into, say, production without setting a constraint. A future update to the non-constrained cookbook could then break production.

In the second model, a cookbook without a constraint in a particular environment would appear not to exist in that environment. This fixes the accidental update issue. On the other hand, it adds friction to user interactions around creating new cookbooks, since a new cookbook has to be added to each environment.
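The difference between the two models can be shown with a small sketch. It assumes, purely for illustration, that each environment's constraints are a dict from cookbook name to pinned checksum; the function names are hypothetical.

```python
from typing import Dict

# Assumed layout: environment name -> {cookbook name -> pinned checksum}
Environments = Dict[str, Dict[str, str]]


def overlay_constraints(environments: Environments, env: str) -> Dict[str, str]:
    """First model: start from "_default", then overlay the node's environment."""
    effective = dict(environments.get("_default", {}))
    effective.update(environments.get(env, {}))
    return effective


def explicit_constraints(environments: Environments, env: str) -> Dict[str, str]:
    """Second model: only cookbooks pinned in `env` exist in `env`."""
    return dict(environments.get(env, {}))


environments = {
    "_default": {"nginx": "aaa111", "ntp": "bbb222"},
    "production": {"nginx": "ccc333"},
}

# Overlay: "ntp" silently follows "_default" into production.
overlay = overlay_constraints(environments, "production")

# Explicit: "ntp" does not exist in production until someone pins it there.
explicit = explicit_constraints(environments, "production")
```

In the overlay result, a later upload of "ntp" with no explicit environment would change what production runs, which is the accidental-update risk described above; in the explicit result that cannot happen, at the cost of pinning every new cookbook in every environment.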

Tooling Changes and API Features

  • Add promotion as a first class concept at either the UX level or API level. Single cookbook, multiple cookbook (e.g., single cookbook plus dependencies), and whole environment promotion should be possible.
  • Checksums don't sort temporally, so there needs to be automatic metadata to help track and sort which version was uploaded when. Timestamp, creator, SCM id, etc. could all help here.
  • Checksums don't convey semantic meaning, so they should be displayed along with automatic metadata in UIs. Highlighting which versions are assigned to a particular environment would also be helpful.
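One way the automatic metadata could look is sketched below. The `UploadRecord` fields, the helper names, and the display format are all assumptions made for illustration; nothing here reflects an existing Chef data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class UploadRecord:
    """Hypothetical metadata recorded automatically at upload time."""
    checksum: str
    uploaded_at: datetime
    creator: str
    scm_id: str


def history(records: Iterable[UploadRecord]) -> List[UploadRecord]:
    """Order versions by upload time, newest first, since checksums don't sort."""
    return sorted(records, key=lambda r: r.uploaded_at, reverse=True)


def describe(record: UploadRecord,
             environments: Dict[str, Dict[str, str]]) -> str:
    """One display line: short checksum, metadata, and environments pinned to it."""
    pinned = sorted(
        env for env, pins in environments.items()
        if record.checksum in pins.values()
    )
    marker = " [" + ", ".join(pinned) + "]" if pinned else ""
    return (
        f"{record.checksum[:12]}  {record.uploaded_at.isoformat()}  "
        f"{record.creator}  {record.scm_id}{marker}"
    )
```

A UI could call `history` to list a cookbook's versions and `describe` to show each one with the environments currently pinned to it.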
@danielsdeleo
Author

@aglarond

  1. Workflow simpler: I don't see how this proposal prevents you from doing uploads from CI. Just have CI do the uploads to your prod environment. Conversely, if Chef is not usable without setting up a ton of related tooling, that's a pretty big deficiency.
  2. Server-side efficiency: Solving a dependency graph is an NP-complete problem. There's no getting around that. In the general case we've made it work okay, and the solving part generally takes a few milliseconds; however, some dependency graphs can take a minute or more to solve with the same code. You can investigate Berkshelf's or Bundler's solver if you want to understand this by looking at working code.

@lamont-granquist

I think a problem with this is that people carve up their production environment into a lot of different zones for how they do deployments, so the concept of 'environment' doesn't map well to how they do deployment. We can either try to change how people use environments so that it maps better onto deployment zones, or we can introduce another manageable thing into Chef. There are obvious upsides and downsides to each approach.

If you have 80 different pods of servers in production that you can deploy to, you have to change your app config or your deployment cookbooks to map your app environment to your Chef environment correctly. You also need to change your searches so that they can reach across environments but still select resources that are in the old concept of the 'production' environment. A best practice might be to name production environments 'production_(appname)'; cross-environment search would then just look for the wildcarded 'chef_environment:production*'. After that, either you fix your environment names in all your app config, or you modify the application cookbook to use the 'production' prefix to find the environment.

Alternatively, you could introduce a concept of zones or pods or something similar, which would be a way of slicing up your hosts according to how you deploy changes. This would let people keep 'production' as their environment for everything running in production. Then the cookbook pinning you are talking about would apply to zones/pods instead of to environments. I suspect you wind up demoting environments to being not much more than a tag in this case, with the new zone/pod concept doing much of the work that environments used to do.

I think I'm tentatively more in favor of changing how environments are used and addressing it with best practices rather than introducing more primitives, but I think that is going to introduce more cognitive load on users, and needs to be fleshed out more with docs and examples. I can extrapolate a bit to how you'd handle managing the details of 80 environments to push to, but I think that may be way too much for most users to just figure out.
