Alternative Implementation of Environments Discussion

Rethinking Environments

So, I didn't think of any of this until just about the time that we released the environments feature, which was way too late to do anything about it. Anyway, here's my alternate conception of environments.

Basic Interactions

  • Cookbooks always get their version from a checksum of their contents. The way the checksum is generated isn't particularly important, but it's probably a checksum of all the individual files' checksums, like a Merkle tree without the tree (a minimal sketch follows this list).
  • knife cookbook upload always uploads to a particular environment. The command line invocation can be the same as it is now, or more git-like, e.g., knife cookbook upload ENV COOKBOOK. The "_default" environment is assumed if no environment is given.
  • knife cookbook upload always constrains the environment to that exact version of the cookbook.
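A minimal sketch of how such a content digest might be computed, assuming a flat hash over sorted per-file checksums; the helper name and exact scheme are illustrative, not part of the proposal:

```ruby
require "digest/sha2"

# Hypothetical helper: derive a cookbook "version" from its contents by
# hashing the sorted list of per-file checksums -- a flat, Merkle-style digest.
def cookbook_digest(cookbook_root)
  file_checksums = Dir.glob(File.join(cookbook_root, "**", "*"))
                      .select { |path| File.file?(path) }
                      .sort
                      .map { |path| Digest::SHA256.file(path).hexdigest }
  Digest::SHA256.hexdigest(file_checksums.join("\n"))
end

# On upload, knife would pin exactly this digest into the target environment.
puts cookbook_digest("cookbooks/apache2")
```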

What You Lose

  • You can't have two active versions of a cookbook in a single environment. If cookbook A needs cookbook Z at version 1 and cookbook B needs cookbook Z at version 2, and you need both A and B to run your infrastructure, then you need to modify one or more of the cookbooks to fix the incompatibility.
  • The x.y.z version field of cookbook data may no longer have meaning to the chef-server, as environments would only care about the cookbook's checksum version. See next bullet point for discussion of how version constraints might be handled.
  • Versioned dependencies in cookbook metadata must either be ignored, enforced by chef-client after fetching cookbook updates, or verified by the server upon upload. Each of these approaches has unfavorable trade-offs. Enforcing version constraints on the client side adds considerable delay between when a user causes an error and when they detect it (see the client-side sketch after this list); enforcing them on the server may make it annoying to upload multiple cookbooks with "interlocking" version constraints; and ignoring version constraints entirely will likely confuse users who expect them to be respected and could allow conflicting cookbook versions to be used together.
  • Environments become less versionable (i.e., it's more difficult to track them as files in git); most likely, you'd upload environment attributes separately from version restrictions, which in many cases would change quite frequently.
  • Environments always constrain a cookbook to exactly one version. There's no greater than/less than/pessimistic greater than.
  • Compared to tracking an environment as a file in a revision controlled repo, it's trickier to snapshot what version of everything is in use at a given time. Some tooling could help, for example by tagging a git repo with a cookbook's checksum at a particular commit.
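As a concrete illustration of the client-side option mentioned above, here is a minimal sketch using Ruby's built-in Gem::Requirement for constraint matching; the method name and data shapes are assumptions:

```ruby
require "rubygems" # provides Gem::Requirement and Gem::Version

# Hypothetical post-fetch check run by chef-client: verify that every declared
# dependency constraint is satisfied by the version pinned in the environment.
def verify_constraints!(declared_deps, pinned_versions)
  declared_deps.each do |cookbook, constraint|
    pinned = pinned_versions[cookbook]
    raise "dependency #{cookbook} is not pinned in this environment" unless pinned
    unless Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(pinned))
      raise "#{cookbook} #{pinned} does not satisfy #{constraint}"
    end
  end
end

verify_constraints!({ "apt" => "~> 2.0.0" }, { "apt" => "2.0.4" }) # passes
verify_constraints!({ "apt" => "~> 2.0.0" }, { "apt" => "3.0.0" }) # raises
```

The downside, as noted above, is that the failure surfaces only on the node, well after the upload that caused it.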

What You Get

  • Every cookbook is always frozen, since if it changes it'll have a different checksum. Since cookbooks are individually added to environments, there's much less chance of a dev cookbook being pushed to production.
  • Workflow is much simpler. You push a cookbook to the environment where you want it to go. Compare that to editing environment files or JSON, or setting up CI to compile an environment. Even if you use -E and --freeze with knife cookbook upload, you still have to fiddle with the cookbook's version numbers by hand.
  • More efficient on the server-side: the current implementation requires chef-server to load every version of every cookbook (including dependency information) to compute the correct solution to the various version constraints. This uses lots of database IO and memory, and the resource requirements increase as you have more cookbook versions.
  • It's easy to understand what version of a cookbook will be used on a node. If you add a dependency "B" to cookbook "A", then upload a new version of "A" without uploading any version of "B", chef-server will act as if the new version of "A" doesn't exist, because its dependencies can't be satisfied. Though knife cookbook upload guards against this particular case, I've observed similar "why isn't the new version of cookbook X being used" issues that could never be traced to a particular cause.
  • It's efficient to add a concept of promotion of cookbooks between environments, since you're simply copying one cookbook's checksum to a different environment's constraints (sketched below).
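A sketch of what promotion could look like in data terms, reusing the existing cookbook_versions field name but with content digests as values (an assumption of this proposal, not the current API):

```ruby
# Hypothetical promotion: copy one cookbook's pinned digest from a source
# environment's constraint map into a destination environment's constraint map.
def promote(cookbook_name, from_env, to_env)
  digest = from_env["cookbook_versions"].fetch(cookbook_name)
  to_env["cookbook_versions"][cookbook_name] = digest
  to_env
end

staging    = { "cookbook_versions" => { "apache2" => "9f2c4e1..." } }
production = { "cookbook_versions" => {} }

promote("apache2", staging, production)
# production now pins apache2 to exactly the digest that was tested in staging.
```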

Computation of Version Constraints

There are two possibilities for selecting the final set of version constraints in an environment. The first option is an overlay model, where any cookbooks without explicit constraints fall back to using the "_default" environment's constraints. The second is to explicitly version every cookbook in every environment.

In the first model (fallback to default), the version constraints of the "_default" environment are mutable, but only by the Chef server; the Chef server will set them whenever a cookbook is uploaded with no explicit environment.

The set of version constraints is then computed by overlaying the constraints of the node's environment, e.g., "production" on top of the "_default" environment's constraints.
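In data terms, the overlay is a simple merge in which the node's environment wins; a minimal sketch, with constraint maps as plain hashes keyed by cookbook name and digests truncated for readability:

```ruby
# Hypothetical overlay: cookbooks pinned in the node's environment use that
# pin; anything else falls back to the "_default" environment's pin.
def effective_constraints(default_env, node_env)
  default_env["cookbook_versions"].merge(node_env["cookbook_versions"])
end

default_env = { "cookbook_versions" => { "apache2" => "9f2c4e...", "mysql" => "1d77ab..." } }
production  = { "cookbook_versions" => { "apache2" => "53be09..." } }

effective_constraints(default_env, production)
# => { "apache2" => "53be09...", "mysql" => "1d77ab..." }
# mysql was never explicitly pinned in production, which is exactly the
# accidental-update risk described below.
```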

The downside to this model is that the user could introduce a cookbook that should be version constrained into, say, production without setting a constraint. A future update to the non-constrained cookbook could then break production.

In the second model, a cookbook without a constraint in a particular environment would appear not to exist in that environment. This fixes the accidental update issue. On the other hand, it adds friction to user interactions around creating new cookbooks, since a new cookbook has to be added to each environment.

Tooling Changes and API Features

  • Add promotion as a first class concept at either the UX level or API level. Single cookbook, multiple cookbook (e.g., single cookbook plus dependencies), and whole environment promotion should be possible.
  • Checksums don't sort temporally, so there needs to be automatic metadata to help track and sort which version came when. Timestamp, creator, SCM id, etc. could all help here (a sketch follows this list).
  • Checksums don't convey semantic meaning, so they should be displayed along with automatic metadata in UIs. Highlighting which versions are assigned to a particular environment would also be helpful.
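A sketch of what such server-generated metadata could contain, recorded at upload time alongside (but never folded into) the content digest; the field names are assumptions:

```ruby
require "time" # for Time#iso8601

# Hypothetical record the server would attach to each uploaded cookbook
# version. None of these fields feed into the content digest itself.
def automatic_metadata(uploader:, scm_revision: nil)
  {
    "uploaded_at"  => Time.now.utc.iso8601, # gives a temporal sort order
    "uploaded_by"  => uploader,             # who pushed it
    "scm_revision" => scm_revision,         # e.g. the git SHA it was built from
  }
end

automatic_metadata(uploader: "danielsdeleo", scm_revision: "a1b2c3d")
```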
@danielsdeleo (author)

Addition to "tooling changes": we would possibly want to add ad hoc environment creation, e.g., knife cookbook upload -E A_NEW_ENV. With super-frictionless environment use (plus faster cookbook upload round-trip time), environments could be a viable model for sandboxed cookbook development (e.g., in Vagrant).

@seth commented Jan 4, 2013

First, I really appreciate the detailed description of the idea. More like this, please.

I'm not sure I'm following the implications in terms of versioned cookbook dependencies not having much meaning. Limiting environments to a single version of a cookbook makes it possible to "solve" dependencies at cookbook upload/delete time. But it still would be possible to describe versioned cookbook dependencies.

@seth commented Jan 4, 2013

Another question is how cookbook versions are shared across servers. The cookbook digest is fixed, but if you add a timestamp or just about anything else for ordering, then you have to maintain the ability to get just the digest so that you can know that a cookbook on two different servers is actually the same -- or even that the same cookbook version uploaded to different environments on the same server at different times is the same, etc.

@danielsdeleo (author)

Thanks!

  1. Probably a poor choice of words re: "not much meaning" for cookbooks' x.y.z version numbers. But the idea is that if environments specify cookbooks by checksum, then the x.y.z version number becomes more of a thing for humans to look at and less of a thing that affects responses to API calls.
  2. As for solving cookbook deps at upload time, there are some tricky edge cases, such as uploading two new cookbooks with a circular dependency on each other. Depending on how much we value that sanity check, we could modify the upload APIs to make it work, refuse to support circular deps (and other nasty edge cases), add a warning concept to our API responses, or do something else. We'd also have to figure out what to do if a user attempts to upload a cookbook that does not meet the dependency requirements of an existing cookbook; e.g., if A depends on ~> 2.0.0 of B, and I attempt to upload version 3.0 of B, what happens? These are all answerable questions, though my intuition is that a chef-client runtime check would be the least frustrating way of enforcing cookbook-level dependencies.

@danielsdeleo (author)

RE: Sharing cookbooks across servers

I've thought a bit more about how the cookbook digest would be computed, and my current thinking is that you'd need to include some parts of the metadata as part of the digest. For example, adding a dependency changes the runtime behavior of the cookbook, so you'd want to make that part of the digest. Contrarily, the (proposed) automatic metadata doesn't alter the behavior of the cookbook, so it wouldn't be considered when computing the digest.
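To make that concrete, a revised digest sketch might fold the declared dependency list (behavior-affecting metadata) into the hash input while leaving the automatic metadata out; as before, the helper name and input shapes are illustrative only:

```ruby
require "digest/sha2"

# Hypothetical digest covering file contents plus behavior-affecting metadata
# (the dependency list), but not server-generated automatic metadata.
def cookbook_digest(file_checksums, dependencies)
  dependency_lines = dependencies.sort.map { |name, constraint| "#{name} #{constraint}" }
  Digest::SHA256.hexdigest((file_checksums.sort + dependency_lines).join("\n"))
end

cookbook_digest(
  ["3c9d2f...", "88ab10..."],                  # per-file checksums
  { "apt" => ">= 1.0.0", "mysql" => "~> 2.0" } # declared dependencies
)
```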

As for uploading an existing cookbook version to a new environment, there are at least two ways to go about it. One is that the server recognizes that the cookbook version already exists and simply does a promotion, with no modifications to any automatic metadata. Another is that a new cookbook version object is created and the automatic metadata is updated. In either case, you definitely want to expose the cookbook digests to the user. I'm thinking that, at minimum, a GET of an environment should show the version constraints in terms of cookbook digests, and a GET to cookbooks/:cookbook should list the known versions in terms of the digest (and include other metadata as useful for sorting, etc.).

@dysinger

I really like the immutable checksum-of-contents idea, like Git/Nix. Good stuff. I don't care about versions too much; I care that the content is correct. ... even better if I can push my chef-repo with git.

I like environments as a total sandbox (roles, data_bags, cookbooks, everything). I can then create as many as I like -- even an environment for my laptop's VMs without fear of bumping another stage in the pipeline like QA.

@yfeldblum

Either add multitenancy and tenant isolation, or don't. In a multi-tenant model, cookbooks, roles, data-bags, nodes, and clients would all belong to an environment; fetches and searches would be single-environment by default, but there could be pan-environment fetch and search as well. Perhaps node and client names would be globally unique across all environments, while cookbook, role, and data-bag names would be locally unique only within each environment. Cookbook dependency resolution for a node would be done across the cookbooks in the node's environment.

But this in-between-ness is going to cause problems, no matter where the line falls in the gradient between a single-tenant model and a multi-tenant model with inter-tenant isolation. It's the mixing-and-matching that causes the problems, because it constitutes fake, broken abstractions.

Right now, an environment has no clear meaning; the same is true under this gist. "Cookbook promotion" and "environment promotion" sound to me like hacks designed to take advantage of something that environments have going for them, but to work around environments currently being a fundamentally broken abstraction. Make environments map to isolated tenants, and the problem goes away.

The isolated tenants which environments would map to would be less isolated than organizations in Hosted/Private Chef. There would be no pan-organization fetch and search allowed, and node and client names would be unique only to a given organization. Commands like knife node list would be pan-environment but single-organization. The environment isolation would be non-strict in that it can be escaped on-demand but is enforced by-default, while organization isolation would be strict in that it cannot be escaped.

@realityforge

I would love to see versions of cookbooks generated as a digest of relevant (or all?) metadata. However, we also regularly use multiple versions of the same cookbook in the same environment. It is not uncommon for us to lock a top-level cookbook in the environment file and have it explicitly list the versions of its dependencies. It would make QA significantly harder if we had to test upgrades of one cookbook across multiple application stacks just to support an upgrade in one application stack. It would be nice to be able to lock the versions of dependent cookbooks on upload -- effectively uploading a dependency tree starting from one or more top-level cookbooks (TLCs), much like the model of git. It would also be nice to be able to promote the TLCs across environments and take all their dependencies across.

We currently model environments as the traditional environment notions: prod, staging, uat, integration-test, dev, and chef-dev. We have about 11 different stacks and several teams all independently evolving their stacks. Under the new regime we would end up having to model 80-90 environments just to maintain effective change control. This does not seem right.

What this proposal seems to be doing is creating a new feature and giving it the same name as an existing one. I agree with @yfeldblum -- environments have no clear meaning, and I don't think this approach solves that. I would also hate to lose the tools to effectively model complex scenarios.

@danielsdeleo (author)

@realityforge - Thanks for sharing your use case, certainly a valuable data point that I'll consider. Do you make extensive use of environment attributes as part of your dev/qa/prod/etc. separation?

Outside of that issue, I think managing 80-90 environments actually wouldn't be a problem because it would be really frictionless to update constraints. For an example of what the tooling might look like, you can see my prototype implementation here: https://github.com/danielsdeleo/knife-boxer

@realityforge

We make very limited use of environment attributes. Generally the only thing we do is pass along "tagging" information that is later interpreted by a recipe.

For example, for each environment we specify a 'datacenter' attribute that is interpreted in several recipes: the nameservers (which are not Chef-managed) are configured based on the datacenter, as are the mail servers (also not managed by Chef).

We also set a 'cookbook_environment' attribute; if it is set to 'stable', then all cookbooks must be locked and have an even version number.

knife-boxer looks interesting. But does that not force you down the path of having a separate branch (or cookbook path etc) for each environment? While we store all our reusable cookbooks in separate repos we braid them into a central repo. Under the knife-boxer model, how would you expect to maintain that many separate environments? Long running branches?

@danielsdeleo (author)

@realityforge that's not a workflow I've seen before. You said you have 80-90 cookbook sets now between various environments and applications--how do these get uploaded and moved through the various envs? Do you have a way to get a view of the cookbooks used by a particular app/env combination on disk on your workstation?

@aglarond commented Apr 3, 2013

First of all, I'd like to thank you for opening this discussion up to the public. I appreciate the opportunity to see some of the thoughts behind possible future paths for Chef, and being able to comment on them.

I'm just going to comment on two of the "What You Get" points, followed by some general thoughts.

"Workflow is much simpler."

I disagree. Going this route with changing the cookbook <-> environment interaction, you force a workflow in which "what I have on my workstation" is king. We have a different workflow in which a production-ready cookbook first has to be checked-in to revision control. If you notice while pushing that someone else has already made some changes, they can be merged/discussed/etc. before the upload. Using your proposal, that interaction is now missing. I could have a stale cookbook that just happened to get uploaded to "prod" because I was sloppy in executing the 'knife cookbook upload' from my shell history. Freezing a cookbook and locking that version to an environment by hand is a good thing.

"More efficient on the server-side"

(Please forgive any misunderstandings stemming from my ignorance of the implementation details.) I don't know why the current implementation requires loading every version of every cookbook, or whether this is even still the case with Chef 11. Isn't this just metadata? Loading version and dependency information shouldn't require much I/O or memory; the whole cookbook doesn't need to be loaded for this. If the current implementation is inefficient, change the implementation, not the methodology and associated workflow.

General Thoughts

I like having one set of cookbooks and multiple environments. It makes managing the "active set" much easier. knife-boxer looks like a great way to manage a development workflow; I look forward to integrating this into our current workflows. But, it shouldn't be the new way of managing cookbook versions and environments.

@danielsdeleo (author)

@aglarond

  1. Workflow simpler: I don't see how this proposal prevents you from doing uploads from CI. Just have CI do the uploads to your prod environment. Contrarily, if Chef is not usable without setting up a ton of related tooling, that's a pretty big deficiency.
  2. Server-side efficiency: solving a dependency graph is an NP-complete problem; there's no getting around that. In the general case we've made it work okay -- the solving part generally takes a few milliseconds -- but some dependency graphs can take a minute or more to solve with the same code. You can investigate Berkshelf's or Bundler's solver if you want to understand this by looking at working code.

@lamont-granquist

I think a problem with this is that people carve up their production environment into a lot of different zones for how they do deployments, so the concept of 'environment' doesn't map well onto how they deploy. We can either try to change how people use environments so that it maps better onto deployment zones, or we can introduce another manageable thing into Chef. There are obvious upsides and downsides to each approach.

If you have 80 different pods of servers in production that you can deploy to, you have to change your app config or your deployment cookbooks to map your app environment to your Chef environment correctly. You also need to change your searches so that they can reach cross-environment while still selecting resources that are in the old concept of the 'production' environment. A best practice might be to name production environments 'production_(appname)'; cross-environment search would then just look for a wildcarded 'chef_environment:production*' (as in the snippet below). Then you either need to fix your environment names in all your app config or modify the application cookbook to use the 'production' prefix to find the environment.
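For example, a recipe that today searches within the single 'production' environment could widen its query under the naming convention above; this uses Chef's standard recipe search DSL, while the production_(appname) prefix is the convention proposed here, not anything built in:

```ruby
# Recipe fragment: find peers across all per-app production environments
# named with the proposed "production_<appname>" convention.
production_nodes = search(:node, "chef_environment:production*")

# Single-environment form, usable when "production" is one Chef environment:
# production_nodes = search(:node, "chef_environment:production")
```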

Alternatively, you could introduce a concept of zones or pods or something, which would be a way of slicing up your hosts according to how you deploy changes. This would let people keep 'production' as their environment for everything running in production. Then the cookbook pinning you are talking about would apply to zones/pods instead of to environments. I suspect you wind up demoting environments to not much more than a tag in this case, and wind up with the new zone/pod concept doing much of the work that environments used to do.

I think I'm tentatively more in favor of changing how environments are used and addressing it with best practices rather than introducing more primitives, but I think that is going to introduce more cognitive load on users, and needs to be fleshed out more with docs and examples. I can extrapolate a bit to how you'd handle managing the details of 80 environments to push to, but I think that may be way too much for most users to just figure out.
