
@trlinkin
Last active September 17, 2019 21:40

Bolt Apply

In developing Puppet code, as with most things, the ability to iterate rapidly is essential. Quick experimentation enables one not only to find the best solution, but to challenge assumptions and explore new options. Experimentation comes at a cost, though, and we typically measure that cost in time. To make matters worse, we usually have only so much time to develop code or deliver a solution before we must move on. Experimentation thus falls by the wayside as its cost in time exceeds what we have allotted. As a result, while we may arrive at a solution, we lose the benefit of testing multiple hypotheses and the intuition for future solutions that it creates. Ultimately, we want to reduce the cost of experiments by removing their perceived cost in time.

The cost of an experiment is not just the time it takes to create and run the test, but also the potential cost of failing and needing to reset for the next attempt. Ideally, we want to cut the time spent setting up the experiment. We can then reinvest that time to iterate over more potential solutions, and, curiously enough, we will naturally try newer and more innovative ones. When failure might mean large quantities of time diverted away from actually finding the solution and spent instead on resetting for a new attempt, people naturally start avoiding approaches they perceive as likely to require a reset afterwards. They stick to "safe" solutions that won't cause much of a wake in their testing environment. Even if another approach has real credibility and the potential to be the right solution, a perceived risk of time lost to the tedium of resetting means it will be unconsciously rejected before it is even attempted.

This is the trouble with the current testing structure at Duke Energy. The "private" Puppet Enterprise testing infrastructure is actually restricting the ability to develop strong solutions quickly! The testing environment can be corrupted by the wrong experiment, and when that happens the environment must be rebuilt (a reset) before any further work can continue. This is not only tedious, but not how anyone wants the time to be invested. Speaking of cost, there is also the cost of the resources themselves just to stand Puppet Enterprise up in this capacity for multiple testing environments. While there is a case for eventually testing Puppet code in the context of a fully operational Puppet Enterprise infrastructure, it isn't enough to justify the effort and time dedicated to the "private" Puppet Enterprise environments.

Chances are, and this is especially true for new Puppet users, that most development efforts pertain to a small section of code, or can be tested in isolation. Isolation here means that the whole Puppet code base need not be run, and the "correct" configuration from Hiera for the actual running environment is not fully required. Ideally, if I modify one module, I can run that one module in a limited capacity to gain confidence that my code produces the desired results. The same goes for profiles, and likely most roles. Production data only matters in production, and it should be tested and validated separately from the code design, structure, and development. After this basic testing, if the Puppet code happens to depend on some aspect of a Puppet Enterprise master, we can then test on a PE master and validate those aspects of the code. Also, by the time we bring the Puppet code to testing in the context of a fully functional PE infrastructure, we have a base level of confidence it will not corrupt our expensive (in a time sense) testing environment.

How do we do this then? Enter Bolt and more specifically bolt apply to save the day!

The Foundation

The goal is to perform experiments rapidly, and with a low cost. By "low cost" we of course refer to the time it takes to set up the experiment, either the first time or after an attempt destroys everything in a non-recoverable manner. Ideally, most development, especially exploratory experimental work, needs little more than a single host that we can run Puppet code (catalogs, really) against. With that goal in mind, we want to provide the ability to claim, on demand, a blank virtual machine on which to perform our testing. These systems under test (SUTs) should be considered disposable. Claim a node, do your work, and when you're done, or you've destroyed the node through your testing, throw it away. As such, these nodes do not need to be fully "secure" or fully privileged within the greater environment. They may need some resources such as package repositories, access to systems with test data, and lab-environment user identity management, but they absolutely do not need any real-world or production data, or even network-level access to those systems.

These test nodes should not even be managed by a PE infrastructure; it's simply not needed. When they are created, you may want Puppet installed and some basic code applied via puppet apply, but even that isn't essential. Exactly why will become clear when we delve into the details of the bolt apply process. For now, what matters is being able to request and be allocated a node for testing, one that always arrives the same way, repeatably, predictably, ready to test. About the only configuration these nodes need is a default non-root user that can escalate to root, with a known password and an SSH key that is considered insecure and passed around as part of testing. If there is concern about abuse or bloat, a reasonable policy is to give these nodes a TTL of 24 hours (or whatever makes everyone comfortable), after which they are absolutely destroyed with no option to prevent destruction. Nobody should treat these nodes in any permanent sense.

Bolt for Puppet Code Testing

Aside from the awesome ad-hoc task orchestration of Bolt, which you should look into if you're not familiar, there is a feature aimed squarely at using Bolt's orchestration powers for the good of Puppet code testing. That feature, bolt apply, allows Bolt to compile catalogs from Puppet code and then apply those catalogs on whatever remote nodes are specified. Bolt already knows how to do most of the hard part: it can process a Puppetfile, compile the catalog, and even install the Puppet agent on the remote machine so that the generated catalog can be processed and applied. The Puppet code parsing and catalog creation happen on the node from which bolt apply is executed; the catalog is then shipped over to the remote node and processed as though it had come from a PE infrastructure, but without all the effort of standing one up.

The Bolt project directory

Any directory can become a Bolt project directory by adding a bolt.yaml to it. The mere presence of the file tells Bolt "yeah, this is the place." From there you can use standard Bolt commands and run them in the context of the directory, assuming it's your present working directory. This file also allows Bolt to be configured for the context of this working directory, which is important if there are special behaviors you're looking to set Bolt up with. Bolt projects tend to contain a number of things, including modules with tasks and Puppet code, usually referenced and pulled in via a Puppetfile. Now you may see where this is going: Control Repos also have Puppetfiles in them. What this means is that to enable our code base for bolt apply, we will be adding a bolt.yaml to our Control Repo. More information on Bolt project directories can be found here.
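The idea can be sketched in a couple of commands; the directory name here is purely hypothetical:

```shell
# Minimal sketch: any directory becomes a Bolt project directory once
# a bolt.yaml sits at its root ("control-repo" is a hypothetical checkout).
mkdir -p control-repo
cd control-repo
touch bolt.yaml   # the file's presence alone tells Bolt "this is the place"
# From here, bolt commands run in the context of this directory.
```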

Setting up The bolt.yaml

The bolt.yaml contains options to modify the behavior of Bolt. This document will not go into all the settings, as they are detailed here. Instead we will look at some settings to get you started.

---
ssh:
  host-key-check: false
  private-key: ./development_insecure_private_key
  run-as: root
  user: insecureuser
#plugin_hooks:
#  puppet_library:
#    plugin: task
#    task: 'puppet_agent::install'
#    parameters:
#      yum_source: 'yumrepo.duke.com'

Let's break down the example above. The SSH settings ensure that we connect to the remote node correctly. In this example we connect as insecureuser to the remote machine, but once connected we escalate to root for the work we want to do. We've turned off host key validation, as these nodes are, in theory, coming and going quickly. We've also set the private key to one that, in theory, has been embedded into the root of the Control Repo and can access only SUT nodes allocated for an experiment. There are more settings to use, but for basic Linux testing this should work just fine. Refer to the documentation for more settings and details. Alternatively, if the nodes had an insecure text password, you could put that in here instead of using a key to authenticate SSH.
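As a sketch of that password alternative, the ssh section could carry a shared test password instead of a key (the user and password values here are placeholders, not real credentials from this environment):

```yaml
---
ssh:
  host-key-check: false
  user: insecureuser
  password: 'knowninsecurepassword'   # shared, insecure, for disposable SUTs only
  run-as: root
```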

The plugin_hooks section is commented out above, but it should be used in the event the SUT nodes cannot access the Internet directly. This configuration informs bolt apply how to "prepare" the SUT with the Puppet agent. By default, Bolt uses the Internet as the source, but if you uncomment the lines above, you can redirect the source used for Yum packages. More parameters for puppet_agent::install can be found in the documentation. Other parameters include where to get the package for Windows. Your PE infrastructure is actually a great source of installation packages if configured properly.
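For environments with Windows SUTs too, the same hook can, hypothetically, point each platform at an internal source. Both hostnames below are made up, and the parameter names come from the puppet_agent::install task, so check its documentation before relying on them:

```yaml
plugin_hooks:
  puppet_library:
    plugin: task
    task: 'puppet_agent::install'
    parameters:
      yum_source: 'yumrepo.duke.com'        # internal Yum mirror (from the commented example)
      windows_source: 'pkgrepo.duke.com'    # hypothetical internal Windows package source
```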

Basic Process

Once you're all set up, with both a reasonable SUT allocated and a Control Repo properly configured with a bolt.yaml, you can start using bolt apply. To do so, follow the basic process below:

  1. Check out your Puppet code to your workstation
  2. From the root of your Puppet code repo, where the bolt.yaml is, execute bolt puppetfile install
  3. Execute bolt apply against your SUT with a declaration of the code you wish to test
     a. For example: bolt apply -e "include role::sample_website" -n ec2-54-174-158-94.compute-1.amazonaws.com

Assuming you're all set up, things should "just work" and you'll see some feedback on the Puppet run (truncated for brevity):

[root@tom-pelabnix0 control-repo]# bolt apply -e "include role::sample_website" -n ec2-54-174-158-94.compute-1.amazonaws.com
Starting: install puppet and gather facts on ec2-54-174-158-94.compute-1.amazonaws.com
Finished: install puppet and gather facts with 0 failures in 7.41 sec
Starting: apply catalog on ec2-54-174-158-94.compute-1.amazonaws.com
Finished on ec2-54-174-158-94.compute-1.amazonaws.com:
  Notice: /Stage[main]/Ntp::Config/File[/etc/ntp.conf]/content: content changed '{md5}dc9e5754ad2bb6f6c32b954c04431d0a' to '{md5}95ed2f8040a4a3fe85acf5acceecf208'
  Notice: /Stage[main]/Ntp::Config/File[/etc/ntp/step-tickers]/content: content changed '{md5}9b77b3b3eb41daf0b9abb8ed01c5499b' to '{md5}02e4b00ee48539fdefd90eb6c5c25117'
  Notice: /Stage[main]/Ntp::Service/Service[ntp]/ensure: ensure changed 'stopped' to 'running'

...

  Notice: /Stage[main]/Apache::Service/Service[httpd]/ensure: ensure changed 'stopped' to 'running'
  changed: 130, failed: 0, unchanged: 56 skipped: 0, noop: 0
Finished: apply catalog with 0 failures in 77.7 sec
Successful on 1 node: ec2-54-174-158-94.compute-1.amazonaws.com
Ran on 1 node in 1 min, 25 sec

The above example shows a run of Puppet code using a "role" as the entry point, simply by passing -e "include role::sample_website" on the command line. While the example above ran a complete role, there are other ways this could be run, for example against a single module or class, such as -e "include ntp". It depends on what is being tested and what the goal is, but the -e option essentially says "process and apply this code." There are more options and features you should explore and discover on your own; notably, --noop is available with bolt apply as well. Additionally, what was done here does not need to be run out of the Control Repo. Any directory can be given the proper key, a bolt.yaml, and some code to run, or perhaps a Puppetfile to pull in code. Establishing the Control Repo with the bolt.yaml is more an exercise in convenience, but these options are open.

Be Advised

There are a handful of limitations with this approach. Typically, such "limitations" mark the point where you would want to run your Puppet code from a fully functional PE infrastructure to check all the features. Overall, these limitations should not impede the ability to use bolt apply for the majority of development and experimentation. The notable limitations are:

Exported Resources

As might be expected, if your Puppet code is utilizing exported resources, you won't be able to prove how that works with bolt apply. This is of course due to the lack of PuppetDB and full Puppet infrastructure services when running in this capacity. While you won't be able to prove if exported resources are working as desired, you'll at least be able to demonstrate that the code works well enough to proceed to testing on fully functional infrastructure. The bolt apply command should still work in the face of failing exported resource calls.
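To make the limitation concrete, here is a hedged sketch in Puppet code (the class and resource choices are invented for illustration):

```puppet
# The export (@@) compiles fine under bolt apply, but the collector
# (<<| |>>) has no PuppetDB to query, so it simply collects nothing.
class profile::cluster_hosts {
  @@host { $facts['networking']['fqdn']:
    ip => $facts['networking']['ip'],
  }
  Host <<| |>>   # empty without PuppetDB; validate on real PE infrastructure
}
```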

Hiera Configurations

One of the things bolt apply will do when attempting to compile the catalog is use the Hiera configuration (hiera.yaml) in your Bolt project directory. If your project directory is a Control Repo, then you'll absolutely have a hiera.yaml in it (or at least you should). If you're using any special plugins, such as eyaml, and don't have everything configured properly on your local workstation, such as the decryption keys, then you may experience failures when compiling catalogs.

To work around this, you could keep an alternate Hiera configuration, something like a hiera-bolt.yaml, that excludes the eyaml (or other special) configuration. You could then use your bolt.yaml to instruct Bolt to use it. Top-level directives such as hiera-config come into play here.
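As a sketch, the relevant bolt.yaml addition would look something like this (the alternate filename is, as above, just a suggestion):

```yaml
---
hiera-config: ./hiera-bolt.yaml   # an eyaml-free copy of hiera.yaml
```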

In addition to the limitations detailed above, you need to watch for certain pitfalls that will hinder your ability to use this process. The pitfalls include:

No Access to Certain Modules

For example, if a user running bolt puppetfile install does not have at least read access to all the modules mentioned in the Puppetfile, there may be a failure. At the least, it could prevent collecting all the modules needed to test at a "role" level. This is easy to rectify; the important part is being aware of what's happening.

Watch That Modules Dir

After a checkout sits for a long time, it's possible that the modules installed by bolt puppetfile install have fallen out of date. Be sure to make running bolt puppetfile install a regular discipline. Additionally, be sure to add the modules directory to the .gitignore file so that people are not checking in full module directories after they test a little.
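Keeping those installed modules out of version control is a one-line change at the Control Repo root:

```shell
# From the root of the Control Repo: ignore the directory that
# `bolt puppetfile install` populates with third-party modules.
echo "modules/" >> .gitignore
```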
