There is currently an explosion of tools for managing secrets in automated, cloud-native infrastructure. Daniel Somerfield did some work classifying the various approaches, but, as far as I know, no one has recently tried to summarize the tools themselves.
This is an attempt to give a quick overview of what is out there. The list is alphabetical. Some tools will be missing and some of the facts may be wrong; I welcome your corrections. You can reach me via @maxvt on Twitter, or just leave a comment here.
There is also a companion feature matrix comparing the tools; comments on it are welcome through the same channels.
My approach is "if a feature is not described in the documentation, it does not exist". This applies in particular to setting up a high-availability secret storage service. I will not read the source or spend significant time experimenting with a particular tool to figure out whether something can be supported.
Index of Tools
- Ansible Vault
- Barbican
- Chef Data Bags
- Chef Vault
- Citadel
- Confidant
- Configuration Storage Systems (Consul, etcd, Zookeeper)
- Conjur
- Crypt
- EJSON
- Keywhiz
- Knox
- Red October
- Trousseau
- Vault (Hashicorp)
Ansible Vault
This is Ansible's built-in secret management system, which encrypts secrets into files. Its usage can be more general than Chef's encrypted data bags, since it can be applied to tasks, handlers, and so on, not just to variables. However, it is not transparent: some tasks have to be configured differently when encryption is used. A command line tool is provided to manage the process, and the suggested workflow is to check the encrypted files into source control. There does not appear to be a way to have more than one password per file, to define different types of access to a secret, or to audit access.
If you are using Ansible and your main goal is to get the secrets out of plaintext files, this would probably be your natural choice.
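Because the suggested workflow is to commit encrypted files, a cheap safeguard is a pre-commit check that refuses secret files lacking the vault header. The sketch below assumes only that encrypted files begin with a `$ANSIBLE_VAULT;` header line (as in `$ANSIBLE_VAULT;1.1;AES256`); which files count as secrets is up to you, and `main` is a hypothetical hook entry point, not part of Ansible.

```python
import sys
from pathlib import Path
from typing import Iterable, List

# Encrypted files start with a header line such as "$ANSIBLE_VAULT;1.1;AES256".
VAULT_PREFIX = b"$ANSIBLE_VAULT;"

def is_vault_encrypted(path: Path) -> bool:
    """True if the file begins with the Ansible Vault header."""
    with open(path, "rb") as f:
        return f.read(len(VAULT_PREFIX)) == VAULT_PREFIX

def find_plaintext_secrets(paths: Iterable[str]) -> List[str]:
    """The subset of candidate secret files that are not vault-encrypted."""
    return [p for p in paths if not is_vault_encrypted(Path(p))]

def main(argv: List[str]) -> int:
    """Hypothetical hook entry point: argv lists staged files that must be encrypted."""
    offenders = find_plaintext_secrets(argv)
    for p in offenders:
        print(f"refusing to commit unencrypted secret file: {p}", file=sys.stderr)
    return 1 if offenders else 0
```

A hook script would end with `sys.exit(main(sys.argv[1:]))`. The check only looks at the header, so it catches forgotten encryption, not weak passwords.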
Barbican
Barbican, currently an OpenStack project, serves the secret storage needs of other OpenStack services. It is meant to hold certificates, encryption keys, and other secrets, replacing the multitude of methods used by individual OpenStack projects (encrypted files at rest, database tables, and so on). It takes the "enterprisey" approach: centralized secret management, interoperability between clouds, auditing and compliance support, simple integration with legacy applications, and even HSM support. This sounds great; however, there is a "platform tax", as Barbican both relies on and integrates best with other OpenStack components (such as identity management and authentication, implemented by Keystone). "Use OpenStack tools, processes, libraries, and design patterns" is a key design principle.
- https://speakerdeck.com/jraim/secret-as-a-service-barbican (overview presentation)
- http://docs.openstack.org/developer/barbican/api/index.html (REST API documentation)
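The "platform tax" shows up even in the simplest API call: every request carries a Keystone token. A minimal sketch that only builds (does not send) the requests for storing a secret and reading its payload, assuming Barbican's v1 `POST /v1/secrets` and `GET /v1/secrets/{uuid}/payload` endpoints; the host, port 9311, and the token value are assumptions for illustration.

```python
import json
import urllib.request

# Assumed endpoint; 9311 is the port Barbican is commonly deployed on.
BARBICAN_URL = "http://barbican.example.com:9311"

def store_secret_request(token: str, name: str, payload: str) -> urllib.request.Request:
    """POST /v1/secrets with a plaintext payload; token is a Keystone token."""
    body = {"name": name, "payload": payload, "payload_content_type": "text/plain"}
    return urllib.request.Request(
        f"{BARBICAN_URL}/v1/secrets",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )

def fetch_payload_request(token: str, secret_ref: str) -> urllib.request.Request:
    """GET <secret_ref>/payload, where secret_ref is the URL returned on creation."""
    return urllib.request.Request(
        f"{secret_ref}/payload",
        headers={"X-Auth-Token": token, "Accept": "text/plain"},
        method="GET",
    )
```

Sending either request with `urllib.request.urlopen` requires a reachable Barbican and a valid token, which in practice means a working Keystone as well.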
Chef Data Bags
This is a built-in capability of Chef. A data bag is a JSON file stored on a Chef server and accessible by clients. Neither the Chef server nor the client cares about the exact format of the data bag's contents. An encrypted data bag is the same entity encrypted with a symmetric key; anyone who wants access to its contents needs the corresponding key.
The simplest use case is a single secret key. The key is made available to all Chef clients (for example, by dropping it onto new machines during bootstrap) and to some privileged users, who are the only ones able to update the secrets. This approach is conceptually very simple and effective at keeping secrets out of repos, but it does not allow any advanced functionality and does not let non-privileged users update secrets. One could instead use a different key for every data bag, achieving some access separation and allowing self-service. Since the data lives on a Chef server, API access can be logged for audit. However, all the attendant infrastructure for deciding which keys to drop where, rotating keys when group membership changes, and so on would have to be built.
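The "attendant infrastructure" for per-bag keys boils down to two questions: which keys a node should receive, and which bags must be re-keyed when someone's access shrinks. A minimal sketch, assuming a hypothetical role-to-bag policy table (`BAG_READERS` and both helpers are illustrative, not part of Chef):

```python
from typing import Dict, Set

# Hypothetical policy: which roles may read which encrypted data bag (one key per bag).
BAG_READERS: Dict[str, Set[str]] = {
    "mysql": {"db", "app"},
    "payments": {"app"},
    "monitoring": {"db", "app", "monitor"},
}

def keys_for(roles: Set[str]) -> Set[str]:
    """Data bag keys to drop onto a node (or hand to a user) holding these roles."""
    return {bag for bag, readers in BAG_READERS.items() if readers & roles}

def bags_to_rekey(old_roles: Set[str], new_roles: Set[str]) -> Set[str]:
    """Bags whose key must be rotated after a role change. A copied key cannot be
    taken back, so every bag no longer allowed needs a fresh key and re-encryption."""
    return keys_for(old_roles) - keys_for(new_roles)
```

For example, moving a node from `app` to `monitor` means re-encrypting `mysql` and `payments` and redistributing their new keys to every remaining reader.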
Chef Vault
The easiest way to understand Chef Vault is as a framework for using a different shared secret key for every Chef encrypted data bag. Chef Vault addresses the problem of distributing data bag secret keys by encrypting them with each client's public key (which Chef already uses for client authentication) and storing the encrypted keys in a separate data bag. A Chef Vault client first fetches the data bag containing the keys; if the client is allowed to access a particular data bag, the corresponding keys data bag has an entry holding the shared secret encrypted for that client, which the client decrypts with its private key. From that point on, the interaction is the same as with a regular encrypted data bag.
While it addresses a particular pain point around key distribution and avoids the single-secret-for-all pitfall, Chef Vault still treats the secrets themselves as opaque, just as plain Chef does. Furthermore, its common use case is making secrets available during a Chef run. Therefore none of the desirable advanced features around secret generation, presentation outside of Chef, and audit are available in Chef Vault.
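The key-wrapping idea behind Chef Vault can be shown with a toy model: one random per-item secret, wrapped once per authorized client with that client's RSA public key. This sketch uses textbook RSA (no padding) with fixed Mersenne primes purely for illustration; it is insecure, it is not Chef Vault's code, and the client names and `keys_bag` dictionary are hypothetical stand-ins for the separate keys data bag.

```python
import secrets

# Toy textbook RSA: no padding, fixed primes. Insecure, illustration only.
# Real Chef client keys are full-size RSA keys used with proper padding.
def make_keypair(p: int, q: int, e: int = 65537):
    """Return ((n, e), (n, d)) for primes p and q."""
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse, Python 3.8+
    return (n, e), (n, d)

def wrap(shared_secret: int, public_key) -> int:
    n, e = public_key
    return pow(shared_secret, e, n)

def unwrap(wrapped: int, private_key) -> int:
    n, d = private_key
    return pow(wrapped, d, n)

# Hypothetical clients; the primes are the Mersenne primes 2^61-1, 2^89-1, 2^107-1, 2^127-1.
clients = {
    "web1": make_keypair(2**61 - 1, 2**89 - 1),
    "web2": make_keypair(2**107 - 1, 2**127 - 1),
}

# The per-item shared secret: a random 128-bit key, smaller than every modulus above.
item_secret = secrets.randbits(128)

# Stand-in for the separate keys data bag: one wrapped copy per authorized client.
keys_bag = {name: wrap(item_secret, pub) for name, (pub, _priv) in clients.items()}

def fetch_item_secret(client_name: str, private_key) -> int:
    """What a client does first: find its entry and unwrap the shared secret."""
    wrapped = keys_bag.get(client_name)
    if wrapped is None:
        raise PermissionError(f"{client_name} has no entry in the keys data bag")
    return unwrap(wrapped, private_key)
```

Granting a new client access means adding one more wrapped entry; the data bag itself does not need to be re-encrypted.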
Citadel
Citadel is a Chef cookbook that retrieves secrets as files from an AWS S3 bucket and relies on AWS IAM policies to enforce access to individual secrets (files). Needless to say, this solution is AWS-only.
Documentation is extremely sparse, and provisioning access for new machines or humans is not automated at all (for VMs, CloudFormation could help with that). S3 supports auditing and versioning, but there is no visible tooling that presents a unified view of all Citadel-related data. Because an EC2 VM can have only one IAM role, overlapping groups are impossible, so one machine cannot serve multiple roles.
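Access control in this model is just an IAM policy per role. A minimal sketch that generates read-only S3 policies for secret prefixes and shows the workaround the one-role limit forces: a separate combined role for every combination of groups. The bucket name, prefix layout, and helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, List

def read_only_policy(bucket: str, prefixes: Iterable[str]) -> dict:
    """IAM policy document allowing s3:GetObject only under the given key prefixes."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/{p.strip('/')}/*" for p in prefixes],
        }],
    }

# Hypothetical layout: one prefix of secret files per logical role.
ROLE_PREFIXES: Dict[str, List[str]] = {"web": ["web"], "db": ["db"], "common": ["common"]}

def combined_role_policy(bucket: str, roles: Iterable[str]) -> dict:
    """With a single IAM role per instance, a machine acting as both 'web' and 'db'
    needs its own role whose policy is the union of the per-role prefixes."""
    prefixes = sorted({p for r in roles for p in ROLE_PREFIXES[r]})
    return read_only_policy(bucket, prefixes)

print(json.dumps(combined_role_policy("example-secrets", ["web", "db"]), indent=2))
```

The number of such combined roles grows with every new pairing of groups, which is why overlapping groups are impractical here.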
Confidant
Confidant is a secret management tool built by Lyft. It is an AWS-only solution that uses DynamoDB for storage and AWS KMS for both encryption and access control. The service is written in Python.
Interestingly, there is no written documentation whatsoever on how to consume secrets from Confidant, only a mention of a Flask-based API. It is one of the very few services that explicitly says its app servers are stateless and can therefore easily be run in a highly available configuration. The secret store is write-only, and the user interface allows viewing changes and rolling back to previous versions straight from the GUI.
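A write-only store with GUI rollback suggests an append-only revision history. A minimal sketch of that idea, assuming "write-only" means revisions are only ever appended and never overwritten; `VersionedSecretStore` is hypothetical, not Confidant's code or API. Rollback is implemented as a new revision copying an old value, so the rollback itself stays in the history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class VersionedSecretStore:
    """Hypothetical append-only store: every write adds a revision, nothing is overwritten."""
    revisions: Dict[str, List[str]] = field(default_factory=dict)

    def put(self, name: str, value: str) -> int:
        revs = self.revisions.setdefault(name, [])
        revs.append(value)
        return len(revs)  # 1-based revision number

    def get(self, name: str) -> str:
        return self.revisions[name][-1]

    def history(self, name: str) -> List[Tuple[int, str]]:
        return list(enumerate(self.revisions[name], start=1))

    def rollback(self, name: str, revision: int) -> int:
        """Re-append an old value as a new revision, keeping the full audit trail."""
        revs = self.revisions[name]
        if not 1 <= revision <= len(revs):
            raise ValueError(f"no revision {revision} for {name}")
        return self.put(name, revs[revision - 1])
```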
Configuration Storage Systems (Consul, etcd, Zookeeper)
There are a number of tools for storing configuration (not necessarily secret) in a highly available, datacenter-scale setup. Most of these systems have a tree-like data structure, version their edits, and support ACLs, and some even offer notification on changes, something no secret management system provides.
These systems share limitations that stem from their not being designed around security or passwords. Specifically, they offer no password generators and no first-party support for presenting decrypted secrets. Support for authentication and encryption (in transit or at rest) has not been a given, but is becoming a commonplace optional feature. Audit functionality is generally not available.
Hashicorp Vault supports Consul (first party), Zookeeper, and etcd as high availability storage backends while adding the missing security-oriented features.
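Both points can be seen in Consul's KV HTTP API: a read (`GET /v1/kv/<key>`) returns the value merely base64-encoded, which is an encoding and not encryption, while change notification works through blocking queries that pass the last seen `ModifyIndex` as `index`. A minimal sketch; the agent address is Consul's default local HTTP address and the key name is hypothetical.

```python
import base64
import json
from typing import Tuple
from urllib.parse import quote, urlencode

CONSUL = "http://127.0.0.1:8500"  # local agent's default HTTP address

def decode_kv_response(body: str) -> Tuple[str, int]:
    """Return (value, ModifyIndex) from a GET /v1/kv/<key> response body."""
    entry = json.loads(body)[0]
    raw = entry.get("Value")  # base64 text, or null for an empty key
    value = base64.b64decode(raw).decode("utf-8") if raw else ""
    return value, entry["ModifyIndex"]

def blocking_query_url(key: str, last_index: int, wait: str = "5m") -> str:
    """URL that blocks until the key changes past last_index or the wait elapses."""
    return f"{CONSUL}/v1/kv/{quote(key)}?{urlencode({'index': last_index, 'wait': wait})}"
```

Anyone who can read the key can decode the "secret" with one base64 call, which is why encryption has to be layered on top, as Vault does.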
Conjur
Conjur is a closed-source appliance that does secret management as well as generic directory and access management with an RBAC model. The appliance is self-contained and provided as a Docker image or an AWS AMI. A UI and a CLI are provided on top of the core REST API exposed by the appliance. As a directory, Conjur also provides an LDAP endpoint for integration with other directory-consuming applications. For secrets, Conjur offers a Summon plugin that presents secrets as environment variables.
Reading the developer docs for plugins, I'm guessing the implementation is in Ruby as well. The server documentation lists Postgres and Nginx as other services within the container. It is possible to run multiple appliances in a master-follower setup, but it is unclear if automated failover is included in the base setup or must be done externally. The main Conjur website has been completely taken over by marketing and sales gobbledygook; the developer documentation is a much more useful source of information.
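Summon's model is to map environment variable names to secret paths (in a `secrets.yml` file) and run the target command with the fetched values in its environment. A sketch of that presentation pattern, not Summon's implementation: `run_with_secrets` is hypothetical, and the dictionary-backed `fetch` stands in for a provider that would query Conjur.

```python
import os
import subprocess
from typing import Callable, Dict, List

def run_with_secrets(cmd: List[str], secret_vars: Dict[str, str],
                     fetch: Callable[[str], str]) -> subprocess.CompletedProcess:
    """Resolve each secret path via `fetch`, then run cmd with the values as env vars.
    The values live only in the child's environment, never in a file on disk."""
    env = dict(os.environ)
    for var, path in secret_vars.items():
        env[var] = fetch(path)
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
```

The application only reads `os.environ`, so it stays unaware of Conjur; the trade-off is that environment variables can leak through child processes, crash dumps, or debug output.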
Crypt
If you prefer a spartan approach but would like to use a (hopefully highly available) key-value store instead of files, Crypt could be a solution for you. It relies on etcd or Consul for persistence, storing arbitrary data (which might be a single secret or any structured format like JSON) encrypted using OpenPGP's public key crypto. Multiple recipients are supported by specifying a set of public keys at the time the secret is written, and the reader must have one of the corresponding private keys to successfully decrypt.
The implementation is a thin client-side Go glue layer binding together gzip, OpenPGP, and backend access. It may be used as a CLI tool for both reading and writing, or as a library. Management of encryption keys on OpenPGP keyrings, including having to find and re-encrypt all the affected secrets if one of the keys needs to be rotated, is left as an exercise to the user.
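The "exercise" of rotating a key amounts to knowing which secrets were encrypted to it and what their new recipient sets should be. Crypt does not keep such an index, so the sketch below assumes you maintain one yourself; `RECIPIENTS`, the key IDs, and `rotation_plan` are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical index you would have to maintain: OpenPGP key IDs each stored secret was encrypted to.
RECIPIENTS: Dict[str, Set[str]] = {
    "/app/db": {"KEY_ALICE", "KEY_BOB", "KEY_APP"},
    "/app/api": {"KEY_APP"},
    "/ops/root": {"KEY_ALICE", "KEY_OPS"},
}

def rotation_plan(retired: str, replacement: str) -> Dict[str, Set[str]]:
    """For every secret readable with the retired key, the recipient set it must be
    re-encrypted to. Until rewritten, the stored ciphertext still opens with the old key."""
    return {
        path: (keys - {retired}) | {replacement}
        for path, keys in RECIPIENTS.items()
        if retired in keys
    }
```

Each entry in the plan then has to be read, decrypted with a still-valid key, and written back with the new recipient list.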
EJSON
Designed by Shopify and one of the simplest solutions available, EJSON is a command line tool (and library?) that encrypts secrets inside JSON files (turning them into EJSON files) using public key crypto, probably NaCl. There is only one secret key for a particular EJSON file, and that key is required to decrypt the secrets. The only supported secret presentation is the decrypted file, in plaintext JSON.
In the "introduction" Shopify blog post, they mention that the production usage re-encrypts all secrets with a single infrastructure wide "master key" when building containers; the master key is temporarily given to the container to decrypt the secrets. With secrets baked into containers and a single key, this is a relatively inflexible and not particularly secure usage model. Nevertheless, it is effective at keeping plaintext secrets out of repos while retaining a tight link between secrets and their projects and allowing line-by-line change tracking and "blaming" with Git.
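Since EJSON files live in Git next to their projects, a useful companion check is that no value was committed in plaintext. A minimal sketch, assuming encrypted values are strings beginning with `EJ[` (as in `EJ[1:...]`) and that keys starting with `_` (such as `_public_key`) are meant to stay plaintext; the sketch skips their whole subtree and only inspects strings. `plaintext_leaks` and `check_file` are hypothetical helpers.

```python
import json
from typing import Any, List

# Assumed marker of an encrypted EJSON value, e.g. "EJ[1:<pubkey>:<nonce>:<box>]".
ENCRYPTED_PREFIX = "EJ["

def plaintext_leaks(node: Any, path: str = "$") -> List[str]:
    """JSON paths of string values that are still plaintext."""
    leaks: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if not key.startswith("_"):  # underscore keys are assumed to stay plaintext
                leaks.extend(plaintext_leaks(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            leaks.extend(plaintext_leaks(item, f"{path}[{i}]"))
    elif isinstance(node, str) and not node.startswith(ENCRYPTED_PREFIX):
        leaks.append(path)
    return leaks

def check_file(path: str) -> List[str]:
    with open(path, encoding="utf-8") as f:
        return plaintext_leaks(json.load(f))
```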
Keywhiz
Keywhiz comes from Square and helps them distribute infrastructure secrets to services. It is a Java service with a JSON API, backed by a MySQL or Postgres store. A separate client provides a FUSE presentation for secrets. Authentication uses mutual TLS with client certificates, so some kind of PKI that can provision certificates to services and humans is necessary to use Keywhiz. There is a CLI tool, and the Keywhiz server also exposes a web interface for user-friendly secret management.
There are some interesting limitations (e.g., "secrets have a globally unique name"), and groups cannot inherit from other groups. Square considers the code to be of alpha quality, but uses the service internally. Although the system is quite recent, its API endpoint list is already somewhat confusing, with multiple versions of similar-looking functions.
Knox
A brand new solution from Pinterest, Knox has plenty of rough edges but gets one important concept right, namely the separation of versions of the same secret into three groups: primary (recommended/most recent), active (still working) and old (no longer working). It is certainly possible to rotate secrets smoothly without implementing such a feature (even in systems that only store one version of the secret, simply by keeping the previous version active until after the new one is deployed), but making this classification explicit helps.
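The three-state classification makes rotation a two-step process: add a new primary while the old one stays active, then retire it once every consumer has moved over. A minimal sketch of that lifecycle using the terms above; `RotatingKey` and its methods are hypothetical, not Knox's API.

```python
import hmac
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Status(Enum):
    PRIMARY = "primary"  # the version clients should use for new operations
    ACTIVE = "active"    # still accepted while consumers catch up
    OLD = "old"          # no longer accepted

@dataclass
class KeyVersion:
    id: int
    secret: str
    status: Status

@dataclass
class RotatingKey:
    versions: List[KeyVersion] = field(default_factory=list)

    def primary(self) -> KeyVersion:
        return next(v for v in self.versions if v.status is Status.PRIMARY)

    def rotate(self, new_secret: str) -> None:
        """Add a new primary; the previous primary is demoted to active, not retired."""
        for v in self.versions:
            if v.status is Status.PRIMARY:
                v.status = Status.ACTIVE
        self.versions.append(KeyVersion(len(self.versions) + 1, new_secret, Status.PRIMARY))

    def retire_active(self) -> None:
        """Once every consumer uses the new primary, stop accepting older versions."""
        for v in self.versions:
            if v.status is Status.ACTIVE:
                v.status = Status.OLD

    def accepts(self, secret: str) -> bool:
        """Primary and active versions are valid; compare in constant time."""
        return any(
            v.status is not Status.OLD and hmac.compare_digest(v.secret.encode(), secret.encode())
            for v in self.versions
        )
```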
Knox follows a client-server architecture backed by a persistent store. Stored secrets are encrypted, but ACLs and the rest of the metadata are not, so the store must still be trusted. The encryption key is a file on each Knox server. Servers expose the Knox API both to a client daemon, which runs in the background and caches up-to-date secrets locally, and to a CLI management tool. Installation requires modifying the Go code and compiling from source, even just to configure a particular database type. Machines authenticate with client certificates, and humans via GitHub credentials or OAuth; switching between these and configuring them appears to require code changes as well. Client setup requires multiple steps and is not automated.
Secrets are presented as files (plain ones, not FUSE) in a hardcoded location. It follows that within one machine, there is no separation between different services and they can all read each other's secrets.
Red October
Red October is a CloudFlare-designed system that automates a two-person rule for secret storage: two keys belonging to two different users are needed to decrypt a secret. A server written in Go exposes a JSON API, implements the encryption and decryption workflows, and allows a user to "delegate" their key to the server so that it can perform decryptions on their behalf for a limited time or number of uses. Remarkably, the server does not store encrypted secrets, only a catalog of users, delegations, and other metadata.
The implementation enumerates all possible user pairs and encrypts the secret's symmetric key with the public keys of both users in each pair, one after the other. The number of pairs is O(n^2), so this is suitable only for rather small numbers of users per secret. There is no significant tooling around the system (all examples in the repo are raw JSON sent with curl).
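The quadratic growth is easy to quantify: n users yield n(n-1)/2 unordered pairs, and each pair needs its own doubly wrapped copy of the key. A small sketch that only counts and enumerates pairs; it does not reproduce Red October's actual data format, and `wrapping_pairs` is a hypothetical helper.

```python
from itertools import combinations
from math import comb
from typing import Iterable, List, Tuple

def wrapping_pairs(users: Iterable[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of users; each pair gets its own doubly wrapped key copy."""
    return list(combinations(sorted(users), 2))

# n users give n*(n-1)/2 pairs, i.e. O(n^2) wrapped copies per secret.
for n in (5, 10, 50, 100):
    print(f"{n} users -> {comb(n, 2)} wrapped copies")
```

This prints 10, 45, 1225, and 4950 copies respectively, and adding one user to a secret shared by n users adds n new copies.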
Trousseau
Trousseau is a Go tool that manages secrets in a single OpenPGP-encrypted file. The creator of the file can specify who can open and modify the store. As far as I can see, the access control is global. It seems to be more suitable for personal secret storage or a small project than an enterprise rollout. Support for several storage backends (S3, scp, Gist) is built in.
Project page: https://github.com/oleiade/trousseau
Vault (Hashicorp)
Vault is perhaps the most commonly heard name in secret storage for infrastructure these days. Since it is developed by Hashicorp, it is no surprise that Vault favors other Hashicorp infrastructure (for example, Consul is the only high availability backend supported by Hashicorp). Secrets are arranged in a tree, with ACLs limiting access and allowed operations. A "lease" concept tells clients to refresh secrets periodically, implementing a poll-based rollover. With "generic" secrets, the lease is only advisory: secrets can be revoked at any time regardless of existing leases, and they remain valid indefinitely unless changed by some external agent. For authentication, Vault offers a large variety of backends, including GitHub and certificates.
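A client implementing the poll-based rollover reads a secret, notes the `lease_duration` in the response, and re-reads before it elapses. A minimal sketch that builds the read request for the generic backend mounted at `secret/` and computes the next refresh time; the address, token, path, and the 2/3 refresh fraction are assumptions, and the functions are hypothetical helpers rather than Vault's client library.

```python
import json
import time
import urllib.request
from typing import Optional, Tuple

VAULT_ADDR = "https://127.0.0.1:8200"  # assumed address; adjust for your deployment

def read_request(token: str, path: str) -> urllib.request.Request:
    """GET request for a secret stored under the generic backend mounted at secret/."""
    return urllib.request.Request(f"{VAULT_ADDR}/v1/secret/{path}",
                                  headers={"X-Vault-Token": token})

def parse_with_refresh(body: str, now: Optional[float] = None,
                       fraction: float = 2 / 3) -> Tuple[dict, float]:
    """Return (secret data, time at which to re-read). For generic secrets the lease
    is advisory, so this is only a refresh schedule, not a validity guarantee."""
    if now is None:
        now = time.time()
    parsed = json.loads(body)
    return parsed["data"], now + parsed["lease_duration"] * fraction
```

Refreshing well before the lease ends leaves room for retries if Vault is briefly unavailable.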
For some specific services, Vault offers password generators where a new secret can be created for each use request and revoked once its lifetime ("lease") expires. However, the list of the services is fixed and there is no current extensibility for adding new generators. First party tooling for Vault includes a CLI client, Ruby and Go libraries, and presentation utilities such as envconsul and consul-template. There is also a growing number of third party tools and libraries. Notably, no first party user interface besides a CLI is available. Having Consul as the only built-in HA backend is problematic in multi-datacenter environments, since by default Consul is consistent only within a single datacenter. Consul replication has been attempted successfully to solve this, but is not officially supported. No first-party tooling to reconcile multiple Vault clusters exists or is planned. Support for key versioning is not currently planned due to backend limitations.
Other Comparisons, Reviews, and Suggestions
- https://coderanger.net/chef-secrets/ (August 2014)
- https://github.com/pinterest/knox/wiki/Similar-Solutions (September 2016)
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.