timfallmk/Vault 0: Components and Design.md

## Vault 0: Components and Design.md

      
    Vault 0: Components and Design
Vault From the Beginning
Most of the software we deal with day to day fits into a few basic categories: user applications, service applications, assistive tools, etc. When we download something new and open it up, we usually know which general category it fits into and where it should fit in our workflow. We might expect the same from Vault, but there is a key difference. Vault is a security application, designed specifically to provide a number of features for use in a security workflow. Oftentimes, the most difficult part of learning to use Vault is that it doesn’t fit into any of the traditional categories that we are used to interacting with, and therefore lacks a common set of concepts that we might already be familiar with. We will attempt to provide this context and all the background necessary for using Vault.
In this series of posts we will cover Vault in its entirety, from nose to tail, and walk through each step of understanding and effectively using it. These guides are designed to provide a detailed, end-to-end account of how Vault works and how best to use it. If you’re impatient, or already familiar with Vault, you can jump straight over to the "getting started" [link] section.
"‘Security’ You Say? I’m listening…"
In order to get a good idea of how to use Vault, we should first rewind a few steps. It’s important to understand how Vault is built before putting it to use. In this guide we’ll start at the beginning and explore the overall design and the components that make up Vault. This will include the theory of how each piece contributes to the application and how it interacts with the others.
Servers and Clients
At its most basic, Vault is designed to operate like a number of other services you might be used to. It has two components: a server for processing and performing actions, and a client for making requests to that server. As is the case with other HashiCorp [link] tools (as well as a lot of others), both the server and client are compiled into a single static binary that can be run anywhere. Typically a (human) user will use the binary to interact with Vault servers via the CLI, and the same binary will be deployed and run in server mode in multiple places to form a cluster (we’ll talk about cluster design and deployment at a later stage)[link].
It is important to note that all requests to Vault servers are done via the REST [link] API. The importance of this will become clear later on, but both programmatic requests (from other services) and requests from the CLI are routed to the same API, meaning all functionality of Vault is available to anything that interfaces with it.
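To make the "everything goes through the API" point concrete, here is a minimal Python sketch that builds (but does not send) the kind of HTTP request that both the CLI and other services end up making. The address, token value, and secret path are placeholders chosen for illustration; the `/v1/` prefix and the `X-Vault-Token` header follow Vault's HTTP API conventions.

```python
import urllib.request

VAULT_ADDR = "http://127.0.0.1:8200"  # assumed address of a local dev server
VAULT_TOKEN = "example-token"         # placeholder, not a real token


def build_read_request(path: str) -> urllib.request.Request:
    """Build a GET request for a Vault path without sending it."""
    url = f"{VAULT_ADDR}/v1/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={"X-Vault-Token": VAULT_TOKEN},
        method="GET",
    )


req = build_read_request("secret/awesomesauce/worlddomination")
print(req.get_method(), req.full_url)
```

Whether the caller is a person at a terminal or another service, the request that reaches the server has this same shape.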
The Guts: Backends for Miles and Miles
The Vault server is composed of a number of components, each one responsible for a specific set of functions. For many of these pieces, there are different "backends" that provide support for different interactions between Vault and supporting systems (e.g. storing tokens in memory vs. physical storage or a database). You can think of these backends as somewhat analogous to different drivers, allowing different types of systems to provide a certain function for Vault. We’ll go over the different backends available when looking at each component.
Core Keeps on Spinning
The core of Vault’s functionality is, fittingly, inside a component called "Core". The core includes most of the logic that directs Vault’s functions as well as the key steps in moving data through the cryptographic process. Core does not provide any of the actual cryptographic functions, nor does it do any of the direct interaction with system components outside Vault. Instead, core moves requests to the appropriate components for processing, based on what each request requires. You can think of core as analogous to a traffic cop, directing the flow of traffic around the inside of Vault and making sure policies and routes are enforced.
Path Routing For All the Things
Vault is designed, from the ground up, to interact with most things in the form of "paths". If you’re familiar with paths in a filesystem, it’s roughly the same concept. For example, you might find a file at:
/Users/Home/awesomesauce/Documents/secretplans.pdf
which represents a unique place in the filesystem. In a similar way, you might store a "secret" (more on what this is later)[link] at the path:
/secret/awesomesauce/thingsnottoshowpeople/worlddomination
in Vault. Most of the components in Vault use paths to designate where their information lives and who can access it, including both secrets and backend mounts (more on this later)[link].
All of these paths, and how they are handled by each component, are determined by the Path Routing engine. Since pathing is an integral part of a number of components, it is handled for everything within the Path Routing system. This provides a single place where ACLs, permissions, and canonical mount points can be managed. We’ll explore how this works with setting access a little bit later [link].
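One way to picture path routing is as a longest-prefix match from a request path to the backend mounted at that point. The sketch below is a toy illustration of that idea, not Vault's actual router; the `Router` class and the backend names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class Router:
    """Toy router: maps mount-point prefixes to backend names."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, prefix: str, backend: str) -> None:
        # Normalise so every mount point ends with exactly one slash.
        self._mounts[prefix.strip("/") + "/"] = backend

    def route(self, path: str) -> Optional[Tuple[str, str]]:
        """Return (backend, remaining path) for the longest matching mount."""
        path = path.lstrip("/")
        matches = [m for m in self._mounts if path.startswith(m)]
        if not matches:
            return None
        best = max(matches, key=len)
        return self._mounts[best], path[len(best):]


router = Router()
router.mount("secret", "generic")
router.mount("sys", "system")
router.mount("auth/github", "github-credential")

print(router.route("/secret/awesomesauce/worlddomination"))
```

Because every request is resolved in one place, that same place is a natural spot to check permissions before handing the request to a backend.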
To the Backends!
Earlier on, we came across the concept of "backends" and now we can explore how they interact with the other parts of Vault. There are a number of different types of backends available and the difference between each, as well as how they’re used, can be a little bit confusing at first. We’ll go over each one in detail.
System Backend
Probably the easiest to understand is the "System Backend". It is mounted by default at the sys/ path and, unlike the other backends, cannot be disabled or moved. The system backend is how you configure and manage Vault itself: mounting and unmounting other backends, managing policies, checking the seal status, and interacting with other internal features. Because it lives at a path like everything else, these management operations go through the same API and path routing as any other request.
Secret Backend
One of the most important functions of Vault is, of course, storing secrets. Secrets might be anything from a password to a private key or credit card data. The rule of thumb for determining what is a secret and what is not is "do I care if it gets out?". If the answer is “yes”, then that’s a secret.
Of course we have to store these secrets somewhere, and where they’re stored and organized is the job of the secret backend. There are a number of different types of secret backends (for a full list you can hop over to the docs [link]) and they may function differently so let's take a few examples.
**Generic** - The generic backend is mounted by default at secret/ when Vault is started. It provides a very simple mechanism for storing and reading secrets with an authorized token (more on tokens later on [link]). Any secret stored in the generic backend at a given path can be accessed with a token that has the appropriate policy.
**Cubbyhole** - The cubbyhole backend stores secrets in much the same way as generic, but with different scoping. As the name would suggest, it creates a private "cubbyhole" for each token, and anything stored there is scoped to that token. This means that one authorized token cannot access another authorized token’s data. Additionally, a cubbyhole only lives as long as its token: when the token expires or is revoked, the cubbyhole and its contents are destroyed.
**PostgreSQL** - The PostgreSQL secret backend takes a different approach: it generates database credentials dynamically. Access is based on roles that you define (e.g. "readonly"), and each time a service asks Vault for access, Vault creates a new, unique set of credentials with the scope of the requested role and a limited lifetime. This makes it very easy to scope and track individual access, as well as to revoke access for a specific consumer. An additional benefit is that hardcoding of credentials is no longer necessary.
These are just a few of the secret backends available. Each provides a unique set of advantages based on your particular workflow. Since secret backends are also (mostly) mounted at specific points, you may choose to use multiple secret backends in Vault at the same time, or to use multiple instances of a particular backend at different mount points. For the full list of available backends and more detail on each, see the documentation [link].
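The idea of handing out a unique, short-lived credential per request (as in the PostgreSQL example above) can be sketched in a few lines of Python. This is a toy model that never touches a real database; the `Lease` type, the `issue_credentials` function, and the one-hour default lifetime are all assumptions made for illustration.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    username: str
    password: str
    role: str
    expires_at: float  # Unix timestamp after which the credential is invalid


def issue_credentials(role: str, ttl_seconds: int = 3600) -> Lease:
    """Create a fresh, randomly named credential for the given role."""
    suffix = secrets.token_hex(4)
    return Lease(
        username=f"{role}-{suffix}",
        password=secrets.token_urlsafe(24),
        role=role,
        expires_at=time.time() + ttl_seconds,
    )


first = issue_credentials("readonly")
second = issue_credentials("readonly")
print(first.username, second.username)
```

Since every caller gets its own username, database logs can tie each action back to a single request for access, and one credential can be revoked without affecting the others.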
Credential Backend: "Please Sir May I Have Some More?"
Here’s where things might get a bit confusing. The credential backend provides authentication and tokens. It does not store secrets and is not used for accessing them; it provides the method of verifying and identifying a user before secrets can be accessed.
The workflow for most credential backends goes something like this:


1. A service makes a request to authenticate.

2. Vault requests identification from the credential backend (how this works varies by backend).

3. If the request succeeds, Vault authorizes that user.

4. Depending on the backend, Vault may issue a token to the requesting service.


Of course the precise mechanism for authorization, as well as what is returned for authorized requests, depends on the backend used. Let’s look at a few examples.
**Username & Password** - One of the simpler authentication backends is username and password. As the name would suggest, a service requesting authorization (usually a human in this case) provides a username and password combination. Once this user is identified, Vault generates and returns a specific token with that user’s appropriate permissions. Users and passwords must be created in this backend before they can be used.
**GitHub** - Using the GitHub OAuth mechanism, this backend allows users to be identified via a personal access token generated by GitHub. Each user can use their own token to identify themselves and authorize with Vault. Users can be mapped to appropriate policies and given permissions based on their access role.
**AppID** - The AppID backend allows machines and applications (rather than humans) to identify and authorize themselves with Vault without having to hardcode or store a password or other sensitive information. An AppID login is constructed from two pieces of information: an app ID (from the application itself) and a user ID (from the machine). While this is not foolproof, it is generally hard to determine both values from outside of the application. Requests to Vault include both pieces. Each app ID is coupled with a set of user IDs on which it is allowed. If the submitted app ID and user ID are an allowed combination, then the caller is identified and authorized.
These are just a few examples of the available authorization backends. Each has a unique set of capabilities and should be chosen based on your workflow. Like the secrets backend, multiple credential backends can be mounted at the same time as well as multiple instances of the same backend. For more information on the available credential backends and details for each, see the documentation [link].
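The four-step login workflow described earlier in this section can be sketched with a toy username and password backend. This is an illustration only: the in-memory `USERS` and `TOKENS` stores, the function names, and the PBKDF2 hashing are assumptions for the example, not how Vault stores users or mints tokens.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List, Optional, Tuple

# Hypothetical user store: username -> (salt, password hash, policies).
USERS: Dict[str, Tuple[bytes, bytes, List[str]]] = {}
# Tokens issued so far: token -> list of policy names.
TOKENS: Dict[str, List[str]] = {}


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)


def create_user(username: str, password: str, policies: List[str]) -> None:
    """Users must exist in the backend before they can log in."""
    salt = secrets.token_bytes(16)
    USERS[username] = (salt, _hash(password, salt), policies)


def login(username: str, password: str) -> Optional[str]:
    """Verify the caller's identity, then issue a token tied to policies."""
    record = USERS.get(username)
    if record is None:
        return None
    salt, expected, policies = record
    if not hmac.compare_digest(_hash(password, salt), expected):
        return None
    token = secrets.token_urlsafe(24)
    TOKENS[token] = list(policies)
    return token


create_user("alice", "correct horse", ["readonly"])
token = login("alice", "correct horse")
print(token is not None, login("alice", "wrong"))
```

Note that the credential backend only answers "who is this, and what policies apply?". The resulting token is what gets presented later when a secret is actually requested.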
Audit Backends: It was Col. Mustard in the Study with the Candlestick
If something does happen to your infrastructure and you need to trace down what is going on (or you just need to monitor things to prevent that from happening), an audit log is a critical part of the process. As we saw before, all requests to Vault go through the REST API, whether from the CLI or another application. This means that every interaction with Vault, including errors, is logged to the audit log. This is quite helpful when tracing the exact source of every request sent to Vault.
It is important to note a few things about the audit logs. Since they contain all of the interactions with Vault, they also contain the request and response data (including secrets, tokens, passwords, etc.). To prevent these values from being saved in plaintext in your audit logs, sensitive values in the request and response data are hashed with a salt using HMAC-SHA256. These hashes cannot be reversed to reveal the raw data, but you can use the sys/audit-hash API endpoint to compute the hash of a known value and compare it with what appears in the log.
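Because an HMAC is a one-way keyed hash, the practical way to find a value in an audit log is to hash a candidate the same way and compare the results. The sketch below shows that idea with Python's standard library; the salt value and the `hmac-sha256:` prefix on the output are assumptions for illustration.

```python
import hashlib
import hmac


def audit_hash(salt: bytes, value: str) -> str:
    """Return a keyed SHA-256 digest of value, prefixed with the algorithm."""
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"hmac-sha256:{digest}"


salt = b"per-backend-salt"           # assumption: each audit backend has its own salt
logged = audit_hash(salt, "s3cr3t")  # what would appear in the log

# To check whether a known value appears in the log, hash it the same way
# and compare the results in constant time.
print(hmac.compare_digest(logged, audit_hash(salt, "s3cr3t")))
print(hmac.compare_digest(logged, audit_hash(salt, "other")))
```

Without the salt, someone reading the log cannot even test guesses against it, which is why the hashing is keyed rather than a plain SHA-256.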
Finally, if audit logging is enabled for Vault, at least one audit backend must be available. If no backend can currently be written to, Vault will block any incoming requests until one is available. This ensures that no log data is lost because an audit backend was not available.
There are two currently supported audit backends in Vault:
**File** - This is a simple audit backend that appends log information to a file. It does not do any management like rotation.
**Syslog** - This audit backend sends information to syslog. It currently only sends log information to the local agent, and cannot be configured with a custom destination. It is only supported on systems with syslog available, and should not be enabled on systems without it.
Both audit backends use the same data structure. Each log entry is output as a JSON object with a type and the payload data. Currently only two types are supported, request and response. The payload body is hashed and salted by default.
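As a rough picture of that structure, here is a sketch of a file-style audit backend that appends one JSON object per line. Only the `type` field and its two allowed values come from the description above; the payload field names (`request`, `response`, `path`, `operation`, `data`) are illustrative guesses, and an in-memory buffer stands in for the log file.

```python
import io
import json
from typing import Any, Dict, TextIO


def write_entry(log: TextIO, entry_type: str, payload: Dict[str, Any]) -> None:
    """Append one audit entry as a single line of JSON."""
    if entry_type not in ("request", "response"):
        raise ValueError(f"unsupported entry type: {entry_type}")
    log.write(json.dumps({"type": entry_type, **payload}) + "\n")


log = io.StringIO()  # stands in for the audit log file
write_entry(log, "request", {"request": {"path": "secret/foo", "operation": "read"}})
write_entry(log, "response", {"response": {"data": {"value": "<hashed value>"}}})

for line in log.getvalue().splitlines():
    print(json.loads(line)["type"])
```

One object per line keeps the log easy to append to and easy to process with ordinary line-oriented tools.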
Like the secret and credential backends, multiple audit backends and multiple instances of an audit backend can be used. This is especially helpful to prevent all backends from being unavailable and blocking Vault requests. For more information on each audit backend and how to interact with it, see the docs [link].
Storage Backend: The Raw Bits
The final backend we’ll look at is the storage backend. This backend simply provides a method of storing raw data from Vault in a specific location. It is important to note that since the storage backend resides outside of Vault, it is considered untrusted. All data to or from the storage backend must pass through the cryptographic barrier (more on this later! [link]), meaning that all data stored in the storage backend is encrypted while at rest. A storage backend is never able to access plaintext secrets while they are at rest or in motion.
There are only three storage backends officially maintained by HashiCorp: inmem, file, and consul. There are also a number of community-created storage backends available; however, HashiCorp does not maintain these and may not be able to provide help should you encounter problems in using them. That being said, the community backends listed in the docs [link] generally have a high level of quality.
Let’s look at the three officially maintained storage backends:
**inmem** - The inmem storage backend stores all secrets in memory at runtime. No secrets are stored to disk. In-memory storage is of course volatile, and does not persist. When the Vault process terminates, all in-memory data is lost. This option is mostly used for development and is the default for running Vault in dev-server mode.
**file** - This storage backend stores all data on the local filesystem. Data is stored in a directory structure, mirroring the path system used in Vault. Since this stores data on the local disk, it is not considered useful for an HA system.
**consul** - Consul[link] is a distributed and HA service discovery mechanism and key value store. The consul storage backend allows storage of secrets in a Consul cluster as key value pairs. Since Consul is distributed, this allows for a robust mechanism of persisting Vault data in an HA system.
It is worth noting again that all storage backends are considered untrusted and data is stored in an encrypted manner regardless of storage backend used. This prevents secrets from existing in a usable manner outside of Vault.
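From the storage backend's point of view, the job is little more than a key value interface over opaque bytes. The sketch below is a toy in-memory version of such an interface; the class and method names are hypothetical, and the stored value is just a placeholder for data that would already have been encrypted before reaching the backend.

```python
from typing import Dict, List, Optional


class InmemStorage:
    """Toy in-memory storage backend: holds opaque (already encrypted) bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def list_keys(self, prefix: str) -> List[str]:
        """Return the immediate children under prefix, like a directory listing."""
        children = set()
        for key in self._data:
            if key.startswith(prefix):
                head, sep, _ = key[len(prefix):].partition("/")
                children.add(head + sep)
        return sorted(children)


store = InmemStorage()
store.put("secret/awesomesauce/plans", b"opaque-ciphertext-bytes")
store.put("secret/other", b"more-opaque-bytes")
print(store.list_keys("secret/"))
```

A file or Consul backend could offer the same four operations against a directory tree or a key value store, which is what makes the backends interchangeable.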
Cryptographic Barrier: Fire Moat with Dragons and Things
We’ve saved perhaps the most important component of Vault for last. As mentioned before, no interaction outside of Vault is allowed to contain unencrypted information. This is achieved by wrapping all of Vault inside a "cryptographic barrier". The barrier is the encryption and decryption mechanism through which all requests from the API and all calls to the storage backend must pass. In essence it is the “vault” in Vault.
Similar to a physical vault, the cryptographic barrier is locked and unlocked through a "sealing" or “unsealing” process. When a Vault server is started, it begins in a “sealed” state. In order to unseal it, the required threshold of unseal keys must be provided; this threshold is configured when Vault is first initialized. Without this threshold of keys, the cryptographic barrier will not have access to its master decryption key, and no data can be input or output. In a similar way, Vault can be “sealed” if there is a condition where continued use would jeopardize secrets (such as an unauthorized user gaining credentials, or accidental placement of secrets in the open). This is, in effect, a “break glass” procedure that can lock down Vault by preventing all flow through the cryptographic barrier.
For more information on the cryptographic barrier and the seal/unseal process, see the documentation on "Shamir’s Secret Sharing" [link].
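To give a feel for how a "threshold of keys" can work, here is a compact sketch of Shamir's Secret Sharing over a prime field. It is meant only to illustrate the technique: the choice of prime, the integer encoding of the secret, and the 5 shares with a threshold of 3 are assumptions for the example, and Vault's own implementation differs in its details.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime; the field must be larger than the secret

Share = Tuple[int, int]  # (x, y) point on the polynomial


def split(secret: int, shares: int, threshold: int) -> List[Share]:
    """Split secret into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in the range [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the number of shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for c in reversed(coeffs):  # Horner's rule
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def combine(points: List[Share]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


master_key = secrets.randbelow(PRIME)
key_shares = split(master_key, shares=5, threshold=3)
print(combine(key_shares[:3]) == master_key)
print(combine(key_shares[2:]) == master_key)
```

Any three of the five shares reconstruct the same value, while fewer than three reveal nothing useful about it. That is the property that lets several people each hold one key without any single person being able to unseal Vault alone.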
The Whole Enchilada
We’ve now gone through all of Vault’s major components and their relation to each other. You should now have a basic understanding of Vault’s architecture and how its core functions are achieved. If you would like additional information or want to get a deeper understanding of an individual piece, come check out the docs [link] or the community page [link].
You now have all the basic understanding necessary to begin exploring Vault. Next up, installing and getting started[link].  Stay tuned!