Infrastructure-as-Code is a principle that drives modern DevOps practice. In this post, I discuss the current state of Terraform and provide some basic guidelines/principles on how to structure its usage for your project.
- Rationale
- Background
- Terraform Documentation Gotchas
- Repository Structure
- Basics
- DevOps Workflow
- Miscellaneous
- Conclusion
- TODO
Having used Terraform in the past to deploy a single-tenanted Kubernetes cluster to GCP, I was curious to see how much Terraform had evolved and how well it supported AWS. My deep dive revealed that to use the cutting edge required significant bleeding. To save some grief/frustration for others, I've written some guidelines which I believe will answer the following questions:
- Where do I start with a new project?
- How do I structure a Terraform repository?
- How do I accommodate different deployment environments?
To make sense of these guidelines, you should be aware of the following:
- Infrastructure as Code implies that programmatic structure should be applied to the organization of the code, not simply that configuration files should be version controlled.
- Programmatic structure implies the DRY (Don't Repeat Yourself) principle.
- Programmatic structure implies the Single Responsibility principle.
- Programmatic structure implies that code will be literate and self-documenting.
- Programming idioms are used to structure the understanding of how to best leverage Terraform and its capabilities.
- We adhere to the KISS Principle and avoid adding unnecessary complexity.
Caveats:
- This is an 80/20 solution, meaning that not all use cases will be covered.
- This solution assumes a single cloud provider.
- Assumes that all configuration files will be version controlled and that version control features (such as feature branches for changes) will be used.
- Evolving Your Infrastructure with Terraform: OpenCredo's 5 Common Terraform Patterns
  - The problem with the proposed final solution is that multiple state files for various components of your infrastructure undermine the underlying premise of Terraform: that there is a single source of truth regarding the desired state of your infrastructure as code.
  - In the "Orchestrating Terraform" section, she mentions that there are module ordering dependencies that multi-state Terraform configurations suffer from. Why not use `depends_on` within a single mono-repo?
- Keep your Terraform code DRY
  - Advocates for the use of Terragrunt, but this seems like another layer of abstraction/complexity which violates KISS.
  - Also requires duplicate config per deployment environment, which seems to defeat the purpose of DRY.
- The Role-Profiles Pattern Across Infrastructure as Code
  - An interesting perspective with regards to the desired structure of infrastructure code, but again relies on Terragrunt.
- Advantages and Pitfalls of your Infra-as-Code Repo Strategy
  - Another article with some interesting perspectives, but again recommends structuring a multi-tenant repo with directories instead of Workspaces. Workspaces maintain the state of a specific deployment while providing access control over who can make changes. Relegating this to a git repo's access control seems like a step backwards to me.
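Regarding the `depends_on` question raised above: since Terraform 0.13, `depends_on` can be placed directly on `module` blocks, which lets a mono-repo express ordering between components without splitting state. A minimal sketch (module names and paths here are hypothetical):

```hcl
# Requires Terraform >= 0.13, where depends_on is allowed on module blocks.
module "iam" {
  source = "./aws/iam"
}

module "lambda" {
  source = "./aws/lambda"

  # Force this module to wait for the iam module, even when there is no
  # explicit output/input data dependency between them.
  depends_on = [module.iam]
}
```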
These perspectives of advanced Terraform users suggest that STRUCTURE is a fundamental complexity/issue introduced by Terraform. Can these be addressed using JUST Terraform features and providing a "Convention over Configuration" solution?
Official Terraform Documentation
There's a wealth of documentation on Terraform provided by HashiCorp, and most users (like myself) would assume that this is a good place to start. But here's the problem: the sheer amount of documentation does very little to help determine WHERE to get started.
Terraform's documentation is like learning to communicate in a foreign language using only a dictionary as a guide.
While the specifics are well documented, it's difficult to understand the CONTEXT of how to apply that information without significant experimentation. In other words, the tool ergonomics SUCK. Here's a list of missing features that would go a long way toward improving the Terraform experience:
- scaffolding which suggests an organizational pattern for structuring the project
- clear and actionable error messages
- up-to-date documentation which uses working examples and references/leverages current features (for example, Modules and the Terraform Registry).
- intellisense/autocomplete on defined output variables in the IntelliJ plugin.
This list is far from exhaustive, but it gives you a sense of the number of sharp edges associated with Terraform adoption/usage and the barrier to entry that many face.
TL;DR - here's the recommended Terraform organizational structure. What follows is a discussion of basic Terraform concepts and the justifications as to WHY this structure makes the most sense.
Throughout, I make a number of [Recommendation]s as well as point out some [ANTI-PATTERN]s, [WARNING]s, and [BUG]s.
```
.
+-- README.md
+-- main.tf
+-- outputs.tf
+-- variables.tf
+-- env
|   +-- dev.tfvars
|   +-- staging.tfvars
|   +-- prod.tfvars
+-- aws
    +-- iam
    |   +-- README.md
    |   +-- main.tf
    |   +-- outputs.tf
    |   +-- variables.tf
    +-- lambda
    |   +-- README.md
    |   +-- main.tf
    |   +-- outputs.tf
    |   +-- variables.tf
    +-- ...
```
Official Configuration Documentation
Terraform configuration consists of files (which end with the `.tf` extension) which contain directives. These directives are used to accomplish the following things:

- `provider`: specify the infrastructure provider you want to deploy against
- `resource`: specify the desired end state of the resources you want to configure
- `variable`: pass configuration variables between various Terraform components
- `output`: provide output regarding configured resources
- `module`: a collection of Terraform configuration files
- `locals`: variables defined specifically for a `module` scope

The above directives are not exhaustive, but the ones mentioned will be the most commonly used to set up some basic configuration.
Assigning values to root module variables
Terraform provides a number of mechanisms to provide input into the executing `terraform` operation. These inputs can be provided through the following methods (environment variables have the lowest precedence; `-var` and `-var-file` arguments are processed in the order they appear on the command line and override environment variables):

- `export TF_VAR_name=value && terraform <operation>`: provide a shell environment variable to specify the variable
- `terraform <operation> -var='name=value'`: specify a specific variable as an argument to the command
- `terraform <operation> -var-file="./path/to/file.tfvars"`: store specific variable name/value assignments in a `.tfvars` file
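For example, a hypothetical `region` variable could be declared in the root module and supplied by any of the three mechanisms above:

```hcl
# variables.tf (root module) -- the variable name here is illustrative
variable "region" {
  type        = string
  description = "AWS region to deploy into"
  default     = "us-east-1" // used when no other source provides a value
}

# env/dev.tfvars -- supplied via `terraform plan -var-file=./env/dev.tfvars`
# region = "us-west-2"
```

The same value could instead come from `export TF_VAR_region=us-west-2` or `terraform plan -var='region=us-west-2'`.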
Terraform is particularly useful for having a two-stage deployment (`terraform plan && terraform apply`). This ensures that only valid configurations can be deployed. Terraform also provides a `terraform validate` command to ensure that the syntax within your module is correct (but I find this less useful than running `terraform plan` directly).
The `module` is the fundamental building block of Terraform (not the `.tf` files themselves). Understanding this is the key to being able to structure your configuration repo.
A few key points regarding the Module:

- EVERY directory that contains `.tf` files is considered a Module. This includes the root directory where Terraform configuration is first specified.
- Modules can be nested (meaning subdirectories off of the root directory are considered child modules).
- Modules are structured in a parent-child hierarchy.
- Modules do NOT provide inheritance visibility to specifications between parent/child relationships, with a few notable exceptions (i.e. `provider`). Information that needs to be passed between parent/child/sibling Modules needs to be specified explicitly.
- Modules can consist of more than a single `.tf` file.
- Modules can be user-defined or reference external/public modules such as those available in the Terraform Registry.
If you think of the Module as analogous to a programming language method or function, then an obvious usage structure emerges regarding the organization of Terraform assets.
| Programming Concept | Terraform Equivalent |
|---|---|
| function/method | Module |
| parameters | Input Variables |
| return value | Output Variables |
| local variables | `locals` specification |
| method/function calls | `module` specification |
| implementation code | `resource` specification |
The root directory is the function/entry point for `terraform` operations.
Using this analogy, a logical Module structure becomes apparent.
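To make the analogy concrete, here is a minimal hypothetical module "call" and its "return value" (all names here are illustrative):

```hcl
# "function call": invoke the child module at ./aws/iam
module "iam" {
  source = "./aws/iam"

  # "parameters": input variables declared in ./aws/iam/variables.tf
  iam_username = "test"
}

# "return value": reference an output declared in ./aws/iam/outputs.tf
output "iam_user_arn" {
  value = module.iam.user_arn
}
```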
[Recommendation] Place all Module inputs in `variables.tf`. This file explicitly declares all required variables for the Module.
[Recommendation] Specify all Module outputs in `outputs.tf`. This file explicitly declares all values returned by the Module associated with newly provisioned assets.
[Recommendation] Specify all implementation details in `main.tf`. This file contains a Module's implementation specifics, which may include the following:

- `provider` specifications (usually defined at the root of a Terraform project)
- `module` specifications (which specify where feature-specific implementation details are found/configured)
- `resource` specifications (which are feature-specific implementation configuration)
- `locals` specifications (local variables used to remove boiler-plate specification variables)
[ANTI-PATTERN]: The monolithic `.tf` file. A lot of starting tutorials begin with a single monolithic `.tf` file. While initially expedient, this lack of organization means Terraform assets with different responsibilities/dependencies remain undifferentiated for the end user. Violating the single-responsibility principle leads to unmaintainable code and future technical debt.
[ANTI-PATTERN]: The use of unconventional `.tf` file names. If you look at the Terraform documentation and official Modules published on the Terraform Registry, an organizational convention is used. Each module contains the following three `.tf` files:

- `main.tf`
- `variables.tf`
- `outputs.tf`

While it's possible to make a service-specific configuration file (i.e. `api_service.tf`), this is NOT RECOMMENDED. Why? By explicitly documenting a Module through the use of these files, you communicate to the end user the intended usage of the Module. Inputs are clearly defined within `variables.tf`, as are return values in `outputs.tf`. All implementation-specific details are separated into `main.tf`. This is more readily consumable by end users.
A commonly encountered software development pattern requires different deployment environments for development, staging, and production. The differences between environments can be reflected in things such as:
- the size of the compute asset
- the location of the asset
- access privileges to the asset or any generated artifacts
- the provider profile being used to deploy the infrastructure
- etc.
We make the assumption here that each environment has its own provider profile configured for the deployment-specific functionality, in order to follow the principle of Least Privilege (i.e. we don't want a single provider profile to manage each of these separate deployment environments, to avoid change conflicts). Having separate provider profiles also means that developers/QA CANNOT accidentally overwrite the production deployments with newer/unverified changes which may break the application. As per DevOps best practices, production deployments should be automated through the use of profiles which have limited membership/access.
These types of configuration vary in the asset specifics, but not necessarily in the STRUCTURE of the application being deployed. The variants between deployment environments are best encoded in separate `.tfvars` files. In particular, we recommend the following organization:

- `env/dev.tfvars`
- `env/staging.tfvars`
- `env/prod.tfvars`
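As a hypothetical illustration, the same variable names appear in each file with environment-specific values:

```hcl
# env/dev.tfvars -- small, cheap assets for developers (illustrative values)
region        = "us-west-2"
instance_type = "t3.micro"
aws_profile   = "org-dev"

# env/prod.tfvars -- production-sized assets under a separate provider profile
# region        = "us-east-1"
# instance_type = "m5.large"
# aws_profile   = "org-prod"
```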
[Recommendation] Use deployment-specific resource prefixes to differentiate provisioned assets. Provisioned assets should be specific to a deployment type (i.e. the same asset should NOT be used across different deployment stages). To facilitate this separation, deployment-specific prefixes should be used for asset creation. This ensures a clear separation of access/responsibilities between created assets. There are a number of features Terraform provides to support this separation:
The Workspace feature (which we will discuss in detail later). Essentially, different workspaces can be used to preserve the configured state of a deployment. With separate Workspaces, you can avoid conflicts between different deployment environments (i.e. changes applied to dev will not automatically be applied to production).
Terraform provides the `terraform.workspace` variable to reference the current Workspace within configuration files. If you use the Workspace feature, there is no need to create a user-defined `workspace` variable.
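A sketch of using the built-in workspace name as a resource prefix (the bucket resource and naming scheme are hypothetical):

```hcl
locals {
  # Evaluates to the selected Workspace name, e.g. "dev" or "prod"
  prefix = terraform.workspace
}

resource "aws_s3_bucket" "artifacts" {
  # Produces e.g. "dev-artifacts" in the dev Workspace, "prod-artifacts" in prod
  bucket = "${local.prefix}-artifacts"
}
```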
It's tempting to begin using Terraform by slapping together some quick `resource` directives, but this is the path to despair. You'll quickly realize that maintaining configuration this way easily leads to duplication and spaghetti.
What's the alternative? Although the official documentation only mentions Terraform Registry in passing, you should strive to use existing modules to configure your infrastructure. In addition to being officially supported and tested, leveraging existing Modules leaves you free to focus on your infrastructure and not the implementation details of standard provider features/capabilities.
While ideally, this is the way ALL users should start, there is no clear guidance on how Terraform Registry Modules should be incorporated into your project. Even worse, the Terraform Registry Module documentation is not entirely up-to-date which makes using these modules a frustrating experience at best.
The following guidance is a combination of project organization and usage which will allow you to leverage the Terraform Registry as a starting/first-class resource. We use the following conventions for our project:
- We use a provider subdirectory to contain provider-specific implementations organized as a Module. This makes a clear distinction between different provider implementations.
- For each provider-specific implementation, we follow the above Module structure with regards to organization.
- We separate out provider-specific features by directory. This means IAM configuration is separate from asset configuration (for example, Lambda configuration).
- The goal of this structure is to provide a single source of truth per provider feature, adhere to the Single Responsibility principle, and avoid duplication. The consequence of this is that features which require multiple configurations (i.e. account/permission configuration as well as asset configuration) will have their configuration spread across multiple modules. This is intentional; where configuration is dependent on other provider functionality, dependencies can be easily specified between the modules (for example, ordering of configuration). This is much more difficult to accomplish if the asset creation mixes a bunch of responsibilities together.
- Each user-defined Module in this structure is meant to be reusable (as per the DRY principle). In other words, required parameters should be passed in as Input variables instead of hard-coded within a Module. These variables MUST be passed into individual `module` blocks and defined within the provider-specific Module's `variables.tf` file. Each Module should be able to be invoked multiple times with configuration-specific parameters passed in through Input variables.
- Leverage built-in Terraform variables and reduce duplicate interpolation through the use of `locals`.
- Document EACH module with a `README.md` to communicate configuration context to future users that cannot be captured in the configuration files themselves (i.e. reasons why a particular work-around was used). Optionally, add a link to the official documentation.
- Use the Terraform Registry `module` provisioning specification. Each Module should specify `Provisioning Instructions`. For example, here's the AWS IAM Module. Use individual `resource` directives as a last resort.
- Run `terraform init` to download the Terraform Registry provided Modules. These are stored in the `${TERRAFORM_ROOT}/.terraform` directory. The underlying structure is a reflection of the organization of your project's `module` structure. The importance of the `.terraform` directory CANNOT be OVERSTATED. For each Terraform Registry Module, the complete implementation as well as EXAMPLES are provided. Use the provided examples to understand how to configure the Module.

Example `.terraform` directory
[Recommendation] The root `main.tf` should have `module` references to user-defined implementation details for specific provider features. The root Module, at the top of the Terraform hierarchy, should only contain global configuration and NOT provider-specific implementation details. These details should be defined in another module which is referenced from the `main.tf` file. For example, here's how I have my `main.tf` structured for my project (where AWS is the provider I'm using). Replace the `required_providers` with the corresponding provider for your project.

`${TERRAFORM_ROOT}/main.tf`

```hcl
terraform {
  // You _CANNOT_ perform variable interpolation within this terraform block!
  required_providers {
    ...
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.19.0"
    }
  }
  ...
}

provider "aws" {
  // These variables are passed in at run time with the use of a .tfvars file
  region  = var.region
  profile = var.aws_profile
  ...
}

locals {
  // An example of a local variable that uses a built-in Terraform value and is
  // reused throughout main.tf
  local_variable = terraform.workspace
  ...
}

module "aws_iam_user" {
  source = "./aws/iam"

  // Input variables provided to the module. These NEED TO BE DEFINED in the
  // module's variables.tf
  iam_user_var1 = "${local.local_variable}-iam"
  iam_user_var2 = "some_value"
  iam_username  = "test"
  ...
}

module "aws_lambda" {
  source = "./aws/lambda"

  // Input variables provided to the module. These NEED TO BE DEFINED in the
  // module's variables.tf
  lambda_var1 = "${local.local_variable}-lambda"
  lambda_var2 = "another_value"
  ...
}
...
```
[Recommendation] Document `module` required/optional/default variables for 3rd party Modules. For example, here's my AWS IAM user module specification:

`${TERRAFORM_ROOT}/aws/iam/main.tf`

```hcl
module "iam_iam-user" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-user"
  version = "3.6.0"

  # REQUIRED Inputs
  name = var.iam_username

  # OPTIONAL Inputs
  force_destroy           = true             // default: false
  password_length         = 32               // default: 20
  password_reset_required = false            // default: true
  pgp_key                 = var.keybase_user // default: ""

  # UNUSED Default Inputs
  // create_iam_access_key         = true
  // create_iam_user_login_profile = true
  // create_user                   = true
  // path                          = "/"
  // permissions_boundary          = ""
  // ssh_key_encoding              = "SSH"
  // ssh_public_key                = ""
  // tags                          = {}
  // upload_iam_user_ssh_key       = false
}
```
[Recommendation] Specify input `variable`s for your user-defined Module in `variables.tf`. You should have a `variables.tf` file for each Module even if you don't specify any variables. It can be parsed by end users to quickly determine what Input variables are REQUIRED to properly configure the Module. For example:

`${TERRAFORM_ROOT}/aws/iam/variables.tf`

```hcl
# These variables _SHOULD_ be provided in the parent's module block
variable "iam_username" {}
variable "variable2" {}
variable "variable3" {}
...
```
[Recommendation] Define all Module outputs in `outputs.tf`. For example:

`${TERRAFORM_ROOT}/aws/iam/outputs.tf`

```hcl
output "access_key" {
  value = module.iam_iam-user.this_iam_access_key_id
}

output "secret" {
  value = module.iam_iam-user.this_iam_access_key_encrypted_secret
}

output "username" {
  value = module.iam_iam-user.this_iam_user_name
}

output "password" {
  value = module.iam_iam-user.this_iam_user_login_profile_encrypted_password
}

output "lambda_role_arn" {
  value = aws_iam_role.lambda_role.arn
}
...
```
NOTE: Use `module` references to refer to a Module's defined Output variables.
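For instance, a parent module could consume the outputs above via `module.<name>.<output>` (assuming the module block is named `aws_iam_user`, as in the earlier root `main.tf` example):

```hcl
module "aws_iam_user" {
  source       = "./aws/iam"
  iam_username = "test"
}

# Re-export a child module's output from the parent module
output "iam_access_key" {
  value = module.aws_iam_user.access_key
}
```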
[WARNING]: Variable interpolation CANNOT be used in a number of places. You CANNOT use variable interpolation/expansion in the following circumstances:

- In the `terraform` block
- In the `source` parameter of a `module` block
- In the definition of another `variable`. In other words, nested variable interpolation is NOT supported.
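For example, the following (hypothetical) snippets will all fail to validate:

```hcl
# 1. In the terraform block -- values here must be literals
terraform {
  required_version = var.tf_version // NOT allowed
}

# 2. In the source parameter of a module block
module "iam" {
  source = "./${var.provider_dir}/iam" // NOT allowed
}

# 3. Nested in the definition of another variable
variable "bucket_name" {
  default = "${var.prefix}-bucket" // NOT allowed; use a locals block instead
}
```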
[WARNING]: The Terraform Registry does NOT correctly document the required variables NOR the correct usage. Use the `examples` provided in the `.terraform` directory after `terraform init` is run.
When Terraform executes a `plan`, the state for your infrastructure is stored locally in a `.tfstate` file. If you have multiple deployment environments, this can be problematic, as a single state file can be overwritten depending on the environment you are trying to deploy. To avoid this situation, Terraform provides the Workspace feature. Each Workspace maintains a different `.tfstate` file. As a team grows, sharing this state information becomes a high priority.
An additional challenge is that state files contain sensitive information (such as secrets) which may be required for deployments.
Any solution requires both separation of deployment states as well as the ability to share/update state with access control privileges. Tracing infrastructure changes can also help attribute breaking changes to the specific changes introduced by individual devs.
Enter Terraform Cloud.
Terraform Cloud is a remote backend state storage service. If no remote `backend` or Workspace is specified, the following will occur:

- Terraform will use the `default` Workspace.
- Terraform will use a local state backend and store `.tfstate` in the `${TERRAFORM_ROOT}` directory on your machine.
To configure Terraform Cloud as a remote backend with multiple Workspaces, you need to:

- Create a Terraform Cloud account.
- Create a Workspace per deployment environment by specifying a Workspace `prefix` as a convention. For example, you could provision separate workspaces for dev, stage, and prod using the following Workspace naming convention (where `org-` is the organizational prefix):
  - `org-dev`
  - `org-stage`
  - `org-prod`
- Configure the `terraform` block in your `${TERRAFORM_ROOT}/main.tf` with a remote backend:

```hcl
backend "remote" {
  hostname     = "app.terraform.io"
  organization = "org"

  workspaces {
    prefix = "org-"
  }
}
```

- Run `terraform init` to pick up the changes. You should see something similar in the output:

```
Initializing the backend...

Successfully configured the backend "remote"! Terraform will automatically
use this backend unless the backend configuration changes.

The currently selected workspace (default) does not exist.
  This is expected behavior when the selected workspace did not have an
  existing non-empty state. Please enter a number to select a workspace:

  1. dev
  2. prod
  3. stage

  Enter a value:
```
[BUG]: Disable Remote planning per configured workspace. There is one last CAVEAT to get all of this working.
By default, Terraform Cloud uses the Remote Execution Mode when attempting to run `terraform plan`. Unfortunately, this does not seem to work with AWS credentials. While the use of AWS local credentials works with Local planning, for some reason it FAILS when the default Remote Execution Mode is configured. This behavior occurs even if the `shared_credentials_file` parameter is set in your root Module's `provider` block, or if you try setting the ENV_VAR in the Terraform Cloud UI for the Workspace. The work-around is to go to each Workspace's `Settings -> General Settings` and change the Execution Mode to Local.
Terraform Cloud Workspace Settings:
Execution Mode Default Setting:
The Workspace's state file will be stored on Terraform Cloud, but the `terraform plan` will run locally.
Here's an example of the output when Remote execution is set for the Terraform Cloud Workspace, but fails:

```
% terraform plan
Running plan in the remote backend. Output will stream here. Pressing Ctrl-C
will stop streaming the logs, but will not stop the plan running remotely.

Preparing the remote plan...

To view this run in a browser, visit:
https://app.terraform.io/app/scrb/scrb/runs/run-ynhciGC5Dp5CKHgy

Waiting for the plan to start...

Terraform v0.14.4

Configuring remote state backend...
Initializing Terraform configuration...

Error: No valid credential sources found for AWS Provider.
        Please see https://terraform.io/docs/providers/aws/index.html for more information on
        providing credentials for the AWS Provider

  on example.tf line 17, in provider "aws":
  17: provider "aws" {
```
[BUG]: Misleading error message on misconfiguration. If you forget to specify or use the `workspaces` `prefix` directive in your configuration, `terraform` operations will fail opaquely with the following message:

```
% terraform workspace list
workspaces not supported
```

Ideally, a more informative error message would point you at the issue :(
Now that you've set up your project repository, Workspaces, and environment variables, how do you incorporate this into your DevOps pipeline? The following guidance assumes an AWS provider, but should apply to any supported cloud provider.
Here are a few suggestions:
- There should be 3 branches in your Terraform project's git repo which correspond to each of the different deployments/Workspaces: `dev`, `stage`, and `prod`. Having different branches isolates changes from propagating automatically to QA/production environments. [Optional: you can use the `main` (historically called `master`) branch of your git repo as `prod`.] In any case, pushing Terraform changes to `stage` or `prod` should be limited to automated/audited processes only. This allows changes to be gated on assurance tests before becoming live on production.
- Developers making changes to the infrastructure should ONLY use feature branches off of the `dev` branch. When the Terraform change is ready, it is merged into the `dev` branch. This is the only branch developers can commit changes to directly, and this should be enforced in the Terraform git repo.
- When changes are made to `dev`, they are pulled into the `stage` branch for testing. Ideally, there are automated tests which verify that there are no breaking changes and that the deployed application passes the required assurance tests/processes before being promoted to production.
- Finally, when changes to `stage` have passed, the Terraform changes are merged into `prod`. This step should be fully automated using a CI/CD system of your choice.
Note that:
- For each stage, a corresponding set of `.tfvars` is used. For example, to perform a developer deployment, `dev.tfvars` is used as an argument to the Terraform operation (i.e. `terraform plan -var-file=./env/dev.tfvars`).
- It's also recommended that different IAM accounts are associated with each stage. Following AWS IAM guidelines, there should be an IAM group for developers which gives them permissions for creating/deploying development-ONLY infrastructure. Each developer should belong to the developer IAM group. For users who are responsible for `stage` environments, another group can be created with the requisite permissions, or alternatively an assumable role. For production deploys, the CI system should be configured with a limited set of credentials to pull the change from the Terraform repository and deploy on production infrastructure.
- Each deployment environment will have its own resources (meaning that all assets created within the deployment environment will be UNIQUE to it). This ensures that users will not accidentally use privileges associated with one environment to update the wrong environment.
- State files should be stored remotely using Terraform Cloud. This ensures that there is an audit trail of state changes which can potentially be used to restore previous state if a rollback is required.
- There's a cost to maintaining different deployment environments due to the duplication of resources. The benefit is the clear separation of deployment assets and the privileges required to access/deploy them.
Why is there a Keybase.io dependency for the AWS Lambda Module?
If you look closely at the Terraform Registry AWS Lambda Module, you'll note that configuration requires a Keybase.io user. The purpose of this dependency is to be able to use a public PGP key for encrypting credentials for the Lambda service. Unfortunately, there doesn't seem to be an alternative (such as pointing the configuration at your own private/public PGP key pair, or at other public keyservers which might already contain a public PGP key). Fulfilling this dependency requires the following steps:
- Create a PGP key pair.
- Create a Keybase.io account.
- Upload the public key to your Keybase.io account. This will require verification using the private key generated.
- Once complete, you should be able to refer to the Keybase PGP key using the `keybase:username` directive. This resolves to a public Keybase URL (https://keybase.io/username/pgp_keys.asc) where the public PGP key can be downloaded.
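Once the Keybase account is set up, the wiring in the module block looks like the following sketch (the module is the AWS IAM user module discussed earlier; the Keybase username is a placeholder):

```hcl
module "iam_iam-user" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-user"
  version = "3.6.0"

  name = "test"

  # Resolves to https://keybase.io/<username>/pgp_keys.asc; the returned
  # secrets (login password, access key secret) are encrypted with this
  # public key so only the key's owner can decrypt them.
  pgp_key = "keybase:username"
}
```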
Hopefully, this gives you a roadmap to get started on your Terraform journey. There should be enough here to avoid the common Terraform pitfalls and provide you with a scalable/extensible architecture for your Terraform project.
Comments/feedback are welcome!
- Multi-provider single monolith Terraform repos? Is the ultimate redundancy/resiliency solution to deploy your cloud infrastructure to multiple providers? If so, who is doing this?
Saw your post on HN. Some interesting thoughts and aspects here for sure. This contains a lot of helpful additions to the official documentation! Have you checked out terraform-best-practices? I found that helpful as well.
PS: The reason why remote execution does not pick up your credentials is that the credentials are stored locally on your machine. The remote execution environments cannot read your local credentials files or environment variables. Instead of using an AWS shared credentials file, you need to configure your environment variables in Terraform Cloud per the getting started guide. With this set, you also have to remove the `shared_credentials_file` from your provider configuration. (If I'm not misreading you, remote plan is what you're looking for? This would also correspond to three different TFC workspaces though, as the workspace concept between CLI and Cloud differs a bit.)