Some thoughts regarding why IAMbic when there is Terraform

Preface: I am a developer in the IAMbic repo, so my perspective is admittedly biased. Developers have individual preferences. My response discusses the design trade-offs behind our decision to invest in IAMbic.

“possible with Terraform (albeit an extra step or utilizing something like terragrunt).”

During the design phase, we thought about writing a transpiler from YAML to HCL2 to stay Terraform-compatible. However, Terraform’s constraints and our own Terraform experience at previous companies steered us away from that option. Here are a few reasons why:

  1. The Terraform AWS provider is oriented around a single AWS account. Native multi-account support is a high design priority for security practitioners in larger AWS environments.

  2. A single public cloud account for the entire company is not an acceptable risk: developers want high velocity, and the security team needs to ensure developers can move fast in dev without putting customer or company data at risk. The ratio of security staff to other developers is roughly 1 to 50. At a previous startup (Series A), it was 1 to 35. At a public company I worked at, it was 1 to 53. We established multi-account support as a pillar of our design to leverage AWS accounts as a security boundary.

  3. Terragrunt is an improvement on Terraform. At a public company I worked at, that was how the multi-AWS-account situation was handled. However, there are other challenges. Terragrunt excels at keeping Terraform DRY (don’t repeat yourself), but the project is limited by the Terraform ecosystem, meaning HCL and other Terraform constraints. How to set up Terragrunt was not widely known across the infrastructure team. In an infrastructure org with 50+ developers, we had fewer than 5 developers with great fluency in HCL and knowledge of the execution flow during a terraform plan/apply. Most of our other 600+ microservice developers who want to customize the IAM privileges of their service roles or team roles do not want to learn HCL. The bottleneck this puts on the security team, as the only group able to use and expand the Terragrunt structure on our existing AWS multi-account layout, is very real.

We studied the security team’s workflow and what it needs:

  1. a complete IaC workflow for defining and creating IAM definitions
  2. creating a new account should not require someone to extend the terragrunt.hcl definition to enroll the newly created account
  3. the ability to deal with drift (between cloud and IaC)
  4. ease of importing/detecting resources that are not currently defined in IaC

Our analysis is:

  1. A Terraform / Terragrunt setup is a by-product of existing infrastructure. It’s difficult to replace but also difficult to extend. Let’s play out the scenario. We build a Terraform module that understands the AWS Organizations API, and then we use Terragrunt to handle multiple AWS accounts. First question: how do we define an IAM role that should only exist in certain accounts? We would have to do something like:
resource "aws_iam_role" "test_role" {
  count = var.test_role_should_exist ? 1 : 0
  name  = "${var.account_name}_test_role"
  # ... aws_iam_role still needs more config here (e.g. assume_role_policy)
}

The count logic is fine for making Terraform do what we want, but it places a high cognitive burden on the microservice developer who just wants to say that test_role should be created in account_search_team_dev, account_search_team_stage, and account_search_team_prod. It means the microservice developer needs to know how to extend terragrunt.hcl to declare the new variable. In practice, the microservice developer will usually just wait for the security team to implement the HCL logic for them.
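To make that burden concrete, here is a minimal Python sketch of the bookkeeping the count-based approach implies. It is an illustration under stated assumptions, not how Terragrunt or IAMbic works internally: the helper per_account_flags and the account account_billing_team_prod are hypothetical, while test_role_should_exist and the search team account names come from the example above.

```python
# Hypothetical account inventory. The three search team accounts come from the
# example in the text; account_billing_team_prod is made up for illustration.
ALL_ACCOUNTS = [
    "account_search_team_dev",
    "account_search_team_stage",
    "account_search_team_prod",
    "account_billing_team_prod",
]


def per_account_flags(role_accounts, all_accounts):
    """Expand one "this role lives in these accounts" statement into the
    per-account boolean that the count-based HCL expects as an input."""
    wanted = set(role_accounts)
    return {account: account in wanted for account in all_accounts}


# What the microservice developer actually wants to say: one short list.
flags = per_account_flags(
    [
        "account_search_team_dev",
        "account_search_team_stage",
        "account_search_team_prod",
    ],
    ALL_ACCOUNTS,
)

# What has to be maintained instead: one flag per account, per role.
for account, should_exist in flags.items():
    print(f"{account}: test_role_should_exist = {str(should_exist).lower()}")
```

The developer’s intent is a three-item list, but the count approach needs a value for every account in the organization, and that number grows with each new account and each new role.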

  2. Terraform imports a single resource at a time, and it’s manual. Using projects like terraformer is difficult in practice: try getting your newly detected resources to play nicely with your existing Terraform configuration.

  3. Terraform configuration varies enormously between organizations. Every organization has to customize how it uses Terraform. Any new tooling that relies on Terraform has a high setup cost because of that customizability.

  4. Learning HCL is not particularly fun. That’s also a reason we have decided, at this point, not to invent yet another domain-specific language for IAM, because that would be another cost of adopting the system. Nested in the IAMbic YAML structure is just vanilla AWS IAM permissions.

  5. Whatever the solution is, it should play nicely with existing Terraform installations, since existing Terraform installations have a lot of inertia (they are hard to modify or replace).

“at scale these templates will be more cumbersome to use”

That depends on what kind of action you want to achieve. These are the scenarios we have thought of so far:

  1. Large numbers of AWS accounts in the organization. My colleagues have experience with 1000+ AWS accounts at another large public company. We know that a template structure cannot require folks to individually list every account that the resources in the template (like IAM roles) should apply to. It needs some sort of regex-like matching, and a future benefit would be matching on tags or other artifacts. We support patterns like dev_* to indicate that the template should be applied to accounts matching the dev_ prefix. We also added support for exclusion, so you can specify pii_* to exclude any accounts designated with the pii_ prefix. Scaling can go in many directions. I am curious to learn more about what scaling challenges you worry a solution like IAMbic would have.
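As a rough illustration of the include/exclude idea, here is a minimal Python sketch. It assumes shell-style wildcards (via the standard library’s fnmatch) as the matching rule, and the function names and account names are hypothetical; it is not IAMbic’s actual implementation or template syntax.

```python
from fnmatch import fnmatchcase


def matches_any(name, patterns):
    """Return True if the account name matches at least one wildcard pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def select_accounts(accounts, included, excluded=()):
    """Keep accounts that match an include pattern and no exclude pattern."""
    return [
        account
        for account in accounts
        if matches_any(account, included) and not matches_any(account, excluded)
    ]


# Hypothetical account names for illustration only.
accounts = ["dev_search", "dev_billing", "pii_warehouse", "prod_search"]

# Apply a template to every dev_ account.
print(select_accounts(accounts, included=["dev_*"]))
# ['dev_search', 'dev_billing']

# Apply a template everywhere except pii_ accounts.
print(select_accounts(accounts, included=["*"], excluded=["pii_*"]))
# ['dev_search', 'dev_billing', 'prod_search']
```

With this shape, an account created later with a dev_ prefix is picked up without anyone editing the template.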

“written in YAML which is another negative mark for me”

I have worked at a company with YAML configurations that ran to 1000+ lines. I can empathize with the sour taste left by working in such large YAML files (especially in a world with k8s and its large YAML). The large-YAML problem would exist if there were a lot of customization in a single template. Just as we don’t want to see 1000+ lines in a single Python file, eventually we will have to figure out how to include fragments from a different file. We are not there yet. We have decided against yet another domain-specific language for the moment because we have not identified a situation where YAML is terrible at expressing the declarative state of an IAM definition. If you have a DSL in mind, I’d like to learn more.

“Also, when using more granular permission sets, it seems far easier and more manageable to have it with your IaC code so you can directly reference resource ids in the policy.”

Non-predictable resource IDs are a big problem when the security team reviews IaC changes to IAM policies. For example, I have crafted many Data Lake policies that govern which S3 buckets are accessible to data analytics microservices. Microservice developers will not inform the security team of the various name patterns under which they write their data, meaning the security team needs to investigate every resource declaration when reviewing an IAM policy pull request. The typical approach is namespacing (a prefix in the ARN). It’s very frustrating for security teams when microservice developers keep their IAM IaC code in their microservice repo, because the security team has to hunt down what suddenly created a new IAM role outside of the company-wide Terraform repo. In fact, Terraform does not care about resources that are not declared in its state file, which is one of the reasons we decided against using Terraform: the security team wants to know about rogue IAM changes made out of band.
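The out-of-band check described above boils down to comparing two sets: what IaC declares and what actually exists in the cloud. Here is a minimal Python sketch of that comparison. The function name and role names are hypothetical, and in practice the observed list would come from the cloud provider (for example, an IAM role listing) rather than a hard-coded list.

```python
def find_out_of_band(declared_roles, observed_roles):
    """Compare roles declared in IaC with roles observed in the cloud.

    "unmanaged" roles exist in the cloud but not in IaC (possible rogue changes).
    "missing" roles are declared in IaC but were not found in the cloud.
    """
    declared = set(declared_roles)
    observed = set(observed_roles)
    return {
        "unmanaged": sorted(observed - declared),
        "missing": sorted(declared - observed),
    }


# Hypothetical role names for illustration only.
declared = ["search_team_dev_role", "search_team_prod_role"]
observed = ["search_team_dev_role", "search_team_prod_role", "data_lake_adhoc_role"]

report = find_out_of_band(declared, observed)
print(report)
# {'unmanaged': ['data_lake_adhoc_role'], 'missing': []}
```

A tool that only consults its own state file never computes the "unmanaged" half of this report, which is the part the security team cares about most.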
