
@apparentlymart
Created July 19, 2021 17:20
Terraform Ephemeral Resources, early draft

Ephemeral Resources

Background

Terraform currently has two resource "modes":

  • Managed resources are those where Terraform considers itself to "own" a corresponding external object indefinitely. Terraform is responsible for its full lifecycle, including creating it, applying updates to it, and possibly eventually destroying it.

    "Managed" is the primary resource mode, and so it's given the honor of being declared with blocks simply named resource. Managed resource objects are persisted between runs using information saved in the state.

  • Data resources represent some existing object that Terraform will read the data from without taking ownership of it. Data resources are declared using data blocks, and their only lifecycle action is "read".

    Data resources are retained in the state so that Terraform can determine when they have changed, but their data is always re-read on each operation. Until Terraform 0.13, Terraform did nothing useful with its ability to detect data resource changes; in Terraform 0.13 we finally started showing changes since the last operation in the rendered diff, to help users understand how data resource changes affected the configurations of managed resources.

What both of these resource modes have in common is that their results are saved in a state snapshot after each terraform apply. This allows Terraform to compare new configuration/state with prior state.

Because of how data resources were initially implemented, they acquired another, unintended use-case: generating temporary values, such as credentials, whose scope is intended to cover only a single Terraform operation. Since Terraform prior to 0.13 did not explicitly show changes to data resources unless their reads were deferred until the apply step, this other use-case appeared mostly to work, except that there was no opportunity to explicitly delete or otherwise release a temporary object created by a data resource.

A specific example is the Vault provider: it initially used managed resources to represent the creation of leases for secrets, which was a poor fit because those values would then not be available until the apply step, and so Vault-issued credentials could not be used to configure providers.

Data resources gave an opportunity to acquire a lease during Terraform's refresh phase, but at the expense of violating one of the key assumptions about data resources: that reading them does not have externally-visible side effects. Each time a vault_generic_secret data resource is read, a new lease object is created inside the target Vault cluster, which is cleaned up only because leases have an expiration process managed by the Vault server. To mitigate this, the Vault provider intentionally issues itself a short-lived child token to limit the effective lifetime of its leases. However, because of how data resources participate in the plan/apply cycle, this in turn means that the apply must happen very shortly after the plan is created in order to still be working with a valid lease.

This issuing of credentials is one example of a use-case involving an ephemeral object that ought to exist only for the duration of a single Terraform operation, and should ideally be closed or deleted automatically at the end of that operation. Another similar use-case that has been discussed is temporary networking tunnels, allowing Terraform to temporarily gain connectivity to a network it would not naturally have access to. In this case, the temporary object is naturally scoped to the system where Terraform is running (it's not a "remote object" in the usual sense) but it does still have an "open"/"close" lifecycle where ideally it should remain open for as short a time as possible.

Most of the existing use-cases for this sort of ephemeral object tend to naturally relate to other ephemeral state inherent to Terraform. For example, provider configurations are active only for a single operation, and tend to consume credentials and network tunnels. Provisioners are active only briefly during the creation of a managed resource, and similarly can consume credentials and network tunnels. Due to the ephemeral nature of a time-limited credential or a network tunnel, it doesn't make sense to use it as part of the configuration of managed and data resources: their data outlives a single operation, and so it would tend to be useless for them to refer to an ephemeral object.

This document observes that "ephemeral objects" appear to be a real use-case, distinct from the use-cases of managed and data resources, and proposes an explicit representation of them in the Terraform language as a third resource mode with its own lifecycle.

Proposal

Our usual representation of "remote" objects (some of which are more remote than others) is resources, and resource modes are how we recognize that not all resources have the same core lifecycle. Therefore this document proposes to add a new "ephemeral" resource mode, and an associated ephemeral block for declaring such a resource:

ephemeral "vault_generic_secret" "example" {
  path = "secret/rundeck_auth"
}

provider "rundeck" {
  url        = "http://rundeck.example.com/"
  auth_token = ephemeral.vault_generic_secret.example.data["auth_token"]
}

ephemeral "ssh_tunnel" "example" {
  user        = "terraform"
  tunnel_host = "bastion.prod.example.com"
  remote_host = "consul.prod.example.com"
  remote_port = 443
}

provider "consul" {
  # local_address would be something like 127.0.0.1:45623, reflecting a
  # local port dynamically allocated for the SSH tunnel.
  address = ephemeral.ssh_tunnel.example.local_address
}

An ephemeral resource would have a lifecycle consisting of two actions, which might be expressed as provider protocol operations as follows:

  • OpenEphemeral: given an object representing the content of the resource's configuration block, create the ephemeral remote object and return a larger object that adds information about that ephemeral object, such as the SSH tunnel's local address or the credential information in the above examples.

    This is roughly analogous to "creating" a managed resource, but uses a different verb to distinguish it from the idea of creating some long-lived persistent object, by analogy with opening a local file, a socket, etc.

  • CloseEphemeral: given an object returned by a previous call to OpenEphemeral, clean up any externally-visible state associated with the ephemeral object (e.g. explicitly end a Vault lease, or explicitly close an SSH tunnel listen socket). This operation returns nothing except a possible set of error or warning diagnostics.

    This is roughly analogous to "destroying" a managed resource, but with a different verb to help distinguish it from the idea of destroying some long-lived persistent object, and as the opposite of "open" above.
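To make the two operations concrete, here is a minimal Go sketch of what a provider-side implementation might look like. The `EphemeralResource` interface, the `Object` type, and the `fakeTunnel` type are hypothetical names chosen for illustration, not part of any existing SDK; attribute values are simplified to strings, and the tunnel only pretends to allocate a local port rather than starting an SSH process.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for the object value exchanged over the
// provider protocol. All attribute values are simplified to strings.
type Object map[string]string

// EphemeralResource is a hypothetical provider-side interface mirroring the
// proposed OpenEphemeral and CloseEphemeral operations.
type EphemeralResource interface {
	// OpenEphemeral receives the configuration object and returns it
	// augmented with extra attributes describing the opened object.
	OpenEphemeral(config Object) (Object, error)
	// CloseEphemeral receives the object OpenEphemeral returned and releases
	// whatever it holds. The error stands in for diagnostics.
	CloseEphemeral(opened Object) error
}

// fakeTunnel is a hypothetical "ssh_tunnel" implementation that only
// simulates allocating a local port.
type fakeTunnel struct {
	nextPort int
	open     map[string]bool // local addresses currently "listening"
}

func (t *fakeTunnel) OpenEphemeral(config Object) (Object, error) {
	if config["remote_host"] == "" {
		return nil, fmt.Errorf("remote_host is required")
	}
	result := Object{}
	for k, v := range config {
		result[k] = v // keep every configured attribute
	}
	addr := fmt.Sprintf("127.0.0.1:%d", t.nextPort)
	t.nextPort++
	result["local_address"] = addr // the "computed" attribute added by open
	t.open[addr] = true
	return result, nil
}

func (t *fakeTunnel) CloseEphemeral(opened Object) error {
	addr := opened["local_address"]
	if !t.open[addr] {
		return fmt.Errorf("no open tunnel at %s", addr)
	}
	delete(t.open, addr)
	return nil
}

func main() {
	var r EphemeralResource = &fakeTunnel{nextPort: 45623, open: map[string]bool{}}
	opened, err := r.OpenEphemeral(Object{
		"tunnel_host": "bastion.prod.example.com",
		"remote_host": "consul.prod.example.com",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(opened["local_address"]) // 127.0.0.1:45623
	if err := r.CloseEphemeral(opened); err != nil {
		panic(err)
	}
}
```

Passing only the opened object to `CloseEphemeral` follows the description above: whatever the provider needs for cleanup has to travel in the object that open returned.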

A significant difference for the ephemeral resource lifecycle compared to the managed resource lifecycle is that the open and close operations always appear together in a graph: any graph walk that opens an ephemeral resource must also close it, limiting its scope to that single walk. We'll explore more details about the graph representation of ephemeral resources in a later section.

Restrictions for Ephemeral Resource Configuration

The lifecycle of an ephemeral resource is similar to that of a provider configuration: each one is re-created for each walk and then destroyed before the end of that walk. We've not historically imposed any explicit restrictions on what objects can be referred to in provider configurations, but in retrospect we've seen that we should have: provider configurations cannot feasibly make use of values that are determined only after apply, because we need to configure providers even for planning.

Learning from that historical error, I propose that an initial implementation of ephemeral resources impose the following restrictions, checked during or after graph construction and before graph walking:

  • Ephemeral resources may derive their values only from other ephemeral resources, either directly or indirectly. That is, an ephemeral resource could refer to an instance of another ephemeral resource, or it could refer to a named value that is derived only from other ephemeral resources, but it may not refer to a managed or data resource, nor may it refer to a named value derived from one.

  • Outputs from ephemeral resources may not be used in managed or data resource configurations, because those resources' results outlive a single walk, and so any ephemeral value captured in them would become invalid immediately. Again, this rule applies indirectly too: a named value derived from an ephemeral resource may not be used in a managed or data resource.

  • Provider configurations and provisioner configurations (including their associated connection blocks) may refer to ephemeral resources, either directly or indirectly.
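The three rules can be checked with a transitive walk over references before the graph walk begins. The following Go sketch assumes a hypothetical flattened `Module` type that records each object's kind and its direct references; it is one possible formulation of the check, not Terraform Core's actual validation code.

```go
package main

import (
	"fmt"
	"sort"
)

// Kind classifies the configuration objects the restrictions talk about.
type Kind int

const (
	Managed Kind = iota
	Data
	Ephemeral
	NamedValue  // locals, input variables, module outputs
	Provider    // provider configurations
	Provisioner // provisioner and connection blocks
)

// Module is a hypothetical, flattened view of a configuration: the kind of
// each object and the objects its expressions refer to directly.
type Module struct {
	Kinds map[string]Kind
	Refs  map[string][]string
}

// dependsOn returns every object that from refers to, directly or indirectly.
func (m Module) dependsOn(from string) map[string]bool {
	seen := map[string]bool{}
	stack := append([]string(nil), m.Refs[from]...)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if !seen[n] {
			seen[n] = true
			stack = append(stack, m.Refs[n]...)
		}
	}
	return seen
}

// Validate reports every violation of the proposed rules, sorted for stable output.
func (m Module) Validate() []string {
	persistent := func(k Kind) bool { return k == Managed || k == Data }
	var errs []string
	for name, kind := range m.Kinds {
		for dep := range m.dependsOn(name) {
			depKind, known := m.Kinds[dep]
			if !known {
				continue // undeclared references are ignored in this sketch
			}
			// Rule 1: ephemeral resources may not derive from managed or data resources.
			// Rule 2: managed and data resources may not derive from ephemeral resources.
			// Rule 3: providers, provisioners, and named values are not restricted here.
			if (kind == Ephemeral && persistent(depKind)) || (persistent(kind) && depKind == Ephemeral) {
				errs = append(errs, fmt.Sprintf("%s must not depend on %s", name, dep))
			}
		}
	}
	sort.Strings(errs)
	return errs
}

func main() {
	m := Module{
		Kinds: map[string]Kind{
			"ephemeral.vault_generic_secret.example": Ephemeral,
			"local.token":                            NamedValue,
			"provider.rundeck":                       Provider,
			"rundeck_job.example":                    Managed,
		},
		Refs: map[string][]string{
			"local.token":         {"ephemeral.vault_generic_secret.example"},
			"provider.rundeck":    {"local.token"},
			"rundeck_job.example": {"local.token"}, // rejected, even though indirect
		},
	}
	for _, e := range m.Validate() {
		fmt.Println(e)
	}
}
```

Because the walk follows named values, `rundeck_job.example` is rejected even though it refers only to `local.token`, while `provider.rundeck` may use the same local freely.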

We may be able to relax some of these restrictions if we later implement something like the Partial Apply proposal, e.g. by allowing managed resources to be used as part of the configuration of an ephemeral resource but deferring it and anything that depends on it until a subsequent plan/apply if the managed resource is not yet created. Being restrictive in the initial implementation will give the greatest freedom to selectively loosen those restrictions as Terraform's other capabilities change.

Interaction with Terraform State

Because the full lifecycle of an ephemeral resource is completed separately during each walk, there is no need to persist any record of it in saved state snapshots. Instead, the ephemeral resource state will exist only briefly in memory during its open window.

For ephemeral resources that issue credentials, this creates a significant advantage over the existing "abuse" of data resources: the temporary credentials will exist only in memory within the Terraform Core and provider processes, and never be written out in a state snapshot.

This does not fully address the "sensitive values" class of problems -- there are still use-cases around resources that generate persistent secrets like private keys associated with TLS certificates -- but implementing ephemeral resources would likely take some of the heat off in user feedback about sensitive values by addressing a common sub-section of that problem space.
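A minimal Go sketch of the snapshot side of this, assuming a hypothetical `Instance` record for each resource instance held in memory during a walk: when a snapshot is written, ephemeral instances are simply skipped. The token value in `main` is a placeholder.

```go
package main

import "fmt"

// Mode is the resource mode of an instance.
type Mode string

const (
	ManagedMode   Mode = "managed"
	DataMode      Mode = "data"
	EphemeralMode Mode = "ephemeral"
)

// Instance is a hypothetical in-memory record of one resource instance
// during a walk.
type Instance struct {
	Mode  Mode
	Addr  string
	Attrs map[string]string
}

// SnapshotInstances selects what a state snapshot would persist. Ephemeral
// instances are dropped, so values such as issued credentials live only in
// memory and never reach a saved snapshot.
func SnapshotInstances(live []Instance) []Instance {
	var persisted []Instance
	for _, inst := range live {
		if inst.Mode == EphemeralMode {
			continue
		}
		persisted = append(persisted, inst)
	}
	return persisted
}

func main() {
	live := []Instance{
		{ManagedMode, "rundeck_job.example", map[string]string{"id": "42"}},
		{EphemeralMode, "ephemeral.vault_generic_secret.example", map[string]string{"auth_token": "placeholder"}},
	}
	for _, inst := range SnapshotInstances(live) {
		fmt.Println(inst.Addr) // rundeck_job.example
	}
}
```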

Graph Construction with Ephemeral Resources

Since the primary use-cases for ephemeral resources are in management of objects that are in some sense sensitive -- credentials directly, or privileged access to a remote network derived from some credentials -- our aim would be to avoid opening them at all when possible and, when we do need to open them, to keep the window of time they are open as short as possible.

With that in mind, and considering the reference restrictions described earlier, the additional graph construction behaviors for ephemeral resources would be:

  • For each ephemeral resource, check to see if there is at least one valid reference to it from a provider configuration that will be opened in this operation (i.e. that has at least one associated resource in the graph) or, during the apply phase only, from a provisioner associated with a managed resource that is planned for creation or destruction. If not, create no additional objects and halt further processing for that ephemeral resource.

  • For each provider configuration that makes use of a given ephemeral resource, locate the provider configuration's open and close graph nodes. The open node for the provider configuration depends on the open node for the ephemeral resource. The close node for the ephemeral resource depends on the close node for the provider. Or, diagrammatically:

       ephemeral.vault_generic_secret.example (open)
                             ⇧
                 provider.rundeck (open)
                             ⇧
              rundeck_job.example (any action)
                             ⇧
                provider.rundeck (close)
                             ⇧
       ephemeral.vault_generic_secret.example (close)
    
  • During the apply phase only, for each provisioner associated with a managed resource planned for creation or destruction whose provisioner configuration refers to an ephemeral resource, locate the create and/or destroy node for the managed resource, mark it as dependent on the open node for the ephemeral resource, and mark the close node for the ephemeral resource as dependent on the managed resource node. Or, diagrammatically:

       ephemeral.vault_generic_secret.example (open)
                             ⇧
            rundeck_job.example (create/destroy)
                             ⇧
       ephemeral.vault_generic_secret.example (close)
    

    (Only create-time provisioners need to be considered for managed resources planned for creation, and only destroy-time provisioners for those planned for destruction.)
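The provider and provisioner rules above amount to adding a pair of edges for each (user, ephemeral resource) pair, and skipping any ephemeral resource that no active user references. The Go sketch below shows only those ephemeral-specific edges; the `User` and `Edge` types are hypothetical, and the ordinary edges between a provider and its resources are assumed to be added separately.

```go
package main

import (
	"fmt"
	"sort"
)

// Edge records that From depends on To: To must complete before From starts.
type Edge struct{ From, To string }

// User is a hypothetical description of something that may reference
// ephemeral resources: a provider configuration, or a managed resource
// whose provisioners reference them.
type User struct {
	Node       string // e.g. "provider.rundeck" or "rundeck_job.example"
	IsProvider bool
	// Active: for a provider, it has at least one resource in the graph; for a
	// provisioner, this is the apply phase and the resource is planned for
	// creation or destruction.
	Active bool
	Uses   []string // ephemeral resources referenced, directly or indirectly
}

// EphemeralEdges returns the ephemeral resources to open in this walk and
// the edges tying their open/close nodes to their users.
func EphemeralEdges(users []User) ([]string, []Edge) {
	needed := map[string]bool{}
	var edges []Edge
	for _, u := range users {
		if !u.Active {
			continue // inactive users never cause an ephemeral to be opened
		}
		for _, e := range u.Uses {
			needed[e] = true
			if u.IsProvider {
				edges = append(edges,
					Edge{u.Node + " (open)", e + " (open)"},
					Edge{e + " (close)", u.Node + " (close)"})
			} else {
				edges = append(edges,
					Edge{u.Node, e + " (open)"},
					Edge{e + " (close)", u.Node})
			}
		}
	}
	var opened []string
	for e := range needed {
		opened = append(opened, e)
	}
	sort.Strings(opened)
	return opened, edges
}

func main() {
	opened, edges := EphemeralEdges([]User{
		{Node: "provider.rundeck", IsProvider: true, Active: true,
			Uses: []string{"ephemeral.vault_generic_secret.example"}},
		// provider.consul has no resources in this graph, so the tunnel stays closed.
		{Node: "provider.consul", IsProvider: true, Active: false,
			Uses: []string{"ephemeral.ssh_tunnel.example"}},
	})
	fmt.Println("open:", opened)
	for _, e := range edges {
		fmt.Printf("%s depends on %s\n", e.From, e.To)
	}
}
```

Because `needed` is a set, an ephemeral resource referenced by several active users is still opened only once; each user contributes its own pair of edges.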

In addition to the above behaviors, ephemeral resources must also behave like provider configurations in that they must be forcefully closed even if an error occurs before their "close" node is reached during graph traversal. The only cases where an ephemeral remote object should persist after a graph walk completes are if the CloseEphemeral operation itself fails (the provider's own responsibility) or if Terraform encounters a panic condition.

Interactions with the Plan/Apply flow

The key distinguishing factor for ephemeral resources is that they are processed in exactly the same way for all walk types. The only differences are a result of the interactions with other objects in the graph: ephemeral resources would never appear in a validate graph, for example, because in practice such a graph doesn't contain any provider configurations or managed resource create/destroy actions.

As a consequence, ephemeral resources do not explicitly participate in the plan/apply flow: there will never be an entry in a generated plan representing opening or closing an ephemeral resource. Instead, the ephemeral resource behaviors are an implied side-effect of all other operations, and no information about an ephemeral resource opened and closed during plan is available during a subsequent apply. The apply walk will open and then later close any necessary ephemeral resources itself.

This addresses the problem of ephemeral credentials generated during plan becoming unavailable before the plan is applied: the apply phase will instead issue its own credentials, entirely separate from those issued during the plan phase.
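A toy Go model of why this works: each walk issues its own lease and revokes it before finishing, and the `Plan` value handed from plan to apply has no place to store one. The `leaseServer` type stands in for a Vault-like server and is entirely hypothetical.

```go
package main

import "fmt"

// leaseServer is a hypothetical stand-in for a Vault-like server that issues
// numbered leases and tracks which ones are still outstanding.
type leaseServer struct {
	next        int
	outstanding map[int]bool
}

func (s *leaseServer) issue() int {
	s.next++
	s.outstanding[s.next] = true
	return s.next
}

func (s *leaseServer) revoke(id int) { delete(s.outstanding, id) }

// Plan deliberately has no field for ephemeral resources: nothing about them
// carries over from the plan walk to the apply walk.
type Plan struct {
	Changes []string
}

// walk models any graph walk that needs the credential: it opens its own
// lease, uses it, and closes it before returning.
func walk(s *leaseServer, use func(lease int)) {
	lease := s.issue()
	defer s.revoke(lease)
	use(lease)
}

func main() {
	s := &leaseServer{outstanding: map[int]bool{}}
	var plan Plan
	walk(s, func(lease int) {
		plan.Changes = append(plan.Changes, "create rundeck_job.example")
		fmt.Println("plan used lease", lease)
	})
	// However long passes here, the plan holds no lease that could expire.
	walk(s, func(lease int) {
		fmt.Println("apply used lease", lease, "for", plan.Changes)
	})
	fmt.Println("outstanding leases:", len(s.outstanding)) // outstanding leases: 0
}
```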

Provider SDK Representation

I'll leave the finer details of the provider SDK representation for the SDK team to define, but I want to note a few things related to it.

Firstly, from a provider protocol standpoint Terraform Core will consider an ephemeral resource type to be totally distinct from a managed or data resource type of the same name. This continues the precedent that e.g. a managed resource type aws_vpc is not connected in any technical way to a data resource type aws_vpc, and instead the relationship between them is a UX concern managed by provider developers.

The SDK may in practice be designed to allow sharing implementation between resource types with the same name but of different modes. From Terraform Core's perspective, that would be an implementation detail of the SDK. I'd recommend caution about making such sharing of implementation the default behavior, because e.g. it would be confusing if a data "ssh_tunnel" "example" block were to be treated as valid, create an SSH tunnel process during its read, and then totally lose track of that process and not formally clean it up.
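One way to picture that separation on the Terraform Core side is to key resource type schemas by both mode and type name, with no fallback between modes. In this Go sketch `TypeKey`, `Schema`, and `Lookup` are hypothetical names, and the schemas are reduced to attribute-name lists.

```go
package main

import "fmt"

// Mode is the resource mode that forms part of a type's identity.
type Mode int

const (
	ManagedMode Mode = iota
	DataMode
	EphemeralMode
)

func (m Mode) String() string {
	return [...]string{"managed", "data", "ephemeral"}[m]
}

// TypeKey identifies a resource type including its mode, so the same type
// name in two modes names two unrelated types.
type TypeKey struct {
	Mode Mode
	Name string
}

// Schema is a hypothetical, heavily simplified schema: attribute names only.
type Schema struct {
	Attributes []string
}

// Lookup finds the schema for a block such as data "ssh_tunnel" "example".
// No fallback to a same-named type in another mode is attempted.
func Lookup(schemas map[TypeKey]Schema, mode Mode, name string) (Schema, error) {
	s, ok := schemas[TypeKey{mode, name}]
	if !ok {
		return Schema{}, fmt.Errorf("unsupported %s resource type %q", mode, name)
	}
	return s, nil
}

func main() {
	schemas := map[TypeKey]Schema{
		{ManagedMode, "aws_vpc"}:      {Attributes: []string{"cidr_block"}},
		{DataMode, "aws_vpc"}:         {Attributes: []string{"id"}},
		{EphemeralMode, "ssh_tunnel"}: {Attributes: []string{"tunnel_host", "local_address"}},
	}
	if _, err := Lookup(schemas, DataMode, "ssh_tunnel"); err != nil {
		fmt.Println(err) // unsupported data resource type "ssh_tunnel"
	}
}
```

With this keying, a provider that offers only an ephemeral `ssh_tunnel` type gives a `data "ssh_tunnel"` block nothing to resolve to, so such a mistaken block could be reported as an error rather than quietly opening a tunnel that nothing ever closes.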

The design of ephemeral resources does intentionally have various things in common with other resource modes, though. For example, configuration is represented as an object, and the open action augments that object with additional "computed" attribute values in a similar way to what we see for both managed and data resources. In principle, then, the same abstractions used to represent config-in-state-out transformations for managed and data resource types should be adaptable to ephemeral resource types too.
