Deployment Stacks requires the ability to manage the full lifecycle (including deletion) of all resources defined inside a Stack. Currently this is only possible for Azure resources, not extensible ones, because extensible resources lack an id field and there is no authentication mechanism for an extensible control plane.
- Provide an id equivalent which Stacks can use to uniquely identify an extensible resource for change tracking.
- Provide a mechanism to allow Stacks to submit Delete & Get operations against extensible resources.
This spec will be broken down into two parts to cover these separate goals.
Each provider can choose to expose a config property named auth. This property will have special handling in the Deployment Engine, which will understand that it contains the instructions to fetch a secret, rather than the secret itself (similar to the handling of a KeyVault secret reference).
The auth property is defined as an object, with a required type field. The type field is used as a discriminator to provide validation for the various authentication mechanisms available.
The following auth types will be available in template deployments:
- UserProvided: { "type": "UserProvided", "value": "<raw secret value>" }
- KeyVault: { "type": "KeyVaultSecret", "keyVaultId": "<key vault resource id>", "secretName": "<name of the secret to fetch>" }
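The discriminator-based validation described above can be sketched roughly as follows. This is a minimal illustration, not the Deployment Engine's implementation: the function name `validate_auth`, the `REQUIRED_FIELDS` table, and the choice to reject unexpected fields are all assumptions made for the example.

```python
from typing import Any

# Required fields per auth type, taken from the two shapes listed above.
REQUIRED_FIELDS = {
    "UserProvided": {"value"},
    "KeyVaultSecret": {"keyVaultId", "secretName"},
}


def validate_auth(auth: dict[str, Any]) -> str:
    """Return the auth type if the object is well-formed, else raise ValueError."""
    auth_type = auth.get("type")
    if auth_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown auth type: {auth_type!r}")
    expected = REQUIRED_FIELDS[auth_type] | {"type"}
    present = set(auth)
    missing = expected - present
    # Assumption: fields outside the selected shape are rejected.
    unexpected = present - expected
    if missing or unexpected:
        raise ValueError(
            f"{auth_type}: missing={sorted(missing)} unexpected={sorted(unexpected)}"
        )
    return auth_type
```

Because the type field selects the shape, a new mechanism could be added by extending the table without touching the validation logic.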
NOTE: We could consider extending the built-in auth mechanisms to simplify certain scenarios - e.g. for Kubernetes:
auth: {
  type: 'AksResource'
  id: aks.id
}
This is out of scope for this spec.
NOTE: auth should be used for control planes which do not support AAD OBO tokens. Once the platform has the capability to support OBO, we should ensure we make this a first-class experience.
NOTE: We are working on improving the ergonomics of the import statement, such that config values can be passed in by the parent module. When we do this work, we should aim to align on a syntax similar to kv.getSecret() for setting KeyVault-provided credentials.
param kv resource 'Microsoft.KeyVault/vaults@2022-07-01'
import kubernetes as k8s {
  auth: {
    type: 'KeyVaultSecret'
    keyVaultId: kv.id
    secretName: 'myKubeConfig'
  }
  namespace: 'default'
}
NOTE: We may want to provide syntactic sugar to simplify authoring; conceivably this could look something like:
import kubernetes as k8s {
  auth: kv.getSecret('myKubeConfig')
  namespace: 'default'
}
This is out of scope for this spec.
If an extensible resource is configured using auth, the body of the resource in the outputResources section of the deployment must contain the evaluated auth property. For the UserProvided auth type, the value property must not be present.
UserProvided example:
"outputResources": [
  {
    "id": "...",
    "auth": {
      "type": "UserProvided"
    }
    // other properties omitted
  }
]
KeyVaultSecret example:
"outputResources": [
  {
    "id": "...",
    "auth": {
      "type": "KeyVaultSecret",
      "keyVaultId": "/subscriptions/...",
      "secretName": "myKubeConfig"
    }
    // other properties omitted
  }
]
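The rule that the raw secret must not reach outputResources could be applied with a small filtering step like the one below. This is only a sketch; the helper name `auth_for_output` is hypothetical and not part of any existing contract.

```python
import copy
from typing import Any


def auth_for_output(auth: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the evaluated auth object that is safe to persist."""
    safe = copy.deepcopy(auth)
    if safe.get("type") == "UserProvided":
        # The raw secret must never appear in outputResources.
        safe.pop("value", None)
    return safe
```

A KeyVaultSecret auth object passes through unchanged, since it holds only a reference to the secret rather than the secret itself.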
NOTE: Stacks will not support the UserProvided option, because Stacks has no capability to store credentials securely, and no capability for interactive authentication when performing cleanup. It is however trivial to insert credentials into a Key Vault during a deployment, and thus use the KeyVaultSecret mode.
Extensible resources do not contain a well-known name or id field, and instead can consist of one or more identifying fields with different keys (e.g. for Kubernetes, metadata.namespace & metadata.name). Stacks however needs an identifier (or set of identifiers) which it can use to accurately track the lifecycle of a resource.
This identifier must be sufficiently unique such that it cannot be confused with other resources in the same deployment, but it must only be composed of properties that identify the resource. There are also cases where there is no direct property in the resource body which can be mapped to the identifier.
For example, a Kubernetes resource must contain the namespace & the name of the resource. However, if multiple clusters are targeted by the same deployment, namespace + name may not be sufficiently unique - therefore it is necessary to also inject the cluster name into the id.
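As a rough illustration of the Kubernetes case, the sketch below composes an identifier from the cluster name plus the identifying fields in the resource body. The helper names `lookup` and `kubernetes_id`, and the hard-coded list of identifying paths, are assumptions for the example; in practice the provider would own this logic. The sketch does not escape `/` characters inside values.

```python
from typing import Any


def lookup(body: dict[str, Any], dotted_path: str) -> Any:
    """Follow a dotted path such as 'metadata.name' into a nested dict."""
    current: Any = body
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"identifying field {dotted_path!r} is not set")
        current = current[key]
    return current


def kubernetes_id(cluster: str, body: dict[str, Any]) -> str:
    """Build an id from the cluster name and the identifying body fields."""
    # Hypothetical: these paths would be declared by the type provider.
    identifying_paths = ["metadata.namespace", "metadata.name"]
    # The cluster is not part of the resource body, so it is injected here.
    segments = ["cluster", cluster]
    for path in identifying_paths:
        segments += [path, str(lookup(body, path))]
    return "/".join(segments)
```

Properties outside the identifying paths (for example the spec of the resource) do not affect the result, so the identifier stays stable when the resource is updated.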
The Bicep type provider must indicate which fields in a resource body compose the identifier, as we must verify they are always set in a resource declaration, and are the only properties set in an existing resource declaration. This capability exists in Bicep today, but should instead be moved into types.json so that the type provider is able to make this decision when authoring types.
Overall, the Bicep authoring experience will be unchanged.
Each extensibility provider must implement a mandatory /GetId endpoint for obtaining the predicted id field, given a resource body. This will be used at the start of a deployment operation, to obtain the id for logging. The POST body will be of the same format as the /Save & /PreviewSave endpoints. The response body will be of similar format to other API responses, but containing only the id property alongside the type and apiVersion:
{
  "resource": {
    "id": "cluster/ant-test-cluster/metadata.namespace/default/metadata.name/foo",
    "type": "apps/Deployment",
    "apiVersion": "v1"
  }
}
Since the extensibility provider will be responsible for defining the format of the id field, the extensibility response contract will be updated to include a mandatory id string for all APIs:
{
  "resource": {
    "id": "cluster/ant-test-cluster/metadata.namespace/default/metadata.name/foo",
    "type": "apps/Deployment",
    "apiVersion": "v1",
    "properties": {
      ...
    }
  }
}
Stacks will need to be able to issue a /Get or a /Delete purely using the id, so these methods will be modified to fetch or delete a resource by id:
{
  "import": {...},
  "resource": {
    "id": "cluster/ant-test-cluster/metadata.namespace/default/metadata.name/foo",
    "type": "apps/Deployment",
    "apiVersion": "v1"
  }
}
If the Deployment Engine needs to fetch an existing resource, it will need to first perform a /GetId followed by a /Get, using this id.
The Deployment Engine will execute a /GetId at the start of a resource deployment, to obtain the id field. This will be used to save deployment operation results, and for logging.
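The two-step lookup for an existing resource (/GetId, then /Get by id) might look roughly like this. `FakeProvider` is a hypothetical in-memory stand-in rather than a real extensibility host, and the assumption that the cluster name is read from the import config is made purely for the example.

```python
from typing import Any, Optional


class FakeProvider:
    """In-memory stand-in for an extensibility provider (hypothetical)."""

    def __init__(self) -> None:
        self.store: dict[str, dict[str, Any]] = {}

    def get_id(self, request: dict[str, Any]) -> dict[str, Any]:
        """Mimic /GetId: predict the id from the resource body."""
        res = request["resource"]
        meta = res["properties"]["metadata"]
        # Assumption: the cluster name comes from the import config.
        cluster = request["import"]["config"]["cluster"]
        res_id = (
            f"cluster/{cluster}"
            f"/metadata.namespace/{meta['namespace']}"
            f"/metadata.name/{meta['name']}"
        )
        return {
            "resource": {
                "id": res_id,
                "type": res["type"],
                "apiVersion": res["apiVersion"],
            }
        }

    def get(self, request: dict[str, Any]) -> Optional[dict[str, Any]]:
        """Mimic /Get: look the resource up purely by id."""
        return self.store.get(request["resource"]["id"])


def fetch_existing(
    provider: FakeProvider, import_block: dict[str, Any], resource: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Resolve the id first, then fetch by id using the same import block."""
    id_response = provider.get_id({"import": import_block, "resource": resource})
    get_request = {"import": import_block, "resource": id_response["resource"]}
    return provider.get(get_request)
```

The /Get request carries only the id, type and apiVersion, which is the same shape Stacks would later use for /Delete.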
To add to the "outputResources" body described in Part 1, the Deployment Engine now needs to ensure it provides the full import configuration, along with a list of identifiers for the resource.
"outputResources": [
  {
    "import": {
      "provider": "Kubernetes",
      "version": "0.1",
      "config": {
        "cluster": "myCluster",
        "auth": {
          "type": "KeyVaultSecret",
          "keyVaultId": "/subscriptions/...",
          "secretName": "myKubeConfig"
        }
      }
    },
    "type": "apps/Deployment",
    "apiVersion": "v1",
    "id": "cluster/ant-test-cluster/metadata.namespace/default/metadata.name/foo"
  }
]
In the samples in this document, I have also proposed we split up the "type" field in the Deployments representation as well as the extensibility contract (which currently contains the type AND the apiVersion). This makes it easier to communicate to Stacks which properties should & shouldn't contribute to the uniqueness of a resource. I propose we make this change at the same time as we introduce the id field.
DISCUSSION TOPIC: Is there any reason to make id a string? We could instead define it as a set of keys & values:
{
  "cluster": "ant-test-cluster",
  "metadata.namespace": "default",
  "metadata.name": "foo"
}
For the purpose of logging & deployment operations, we can come up with a consistent mechanism for converting it to a string.
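One possible conversion between the key/value form and the string form is sketched below. It assumes the provider always supplies the keys in a fixed order and that neither keys nor values contain `/`; both function names are hypothetical.

```python
def id_to_string(id_parts: dict[str, str]) -> str:
    """Flatten an ordered key/value id into 'key/value/key/value'."""
    segments: list[str] = []
    for key, value in id_parts.items():
        if "/" in key or "/" in value:
            # Assumption: no escaping scheme exists, so reject ambiguous input.
            raise ValueError("'/' is not allowed in id keys or values")
        segments += [key, value]
    return "/".join(segments)


def string_to_id(text: str) -> dict[str, str]:
    """Inverse of id_to_string, preserving the order of the pairs."""
    segments = text.split("/")
    if len(segments) % 2 != 0:
        raise ValueError("id string must contain key/value pairs")
    return dict(zip(segments[0::2], segments[1::2]))
```

Python dictionaries preserve insertion order, so the string is deterministic as long as the input order is. A real design would also need to decide how to escape the separator.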
The Deployments service will provide a new ARM API which Stacks can invoke to clean up any extraneous resources (on a Stacks PUT removing resources, or on a Stacks deletion). This will accept an array of resources, in the format described above for the outputResources property.
This API will batch up and send ARM resources to be deleted by ARM's /bulkDelete API. Extensible resources will be handled internally by the Deployment Engine code, using a similar algorithm to the one powering ARM's bulk delete. The format described above will be sufficient to generate the Delete request body which the Deployment Engine needs to submit to the Extensibility Host to perform each resource cleanup.
Similar to the behavior on a template PUT, the Deployment Engine will need the capability to resolve auth credentials in order to submit the Delete request to the Extensibility Host.
For example:
{
  "import": {
    "provider": "Kubernetes",
    "version": "0.1",
    "config": {
      "cluster": "myCluster",
      "auth": {
        "type": "KeyVaultSecret",
        "keyVaultId": "/subscriptions/...",
        "secretName": "myKubeConfig"
      }
    }
  },
  "type": "apps/Deployment",
  "apiVersion": "v1",
  "id": "cluster/ant-test-cluster/metadata.namespace/default/metadata.name/foo"
}
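The split between ARM resources and extensible resources, and the reshaping of an entry into a by-id Delete request, could be sketched as follows. The function names are hypothetical, and the test "an entry with an import block is extensible" is an assumption made for the example. Credential resolution is deliberately left out.

```python
from typing import Any


def is_extensible(entry: dict[str, Any]) -> bool:
    # Assumption: only extensible resources carry an "import" block.
    return "import" in entry


def delete_request(entry: dict[str, Any]) -> dict[str, Any]:
    """Reshape an outputResources entry into the by-id request body."""
    return {
        "import": entry["import"],
        "resource": {
            "id": entry["id"],
            "type": entry["type"],
            "apiVersion": entry["apiVersion"],
        },
    }


def plan_cleanup(
    entries: list[dict[str, Any]],
) -> tuple[list[str], list[dict[str, Any]]]:
    """Return (ARM ids to batch for bulk delete, extensible Delete requests)."""
    arm_ids = [e["id"] for e in entries if not is_extensible(e)]
    ext_requests = [delete_request(e) for e in entries if is_extensible(e)]
    return arm_ids, ext_requests
```

Before sending each extensible request, the Deployment Engine would still need to resolve the auth reference in the import config into an actual credential.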
The changes required for Stacks to support the end-to-end flow:
- Persist the format used for extensible resources in the outputResources body, and supply it on a /bulkDelete request.
- Use the new Deployments /bulkDelete API for cleanup.
- Understand which properties of the output resource definition constitute a unique identifier for a resource, for change tracking.
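The change-tracking step could be approximated as below: compare the previously persisted entries with the new ones and keep those that no longer appear. Which properties form the identity is an open design choice; this sketch assumes provider name, type and id for extensible resources, and deliberately excludes apiVersion and the provider version. Both function names are hypothetical.

```python
from typing import Any


def identity(entry: dict[str, Any]) -> tuple:
    """Properties assumed to identify a resource for change tracking."""
    imp = entry.get("import")
    if imp is None:
        # ARM resources are identified by their resource id alone.
        return ("arm", entry["id"])
    # Assumption: apiVersion and provider version do not affect identity.
    return (imp["provider"], entry["type"], entry["id"])


def extraneous(
    previous: list[dict[str, Any]], current: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Entries tracked by the previous deployment but absent from the new one."""
    kept = {identity(e) for e in current}
    return [e for e in previous if identity(e) not in kept]
```

The returned entries still carry their full import block, so they can be supplied to the cleanup API without further lookups.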
- This spec doesn't aim to solve the repetition associated with import statements in Bicep; there are separate proposals for this. I am however making the assumption that this will be a solved problem in the future.
- Some of the samples use proposed syntax (e.g. Resources as parameters) for simplicity, but do not require this syntax to exist.
- We would like to retain the capability to pass raw credentials for local-mode (non-Azure) evaluation, but do not need to support this mode with Stacks on Azure.
- The current Kubernetes provider has already implicitly introduced the concept of defaulting properties in the import configuration block. I have removed this capability to simplify the generation of resource IDs, with the understanding that it'll be brought back with a more generic proposal - to avoid the repetition of having to specify the Kubernetes namespace multiple times in a file.
Any id generated by Bicep must uniquely identify a resource, and be immutable for the lifecycle of that resource. This means that we need to be careful that it is composed of all the unique identifying characteristics of a resource, in a deterministic order. It will also be important to ensure that the id identifies the same resource across different versions of a provider.
In the context of Kubernetes, we will want to include the type, namespace, name & identifying cluster information in the id.
Extensible resources generally require an authentication context to communicate with an external control plane. Deployment Stacks will need to be able to access this same authentication context in order to submit a Delete request.
In certain scenarios, the authentication context will not be known at the start of the deployment - for example using an Azure list* method to access the kubeConfig for an AKS cluster. It's not clear how Deployment Stacks would be able to capture and utilize this operation to obtain the auth needed to clean up a deleted resource.
Deployment Stacks supports locking of ARM resources to prevent external modification or deletion. This isn't something that can be generically extended to other control planes. We could consider adding this to the extensibility contract in future if we have a compelling reason to do so (e.g. a particular control plane supports it).
The current Deployment Stacks design uses the Deployments RP in order to orchestrate a deployment, but uses the ARM bulk delete API to perform cleanup. Deployments extensibility is purely built into the Deployments RP, and doesn't (and conceptually shouldn't) involve ARM. This may necessitate the creation of an API on the Deployments RP to perform cleanup.
There will be resources which cannot have a globally unique identifier - for example a private Kubernetes cluster with a non-public DNS name. We should design the feature in a way that mixing up two resources is difficult, but it will not be possible to define a globally-unique id.