The Keylime integrity verification system currently operates on a pull, or server-initiated, basis whereby a verifier directs a number of enrolled nodes to attest their state to the server on a periodic basis. This model is not appropriate for enterprise environments, as each attested node thereby acts as an HTTP server. The requirement to open additional ports for each node and the associated increase in attack surface is unacceptable from a compliance and risk management perspective.
This document aims to outline the challenges that need to be overcome in order to support an alternate push model in which the nodes themselves are responsible for driving the attestation cycle. These include changes to the registration and enrolment mechanisms, attestation and verification processes, and data model. We hope to elicit feedback from the Keylime community on these topics to arrive at a robust, forward-thinking solution which considers the latest developments in verification.
Thore Sommer (@THS-on) has previously put together a draft proposal on how some aspects of this could work. We make reference to this where relevant.
To begin, we first provide an overview of the current state of Keylime as it relates to this topic before moving on to a discussion of the inherent challenges and our ideas for overcoming them.
- 2023-08-10: (Current) Introduces new comprehensive approach to agent authentication
- 2023-08-02: Initial version
Keylime's stated purpose, as given on the official website, is to "bootstrap and maintain trust". The paper which presents the original design explains that this entails (1) provisioning of identity for each node and (2) monitoring of the nodes to detect integrity deviations.
Since this paper was presented at ACSAC in 2016, new approaches to handling the identity component of the equation have come on the scene, such as SPIFFE/SPIRE, a fellow CNCF project.
Keylime consists of three main components: a registrar, which acts as a simple store of identities associated with each node, a verifier, which analyses node state and detects changes, and an agent, which is installed on each node and reports information to the registrar and verifier.
The ACSAC paper also leaves a number of responsibilities to the "tenant", that is, the customer of a cloud platform. These tasks have been extracted into a management CLI, referred to by the same name. In the rest of this document, when we mention the tenant, we are referring to the command-line tool.
The Keylime registrar, verifier and tenant are all fully trusted to perform their respective responsibilities faithfully. Nodes, including their installed agents and any running workloads, are not trusted until the registrar, verifier and tenant collaborate to obtain and validate a set of trusted measurements of node state. From that point, a node is deemed trusted, at least until that trust is revoked (either by an automatic mechanism or through manual intervention by an administrator).
The authenticity of measurements is assured by the TPM of each node (the TPMs are thereby also considered to be trusted).
Attacks which exploit poorly-defined verification policies or deficiencies in the information which can be obtained from IMA and UEFI logs (and other measures of system state) are necessarily out of scope. **A valid attack against Keylime is one in which an adversary can cause verification of a node to pass when the policy for that node should have resulted in a verification failure.**
Certain forms of denial-of-service (DoS) attacks also need to be prevented. For example, consider the converse of the statement above: if an adversary is able to cause verification to fail when it should have passed, this could cause users to mistrust the results and, if these results are consumed by SPIRE or otherwise used for authentication of non-person entities (NPE), could trigger cascading failures throughout the network.
In the default configuration, the registrar and verifier share a single TLS certificate and corresponding private key for server authentication and secure channel establishment. The verifier and tenant share a TLS certificate and private key for client authentication. Both server and client certificate are produced by a common Keylime CA, the certificate for which is preloaded and trusted by all components (verifier, registrar, tenant and agent).
The agent has its own TLS certificate, which this document calls the NKcert (`mtls_cert` in the REST APIs and source code). The agent generates this certificate on first startup using a set of randomly-chosen transport keys (collectively, the NK; individually, the NKpub and NKpriv). NKpub is sometimes referred to simply as the pubkey (e.g., in the API docs). The NKcert is registered with the registrar on agent startup and then verified as being linked to a trusted TPM when the agent is first enrolled for periodic verification.
The agent also uses the TPM to create an attestation key (AK), referred to as an "attestation identity key" (AIK) in older TCG specs (and in places in the Keylime source code), associated with the TPM's endorsement hierarchy. The public portion of the attestation key (AKpub) is reported to the registrar and verifier during registration/enrolment of the agent. Additionally, the registrar receives the TPM's public endorsement key (EKpub) and endorsement certificate (EKcert).
Work is in progress (on the server-side and agent-side) to allow Keylime to use IDevIDs as a form of device identity tied to the device's TPM (relevant TCG spec).
During manufacture of a device that ships with an IDevID, the manufacturer uses the TPM to generate a new attestation key (AK) and issues a certificate for this AK, signed by the manufacturer's certificate authority (CA). These are known as the initial attestation key (IAK) and IAK certificate.
The actual IDevID is produced in similar fashion: the manufacturer uses the TPM to generate a new signing key (i.e., a non-restricted key) and issues a certificate. The signing key is referred to as the IDevID key and the certificate itself is used as the IDevID.
An IAK can only be used to certify data which has been created by, or loaded into, the TPM. Meanwhile, an IDevID key can be used to sign any arbitrary data, making it suitable for producing identity proofs for any arbitrary public key challenge–response protocol.
The manufacturer will include information which uniquely identifies the device in the subject field of the IAK and IDevID certificates, e.g., the serial number of the device, and may include other details such as the model number.
In the current architecture, the Keylime server components are entrusted with these specific responsibilities:
Registrar | Verifier | Tenant |
---|---|---|
Stores the identities (EKpub, EKcert, AKpub) reported by each node's agent | Periodically requests TPM quotes from enrolled agents | Enrols agents for periodic verification |
Verifies that a node's AK and EK are linked | Verifies quotes, UEFI logs and IMA entries against the node's policy | Verifies the EKcert against its trust store of TPM manufacturer certificates |
Keylime performs its functions via a handful of protocols between the various Keylime components. The relevant ones are as follows:
- Registration protocol: Enables the agent on first start to register its EKpub, EKcert and AKpub with the registrar and prove that the AK and EK are linked.
- Enrolment protocol: Four-way protocol between the tenant, registrar, verifier and agent to enrol the agent for periodic verification by the verifier and provision nodes with credentials.
- Attestation protocol: The verifier uses this to request TPM quotes from the agent according to a configured interval.
A high-level overview of these protocols is given in the diagram below:
As part of the current enrolment process, the user specifies a payload which is delivered to the agent and placed in a directory to be consumed by other software. The reason for this is to support the provisioning of identities to workloads running on the node (e.g., TLS certificates or long-lived shared secrets). The payload may optionally contain a script file, which is executed by the agent.
Considering the current landscape of the identity and access management space, a more modern approach to solving this problem would likely be to have Keylime report verification results to a SPIRE attestor plugin which could then handle provisioning of workload identities (see enhancement proposal 100). This offloads issues related to revocation and suitability for cloud-native workloads.
The arbitrary nature of the payloads mechanism also raises concerns as to the attack surface of the agent and the whole Keylime system. Not only can Keylime server components query a node to report on its state but they also have the power to modify a node's state and execute arbitrary code. Enterprise users would consider this unacceptable.
As a result, we recommend that the payload feature not be implemented in the push model. This gives users the choice to opt into a more secure design which considers a stronger threat model, without taking features away from existing users. Users who do require identity provisioning alongside push support have the option of using SPIFFE/SPIRE.
Authentication of the various parties in a Keylime deployment is a non-trivial problem to solve. New agents may come online at any time and may not have a pre-established, pre-trusted identity which they can present to other components in the system to authenticate their requests.
The authentication mechanisms which exist in Keylime today emerged as a theoretical design originally presented in the ACSAC paper and have subsequently evolved to fix deficiencies revealed over time.
We believe there is a simpler, more consistent way to handle authentication which takes into account a number of real-world deployment scenarios. Push support presents the perfect opportunity to make such fundamental changes, as backwards compatibility with old agents does not need to be considered (updates to both the agents and the server components would be required to use the push feature regardless).
There are three fundamental authentication concerns for Keylime when considering the threat model:
- Ensuring that TPMs are made by a trusted manufacturer. In order to have confidence in measurements of system state, it is necessary that the TPM behaves according to spec, e.g., that it correctly performs PCR extend operations and protects the contents of its registers (including keys) from access by outside entities except via the well-defined mechanisms.
- Protecting API requests and responses. No party should be able to change or read the information of record stored by the registrar or verifier for which they are not authorised. Nor should any trusted party be tricked into sending information to the wrong party. Endpoints which do not require authentication should be limited to prevent information leakage and other abuse in cases where the server components are deployed to be accessible publicly or semi-publicly.
- The binding of identities. If agent communications are authenticated with a credential other than one of the credentials solely resident in (or generated by) the TPM, then that credential needs to be securely bound to the TPM identity.
Additionally, while a TPM's identity should be unique to a particular node, the public credentials (EKcert, EKpub, AKpub, etc.) are not designed to be handled by users directly and, as such, are not linked to any useful "friendly" identifier. Because of this, it is important that the user-facing identifier used by the Keylime agent (randomly-generated UUID, system UUID from DMI, hostname, hash of the EK, or user-provided value) is properly bound to the node's other identities.
If a node's various identities/identifiers are not properly bound together, an attacker may be able to cause a mismatch of identities such that the wrong node is verified, an incorrect verification policy is applied, or no verification is performed at all.
In summary, when an agent sends an attestation or otherwise accesses a Keylime API, the trust that is placed in the agent's request needs to derive from some root identity (e.g., EK), bound to a particular node identifier, which the user has specified as trusted (the user must trust the identity itself as well as the binding). The node may have certain subordinate identities (e.g., AK) which are transitively trusted by their binding to one of the node's root identities.
In the current pull model, an end user needs to use the tenant to enrol an agent with the verifier. As part of this process, the tenant contacts the agent to obtain an identity quote which associates an agent's NKpub with the TPM's AKpub and makes the corresponding NKcert available to the verifier. The verifier later uses this to authenticate the agent when requesting attestations.
In the past, concerns have been expressed that generating a certificate and causing it to be trusted automatically with no central certificate authority (CA) does not comply with certificate management policies in certain organisations. Because of this, it would be best to pursue other strategies for agent authentication.
For organisations with established internal public key infrastructure (PKI), client-side authentication with pre-trusted certificates via mTLS can be supported as an option. However, automatic generation of such a certificate (NKcert) and private key (NKpriv) should not be carried over to the push model; more appropriate mechanisms are described in the subsequent subsections.
When agents are deployed with pre-trusted certificates supplied by the user, the agent should bind the corresponding private key to the TPM's identity by way of TPM2_Certify (TPM 2.0 Part 3, §18.2) as suggested by the earlier draft proposal. Additionally, all endpoints which accept authentication via mTLS should check the node identifier contained within the `Subject` field of the presented certificate to ensure that there is a binding to the identifier.
When mTLS is not enabled, the verifier can use an alternate mechanism to authenticate agents (also shown in the sequence diagram to the right):
- The agent will request a new nonce from a verifier endpoint which does not require authentication. The verifier will randomly generate a new nonce and store it alongside the node identifier provided by the agent (multiple nonces can be associated with a single node).
- The agent will use TPM2_Certify (TPM 2.0 Part 3, §18.2) to produce a proof of possession of the AK, based on the nonce. The AK will be the object certified and also the key used to produce the signature.
- The agent will send the result to the verifier which will verify the signature using its record of the AKpub for the node and check that the signed data contains a valid nonce.
- If verification of the AK possession proof is successful, the verifier will respond with a session token, which it will persist and deem valid for a configured time period.
- The agent will use this token to authenticate subsequent requests to the verifier. If the token is still valid, the action will proceed. Otherwise, the verifier will reply with a 401 status and the agent will repeat the challenge–response protocol to obtain a new session token.
Whenever the agent submits a valid attestation, the quote itself acts as a proof of possession of the AK. Therefore, on successful verification of a quote, the verifier may extend the validity of the session token, minimising the need to repeat the challenge–response protocol.
This mechanism is very similar to DPoP which allows authentication of OAuth 2.0 clients by way of cryptographic proof. As such, it should be possible to evolve it into a standard OAuth-based solution for API authentication down the line, if desirable.
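To make the flow above concrete, the following is a minimal sketch of the verifier-side bookkeeping. It is illustrative only: the `Verifier` class, the TTL values and, in particular, the HMAC stand-in for the TPM2_Certify signature are assumptions of this sketch, not the proposed implementation (a real verifier would validate the certify structure against its stored AKpub).

```python
import secrets
import time
import hmac
import hashlib

NONCE_TTL = 60   # seconds a nonce stays redeemable (illustrative value)
TOKEN_TTL = 300  # seconds a session token stays valid (illustrative value)

class Verifier:
    def __init__(self):
        self.nonces = {}   # node_id -> {nonce: issued_at}
        self.tokens = {}   # token -> (node_id, expires_at)
        self.ak_keys = {}  # node_id -> AK material (stand-in: a shared key)

    def issue_nonce(self, node_id):
        # Unauthenticated endpoint; multiple nonces may be outstanding per node
        nonce = secrets.token_hex(16)
        self.nonces.setdefault(node_id, {})[nonce] = time.time()
        return nonce

    def redeem_proof(self, node_id, nonce, proof):
        # Nonces are single-use: pop removes the nonce on first redemption
        issued = self.nonces.get(node_id, {}).pop(nonce, None)
        if issued is None or time.time() - issued > NONCE_TTL:
            return None  # unknown or expired nonce
        expected = hmac.new(self.ak_keys[node_id], nonce.encode(),
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(proof, expected):
            return None  # proof of possession of the AK failed
        token = secrets.token_urlsafe(32)
        self.tokens[token] = (node_id, time.time() + TOKEN_TTL)
        return token

    def authenticate(self, token):
        entry = self.tokens.get(token)
        if entry is None or entry[1] < time.time():
            return None  # 401: the agent must repeat the challenge-response
        return entry[0]
```

Note that because the nonce is deleted on first use, a replayed proof is rejected, and an expired token simply forces the agent back through the challenge–response exchange.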
To enrol an agent at the verifier, the tenant currently retrieves the EKcert from the registrar and verifies it by checking if it has been issued by a CA with a certificate present in its trust store of TPM manufacturer certificates.
The original vision for the registrar from ACSAC paper was to perform the task of an Attestation CA per section 9.5.3.1 (3) of the TPM 2.0 architecture specification (formerly called a Privacy CA). This means that other components in the system would trust the registrar to verify the authenticity of the AK all the way up the chain to the TPM manufacturer's CA certificate.
We suggest returning to this original design such that “the registrar [...] checks the validity of the TPM EK with the TPM manufacturer” (quoting from the paper) which is also consistent with the previous draft proposal. The registrar already checks part of the chain (from AK to EK) so it would seem natural to extend this the rest of the way.
This would require no changes to the protocols; EK verification logic simply needs to be added to the registration request handler.
Comparing the EKcert against a TPM manufacturer certificate is sufficient when the agents themselves are also trusted by an out-of-band mechanism, for example when they are pre-configured with TLS certificates issued by a trusted CA. However, when this is not the case, the user needs to either be able to (1) mark specific EKs as trusted or (2) reliably bind an EK to a particular node identifier (which is in turn trusted). Otherwise, any attacker with access to a TPM from a trusted manufacturer can associate their TPM with a particular node (subsequently causing the verifier to accept quotes from an untrusted agent).
The current mechanism does not easily facilitate the trusting of specific EKs. However, this can be addressed through a simple evolution of the current mechanism. We propose genericising the current certificate trust mechanism such that it accepts (1) a certificate to check, (2) a list of untrusted intermediate certificates and (3) a list of trusted certificates. It will trust the given certificate if and only if:
- the certificate exists in the list of trusted certificates;
- the certificate has been issued by a CA with a certificate in the list of trusted certificates; or
- the certificate has been issued by a CA with a certificate in the list of intermediate certificates which may, in turn, be issued by another CA with a certificate in the list of intermediate certificates (recursively), and the last intermediate certificate in the chain has been issued by a CA with a certificate in the list of trusted certificates.
This would allow the user to trust EKs on an individual basis by adding EKcerts to the trust store directly, instead of blindly trusting all EKs from a particular TPM manufacturer. The same mechanism can be used for verification of agent certificates when mTLS is enabled and for IDevIDs (discussed in the next section).
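A sketch of the genericised trust check might look as follows. To keep the three rules visible, certificates are modelled as simple `(subject, issuer)` pairs; a real implementation would build and cryptographically verify X.509 chains (e.g., with the `cryptography` library) rather than match names.

```python
# Hypothetical sketch of the genericised trust decision described above.
# cert:          the (subject, issuer) pair to check
# intermediates: untrusted intermediate certificates
# trusted:       the trust store (leaf or CA certificates)

def is_trusted(cert, intermediates, trusted):
    subject, issuer = cert
    # Rule 1: the certificate itself is in the list of trusted certificates
    if cert in trusted:
        return True
    # Rule 2: the certificate was issued by a CA in the trusted list
    if any(t_subject == issuer for t_subject, _ in trusted):
        return True
    # Rule 3: the certificate chains through one or more untrusted
    # intermediates to a CA in the trusted list
    seen = set()
    current_issuer = issuer
    while True:
        link = next((c for c in intermediates
                     if c[0] == current_issuer and c[0] not in seen), None)
        if link is None:
            return False  # chain is broken before reaching a trusted CA
        seen.add(link[0])
        if any(t_subject == link[1] for t_subject, _ in trusted):
            return True
        current_issuer = link[1]
```

The `seen` set guards against cycles among the supplied intermediates, so a malformed bundle cannot cause the check to loop forever.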
It is important to note that while this greatly improves the security of Keylime and the trust which one can place in its verification of nodes, this mechanism on its own does not bind the node's identifier to the cryptographic identity of the TPM. An attacker would no longer be able to associate an arbitrary TPM with a given node, but could still confuse the identities of two trusted nodes (causing the verification policies of two different nodes to be incorrectly applied to one another). So, for an EK to be considered trusted by the registrar automatically, the agent must be configured to use the hash of the EK as the node identifier (this is already supported by Keylime). Alternatively, the user can make use of DevIDs or the webhook mechanism (both described in subsequent sections), either in isolation or in combination with one another.
IDevID/IAK certificates have similar properties to EKcerts but they are issued by the device manufacturer instead of the TPM manufacturer and are usually tied to a useful identifier such as a device serial number.
As such, the same mechanism for trusting EKcerts can be used for IDevIDs/IAKs. If the IDevID or IAK certificate provided by an agent can be verified against the registrar's trust store, the IDevID/IAK is trusted in transitive fashion. The user may provide a collection of IDevIDs and IAKs and place them in the registrar's trust store to mark individual devices as trusted.
This can be evolved to support local DevIDs, i.e., LDevIDs/LAKs, which are issued by the user (in place of the device manufacturer) in the future.
It is worth pointing out that a Keylime agent may not have access to the IDevID/IAK certificates for the node (the way in which these are made available to customers differs per manufacturer and so the user will be required to pre-provision the agent with the IDevID/IAK certificates). However, the agent is always able to re-generate and retrieve the IDevID/IAK keys in the TPM. So, if the registrar receives IDevID/IAK public keys from an agent without the corresponding certificates, the registrar should still be able to trust the IDevID/IAK if the matching certificates are present in the trust store.
On a different note, since IDevID/IAK certificates contain the device's serial number more often than not, it would be worth adding an option to set the agent's identifier to the serial number extracted from the IDevID/IAK (this can be supported alongside the current options for hostname and EK hash). Then, given that the identifier presented by an agent matches its IDevID and IAK certificates, the identifier can be deemed bound to the device identity and the TPM identity and the verifier can confidently apply the correct verification policy for that node.
Currently, if no EKcert is available (for example, if the agent is running on a VM deployed in the cloud), then the tenant allows the user to specify a script to use to verify the EK using custom logic.
We would recommend against adopting the script-based approach used by the tenant in the current pull model, and propose instead that the user is given the option of configuring a webhook which the registrar can query for a trust decision. This has a number of benefits:
- Does not require the registrar to invoke a shell command, avoiding the associated performance impacts.
- More consistent with service-oriented architectures.
- Allows users to change the decision logic without making changes to the registrar.
- Keeps the attack surface of the registrar as small as possible.
- By confining the decision logic to a separate node, this node can itself be verified by Keylime.
However, instead of limiting this functionality to situations in which the EKcert is unavailable, we propose that the webhook is triggered whenever a trust decision cannot be reached by the registrar on its own, i.e., in the following circumstances:
- when an EK has been provided by the agent but this is not accompanied by an EKcert;
- when an EKcert has been provided but verification against the trust store fails;
- when an EKcert has been provided and verifies against the trust store, but the node identifier does not match the hash of the EKpub;
- when an IDevID and IAK certificate have been provided but verification of one or both certificates against the trust store fails; or
- when the IDevID and IAK certificates verify against the trust store but the device serial number contained in each does not match the node identifier.
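The five circumstances above can be captured as a simple predicate over the registration data. The field names below are illustrative rather than the real data model; this is only a sketch of the decision logic the registrar would apply.

```python
# Hypothetical predicate: should the registrar consult the webhook for
# this registration? 'reg' is a dict of registration data; the keys are
# assumptions of this sketch, not Keylime's actual schema.

def needs_webhook(reg):
    ek, ekcert = reg.get("ek"), reg.get("ekcert")
    idevid_cert, iak_cert = reg.get("idevid_cert"), reg.get("iak_cert")

    if ek and not ekcert:
        return True                      # EK provided without an EKcert
    if ekcert:
        if not reg["ekcert_verified"]:
            return True                  # EKcert fails the trust store
        if reg["node_id"] != reg["ek_hash"]:
            return True                  # EKcert ok, but not bound to the id
    if idevid_cert or iak_cert:
        if not (reg.get("idevid_verified") and reg.get("iak_verified")):
            return True                  # one or both DevID certs fail
        if reg.get("device_serial") != reg["node_id"]:
            return True                  # certs ok, serial does not match id
    return False
```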
When the registrar issues its request to the webhook URI, it would provide all information it has about the agent: its keys and certificates and the outcomes of the checks which the registrar has already performed. For example, it may supply a JSON object similar to the following in its request:
```json
{
  "node_id": "...",
  "root_identities": ["ek"],
  "subordinate_identities": ["ak"],
  "ek": {
    "trust_status": "NOT_TRUSTED",
    "trust_details": ["EK_CERT_RECEIVED", "EK_CERT_NOT_TRUSTED", "EK_NOT_BOUND_TO_ID"],
    "ekcert": "...",
    "intermediate_certs": [ ... ],
    "trusted_certs": [ ... ]
  },
  "ak": {
    "binding_status": "BOUND",
    "binding_details": ["AK_BOUND_TO_EK"],
    "bound_root_identities": ["ek"]
  }
}
```
The webhook endpoint may reply with a list of decisions it wishes to override based on its own custom logic, e.g.:
```json
{
  "node_id": "...",
  "decisions": ["EK_CERT_TRUSTED", "EK_BOUND_TO_ID"]
}
```
It may also elect not to override any decisions, or override only a subset of decisions, and instead provide additional information to enable the registrar to reach its own new decisions. For example, consider the case in which an agent is deployed in a VM on a cloud provider and, as a result, the EKcert is unavailable and thus the EK is not trusted nor bound to the node identifier, set to the hostname of the node. The web service serving the webhook endpoint could use the node identifier to retrieve the EKcert from an API provided by the cloud provider and respond to the registrar with the following:
```json
{
  "node_id": "<hostname>",
  "decisions": ["EK_BOUND_TO_ID"],
  "ek": {
    "ek_cert": "<ek_cert_from_cloud_provider>",
    "intermediate_certs": ["<cloud_provider_intermediate_cert>"]
  }
}
```
Given that the cloud provider's root CA certificate is present in the registrar's trust store, the registrar will re-evaluate its EK decision, and mark the EK as trusted.
This proposed webhook functionality can be added to the existing registration protocol while remaining backwards compatible with previous versions and is illustrated by the sequence diagram:
(The current protocol messages are in black. Blue indicates planned additions for DevID support while green indicates the new items proposed by this document. Items not relevant to the push model are struck out in red.)
Note that the outgoing request to the webhook URI is performed in a non-blocking way, so the registrar can reply to an agent's registration request without waiting for a response from the outside web service. If no well-formed response is received from the web service, the registrar should retry the request using an exponential backoff, similar to what the verifier currently does when a request for attestation data from an agent fails.
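The retry behaviour could be sketched as follows. `post_webhook` is a hypothetical helper standing in for the HTTP call; it is assumed to return the parsed response on success and `None` otherwise. The jitter and cap values are illustrative.

```python
import time
import random

# Sketch of the registrar's non-blocking webhook delivery with
# full-jitter exponential backoff. Not the proposed implementation;
# a real registrar would run this on a background task/thread.

def deliver_with_backoff(post_webhook, payload, base=1.0, cap=300.0, attempts=8):
    for attempt in range(attempts):
        response = post_webhook(payload)
        if response is not None:
            return response  # well-formed response received
        # Sleep a random duration in [0, min(cap, base * 2^attempt)]
        time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return None  # give up after the configured number of attempts
```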
The webhook mechanism supports a wide variety of deployment scenarios (discussed in the next section) and is general enough to support EKs, IDevIDs/IAKs, LDevIDs/LAKs and other identities which could be associated with a future TEE attestation implementation.
The new API authentication mechanism plus the new ways for trusting root identities (EKs or IAKs/IDevIDs) allow users to deploy Keylime in a wide variety of ways, appropriate for their environment and use case. A user can elect to use a single deployment method, if their environment is fairly homogenous, or multiple, if they need to verify a diverse set of nodes.
The possible characteristics of a node are given in the matrix below (the significance of the letters will be explained momentarily):
| | A | B | C | D | E | F | G | H | I |
---|---|---|---|---|---|---|---|---|---|
| **Node Type** | | | | | | | | | |
| The node to be verified is a physical device | ✓ | ✓ | ✓ | ✓ | ✓ | | ✓ | | |
| The node to be verified is a virtual machine | ✓ | | | | | ✓ | ✓ | ✓ | ✓ |
| **Infrastructure Availability** | | | | | | | | | |
| The user has a PKI and wishes to pre-provision the node with a certificate | ✓ | | | | | | | | |
| The user has an inventory management system which can associate the node's identifier with an IDevID and IAK | | | ✓ | ✓ | | | | | |
| No such infrastructure | | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| **Certificate Availability** | | | | | | | | | |
| The node to be verified has an IDevID and IAK | ✓ | ✓ | ✓ | ✓ | | | | | |
| The node has an EKcert loaded in the TPM by the manufacturer | | | | | ✓ | | ✓ | | |
| The EKcert is obtainable but not directly from the TPM | | | | | | ✓ | ✓ | | |
| No EKcert or IDevID/IAK is available for the node | | | | | | | | ✓ | ✓ |
| **Choice of Node Identifier** | | | | | | | | | |
| The user wishes to identify the node with a semantic/human-readable value | ✓ | ✓ | ✓ | ✓ | | | ✓ | | ✓ |
| The user wishes to identify the node with the EK of the node's TPM | | | | | ✓ | ✓ | | ✓ | |
To determine the possible deployment scenarios for a given situation, select one characteristic from each category in the leftmost column, and find the letters of the alphabet common to all of them. These can be matched against the table below which contains all the supported scenarios:
Scenario | Available Certificate | Node Identifier | Trust Mechanisms | API Authentication |
---|---|---|---|---|
A | Agent TLS certificate *1 | Any **1 | mTLS trust store | Agent mTLS |
B | IDevID/IAK cert (w/ serial) *2 | Device serial no. | IDevID/IAK trust store †1 or †2 | AK challenge–response |
C | IDevID/IAK cert (w/ serial) *3 | Any (including serial) | IDevID/IAK trust store †1 †3 + Webhook ‡1 | AK challenge–response |
D | IDevID/IAK cert (w/out serial) *2 or *3 | Any (including serial) | IDevID/IAK trust store †1 †3 + Webhook ‡1 | AK challenge–response |
E | EKcert *2 | EK hash | EKcert trust store †1 or †2 | AK challenge–response |
F | EKcert *3 | EK hash | Webhook ‡2 | AK challenge–response |
G | EKcert *2 or *3 | Any (except EK hash) **2 | EKcert trust store †1 †3 + Webhook ‡1 | AK challenge–response |
H | None | EK hash | Webhook ‡2 | AK challenge–response |
I | None | Any (except EK hash) **2 | Webhook ‡1 | AK challenge–response |
Notes:
*1 Pre-provisioned by the user.
*2 Sent from the agent to the registrar.
*3 Obtainable via some out-of-band mechanism.
**1 Whatever identifier is used, it must be present in the TLS certificate's `Subject` field.
**2 The node identifier should be linkable to the root identity (EK or IDevID/IAK) via some out-of-band mechanism. So, it would likely be something like a hostname, VM instance name, cloud resource URI or device serial number for example.
†1 Containing manufacturer CA certificates.
†2 Containing the individual leaf certificates to be trusted.
†3 Use of the trust store is optional in this case, but frees the webhook from performing its own certificate checks.
‡1 The webhook would need access to a lookup table or queryable endpoint which associates node identifiers with root identities (EKs or IDevIDs/IAKs) or hashes of root identities.
‡2 The webhook would need access to a list of trusted root identities (EKs or IDevIDs/IAKs) or hashes of trusted root identities.
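The selection procedure described above amounts to a set intersection: pick one row per category and keep the scenario letters common to all picks. The dictionary below transcribes the matrix (with shortened, illustrative row names) so the lookup can be automated.

```python
# Transcription of the deployment-scenario matrix. Row names are
# abbreviations chosen for this sketch, not identifiers from Keylime.

MATRIX = {
    "physical device":      set("ABCDEG"),
    "virtual machine":      set("AFGHI"),
    "pki available":        set("A"),
    "inventory system":     set("CD"),
    "no infrastructure":    set("BDEFGHI"),
    "has idevid/iak":       set("ABCD"),
    "ekcert in tpm":        set("EG"),
    "ekcert out of band":   set("FG"),
    "no cert":              set("HI"),
    "semantic identifier":  set("ABCDGI"),
    "ek identifier":        set("EFH"),
}

def scenarios(*choices):
    # Intersect the letter sets of one choice per category
    result = set("ABCDEFGHI")
    for choice in choices:
        result &= MATRIX[choice]
    return sorted(result)
```

For example, a VM with no supporting infrastructure, no obtainable certificate and an EK-hash identifier resolves to scenario H, matching the table above.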
To obtain an integrity quote in the current pull architecture, the verifier issues a request to the agent, supplying the following details:
- A nonce for the TPM to include in the quote
- A mask indicating which PCRs should be included in the quote
- An offset value indicating which IMA log entries should be sent by the agent
The agent then replies with:
- The UEFI measured boot log (kept in `/sys/kernel/security/tpm0/binary_bios_measurements`)
- A list of IMA entries from the given offset
- A quote of the relevant PCRs generated and signed by the TPM using the nonce
In a push version of the protocol where the UEFI logs, IMA entries and quote are delivered to the verifier as an HTTP request issued by the agent, the agent needs a mechanism to first obtain the nonce, PCR mask and IMA offset from the verifier. We suggest simply adding a new HTTP endpoint to the verifier to make this information available to an agent which has authenticated via one of the mechanisms described previously (mTLS with a pre-trusted certificate, or an AK-derived session token).
As such, the push attestation protocol would operate in this manner:
1. When it is time to report the next scheduled attestation, the agent will request the attestation details from the verifier.
2. If the request is well formed, the verifier will reply with a new randomly-generated nonce and the PCR mask and IMA offset obtained from its database. Additionally, the verifier will persist the nonce to the database.
3. The agent will gather the information required by the verifier (UEFI log, IMA entries and quote) and report these in a new HTTP request along with other information relevant to the quote (such as the algorithms used).
4. The verifier will reply with the number of seconds the agent should wait before performing the next attestation and an indication of whether the request from the agent appeared well formed according to basic validation checks. Actual processing and verification of the measurements against policy can happen asynchronously after the response is returned.
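The agent-side loop implied by these steps can be sketched as below. The three callables are hypothetical stand-ins for the two HTTP exchanges and for the evidence-gathering against the TPM; the `cycles` parameter exists only so the sketch terminates.

```python
import time

# Illustrative agent-side loop for the push attestation protocol.
# get_parameters(): steps 1-2 (fetch nonce, PCR mask, IMA offset)
# gather_evidence(): step 3 input collection (UEFI log, IMA entries, quote)
# submit_evidence(): steps 3-4 (deliver evidence, receive scheduling reply)

def attestation_cycle(get_parameters, submit_evidence, gather_evidence, cycles=1):
    replies = []
    for _ in range(cycles):
        params = get_parameters()
        evidence = gather_evidence(params["nonce"],
                                   params["pcr_mask"],
                                   params["ima_offset"])
        reply = submit_evidence(evidence)
        replies.append(reply)
        if reply["well_formed"]:
            # Wait the interval dictated by the verifier before the next run
            time.sleep(reply["next_attestation_in"])
    return replies
```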
This protocol is contrasted against the current pull protocol in the sequence diagrams which follow:
Pull attestation protocol | Push attestation protocol |
---|---|
One drawback of this approach is that the number of messages a verifier needs to process is doubled. However, this is unlikely to significantly impact performance as the most intensive operations performed by the verifier remain those related to verification of the received quotes. Any such impact should be offset by the increased opportunity for horizontal scaling presented by the push model (as it makes it easy to load balance multiple verifiers). Further optimisations of the protocol can be explored once work on the simple version presented above has been completed.
Historically, the Keylime protocols were envisioned to work over unencrypted HTTP before a number of security issues were identified with this approach. Luckily, most communication is now protected by TLS. However, agent–registrar communication still happens over HTTP. There is no reason for this, as far as we can tell: the registrar has a TLS certificate preloaded as trusted by the agent. If we consider the registrar to be performing the function of a TPM Attestation CA (i.e., Privacy CA), this is a problem as anyone can intercept the traffic and associate a given AK with its EK (an eavesdropper can even determine the outcome of the challenge–response between the registrar and remote TPM), defeating the privacy-preserving properties of the AK.
Putting all the above recommendations together, the lifecycle of an agent operating in push mode can be described according to the following steps (also shown in the flowchart to the right):

1. The agent starts for the first time. If the agent has not been pre-configured with a specific identifier, it will use the configured mechanism to obtain the identifier for the node (i.e., DMI UUID, hostname, EK hash, or IDevID serial number).

2. The agent will, as is currently the case, register itself at the registrar, providing its identifier and AK, in addition to either (a) the TPM's EK (with EKcert, if available) or (b) the node's IDevID and IAK.

   (The registrar will process the provided keys and certificates and reach a trust decision asynchronously, either on its own or by invoking the configured webhook.)

3. The agent will attempt to authenticate itself to the verifier, providing its node identifier and obtaining, in response, a nonce. Then, it will use the TPM to construct a proof of possession of its AK, based on the nonce, and try to exchange it for a session token.

4. At this point, it is likely that the user has not yet enrolled the agent with the verifier and provided a verification policy for the node. As such, the verifier will reply with an error and the agent will repeatedly reattempt authentication from step 3, employing an exponential backoff.

   (The user may subsequently use the tenant to enrol the node for verification. The user supplies the identifier of the node and the desired verification policy, and the tenant will obtain the keys for the node from the registrar. If the registrar indicates that all identity checks for the node have passed [the AK is associated with an EK/IAK and the EK/IDevID/IAK is trusted and bound to the node identifier], then the tenant sends the node identifier, AK and verification policy to the verifier. The next authentication attempt performed by the agent should then succeed, such that the agent obtains a valid session token.)

5. The agent uses the session token to retrieve the information needed to produce an attestation for the node. It prepares the quote, gathers the logs and sends them to the verifier. The verifier will process the received quote and arrive at a decision asynchronously.

   The verifier may reject an attestation sent by an agent for one of two reasons:

   - (a) The token is no longer valid. The agent will reattempt authentication from step 3.
   - (b) The previous attestation failed (and the verification policy has not since changed). In this case, the agent will continue to send attestations according to an exponential backoff.

6. The agent will continue to send periodic attestations to the verifier from step 5, using the session token.
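The retry behaviour in steps 3 and 4 can be sketched as a toy model; the class, method and token names are illustrative assumptions rather than actual Keylime interfaces:

```python
import itertools

class MockVerifier:
    """Stand-in for the verifier's authentication endpoint (illustrative)."""

    def __init__(self):
        self.enrolled = {}   # node_id -> policy, populated by the tenant
        self.sessions = set()

    def authenticate(self, node_id, pop_ok=True):
        """Steps 3-4: exchange a proof of AK possession for a session token.
        Fails until the tenant has enrolled the node with a policy."""
        if node_id not in self.enrolled or not pop_ok:
            return None      # not yet enrolled: the agent must back off
        token = f"session-{node_id}"
        self.sessions.add(token)
        return token

def backoff_delays(base=1, cap=60):
    """Exponential backoff: 1, 2, 4, ... seconds, capped at `cap`."""
    for n in itertools.count():
        yield min(base * 2 ** n, cap)

verifier = MockVerifier()
delays = backoff_delays()
attempts = []

# The agent keeps retrying step 3 with exponential backoff...
for _ in range(3):
    attempts.append((verifier.authenticate("node-1"), next(delays)))

# ...until the tenant enrols the node with a policy, after which the next
# authentication attempt succeeds and yields a session token.
verifier.enrolled["node-1"] = {"policy": "allowlist-v1"}
token = verifier.authenticate("node-1")
```

The cap on the backoff matters operationally: without it, an agent enrolled late could wait an arbitrarily long time before its first successful attestation.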
Note that full separation between the registrar and verifier is maintained and no communication takes place between them.
Enterprise users often use HTTP proxies to inspect web traffic. Support needs to be added to the Keylime agent so that it can route its HTTPS requests through a given proxy URI.
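As a minimal sketch, assuming a Python HTTP client on the agent side, routing requests through a configured proxy might look like this (the proxy URI is illustrative):

```python
import urllib.request

# Value of the proposed `https_proxy` agent option (placeholder URI).
https_proxy = "http://proxy.corp.example:3128"

# Install a proxy handler: HTTPS requests issued via this opener will first
# CONNECT through the proxy, then perform the TLS handshake end to end, so
# the agent-verifier channel remains encrypted across the proxy.
handler = urllib.request.ProxyHandler({"https": https_proxy})
opener = urllib.request.build_opener(handler)
# opener.open("https://verifier.example:8881/...") would now go via the proxy.
```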
Keylime defines a number of states (in keylime/common/states.py) for driving the verifier's event loop and reporting the status of an agent. These states, of course, have not been considered in the context of the push model. However, certain states can be mapped to their push model equivalents.
When a verifier is configured to operate in push mode, we suggest that the operational state integer stored in the database take on these meanings instead:
| Integer | Name (pull) | Name (push) | Meaning (push) |
|---|---|---|---|
| 1 | START | ENROLLED | Agent has provided AKpub and other details but has not sent its first attestation |
| 2 | SAVED | NO_POLICY | There was no policy configured for the agent when it last sent an attestation |
| 3 | GET_QUOTE | AWAITING_QUOTES | The last attestation received from the agent verified successfully |
| 7 | FAILED | MALFORMED_QUOTE | The last attestation received from the agent was invalid |
| 9 | INVALID_QUOTE | POLICY_VIOLATION | The last attestation received from the agent did not verify against policy |
The other values (4, 5, 6, 8 and 10) relate to payloads or arise from the verifier-driven nature of the pull protocols and will not be used.
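One possible way to implement the dual interpretation, sketched here as Python enums: the pull-mode names mirror those in keylime/common/states.py, while the push-mode names are the proposed reinterpretations (only the integers shared by both modes are shown):

```python
from enum import IntEnum

class PullState(IntEnum):
    # Subset of the existing states in keylime/common/states.py
    START = 1
    SAVED = 2
    GET_QUOTE = 3
    FAILED = 7
    INVALID_QUOTE = 9

class PushState(IntEnum):
    # Proposed push-mode meanings for the same stored integers
    ENROLLED = 1
    NO_POLICY = 2
    AWAITING_QUOTES = 3
    MALFORMED_QUOTE = 7
    POLICY_VIOLATION = 9

def state_name(value, push_mode):
    """Resolve a stored state integer to the mode-appropriate name."""
    return (PushState if push_mode else PullState)(value).name
```

Because both enums share the same underlying integers, no database migration is needed when a verifier switches modes; only the interpretation changes.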
The following changes should be made to the agent's configuration options (usually set in `/etc/keylime/agent.conf`):

- Add an `operation_mode` option which accepts either `push` or `pull`.
- Add `verifier_ip` and `verifier_port` options to specify how the agent should contact the verifier when operating in push mode.
- Add an `https_proxy` option to allow users to specify an HTTP proxy by which the agent should contact the registrar and verifier.
- Update comments to indicate which values won't have an effect when push mode is turned on (e.g., those related to payloads).

It is suggested that `operation_mode` should not be configurable via environment variable and that the agent checks the ownership of the config file on startup, outputting a warning if it can be written to by any user other than root.
These changes should be made to the verifier's configuration options:

- Add an `operation_mode` option which accepts either `push` or `pull`.
Finally, these changes should be made to the registrar's configuration options:

- Add an `mtls_trusted_certs` option to set the directory where the registrar should look for trusted agent certificates.
- Add an `ek_trusted_certs` option to set the directory where the registrar should look for trusted certificates when verifying EKs.
- Add an `ek_intermediate_certs` option to set the directory where the registrar should look for intermediate certificates when verifying EKs.
- Add a `devid_trusted_certs` option to set the directory where the registrar should look for trusted certificates when verifying DevIDs / AK certs.
- Add a `devid_intermediate_certs` option to set the directory where the registrar should look for intermediate certificates when verifying DevIDs / AK certs.
- Add a `trust_decision_webhook` option to set the URI the registrar should invoke when it cannot determine a trust outcome on its own.
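An illustrative `registrar.conf` fragment with the proposed options; all paths and URIs are placeholders:

```toml
# /etc/keylime/registrar.conf (fragment, proposed options only)
[registrar]
mtls_trusted_certs = "/var/lib/keylime/reg_ca/agents"

# Roots and intermediates for EK certificate chain validation.
ek_trusted_certs = "/var/lib/keylime/tpm_cert_store"
ek_intermediate_certs = "/var/lib/keylime/tpm_cert_store/intermediates"

# Roots and intermediates for DevID / AK certificate chain validation.
devid_trusted_certs = "/var/lib/keylime/devid_cert_store"
devid_intermediate_certs = "/var/lib/keylime/devid_cert_store/intermediates"

# Invoked when the registrar cannot reach a trust decision on its own.
trust_decision_webhook = "https://mdm.corp.example/keylime/decide"
```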
Many thanks to Thore Sommer (@THS-on) for sharing his ideas for implementing push model support and to Marcus Heese (@mheese) for many helpful discussions around threat model and operational concerns. We also greatly appreciate the feedback and guidance received from the maintainers and community members in the June and July community meetings.
Once the agent is in exponential backoff, it may take a while until attestation actually starts. There should probably be a cap of a few seconds, perhaps 10s, on how far the backoff grows before the first attestation under a newly set policy begins.
If nothing can happen before a policy is set, we could require that a policy be set first using the tenant tool. This ordering would open access to the verifier for a particular agent only once its policy exists, preventing the agent from connecting to the verifier beforehand and keeping it busy for no reason.
Should the UUID of the agent be written into the client-side certificate, making each client certificate unique? This would at least make it somewhat more difficult to register an agent that simply claims a UUID from a configuration file, and would prevent reuse of a possibly stolen certificate. On the other hand, it requires issuing a certificate per agent, which may be an operational pain.