Roadmap to Push Model Support in Keylime

The Keylime integrity verification system currently operates on a pull, or server-initiated, basis, whereby a verifier periodically directs a number of enrolled nodes to attest their state to the server. This model is not appropriate for enterprise environments, as it requires each attested node to act as an HTTP server. The need to open additional ports for each node, and the associated increase in attack surface, is unacceptable from a compliance and risk management perspective.

This document aims to outline the challenges that need to be overcome in order to support an alternate push model in which the nodes themselves are responsible for driving the attestation cycle. These include changes to the registration and enrolment mechanisms, attestation and verification processes, and data model. We hope to elicit feedback from the Keylime community on these topics to arrive at a robust, forward-thinking solution which considers the latest developments in verification.

Thore Sommer (@THS-on) has previously put together a draft proposal on how some aspects of this could work. We make reference to this where relevant.

To begin, we provide an overview of the current state of Keylime as it relates to this topic before moving on to a discussion of the inherent challenges and our ideas for overcoming them.



Background

Keylime's stated purpose, as given on the official website, is to "bootstrap and maintain trust". The paper which presents the original design explains that this entails (1) provisioning of identity for each node and (2) monitoring of the nodes to detect integrity deviations.

Since this paper was presented at ACSAC in 2016, new approaches to handling the identity component of the equation have come on the scene, such as SPIFFE/SPIRE, a fellow CNCF project.

Components

Keylime consists of three main components: a registrar, which acts as a simple store of identities associated with each node; a verifier, which analyses node state and detects changes; and an agent, which is installed on each node and reports information to the registrar and verifier.

The ACSAC paper also leaves a number of responsibilities to the "tenant", that is, the customer of a cloud platform. These tasks have been extracted into a management CLI, referred to by the same name. In the rest of this document, when we mention the tenant, we are referring to the command-line tool.

Threat Model

The Keylime registrar, verifier and tenant are all fully trusted to perform their respective responsibilities faithfully. Nodes, including their installed agents and any running workloads, are not trusted until the registrar, verifier and tenant collaborate to obtain and validate a set of trusted measurements of node state. From that point, a node is deemed trusted, at least until that trust is revoked (either by an automatic mechanism or through manual intervention by an administrator).

The authenticity of measurements is assured by the TPM of each node (the TPMs are thereby also considered to be trusted).

Long-Term Keys

In the default configuration, the registrar and verifier share a single TLS certificate and corresponding private key for server authentication and secure channel establishment. The verifier and tenant share a TLS certificate and private key for client authentication. Both the server and client certificates are produced by a common Keylime CA, whose certificate is preloaded and trusted by all components (verifier, registrar, tenant and agent).

The agent has its own TLS certificate, which this document calls the NKcert (mtls_cert in the REST APIs and source code) and which it generates on first startup using a set of transport keys (collectively, the NK; individually, the NKpub and NKpriv) chosen at random. NKpub is sometimes referred to simply as the pubkey (e.g., in the API docs). The NKcert is registered with the registrar on agent startup and then verified as being linked to a trusted TPM when the agent is first enrolled for periodic verification.

The agent also uses the TPM to create an attestation key (AK), referred to as an "attestation identity key" (AIK) in older TCG specs (and in places in the Keylime source code), associated with the TPM's endorsement hierarchy. The public portion of the attestation key (AKpub) is reported to the registrar and verifier during registration/enrolment of the agent. Additionally, the registrar receives the TPM's public endorsement key (EKpub) and endorsement certificate (EKcert).

Division of Responsibilities

In the current architecture, the Keylime server components are entrusted with these specific responsibilities:

Registrar

  • Receives and stores the public endorsement key (EKpub) belonging to a node's TPM, a certificate issued by the TPM manufacturer (EKcert), and a TPM-generated public attestation key (AKpub).
  • Also receives and stores the TLS certificate (NKcert) used by the agent.
  • Verifies that the AKpub was generated by the TPM identified by the EKpub.
  • Provides this information to other components in the system via REST API.

Verifier

  • Validates that measurements received from an agent match the expected node state and that signatures on those measurements are cryptographically verifiable using the node's AKpub.
  • Periodically requests measurements from the agent.
  • Makes verification results available to consumers of the REST API.

Tenant

  • For a given NKpub and AKpub recorded by the registrar for a node, verifies that the NKpub has been provided by an agent with access to the same TPM which generated the AKpub (and is thereby in possession of the corresponding AKpriv).
  • Verifies the EKcert stored by the registrar against a list of trusted TPM manufacturer certificates to ensure that the EKpub belongs to a genuine TPM.
  • Enrols agents for periodic verification with the verifier and delivers a payload to the agent (provided the above tests pass).
  • Provides a command-line interface to the REST APIs to allow for introspection and modification of node data as stored by the registrar and verifier.

Protocol Suite

Keylime performs its functions via a handful of protocols between the various Keylime components. The relevant ones are as follows:

  • Registration protocol: Enables the agent on first start to register its EKpub, EKcert and AKpub with the registrar and prove that the AK and EK are linked.
  • Enrolment protocol: Four-way protocol between the tenant, registrar, verifier and agent to enrol the agent for periodic verification by the verifier and to provision the node with credentials.
  • Attestation protocol: The verifier uses this to request TPM quotes from the agent according to a configured interval.

A high-level overview of these protocols is given in the diagram below:

Overview of the Keylime protocol suite

Proposed Protocol Changes for Agent-Driven Attestation

Move Verification of the Agent's TLS Certificate Out of the Tenant

In the current pull model, an end user needs to use the tenant to enrol an agent with the verifier. As part of this process, the tenant contacts the agent to obtain an identity quote which cryptographically links the agent's NKpub to the AKpub.

Of course, it is not possible to simply reverse the directionality of the tenant–agent interaction, as the tenant is not a long-running process which exposes a REST interface. As a result, this responsibility needs to be fulfilled by another component.

It seems most natural to move this functionality out of the enrolment process and into the registration protocol. Verification of the NKpub would be performed by the registrar, thereby tying all identities of the agent used in subsequent protocol runs together at time of registration.

The earlier draft proposal suggested replacing the identity quote with a signature produced by TPM2_Certify (TPM 2.0 Part 3, §18.2). We agree that this is a better approach than the current method of extending PCR 16 with the NKpub and generating a quote.

To implement this, one would simply need to augment the existing registration messages with additional fields for a nonce and signature (see diagrams below). Since no existing fields would be affected, this change would be backwards compatible with the existing pull model.
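As a rough sketch of what this augmentation might look like on the wire (all new field names here are assumptions rather than a settled interface):

```python
# Sketch of the augmented registration exchange (new field names are
# assumptions). The registrar's response to the initial registration request
# would carry a nonce alongside the existing TPM2_ActivateCredential challenge:
registrar_response = {
    "blob": "<base64 credential blob for TPM2_ActivateCredential>",  # existing
    "certify_nonce": "<nonce to use as TPM2_Certify qualifying data>",  # new
}

# The agent's activation message would then prove that the NK resides in the
# same TPM as the AK, without touching any existing fields:
agent_activate_request = {
    "auth_tag": "<HMAC derived from the decrypted challenge>",  # existing
    "nk_certify_attest": "<base64 TPM2B_ATTEST structure covering the NK>",  # new
    "nk_certify_sig": "<base64 AK signature over the attest structure>",  # new
}
```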

Diagrams: current registration protocol (left) and proposed future registration protocol (right)

Move Verification of the EK Out of the Tenant

The tenant also currently retrieves the EKcert from the registrar and verifies it against a trust store containing TPM manufacturer certificates. If no EKcert is available (for example, if the agent is running on a VM deployed in the cloud), then the tenant allows the user to specify a script to use to verify the EK using custom logic.

The original vision for the registrar from the ACSAC paper was to perform the task of an Attestation CA per section 9.5.3.1 (3) of the TPM 2.0 architecture specification (formerly called a Privacy CA). This means that other components in the system would trust the registrar to verify the authenticity of the AK all the way up the chain to the TPM manufacturer's CA certificate.

We suggest returning to this original design such that “the registrar [...] checks the validity of the TPM EK with the TPM manufacturer” (quoting from the paper) which is also consistent with the previous draft proposal. The registrar already checks part of the chain (from AK to EK) so it would seem natural to extend this the rest of the way.

This would require no changes to the protocols; EK verification logic simply needs to be added to the registration request handler.

There is the slight challenge of handling the case where no EKcert is provided by the agent. We would recommend against adopting the script-based approach used by the tenant in the current pull model, and propose instead that the user be given the option of configuring a webhook which the registrar can query for a decision on whether a given EK should be trusted (a sketch of the exchange follows the list below). This has a number of benefits:

  • Does not require the registrar to invoke a shell command, avoiding the associated performance impacts.
  • More consistent with service-oriented architectures.
  • Allows users to change the decision logic without making changes to the registrar.
  • Keeps the attack surface of the registrar as small as possible.
  • By confining the decision logic to a separate node, that node can itself be verified by Keylime, provided the user manually checks the node's EK and marks it as trusted in the registrar's database.
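A rough sketch of the webhook exchange follows; all field names are illustrative assumptions rather than a settled interface:

```python
# Hypothetical request body the registrar could POST to the configured webhook
# when an agent registers without an EKcert (field names are illustrative):
webhook_request = {
    "agent_id": "d432fbb3-d2f1-4a97-9ef7-75bd81c00000",
    "ek_tpm": "<base64-encoded TPM2B_PUBLIC of the EK>",
    "ek_hash": "<hex SHA-256 digest of the EKpub>",
}

# Expected reply: a simple trust verdict, which the registrar records in its
# database along with how and when the decision was made (see later section).
webhook_response = {"trusted": True}
```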

This proposed webhook functionality can be added to the existing registration protocol while remaining backwards compatible with previous versions, as illustrated in the sequence diagram below:

Registration protocol with webhook functionality

Note that the outgoing request to the webhook URI is performed in a non-blocking way, so the registrar can reply to an agent's registration request without waiting for a response from the outside web service. If no well-formed response is received from the web service, the registrar should retry the request using an exponential backoff, similar to what the verifier currently does when a request for attestation data from an agent fails.
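A minimal sketch of this retry logic, written synchronously for brevity (the real registrar would schedule it on its event loop so that registration replies are never delayed):

```python
import time
import requests  # assumed HTTP client for the purposes of this sketch

def query_ek_webhook(url: str, payload: dict, max_attempts: int = 6) -> bool:
    """Ask the external web service whether the EK should be trusted,
    retrying with exponential backoff when no well-formed reply arrives."""
    for attempt in range(max_attempts):
        try:
            resp = requests.post(url, json=payload, timeout=10)
            if resp.ok:
                return bool(resp.json().get("trusted", False))
        except (requests.RequestException, ValueError):
            pass  # network error or malformed JSON: fall through and retry
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, ...
    return False  # no well-formed response: leave the EK untrusted
```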

Don't Implement the Payload Feature in the Push Protocols

As part of the current enrolment process, the user specifies a payload which is delivered to the agent and placed in a directory to be consumed by other software. The reason for this is to support the provisioning of identities to workloads running on the node (e.g., TLS certificates or long-lived shared secrets). The payload may optionally contain a script file, which is executed by the agent.

Considering the current landscape of the identity and access management space, a more modern approach to solving this problem would likely be to have Keylime report verification results to a SPIRE attestor plugin which could then handle provisioning of workload identities (see enhancement proposal 100). This offloads issues related to revocation and suitability for cloud-native workloads.

The arbitrary nature of the payloads mechanism also raises concerns as to the attack surface of the agent and the whole Keylime system. Not only can Keylime server components query a node to report on its state, but they also have the power to modify a node's state and execute arbitrary code. Enterprise users would consider this unacceptable.

As a result, we recommend that the payload feature is not implemented in the push model. This gives users the choice to opt into a more secure design which considers a stronger threat model, without taking features away from existing users. Users who do require identity provisioning alongside push support have the option of using SPIFFE/SPIRE.

Make Enrolment Partially Automatic

Assuming the above recommendations are implemented, most of the tenant's role in enrolment has effectively been eliminated from the push model. Beyond those benefits already discussed, this is worth pursuing as the Keylime project starts to consider future deployment scenarios which do not involve the tenant.

Since users will be able to rely on the registrar to perform verification of a node's entire identity chain from the NK and AK all the way to a trusted TPM manufacturer CA certificate, they can interact with Keylime exclusively through its REST APIs without having to implement these checks themselves. This makes it easy, as an example, to combine horizontal autoscaling with a serverless function (where invoking the tenant would be impractical) to automatically start verification of newly provisioned VMs.
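As an illustration, such a serverless function might do little more than the following; the endpoint paths, payload shape and credential locations are placeholders rather than the documented Keylime API:

```python
import requests

# Placeholder locations and credentials for the Keylime services:
REGISTRAR = "https://registrar.example.com:8891"
VERIFIER = "https://verifier.example.com:8881"
CLIENT_CERT = ("/etc/keylime/client-cert.pem", "/etc/keylime/client-key.pem")
CA_BUNDLE = "/etc/keylime/cacert.pem"

def start_verification(agent_id: str, policy: dict) -> None:
    """Enrol a freshly provisioned VM for verification via the REST APIs."""
    # Confirm the registrar has seen the agent and validated its identities
    resp = requests.get(f"{REGISTRAR}/v2/agents/{agent_id}",
                        cert=CLIENT_CERT, verify=CA_BUNDLE)
    resp.raise_for_status()

    # Hand the verifier a policy; with automatic enrolment, this is all that
    # is needed before attestations begin to be accepted
    requests.post(f"{VERIFIER}/v2/agents/{agent_id}",
                  json={"policy": policy},
                  cert=CLIENT_CERT, verify=CA_BUNDLE)
```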

Outside of the binding of a node's various cryptographic identities, the only enrolment functions which the tenant performs are:

  • It retrieves certain information about the agent (such as its AKpub) from the registrar and provides that to the verifier.
  • It provides the verifier with the verification policies it should use to verify the attestations received from the agent.

Continuing in the vein of making routine tasks automatic where possible, it would make sense to have the agent provide information about itself to the verifier on first run, just as it does to register itself at the registrar. The verifier would then begin receiving attestations from the agent right away but indicate that no policy is configured in its response. Subsequent attestations would proceed according to an exponential backoff until a policy is added via the tenant (or API request from some third-party component).

This automatic enrolment could optionally be backported to the pull model to bring the user experience of both into alignment. The difference would be that the verifier would not begin requesting attestations from the agent until a policy is configured.
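A minimal sketch of the agent-side backoff, assuming a hypothetical send_attestation callable and a NO_POLICY status value; the cap ensures the first verified attestation is not unduly delayed once a policy appears:

```python
import time

def await_policy(send_attestation, max_delay: int = 60):
    """Keep offering attestations until the verifier reports that a policy
    is configured (a sketch; send_attestation is a hypothetical callable
    that performs one attestation exchange and returns the verifier's reply)."""
    delay = 1
    while True:
        reply = send_attestation()
        if reply.get("status") != "NO_POLICY":  # assumed status field
            return reply
        time.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff with a cap
```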

Require TLS Everywhere

Historically, the Keylime protocols were envisioned to work over unencrypted HTTP, before a number of security issues were identified with this approach. Luckily, most communication is now protected by TLS. However, agent–registrar communication still happens over HTTP. There is no reason for this, as far as we can tell: the registrar has a TLS certificate preloaded as trusted by the agent. If we consider the registrar to be performing the function of a TPM Attestation CA (i.e., Privacy CA), this is a problem: anyone can intercept the traffic and associate a given AK with its EK (an eavesdropper can even determine the outcome of the challenge–response between the registrar and remote TPM), defeating the privacy-preserving properties of the AK.

The lack of TLS for registration also makes it easier for an attacker to interleave registration messages between two different runs of the protocol to cause the TPM of one node to be associated with the UUID of another, later resulting in the wrong verification policies being applied to the nodes.

Because of these threats, in the push model, TLS should be required during registration. We recommend that this is also required in future versions of the pull model protocols.

Changes to Integrity Quotes

To obtain an integrity quote in the current pull architecture, the verifier issues a request to the agent, supplying the following details:

  • A nonce for the TPM to include in the quote
  • A mask indicating which PCRs should be included in the quote
  • An offset value indicating which IMA log entries should be sent by the agent

The agent then replies with:

  • The UEFI measured boot log (kept in /sys/kernel/security/tpm0/binary_bios_measurements)
  • A list of IMA entries from the given offset
  • A quote of the relevant PCRs generated and signed by the TPM using the nonce

In a push version of the protocol where the UEFI logs, IMA entries and quote are delivered to the verifier as an HTTP request issued by the agent, the agent needs a mechanism to first obtain the nonce, PCR mask and IMA offset from the verifier. We suggest simply adding a new HTTP endpoint to the verifier to make this information available to any agent which has correctly authenticated via mTLS with the expected certificate.

As such, the push attestation protocol would operate in this manner (sketched in code after the list):

  1. When it is time to report the next scheduled attestation, the agent will request the attestation details from the verifier.

  2. If the request is well formed, the verifier will reply with a new randomly-generated nonce and the PCR mask and IMA offset obtained from its database. Additionally, the verifier will persist the nonce to the database.

  3. The agent will gather the information required by the verifier (UEFI log, IMA entries and quote) and report these in a new HTTP request along with other information relevant to the quote (such as algorithms used).

  4. The verifier will reply with the number of seconds the agent should wait before performing the next attestation and an indication of whether the request from the agent appeared well formed according to basic validation checks. Actual processing and verification of the measurements against policy can happen asynchronously after the response is returned.
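The four steps, sketched from the agent's perspective (in Python for consistency with the other examples, although the agent itself is written in Rust; endpoint paths and field names are assumptions, not a settled API):

```python
import requests  # stands in for the agent's HTTP client in this sketch

def perform_push_attestation(verifier_url: str, agent_id: str,
                             session: requests.Session) -> int:
    """One iteration of the proposed push attestation protocol; returns the
    number of seconds to wait before the next attestation."""
    # Steps 1-2: fetch the nonce, PCR mask and IMA offset from the verifier
    params = session.get(
        f"{verifier_url}/agents/{agent_id}/attestation").json()

    # Step 3: gather and report the evidence (placeholder values shown; the
    # real agent would invoke the TPM and read the logs at this point)
    evidence = {
        "nonce": params["nonce"],  # echoed so the verifier can match state
        "quote": "<TPM quote over the PCRs in params['pcr_mask']>",
        "ima_entries": "<IMA log entries from params['ima_offset'] onward>",
        "mb_log": "<UEFI measured boot log>",
        "hash_alg": "sha256",
    }
    reply = session.post(
        f"{verifier_url}/agents/{agent_id}/attestation", json=evidence).json()

    # Step 4: the verifier acknowledges and schedules the next attestation;
    # actual verification against policy happens asynchronously
    return reply["next_attestation_in_seconds"]
```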

This protocol is contrasted against the current pull protocol in the sequence diagrams which follow:

Diagrams: current pull attestation protocol (left) and proposed push attestation protocol (right)

One drawback of this approach is that the number of messages a verifier needs to process is doubled. However, this is unlikely to significantly impact performance as the most intensive operations performed by the verifier remain those related to verification of the received quotes. Any such impact should be offset by the increased opportunity for horizontal scaling presented by the push model (as it makes it easy to load balance multiple verifiers). Further optimisations of the protocol can be explored once work on the simple version presented above has been completed.

Other Required Changes

Representing the Trust Status of Keys

Currently, when an AK is successfully bound to the EK of the node by way of TPM2_ActivateCredential, the registrar sets the node's active field in the database to true. Prior to this event, an HTTP GET request for information about the node results in a 404 error even if the node exists in the database.

To support the push model, the registrar will be responsible for checking all the keys associated with an agent. It thus becomes necessary to store and report more granular information about the trust status of agents.

Our suggestions for representing this information are as follows:

  • Add an ek_trust_status field of type Integer containing one of these values: NOT_TRUSTED (0) / TRUSTED_BY_CERT (1) / TRUSTED_BY_WEBHOOK (2)
  • Add a last_ek_trust_decision field of type Integer containing the timestamp at which the trust decision was made
  • Add an mtls_cert_activated field of type Integer, treated as a boolean
  • Rename the active field to ak_activated for consistency (a backwards-compatible change as active is not exposed by the REST API)

Since the configuration of how the registrar trusts EKs can change over time (if the user adds or removes certificates from the registrar's trust store, or if the logic of the webhook changes), it is worth storing some extra information about how the decision was made and when.

These fields would be available through the REST API. It would be best if the GET endpoint returns the JSON representation of an agent regardless of whether all checks have passed or not, to give consumers of the API full access to the status of an agent and its keys.
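For illustration, a registrar GET response might then look something like the following sketch (values and surrounding fields are illustrative, not a settled schema):

```python
# Sketch of the JSON body returned by the registrar's GET endpoint once the
# new fields are in place:
example_agent = {
    "agent_id": "d432fbb3-d2f1-4a97-9ef7-75bd81c00000",
    "ek_tpm": "<base64 EKpub>",
    "aik_tpm": "<base64 AKpub>",
    "ek_trust_status": 2,                  # TRUSTED_BY_WEBHOOK
    "last_ek_trust_decision": 1692268800,  # Unix timestamp of the decision
    "mtls_cert_activated": 1,              # NKcert bound to the TPM
    "ak_activated": 1,                     # formerly the "active" field
}
```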

Representing Agent Operational States

Keylime defines a number of states (in keylime/common/states.py) for driving the verifier's event loop and reporting the status of an agent. These states, of course, have not been considered in the context of the push model. However, certain states can be mapped to their push model equivalents.

When a verifier is configured to operate in push mode, we suggest that the operational state integer stored in the database take on these meanings instead:

Integer | Name (pull) | Name (push) | Meaning (push)
------- | ----------- | ----------- | --------------
1 | START | ENROLLED | Agent has provided AKpub and other details but has not sent its first attestation
2 | SAVED | NO_POLICY | There was no policy configured for the agent when it last sent an attestation
3 | GET_QUOTE | AWAITING_QUOTES | The last attestation received from the agent verified successfully
7 | FAILED | MALFORMED_QUOTE | The last attestation received from the agent was invalid
9 | INVALID_QUOTE | POLICY_VIOLATION | The last attestation received from the agent did not verify against policy

The other values (4, 5, 6, 8 and 10) relate to payloads or arise from the verifier-driven nature of the pull protocols and will not be used.
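A sketch of how the push-mode reinterpretation could sit alongside the existing definitions in keylime/common/states.py:

```python
from enum import IntEnum

class PushState(IntEnum):
    """Push-mode meanings for the stored operational-state integers,
    mirroring the table above (a sketch, not existing Keylime code)."""
    ENROLLED = 1          # pull: START
    NO_POLICY = 2         # pull: SAVED
    AWAITING_QUOTES = 3   # pull: GET_QUOTE
    MALFORMED_QUOTE = 7   # pull: FAILED
    POLICY_VIOLATION = 9  # pull: INVALID_QUOTE
```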

Configuration Changes

The following changes should be made to the agent's configuration options (usually set in /etc/keylime/agent.conf):

  • Add an operation_mode option which accepts either push or pull.
  • Add verifier_ip and verifier_port options to specify how the agent should contact the verifier when operating in push mode.
  • Disallow setting enable_agent_mtls to false when operating in push mode.
  • Update comments to indicate which values won't have an effect when push mode is turned on (e.g., those related to payloads).

It is suggested that operation_mode should not be configurable via environment variable and that the agent checks the ownership of the config file on startup, outputting a warning if it can be written to by any user other than root.
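A minimal sketch of the suggested ownership check (shown in Python for consistency with the other examples, though the agent itself is written in Rust):

```python
import logging
import os
import stat

def warn_if_config_writable(path: str = "/etc/keylime/agent.conf") -> None:
    """Emit a warning if the config file could be modified by a non-root
    user, since a writable config would let an attacker flip operation_mode."""
    st = os.stat(path)
    writable_by_others = bool(st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
    if st.st_uid != 0 or writable_by_others:
        logging.warning("%s is writable by a user other than root; "
                        "its settings cannot be trusted", path)
```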

These changes should be made to the verifier's configuration options:

  • Add an operation_mode option which accepts either push or pull.

Finally, these changes should be made to the registrar's configuration options:

  • Add an ek_trust_store option to set the directory where the registrar should look for trusted certificates when verifying EKs.
  • Add an ek_webhook option to set the URI the registrar should use to determine whether it should trust an EK when no EKcert is available.

Acknowledgements

Many thanks to Thore Sommer (@THS-on) for sharing his ideas for implementing push model support and to Marcus Heese (@mheese) for many helpful discussions around threat model and operational concerns. We also greatly appreciate the feedback and guidance received from the maintainers and community members in the June and July community meetings.

@stefanberger

Subsequent attestations would proceed according to an exponential backoff until a policy is added via the tenant (or API request from some third-party component).

Once you are in an exponential backoff it may then take a while until the attestation actually starts. There should probably be a limit set to a few seconds on how far to back off until the first attestation with the policy starts. Like maybe 10s?

If nothing happens before a policy is set, we could require that a policy be set first using the tenant tool. This ordering of requiring a policy first could be used to open access for a particular agent to the verifier while preventing it from connecting to the verifier first and keeping it busy for no reason...

The lack of TLS for registration also makes it easier for an attacker to interleave registration messages between two different runs of the protocol to cause the TPM of one node to be associated with the UUID of another, later resulting in the wrong verification policies being applied to the nodes.

Should the UUID of the agent be written into the client side certificate and make the client side certificate unique? This would at least make it a bit more difficult to just register an agent that then claims a UUID from a configuration file, and prevent re-use of a possibly stolen cert. On the other hand, it requires issuing a certificate per agent, which may be an operational pain.

@stefanberger

... anyway, thanks a lot for putting all this together!

@stringlytyped

@stefanberger Thanks for your comments!

Once you are in an exponential backoff it may then take a while until the attestation actually starts. There should probably be a limit set to a few seconds on how far to back off until the first attestation with the policy starts. Like maybe 10s?

That is a good point. Yes, there definitely should be a limit.

If nothing happens before a policy is set, we could require that a policy be set first using the tenant tool. This ordering of requiring a policy first could be used to open access for a particular agent to the verifier while preventing it from connecting to the verifier first and keeping it busy for no reason...

So, Thore and I had a long chat somewhat related to this yesterday. I am going to update the gist to move to this model whereby the verifier only accepts attestations after a policy is set.

However, this does not really solve the issue you bring up about the repeated "polling" requests from the agent on first startup, because there is no good mechanism by which the verifier can notify the agent that a policy is now available. Even if the verifier could update a database field at the registrar to indicate that a policy has been configured, the agent would need to poll the registrar until that field is set. So, you have to poll either the registrar or the verifier regardless.

Should the UUID of the agent be written into the client side certificate and make the client side certificate unique? This would at least make it a bit more difficult to just register an agent that then claims a UUID from a configuration file, and prevent re-use of a possibly stolen cert. On the other hand, it requires issuing a certificate per agent, which may be an operational pain.

So, this is sort of the case currently actually. The agent, on first startup, creates a new TLS certificate (currently used for server authentication since the agent acts as an HTTP server, but there is no reason it cannot act as a client cert in the push model). And that TLS certificate contains the agent UUID in the subject field.

But you are right: if you have a PKI in place that can issue certs for a given UUID, then you sidestep this issue. This should not be a requirement, however, as, like you say, it is a pain to deal with (and hard to do correctly).

In the case of the current default behavior whereby the cert is auto generated by the agent: the issue here is that the certificate is self-signed and so there is no assurance that the certificate is actually linked to that UUID. If you are using randomly-generated UUIDs (the default in Keylime), this is less of a concern because a network attacker would have to guess the UUID, assuming that registration is protected by server-authenticated TLS (an attack would still be possible if the attacker is resident on the attested device, however). But there is no requirement to use UUIDs. The agent identifier may be the hostname of the node, or something else that is predictable.

This is something which has been bothering me since the end of last week. I think I have a reasonably good solution to this problem which covers a bunch of different deployment scenarios. I ran through it with Thore yesterday and he seems to think it could work. So, I am busy updating the gist to describe this new approach (it is not a massive departure from the current status quo but does replace the auto-generated NKcert with a different mechanism). Hoping to have the updated doc ready soon.

@stefanberger

stefanberger commented Aug 3, 2023

In the case of the current default behavior whereby the cert is auto generated by the agent: the issue here is that the certificate is self-signed and so there is no assurance that the certificate is actually linked to that UUID. If you are using randomly-generated UUIDs (the default in Keylime), this is less of a concern because a network attacker would have to guess the UUID, assuming that registration is protected by server-authenticated TLS (an attack would still be possible if the attacker is resident on the attested device, however). But there is no requirement to use UUIDs. The agent identifier may be the hostname of the node, or something else that is predictable.

I would think that no agent should be allowed to create a self-signed certificate for itself; that would make it too easy to connect to the server side and claim any UUID. There should at least be a known CA involved for the (push-model) client-side certs and then, on the next level, possibly the UUID written into the client-side cert.

@stringlytyped

@stefanberger it would be good to get your feedback on the updated document, especially the new authentication mechanisms proposed (in the Changes to Authentication subsection). I think the concerns you highlighted before, which I give some specific replies to below, should now be addressed.

I would think that no agent should be allowed to create a self-signed certificate for itself; that would make it too easy to connect to the server side and claim any UUID. There should at least be a known CA involved for the (push-model) client-side certs and then, on the next level, possibly the UUID written into the client-side cert.

Yes, totally agree. In the latest version of the document, I have added a specific recommendation (in the Remove Requirement for Agent TLS Certificate subsection) not to support the automatic generation of the NKcert in the push model. But, the user can still use pre-provisioned client TLS certs to authenticate agents, if they wish.

Should the UUID of the agent be written into the client side certificate and make the client side certificate unique?

I've added a requirement (in that same subsection) that the registrar and verifier must check that the presented client certificate has been issued for the given node: "Additionally, all endpoints which accept authentication via mTLS should check the node identifier contained within the Subject field of the presented certificate to ensure that there is a binding to the identifier."

If nothing happens before a policy is set, we could require that a policy be set first using the tenant tool. This ordering of requiring a policy first could be used to open access for a particular agent to the verifier while preventing it from connecting to the verifier first and keeping it busy for no reason...

This is now the case: a policy is required to be set with the verifier before it will accept attestations from an agent (see step 4 in Understanding the Lifecycle of an Agent in the Push Model). However, the backoff is retained, as I mentioned before, as the agent cannot know when the policy has been given to the verifier.

@stefanberger

The proposal is now very long, going even into the direction of IDevID/IAK etc. It should be possible to treat this in a separate proposal.

Also there are quite a few directories now. In what case would I have to use ek_intermediate_certs versus ek_trusted_certs? Are we expecting to not be able to see the root certificate of an EK CA that we would need to use intermediate certs and end verification there?

@stefanberger

This would allow the user to trust EKs on an individual basis by adding EKcerts to the trust store directly, instead of blindly trusting all EKs from a particular TPM manufacturer. The same mechanism can be used for verification of agent certificates when mTLS is enabled and for IDevIDs (discussed in the next section).

Has this been an issue that people have had problems with for EKs?

@stringlytyped

The proposal is now very long, going even into the direction of IDevID/IAK etc. It should be possible to treat this in a separate proposal.

That's a fair observation. I think it is important to consider DevIDs at this stage, as we need to develop trust/authentication mechanisms which are general enough that they can be applied to any cryptographic identity supported by Keylime today or in the future (in so far as they are reasonably foreseeable). Without this effort now, we risk having to make breaking changes later. For example, the JSON structures suggested for the webhook mechanism are informed directly by the need to consider multiple identities.

That said, the idea here is not to implement the DevID-specific parts right from the beginning: only EKs will be supported at first. But, when it is time to add support for DevIDs, this proposal minimises the amount of work that is needed at that stage (because the mechanisms for EKs will work almost unchanged for IDevIDs/IAKs also).

I am happy to break the DevID stuff out into a separate document, if you think that will improve readability.

Also there are quite a few directories now. In what case would I have to use ek_intermediate_certs versus ek_trusted_certs? Are we expecting to not be able to see the root certificate of an EK CA that we would need to use intermediate certs and end verification there?

The idea here was to mimic the trust stores present in operating systems and browsers which store a collection of intermediate certificates, the trust status of which is indeterminate and which are only used to establish a link between a leaf certificate and a root certificate (if such a cert exists in the collection of trusted root certs). This is needed for robust support of users with internal PKI.

However, on further reflection, it occurs to me that having separate trust stores for each type of identity is probably unnecessary. So, we could just have a single trusted_certs and intermediate_certs directory in which the user would place all their TLS-, EK- and DevID-related certificates.

This would allow the user to trust EKs on an individual basis by adding EKcerts to the trust store directly, instead of blindly trusting all EKs from a particular TPM manufacturer. The same mechanism can be used for verification of agent certificates when mTLS is enabled and for IDevIDs (discussed in the next section).

Has this been an issue that people have had problems with for EKs?

I am not sure if it has been an issue per se, but I would say that trusting all EKcerts from a certain TPM manufacturer should be discouraged in most cases (unless you have a reliable way of binding them to some other identifier). It seems that whitelisting specific EKs is something that KL has contemplated before, since there is the option today to use the hash of the EKpub as the node identifier (which has the same effect).

However, that requires the user to retrieve the EKpub from the node in question (and hash it themselves) when specifying the verification policy for that node (hopefully they aren't retrieving it from the registrar because then they would not have any assurance that the EK is actually from the expected node). And it also means that there is no "friendly" identifier for the node.

The feature/use case described in the quoted passage just provides an alternate option for such users which some might find easier/more flexible. And you get it for free by virtue of having EKs use the same trust store mechanism which is needed for mTLS anyway and will eventually be needed for DevID support.

@stefanberger

The proposal is now very long, going even into the direction of IDevID/IAK etc. It should be possible to treat this in a separate proposal.

I am happy to break the DevID stuff out into a separate document, if you think that will improve readability.

I think it would help since they should be dealt with individually.

Also there are quite a few directories now. In what case would I have to use ek_intermediate_certs versus ek_trusted_certs? Are we expecting to not be able to see the root certificate of an EK CA that we would need to use intermediate certs and end verification there?

The idea here was to mimic the trust stores present in operating systems and browsers which store a collection of intermediate certificates, the trust status of which is indeterminate and which are only used to establish a link between a leaf certificate and a root certificate (if such a cert exists in the collection of trusted root certs). This is needed for robust support of users with internal PKI.

However, on further reflection, it occurs to me that having separate trust stores for each type of identity is probably unnecessary. So, we could just have a single trusted_certs and intermediate_certs directory in which the user would place all their TLS-, EK- and DevID-related certificates.

This would allow the user to trust EKs on an individual basis by adding EKcerts to the trust store directly, instead of blindly trusting all EKs from a particular TPM manufacturer. The same mechanism can be used for verification of agent certificates when mTLS is enabled and for IDevIDs (discussed in the next section).

Has this been an issue that people have had problems with for EKs?

I am not sure if it has been an issue per se, but I would say that trusting all EKcerts from a certain TPM manufacturer should be discouraged in most cases (unless you have a reliable way of binding them to some other identifier). It seems that whitelisting specific EKs is something that KL has contemplated before, since there is the option today to use the hash of the EKpub as the node identifier (which has the same effect).

IMO there's currently nothing better out there than the certificates of the EKs along with the manufacturers' CAs to prove that you're using a TPM. While TPM 1.2 may have had the certs missing, they are now widely available on TPM 2. So if we cover >95% of TPMs (possibly more like 99%) with the current EKCerts method then adding support for IAK & IDevID just adds a 'different' method to it that seems to open up more trouble with certificate chains than if we didn't support it. Otherwise what are the shortcomings of proving usage of a TPM other than with EKCert? In which scenarios did we encounter issues where the EKcert was not good?

@stringlytyped

stringlytyped commented Aug 17, 2023

I think it would help since they should be dealt with individually.

Sure thing, I'll work on that.

IMO there's currently nothing better out there than the certificates of the EKs along with the manufacturers' CAs to prove that you're using a TPM. While TPM 1.2 may have had the certs missing, they are now widely available on TPM 2. So if we cover >95% of TPMs (possibly more like 99%) with the current EKCerts method then adding support for IAK & IDevID just adds a 'different' method to it that seems to open up more trouble with certificate chains than if we didn't support it. Otherwise what are the shortcomings of proving usage of a TPM other than with EKCert? In which scenarios did we encounter issues where the EKcert was not good?

I agree that relying on the TPM manufacturer certificate together with the EKcert is sufficient when you want assurance as to the authenticity and accuracy of measurements, i.e., "have these measurements been produced by trusted firmware/kernel-space code and are they free of tampering?"

However, TPM manufacturer certificates and EKcerts alone tell you next to nothing about the identity of the node from which a set of measurements originate (as the EKcert does not contain any useful identifying information unlike, e.g., a TLS certificate which contains the hostname of the website).

Keylime lets a user specify a verification policy for a given node, based on that node's identifier (UUID, hostname, etc.). In the default, out-of-the-box configuration (in which EKcerts are compared against a list of TPM manufacturer certs only), the verifier can be sure that it is receiving accurate measurements, but it has no idea whether those measurements are actually coming from the node with that UUID, hostname, or other identifier. So, a consumer of the information reported by the verifier knows the verification decision is "correct" (the policy has been applied to some measurements) but has no assurance that the decision is correct for the node with that identifier.

Because of this, ideally, the user should bind the EK to the node identifier via some out-of-band process (e.g., by querying their IaaS provider's API). And of course, trusting individual EKcerts does not actually achieve that on its own.

So, if the user connects to the node over a secure channel, extracts the EKcert, and places it in the registrar's trust store, they will have assurance that any given verification decision is correct for some legitimate node in the network. This is still a significantly stronger guarantee than "the verifier decision is correct for some node with a genuine TPM" (as that could include nodes belonging to an attacker).

But for assurance that an attacker hasn't swapped the measurements from two different nodes, it would be best to create a list of EKs/EKcerts, associated with their expected node identifier, and check this via the webhook mechanism.

This subtlety is not clear in the quoted passage, so I'll fix that.

@stefanberger

I think it would help since they should be dealt with individually.

Sure thing, I'll work on that.

IMO there's currently nothing better out there than the certificates of the EKs along with the manufacturers' CAs to prove that you're using a TPM. While TPM 1.2 may have had the certs missing, they are now widely available on TPM 2. So if we cover >95% of TPMs (possibly more like 99%) with the current EKCerts method then adding support for IAK & IDevID just adds a 'different' method to it that seems to open up more trouble with certificate chains than if we didn't support it. Otherwise what are the shortcomings of proving usage of a TPM other than with EKCert? In which scenarios did we encounter issues where the EKcert was not good?

I agree that relying on the TPM manufacturer certificate together with the EKcert is sufficient when you want assurance as to the authenticity and accuracy of measurements, i.e., "have these measurements been produced by trusted firmware/kernel-space code and are they free of tampering?"

However, TPM manufacturer certificates and EKcerts alone tell you next to nothing about the identity of the node from which a set of measurements originate (as the EKcert does not contain any useful identifying information unlike, e.g., a TLS certificate which contains the hostname of the website).

The EKcert verified against the manufacturer CA gives you assurance that you're dealing with a TPM and not some other type of device. Makecredential + activatecredential gives you assurance that the AIK being used is from a particular TPM.

Keylime lets a user specify a verification policy for a given node, based on that node's identifier (UUID, hostname, etc.). In the default, out-of-the-box configuration (in which EKcerts are compared against a list of TPM manufacturer certs only), the verifier can be sure that it is receiving accurate measurements, but it has no idea whether those measurements are actually coming from the node with that UUID, hostname, or other identifier. So, a consumer of the information reported by the verifier knows the verification decision is "correct" (the policy has been applied to some measurements) but has no assurance that the decision is correct for the node with that identifier.

Because of this, ideally, the user should bind the EK to the node identifier via some out-of-band process (e.g., by querying their IaaS provider's API). And of course, trusting individual EKcerts does not actually achieve that on its own.

So, if the user connects to the node over a secure channel, extracts the EKcert, and places it in the registrar's trust store, they will have assurance that any given verification decision is correct for some legitimate node in the network. This is still a significantly stronger guarantee than "the verifier decision is correct for some node with a genuine TPM" (as that could include nodes belonging to an attacker).

But for assurance that an attacker hasn't swapped the measurements from two different nodes, it would be best to create a list of EKs/EKcerts, associated with their expected node identifier, and check this via the webhook mechanism.

I agree that for this we have been relying on trusted measurements to ensure that neither the kernel nor the agent performs some sort of forwarding/relay with another colluding node that actually holds the measurement list, has the TPM to which the EK belongs, and produces the quotes.

However, we still need to assure that the device we're getting the quotes from is a TPM, so this requirement doesn't go away. Now on top of this we need trusted infrastructure tying a host's identity (agent UUID and/or networking credential like client cert) to the EK/EKCert. What I don't see is that IDevID/IAK make this easier, especially since they are not commonly available while EKs and EKcerts are.

This subtlety is not clear in the quoted passage, so I'll fix that.

@stringlytyped

The EKcert verified against the manufacturer CA gives you assurance that you're dealing with a TPM and not some other type of device.

Sure, I agree. That's why you can trust the accuracy of the measurements. Because you trust that the TPM is implemented according to the spec.

I agree that for this we have been relying on trusted measurements to ensure that neither the kernel nor the agent performs some sort of forwarding/relay with another colluding node that actually holds the measurement list

Certainly, depending on how you have configured your verification policies, it may be possible to catch attacks of the sort I describe. But caveats apply. Preventing these attacks at the protocol level is much more reliable.

However, we still need to assure that the device we're getting the quotes from is a TPM, so this requirement doesn't go away.

Definitely not. But, if you whitelist individual EKs, that has the same effect as whitelisting all EKs from that particular manufacturer's TPMs. You are just applying your trust more narrowly.

Now on top of this we need trusted infrastructure tying a host's identity (agent UUID and/or networking credential like client cert) to the EK/EKCert.

True, but this infrastructure can be fairly simple if the user chooses (a CSV file with UUIDs and EKs in it, for example). And it can itself be verified by Keylime.

And nothing precludes the user from relying solely on TPM manufacturer certs, if they really want (although, again, I would discourage that). If the TPM manufacturer certs are present in the registrar's trust store and the agents are configured to use the EK hash as the identifier, there is no additional infrastructure needed. And, from the user's perspective, there is no difference in behaviour: one random identifier (the randomly-generated UUID) has just been swapped for another or, at least, one that appears random (the hash of the EK).

What I don't see is that IDevID/IAK make this easier [...]

DevIDs/IAKs/LAKs are tied to the node's identity, by definition. The Subject field uniquely identifies the device (usually by serial number but really it can be any useful user-facing identifier) and the CA that issues it is required to ensure that's the case. So, you get that binding (and thus proper entity authentication) for free out of the box.

In other words, if you have an IDevID and IAK cert, you can place your device manufacturer's CA cert in the registrar's trust store, like you can put your TPM manufacturer's CA cert in the trust store. The former gives you all the benefits of the latter + cryptographic binding to a logical identifier. And this is without any additional infrastructure (other than Keylime itself).

[...] especially since they are not commonly available while EKs and EKcerts are.

You are right that IDevIDs/IAKs are not as widely available as EKcerts. But there are indications they are gaining traction:

  • NIST considers device identity to be a core component of integrity verification and gives HPE-issued IDevIDs as an example of how to meet that requirement [1].
  • The IoT Security Foundation, an IoT industry group, also requires binding of a hardware root of trust to a logical identifier [2].
  • NIST is putting work into increasing IDevID adoption in the IoT space [3] and Microsoft, Infineon and others have already started using it for this purpose [4].
  • Dell [5] and Huawei [6] have started issuing IDevIDs for certain uses.

Regardless, DevIDs are not intended to supersede EKs in Keylime. And by considering the possibility of other root identities now, at the protocol level, we open the door to support other future standards for device identity, TEE-related credentials, etc.


[1] NIST SP-1800-34: Validating the Integrity of Computing Devices, December 2022 https://csrc.nist.gov/pubs/sp/1800/34/final
[2] IoT Security Assurance Framework, November 2021 https://iotsecurityfoundation.org/wp-content/uploads/2021/11/IoTSF-IoT-Security-Assurance-Framework-Release-3.0-Nov-2021-1.pdf
[3] Trustworthy Networks of Things project, NIST, April 2022 https://www.nist.gov/programs-projects/trustworthy-networks-things
[4] Securing the IoT begins with Zero-Touch Provisioning at Scale whitepaper, Microsoft, April 2021 https://azure.microsoft.com/mediahandler/files/resourcefiles/secure-iot-begins-with-zero-touch-provisioning-at-scale/INF1037_ART%20Secure%20IoT%20Whitepaper.pdf
[5] Port-based Network Access Control (IEEE 802.1x), iDRAC 9 User Guide, Dell https://www.dell.com/support/manuals/en-uk/idrac9-lifecycle-controller-v6.x-series/idrac9_6.xx_ug/port-based-network-access-control-ieee-8021x?guid=guid-eedbd0a3-b63e-4a2c-a63f-e573e6c6904d&lang=en-us&lwp=rt
[6] SZTP Configuration, NetEngine Configuration Guide, Huawei, November 2022 https://support.huawei.com/enterprise/en/doc/EDOC1100279002/1ab3a718/sztp-configuration

@stefanberger

However, we still need to assure that the device we're getting the quotes from is a TPM, so this requirement doesn't go away.

Definitely not. But, if you whitelist individual EKs, that has the same effect as whitelisting all EKs from that particular manufacturer's TPMs. You are just applying your trust more narrowly.

Now on top of this we need trusted infrastructure tying a host's identity (agent UUID and/or networking credential like client cert) to the EK/EKCert.

True, but this infrastructure can be fairly simple if the user chooses (a CSV file with UUIDs and EKs in it, for example). And it can itself be verified by Keylime.

I would say client side certs with hostname/IP (and possibly UUIDs?) in them and EKs that each tie the two together via this CSV file that the IT department issues. This way the verifier knows that the client it talks to and the client IP address (resolved via DNS) is colocated with a particular TPM/EKcert. In this case the IT department can issue CSV files with any AIK and hostname/IP (+UUID) combination and we could regard it as fact.

And nothing precludes the user from relying solely on TPM manufacturer certs, if they really want (although, again, I would discourage that). If the TPM manufacturer certs are present in the registrar's trust store and the agents are configured to use the EK hash as the identifier, there is no additional infrastructure needed. And, from the user's perspective, there is no difference in behaviour: one random identifier (the randomly-generated UUID) has just been swapped for another or, at least, one that appears random (the hash of the EK).

What I don't see is that IDevID/IAK make this easier [...]

DevIDs/IAKs/LAKs are tied to the node's identity, by definition. The Subject field uniquely identifies the device (usually by serial number but really it can be any useful user-facing identifier) and the CA that issues it is required to ensure that's the case. So, you get that binding (and thus proper entity authentication) for free out of the box.

It seems to be the SubjectAltName per table 22 in this https://trustedcomputinggroup.org/wp-content/uploads/TCG_IWG_DevID_v1r2_02dec2020.pdf that identifies a device as TPM 2 and we have to check for this along with CA verification.

Per table 21 the Subject field 'SHOULD include Serial Number attribute' -- SHOULD.

If we go back to the example of a collusion between two devices where A relays quote requests to B, how do I get the binding in this case without binding the TPM to A and knowing that I am talking to A so that I am sure that TPM on A is where the quotes are from? I guess the client TLS cert must contain the TPM serial number as well, but then availability of the serial number is only a 'SHOULD' for the TPM cert.

If it was a MUST then I would say that the IT dept. CSV file wasn't necessary, but it's only a 'SHOULD'.

In other words, if you have an IDevID and IAK cert, you can place your device manufacturer's CA cert in the registrar's trust store, like you can put your TPM manufacturer's CA cert in the trust store. The former gives you all the benefits of the latter + cryptographic binding to a logical identifier. And this is without any additional infrastructure (other than Keylime itself).

Where else does the logical identifier appear again? Would it be in the client TLS cert to tie network endpoint and TPM together?

[...] especially since they are not commonly available while EKs and EKcerts are.

You are right that IDevIDs/IAKs are not as widely available as EKcerts. But there are indications they are gaining traction:

  • NIST considers device identity to be a core component of integrity verification and gives HPE-issued IDevIDs as an example of how to meet that requirement [1].
  • The IoT Security Foundation, an IoT industry group, also requires binding of a hardware root of trust to a logical identifier [2].
  • NIST is putting work into increasing IDevID adoption in the IoT space [3] and Microsoft, Infineon and others have already started using it for this purpose [4].
  • Dell [5] and Huawei [6] have started issuing IDevIDs for certain uses.

Regardless, DevIDs are not intended to supersede EKs in Keylime. And by considering the possibility of other root identities now, at the protocol level, we open the door to support other future standards for device identity, TEE-related credentials, etc.

Fine. Though we need to clearly state what the advantages are of using the other identifiers that cannot be achieved with EK/EKcerts alone.

Thanks for the links. I may have a look at some of them.

@stringlytyped

I would say client side certs with hostname/IP (and possibly UUIDs?) in them and EKs that each tie the two together via this CSV file that the IT department issues. This way the verifier knows that the client it talks to and the client IP address (resolved via DNS) is colocated with a particular TPM/EKcert. In this case the IT department can issue CSV files with any AIK and hostname/IP (+UUID) combination and we could regard it as fact.

If this is a desirable deployment model, it is easy to tweak the proposal to include that. Simply, if the registrar receives an EKcert from an agent, the EKcert is signed by a CA certificate in the trust store (in this case, the CA will be an internal IT department CA), and the EKcert's Subject field contains reference to the node identifier, then the registrar will both trust the EK and consider it bound to the identifier.

Still, at the end of the day, the binding between EK and node identifier must be established by additional infrastructure outside Keylime.

It seems to be the SubjectAltName per table 22 in this https://trustedcomputinggroup.org/wp-content/uploads/TCG_IWG_DevID_v1r2_02dec2020.pdf that identifies a device as TPM 2 and we have to check for this along with CA verification.

Sure, we can do that.

Per table 21 the Subject field 'SHOULD include Serial Number attribute' -- SHOULD.

You are right, the serial number is not strictly required (which is why I said the subject field usually contains the serial number). However, the whole point of DevIDs is to identify devices: so, in practice, the subject should always contain some user-facing identifier unique to the device, whether that's a serial number, a "service tag", or similar.

From page 55: "In compliance with IEEE 802.1AR [1], Section 8.6, OEMs creating DevIDs MUST uniquely identify the device within the issuer’s domain of significance. This field MUST contain a unique X.500 Distinguished Name (DN). The subject field’s DN encoding SHOULD include the “serialNumber” attribute with the device’s unique serial number."

But yes, if the serial number is not present, then the registrar would not be able to bind the DevID to the node identifier automatically and you would need the webhook mechanism to check the binding.
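
As an illustration, pulling that attribute out with the Python `cryptography` package is a one-liner, and the registrar's fallback logic could look roughly like this (a sketch; `bind_or_defer` and the webhook hand-off are illustrative, not Keylime's actual code):

```python
from typing import Optional

from cryptography import x509
from cryptography.x509.oid import NameOID

def devid_serial_number(devid_cert: x509.Certificate) -> Optional[str]:
    """Return the serialNumber attribute of the Subject DN, or None if absent."""
    attrs = devid_cert.subject.get_attributes_for_oid(NameOID.SERIAL_NUMBER)
    return str(attrs[0].value) if attrs else None

def bind_or_defer(devid_cert: x509.Certificate, node_id: str) -> str:
    """Bind automatically when the serial number matches the node identifier;
    otherwise defer to the webhook mechanism to check the binding out of band."""
    serial = devid_serial_number(devid_cert)
    if serial is not None and serial == node_id:
        return "bound"
    return "defer-to-webhook"
```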

If we go back to the example of a collusion between two devices where A relays quote requests to B, how do I get the binding in this case without binding the TPM to A and knowing that I am talking to A, so that I am sure the TPM on A is where the quotes are from?

When an IDevID/IAK is used for a device, the IAK becomes the root identity of that device in place of the EK. The way that trust trickles down is therefore different: you trust the device manufacturer to have generated the IAK using the TPM of the device described by the cert's Subject field.

You are not supposed to try and link the IDevID/IAK to an EK via the subjectAltName. The binding between the IDevID/IAK and the TPM can be assumed implicitly by trusting the issuer.

Where else does the logical identifier appear again? Would it be in the client TLS cert to tie network endpoint and TPM together?

If client TLS certs are used, then yes, the node identifier should be present in the certificates. Node identifiers are also used by the verifier to associate policies with nodes, and in the protocols to allow nodes to identify themselves to other components in the system.

@stefanberger

I would say client-side certs with hostname/IP (and possibly UUIDs?) in them, and EKs, with the two tied together via a CSV file that the IT department issues. This way the verifier knows that the client it talks to, at the client IP address (resolved via DNS), is colocated with a particular TPM/EKcert. In this case the IT department can issue CSV files with any AIK and hostname/IP (+UUID) combination and we could regard it as fact.

If this is a desirable deployment model, it is easy to tweak the proposal to include it. Simply put: if the registrar receives an EKcert from an agent, the EKcert is signed by a CA certificate in the trust store (in this case, an internal IT department CA), and the EKcert's Subject field contains a reference to the node identifier, then the registrar will both trust the EK and consider it bound to the identifier.

Still, at the end of the day, the binding between EK and node identifier must be established by additional infrastructure outside Keylime.

Correct. IMO we would have to issue a TLS certificate that carries a TPM identifier in some way, like hash(EKpub), that then ties the network endpoint to the TPM on that endpoint, ideally using the private key in the TPM for the TLS connection.
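
A sketch of what that verifier-side check could look like (Python with the `cryptography` package; the `urn:ekpubhash:` URI convention is made up purely for illustration, as is the choice of SHA-256):

```python
import hashlib

from cryptography import x509
from cryptography.hazmat.primitives import serialization

def tls_cert_tied_to_tpm(tls_cert: x509.Certificate, ek_pub) -> bool:
    """Check that hash(EKpub) appears in the TLS certificate's SAN.

    Assumes the issuing CA embedded the hash as a URI SAN entry of the
    (illustrative) form urn:ekpubhash:<hex>; any stable encoding would do.
    """
    ek_der = ek_pub.public_bytes(
        serialization.Encoding.DER,
        serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    expected = "urn:ekpubhash:" + hashlib.sha256(ek_der).hexdigest()
    san = tls_cert.extensions.get_extension_for_class(
        x509.SubjectAlternativeName
    ).value
    return expected in san.get_values_for_type(x509.UniformResourceIdentifier)
```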

It seems to be the SubjectAltName, per table 22 in https://trustedcomputinggroup.org/wp-content/uploads/TCG_IWG_DevID_v1r2_02dec2020.pdf, that identifies a device as a TPM 2.0 device, and we have to check for this along with CA verification.

Sure, we can do that.

Per table 21 the Subject field 'SHOULD include Serial Number attribute' -- SHOULD.

You are right, the serial number is not strictly required (which is why I said the subject field usually contains the serial number). However, the whole point of DevIDs is to identify devices: so, in practice, the subject should always contain some user-facing identifier unique to the device, whether that's a serial number, a "service tag", or similar.

From page 55: "In compliance with IEEE 802.1AR [1], Section 8.6, OEMs creating DevIDs MUST uniquely identify the device within the issuer’s domain of significance. This field MUST contain a unique X.500 Distinguished Name (DN). The subject field’s DN encoding SHOULD include the “serialNumber” attribute with the device’s unique serial number."

But yes, if the serial number is not present, then the registrar would not be able to bind the DevID to the node identifier automatically and you would need the webhook mechanism to check the binding.

If we go back to the example of a collusion between two devices where A relays quote requests to B, how do I get the binding in this case without binding the TPM to A and knowing that I am talking to A, so that I am sure the TPM on A is where the quotes are from?

When an IDevID/IAK is used for a device, the IAK becomes the root identity of that device in place of the EK. The way that trust trickles down is therefore different: you trust the device manufacturer to have generated the IAK using the TPM of the device described by the cert's Subject field.

You are not supposed to try and link the IDevID/IAK to an EK via the subjectAltName. The binding between the IDevID/IAK and the TPM can be assumed implicitly by trusting the issuer.

Understood.

The current public TPM IAK/IDevID specs don't show an NVRAM index for the IAK and IDevID certs. Where are people getting these certs from, do you know? It seems critical at this point that the two certs and the CA cert are available.

Where else does the logical identifier appear again? Would it be in the client TLS cert to tie network endpoint and TPM together?

If client TLS certs are used, then yes, the node identifier should be present in the certificates. Node identifiers are also used by the verifier to associate policies with nodes, and in the protocols to allow nodes to identify themselves to other components in the system.

In my understanding, we could rely on the IAK + IAK cert, but we need to check its cert chain and check that its SAN has hwType '2.23.133.1.2' so we don't end up accepting a cert that is not from a TPM device. If we now wanted to move the security needle a bit, then we would have to use the IDevID for TLS, and the IDevID would have to have matching fields (hwSerialNum) with the IAK so we know it's from the same device. Therefore, the TPM has to be involved in the TLS connection, for which the TLS stack now has to be able to use the IDevID key. If we use any other TLS cert, then we haven't gained anything. Correct?

@stringlytyped
Author

stringlytyped commented Sep 5, 2023

@stefanberger Sorry for the radio silence; I was away from work.

Correct. IMO we would have to issue a TLS certificate that carries a TPM identifier in some way, like hash(EKpub), that then ties the network endpoint to the TPM on that endpoint, ideally using the private key in the TPM for the TLS connection.

In the latest revision of the document (above), I've generalised this and included it in the section on mTLS-based authentication: Additionally, all endpoints which accept authentication via mTLS should check that the expected node identifier is contained within either the Subject or Subject Alternative Name fields of the presented certificate, to ensure that there is a binding to the identifier.
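
A minimal sketch of that endpoint-side check (Python with the `cryptography` package; matching against the Common Name and DNS SAN entries is one reasonable convention, not necessarily the one the proposal settles on):

```python
from cryptography import x509
from cryptography.x509.oid import NameOID

def cert_names_node(cert: x509.Certificate, expected_id: str) -> bool:
    """Check that the expected node identifier appears in the certificate's
    Subject or Subject Alternative Name."""
    # Subject: compare against the Common Name attribute, if present.
    cns = cert.subject.get_attributes_for_oid(NameOID.COMMON_NAME)
    if any(cn.value == expected_id for cn in cns):
        return True
    # SAN: compare against the DNS name entries.
    try:
        san = cert.extensions.get_extension_for_class(
            x509.SubjectAlternativeName
        ).value
    except x509.ExtensionNotFound:
        return False
    return expected_id in san.get_values_for_type(x509.DNSName)
```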

I've also clarified the limitations of relying solely on TPM2_Certify to bind the mTLS certificate to the TPM at registration time in footnote [2].

The current public TPM IAK/IDevID specs don't show an NVRAM index for the IAK and IDevID certs. Where are people getting these certs from, do you know? It seems critical at this point that the two certs and the CA cert are available.

We discussed this in the last Keylime community meeting, so I won't rehash it here. But hopefully the situation changes in the near future such that there is a standardised way of obtaining IDevID/IAK certs.

I have extracted most of the DevID-related content from the push proposal above and put it in a separate document here: 802.1AR Secure Device Identity and the Push Model. I've included discussion about how IDevID/IAK certs are obtained and can be made available to Keylime in the Obtaining IDevID/IAK Certificates section.

In my understanding, we could rely on the IAK + IAK cert, but we need to check its cert chain and check that its SAN has hwType '2.23.133.1.2' so we don't end up accepting a cert that is not from a TPM device.

Sounds reasonable; we can certainly do that.
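
For the record, a sketch of that SAN check (Python with the `cryptography` package; the OIDs are id-on-hardwareModuleName from RFC 4108 and the TPM 2.0 hwType you quoted, and the DER parsing is deliberately minimal, assuming short-form lengths):

```python
from cryptography import x509

# id-on-hardwareModuleName (RFC 4108) and the TCG hwType value for a TPM 2.0 device
HARDWARE_MODULE_NAME = x509.ObjectIdentifier("1.3.6.1.5.5.7.8.4")
TPM2_HW_TYPE = "2.23.133.1.2"

def _decode_oid(content: bytes) -> str:
    """Decode the content octets of a DER OBJECT IDENTIFIER into dotted form."""
    subids, value = [], 0
    for byte in content:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            subids.append(value)
            value = 0
    first = subids[0]
    arcs = [first // 40, first % 40] if first < 80 else [2, first - 80]
    return ".".join(str(a) for a in arcs + subids[1:])

def san_identifies_tpm2(cert: x509.Certificate) -> bool:
    """Check the SAN for a HardwareModuleName whose hwType marks a TPM 2.0."""
    san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName).value
    for other_name in san.get_values_for_type(x509.OtherName):
        if other_name.type_id != HARDWARE_MODULE_NAME:
            continue
        # other_name.value is the DER encoding of:
        #   HardwareModuleName ::= SEQUENCE { hwType OBJECT IDENTIFIER,
        #                                     hwSerialNum OCTET STRING }
        # Minimal parse (short-form lengths only): SEQUENCE tag, length,
        # then the hwType OID's tag, length and content octets.
        der = other_name.value
        if len(der) >= 4 and der[0] == 0x30 and der[2] == 0x06:
            oid_len = der[3]
            if _decode_oid(der[4:4 + oid_len]) == TPM2_HW_TYPE:
                return True
    return False
```

Cert-chain verification against the CA in the trust store would of course still happen separately, as you say.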

If we now wanted to move the security needle a bit, then we would have to use the IDevID for TLS, and the IDevID would have to have matching fields (hwSerialNum) with the IAK so we know it's from the same device.

If you wanted to establish a binding between the IDevID and IAK (which you would need if you are using the IDevID for some signing task, like securing a TLS connection, as you point out), checking that the Subjects of both certificates match would be necessary, I agree. I think it would also be good to check that both certificates are signed by the same CA.
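
As a sketch, that pairing check could be as simple as (Python with the `cryptography` package; chain validation against the trust store is assumed to happen separately):

```python
from cryptography import x509

def idevid_iak_same_device(idevid: x509.Certificate, iak: x509.Certificate) -> bool:
    """Heuristic pairing check per the discussion above: same Subject DN
    and the same issuing CA for both certificates."""
    return idevid.subject == iak.subject and idevid.issuer == iak.issuer
```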

Therefore, the TPM has to be involved in the TLS connection, for which the TLS stack now has to be able to use the IDevID key.

There is actually an extension to OpenSSL which I believe enables the use of TPM-stored keys to secure TLS connections. But getting the web stack used by Keylime to use OpenSSL in that way might be hard (though I don't really know; I've not given it much thought). Also, I'm not sure whether the performance limitations of TPMs would make this impractical.

If we use any other TLS cert, then we haven't gained anything.

If the agent authenticates to the server using mTLS, it is best if whatever private key it uses to secure the connection is resident in the TPM. But it is not the end of the world either if it is not. This is because what an agent is authorised to do is limited:

  • It can register itself or update its existing registration at the registrar. This is not an issue because whatever information is provided to the registrar is verified before it is trusted.
  • It can retrieve the information it needs from the verifier to prepare an attestation, like the PCR mask and IMA offset. This may not be something you would want the whole world to see, but it is also not terribly sensitive information: an attacker would not be able to achieve anything with it.
  • It can send an attestation to the verifier. This is authenticated against the AK which, of course, is resident in the TPM.

That said, it would be cool if Keylime could use a private key stored in the TPM to secure mTLS connections (whether an IDevID is used as the certificate or the user provides their own cert, obtained by generating a CSR using the OpenSSL TPM extension). But that's beyond the scope of the push proposal.
