@THS-on
Last active March 4, 2022 17:37
Keylime Push Model

Push Model for Keylime

Issue

Keylime currently operates on a pull basis: the tenant or the verifier connects to the agent to collect attestation data. They therefore need to know the agent's IP address and port beforehand, and these currently cannot change during attestation. This works fine in most virtualized environments where all devices are on the same network, but not for edge devices or in BYOD contexts. There are workarounds based on VPNs or overlay networks (OpenVPN, ZeroTier, Nebula, etc.), but none of them is an ideal solution.

Actions that require connections to the agent

  • Identity quote: The purpose of the identity quote is to prove to the tenant that the NK (also called the transport key) belongs to the same TPM as the agent. The NK is used for encrypting the U and V keys during transport and is also the key of the agent's mTLS certificate. The tenant uses this feature. It is also used to ensure that the agent behind that IP is still the same one that registered, by validating the quote against the registered AK.
  • Integrity quote: The purpose of the integrity quote is to get a TPM quote with all the necessary data for attestation (PCR values, UEFI log, IMA log).
  • Sending the payload and U key: After the tenant has validated the identity quote, it can send a payload (like a small cloud-init) to the agent. The payload is encrypted with a key that is split into a U part and a V part. The U part is sent by the tenant to say "I want to bootstrap this agent", and the V part is sent by the verifier if the initial attestation was successful.
  • Sending the V key: The V key is sent by the verifier if the initial attestation was successful.
  • (Checking if UV decryption was successful)
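
The U/V split described above can be pictured as a two-share secret: neither share alone reveals the payload key. The following is a minimal sketch, assuming the key is split by XOR with a random V share. The function names `split_key` and `combine_key` and the key length are illustrative assumptions, not Keylime API.

```python
import secrets

KEY_LEN = 32  # assumed key length in bytes, chosen for illustration


def split_key(k: bytes) -> tuple[bytes, bytes]:
    """Split k into two shares (u, v); neither share alone reveals k."""
    v = secrets.token_bytes(len(k))          # random share, held by the verifier
    u = bytes(a ^ b for a, b in zip(k, v))   # complementary share, sent by the tenant
    return u, v


def combine_key(u: bytes, v: bytes) -> bytes:
    """Recombine the shares; the agent can do this only once it holds both."""
    if len(u) != len(v):
        raise ValueError("shares must have equal length")
    return bytes(a ^ b for a, b in zip(u, v))


k = secrets.token_bytes(KEY_LEN)
u, v = split_key(k)
assert combine_key(u, v) == k
```

With this kind of split, the tenant can hand out U right away, while the agent still cannot decrypt anything until the verifier releases V after a successful attestation.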

Design

First we remove all unnecessary interactions with the agent.

Removing the identity quote

Motivation: This removes one interaction with the agent, and the registrar is already trusted for the make/activate credential process for the AK.

We want to replace the two functions of the identity quote with a new mechanism that does not require a separate connection to the agent.

1. Proving that the NK belongs to the same EK/AK

Instead of using a resettable PCR like PCR 16 and generating a quote, we can temporarily load the NK into the TPM and use TPM2_Certify to generate a signature with the AK, proving that the NK belongs to the same TPM.

2. Verifying that this is still the same agent

This is mostly relevant for the payload mechanism: we do not want to send a payload to the wrong agent. In the current model the agent cannot decrypt the payload if the AK changes, because it will never get the V key from the verifier. This still holds after eliminating the identity quote. The identity confirmation of the NK moves from the tenant to the registrar. This does not change the trust model, because we already trust the registrar to confirm that the AK belongs to the EK.

In the push model the registrar becomes the main contact point with the server components.

New Registration Protocol

The registration is a three-way protocol.

First the agent sends the following information (new) over HTTPS using the mTLS certificate:

  1. Agent UUID
  2. EK certificate (normally provided by the manufacturer)
  3. AK: Used for signing the quotes
  4. (new) Public portion of the NK loaded on the TPM (pubkey, attributes, name, etc.) and the TPM2_Certify signature for the NK.
  5. mTLS certificate (contains the public portion of the NK)
  6. Contact IP/port (only relevant for the pull model)

The registrar then does the following:

  1. (new) Check that the mTLS certificate and the one used for authentication match
  2. (new, optional) Run user-provided checks on the UUID, EK, and mTLS certificate (for example, only allow agents to register whose mTLS certificate is signed by a specific CA)
  3. Verify that the NK signature matches the public portion of the NK and the AK
  4. Generate a make credential challenge for the AK
  5. (new) Save the registration data in the DB with the challenge values as the primary key
    1. This allows multiple agents to start the registration process for the same UUID, but only the one that completes the challenge gets it.
  6. Return the challenge in the response to the agent's initial request.

Next, the agent does the following:

  1. Run activate credential for the AK with the challenge provided by the registrar
  2. Send the challenge values to the registrar (new) over HTTPS with the mTLS certificate

To complete the registration process, the registrar does the following:

  1. Check if the sent challenge values match any open registration
  2. Check if the mTLS certificate matches the agent that started the registration with the provided challenge values
  3. Mark the agent as registered and allow the agent to use the registered UUID

Note that all state is stored in a DB, so that the registrar can be scaled easily.
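
The bookkeeping behind this protocol can be sketched as follows. This is only an illustration under stated assumptions: an in-memory dictionary stands in for the DB, a random token stands in for the make/activate credential challenge (the TPM part is not modeled), and `RegistrarStore`, `PendingRegistration`, `start`, and `complete` are hypothetical names. Dropping the other open registrations for a UUID once one of them completes is also an assumption about how "only the one that completes the challenge gets it" could be enforced.

```python
import hashlib
import secrets
from dataclasses import dataclass


@dataclass
class PendingRegistration:
    uuid: str
    cert_fingerprint: str  # SHA-256 of the mTLS certificate used for the first request


class RegistrarStore:
    """In-memory stand-in for the registrar DB (hypothetical)."""

    def __init__(self) -> None:
        self.pending: dict[str, PendingRegistration] = {}  # keyed by challenge value
        self.registered: dict[str, str] = {}               # uuid -> cert fingerprint

    def start(self, uuid: str, cert_der: bytes) -> str:
        # A real registrar would derive this from make credential for the AK.
        challenge = secrets.token_hex(16)
        fingerprint = hashlib.sha256(cert_der).hexdigest()
        self.pending[challenge] = PendingRegistration(uuid, fingerprint)
        return challenge

    def complete(self, challenge: str, cert_der: bytes) -> bool:
        entry = self.pending.get(challenge)
        if entry is None:
            return False  # no open registration for these challenge values
        presented = hashlib.sha256(cert_der).hexdigest()
        if not secrets.compare_digest(entry.cert_fingerprint, presented):
            return False  # a different certificate than the one that started it
        # Assumption: the first agent to complete wins the UUID, so all other
        # open registrations for the same UUID are discarded.
        self.pending = {c: p for c, p in self.pending.items() if p.uuid != entry.uuid}
        self.registered[entry.uuid] = entry.cert_fingerprint
        return True
```

Because the open registrations are keyed by the challenge values rather than by UUID, two agents can race for the same UUID without overwriting each other's entries.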

Agent polling the Registrar

Once the agent is registered it polls the registrar for the following information:

  • Is there an active verifier to which attestation data should be sent?
  • Is attestation stopped for a specific verifier?
  • Is there a payload to download, and where can it be found?
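
The three questions above could map to a poll response like the one in this sketch. The data layout, field names, and the helper `plan_actions` are all assumptions made for illustration; the proposal does not define a wire format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VerifierRequest:
    verifier_url: str
    push_interval_s: float
    active: bool = True  # False means attestation was stopped for this verifier


@dataclass
class PollResponse:
    verifiers: list[VerifierRequest] = field(default_factory=list)
    payload_url: Optional[str] = None  # where to download the payload, if any


def plan_actions(resp: PollResponse, currently_pushing: set[str]) -> dict[str, list[str]]:
    """Turn one poll response into the actions the agent should take next."""
    start = [v.verifier_url for v in resp.verifiers
             if v.active and v.verifier_url not in currently_pushing]
    stop = [v.verifier_url for v in resp.verifiers
            if not v.active and v.verifier_url in currently_pushing]
    download = [resp.payload_url] if resp.payload_url else []
    return {"start": start, "stop": stop, "download": download}
```

Keeping the decision logic in a pure function like this makes the agent's reaction to a poll easy to test without any network access.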

Agent pushing Attestation Data to a Verifier

Once the agent learns that a verifier wants attestation data, it starts pushing to the verifier.

This is done in three steps:

  1. Agent connects to the verifier to learn what information should be sent
  2. Verifier responds with the PCR selection, nonce, and starting points for incremental attestation. It may also include the V key if the first attestation was successful.
  3. Agent pushes quote and required data to the verifier

Changes to the Verifier

Only the event loop and REST API interface require major changes to support the push model.

Event Loop

The push event loop is very simple: after a grace period, check whether the agent has pushed data, and from then on check that the agent's pushes match the push interval.
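
The check performed by the event loop can be sketched as a single function. The name `agent_is_late`, its parameters, and the `tolerance_s` slack for network jitter are assumptions for illustration and are not taken from the Keylime code base.

```python
from typing import Optional


def agent_is_late(now: float, added_at: float, last_push_at: Optional[float],
                  grace_period_s: float, push_interval_s: float,
                  tolerance_s: float = 5.0) -> bool:
    """Return True if the agent should be considered failed for not pushing."""
    if last_push_at is None:
        # No push seen yet: the agent only has to show up within the grace period.
        return now - added_at > grace_period_s
    # After the first push, every later push must arrive within the interval
    # (plus an assumed tolerance for jitter).
    return now - last_push_at > push_interval_s + tolerance_s
```

The verifier would call this periodically for every agent, using the timestamps persisted in the DB.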

REST API

The agent connects to an endpoint like /agent/{UUID}. This only works if the agent uses mTLS with the same certificate that it provided during registration. If authentication was successful, the verifier responds with:

  • Nonce
  • PCR selection
  • Next entry for IMA incremental attestation

Then the agent collects the necessary data and posts it to the same endpoint. The verifier needs to check that the time between handing out this data to the agent and receiving the attestation data is not too long.
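
One way to enforce that time limit is to remember when each nonce was issued and to accept a quote only while the nonce is still fresh. The sketch below assumes an in-memory table and a hypothetical `ChallengeTracker` class with a configurable `max_age_s`; a scalable verifier would keep this state in the DB instead.

```python
import secrets
import time
from typing import Callable


class ChallengeTracker:
    """Tracks the nonce handed to each agent and how long ago it was issued."""

    def __init__(self, max_age_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age_s = max_age_s
        self.clock = clock  # injectable so tests can control time
        self._issued: dict[str, tuple[str, float]] = {}  # uuid -> (nonce, issue time)

    def issue(self, uuid: str) -> str:
        nonce = secrets.token_hex(20)
        self._issued[uuid] = (nonce, self.clock())
        return nonce

    def accept(self, uuid: str, nonce: str) -> bool:
        # pop() makes every nonce single-use, whether or not the check passes.
        issued = self._issued.pop(uuid, None)
        if issued is None:
            return False
        expected, issued_at = issued
        fresh = self.clock() - issued_at <= self.max_age_s
        return fresh and secrets.compare_digest(expected, nonce)
```

In this sketch `issue` would back the GET on the endpoint and `accept` would be called when the agent posts its quote, before the quote itself is validated.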

Persisting Agent State

In the pull model, not all of the agent state is committed to the DB because there was no need to do so. To make the push model easier to scale, the entire agent state must be committed to the DB.

Managing Agents with the Tenant

The user interface for managing an agent changes only slightly, but the steps performed by the tenant change.

Adding an agent to a verifier with a payload

Input: agent UUID, unencrypted data for the payload, which verifier should be used, policies (IMA, measured boot, static PCRs)

Steps:

  1. Connect to the registrar and retrieve agent information
  2. Check if there were two registrations with the same UUID but different EKs (TODO: check if this is still necessary or if we fully move that feature into the registrar)
  3. (optional) Validate the EK against the cert store
  4. (new, optional) Validate registrar data using custom scripts (was only possible for the EK before)
  5. Generate the U and V keys for the payload and encrypt all of them with the NK
  6. Add the agent to the verifier with the following data: UUID, mTLS certificate, AK, V key, policies, push interval, grace period. The grace period gives the agent a chance to notice that the verifier wants attestation data, instead of being failed automatically.
  7. Notify the agent by adding the necessary data to the registrar:
    1. Add to the agent's entry that the verifier wants attestation data with the given push interval
    2. Upload the payload to the registrar for the agent to download

Note that all the steps for revocations are just part of the payload generation and therefore ignored in the above steps.

When the agent now finds this new information it does the following:

  1. Starts pushing attestation data to the verifier
  2. Downloads the payload
  3. Decrypts the payload once it also has the V key and marks decryption successful in the registrar

Remove agent from attestation

Input: agent UUID, verifier

Steps:

  1. Mark the verifier as no longer interested in attestation data at the registrar
  2. Remove the agent from the verifier (should not require changes to the current API)

Verifying that payload decryption was successful

This can be done with a lookup at the registrar.

Notes on Authentication

We already have a CA for Keylime that can be used by the agent to verify connections from and to the verifier/tenant/registrar. This can be reused for the agent to verify if it actually trusts those components.

On the server side, we want the agent to authenticate itself with the mTLS certificate provided during the registration process. In practice, we noticed that doing this with web server frameworks written in Python is not really a good idea. Instead, authentication and validation of the client certificate should be done by a reverse proxy like nginx, which passes the result on as an HTTP header. This also makes load balancing simpler and makes it easier to put the registrar and verifier on the Internet.
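
On the backend side, the header-based approach could look like the sketch below. It assumes the proxy forwards the URL-encoded PEM client certificate (nginx exposes this as `$ssl_client_escaped_cert`) in a header whose name, `X-SSL-Client-Cert`, is chosen here for illustration. The helper names are hypothetical, and the backend must only accept this header from the trusted proxy.

```python
import hashlib
import hmac
import ssl
from typing import Optional
from urllib.parse import quote, unquote

CERT_HEADER = "X-SSL-Client-Cert"  # assumed header name, set by the reverse proxy


def fingerprint_from_header(headers: dict[str, str]) -> Optional[str]:
    """Return the SHA-256 fingerprint of the forwarded certificate, if any."""
    value = headers.get(CERT_HEADER)
    if value is None:
        return None
    try:
        der = ssl.PEM_cert_to_DER_cert(unquote(value))
    except ValueError:
        return None  # header did not contain a well-formed PEM block
    return hashlib.sha256(der).hexdigest()


def is_registered_cert(headers: dict[str, str], registered_fingerprint: str) -> bool:
    """Compare the forwarded certificate with the one stored at registration."""
    presented = fingerprint_from_header(headers)
    if presented is None:
        return False
    return hmac.compare_digest(presented, registered_fingerprint)


# Demo with placeholder bytes instead of a real certificate.
demo_pem = ssl.DER_cert_to_PEM_cert(b"placeholder-der-bytes")
demo_headers = {CERT_HEADER: quote(demo_pem)}
demo_expected = hashlib.sha256(b"placeholder-der-bytes").hexdigest()
assert is_registered_cert(demo_headers, demo_expected)
```

Comparing fingerprints rather than full certificates keeps the stored registration data small; validating the certificate chain itself stays with the proxy.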

General Considerations

@DanielFroehlich

Looks very good to me, thanks for writing this down! Just one comment which is not totally clear to me:

In the agent-push model, the agent always has to initiate the mTLS connection to the server component. In the "new protocol section", it says e.g. "Send challenge to agent" - that is in response to a previous request on the existing mTLS connection, correct?

@THS-on
Author

THS-on commented Feb 11, 2022

In the agent-push model, the agent always has to initiate the mTLS connection to the server component. In the "new protocol section", it says e.g. "Send challenge to agent" - that is in response to a previous request on the existing mTLS connection, correct?

Yes, this is the response to the POST request that the agent initiated. (The current registration process already does this.)

I've updated the document to make that more clear.

@THS-on
Author

THS-on commented Feb 14, 2022

As discussed, using mTLS might be challenging in some environments. Instead, we should use a more generic authentication mechanism, such as API keys issued by the registrar. If we build the correct abstraction around that, an mTLS certificate can then be a special form of an API key.

Adding an abstraction over how agents authenticate against Keylime has the further benefit that other authentication mechanisms already in place for a device could be reused.

I still want to keep the NK in some form for encrypting the payloads, so that a (non-rogue) registrar and verifier cannot decrypt them.

@kgold2

kgold2 commented Feb 25, 2022

Issue: I would say that the bigger issue is security, not connection information. The attesting device does not want to run a web server, and it does not want to open a port through its firewall to the internet. A secondary issue is power consumption in a battery powered device. The attestor may be powered down at times.

I still wonder about the whole U,V design. While I saw it in the original MIT paper, it appears to have nothing to do with attestation and therefore with the core keylime application. Is it being removed (Design section)?

I understand that NK is being used for the TLS session. A simple way to prove that it comes from the same TPM is TPM2_Certify. It's easier than make/activate.

Consider that the registrar is very security sensitive, what does the statement "In the push model the registrar becomes the main contact point with the server components." mean?

@THS-on
Author

THS-on commented Feb 26, 2022

@kgold2 thank you for taking a closer look at this draft.

Issue: I would say that the bigger issue is security, not connection information. The attesting device does not want to run a web server, and it does not want to open a port through its firewall to the internet. A secondary issue is power consumption in a battery powered device. The attestor may be powered down at times.

I agree, you ideally do not want to run a web server on devices if it is not required. This proposal will fix that. In practice, we first ran into the problem that devices sit behind a NAT, before we considered the security impact of running a web server on the device.

I still wonder about the whole U,V design. While I saw it in the original MIT paper, it appears to have nothing to do with attestation and therefore with the core keylime application.

The idea behind the U,V design is to bootstrap an agent/device once it passes initial attestation. The U part of the key is sent with the payload to the agent by the tenant (or collected by the agent in the case of the push model) to say "I want to bootstrap this agent". The V part is sent by the verifier once the device passes initial attestation. The payload is currently used for distributing revocation actions, the CA for revocations, and custom files (in the case of the Lernstick we deploy a token that the device can use to authenticate against other services).
This could also be implemented by other bootstrapping/provisioning tools with API hooks added to Keylime.

Is it being removed (Design section)?

No, the payload mechanism will still be a part of Keylime, but it should be optional.

I understand that NK is being used for the TLS session. A simple way to prove that it comes from the same TPM is TPM2_Certify. It's easier than make/activate.

Yes, TPM2_Certify is a better way to do that. Thanks for the hint.

Consider that the registrar is very security sensitive, what does the statement "In the push model the registrar becomes the main contact point with the server components." mean?

We need some contact point for the agent on the server side for polling for new data. In the current Keylime model it makes sense to extend the registrar to take on that role. This can be split into two services: the registrar with the same function as before, and another service that handles the constant polling by the agents (and provides a way to download the payloads). The split is probably a good idea, to keep the registrar as simple as possible.
