

@armon
Last active December 26, 2015 13:29

Serf Security Model

Relevant branch: https://github.com/hashicorp/memberlist/compare/f-encrypt

The security model used by Serf is designed to provide confidentiality, integrity, and authentication. The threat model considered for the design is described below. The model is built around a symmetric key, or shared secret, system: all members of the Serf cluster must be provided the shared secret ahead of time. This places the burden of key distribution on the user.

To provide confidentiality, all messages are encrypted using AES-128. AES is a widely adopted and well-studied encryption standard. It is also fast, and modern CPUs provide hardware instructions that make encryption and decryption very lightweight. Because AES operates on 16-byte blocks, we use the PKCS7 padding algorithm.
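A minimal Go sketch of the PKCS7 padding step follows. The function names are hypothetical and this is not memberlist's code. PKCS7 always appends between 1 and 16 bytes, each holding the pad length, so a block-aligned input gains a full 16-byte block. This is what makes the padding unambiguously removable. Strictly speaking, GCM is counter-based and does not require padded input; the sketch simply follows the design as described.

```go
package main

import (
	"bytes"
	"errors"
)

// pkcs7Pad returns a copy of data padded to a multiple of blockSize.
// Between 1 and blockSize bytes are always added, and each padding byte
// holds the number of bytes added.
func pkcs7Pad(data []byte, blockSize int) []byte {
	n := blockSize - len(data)%blockSize // 1..blockSize, never 0
	return append(append([]byte{}, data...), bytes.Repeat([]byte{byte(n)}, n)...)
}

// pkcs7Unpad validates and strips PKCS7 padding.
func pkcs7Unpad(data []byte, blockSize int) ([]byte, error) {
	if len(data) == 0 || len(data)%blockSize != 0 {
		return nil, errors.New("pkcs7: invalid padded length")
	}
	n := int(data[len(data)-1])
	if n == 0 || n > blockSize {
		return nil, errors.New("pkcs7: invalid padding byte")
	}
	// Every padding byte must carry the same value n.
	for _, b := range data[len(data)-n:] {
		if int(b) != n {
			return nil, errors.New("pkcs7: inconsistent padding")
		}
	}
	return data[:len(data)-n], nil
}
```

For example, a 5-byte input is padded with eleven 0x0B bytes to reach 16 bytes.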

AES is used in Galois/Counter Mode (GCM) with a randomly generated nonce. GCM additionally provides message integrity: the ciphertext is suffixed with a 'tag' that is used to verify message integrity before decryption.

Message Format

The overview describes the cryptographic primitives that are used. This section covers how messages are framed on the wire and interpreted so that confidentiality, integrity and authentication are provided.

UDP Message Format

UDP messages do not require any additional framing, since UDP is packet oriented. This keeps the message format simpler and saves some space. The format is as follows:

-------------------------------------------------------------------
| Version (byte) | Nonce (12 bytes) | CipherText | Tag (16 bytes) |
-------------------------------------------------------------------

The UDP message thus has a fixed overhead of 29 bytes (1 version byte, 12 nonce bytes and 16 tag bytes). PKCS7 padding always adds between 1 and 16 more bytes, so the total overhead is at most 45 bytes. No length field is needed, since the UDP packet is already framed. Tampering with or corruption of any part of the packet will cause GCM tag verification to fail.

When a packet is received, we first verify the GCM tag and decrypt the payload only if verification succeeds. The version byte allows future versions to change the algorithm they use. It is currently always set to 0.
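The UDP layout can be sketched in Go with the standard crypto/aes and crypto/cipher packages. Go's GCM Seal appends the 16-byte tag to the ciphertext, which matches the CipherText | Tag order above. Open returns plaintext only if the tag verifies. This is a sketch under stated assumptions: a 16-byte key (AES-128), plaintext that is already PKCS7-padded, hypothetical helper names, and the version byte being passed to GCM as additional authenticated data so that the tag covers it.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

const (
	version   byte = 0  // currently always 0
	nonceSize      = 12 // GCM nonce length
	tagSize        = 16 // GCM tag length
)

// newGCM wraps a 16-byte key (AES-128) in GCM.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptPacket builds Version | Nonce | CipherText | Tag. The plaintext is
// assumed to be PKCS7-padded already; padding is omitted to keep this short.
func encryptPacket(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, nonceSize)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	pkt := append([]byte{version}, nonce...)
	// Seal appends CipherText||Tag to pkt. The version byte is also passed as
	// additional data (an assumption of this sketch) so the tag covers it.
	return gcm.Seal(pkt, nonce, plaintext, []byte{version}), nil
}

// decryptPacket rejects short packets and unknown versions, then lets GCM
// verify the tag. No length field is needed: the datagram delimits the message.
func decryptPacket(key, pkt []byte) ([]byte, error) {
	if len(pkt) < 1+nonceSize+tagSize {
		return nil, errors.New("packet shorter than 29-byte overhead")
	}
	if pkt[0] != version {
		return nil, errors.New("unsupported version")
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, pkt[1:1+nonceSize], pkt[1+nonceSize:], pkt[:1])
}
```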

TCP Message Format

TCP provides a stream abstraction, so we must provide our own framing. This is tricky, as framing is a potential attack vector: we cannot verify the tag until the entire message is received, yet the length must be sent in plaintext. Our current approach is to limit a framed message to a maximum of 10MB, so that a peer cannot cause a Denial of Service by sending an enormous amount of data. The wire format is as follows:

-------------------------------------------------------------------------------------------------------
| MsgType (byte) | Length (4 bytes) | Version (byte) | Nonce (12 bytes) | CipherText | Tag (16 bytes) |
-------------------------------------------------------------------------------------------------------

The TCP format is very similar to the UDP format, but it prepends a message type byte (similar to other Serf messages) and a 4-byte length field encoded in big-endian format. This raises the maximum overhead to 50 bytes (1 + 4 + 1 + 12 + 16, plus up to 16 bytes of PKCS7 padding).
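The 5-byte TCP header can be sketched with encoding/binary. The helper names are hypothetical, and 10MB is taken here as 10 * 1024 * 1024 bytes (the text does not say which definition is used). Checking the limit while decoding the header means an oversized frame is refused before any of its body is read.

```go
package main

import (
	"encoding/binary"
	"errors"
)

const maxFrameSize = 10 * 1024 * 1024 // 10MB, assumed to mean 10 * 2^20 bytes

// encodeHeader returns MsgType (1 byte) followed by Length (4 bytes, big-endian).
func encodeHeader(msgType byte, length uint32) []byte {
	hdr := make([]byte, 5)
	hdr[0] = msgType
	binary.BigEndian.PutUint32(hdr[1:], length)
	return hdr
}

// decodeHeader splits a 5-byte header and enforces the size limit.
func decodeHeader(hdr []byte) (msgType byte, length uint32, err error) {
	if len(hdr) != 5 {
		return 0, 0, errors.New("header must be 5 bytes")
	}
	length = binary.BigEndian.Uint32(hdr[1:])
	if length > maxFrameSize {
		return 0, 0, errors.New("frame exceeds 10MB limit")
	}
	return hdr[0], length, nil
}
```

For example, a length of 300 (0x012C) is written as the bytes 00 00 01 2C.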

When we first receive an encrypted TCP message, we check the message type. If either party has encryption enabled, the other party must as well. Otherwise we would be vulnerable to a downgrade attack, in which one side forces the other into a non-encrypted mode of operation.

Once this is verified, we read the message length and, if it does not exceed the 10MB limit, read in the rest of the message. The tag verifies the entire payload, including the message type and length, ensuring that nothing has been tampered with.

Threat Model

The following are the various aspects of our threat model:

  • Non-members getting access to events or membership information
  • Cluster state corruption due to malicious messages being processed
  • Fake event generation due to malicious messages
  • Tampering with messages causing state corruption
  • Denial of Service against a node

No security system is unbreakable. Our goal is not to protect top secret data, but to provide a "reasonable" level of security that would require an attacker to commit a considerable amount of resources to defeat.

It is worth mentioning that we are specifically not concerned with replay attacks, as the gossip protocol is designed to handle them by the nature of its broadcast mechanism.

Future Considerations

Some considerations for the future are:

  • Using the Version field to change the algorithm we use
  • Supporting different algorithms via configuration
  • Supporting key rotation
  • Cluster membership can be inferred by observing random probing

Appendix

  • Node communication is modeled after the SWIM protocol
    • Periodic health probing over UDP of random nodes
    • Frequent gossip over UDP to random nodes
    • Infrequent state push/pull over TCP to random nodes
@armon
Author

armon commented Oct 26, 2013

I see what you are saying. If we mandate that a key is provided instead of a password, I think it is reasonable to dump PBKDF2 and instead provide links to help the user. (Or better yet, just add serf key-gen, which would automatically generate a key using /dev/random or similar.)

Let's just dump it, mandate that a key is provided, and provide a CLI-level tool to generate keys.

@armon
Author

armon commented Oct 26, 2013

Updated the Gist to reflect the latest protocol decisions

@mitchellh

Agreed. Makes total sense.

@sit

sit commented Oct 26, 2013

The doc could use more description of the threat model and communication patterns (or links to where that is described). What sort of attacks and what strength of attacker should it defend against?

I would call out that authentication here is limited to the sense of "group membership". There is no sense of identity here and compromise of a single host will contaminate the entire cluster; there's no way to revoke membership or trace the provenance of messages.

A single secret is used for everything: integrity/confidentiality, and authentication. It may be worth separating this out so that they can be rotated independently. The doc mentions key rotation for the future; having some form of PFS and key rotation would definitely be useful. I'm not up-to-date on the literature but a search for "secure group communication" shows there is some research on systems that may provide some of these properties. Having a threat model will help decide how important this is.
