@armon
Last active December 26, 2015 13:29

Serf Security Model

Relevant branch: https://github.com/hashicorp/memberlist/compare/f-encrypt

The security model used by Serf is designed to provide confidentiality, integrity and authentication. The threat model considered in its design is described below. The model is built around a symmetric key, or shared secret: all members of the Serf cluster must be given the shared secret ahead of time, which places the burden of key distribution on the user.

To provide confidentiality, all messages are encrypted using AES-128. AES is widely regarded as one of the most secure modern encryption standards. It is also fast, and modern CPUs provide hardware instructions that make encryption and decryption very lightweight. Because AES operates on 16-byte blocks, we use the PKCS7 padding scheme.
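For illustration, PKCS7 padding can be sketched in Go as below. PKCS7 always adds between 1 and 16 bytes (a full block when the input is already aligned, which keeps removal unambiguous). Note that GCM is itself counter-based and does not strictly require padding; this sketch only shows the scheme named above. `pkcs7Pad` and `pkcs7Unpad` are hypothetical helper names, not memberlist's API.

```go
package main

import (
	"bytes"
	"errors"
)

// pkcs7Pad appends n bytes, each with value n, where 1 <= n <= blockSize,
// so the result is a multiple of blockSize. An already aligned input gets a
// full block of padding, which keeps unpadding unambiguous.
func pkcs7Pad(data []byte, blockSize int) []byte {
	n := blockSize - len(data)%blockSize
	out := append([]byte(nil), data...)
	return append(out, bytes.Repeat([]byte{byte(n)}, n)...)
}

// pkcs7Unpad reverses pkcs7Pad and rejects malformed padding.
func pkcs7Unpad(data []byte, blockSize int) ([]byte, error) {
	if len(data) == 0 || len(data)%blockSize != 0 {
		return nil, errors.New("pkcs7: invalid length")
	}
	n := int(data[len(data)-1])
	if n == 0 || n > blockSize {
		return nil, errors.New("pkcs7: invalid padding")
	}
	// Every padding byte must equal the padding length.
	if !bytes.Equal(data[len(data)-n:], bytes.Repeat([]byte{byte(n)}, n)) {
		return nil, errors.New("pkcs7: invalid padding")
	}
	return data[:len(data)-n], nil
}
```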

AES is used in Galois/Counter Mode (GCM) with a randomly generated nonce. GCM additionally provides message integrity: the ciphertext is suffixed with an authentication 'tag' that is verified before decryption.
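A minimal sketch of sealing and opening a message with AES-128-GCM and a fresh random nonce, using Go's standard `crypto/aes` and `crypto/cipher` packages (GCM is available in `crypto/cipher` from Go 1.2). The `seal` and `open` helpers are hypothetical names for illustration. `Seal` appends the 16-byte tag to the ciphertext, and `Open` returns an error, and no plaintext, if the tag does not verify.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
)

// seal encrypts plaintext with AES-GCM under a fresh random nonce.
// A 16-byte key selects AES-128. The result is CipherText | Tag.
func seal(key, plaintext, aad []byte) (nonce, sealed []byte, err error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, nil, err
	}
	nonce = make([]byte, gcm.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, nil, err
	}
	return nonce, gcm.Seal(nil, nonce, plaintext, aad), nil
}

// open checks the tag and returns the plaintext only if it verifies.
func open(key, nonce, sealed, aad []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, nonce, sealed, aad)
}
```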

Message Format

The overview above describes the cryptographic primitives in use. This section covers how messages are framed on the wire and interpreted so that confidentiality, integrity and authentication are preserved.

UDP Message Format

UDP messages do not require any framing, since UDP is packet oriented. This keeps the message format simpler and saves some space. The format is as follows:

-------------------------------------------------------------------
| Version (byte) | Nonce (12 bytes) | CipherText | Tag (16 bytes) |
-------------------------------------------------------------------

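The fixed fields add 29 bytes of overhead (1-byte version, 12-byte nonce, 16-byte tag). PKCS7 padding then adds between 1 and 16 bytes, since a full block of padding is added when the plaintext is already block-aligned, so the total overhead is between 30 and 45 bytes. There is no length field, since the UDP packet is already framed. Tampering with or corruption of any part of the packet causes GCM tag verification to fail.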

When a packet is received, we first verify the GCM tag and decrypt the payload only if verification succeeds. The version byte allows future versions to change the algorithm in use; it is currently always set to 0.
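The UDP framing and receive order described above might look roughly like this. It is a sketch under assumptions: `encryptUDP` and `decryptUDP` are hypothetical names, the key is 16 bytes, and PKCS7 padding is omitted for brevity, so the overhead produced here is exactly the fixed 29 bytes.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

const (
	encryptionVersion = 0  // version byte; currently always 0
	nonceSize         = 12 // GCM nonce
	tagSize           = 16 // GCM tag
)

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptUDP builds Version (1) | Nonce (12) | CipherText | Tag (16).
// No length field is written because the datagram is already framed.
func encryptUDP(key, msg []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, nonceSize)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := make([]byte, 0, 1+nonceSize+len(msg)+tagSize)
	out = append(out, encryptionVersion)
	out = append(out, nonce...)
	// Seal appends CipherText | Tag directly after the version and nonce.
	return gcm.Seal(out, nonce, msg, nil), nil
}

// decryptUDP checks the version byte, then lets GCM verify the tag before
// any plaintext is returned.
func decryptUDP(key, pkt []byte) ([]byte, error) {
	if len(pkt) < 1+nonceSize+tagSize {
		return nil, errors.New("packet too short")
	}
	if pkt[0] != encryptionVersion {
		return nil, errors.New("unsupported encryption version")
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, pkt[1:1+nonceSize], pkt[1+nonceSize:], nil)
}
```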

TCP Message Format

TCP provides a stream abstraction, so we must provide our own framing. This is tricky because framing is a potential attack vector: we cannot verify the tag until the entire message has been received, yet the length must be sent in plaintext. To keep an attacker from sending an enormous amount of data and causing a denial of service, we currently limit the maximum size of a framed message to 10MB. The wire format is as follows:

-------------------------------------------------------------------------------------------------------
| MsgType (byte) | Length (4 bytes) | Version (byte) | Nonce (12 bytes) | CipherText | Tag (16 bytes) |
-------------------------------------------------------------------------------------------------------

The TCP format is very similar to the UDP format, but it prepends a message type byte (as with other Serf messages) and a 4-byte length field encoded in big-endian byte order. These two fields add 5 bytes of overhead on top of the UDP format.

When we receive a TCP message, we first check its message type. If either party has encryption enabled, the other must as well; otherwise we would be vulnerable to a downgrade attack, in which one side forces the other into an unencrypted mode of operation.

Once this check passes, we read the message length and, if it is within the 10MB limit, read in the rest of the message. The tag verifies the entire payload, including the message type and length, ensuring that nothing has been tampered with.

Threat Model

The following are the various aspects of our threat model:

  • Non-members getting access to events or membership information
  • Cluster state corruption due to malicious messages being processed
  • Fake event generation due to malicious messages
  • Tampering with messages causing state corruption
  • Denial of Service against a node

No security system is unbreakable. Our goal is not to protect top-secret data but to provide a "reasonable" level of security that would require an attacker to commit considerable resources to defeat.

Note that we are specifically not concerned about replay attacks: the gossip protocol's broadcast mechanism already expects, and handles, repeated delivery of the same message.

Future Considerations

Some considerations for the future are:

  • Using the Version field to change the algorithm we use
  • Supporting different algorithms via configuration
  • Supporting key rotation
  • Addressing the fact that cluster membership can be inferred by observing random probing

Appendix

  • Node communication is modeled after the SWIM protocol
    • Periodic health probing over UDP of random nodes
    • Frequent gossip over UDP to random nodes
    • Infrequent state push/pull over TCP to random nodes
@c00w

c00w commented Oct 25, 2013

This doesn't solve what I want to use serf for. Here is a trivial example.

We have web servers, application servers and databases.

The web servers should only know where the application servers are.
The application servers should only know where the database servers are.

This is possible under this model with multiple serfdom murmor networks but it seems excessive.

@armon
Author

armon commented Oct 25, 2013

@c00w I think what you are looking for is orthogonal to this proposal. This is to simply ensure that Serf's communications are secure from eavesdropping or tampering. If I understand correctly, you want some notion of role level ACLs. You want to be able to say "only nodes with role:app should get this update". Is that correct? I think that the system you want would sit above the low-level security provided by this RFC.

@armon
Author

armon commented Oct 25, 2013

Comments from @cemeyer, putting here to have on record: Recommends moving to SHA2 for HMAC. Potentially replace PBKDF2 with scrypt or bcrypt.

@armon
Author

armon commented Oct 25, 2013

I think moving to SHA224 is probably a good idea, and it only adds another 8 bytes of overhead. I will make the switch for the HMAC, and use SHA256 for the PBKDF2. I want to stick with PBKDF2 since it is much better studied and more widely used than scrypt/bcrypt.

@codahale

Why AES-CBC+HMAC and not an AEAD mode like AES-GCM?

Or, why not just use NaCl's secretbox?

@codahale

Also, the key expansion seems unnecessary. In the given deployment scenario, it'd be easy for users to simply generate a high-entropy 512-bit key (e.g., head -c 64 < /dev/urandom | base64) which you could split into encryption and HMAC keys.

(Or just use a 128-bit key and GCM.)

@armon
Author

armon commented Oct 25, 2013

@codahale Go 1.1 doesn't support the GCM cipher mode, but Go 1.2 does. I think it is reasonable to only support Go 1.2, as it will be shipping shortly. I think moving to AES-GCM does make sense. I'd rather avoid something like secretbox, just because it's based on algorithms that are not as widely deployed or studied.

@armon
Author

armon commented Oct 25, 2013

@codahale Additionally, as part of the "hope for the best, plan for the worst" mentality, I don't trust user key inputs. There is no reason not to use PBKDF2 to ensure high entropy keys. If they provide a high quality key then great, the resulting key will be that much better. There is no real downside.

@titanous

+1 for using NaCl secretbox, the underlying algorithms are trusted enough that Google is going to deploy them in their TLS implementations and it reduces the possibility of screwing up something silly.

-1 on using PBKDF2 for the keys, just read entropy from /dev/urandom. There should be no situation where the user is supplying anything but a completely random key.

What will happen if I start replaying UDP packets that I've captured off the wire?

@armon
Author

armon commented Oct 25, 2013

@titanous I think AES-GCM is "strong enough" that we don't need secretbox for V1. However, because versioning will be built into the protocol, it can always be added later as a configurable alternative. Again, on the key derivation front, "should be no situation" and what happens in practice are very different. I literally guarantee people will use simple English phrases, and those simply cannot be trusted without a KDF. AFAIK there is no downside to using a KDF, and it guards against bad inputs. I've modified our use to increase it to 4096 rounds using SHA256 so the keys will be very high quality. Lastly, because of the nature of the gossip protocol, it is already designed to handle packet replay, since this is an expected condition of the gossip-based broadcast.

@mitchellh

@coda @titanous What is the hesitation against a KDF? From my experience, "trust the user" is not very reliable. The KDF seems to me to ensure that the key is cryptographically suitable. I agree they SHOULD give us some input from /dev/urandom, but I don't have faith they'll actually do that. In all likelihood they'll probably just say serf agent -secret=rainbows

@titanous

@armon Sure, AES-GCM is totally fine.

@mitchellh It's mostly gut reaction, I don't have any grounded justification for it. Adding moving parts to cryptosystems when they aren't required rubs me the wrong way. I'm unaware of any other uses of KDF in a similar context where the key is explicitly supposed to be random bytes of a specific length.

One approach would be to use the KDF only if the exact amount of required entropy is not provided.

@codahale

I'm not suggesting just using a password, I'm suggesting requiring the user to specify an AES key (i.e., 128 bits of whatever, base64 or hex-encoded to allow for all the bits to be specified in a config file or as a command line param). So if they try rainbows, bounce them out with an error. (Shit, you could even suggest a key for them.)

If you start with passwords—that is, if passwords are actually required because a human is going to have to remember the damn thing—then a KDF is certainly required. In this case, you're not necessarily starting with passwords.

Essentially, I don't think a KDF earns its keep here. If, at some point, you decide to change the parameters of the KDF, the wire format version has to change. That's an unfortunate bit of coupling, and all it gets you is 12 bits of resistance to offline attacks—and not a particularly hard 12 bits, given how easily optimized PBKDF2 is.

@armon
Author

armon commented Oct 26, 2013

I see what you are saying. If we mandate that we are provided with a key instead of a password, I think it is reasonable to dump PBKDF2, and instead provide links to help the user. (Or better yet, just add serf key-gen that will automatically generate a key using /dev/random or similar).

Let's just dump it, mandate that a key is provided, and provide a CLI-level tool to generate keys.
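The thread settles on requiring a key rather than a password and offering a CLI helper to generate one. A minimal sketch of both halves, assuming a 16-byte AES-128 key exchanged as base64 (the encoding @codahale suggests); `generateKey` and `parseKey` are hypothetical names, not Serf's actual CLI code.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

const keySize = 16 // AES-128

// generateKey returns a base64-encoded random key suitable for a config
// file or a command-line flag.
func generateKey() (string, error) {
	key := make([]byte, keySize)
	if _, err := rand.Read(key); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(key), nil
}

// parseKey accepts only base64 that decodes to exactly keySize bytes, so a
// passphrase such as "rainbows" is bounced with an error instead of used.
func parseKey(s string) ([]byte, error) {
	key, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return nil, fmt.Errorf("key must be base64-encoded: %w", err)
	}
	if len(key) != keySize {
		return nil, fmt.Errorf("key must be %d bytes, got %d", keySize, len(key))
	}
	return key, nil
}
```

Rejecting anything that is not exactly one key's worth of bytes removes the need for a KDF, and with it the coupling between KDF parameters and the wire format version.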

@armon
Author

armon commented Oct 26, 2013

Updated the Gist to reflect the latest protocol decisions

@mitchellh

Agreed. Makes total sense.

@sit

sit commented Oct 26, 2013

The doc could use more description of the threat model and communication patterns (or links to where that is described). What sort of attacks and what strength of attacker should it defend against?

I would call out that authentication here is limited to the sense of "group membership". There is no sense of identity here and compromise of a single host will contaminate the entire cluster; there's no way to revoke membership or trace the provenance of messages.

A single secret is used for everything: integrity/confidentiality, and authentication. It may be worth separating this out so that they can be rotated independently. The doc mentions key rotation for the future; having some form of PFS and key rotation would definitely be useful. I'm not up-to-date on the literature but a search for "secure group communication" shows there is some research on systems that may provide some of these properties. Having a threat model will help decide how important this is.
