Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?

  BIP: 324
  Layer: Peer Services
  Title: Version 2 Peer-to-Peer Message Transport Protocol
  Author: Jonas Schnelli <dev@jonasschnelli.ch>
  Status: Draft
  Type: Standards Track
  Created: 2019-03-08
  License: PD

Table of Contents

Abstract

This BIP describes a new Bitcoin peer to peer transport protocol with opportunistic encryption.

Motivation

The current peer-to-peer protocol is partially inefficient and in plaintext.

With the current unencrypted message transport, BGP hijack, block delay attacks and message tampering are inexpensive and can be executed covertly (undetectable MITM)[1].

Adding opportunistic encryption introduces a high risk for attackers of being detected. Peer operators can compare encryption session IDs or use other form of authentication schemes [2] to identify an attack.

Each current version 1 Bitcoin peer-to-peer message uses a double-SHA256 checksum truncated to 4 bytes. Roughly the same amount of computation power would be required for encrypting and authenticating a peer-to-peer message with ChaCha20 & Poly1305.

Additionally, this BIP describes a way to identify data which has been manipulated by peers (intercepting, then blocking or tampering with messages).

Encrypting traffic between peers is already possible with VPN, tor, stunnel, curveCP or any other encryption mechanism on a deeper OSI level, however, most of those solutions require significant technical experience in setting up a secure channel and are therefore not widely deployed.

Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119[3].

A peer that supports the message transport protocol as defined in this proposal MUST accept encryption requests from all peers.

Both communication directions share the same shared-secret but have different symmetric cipher keys.

The encryption handshake MUST happen before sending any other messages to the responding peer.

If the responding peer closes the connection after sending the handshake request, the initiating peer MAY try to connect again with the v1 peer-to-peer transport protocol. Such reconnects allow an attacker to "downgrade" the encryption to plaintext communication and thus, accepting v1 connections MUST not be done when the Bitcoin peer-to-peer network has almost entirely embraced v2 communication.

NODE_P2P_V2

Peers supporting the transport protocol after this proposal MUST signal NODE_P2P_V2

NODE_P2P_V2 = (1 << 11)

A peer usually learns an address along with the expected service flags which MAY be used to filter possible outbound peers.

A peer signaling NODE_P2P_V2 MUST accept encrypted communication specified in this proposal.

Peers MAY only make outbound connections to peers supporting NODE_P2P_V2.

Handshake

 ----------------------------------------------------------------------------------------
 | Initiator                             Responder                                      |
 |                                                                                      |
 | x, X         := SECP256k1_KEYGEN()                                                   |
 | CLIENT_HDATA := X                                                                    |
 |                                                                                      |
 |               --- CLIENT_HDATA --->                                                  |
 |                                                                                      |
 |                                       y, Y           := SECP256k1_KEYGEN()           |
 |                                       ECDH_KEY       := SECP256k1_ECDH(X,y)          |
 |                                       SERVER_HDATA   := Y                            |
 |                                                                                      |
 |               <-- SERVER_HDATA ----                                                  |
 |                                                                                      |
 | ECDH_KEY     := SECP256k1_ECDH(x,Y)                                                  |
 ----------------------------------------------------------------------------------------

To request encrypted communication (only possible if yet no other messages have been sent or received), the initiating peer generates an EC secp256k1 ephemeral key and sends the corresponding 32-byte public key to the responding peer and waits for the remote 32-byte public key from the counterparty.

ODD secp256k1 public keys MUST be used (public keys starting with 0x02). If the public key from the generated ephemeral key is an EVEN public key (starting with 0x03), its public key SHOULD be negated and then recalculated. Only using ODD public keys makes it more complex to identify the handshake based on analyzing the traffic.

The handshake request and response message are raw 32byte payloads containing no header, length or checksum (the pure 32byte payload) and MUST be sent before anything else.

Public keys starting with the 4-byte network magic are forbidden and MUST lead to local regeneration of an ephemeral-key.

Pseudocode for the ephemeral-key generation

do {
    ecdh_key.MakeNewKey();
    if (ecdh_key.GetPubKey()[0] == 3) {
        ecdh_key.Negate();
    }
} while (m_ecdh_key.GetPubKey()[0..3] == NETWORK_MAGIC);

Once a peer has received the public key from its counterparty, the shared secret MUST be calculated by using secp256k1 ECDH.

Private keys will never be transmitted. The shared secret can only be calculated if an attacker knows at least one private key and the counterparty's public key. This key-exchange is based on the discrete log problem and thus not sufficiently strong against known forms of possible quantum computer algorithms. Adding an additional quantum resistant key exchange like NewHope is possible but out of scope for this proposal.

After a successful handshake, messages MUST use the "v2 messages structure". Non-encrypted v1 messages from the initiating peer MUST lead to an immediate connection termination.

After a successful handshake, both peers MUST wipe the ephemeral-session-key from memory and/or persistence storage.

A peer not supporting this proposal will not perform the described handshake and thus send a v1 version message. Peers supporting this BIP MAY optionally allow unencrypted v1 communication by detecting a v1 version message by the initial 11-byte sequence of 4byte net magic || "version".

Symmetric Encryption Cipher Keys

Once the ECDH secret (ECDH_KEY) is calculated on each side, the symmetric encryption cipher keys MUST be derived with HKDF [4] after the following specification:

1. HKDF extraction PRK = HKDF_EXTRACT(hash=SHA256, salt="BitcoinSharedSecret||INITIATOR_32BYTES_PUBKEY||RESPONDER_32BYTES_PUBKEY||NETWORK_MAGIC", ikm=ECDH_KEY).

2. Derive Key_1_A (K_1 communication direction A) K1A = HKDF_EXPAND(prk=PRK, hash=SHA256, info="BitcoinK_1_A", L=32)

2. Derive Key_2_A (K_2 communication direction A) K1B = HKDF_EXPAND(prk=PRK, hash=SHA256, info="BitcoinK_2_A", L=32)

3. Derive Key_1_B (K_1 communication direction B) K2 = HKDF_EXPAND(prk=PRK, hash=SHA256, info="BitcoinK_1_B", L=32)

3. Derive Key_2_B (K_2 communication direction B) K2 = HKDF_EXPAND(prk=PRK, hash=SHA256, info="BitcoinK_2_B", L=32)

Session ID

Both parties MUST also calculate the 256bit session-id using SID = HKDF_EXPAND(prk=PRK, hash=SHA256, info="BitcoinSessionID", L=32). The session-id can be used for authenticating the encryption-session (identity check).

The session-id MUST be presented to the user on request.

ChaCha20-Poly1305@Bitcoin Cipher Suite

Background

ChaCha20 is a stream cipher designed by Daniel Bernstein and described in [5]. It operates by permuting 128 fixed bits, 128 or 256 bits of key, a 64 bit nonce and a 64 bit counter into 64 bytes of output. This output is used as a keystream, with any unused bytes simply discarded.

Poly1305 [6], also by Daniel Bernstein, is a one-time Carter-Wegman MAC that computes a 128 bit integrity tag given a message and a single-use 256 bit secret key.

The chacha20-poly1305@bitcoin combines these two primitives into an authenticated encryption mode with addition of encryption of the packet lengths.

Detailed Construction

The chacha20-poly1305@bitcoin cipher requires two 256 bits of key material as output from the key exchange. Each key (K_1 and K_2) are used by two separate instances of chacha20.

The instance keyed by K_1 is a stream cipher that is used for the per-message metadata, specifically for the poly1305 authentication key as well as for the length encryption. The second instance, keyed by K_2, is used to encrypt the entire payload.

Two separate cipher instances are used here so as to keep the packet lengths confidential (best effort; for passive observing) but not create an oracle for the packet payload cipher by decrypting and using the packet length prior to checking the MAC. By using an independently-keyed cipher instance to encrypt the length, an active attacker seeking to exploit the packet input handling as a decryption oracle can learn nothing about the payload contents or its MAC (assuming key derivation, ChaCha20 and Poly1305 are secure). Active observers can still obtain the message length (ex. active ciphertext bit flipping or traffic shemantics analysis)

The AEAD is constructed as follows: generate two ChaCha20 streams, initially keyed with K_1 and K_2 and IV of 0 and a block counter of 0 and a sequence number 0 as IV. After encrypting 4064 bytes, the following 32 bytes are used to re-key the ChaCha20 context.

Byte-level forward security is possible by precomputing 4096 bytes of stream output, caching it, resetting the key to the final 32 bytes of the output, and then wiping the remaining 4064 bytes of cached data as it gets used.

For each packet, use 3 bytes from the remaining ChaCha20 stream generated using K_1 to encrypt the length. Use additional 32 bytes of the same stream to generate a Poly1305 key.

If we reach bytes 4064 on the ChaCha20 stream, use the next 32 bytes (byte 4065-4096) and set is as the new ChaCha20 key, reset the counter to 0 while incrementing the sequence number + 1 and set is as IV (little endian encoding).

For the payload, use the ChaCha20 stream keyed with K_1 and apply the same re-key rules.

Re-keying can happen in the middle of an encryption (payload, length, poly1305 key).

Packet Handling

When receiving a packet, the length must be decrypted first. When 3 bytes of ciphertext length have been received, they MUST be decrypted.

Once the entire packet has been received, the MAC MUST be checked before decryption. A per-packet Poly1305 key is generated as described above and the MAC tag is calculated using Poly1305 with this key over the ciphertext of the packet length and the payload together. The calculated MAC is then compared in constant time with the one appended to the packet and the packet decrypted using ChaCha20 as described above.

Detection of an invalid MAC MUST lead to immediate connection termination.

To send a packet, first encode the 3 byte length and encrypt it using the ChaCha20 stream keyed with K_1 as described above. Encrypt the packet payload (using the ChaCha20 stream keyed with K_2) and append it to the encrypted length. Finally, calculate a MAC tag and append it.

The initiating peer MUST use K_1_A, K_2_A to encrypt messages on the send channel, K_1_B, K_2_B MUST be used to decrypt messages on the receive channel.

The responding peer MUST use K_1_A, K_2_A to decrypt messages on the receive channel, K_1_B, K_2_B MUST be used to encrypt messages on the send channel.

Optimized implementations of ChaCha20-Poly1305@bitcoin are relatively fast, therefore it is very likely that encrypted messages will not require additional CPU cycles per byte when compared to the current unencrypted p2p message format (ChaCha20/Poly1305 versus double SHA256).

 ------------------------------------------------------------------------------------------
 | Initiator                          Responder                                           |
 |                                                                                        |
 | AEAD() = ChaCha20Poly1305Bitcoin()                                                     |
 | MSG_A_CIPH = AEAD(k=K_1_A, K_2_A, payload_nonce=0, aad_nonce=0, aad_pos=0, msg)        |
 |                                                                                        |
 |                         --- MSG_CIPH --->                                              |
 |                                                                                        |
 |                                    msg   := AEAD(k=K_1_A,K_2_A, n=0, ..., MSG_A_CIPH)  |
 |                                                                                        |
 ------------------------------------------------------------------------------------------

AEAD Test Vectors

message   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k1 (DATA) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k2 (AAD)  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ciphertext
76 b8 e0 76 b8 e0 ad a0 f1 3d 90 40 5d 6a e5 53 86 bd 28 bd d2 19 b8 a0 8d ed 1a a8 36 ef cc 8b

MAC
df cd 91 c8 75 ee 88 71 8b 63 34 5d 75 0c 68 4a
message   01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k1 (DATA) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k2 (AAD)  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

ciphertext
77 b8 e0 76 b8 e0 ad a0 f1 3d 90 40 5d 6a e5 53 86 bd 28 bd d2 19 b8 a0 8d ed 1a a8 36 ef cc 8b 

MAC
fb 6c f9 dc d7 e2 ee 80 7d 5f f9 81 eb 4a 13 5a
message
ff 00 00 f1 95 e6 69 82 10 5f fb 64 0b b7 75 7f 57 9d a3 16 02 fc 93 ec 01 ac 56 f8 5a c3 c1 34 a4 54 7b 73 3b 46 41 30 42 c9 44 00 49 17 69 05 d3 be 59 ea 1c 53 f1 59 16 15 5c 2b e8 24 1a 38 00 8b 9a 26 bc 35 94 1e 24 44 17 7c 8a de 66 89 de 95 26 49 86 d9 58 89 fb 60 e8 46 29 c9 bd 9a 5a cb 1c c1 18 be 56 3e b9 b3 a4 a4 72 f8 2e 09 a7 e7 78 49 2b 56 2e f7 13 0e 88 df e0 31 c7 9d b9 d4 f7 c7 a8 99 15 1b 9a 47 50 32 b6 3f c3 85 24 5f e0 54 e3 dd 5a 97 a5 f5 76 fe 06 40 25 d3 ce 04 2c 56 6a b2 c5 07 b1 38 db 85 3e 3d 69 59 66 09 96 54 6c c9 c4 a6 ea fd c7 77 c0 40 d7 0e af 46 f7 6d ad 39 79 e5 c5 36 0c 33 17 16 6a 1c 89 4c 94 a3 71 87 6a 94 df 76 28 fe 4e aa f2 cc b2 7d 5a aa e0 ad 7a d0 f9 d4 b6 ad 3b 54 09 87 46 d4 52 4d 38 40 7a 6d eb 3a b7 8f ab 78 c9

k1 (DATA) 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
k2 (AAD)  ff 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f

ciphertext
39 40 c1 c8 68 cd 14 5b d5 46 91 e9 b6 b4 02 c7 8b d7 ea 9c 37 24 fc 50 df c6 9a 4a 96 be 8d ec 4e 70 e9 58 18 8a a6 92 22 ea ef 3f 47 f8 00 3f 1b c1 3d cf 9e 66 1b e8 e1 b6 71 e9 cf 46 ba 70 5b ca 96 3e 04 77 a5 b3 c2 e2 c6 6f eb 82 07 26 9d db 01 b1 37 2a ad 68 56 3b b4 aa d1 35 af b0 6f be 40 b3 10 b6 3b ef 57 8f f9 39 f3 a0 0a 6d a9 e7 44 d2 8b a0 70 29 4e 57 46 d2 ca 7b b8 ac 2c 8e 3a 85 5a b4 c9 bc d0 d5 85 5e 11 b5 2c ac aa 2d db 34 c0 a2 6c d0 4f 4b c1 0d e6 dc 15 1d 4e e7 ce d2 c2 b0 de 8d ed 33 ff 11 f3 01 e4 02 75 59 e8 93 8b 69 bc eb 1e 5e 25 9d 41 22 05 6f 6a db d4 8a 06 28 b9 12 f9 0d 72 83 8f 2f 3a af 6b 88 34 2c f5 ba c3 cb 68 8a 9b 0f 7a fc 73 a7 e3 ca d8 e7 12 54 c7 86 ea 00 02 40 ae 7b d1 df 8b cf ca 07 f3 b8 85 72 3a 9d 7f 89 73 64 61 

MAC
7a c8 d9 35 a4 1b f9 54 64 32 36 0e 1c 54 37 08

v2 Messages Structure

Field Size Description Data type Comments
3 length 24 bits Encrypted length of ciphertext payload (not counting the MAC tag) in number of bytes
1-13 encrypted message-type variable ASCII message-type (or one byte message-type-ID)
? encrypted payload ? The actual data
16 MAC tag ? 128bit MAC-tag

Encrypted messages do not have the 4byte network magic.

The maximum message size is 2^24 (16’777’216) bytes. Future communication MAY exceed this limit and thus MUST be split into different messages.

Decrypting and processing the message MUST take place only AFTER successful authentication (MAC verification).

The 4byte sha256 checksum is no longer required because the AEAD (MAC).

Both peers MUST keep track of the message sequence numbers (uint32) of sent and received messages for building a 64-bit symmetric cipher IV.

The message-type field MUST start with a byte that defines the length of the ASCII message-type string up to 12 chars (1 to 12) or a message-type-ID (see below).

Message-Type-ID

To save valuable bandwidth, the v2 message format supports message-type-IDs. The ID/string mapping is a peer to peer arrangement and MAY be negotiated between the initiating and responding peer. A peer conforming to this proposal MUST support message-type-IDs based on the table below and SHOULD use message-type-IDs for outgoing messages.

Number Message Type
13 ADDR
14 BLOCK
15 BLOCKTXN
16 CMPCTBLOCK
17 FEEFILTER
18 FILTERADD
19 FILTERCLEAR
20 FILTERLOAD
21 GETADDR
22 GETBLOCKS
23 GETBLOCKTXN
24 GETDATA
25 GETHEADERS
26 HEADERS
27 INV
28 MEMPOOL
29 MERKLEBLOCK
30 NOTFOUND
31 PING
32 PONG
33 REJECT
34 SENDCMPCT
35 SENDHEADERS
36 TX
37 VERACK
38 VERSION
39 GETCFILTERS
40 CFILTER
41 GETCFHEADERS
42 CFHEADERS
43 GETCFCHECKPT
44 CFCHECKPT
45 WTXIDRELAY
46 ADDRV2
47 SENDADDRV2

Length comparisons between v1 and v2 messages

v1 in: 4(Magic)+12(Message-Type)+4(MessageSize)+4(Checksum)+37(Payload) == 61
v2 inv: 3(MessageSize)+1(Message-Type)+37(Payload)+16(MAC) == 57
(93.44%)

v1 ping: 4(Magic)+12(Message-Type)+4(MessageSize)+4(Checksum)+8(Payload) == 32
v2 pong: 3(MessageSize)+1(Message-Type)+8(Payload)+16(MAC) == 28
(87.5%)

v1 block: 4(Magic)+12(Message-Type)+4(MessageSize)+4(Checksum)+1’048’576(Payload) = 1’048’600
v2 block: 3(MessageSize)+1(Message-Type)+1’048’576(Payload)+16(MAC) = 1’048’596
(99.9996%)

Test Vectors

message   verack
k1 (DATA) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k2 (AAD)  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
sequence: 0

Message before encryption
01 00 00 25

Ciphertext (3byte AD, 1byte message-type, 16 bytes MAC)
77 b8 e0 53 14 05 09 d3 48 60 7a 07 58 00 77 44 be 48 21 ef
message   PING (nonce=123456)
k1 (DATA) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
k2 (AAD)  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
sequence: 0

Message before encryption (3 bytes packet length, 1byte message-type-ID, 4 byte message payload [ping nonce])
05 00 00 1f 40 e2 01 00

Ciphertext (3byte AD, 1byte message-type-id, 4 byte message payload [ping nonce], 16 bytes MAC)
73 b8 e0 69 f8 02 ac a0 19 62 2f 1a 1d d3 d3 be de 22 2e d9 ff 17 51 40

Risks

The encryption does not include an authentication scheme. This BIP does not cover a proposal to avoid MITM attacks during the encryption initialization. However, peers MUST show the session-id to the user on request which allows to identify a MITM by a manual verification on a secure channel.

Optional authentication schemes may be covered by other proposals [2].

An attacker could delay or halt v2 protocol enforcement by providing a reasonable amount of peers not supporting the v2 protocol.

Compatibility

This proposal is backward compatible (as long as not enforced). Non-supporting peers can still use unencrypted communications.

Reference implementation

References

  1. ^ Hijacking Bitcoin: Routing Attacks on Cryptocurrencies - M. Apostolaki, A. Zohar, L.Vanbever
  2. a b BIP150
  3. ^ RFC 2119
  4. ^ HKDF (RFC 5869)
  5. ^ ChaCha20
  6. ^ Poly1305

Acknowledgments

  • Pieter Wuille and Gregory Maxwell for most of the ideas in this BIP.
  • Tim Ruffing for the review and the hint for the enhancement of the symmetric key derivation.
  • Jonathan Cross for re-wording and grammar corrections

Copyright

This work is placed in the public domain.

@schildbach

This comment has been minimized.

Copy link

@schildbach schildbach commented Mar 22, 2019

What about Short Command IDs for GETDATA and TX messages? Both are quite frequent. Also during blockchain sync BLOCK and MERKLEBLOCK messages?

But generally I think transport encrytion should be separated from the protocol changes itself, into 2 specs which are independant. Otherwise we risk very slow adoption again, as we experienced with segwit because it stuffed so many small changes into a big spec.

(But nice to see Bitcoin Code is finally open minded about encrypting the transport!)

@flack

This comment has been minimized.

Copy link

@flack flack commented Mar 22, 2019

So I'm no cryptography expert, but could it be that instead of "tempering" you meant "tampering with"?

tampering:

  • to interfere so as to weaken or change for the worse —used with with

tempering:

  • to dilute, qualify, or soften by the addition or influence of something else
  • to anneal or toughen (glass) by a process of gradually heating and cooling
  • to make stronger and more resilient through hardship
  • to bring to a suitable state by mixing in or adding a usually liquid ingredient
@jcnelson

This comment has been minimized.

Copy link

@jcnelson jcnelson commented Mar 25, 2019

What is the advantage of encrypting the messages when anyone can connect to your node? A MITM can deduce that a remote endpoint is running the Bitcoin protocol by inspecting its traffic patterns, and if the remote endpoint is open to accepting new peers, the MITM can handshake with it and then receive data it can decrypt. If the purpose of this BIP is to prevent message tampering, then would it be sufficient for Bitcoin peers to simply sign their messages?

Also, while it is a stated motivation, this BIP doesn't appear to address IP prefix hijacking.

@naumenkogs

This comment has been minimized.

Copy link

@naumenkogs naumenkogs commented Mar 25, 2019

@jcnelson

In the case you explained, MITM would act as an inbound connection for a given node. We already have biases to make these inbound connections learn significantly less than outbound (13298), to make it more difficult for them to affect tx relay (14897), and perhaps other things in the codebase I don't remember. What I want to say here is connecting to someone is not equal to reading all the traffic they produce.

Although I see a possibility for a MITM to re-route outbound connections of a given node to MITM itself. But I believe then

  • we're in much bigger trouble than eavesdropping
  • authentication might solve this problem
@jonathancross

This comment has been minimized.

Copy link

@jonathancross jonathancross commented Apr 19, 2019

Hi @jonasschnelli - Here are some small corrections to the English:
https://gist.github.com/jonathancross/2e002268445b450b60a34c0e771027ef/revisions

Nice work!

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Apr 20, 2019

Thanks @jonathancross! Pulled in.

@mytwocentimes

This comment has been minimized.

Copy link

@mytwocentimes mytwocentimes commented Jun 10, 2019

J

I liked your presentation at Breaking ... I revisited your BIP ... here below some small corrections to the English and some nits. But, there is an ODD/EVEN mismatch. You refer to ODD and show "public keys starting with 0x02" ... should be 0x03.

Learned a lot by reading Overhaul ... thnx

https://gist.github.com/mytwocentimes/a040f51dc1ba94ac7a69bc2e3a1fb41c/revisions

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Jun 11, 2019

Thanks D!
I'll pull that in...

@onvej-sl

This comment has been minimized.

Copy link

@onvej-sl onvej-sl commented Jul 2, 2019

Only using ODD public keys makes it more complex to identify the handshake based on analyzing the traffic.

Could you explain it to me in more detail?

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Jul 3, 2019

Could you explain it to me in more detail?

Thanks for the comment. I'll try to expand this section.

Using a standard 33byte pubkey with 0x02 or 0x03 at the beginning would make a handshake trivially identifiable by someone listening on the wire. If we only use ODD pubkeys, the handshake looks pseudo-random (it is still easy to identify a bitcoin handshake by checking if the first 32 bytes is a valid secp256k1 curve point). It is a low hanging fruit that improves the identifiability-robustness slightly with almost no cost.

@wtogami

This comment has been minimized.

Copy link

@wtogami wtogami commented Sep 2, 2019

Adding opportunistic encryption introduces a high risk for attackers of being detected.

Confusingly worded

@pinheadmz

This comment has been minimized.

Copy link

@pinheadmz pinheadmz commented Nov 5, 2019

There are a few related schemes like Noise Protocol and Elligator I wonder if they would apply here? Why not use Noise Protocol's chaining key instead of key rotation? And wouldn't obfuscating EC points with Elligator improve privacy?

@hebasto

This comment has been minimized.

Copy link

@hebasto hebasto commented Apr 12, 2020

@jonasschnelli

If the public key from the generated ephemeral key is an EVEN public key (starting with 0x03), its public key SHOULD be negated and then recalculated.

It seems s/public/private/ is required.

@hebasto

This comment has been minimized.

Copy link

@hebasto hebasto commented Apr 12, 2020

@jonasschnelli
It seems possible to drop support for the REJECT command since it has been removed from the code base (bitcoin/bitcoin#15437).

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Apr 14, 2020

@hebasto: I'm unsure about the REJECT drop. BIP342 is mainly transport protocol proposal (message parsing stays intact). It's also ready complex and I fear a scope creep into the message processing layer if we start to including small things like the removal of the REJECT message.

@ariard

This comment has been minimized.

Copy link

@ariard ariard commented Apr 30, 2020

The handshake request and response message are raw 32byte payloads containing no header, length or checksum, and MUST be sent before anything else

Is this a DoS vector, you may ask to receiving peers performing KeyGen-and-ECDH-for-free ? I don't have secp256k1 bench in mind, but it's an issue we may ban inbound peer after too much session abandon.

You should also precise what receiving peer should do when key isn't valid (like starting with a 4-byte network magic or not secp256k1).

After a successful handshake, both peers MUST wipe the ephemeral-session-key from memor and/or persistence storage

You should destroy shared secret after session termination, for any reason.

Future communication MAY exceed this limit and thus MUST be split into different messages

Given v2 message structure, how fragmentation may be signaled ?

must be done after every 1GB of data

Given key is symmetric but data sent asymmetric who should bear data accounting exactly ?

More generic:

  • what's about a version byte ? (should have been already raised but as you mention future handshake replacement, or key size == protocol to use?)
  • are shared secrets validity time-bounded even without data exchanged to trigger re-keying?
  • pre-shared symmetric key mode ?

Overall, that's already huge, I guess it's wise to indicate that some parts are left to future specification.

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Apr 30, 2020

The handshake request and response message are raw 32byte payloads containing no header, length or checksum, and MUST be sent before anything else

Is this a DoS vector, you may ask to receiving peers performing KeyGen-and-ECDH-for-free ? I don't have secp256k1 bench in mind, but it's an issue we may ban inbound peer after too much session abandon.

It might be. Wouldn't that be true as well for a public openssh/ssh server as example?
Also, in V1, one can also DoS by requesting tons of blocks resulting in a lot of SHA256 operations.
Avoiding DoS is probably relatively hard (without blocking legitimate clients). IP based attempt counting, etc. is flawed IMO (easy to circumvent by a large and cheep IPv6 range).

You should also precise what receiving peer should do when key isn't valid (like starting with a 4-byte network magic or not secp256k1).

Good point. Will do.

After a successful handshake, both peers MUST wipe the ephemeral-session-key from memor and/or persistence storage

You should destroy shared secret after session termination, for any reason.

Good point. Will add. I think the ECDH shared secret can be cleansed from memory directly after HKDF-deriving the symmetric keys (and the session id).

Future communication MAY exceed this limit and thus MUST be split into different messages

Given v2 message structure, how fragmentation may be signaled ?

I think split messages would be a message processing layer problem rather than a transport problem.
Transport is up to 2^23 bytes.
If you need to transport 16MB, the message could hint the "page" (rather than the transport layer).

must be done after every 1GB of data

Given key is symmetric but data sent asymmetric who should bear data accounting exactly ?

Each channel (send/recv) has its own symmetric key, thus its own 1GB rule/counter. It's not 100% clear from the specs,... I'll try to add a note to make this more clear.

More generic:

  • what's about a version byte ? (should have been already raised but as you mention future handshake replacement, or key size == protocol to use?)

This is still in discussion.

  • are shared secrets validity time-bounded even without data exchanged to trigger re-keying?

Yes. The BIP states: The Re-Keying must be done after every 1GB of data sent (recommended by RFC4253 SSH Transport) or if the last rekey was more than an hour ago.

  • pre-shared symmetric key mode ?

Outside the current BIP scope.

@elichai

This comment has been minimized.

Copy link

@elichai elichai commented May 3, 2020

I'm missing the rationale of encrypting the length, especially without MACing it separately, what exactly does it give us?

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented May 6, 2020

@elichai:
The chacha20poly1305@bitcoin AEAD is almost identical to the OpenSSH version of the AEAD: https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.chacha20poly1305

Two separate cipher instances are used here so as to keep the packet
lengths confidential but not create an oracle for the packet payload
cipher by decrypting and using the packet length prior to checking
the MAC. By using an independently-keyed cipher instance to encrypt the
length, an active attacker seeking to exploit the packet input handling
as a decryption oracle can learn nothing about the payload contents or
its MAC (assuming key derivation, ChaCha20 and Poly1305 are secure).

Encrypted packet lengths result in a complete pseudo-random stream of data (except the handshake where one can identify a secp256k1 pubkey). But yes, an attacker can still analyse the traffic based on buffer sizes, etc. But the pseudo-random stream has better properties for increasing the censorship robustness (in case we want that later).

@ariard

This comment has been minimized.

Copy link

@ariard ariard commented May 8, 2020

It might be. Wouldn't that be true as well for a public openssh/ssh server as example?
Also, in V1, one can also DoS by requesting tons of blocks resulting in a lot of SHA256 operations.
Avoiding DoS is probably relatively hard (without blocking legitimate clients). IP based attempt counting, etc. is flawed IMO (easy to circumvent by a large and cheep IPv6 range).

With SSH, you avoid at least the KeyGen due to a static pubkey for the server? But I agree banning inbound IPv6 wouldn't work? Maybe leveraging some asmap work? But yes not for now..

From the review:

I guess there are multiple ways how peers want to agree on a mapping (protocol version, dedicated command exchange, versionstring, port, etc.).

In section Short Command ID, maybe you can add a sentence like "We defer mapping negotiation for future specification. In the meanwhile experimentation MAY happen using protocol version..."

Thanks for other answers.

@ariard

This comment has been minimized.

Copy link

@ariard ariard commented May 8, 2020

@jonasschnelli

I'm missing the rationale of encrypting the length, especially without MACing it separately, what exactly does it give us?

And I made the same point in PR review, but actually not MACing the length seems an issue, you may decrypt to some invalid length and therefore manipulate MAC-seek? I remember @cdecker explaining us why they do actually it in https://github.com/lightningnetwork/lightning-rfc/blob/master/08-transport.md, Elichai was there too.

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented May 9, 2020

@ariard
Thanks for bringing this up. I think the MACing of the length field need thoughts (let me dive into this further).
One difference to the BOLT #8 encryption protocol is that BIP324 uses a independent ChaCha20 cipher (keystream) for the packet length where AFAIK the LN protocol uses the same key for length & payload.

@elichai

This comment has been minimized.

Copy link

@elichai elichai commented May 9, 2020

In BOLT 8 they encrypt+mac the length separately from the data, but you're right they use the same keystream for both.
(I personally don't have any comments on the scheme itself as it's the same one from openSSH, so I feel the right thing here is to read the research about openSSH :) )

@real-or-random

This comment has been minimized.

Copy link

@real-or-random real-or-random commented May 16, 2020

To be honest, I'm not sure if I can fully parse the algorithm, so I'm not sure what all goes into the MAC computation and which key it uses. (I think it would be great to have pseudocode for the entire thing, not just parts.) But the length included in the K_2 MAC (as AD), right?

Not MACing the length seems dangerous at first glance. Maybe splitting into K_1 and K_2 makes it secure. But this requires some thoughts These things can be very subtle. What about the following attack on the confidentiality of the message length?

Assume two messages are sent by one of the peers, one after the other, with lengths l_1 and l_2. The attacker sees the two ciphertexts and wants to learn the length of the first message. It guesses l_1, which gives it also a guess for l_2. The guess of l_1 gives it the position of the length field of the second message (that is supposed to be an encryption of l_2). The attacker modifies the length field to be very large, and it also modifies some bytes in the ciphertext of the second message. Then the attacker forwards the modified two messages to the other peer.
If the guess of the attacker was wrong, the other peer will terminate the connection (wrong MAC). If the guess was correct, the other peer won't terminate the connection because it still waits for the rest of the very long second message. So the attacker can learn whether the guess about the length was correct.

BUT: The attacker can learn the length of a message in flight anyway, namely by simply flipping some bits in the ciphertext of the content (not length), and forwarding the message byte by byte. Then as soon as the other peer terminates the connection, the attacker knows that the message was over now (and MAC failed).

So is confidentiality of the length the security goal here? When we develop protocols, we must state what security they should provide. Apparently the message length is not confidential against active attackers here. The draft here says "Two separate cipher instances are used here so as to keep the packet lengths confidential", which was copied from the OpenSSH doc. I guess what's meant here is that it's a best effort to encrypt packet lengths confidential because this makes traffic analysis harder but not impossible. Anyway, you can usually infer message lenghts by timing and by looking at how messages are split into lower level protocol messages (if it starts a second TCP packet, it was probably a second message). The only way to prevent leaking lengths is by padding messages. TLS1.3 supports this actually but I'm not aware that anyone uses it.

I agree with @elichai. Given that OpenSSH does it, I assume more people have looked into this?!

A more high-level comment is that given that BOLT has "overtaken" the progress here, it may be a good idea to steal some of their stuff, and if it's only key derivation etc. I think there's no need to reinvent the wheel ever time. (I hate to say this because I know progress is slow here anyway.)

@ariard

This comment has been minimized.

Copy link

@ariard ariard commented Jun 17, 2020

Reading "Two seperate cipher instances are used here so as to keep the packet lengths confidential but not create an oracle for the packet payload cipher by decrypting and using the packet length prior to checking the MAC" I don't understand an oracle on which properties of the packet it would create ? Assuming we use only K_1 and receiver must decrypt-then-MAC, if an attacker does a random truncation, a pending connection would mean length is equal superior, a closed connection would mean length is inferior. But this sounds a less effective way than byte-by-byte scenario described above. Unless we assume an attacker able to do only a one-time modification ?

Digging into SSH security analysis (circa 2006), I've found attacks on CBC mode of operation or MAC leaking information in the absence of rekeying. But nothing on MACing the length field of the binary packet. It may mean either it wasn't even considered as a theoretical attack or there are new concerns since then on authenticated protocol weaknesses which may explain BOLT8. I can't see by flipping length field byte and thus extending or truncating packet an attacker can leverage this, integrity check will fail in early steps of reception.

With regards to traffic analysis, even without looking on lower level protocols, Bitcoin one is so much characteristic (transaction size and interactivity of flooding, max-sized addr messages, block size and timing, ...) that analysis isn't that much hardened. Erlay may change this, at least for transaction propagation.

If we keep a separate key for encrypting length field, at least we should explicitly state its security purpose and known limitations.

@real-or-random

This comment has been minimized.

Copy link

@real-or-random real-or-random commented Jun 17, 2020

Reading "Two seperate cipher instances are used here so as to keep the packet lengths confidential but not create an oracle for the packet payload cipher by decrypting and using the packet length prior to checking the MAC" I don't understand an oracle on which properties of the packet it would create ?

I think the goal is that a recipient decrypting the lengths will not act as an oracle for the actual plaintexts. And this is indeed pretty clear if the lenghts are encrypted with K_1 and the actual plaintexts are encrypted with K_2. Given that hiding lengths seems to be a "best-effort" thing working against passive observers but an active attackers can learn the lengths (as we both observed in our posts), this is indeed important.

@ariard

This comment has been minimized.

Copy link

@ariard ariard commented Jun 17, 2020

Digging further in recent literature pointed to me by @elichai, it seems BIP 324 cipher isn't secure with regards to byte-counting fragmentation and fragmentation-related DoS active attacks, see https://degabriele.info/publications/Surfeit.pdf and https://eprint.iacr.org/2015/059.pdf. As we pointed out in above posts, current scheme is vulnerable to traffic analysis by active attacker to learn plaintext length.

Considering a) byte-counting fragmentation and bit flipping attacks we may add same counter-measure than OpenSSH, namely after detecting an invalid MAC wait receiving more packets until reaching MAX_PACKET_LENGTH and dropping connection at that point. Thus uniforming error cases between invalid MACs and max packet length, see Section 4.2 of 1st paper. A time-out should be implemented after first packet reception to avoid amplifying b).

Achieving a) would provide theoretically ciphertext boundary hiding security, even if you can observe timing and lower level protocols fragmentation, it wouldn't leak boundaries at the encryption layer itself. We may introduce in the future deliberate fragmentation of the ciphertext from the sender to blur further traffic analysis, at the cost of some latency.

For b) fragmentation-related DoS attacks we may add a MAC on encrypted length field generated with K_1. If length field is tampered, a probabilistic decryption to some higher range value may turn as a DoS vector, i.e allocating resources for a staling connection. Authenticating the field should prevent this behavior.

@LLFourn

This comment has been minimized.

Copy link

@LLFourn LLFourn commented Aug 24, 2020

I don't know much experience with transport encryption protocol engineering. I am basically learning as I go along. Hopefully my remarks are useful.
At a high level, I think all the primitive choices here are good and optimal for Bitcoin's requirements. Well done!

ODD secp256k1 public keys MUST be used (public keys starting with 0x02). If the public key from the generated ephemeral key is an EVEN public key (starting with 0x03), its public key SHOULD be negated and then recalculated. Only using ODD public keys makes it more complex to identify the handshake based on analyzing the traffic.

Can this just say use XOnly keys as described in BIP340 instead? This requires the x-coord is even (which starts with 0x02 I think in contradiction to this statement).

Re-keying can be signaled by setting the most significant bit in the length field before encryption. A peer signaling a rekey MUST use the next key for encrypted messages AFTER the message where the signaling has been done.

What is the motivation for explicit re-keying? It adds complexity because then you have to introduce rules for how long a key should last.
You are never going to get much security against an adversary who can adaptively own someone's machine because they saw some traffic they want to know pass through it.
Having only the 1GB rule would be sufficient to me.

@real-or-random wrote:

I think the goal is that a recipient decrypting the lengths will not act as an oracle for the actual plaintexts. And this is indeed pretty clear if the lenghts are encrypted with K_1 and the actual plaintexts are encrypted with K_2. Given that hiding lengths seems to be a "best-effort" thing working against passive observers but an active attackers can learn the lengths (as we both observed in our posts), this is indeed important.

I don't understand the concept of a "recipient decrypting lengths". Recipients decrypt the length then payload and check MAC. I don't see how to use such a recipient "as an oracle" for anything in the payload if the length and payload are encrypted with the same keystream. This seems like an important point but the motivation from the SSH document is not clear on this:

Two separate cipher instances are used here so as to keep the packet
lengths confidential but not create an oracle for the packet payload
cipher by decrypting and using the packet length prior to checking
the MAC

Isn't the correct solution here just to never do this? It seems like an easy thing to avoid doing.

@ariard wrote:

And I made the same point in PR review, but actually not MACing the length seems an issue, you may decrypt to some invalid length and therefore manipulate MAC-seek?

If you manipulate the length of the MAC won't it necessarily fail anyway because the MAC is for a message of certain length?

I have a feeling that MACing the length in BOLT-8 is just a byproduct of the way noise is designed. For each noise frame you must MAC it. Since you can't even find the MAC for the payload without the length of the message you have to send it first.
But everything you send has to have a MAC so you have to MAC the length and send it first.
If this is the case then it might be a good idea to avoid noise because of this superfluous MAC (depending on how it affects performance).

This is unfortunate because using something like noise could make the proposal more compelling since review of the general structure of the key exchange and transport encryption would have been done for us.
You could argue that openssh should provide the same standard of review since it is solving almost the same problem. The downside is that it wasn't designed as a re-usable "framework" so we have to guess a bit as to why there are separate keystreams for length and payload and no references to any extended discussion explaining it.

@real-or-random wrote:

So is confidentiality of the length the security goal here? When we develop protocols, we must state what security they should provide. Apparently the message length is not confidential against active attackers here. The draft here says "Two separate cipher instances are used here so as to keep the packet lengths confidential", which was copied from the OpenSSH doc.

I agree. Encrypting the length is justified here as part of a more general goal of keeping the packet data sent between two Bitcoin nodes as indistinguishable from random as possible. Tackling active attackers is a whole other ball game. It would be good if room for doing that can be left for a future BIP because they will need further research and I think this BIP is useful enough without them.

@sipa has an idea here that would be DH more indistinguishable. I'm not sure how appropriate he may think it is for BIP324 https://gist.github.com/sipa/29118d3fcfac69f9930d57433316c039

@ariard wrote:

For b) fragmentation-related DoS attacks we may add a MAC on encrypted length field generated with K_1. If length field is tampered, a probabilistic decryption to some higher range value may turn as a DoS vector, i.e allocating resources for a staling connection. Authenticating the field should prevent this behavior.

I can't imagine how probabilistic tampering of message lengths could be a DoS vector. If it was then you could just connect to someone and DoS them the same way by sending them long messages.

@real-or-random

This comment has been minimized.

Copy link

@real-or-random real-or-random commented Aug 26, 2020

@LLFourn

x-only public keys: Indeed, that's a natural choice. This draft simply predates BIP340.

re-keying: I agree, only having the 1GB rule needs less logic and avoids fingerprinting attacks etc.

I don't understand the concept of a "recipient decrypting lengths". Recipients decrypt the length then payload and check MAC. I don't see how to use such a recipient "as an oracle" for anything in the payload if the length and payload are encrypted with the same keystream.

If you use one key, some bytes in the stream are length and some are payload. For an attack you'd somehow need to the trick the recipient into believing that a byte at a payload position is a length byte. I don't see an attack either here but if things can be very subtle. In a security proof, you would exactly need to argue that the sender and recipient agree on which bytes are length and payload, and I don't think this is trivial. Using two keys makes this problem disappear and it's much easier to argue about the construction.

Two separate cipher instances are used here so as to keep the packet
lengths confidential but not create an oracle for the packet payload
cipher by decrypting and using the packet length prior to checking
the MAC

Isn't the correct solution here just to never do this? It seems like an easy thing to avoid doing.

How would you avoid it (without MACing the lengths)?

If you manipulate the length of the MAC won't it necessarily fail anyway because the MAC is for a message of certain length?

You mean "If you manipulate the length of the message"? I think your argument is correct but again, I think MACing the length gives us a protocol which is easier to analyze. Then you just don't need to think about what happens if the attacker plays around with the message lengths. MAC will fail, easy.

If this is the case then it might be a good idea to avoid noise because of this superfluous MAC (depending on how it affects performance).

Symmetric crypto is so cheap, why not just MAC everything?

Encrypting the length is justified here as part of a more general goal of keeping the packet data sent between two Bitcoin nodes as indistinguishable from random as possible

I'm not sure. Hiding message lengths can be an interesting goal on its own (but it seems hard, see above). I think this is different from censorship-resistance/having a pseudorandom byte string.

@sipa has an idea here that would be DH more indistinguishable. I'm not sure how appropriate he may think it is for BIP324 https://gist.github.com/sipa/29118d3fcfac69f9930d57433316c039

Yeah, that gist is somewhat outdated. If we want this, then the proper solution is to use Elligator squared as pointed out by one of the comments there. I think Pieter was just not aware of Elligator Squared. But after some discussions, we're converging towards making a pseudorandom byte string a non-goal. (This is somewhat inconsistent with the odd keys requirement but this one is just super cheap to do, I don't know.)

@LLFourn

This comment has been minimized.

Copy link

@LLFourn LLFourn commented Sep 3, 2020

@real-or-random

If you use one key, some bytes in the stream are length and some are payload. For an attack you'd somehow need to the trick the recipient into believing that a byte at a payload position is a length byte. I don't see an attack either here but if things can be very subtle. In a security proof, you would exactly need to argue that the sender and recipient agree on which bytes are length and payload, and I don't think this is trivial. Using two keys makes this problem disappear and it's much easier to argue about the construction.

Yes, I think that likely captures what the ssh devs were thinking.
While perhaps not trivial it does seem straightforward to prove the sender and receiver will agree on the position of the lengths. The fixed size length is at the beginning of each frame and each frame is terminated by a MAC. If the sender and receiver were to disagree on the position of any of the 3-byte lengths then they must also disagree on the length of the last payload because the length comes immediately after the last payload in terms of the keystream. However, if they disagree on the length of the last payload then the receiver must have accepted a MAC for a different payload that the sender did not send i.e. it has been forged.

Symmetric crypto is so cheap, why not just MAC everything?

I don't deal with symmetric crypto much so I don't have biases tuned to it. My general bias is against using cryptographic primitives unless they can actually be proved to do something for security (hopefully with a proof reducing to some property of the primitive). It's the same feeling I have against having to hash everything to 32 bytes in BIP340 before signing them.

I'm not sure. Hiding message lengths can be an interesting goal on its own (but it seems hard, see above). I think this is different from censorship-resistance/having a pseudorandom byte string.

I think I'm convinced that encrypting the length does close to nothing. I don't think it's quite nothing since it at least makes it harder to see how many frames were involved in a particular burst of communication and therefore what kind of messages were being sent.

Yeah, that gist is somewhat outdated. If we want this, then the proper solution is to use Elligator squared as pointed out by one of the comments there. I think Pieter was just not aware of Elligator Squared. But after some discussions, we're converging towards making a pseudorandom byte string a non-goal. (This is somewhat inconsistent with the odd keys requirement but this one is just super cheap to do, I don't know.)

Using XOnly keys is justified without reference to pseudorandomness IMO.
It's good to know that Elligator squared makes this possible with secp256k1.

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Sep 8, 2020

@ariard:

For b) fragmentation-related DoS attacks we may add a MAC on encrypted length field generated with K_1. If length field is tampered, a probabilistic decryption to some higher range value may turn as a DoS vector, i.e allocating resources for a staling connection. Authenticating the field should prevent this behavior.

The max package size is 8MB (length field is 23bits). If an attacker tempers with the length field and provokes an invalid decryption, isn't that similar to the status quo of V1 where an attacker can do the same (even with larger packets) and provokes infinite SHA256 calculations (which are on most machines pretty much the ~same level as our AEAD)?

As for a) I agree on unifying the error path for invalid MAC and oversized packets.

@ariard

This comment has been minimized.

Copy link

@ariard ariard commented Sep 9, 2020

re-keying : sounds we'll agree on sticking to 1 GB only ? Better to not rely on local clock.

From @real-or-random

You mean "If you manipulate the length of the message"? I think your argument is correct but again, I think MACing the length gives us a protocol which is easier to analyze. Then you just don't need to think about what happens if the attacker plays around with the message lengths. MAC will fail, easy.

I'm sharing the lean here. We can come up with a practical way to exploit tampering of length field, but if it's computationally cheap to do better to have it.

From @LLFourn

I think I'm convinced that encrypting the length does close to nothing. I don't think it's quite nothing since it at least makes it harder to see how many frames were involved in a particular burst of communication and therefore what kind of messages were being sent.

Letting length in plaintext is lowering the cost for any passive attacker to discover messages exchanged. I agree that's a marginal cost as of today if you assume ability to observe network layer activity from your victim. But in the future it would be beneficial if a day we work on synthetic traffic or path-aware routing.

From @jonasschnelli

The max package size is 8MB (length field is 23bits). If an attacker tempers with the length field and provokes an invalid decryption, isn't that similar to the status quo of V1 where an attacker can do the same (even with larger packets) and provokes infinite SHA256 calculations (which are on most machines pretty much the ~same level as our AEAD)?

You may have a hanging-out-connection-for-free as tampering might be lower in resources committed by an attacker compare to sending huge messages. But thinking further that's the same DoS cost for the victim than plainly dropping packets. It results as a victim has peer resources allocated for nothing. Not that much we can do if we assume network capabilities attacker, so you're right not worthy to worry further.

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Sep 10, 2020

@ariard @real-or-random

I'm sharing the lean here. We can come up with a practical way to exploit tampering of length field, but if it's computationally cheap to do better to have it.

If I get this approach right, it would require to MAC the encrypted length separately. Adding 16 bytes to each message. I'm less worried about the additional computational requirements but more about the bandwidth impact. Especially in pruned mode, almost 50% of the messages are <64 bytes. Where the 16byte additional MAC would have a significant impact on the bandwidth/traffic (probably ~>+10%).
Of course, if we have to do this, then that's it.

I still haven't seen practical attack possibilities via tempering the encrypted length field. Plus, wouldn't that also be a concern for the @openssh chacha20poly1305 AEAD? Is there a rational why they haven't MACed the length field?

I'm sorry if I repeat questions.

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Sep 10, 2020

@LLFourn

This comment has been minimized.

Copy link

@LLFourn LLFourn commented Oct 11, 2020

I stumbled across this design for "NoiseSocket" an extension to the noise protocol which is opinionated about lengths: https://raw.githubusercontent.com/noiseprotocol/noisesocket_spec/master/output/noisesocket.pdf

Note it does not MAC the length separately.

On an unrelated point, I just noticed that in BOLT-08 it makes a similar "decryption oracle" claim to the one made by the openssh devs about their double key thing.

In order to make traffic analysis more difficult, the length prefix for all encrypted Lightning messages is also encrypted. Additionally a 16-byte Poly-1305 tag is added to the encrypted length prefix in order to ensure that the packet length hasn't been modified when in-flight and also to avoid creating a decryption oracle.

https://github.com/lightningnetwork/lightning-rfc/blob/master/08-transport.md

I am either missing something basic about authenticated encryption or there is some powerful "decrption oracle" folklore embedded in people's minds around length fields.

@jesseposner

This comment has been minimized.

Copy link

@jesseposner jesseposner commented Oct 12, 2020

This blog post by Damien Miller the author of the chacha20-poly1305@openssh.com commit looks like it might shed some light on why there is no MAC for the length:

An active attacker can still play games by fiddling with the packet lengths, but doing so will reveal nothing about the packet payloads themselves - they can make the receiving end read a smaller or larger packet than intended, but the MAC will be checked (and the check will fail) before anything is decrypted or used.

So it sounds like there is no separate MAC for the length because if the length is modified then the MAC on the packet payload will fail to verify.

@real-or-random

This comment has been minimized.

Copy link

@real-or-random real-or-random commented Oct 13, 2020

So it sounds like there is no separate MAC for the length because if the length is modified then the MAC on the packet payload will fail to verify.

Yes, indeed, and this confirms our intuition.

I believe we should reach out to more people who worked on this. We're not the first to think about these subtleties and there's no need to reinvent the wheel.

@jesseposner

This comment has been minimized.

Copy link

@jesseposner jesseposner commented Oct 13, 2020

I believe we should reach out to more people who worked on this.

I will reach out to Damien Miller to see if he can provide more clarity on the "decryption oracle" claim. I will also ask if he can review this BIP for any general thoughts.

@jesseposner

This comment has been minimized.

Copy link

@jesseposner jesseposner commented Oct 14, 2020

But after some discussions, we're converging towards making a pseudorandom byte string a non-goal.

@real-or-random Do you think that making the keys pseudorandom should be a non-goal in the context of this BIP? If so, I'd be curious to hear the reasons because at first glance it seems like it would be useful.

@sipa

This comment has been minimized.

Copy link

@sipa sipa commented Oct 14, 2020

@jesseposner A reason to not care too much about pseudorandom keys is that Bitcoin P2P traffic has other ways in which it can be identified, such as traffic patterns, correlation with times that blocks are found, and almost universal usage of port 8333. Unless there is an actual plan through which these can be addressed (which seems very hard), it seems like a waste of effort to try to avoid this 1-bit leak as well.

@jesseposner

This comment has been minimized.

Copy link

@jesseposner jesseposner commented Oct 14, 2020

It might be useful to document the justification of using secp256k1. For example, Curve25519 potentially has some safety features that could be useful here. I suspect that adding a dependency outweighs any potential benefits from other curves, but it would be nice to make that explicit.

@jesseposner

This comment has been minimized.

Copy link

@jesseposner jesseposner commented Oct 14, 2020

@jesseposner A reason to not care too much about pseudorandom keys is that Bitcoin P2P traffic has other ways in which it can be identified, such as traffic patterns, correlation with times that blocks are found, and almost universal usage of port 8333. Unless there is an actual plan through which these can be addressed (which seems very hard), it seems like a waste of effort to try to avoid this 1-bit leak as well.

Ah, that makes sense. Thanks!

@jesseposner

This comment has been minimized.

Copy link

@jesseposner jesseposner commented Oct 15, 2020

Damien Miller confirmed that the reason for an independently keyed cipher instance for the length is not to prevent a known attack, but more as a risk mitigation strategy by avoiding giving the attacker access to a plaintext/ciphertext pair for the packet payload key:

Stream ciphers like chacha20 aren't subject to the same decryption oracle attacks as block ciphers like AES-CBC, but abundance of caution and all...

(He also mentioned that the rekeying might be on the conservative side for chacha20, and that the HKDF for key derivation is an improvement on SSH's KDF.)

So I think the issue can be described as follows: an attacker might learn the length, for example, with a bit-flipping attack, in which case the attacker would learn the plaintext and ciphertext of the length, and thus the length becomes a decryption oracle for the key used to encrypt the length. By using an independently-keyed cipher instance, the attacker might learn the ciphertext of the packet payload (by first learning the length), but would not learn any plaintexts for the packet payload key.

@LLFourn

This comment has been minimized.

Copy link

@LLFourn LLFourn commented Oct 16, 2020

Damien Miller confirmed that the reason for an independently keyed cipher instance for the length is not to prevent a known attack, but more as a risk mitigation strategy by avoiding giving the attacker access to a plaintext/ciphertext pair for the packet payload key

Nice! Thanks @jesseposner. That solves an important piece of the puzzle. This confirms that it was defensive engineering rather than some subtle attack that he figured out.

(He also mentioned that the rekeying might be on the conservative side for chacha20, and that the HKDF for key derivation is an improvement on SSH's KDF.)

Does anyone who has studied ChaCha20 actually recommend re-keying at all? it seems like it might be some more folklore. SSH has certainly used ciphers which must re-key like 3DES but I can't find anyone saying you should re-key ChaCha20 (after all it's a completely different design so maybe it doesn't need it?).

So I think the issue can be described as follows: an attacker might learn the length, for example, with a bit-flipping attack, in which case the attacker would learn the plaintext and ciphertext of the length, and thus the length becomes a decryption oracle for the key used to encrypt the length. By using an independently-keyed cipher instance, the attacker might learn the ciphertext of the packet payload (by first learning the length), but would not learn any plaintexts for the packet payload key.

This doesn't make sense to me. I think it sounds like leftover thinking from protecting against padding oracle attacks which he mentions. i.e. If you don't even decrypt any part of the plaintext before checking the MAC then obviously nothing can go wrong. I don't think it's so hard to see that you can get the same assurance using one key. Consider a modification to the following:

The AEAD is constructed as follows: for each packet, generate a Poly1305 key by taking the first 256 bits of ChaCha20 stream output generated using K_2, an IV consisting of the packet sequence number encoded as an LE uint64 and a ChaCha20 block counter of zero

To use only one key to decrypt a packet you just take the first 35 = 32 + 3 bytes. Use the 3 bytes to decrypt the length, then use the 32 bytes to check the MAC. If they validate you then decrypt the plaintext using the remaining bytes of the stream. Observe that these 35 bytes are always fixed at the block 0 of the the stream indexed by the packet sequence. It is guaranteed that no plaintext overlaps with these bytes since they always start after the 35th byte. Since the MAC key implicitly commits to the packet sequence you are always authenticating and decrypting in the same order as the sender intended. IMO Nothing can go wrong™

@gmaxwell

This comment has been minimized.

Copy link

@gmaxwell gmaxwell commented Oct 22, 2020

Rekeying periodically also improves forward security. E.g. say I've captured the last month of your network traffic, then I get my old trusty pipe wrench and rubber hose and show up at your facility ready to politely insist on a copy of your processes memory. Tada, now I can retroactively decrypt all that traffic. Even if you don't really care that much about the confidentiality of the traffic, someone else might and so keeping around those secrets is a kind of toxic waste that might attract unfriendly people with pipe wrenches. So as a matter of good design, it's best to not stockpile it. :)

(This is, incidentally, why by default bitcoin doesn't log IPs or most other sensitive data... its undesirable to turn your log files into something valuable to an attacker.)

@LLFourn

This comment has been minimized.

Copy link

@LLFourn LLFourn commented Oct 22, 2020

Very interesting and amusing exposition of the idea. I forgot about the forward secrecy. Is there much harm in just re-keying after every single message then? Seems like the most straightforward thing to do.

@gmaxwell

This comment has been minimized.

Copy link

@gmaxwell gmaxwell commented Oct 22, 2020

Rekeying every message would be a meaningful performance hit-- if you don't have sha-ni, I suspect it would increase cpu usage by more than double.

@gmaxwell

This comment has been minimized.

Copy link

@gmaxwell gmaxwell commented Oct 22, 2020

a dependency outweighs any potential benefits from other curves

Not just a dependency, but a strong security assumption: Bitcoin security is shot if DL security for secp256k1 doesn't hold. If another curve were used for this, security of the system would depend on secp256k1 AND the other curve. So in that sense the addition of any other curve is pretty much a strict security reduction from the perspective of the system as a whole.

So I think additional dependency and strong security assumption alone justify the choice here. Though if I had to choose parameters absent an existing choice I would likely not select ed25519 today regardless, but I don't see any need to get into that discussion here.

@LLFourn

This comment has been minimized.

Copy link

@LLFourn LLFourn commented Oct 22, 2020

Rekeying every message would be a meaningful performance hit-- if you don't have sha-ni, I suspect it would increase cpu usage by more than double.

I guess this is true if you use HKDF with HMAC-SHA256 as this proposal suggests. I was thinking of re-keying from the cipher stream itself.
This seems to be what the noise spec does: http://www.noiseprotocol.org/noise.html#cipher-functions.
Although they did not do that in BOLT 8: https://github.com/lightningnetwork/lightning-rfc/blob/master/08-transport.md#lightning-message-key-rotation

pinging @Roasbeef since he is actually mentioned in the noise document as having contributed input to the re-keying design (despite BOLT-8 doing something else).

@real-or-random

This comment has been minimized.

Copy link

@real-or-random real-or-random commented Oct 22, 2020

Thanks for bringing forward secrecy up. I remember old discussions that made me look up existing literature and it was not really satisfying, even though coming up with constructions and proofs is probably not that hard.

Now was digging again and I found https://eprint.iacr.org/2001/035.pdf which gives two simple (the obvious TM) provably-secure constructions of forward secure stateful PRGs in Sections 2.4 and 2.5. Using ChaCha20 as a PRG, the Construction in Section 2.4 is simply: define the next key as the first 256 bits of the ChaCha20 output use the rest as real PRG output.

See also Section 5, where they use this only for rekeying another symmetric encryption scheme but I believe we would not even need this.

@gmaxwell

This comment has been minimized.

Copy link

@gmaxwell gmaxwell commented Oct 22, 2020

The downside of rekeying like that would be loss of parallelism. E.g. do you need to toss out the extra keystream generated by the fact that the implementation computes N bytes at a time? (keep in mind the average message size of this protocol is a couple bytes + length + mac).

Even if parallelism considerations made it unattractive to rekey every message... it might make sense to rekey every 4096 bytes or something, which would have better leakage resistance properties while still preserving all the parallelism one could need. (sadly, lengths/macs coming from a separate stream make for some less obvious decisions -- e.g. does each stream just rekey every 4096 bytes of keystream? -- so fs duration wouldn't be equal for them and the key stream generation could just be build to build that many bytes at a time? seems reasonable.)

[or 4096 bytes minus the amount needed for the rekey, I guess]

Without someone else having done the work verifying a construction I'd feel a little sketchy about randomly using the output to rekey. It's extremely bad when a stream cipher output repeats, and IIRC one can be really confident chacha20 will not repeat because its a permutation with a counter as part of its input (and then the input readded at the end to prevent it from being reversible). If you just feed the output immediately back in, you don't obviously retain that property... maybe it's fine, but I would want to check that there aren't security proofs for it that rely on this structure.

Regardless I don't think leakage resistance is worth a non-negligible performance loss. It's a good property to be had and can be had for essentially no runtime performance loss by performing it infrequently. Improvements should preserve the fact that its nearly free. Hopefully someone else has already solved this elsewhere. .. though the fact that the openssh usage leaves so much performance on the floor for no obvious reason makes me suspect that no one has really been thinking all that hard about these things. :)

@real-or-random

This comment has been minimized.

Copy link

@real-or-random real-or-random commented Oct 22, 2020

The downside of rekeying like that would be loss of parallelism. E.g. do you need to toss out the extra keystream generated by the fact that the implementation computes N bytes at a time? (keep in mind the average message size of this protocol is a couple bytes + length + mac).

Sorry, I don't understand that question. What kind of parallelism are you referring to?

Even if parallelism considerations made it unattractive to rekey every message... it might make sense to rekey every 4096 bytes or something, which would have better leakage resistance properties while still preserving all the parallelism one could need. (sadly, lengths/macs coming from a separate stream make for some less obvious decisions -- e.g. does each stream just rekey every 4096 bytes of keystream? -- so fs duration wouldn't be equal for them and the key stream generation could just be build to build that many bytes at a time? seems reasonable.)

Indeed, you don't need to do it for every message, and it's not a big loss of security if you don't do it for every message... This is anyway only for forward secrecy. Lengths aside (I think we're not sure yet what to do with them), there's no point in rekeying the MAC except for after huge amounts of data as in the current draft. There's nothing like forward integrity.

Without someone else having done the work verifying a construction I'd be a little sketchy about randomly using the output to rekey. It's extremely bad when a stream cipher output repeats, and IIRC one can be really confident chacha20 will not repeat because its a permutation with a counter as part of its input (and then the input readded at the end to prevent it from being reversible). If you just feed the output immediately back in, you don't obviously retain that property...

Hm, I don't think there is an issue here. ChaCha20 is a PRG. This means you can use the output as a new key. (Otherwise you can build a distinguisher.) The mentioned paper confirms this intuition, without looking at a structure of ChaCha20 because it does not need to. In a sense, you don't feed the output back in. There's no "back in", you rekey with part of your input. So you simply create new instances.

What you mention (repeating keystreams) is an issue if nonces are repeated with the same key, because then the keystream repeats.. AFAIU all of this is relevant only close to the birthday bound. If you have a permutation such as the ChaCha20 block function and use it in counter mode, then yes, it will repeat only after 2^noncebits. If you have a PRG that rekeys itself, you'll except a loop after 2^keybits, which should not be an issue here.

But I admit that I'm not an expert for symmetric crypto, and it would be helpful to have one. I wonder why forward secure ciphers are the obvious thing everyone uses if they're so easy to build. (I can reach out easily to at least symmetric crypto expert, and I'm also willing to ask this on stackexchange, if it's just for my own curiosity.)

Regardless I don't think leakage resistance is worth a non-negligible performance loss. It's a good property to be had and can be had for essentially no runtime performance loss by performing it infrequently.

Wait, what is leakage resistance? I thought this is about forward secrecy. Are are you talking about different attacks? Just checking if we have the same thing in mind.

Improvements should preserve the fact that its nearly free. Hopefully someone else has already solved this elsewhere. .. though the fact that the openssh usage leaves so much performance on the floor for no obvious reason makes me suspect that no one has really been thinking all that hard about these things. :)

Well you could say the same thing about the use of RFC6979 in libsecp256k1. Symmetric crypto is still very cheap in the end. Sometimes performance is just good enough and people simply don't care.

@gmaxwell

This comment has been minimized.

Copy link

@gmaxwell gmaxwell commented Oct 22, 2020

The situation for 6979 isn't the same-- a big part of the point of it is so you can compare/audit implementations. So there is value in using the 'standard' construct even though it has horrific performance, and the horrific performance of RFC6979 is because they wanted to define something that was already a FIPS certified random bit generator. :)

Sorry, I don't understand that question. What kind of parallelism are you referring to?

like your AVX512 letting you do 4 chacha20 permutations concurrently. Generally the sales pitch for stream ciphers is that they're embarrassingly parallel, so you can get arbitrary speedups via hardware or SIMD by just doing a bunch of them in parallel.

there's no point in rekeying the MAC except for after huge amounts of data as in the current draft. There's nothing like forward integrity.

Well if the same cipher stream is used to originate both the lengths and the mac keys you'd want to rekey it for length confidentiality.
I agree that there isn't much reason for rekeying the macs... in fact, some protocols have wanted the opposite of forward integrity: things like OTR leak the mac keys after messages are accepted to make it easier to forge transcripts. (not a concern for this protocol, but just mentioning it)

What you mention (repeating keystreams) is an issue if nonces are repeated with the same key, because then the keystream repeats.. AFAIU all of this is relevant only close to the birthday bound. If you have a permutation such as the ChaCha20 block function and use it in counter mode, then yes, it will repeat only after 2^noncebits.

I agree, but I just wouldn't presume that there isn't some security proof that cares about the difference between the expected time until repeat vs the worst case.

Wait, what is leakage resistance? I thought this is about forward secrecy. Are are you talking about different attacks? Just checking if we have the same thing in mind.

Sorry, :) I think I've seen the same property described as leakage resistance or backtracking resistance in CSPRNGs. I'm not sure that it's best to call this forward secrecy (which is what it is) because that term will be reliably confused with stuff like connection level PFS ciphersuites (I think it's kind of absurd that all protocols aren't PFS normally at least at a "session" level...) :)

@sipa

This comment has been minimized.

Copy link

@sipa sipa commented Oct 22, 2020

So specifically for ChaCha20 the idea of using key stream output as new key is pretty appealing, as ChaCha20 has zero key setup cost. This would be very different for AES based constructions, which have a relatively high preprocessing step to expand the key first. So rekeying I think it just literally generating 32 bytes extra ChaCha output from some previous stream (or a small multiple of that, because of the separate length/data streams).

If we're worried about the risk of ending up in a state that loops, I think there is an easy solution: don't reset the message counter when switching to a new key?

The big question is how frequently to do it. If there are no cryptographic arguments for doing based on the amount of data sent, and we're only changing keys to provide forward secrecy, I feel that the best criterion is probably something time-based. I don't think we care about forward secrecy better than a 1-second resolution, and rekeying once per second, in both directions, for 2 key streams, for 1000 concurrent connections, would still only translate to an additional 0.02% of CPU load (on my i7-7820HQ).

I'm not actually advocating for rekeying every second, just saying that even at that short interval, it's likely negligible from a performance standpoint.

@gmaxwell

This comment has been minimized.

Copy link

@gmaxwell gmaxwell commented Oct 22, 2020

The nice thing about doing it byte based (really, permutation invocation count based) is it could all be pipelined and you don't need any signalling overhead. It also avoids questions like what do you do with a non-conforming counterparty that never rekeys. I totally agree that the timeframes you're discussing are fine ... I also though an hours timeframe was fine for that matter. :)

With byte-based you could define the function as generating 64 blocks of output (4096 bytes), you steal the first or last 32 bytes to key the next group, and output 4064 bytes of random key-stream. Being able to do 64 blocks in parallel even without pipelining should be plenty for any future simd or hardware support... and would achieve 1 second rekeying for 32kbps streams. (I'm not too fixed on 4096-- it just seemed like a reasonable approximation of 'big' for this purpose).

The slowdown is just 4064 vs 4096, 0.78% of time.

If the same construct were used on the stream that generated length encryption it wouldn't have the same forward secrecy speed (and would take a lot longer if messages are large) but I don't think that is an issue.

The idea of just letting the counter continue also did occur to me, I just didn't want to go too far down the rabbit hole. +1 on that.

@real-or-random

This comment has been minimized.

Copy link

@real-or-random real-or-random commented Oct 23, 2020

I just rediscovered https://blog.cr.yp.to/20170723-random.html , which is a long blog post covering exactly this topic (with respect to PRGs, not to encryption itself, where this seems not common for whatever reason.)

The nice thing about doing it byte based (really, permutation invocation count based) is it could all be pipelined and you don't need any signalling overhead. It also avoids questions like what do you do with a non-conforming counterparty that never rekeys. I totally agree that the timeframes you're discussing are fine ... I also though an hours timeframe was fine for that matter. :)

I really don't know enough about the P2P protocol to have an idea about this: Is it likely that peers have connections where they don't send enough data for hours or days so that rekeying would not occur? I don't claim that this is an issue. The attacker can at most look ~ 4096 bytes "in the past". But I still wonder.

@sipa

This comment has been minimized.

Copy link

@sipa sipa commented Oct 23, 2020

I really don't know enough about the P2P protocol to have an idea about this: Is it likely that peers have connections where they don't send enough data for hours or days so that rekeying would not occur? I don't claim that this is an issue. The attacker can at most look ~ 4096 bytes "in the past". But I still wonder.

Assuming a ping/pong message every 20 minutes (the timeout interval for post-BIP31 connections), you'd reach 4096 bytes in just over 24 hours.

@gmaxwell

This comment has been minimized.

Copy link

@gmaxwell gmaxwell commented Oct 23, 2020

Of course, if your traffic is only ping/pong messages ... that isn't very interesting. If you want faster forgetting, you can ping/pong (or otherwise send padding traffic) at a higher rate. For 4064 bytes a forgetting time of 1 minute requires 541 bits per second.

A smaller interval would probably be fine too-- the widest implementation I can find discussed (for AVX512) is 256 bytes wide... going further for future SIMD hardware is probably a reasonable idea, 64-way is probably a bit overkill it just seemed roughly reasonable to me.

It's also worth mentioning that an implementation can have more forward security than implied by the rekey interval: Just generate the key stream one re-key interval ahead and throw the key stream out as you use it, giving you effectively bit-at-a-time forward security and then the only requirement on the rekey interval is that it's short enough that the memory required for a key-ahead buffer per-peer isn't burdensome.

Edit: Ah, I see the DJB link also suggests the compute ahead and erase approach and actually a pretty similar construction to what we were discussing here.

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Oct 30, 2020

Did a minor update of the BIP. Changes "command" to "message-type" as well as "command short id" to "message-type-ID".
Added the BIP157 message-type-IDs.

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Nov 13, 2020

If we would change the AEAD to byte based (really, permutation invocation count based [read more in the last comments]), ... would not hashing the new key (coming from the key stream of the current chacha20 instance) with the session-ID (a HKDF-hashed output of the ECDH-secret) result in being more prone to chacha20 key extraction attacks?

@sipa

This comment has been minimized.

Copy link

@sipa sipa commented Nov 14, 2020

So I think there are a number of different reasons why rekeying may be desirable and/or is done in real protocols:

  • The underlying stream cipher may have limitations on how much data can be securely encrypted with the same key. I believe ChaCha20 has no such limitation, except that the counter can't wrap around (within the same key/IV). With a 64-bit counter it won't ever wrap around in practice (needs a zebibyte).
  • Providing forward security (if an attacker gets access to the cipher key, he can't decrypt historical encrypted data he may have intercepted). Any kind of rekeying is sufficient for this, including just using the ChaCha20 output.
  • Providing "backward" security (not sure if that's the right term, but I mean guaranteeing that an attacker who gets access to the stream key at one point in time, can't use that to decrypt all future data in the stream). I'd say there are two variants here:
    • Situations where the attacker has access to the node's internal secret state and gets to read the session key (and other keys), but then loses this access. Dealing with that requires redoing DH, which is annoying, but it's also a weird security model: how did the attacker lose their access. Furthermore, they may have gotten access to your private (wallet) keys as well, or even have overwritten your binary. If we only want this on a sufficiently long timescale, there is a simpler solution: just disconnect and reconnect. OpenSSH can't do this, but for us on a scale of days or more probably isn't a problem.
    • Situations where the attacker manages to extract the key from just observing the ciphertext. I'm not sure that's worth dealing with, as my understanding is that this is effectively a break of ChaCha20. It is indeed resolved by rekeying using a non-exposed session key, but if an attack on ChaCha20 is known, we should probably just switch to a different stream cipher, rather than patching it up with an infrequent session hash.

So concretely, here is what I suggest:

  • We define 4 stream ciphers from the session, 2 in each direction:
    • One for per-message metadata keys (poly1305 authentication keys, length encryption), called M.
    • One for variable-length data keys (payload encryption), called P.
  • Each stream cipher has its own 256-bit key (derived from the session key), a 64-bit IV (which is always 0), and a 64-bit counter (which starts at 0). Each time 4064 bytes have been produced, the next 32 bytes become the stream's next key (while the IV stays 0 and the counter keeps incrementing; it does not reset). This is transparent, and the output of the stream is just a byte stream, but the counter goes up by 64 every 4064 bytes instead of every 4096 bytes.
    • Maybe the IV can be derived from the session key too, effectively turning it into a 320-bit key. Won't add much, won't hurt.
  • Every packet draws 35 bytes from M (32 bytes for poly1305, 3 bytes for length encryption), and N bytes from the P stream (for payload encryption). Everything is byte-aligned; there is no 21 lengths per counter increment or so; just use all output bytes.
  • Add a recommendation to not keep connections open more than a week perhaps.
  • Byte-level forward security is possible by precomputing 4096 bytes of stream output, caching it, resetting the key to the final 32 bytes of the output, and then wiping the remaining 4064 bytes of cached data as it gets used.
@LLFourn

This comment has been minimized.

Copy link

@LLFourn LLFourn commented Nov 15, 2020

The nice thing about doing it byte based (really, permutation invocation count based) is it could all be pipelined and you don't need any signalling overhead. It also avoids questions like what do you do with a non-conforming counterparty that never rekeys. I totally agree that the timeframes you're discussing are fine ... I also though an hours timeframe was fine for that matter. :)

I can't see how you could do time frame rekeying without magically synchronized clocks. You either have to do byte based or message based re-keying (or explicit rekeying as originally suggested in this proposal).

We define 4 stream ciphers from the session, 2 in each direction:

  • One for per-message metadata keys (poly1305 authentication keys, length encryption), called M.
  • One for variable-length data keys (payload encryption), called P.

I like this design. I was originally skeptical of it since the motivation seemed to be that it added some kind of security. Having two black box streams and just pulling data out for each purpose is a nice idea.

Each stream cipher has its own 256-bit key (derived from the session key), a 64-bit IV (which is always 0), and a 64-bit counter (which starts at 0). Each time 4064 bytes have been produced, the next 32 bytes become the stream's next key (while the IV stays 0 and the counter keeps incrementing; it does not reset). This is transparent, and the output of the stream is just a byte stream, but the counter goes up by 64 every 4064 bytes instead of every 4096 bytes.

Alternatively you could just increment the IV and reset the block counter after each 4096 byte chunk. This is aesthetically pleasing to me because then you could just make the next key always be the first 32 bytes of block=0 rather than the last 32 of a 4096 byte boundary. This is also what noise does:

Note that rekey only updates the cipherstate's k value, it doesn't reset the cipherstate's n value, so applications performing rekey must still perform a new handshake if sending 2^64 or more transport messages.

Where n referes to the nonce/IV. Though noise is talking about message based re-keying.

@real-or-random

This comment has been minimized.

Copy link

@real-or-random real-or-random commented Nov 16, 2020

I like this design. I was originally skeptical of it since the motivation seemed to be that it added some kind of security. Having two black box streams and just pulling data out for each purpose is a nice idea.

Indeed, it appears simple to reason about.

@sipa's suggestion sounds fine. We should make sure that our use of ChaCha20 can be done with a "standard API" which allows setting the key, the IV and the counter. But I think this is the case, as long as we reset the first 128 bits of the state to the fixed init string.

@LLFourn

This comment has been minimized.

Copy link

@LLFourn LLFourn commented Nov 16, 2020

We should make sure that our use of ChaCha20 can be done with a "standard API" which allows setting the key, the IV and the counter

ChaCha APIs do not always let you set the counter I think because stream ciphers per se don't have "blocks" e.g. the rust-crypto API looks like: fn new(key, nonce). It starts at block 0 and and increments the counter internally on demand. There is another API that lets you seek to a certain byte position in the stream which would make it at least possible to do @sipa's suggestion.
I think just incrementing the nonce for each new chunk and pulling out the first 4096 bytes from the more generic StreamCipher API would be a bit simpler though.

@real-or-random

This comment has been minimized.

Copy link

@real-or-random real-or-random commented Nov 17, 2020

I think just incrementing the nonce for each new chunk and pulling out the first 4096 bytes from the more generic StreamCipher API would be a bit simpler though.

Since everything else is equal, this is indeed better if you have an API that doesn't let you control the counter. And it still works with both the original and the IETF variant.

@sipa

This comment has been minimized.

Copy link

@sipa sipa commented Nov 19, 2020

Alternatively you could just increment the IV and reset the block counter after each 4096 byte chunk. This is aesthetically pleasing to me because then you could just make the next key always be the first 32 bytes of block=0 rather than the last 32 of a 4096 byte boundary. This is also what noise does:

I like the idea of resetting the counter but incrementing IV every 4064(4096) bytes. That makes the output of the rekeying stream cipher just a concatenation of 4064 bytes out of 4096-byte consecutive messages for the underlying stream cipher. (technically I don't think there is a reason to even increment the IV, as updating the key also has a progress effect - but by incrementing the IV, we guarantee that even if the key ends up in a cycle, the stream output doesn't).

It's orthogonal to the question of using the first or last 32 bytes, though. I used the last 32 bytes because it means a very-low-memory or very-low-latency implementation is possible that doesn't precompute, and doesn't need to remember the future key or compute a block twice.

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Dec 8, 2020

@sipa:

Every packet draws 35 bytes from M (32 bytes for poly1305, 3 bytes for length encryption), and N bytes from the P stream (for payload encryption). Everything is byte-aligned; there is no 21 lengths per counter increment or so; just use all output bytes.

Is there a reason why you switched the stream cipher instance for the poly1305 key?
ChaCha20-Poly1305@openssh as well as this current proposal take the poly1305 key from the payload cipher instance (and not from the length encryption cipher instance). I don't know it if makes a difference but was wondering if you had a reason for that change.

@gmaxwell

This comment has been minimized.

Copy link

@gmaxwell gmaxwell commented Dec 12, 2020

Poly and length use a fixed number of bits per messages sent while payload is variable, so changing lengths can't cause either set of bits to get misinterpreted as the wrong kind. Keeping them separate I think is much more obvious in its protection against decryption failure from acting as an oracle for the other key-stream. E.g. corrupting the length of the prior packet won't cause the receiver to use a different but highly related poly1305 key for this packet.

I think it also has a side benefit of making the usage of the two streams more equal which would make the secrecy duration more uniform in the case where the implementation isn't computing the streams ahead and forgetting the initial values.

@jonasschnelli

This comment has been minimized.

Copy link
Owner Author

@jonasschnelli jonasschnelli commented Jan 19, 2021

I have changed this document (including the test vectors) to adapt the internal 4096byte rekeying rule. Thanks for an additional review.
Implementation of that new AEAD is here: bitcoin/bitcoin#20962

@sipa

This comment has been minimized.

Copy link

@sipa sipa commented Jan 20, 2021

@jonasschnelli Cool, happy to see progress here.

As for explaining the protocol, perhaps it's clearer if it's described as a number of layers? The current flat description makes it a bit hard to see the big picture.

Layers I can think of:

  • Existing cryptographic primitives used:
    • The ChaCha20 PRF: a function from (in our usage) 256-bit key, 64-bit IV to a byte sequence (up to 2^70 bytes)
    • The Poly1305 MAC: a function from a single-use 256-bit key and a variable-length message to a 128-bit authenticator
  • The ChaCha20/forward4064 PRF: a function taking a 256-bit key and producing a byte sequence:
    • This is constructed by starting with k0=key and concatenating:
    • c0=ChaCha20(key=k0, IV=0)[0:4096]; call k1=c0[4064:4096]; output c0[0:4064]
    • c1=ChaCha20(key=k1, IV=1)[0:4096]; call k2=c1[4064:4096]; output c1[0:4064]
    • ...
  • The bitcoin cipher stream, taking as input two 256-bit keys k1 and k2 up front, and then a sequence of variable-length messages that are converted into a byte stream.
    • At initialization time, instantiate two ChaCha20/forward4064 PRFs called F (fixed, with key k1) and V (variable, with key k2).
    • To encode, for every message M with length L:
      • Read 3 bytes from F, which are XOR'ed with the 24-bit LE encoding of L, and output.
      • Read 32 bytes from F, which are used to instantiate a Poly1305 MAC P with that key
      • Feed the 3 unencrypted (or encrypted?) length bytes into P (this is the additional data part of the AEAD). Anything else? Perhaps the network magic? This part seems unspecified in the current text. We may want this padded to a multiple of 16 bytes, so that the ChaCha20 32-byte blocks align with the Poly1305 blocks.
      • For every byte B in M:
        • Read 1 byte from V, XOR it with B, and output it.
        • Feed the resulting byte (B xor V) into P.
      • Compute the 16-byte authenticator from P and output it.
    • To decode, ...
    • (this isn't actually a AEAD anymore, as there is no discernible per-message encryption/authentication, and things only make sense when described as operating on a sequence of messages. I don't think this is a problem, but it may be worth explaining. It probably also explains some of the oddities in the openssh protocol, which still can be described more or less on a per-message basis)
  • The overall protocol which consists of:
    • ECDH negotiation
    • Instantiating two cipher streams (one in each direction), using the two derived HKDF-derived keys from the ECDH output
    • Switching over to the application protocol, with a particular packet layout (each packet is command, either 1 byte or spelled out, plus payload), turning every packet into one

I think using this kind of structure makes it easier to explain the various design decisions and rationale for every step.

@LLFourn

This comment has been minimized.

Copy link

@LLFourn LLFourn commented Feb 1, 2021

It's exciting to see this new design moving along. Thanks @jonasschnelli.

I was wondering what people thought of the idea of using the ECDH + transport encryption described here for applications other than Bitcoin p2p e.g. Lightning or other Bitcoin L2 protocols. We can easily anticipate many layer 2 protocols will need an encrypted transport. The DLC group is already in need of a proposal. Although it would be possible to re-use some parts of BOLT-8 the 2-byte max frame length that noise enforces means they cannot map authenticated frames to messages as they do in lightning (DLCs require quite large messages). I've also heard this is also a bit of a nuisance for LN developers in sending large routing messages (but I don't know the details and would happy to be corrected).

The only difficulty is that these protocols will often need authentication against static public keys e.g. Lightning. Also I guess Bitcoin p2p may want authentication at some point in the future? I am not suggesting we include how authentication should work here but I was wondering what others thought it could look like. It would be nice if the spec here could be modular so the line between Bitcoin p2p and the key exchange + transport message encryption bit were very clear.

@gmaxwell

This comment has been minimized.

Copy link

@gmaxwell gmaxwell commented Feb 1, 2021

The way auth is accomplished has already been covered in other old drafts, you authenticate a hash of the session key inside the encrypted session.

@sipa and I had intended to propose a fairly novel ZKP that, in the context of bitcoin p2p like usage, would prevent an attacker from knowing if authentication was in use at all-- greatly increasing the risk of detection for large scale active observers. This avoids the problem in more traditional protocols where an evesdropper just drop the connection and let the connection re-establish without interception when they detect authentication is in use. ( https://gist.github.com/sipa/d7dcaae0419f10e5be0270fada84c20b .. I think thats the latest update on that? )

The nice thing about authenticating an encrypted session is that it fits nicely with doing whatever adaptive application supported and potentially user interactive stuff you might want to do. The down side is that it may require an additional protocol round trip-- totally irrelevant for long lived connections like in Bitcoin, but I presume that is a reason that separation of concerns isn't more common in established standards.

@gmaxwell

This comment has been minimized.

Copy link

@gmaxwell gmaxwell commented Feb 2, 2021

@LLFourn out of curiosity, are you aware of any protocol that works like this: An authentication scheme like poly1305 produces a 256-bit authentication tag and 96 bits of it are emitted in the message, but then the remaining 160 bits are fed in as the initial state of the authenticator for the next message. As a result even though the probability of accepting a false single message is the kind-of-high 1:2^96 the probability of not disconnecting at a subsiquent message becomes extremely low extremely fast.

@sipa

This comment has been minimized.

Copy link

@sipa sipa commented Feb 2, 2021

@gmaxwell Poly1305 only has 128-bit auth tags, so it isn't that much of a difference.

@LLFourn Yeah, this proposal grew out of an earlier one that added both encryption and (optional, obviously) authentication. The encryption part became this document, and the authentication part turned into the optional authentication scheme that @gmaxwell came up with originally and linked to. Have a look, it's pretty cool. Once we're a bit further here I'll try to get working on that again.

@jonasschnelli That brings me to this: I remember having discussions with @gmaxwell and @real-or-random about an ability to request protocol upgrades before the application-level stuff starts. This could be used for e.g. augmenting the protocol (say, a PQC key negotiation) but may also allow doing authentication early instead of inside the application layer. I don't remember the details, but for now it may suffice to just say: the side that is creating the connection (outbound) must start by sending an (properly encrypted, ...) empty packet (or some specific thing) to indicate "we're done negotiating". Anything else can then be reserved for future updates. Thoughts?

@LLFourn

This comment has been minimized.

Copy link

@LLFourn LLFourn commented Feb 9, 2021

@gmaxwell

The way auth is accomplished has already been covered in other old drafts, you authenticate a hash of the session key inside the encrypted session.

Seems reasonable. Although, this does leak some information over what noise does. Receiving the signature allows you to convince a third party that you (or someone) did indeed establish a connection with the party.

There is a cool trick I found reading Damgard's notes on witness indistinguishable sigma protocols [1]. Instead of proving I know secret key for X, I can prove that I know the secret key for X OR the DH shared secret. Since the verifier knows I don't know the secret key for the shared secret the verifier is convinced of the same thing (i.e. I know the secret key for X). However they cannot re-use the proof to convince anyone else because no one else could be convinced that the DH key is unknown to them.

I am not sure that this is a practical issue though.

@sipa and I had intended to propose a fairly novel ZKP that, in the context of bitcoin p2p like usage, would prevent an attacker from knowing if authentication was in use at all-- greatly increasing the risk of detection for large scale active observers. This avoids the problem in more traditional protocols where an evesdropper just drop the connection and let the connection re-establish without interception when they detect authentication is in use. ( https://gist.github.com/sipa/d7dcaae0419f10e5be0270fada84c20b .. I think thats the latest update on that? )

I think I had seen this briefly -- it looks cool. Having authenticated connections covertly probe for interception is a nice idea esp in the case of Bitcoin p2p. I wonder if it achieves the property I mentioned above against a malicious verifier just because of its structure.

The nice thing about authenticating an encrypted session is that it fits nicely with doing whatever adaptive application supported and potentially user interactive stuff you might want to do. The down side is that it may require an additional protocol round trip-- totally irrelevant for long lived connections like in Bitcoin, but I presume that is a reason that separation of concerns isn't more common in established standards.

This is what I was thinking too. Looking at what is done in the noise XK [2] that lightning uses I don't think there actually would be an extra round of communication doing in session auth with a simple signature (or WI proof).

@LLFourn out of curiosity, are you aware of any protocol that works like this: An authentication scheme like poly1305 produces a 256-bit authentication tag and 96 bits of it are emitted in the message, but then the remaining 160 bits are fed in as the initial state of the authenticator for the next message.

I haven't seen anything like this. I haven't studied how Poly1305 works at all so I have no idea what's gonna happen when you use it in a non-black-box way :)

[1] https://www.cs.au.dk/~ivan/Sigma.pdf (end of Section 7).
[2] http://www.noiseprotocol.org/noise.html#handshake-patterns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment