Skip to content

Instantly share code, notes, and snippets.

@jonasnick
Last active September 18, 2024 19:18
Show Gist options
  • Save jonasnick/6a56ae6bdc7c3a444d01817a897fbcf6 to your computer and use it in GitHub Desktop.
Save jonasnick/6a56ae6bdc7c3a444d01817a897fbcf6 to your computer and use it in GitHub Desktop.
Requirements for a nested MuSig/FROST scheme

Requirements for a nested MuSig/FROST scheme

This document summarizes requirements for a nested MuSig or nested FROST scheme. In a nested multi- or threshold-signature scheme the signers are organized in a tree instead of a multiset. It is important to note that the ideas and concepts presented in this document are a collective effort, drawing from the knowledge and expertise of various individuals.

Possible applications

  • "multisig" Lightning where a node is split into multiple sub-signers. For example, the signing policy of the node could be 2-of-2 or 2-of-3.
  • A "federation" consisting of multiple parties that jointly operate a Lightning node (e.g. Fedimint). Here we'd have a t-of-n signing policy for potentially large t and n. Also we have that t > 2/3*n because the federation would presumably run some asynchronous Byzantine failure tolerant consensus protocol.

Since all applications have trees of height 2, we use the term "multisig signers" to refer to the signers in the deepest level of the tree unless noted otherwise.

Assumption: we want to have liveness even if some of the multisig signers are offline or corrupted

Otherwise, it wouldn't be very attractive to run a multisig Lightning node as it decreases the availability compared to the singlesig case. In the case of a n-of-n signing policy for a multisig Lightning node, this requirement implies that the signers do not only have backups but also reliable failover. T-of-n policies will benefit from using ROAST for robustness.

Does the signing tree ever exceed height 2?

For all known applications, the signing tree has height two: there's a 2-of-2 on the top level and each of the two keys can be some t-of-n aggregate. Would it useful if each of those keys is again an aggregate? If not, the nesting scheme and security proof can be simplified.

Does Lightning need concurrent signing or is sequential signing enough?

Sequential signing is when there's always only one nonce "in-flight" for the signer, i.e., the signer does not broadcast a new nonce before having signed with an already broadcasted nonce. Sequential signing corresponds to the following sequence:

nonce <- NonceGen
Broadcast nonce
sig <- Sign(seckey, nonce, ...)
nonce <- NonceGen
Broadcast nonce
sig <- Sign(seckey, nonce, ...)
...

Concurrent signing is when there are multiple nonces in-flight, for example:

nonce1 <- NonceGen
Broadcast nonce
nonce2 <- NonceGen
Broadcast nonce2
sig1 <- Sign(seckey, nonce1, ...)
sig2 <- Sign(seckey, nonce2, ...)

Concurrent signing is also in effect when the signer signs for a differently tweaked seckey with multiple in-flight nonces (if the tweak is known to the attacker). For example, the following is not sequential signing:

nonce1 <- NonceGen
Broadcast nonce1
nonce2 <- NonceGen
Broadcast nonce2
sig1 <- Sign(seckey1, nonce1, ...)
sig2 <- Sign(seckey2, nonce2, ...)

if seckey1 = tweak(seckey, tweak1) and seckey2 = tweak(seckey, tweak2). But it would be sequential signing if the seckey1 and seckey2 were each drawn independently uniformly at random. If multisig Lightning only required sequential signing, the signing scheme and security proof would be significantly simpler.

Special Requirement in Lightning: Delayed Signing

Lightning has an additional important requirement: After sending the signature nonce of the multisig to the channel counterparty and the counterparty replies with a partial signature, the multisig signers store the partial signature and may only complete it much later. At that later point, the local signers must be able to complete the signature. We can't generally guarantee this in FROST because one of the multisig signers who contributed to the nonce can just refuse to sign. N-of-n MuSig with good backups/failover obviously does not suffer from this problem. Delayed signing may not be a requirement in eltoo.

It is not possible that the signers immediately complete the partial signature from the counterparty and store it until needed. The reason is that this signature is only needed in some circumstances and can lead to loss of funds when used in the wrong circumstances. In particular, a malicious multisig signer could use the completed signature to broadcast a revoked state.

On the other hand, the multisig signers' must always be able to complete the signature. They cannot just "close the channel" if that fails. One could argue that in this regard multisig-LN is more fragile than singlesig, for which it is already difficult enough to broadcast transactions in a timely manner.

There are, however, a couple of potential solutions to this problem:

  • It would be possible to use distributed key generation (DKG) for the nonce such that t-of-n can reconstruct the nonce. This would then be similar to the classic Stinson & Strobl (7 round?) threshold signature protocol. At nonce generation time it requires that all signers are online and a secure broadcast channel (a.k.a reliable broadcast) between them.
  • Another possibility is to have the signers verifiably encrypt their nonces to each other. Assume a 2-of-3 where one signer is offline at nonce generation and signing time. The two online signers send an encrypted secret nonce to the other online signer, such that the ciphertext can be decrypted by the offline signer and can be verified to be the encryption of the sending signer's public partial nonce. The online signers ACK the channel counterparty's signature only when they've received the (valid) encrypted nonce. Once the offline signer comes online and one of the online signers goes down, the previously offline signer receives the ciphertext from the remaining functioning online signer and decrypts it. This allows them to produce a signature. The communcation complexity of this approach grows exponentially with larger t and n. There are a couple of possible options for verifiable encryption:
    • Classic cut-and-choose which is rather inefficient (maybe ~100ms per verifiable encryption for prover and verifier).
    • Batched cut-and-choose: 100 verifiable encryptions at the 128-bit security level it takes around 750ms for the prover and 250ms for the verifier
    • Juggling: This could require very roughly and unoptimized about 256*( El-Gamal-{encryption,decryption}, rangeproof{,-verify}, signature{,-verify}, scalar_mul) for the prover {verifier}. Benchmarks indicate 138ms for each verifiable encryption, which would be very slow.
  • It may be possible to ignore the delayed signing requirement by having the multisig signers sign immediately when receiving the partial signature from the counterparty but don't output the plain signature. Instead they encrypt the signature to the t-of-n while signing. To do this, the multisig signers would create an adaptor signature that can be combined with the counterparty's partial signature after adapting it with the discrete log t of some adaptor T. This adaptor would need to be created through some distributed key generation protocol that will allow t-of-n multisig signers to recover t. But then it appears as if there's no benefit of this technique over running a DKG for the nonce.
  • Another option for a t-of-n is to send all partial nonces from all n signers to the other side of the LN channel, who replies with one signature for each subset of t partial nonces, (requiring n choose t many signatures in total). Then every subset of t signers has a signature it can complete at a later time. This requires all signers to be online at nonce generation time. Besides the possible blowup of communication complexity for large n, the downside is that this protocol would "leak into the state machine": the other side of the channel must be aware of t and n.
  • If we accept that the protocol leaks into the Lightning state machine, then another option is to replace the t-of-n FROST with a n-of-n MuSig but additionally commit to a backup script path that allows spending with the t-of-n. But this is far from simple and "requires significantly extending the Lightning protocol".

The inconvenience of having all signers be present at nonce generation time can be mitigated by pregenerating nonces during key setup. Moreover, since the signers are only going to produce one delayed signature per channel, it should be possible to use just use a single nonce for all delayed signatures in a channel. This requires exceptional care to avoid reusing nonces.

There's a problem with above solutions (including Stinson-Strobl and verifiable encryption) in a 2-of-3 that's somewhat related to the potential BFT requirement. If an honest signer A goes offline and comes online again, a malicious signer M who stayed online may refuse to relay the nonces which were committed to while A was offline (or verifiable encryptions thereof) to A. Also, M could try to send verifiable encryptions of nonces that have already been used in signing.

Would the signer set in a t-of-n multisig Lightning node be known before sending nonce?

If the signers would know which size t subset of n signers would produce a signature before generating the nonce, then the nested signing scheme would (presumably) end up being slightly simpler (because otherwise the signers in the multisig need an extra local nonce coefficient b_extra for concurrent security).

In Lightning, the answer to this question according to the Delayed Signing requirement seems to be no and the same answer holds for Fedimint.

Does the Lightning application need Byzantine failure tolerant (BFT) consensus between the multisig signers?

To see why this might be necessary, consider the following scenario where the owner of the 2-of-3 Lightning multisig wants to open a channel. To do that, one of the multisig signers proposes a transaction to the other signers. If that proposer is malicious it could create two transactions with non-overlapping inputs, send transaction 1 to cosigner 1 and transaction 2 to cosigner 2 for signing. This results in two valid, non-conflicting transactions that can end up on chain even though the owner only requested a single channel opening. This couldn't happen if the signers would run a protocol that establishes consensus on the transactions to sign.

Note that in this particular example, the signers could use some sort of deterministic coin selection to make sure that the two transactions would conflict because they share inputs. However, it's possible to imagine scenarios in LN where using the blockchain to prevent Byzantine failures does not work. For example, the malicious signer may control a different node from which it tries to send a payment through the multisig node. Perhaps the malicious signer can trick one cosigner to route it through channel 1 and the other cosigner to route through channel 2, which would result in two outbound HTLCs for just single inbound HTLC. This would be prevented if the signers had consensus on their state. Maybe there are other very subtle issues that arise if they don't.

Another way to state the problem is to look at a 2-of-3 where honest signers go offline at unfortunate times. Assume a scenario where we have honest signers A and B and malicious signer M. If A is offline for a while and comes online at the same time as B goes offline, then M can lie to A about what the latest state is. In this case, A could refuse to sign unless it can be synced to the latest state from two other signers.

If BFT consensus is necessary, then 2-of-3 doesn't work because it's impossible in an asynchronous setting. For f faulty signers need n=3*f+1 and t=f+1, so for f=1 we need 2-of-4 threshold signing. But it could simplify dealing with the previous requirements (e.g., BFT consensus somewhat simplifies the DKG)?

Note: 2-of-3 can be set up much simpler than other t-of-n policies

2-of-3 is a special case where we can use MuSig instead of FROST and thereby avoid distributed key generation (H/T @real-or-random).

This works by starting the key setup process with regular MuSig key aggregation between the three signers. Assume that the signers are numbered 0, 1, 2. Every signer i sends their signing key to signer i + 1 mod 3. When signer i receives the signing key from i - 1 mod 3 it sends an ACK to i + 1 mod 3. The signer is ready to sign once it receives the signing key and an ACK from i - 1 mod 3. After this key setup, every subset of two signers can sign for the 3-of-3 multisig because the signing key of the missing signer is known to one signer in the subset.

@ariard
Copy link

ariard commented Feb 13, 2023

  • Lightning: {1, 2-of-2}-of-2
    • requires good backups/failover

I think you can have a dissociation between backup: i.e the primary device crash hosting the 1-of-2, there is an assumption the user will take manual recovery action (e.g re-derive the key from a seed backup on a new mobile) from failover, where there is an assumption under a threshold of faults/corruptions), the off-chain system can keep going without external interruption of operational continuity (e.g for a Lightning routing node HTLC forwarding). And still, you might have recovery flow blurring the line, such as "cloud node", where the Lightning node is hosted on a third-party infrastructure, e.g there is an encrypted backup of the secret key which is delivered to the end-user on a "human-friendly" password ?

  • Lightning: {1, 2-of-3}-of-2
    • needs multiple b's on the FROST layer, or no concurrency
    • also need to know who is signing?
    • Simple 2-of-3 DKG: 3-of-3 MuSig, share your own key with right neighbor

I think if your consider latency performance of the threshold/multisig scheme for the enteprise node use-case, not only concurrency
but also robustness as defined by ROAST matters. As you would like at least one of the concurrent session to outcome a valid signature
if the signer threshold is met. Now latency of the signing scheme itself might not matter as the worst-performance bottleneck might your
network connection between your signing share servers. For multiple backups on the FROST layer, assuming a mobile deployment, it's unclear where those redundant backups are laying, like if the primary device is offline this could strongly imply the user does not avail network connectivity ?

To determinate who is signing, I think a simple deployment can envision to have the entity performing the signature aggregtor role electing the set of signer in a client-server fashion. A more advanced deployment could be to hedge the election server fault could be to rely on consensus algorithms such as Raft/Honey Badger.

I don't know if there is an unified security proof existent for aggregated key nesting (e.g nesting a FROST 2-of-3 in a 3-of-3) ? IIRC, the MuSig2 libsecp256k1-zkp implem with the two-points nonce variant is proven under the AGM model and FROST scheme presents proofs only under the DL assumption only.

  • Larger Federation operating a Lightning node (e.g. Fedimint): {t-of-n}-of-2
    • n > 3*(n - t) due to async BFT
    • Larger Federation -> wants to use ROAST

If the joint signature scheme would like to match today deployment of "plainscript" multi-sig, I think higher shares bound can be thought
of like 3-of-5 or 5-of-7, beyond I would say there is an operational complexity issue with handling signing devices hardware cycles. I think the large federation threshold as deployed for side-chains (e.g 11-of-15) have a high operational continuity disruption cost (e.g inability to move sidechain tip forward) and in this context ROAST for robustness sounds valuable. For deployment like Lightning entreprise nodes and things like Fedimint, the operational continuity disruption cost sounds less-defined ?

  • Sequential vs. Concurrent Signing
    • Signing in parallel with two different tweaks (that are known to the attacker) would also be considered "concurrent signing".
    • Maybe not necessarily needed for Lightning, maybe not needed in Fedimint

For Lightning, in that context the tweak operational and security model sounds to me to require better definition ? Like if they consume only local client source of entropies/states, or if we assume the per_commitment_point channel to be included (e.g could be a source of uniqueness for secp256k1_musig_nonce_gen's extra_input). For fedimint, if there is a Lightning HTLC-to-ecash swap, the swap invoice could be included.

I think this is reasonable assumption that the tweaks can be known, and even partially forgeable by the attacker. With Taproot, one can have a wallet where the Taproot tapleaves can be composed by trust-minimized participants.

  • Is FROST signer set known before sending nonce?
    • Lightning: a priori (excluding the Lightning requirement below), yes
    • Fedimint: no

I think the answer is yes for Lightning if we assume only channel signing operations matters. If the HTLC/offers onions signing is included, there might have more Fedimint-like flows to consider.

  • Additional Requirement in Lightning: After sending the signature nonce to the channel counterparty and the counterparty replies with a partial signature, the local node/signers stores the partial signature and may only complete it much later. At that later point, the local signers must be able to complete the signature. We can't generally guarantee this in FROST because the signer who contributed to the nonce can just refuse to sign.
    • Potential solutions:
      • Use a DKG for the nonce, aka the classic Stinson & Strobl (7 round?) threshold signature protocol
      • Send all nonces from all three local signers to the other side of the LN channel, who replies with 3 signatures (one for each nonce pair)
        • this would "leak into the state machine", the other side of the channel must be aware that our local side consists of 3 signers
        • requires that all signers to be online at noncegen time, or pregeneration a lot of nonces during key setup
      • If we accept leaking into other layers, would it be simpler to replace the 2-of-3 FROST with a 2-of-2 MuSig which commits to a backup script path that allows spending with a 2-of-2 MuSig of the remaining to subsets of the original 2-of-3?
        • "requires significantly extending the Lightning protocol"
      • Replace the 2-of-3 FROST with a 2-of-2 MuSig or 3-of-3 MuSig and make sure you have very good backups/failover?
        • "ugh"
      • Each of the two online signers sends their encrypted secret nonce to the other signer, where the ciphertext is verifiable and can be decrypted by the offline signer? Each signer ACKs the channel counterparties signature only when they've received the (valid) encrypted nonce. Once the offline signer comes online, it receives the ciphertext from the remaining functioning online signer and decrypts it to produce a signature. Verifiable encryption is maybe practical using juggling from https://arxiv.org/pdf/2007.14423v1.pdf for example (256*( El-Gamal-{encryption,decryption}, rangeproof{,-verify}, signature{,-verify}, scalar_mul) but that can be optimized.).
      • Maybe in eltoo this wouldn't be necessary?

For the classic 7-round threshold signature protocol, I think the main downside is the latency hit, if latency matters as a dimension for the Lightning enterprise node, large sidechains federation, fedimint flows. On announcing all the nonces to the other side of the LN channel, I think this interferes with any "packet-level" channel type upgrade like point-time-locked-contract or DLC-over-Lightning, and probably will be a restriction of some flows. Requiring all the signers to be online at noncegen time can be lightweighted if we batch the nonce generation on the channel-level (e.g every 1000 states with a new BOLT2-like message the state machine already strictly order the states), however if you assume backup signing offline device, with restraint access (e.g "cold storage" operational complexity), this could be too much restrictive ?

Fallbacking on a 2-of-2 MuSig which commits to a backup script path that allows spending with any combination of a n-of-n musig interferes with the Lightning state machine (and I think a lot of new latency on the network-wide propagation of HTLCs). Script path consumes more witness weights, which is a change in term of fee-bumping reserves management. So this direction sounds to me a "ugh" too. A pure n-of-n MuSig sounds to re-include in your operational/security model all the hardware faults, I think ?

I think the verifiable encrypted secret direction suffers too from a fault-tolerance downgrade if the offline signer finalizing the
signature gets down, you would like fast failover, at least I think for the large federation use-case.

I don't think eltoo changes fundamentally this "partial signature finalization" issue, as the update mechanism still assume counter-signing
by both counterparties. Or you might have weird workarounds such as assigning one eltoo state to each signer, though sounds to classify
in the "significantly extending the Lightning protocol" category ?

Does the Lightning application need BFT consensus?

If so, then 2-of-3 wouldn't work in an async setting (need 3-of-4)
Does having BFT consensus help with the previous requirements?

Depends how much latency matters for Lightning application, and the operational cost of component fallback such as the signature aggregator/producer. If yes, I think BFT consensus help with previous requirements such as the signature producer getting down, or if only client-server flows should be considered ?

@jonasnick
Copy link
Author

@ariard Thanks for the feedback and sorry that you had to read through this mess. I barely understand what I scribbled down a few months ago. I rewrote the doc to clarify a lot of the points you brought up.

To determinate who is signing, I think a simple deployment can envision to have the entity performing the signature aggregtor role electing the set of signer in a client-server fashion. A more advanced deployment could be to hedge the election server fault could be to rely on consensus algorithms such as Raft/Honey Badger.

The main question is whether it makes sense to have a simple transaction proposer because surely we want to maintain security if the proposer is malicious. This may necessarily require BFT consensus. I clarified this part in the document and added an example.

I think this is reasonable assumption that the tweaks can be known, and even partially forgeable by the attacker. With Taproot, one can have a wallet where the Taproot tapleaves can be composed by trust-minimized participants.

I updated the doc to clarify the question. It boils down to: is the same secret key (that is potentially tweaked with different tweaks) of an individual signer engaged in multiple signing sessions at the same time?

For the classic 7-round threshold signature protocol, I think the main downside is the latency hit, if latency matters as a dimension for the Lightning enterprise node, large sidechains federation, fedimint flows.

Note that Stinson-Strobel requires a secure broadcast channel (aka reliable broadcast) at signing time, which is not required by FROST. But if we need BFT consensus anyway, then we already have secure broadcast.

@ariard
Copy link

ariard commented Mar 1, 2023

Some more feedback, hoping to reconciliate mental models!

A bit of terminology (including watchtowers deployment)

  • Local signer: a logical signer providing BOLT-level cryptographic operations to the off-chain coordinator (e.g ChannelManager in LDK parlance) communicating with the channel counterparty, or to a monitor replica instance (e.g ChainMonitor in LDK parlance).

  • Internal signer: a physical device holding sensitive key material and coordinating with other internal signers to constitute a local signer (indifferently FROST or MuSig2).

  • Monitor replica: An instance of a highly-available distributed channel-monitor. It can hold its fully autonomous local signer or connect to a global local signer servicing all monitor replicas and the off-chain coordinator.

E.g, a deployment configuration could be one off-chain coordinator connected to 3 monitor replicas. Each monitor replica's local signer and the coordinator's local signer are embedding enough internal signer (e.g 3 internal signer for a signing threshold of 3) to be autonomous.

Channel uptime, forwarding latency & node availability

A Lightning node servicing HTLC forwarding requests on a wide set of public channels could suffer from high cost in case of node/channel downtime. Operational continuity disruptions can be measured as the routing fees msat loss due to the inability to serve valid HTLC forwarding requests arising from an internal error/failure (in opposition to HTLC errors like BOLT4's unknown_next_peer).

The source of the internal error/failure can be multiple (rotten hardware/network connectivity downtime/software upgrade/malicious compromise). In this context, the internal error that matters is a buggy/compromised internal signer or a coordination fault.

A more exhaustive operational continuity definition can evaluate secondary costs such as counterparty downgrading the local node "competitiveness" for future rounds of liquidity allocation, or the local node reputation in the senders scoring algorithms.

Additionally, a Lightning node servicing HTLC forwarding requests could consider its own forwarding latency as a performance criteria. HTLC forwarding latency has been evaluated as a decision criteria included in Lightning scoring algorithms. A multi-sig scheme design with a high number of round-trips is expected to downgrade the payment UX network-wide. A multi-sig scheme deployment with a lot of latency (including in case of fault due to lack of robustness) could penalize its serviced Lightning node.

If forwarding latency is considered as a valuable dimension for the performance of a Lightning node, it seems concurrent signing with multiple "nonces" in-flight should decrease signing latency, and as such forwarding latency.

While signing round-trips and latency is not expected to be the average limiting factor of a Lightning node behind IO disk operations for channel state storage, the worst-case fault could consistute a shutdown of node availability (e.g a rotten signing device requiring a manual intervention by node operators).

The internal signers liveliness dance

A solution to the Delayed Signing requirement could be for the online internal signers to verifiably encrypt their nonce contributions to each other. Then, once the verifiable encrypted nonces have been received, the ACKs for the counterparty partial signatures can be returned to the local signer, which is transferred back to the off-chain coordinator.

A subset of internal signers go offline, as long as there is one signer remaining online with all the encrypted nonces satisfying the threshold t and sufficient internal signers are back online. If you have more than a 2-of-3 multisig, it sounds the size set of verifiable encrypted nonce must be equivalent to the remaining encrypted signers and this set must be exchanged with each online signer. If you have more than a 2-of-3 multisig, it is uncertain if you have some factorial complexity arising in the number of communication rounds.

Does the "nonce dance liveliness" changes the multi-sig security model ?

Some operational hypothesis can be formulated: there is an uncertainty of the online/offline internal signers during all the operational periods. It seems the online internal signers temporary depositary of the encrypted nonce become semi-trusted coordinators, and a simple compromise (e.g network connectivity DoS) could consistute a channel fund safety compromise. Uncapacity to produce a multi-sig under Lightning time-sensitive security model is a severe security risk.

@ariard
Copy link

ariard commented Mar 1, 2023

@jonasnick Thanks for the feedback and the re-write!

I updated the doc to clarify the question. It boils down to: is the same secret key (that is potentially tweaked with different tweaks) of an > individual signer engaged in multiple signing sessions at the same time?

I can see for operational complexity reasons (e.g simple backup or key ceremony) the same secret key duplicated in multiple signing devices, disconnected themselves (e.g monitor replicas), so I would say yes an individual signer can be engaged in multiple signing sessions ? However we can make more assumptions on what level of operational complexity are acceptable.

If the signers would know which size t subset of n signers would produce a signature before generating the nonce, then the nested
signing scheme would (presumably) end up being slightly simpler (because otherwise the signers in the multisig need an extra local
nonce coefficient b_extra for concurrent security).

Assuming the simple taproot channels flow under specification by the BOLT process, I think the size t subset of n internal signers should be known at public nonce exchange (i.e the local signer for the revocation_key encumbering the CS of next counterparty state) ? Note, I think we have a bit of flexibility here between the static point announced at channel opening and the dynamic per-commitment point.

@real-or-random
Copy link

2-of-3 is a special case where we can use MuSig instead of FROST and thereby avoid distributed key generation (H/T @real-or-random).

I wonder if we could do the same for nonces. Then, if we want to avoid BFT/broadcast channel for the nonce round, that would probably mean that we simply need to stall when one signer is offline (or sends wrong invalid messages). But maybe that's enough. We'd just need to guarantee that have preshared enough properly nonces (1?) to be able to perform a cooperative close. That means during setup, the 3 signers setup keys and some "backup" nonces using the 2-of-3 trick described above, and wait for ACKs. After the setup, they can start signing. If one of the signers goes offline, the other two can't generate nonces without risking that things, but they can use the backup nonces to close the channel properly.

This is not exactly full liveness because channels will be closed as soon as one signer goes offline, but maybe it's still good enough if it makes things considerably simpler? Not that I wrote all of the above, I think things could probably even simpler if one starts with the idea of this reduced liveness requirement: If some signer A thinks that some other signer B is not available, A will simply have the right to initiate a cooperative close together with C. Then you don't even need backup nonces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment