
@kayabaNerve
Last active May 21, 2024 06:34
FCMP+SA+L

Full-Chain Membership Proofs + Spend Authorization + Linkability

This proposes an extension to FCMPs to make them a drop-in replacement for the existing CLSAG. In order to be such a replacement, the proof must handle membership (inherent to FCMPs), spend authorization, and linkability.

Terminology

  • FCMP: Full Chain Membership Proof
  • GBP: Generalized Bulletproof, a modified Bulletproof supporting Pedersen Vector Commitments in its arithmetic circuit proof. Security proofs are currently being worked on by Cypher Stack.
  • GSP: Generalized Schnorr Protocol, a proven protocol from 2009 for proving: for an m*n matrix (here m = 2, n = 3)
    [
        [A, B, C],
        [D, E, F]
    ]

    and an n-length row
    [x, y, z]
    
    The m-length output column:
    [
      T,
      U,
    ]
    
    is
    [
      xA + yB + zC,
      xD + yE + zF,
    ]
    
    This is useful to prove knowledge of the opening of a vector commitment and consistency between vector commitments.
  • BP+: Bulletproof+. A weighted-inner-product proof currently in use in Monero.
  • F-S: Forward secret. An adversary who can solve the elliptic curve discrete log problem cannot find the spent output.

Recap

FCMPs, when instantiated with GBPs, are an O(n) proof of an O(log m) program (where m is the number of members in the set). The program is a Merkle tree proof, where the same layer program is executed multiple times. The layer program is summarizable as follows:

  1. Unblind the Pedersen Commitment input.
  2. Verify the unblinded variable is present in the layer being hashed.
  3. Hash the layer, outputting a new Pedersen Vector Commitment (blinding the hash for the layer).
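A minimal sketch of the three-step layer program, under heavy simplifying assumptions: it works in a toy prime-order subgroup of Z_P^* with insecure sizes instead of an elliptic curve, treats the committed value as a plain scalar (the real circuit consumes the x-coordinate of a point on the other curve of the cycle), and uses hypothetical names (`gen`, `commit`, `layer_step`) that do not come from any existing library. It only checks that the steps compose; it is not a proof system.

```python
import hashlib

# Toy group: the order-Q subgroup of Z_P^* with P = 2Q + 1 a safe prime.
# INSECURE sizes, chosen only so the algebra is easy to follow.
P, Q = 2039, 1019

def gen(label: str) -> int:
    """Hypothetical generator derivation: square a hash so it lands in the order-Q subgroup."""
    h = int.from_bytes(hashlib.sha256(label.encode()).digest(), "big") % P
    g = pow(h, 2, P)
    return g if g not in (0, 1) else gen(label + "'")

H = gen("blinding")                          # blinding generator
G_VEC = [gen(f"G_{i}") for i in range(4)]    # one generator per slot in a layer

def commit(values, blind):
    """Pedersen vector commitment sum_i values[i]*G_i + blind*H, written multiplicatively."""
    acc = pow(H, blind % Q, P)
    for g, v in zip(G_VEC, values):
        acc = acc * pow(g, v % Q, P) % P
    return acc

def layer_step(c_in, value, blind_in, layer, blind_out):
    """One execution of the layer program, checked from the prover's side."""
    # 1. Unblind: remove blind_in*H and check the remainder commits to `value`.
    if c_in * pow(H, -blind_in % Q, P) % P != commit([value], 0):
        raise ValueError("input commitment does not open to value")
    # 2. Membership: in-circuit this is the constraint prod_i (value - layer[i]) == 0.
    prod = 1
    for member in layer:
        prod = prod * (value - member) % Q
    if prod != 0:
        raise ValueError("value is not in this layer")
    # 3. Hash the layer into a fresh, blinded Pedersen vector commitment.
    return commit(layer, blind_out)

layer = [11, 22, 33, 44]
c_out = layer_step(commit([33], 5), 33, 5, layer, 7)
assert c_out == commit(layer, 7)
```

The output commitment is what the next layer up would take as its input, with `blind_out` playing the role of `blind_in` there.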

For the first layer, under Seraphis, the input would be the re-randomized squashed-enote (effectively a Pedersen Vector Commitment). The "unblinding" would remove the re-randomization and continue.

The final layer would use randomness = 0 such that the output Pedersen Vector Commitment is the tree root (alternatively, the protocol would take the difference from the tree root and require a discrete log PoK of it).

Difficulty

The naive solution would be to open K, the output key, inside the proof, then prove for the key-image inside the proof. This would require hardware wallets perform the FCMP proof, which is extremely undesirable/entirely unacceptable. Accordingly, the goal is for the membership proof to perform re-randomization without knowledge of x (the discrete log of K over G), letting a second proof prove for spend authorization and the key image.

Solution

Instead of building a Merkle tree of squashed enotes, we build a Merkle tree with elements (K, hash_to_point(K), C) where C is the amount commitment. We output re-randomized versions of all three and then prove the key image is correctly formed.

Originally Proposed Modifications for Deployment During RingCT

EDIT: These should not be moved forward with. The alternative modifications in the next section achieve much better properties and are the basis of the forward-secret variant posted below. This is only present as it was discussed below and therefore would be confusing to remove entirely.

For the first layer, we don't take in the re-randomized squashed-enote (as this is pre-Seraphis), but rather (r K, r**-1 H(K), C + a G). We denote this (K', H', C').

The first layer does not perform the traditional unblinding (addition of a G for a known a) for K', H'. Instead, it multiplies K by r (checking it's equal to K') and H' by r (checking it's equal to H(K)). We then perform the traditional unblinding of C' and check membership of (K, H(K), C).

This does not prove the spend is authorized nor that any claimed linking tag is correctly formed. It solely shows that the K', H' are derived as expected from a member of the tree.

We then extend this with a Discrete Log Equality proof for K', I over G, H'. The discrete logarithm of K' is r x, which we notate x'. x' H' expands to r x r**-1 H(K), with the r terms cancelling out for x H(K), the expected key image. This introduces linkability into the proof. By binding the DLEq proof's transcript to the signed message, it provides spend authorization as well.

Proposed Modifications

With newly introduced generators T, U, V, and ignoring the amount commitment for now:

  • K' = K + aT
  • I' = hash_to_point(K) + bU
  • B = bV

The FCMP would only perform three standard unblinds (the third being C', ignored here), and show consistency between the bU term in I' and B. Instead of performing a DLEq proof after, we'd provide a proof of the following.

For matrix:

[
  [G,  T,  0], // Open K'
  [0,  0,  V], // Open B
  [U,  0,  0], // Create xU as a hint, Z, to prove for bxU without revealing bU
  [I', 0, -Z]  // x(H(K) + bU) - b(xU) = x H(K)
]

The multi-scalar-multiplication by the consistent row of scalars:

[x, a, b]

The output is:

[
  K',
  B,
  Z,
  I
]

where Z is part of the proof data (formed as x U) and I is the key image.

This proves linkability, and again performs spend-authorization so long as the message is part of the proof's transcript.
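To make the matrix concrete, here is a sketch of a Fiat-Shamir, Schnorr-style proof for exactly this statement. Assumptions: a toy order-Q subgroup of Z_P^* stands in for the curve (so point addition is multiplication mod P and the identity 0 is written as 1), `gen` is a hypothetical hash-to-group used both for G, T, U, V and as a stand-in for hash_to_point, and `b"tx prefix hash"` is a placeholder for the signed message. It illustrates the algebra and the message binding only; it is not the proposed implementation, and the parameters are insecure.

```python
import hashlib
import secrets

# Toy group: order-Q subgroup of Z_P^* (P = 2Q + 1). INSECURE sizes, illustration only.
# Group "addition" is multiplication mod P, scalar multiplication is pow(), identity is 1.
P, Q = 2039, 1019

def gen(label: str) -> int:
    """Hypothetical hash-to-group: square a hash so it lands in the order-Q subgroup."""
    h = int.from_bytes(hashlib.sha256(label.encode()).digest(), "big") % P
    g = pow(h, 2, P)
    return g if g not in (0, 1) else gen(label + "'")

def msm(row, scalars):
    """Multi-scalar multiplication of one matrix row by the row of scalars."""
    acc = 1
    for base, k in zip(row, scalars):
        acc = acc * pow(base, k % Q, P) % P
    return acc

def challenge(msg: bytes, *elems) -> int:
    """Fiat-Shamir challenge over the message, the statement, and the nonce commitments."""
    data = msg + b"|" + ",".join(map(str, elems)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def statement(matrix, outputs):
    return [e for row in matrix for e in row] + list(outputs)

def gsp_prove(matrix, outputs, witness, msg: bytes):
    nonces = [secrets.randbelow(Q) for _ in witness]
    commitments = [msm(row, nonces) for row in matrix]
    c = challenge(msg, *statement(matrix, outputs), *commitments)
    responses = [(r + c * w) % Q for r, w in zip(nonces, witness)]
    return commitments, responses

def gsp_verify(matrix, outputs, msg: bytes, proof) -> bool:
    commitments, responses = proof
    c = challenge(msg, *statement(matrix, outputs), *commitments)
    # Each row must satisfy row . responses == nonce_commitment + c * output.
    return all(
        msm(row, responses) == nonce_commit * pow(out, c, P) % P
        for row, out, nonce_commit in zip(matrix, outputs, commitments)
    )

# The FCMP+SA+L statement, with the identity (0) written as 1.
G, T, U, V = (gen(name) for name in ("G", "T", "U", "V"))
x, a, b = (secrets.randbelow(Q - 1) + 1 for _ in range(3))
K = pow(G, x, P)
H_K = gen(f"hash_to_point({K})")   # stand-in for hash_to_point(K)
K_prime = K * pow(T, a, P) % P     # K' = K + aT
I_prime = H_K * pow(U, b, P) % P   # I' = H(K) + bU
B = pow(V, b, P)                   # B  = bV
Z = pow(U, x, P)                   # Z  = xU, the hint
I = pow(H_K, x, P)                 # I  = x H(K), the key image
matrix = [
    [G, T, 1],
    [1, 1, V],
    [U, 1, 1],
    [I_prime, 1, pow(Z, -1, P)],   # -Z
]
outputs = [K_prime, B, Z, I]
proof = gsp_prove(matrix, outputs, [x, a, b], b"tx prefix hash")
assert gsp_verify(matrix, outputs, b"tx prefix hash", proof)
```

Changing any response makes verification fail, and proving against a different message fails except with probability about 1/Q in this toy, which is how spend authorization follows once the message is in the transcript.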

Integrity

Obviously, implementation must be done correctly, and the proving system (GBPs) must be proven secure. Forging a member requires breaking the Merkle tree OR r, r**-1 consistency (for the original proposal) or bU, bV consistency (for the current proposal). The original proposal also assumes the trivial r = 0 case is banned, which it already should be by the rule against key images of identity (though we'd still explicitly prevent it).

Privacy

The privacy of the originally proposed scheme relies on it being infeasible to link K', H' to K, H(K). The most trivial way to solve this is by checking if DH(K, H(K)) == DH(K', H'), which is presumably reducible to the Decisional Diffie-Hellman problem. The Decisional Diffie-Hellman game asks if DH(A, B) == C. One who can check if two Diffie-Hellmans would be equivalent can check if a single Diffie-Hellman is equivalent via DH(A, B) == DH(C, G).
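Spelled out with A = aG, B = bG, and C = cG, the reduction in the last sentence is:

```latex
\mathrm{DH}(A, B) = abG, \qquad \mathrm{DH}(C, G) = (c \cdot 1)\,G = C
\;\Longrightarrow\;
\bigl(\mathrm{DH}(A, B) = \mathrm{DH}(C, G)\bigr) \iff C = abG
```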

The second scheme's privacy is much more clear-cut. Given the commitment bV, one must calculate bU, which is the Computational Diffie-Hellman problem (with the caveat Monero key images are currently considered private under the weaker Decisional Diffie-Hellman problem, so this being stronger is irrelevant).

Features

Not only would this introduce FCMPs to Monero, it would also enable transaction chaining and post-proved membership if we don't bind the membership proof into the secondary proof's (whether DLEq or GSP) transcript.

Relation to Seraphis

Seraphis offers more efficient FCMPs, should provide forward privacy, and moves from hacked-together formal properties inherent to RingCT to formalized-from-the-start properties.

Unfortunately, Seraphis has been estimated as 2 to 5 years out by UkoeHB, with inclusion of FCMPs making it closer to 5 years. I disagree with that estimate, yet jberman estimated 3 years themselves. The goal of this is to solve the privacy issues solvable under RingCT, buying time to do better regarding design, performance/scalability, and the unsolvable issues (forward privacy). This may also be of independent merit.

Performance

I previously estimated Full-Chain Membership Proofs to be 35ms in a batch of 10. This was evaluated over the "pasta curves" with Bulletproofs+. The arithmetic-circuit argument for Bulletproofs, which we are currently using (or rather, Generalized Bulletproofs, a derivative of it), should be twice as fast due to the relation to the underlying proof. This would mean ~18ms within a batch of 10.

Generalized Bulletproofs also notably reduces the amount of gates we use in circuit. Prior, we spent several gates entering the divisor into the circuit (a bunch of numbers which had to be entered somehow). Now, with Generalized Bulletproofs, we can simply provide an additional Pedersen Vector Commitment (32 bytes) to enter the divisor. This slashes the size of the circuit.

By adding SA+L, we'll have the additional performance costs within the first layer of:

  1. Proving for a tri-set membership.
  2. Doing multiple unblinds.
  3. Proving consistency of bU, bV (or r, r**-1 for the original proposal).

I'm expecting this makes the first layer ~3x more expensive.

So we go from 1+1+1+1 at 35ms, to 1+1+1+1 at 18ms, to 3+1+1+1 (~31ms). That's ignoring the reduction in gates offered by GBPs.

Steps and Timeline

There are several different tracks along which we can move, and we should move along in parallel.

Implementation

  1. Implement GBPs.

This has already been done and solely needs optimizations performed.

  1. Implement an amenable framework for building arithmetic circuits.

This has technically been done, yet needs some love.

  1. Implement a library for elliptic curve divisors.

Technically done, yet with an edge case that should be fixed.

  1. Implement the various gadgets (set membership, index within set, PoK of DLog, proof of DLog).

This was done as needed for FCMPs, not as needed for FCMP+SA+L which has some distinctions (multi-set membership, which is done via getting the index of each item within each set and checking equality, and proof of DLog).

  1. Implement the FCMP+SA+L circuit.

This was done for FCMPs yet not FCMP+SA+L.

  1. Implement the secondary proof.

This isn't too notable, with a polished implementation hopefully within a few days of work for the second case. The first case is already implemented.

  1. Implement the necessary towering curves.

This is trivial to do over a generic integer library, yet performance effectively requires a dedicated implementation be done.

  1. Integrate into Monero.

Formal Security

  1. Formally prove the soundness, completeness, and zero-knowledge properties of GBPs.

We can do this now. Diego claimed a deliverable, or lack thereof if a failure occurs, could happen in one month.

  1. Have the divisor technique, previously published by Eagen, reviewed.

This is possible now.

  1. Formally prove the correctness of the "gadgets", as logical.

This is possible with just a formal description of the circuit, and the gadgets contained within. The Curve Trees paper did specify their gadgets, yet we have a notably different construction due to the usage of divisors and the extensions proposed here.

  1. Formally prove the correctness of the secondary proof.

The first proposal uses a DLEq which has existed since Chaum-Pedersen's 1993 work. The second uses a proof I frequently see referred to as a "Generalized Schnorr Protocol", despite my personal dislike for the ascription. I wouldn't be surprised if it was already proven.

Practical Security

  1. Audit the implementation of GBPs.

This is possible once the further optimizations are done, and the formal proofs exist.

  1. Audit the circuit framework.

This is possible once the framework is cleaned to a point we're happy with.

  1. Audit the EC divisor library.

This is possible as soon as the standing edge case is resolved.

  1. Audit the implementation of the gadgets.

This should be trivial and possible as soon as the gadgets are formally proven.

  1. Audit the implementations of the towering curves.

This is possible as soon as the curves are implemented. If we implement them with a constant-time integer library, we'd solely audit that.

  1. Audit the circuit.

  2. Audit the secondary proof.

This could be audited as soon as it's implemented, which again, should be trivial to implement.

  1. Audit the integration.

Contribution to Seraphis

Creation of FCMP+SA+L effectively does all the work necessary for FCMPs with Seraphis with regards to FCMPs themselves. It'd provide the proving system, gadgets, review/audits, and most of the composition work. The only additions would be the curve cycle libraries (if we switched curves), the cross-group DLEq we'd move commitments with, and the new integration (into Seraphis instead of into RingCT).

Steps Forward

We'd have to agree on which proposal to move forward with, if we were to move forward.

While the first proposal was the one originally posited, it remains here solely for historic reasons. It appears to perform one less PoK of DLog, making it faster, yet it requires using two private points as generators within Proof of (Knowledge of) DLog statements. Such proofs are much more performant when the generator being used is public.

The second proposal also has the benefit of a much stronger argument for privacy, and uses additive randomness (instead of multiplicative).

We'd then need various implementation work done. I can personally implement the curve libraries using crypto-bigint, yet I cannot implement tailored implementations as likely needed to maintain the performance target. Someone else would need to step up.

I believe I can have all the implementation done, besides integration into Monero, within a few months. The actual integration into Monero would be:

  1. Having Monero build the tree.
  2. Having wallet2 call prove via the Rust library.
  3. Adding RPC routes to get the current tree root and the path of a member of the tree (where the DSA would be used to request multiple paths as to not reveal the output being spent).
  4. Having monerod call verify via the Rust library.

There'd be no wallet/address/privacy set migration. All of the additional curves would be contained within the proof.

I cannot estimate/guarantee the timeline for the formal analysis side of things. I'd hope by clearly defining components, the rate at which review occurs is not bottlenecked by the rate at which development occurs. That doesn't change that we'd have to efficiently manage the timeline and definition of components. I'd hope for, in the best case, development + review/audits of the building blocks in 3-6 months, targeting deployment in ~12 months. Beyond myself, the only development necessary is of dedicated curve implementations (towering curves having already been found by tevador) and the integration into Monero (which jberman already started for FCMPs into Seraphis).

@kayabaNerve
Author

ACK. Will discuss at aforementioned meetings and see how other developers/researchers feel.

@kayabaNerve
Author

@tevador We still need permissibility for the output of the hash function, unless we lose collision resistance for branch hashes, yet that should be DoS-mitigated as the hash is by branch (no one output can cause thousands of operations unless the only item in its batch).

@tevador

tevador commented May 4, 2024

we lose collision resistance

Do we?

Let's take the simplest case of a binary tree and Pedersen hash H(a, b) = x(H_0 + a H_1 + b H_2), where x() means we only take the x-coordinate of the point as the hash result.

Suppose that someone can find an algorithm that takes (H_0, H_1, H_2) as input and produces two different tuples (a, b) and (a', b') such that H(a, b) = H(a', b').

We can turn that algorithm into a discrete log oracle because:

  1. If a H_1 + b H_2 = a' H_1 + b' H_2, then H_1 = (b' - b)/(a - a') H_2.
  2. If H_0 + a H_1 + b H_2 = - H_0 - a' H_1 - b' H_2, then H_0 = -(a + a')/2 H_1 - (b + b')/2 H_2 and we still get a discrete log oracle if we input for example H_2 = r H_1.

So finding a collision with an x-only Pedersen hash should still be as hard as solving the discrete log. It's important to have H_0 != O, otherwise you get an easy collision (a, b) and (-a, -b).
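A small check of the argument above on a toy curve: y^2 = x^3 + 2x + 3 over F_1019, insecure and chosen only for illustration, with hypothetical helper names. With H_0 = O the pair (a, b), (-a, -b) always collides, since the two sums are negations of each other. With a non-identity H_0 that is not 2-torsion, the same two inputs give H_0 + S and H_0 - S for S = aH_1 + bH_2, which share an x-coordinate only if 2S = O, a case the loop skips. This does not demonstrate collision resistance in general; it only illustrates why H_0 != O matters.

```python
# Toy short-Weierstrass curve y^2 = x^3 + A*x + B over F_P. INSECURE sizes, illustration only.
P = 1019          # prime with P % 4 == 3, so a square root is one exponentiation
A, B = 2, 3
O = None          # point at infinity

def neg(pt):
    return None if pt is None else (pt[0], -pt[1] % P)

def add(p1, p2):
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                              # p2 == -p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, pt):
    """Double-and-add; a negative scalar multiplies the negated point."""
    if k < 0:
        return mul(-k, neg(pt))
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc

def find_point(x):
    """First curve point at or after x with y != 0 (so it is not 2-torsion)."""
    while True:
        rhs = (x ** 3 + A * x + B) % P
        y = pow(rhs, (P + 1) // 4, P)
        if y != 0 and y * y % P == rhs:
            return (x % P, y)
        x += 1

def x_hash(h0, h1, h2, a, b):
    """x-only Pedersen hash x(H_0 + a H_1 + b H_2)."""
    pt = add(h0, add(mul(a, h1), mul(b, h2)))
    if pt is None:
        raise ValueError("sum is the point at infinity")
    return pt[0]

H0, H1, H2 = find_point(5), find_point(100), find_point(400)
checked = 0
for a in range(1, 16):
    for b in range(1, 16):
        s = add(mul(a, H1), mul(b, H2))
        # Skip degenerate sums (identity, 2-torsion, or +-H0) so every hash is defined.
        if s is None or s[1] == 0 or s in (H0, neg(H0)):
            continue
        # Without H_0: S and -S share an x-coordinate, an easy collision.
        assert x_hash(O, H1, H2, a, b) == x_hash(O, H1, H2, -a, -b)
        # With H_0: H_0 + S and H_0 - S differ in x unless 2S = O or 2H_0 = O.
        assert x_hash(H0, H1, H2, a, b) != x_hash(H0, H1, H2, -a, -b)
        checked += 1
```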

@kayabaNerve
Author

It is (a, b) and (-a, -b) which I was concerned of. That produces outputs (x, y), (x, -y). I'm unsure the impact of having such a known collision when said -a would still need to be the hash of some branch, and finding such an opening would still be the discrete log problem? Yet it's best not to mess with it.

If you include an initialization of G_0 which is fixed with a coefficient of 1... then finding a collision explicitly falls under the discrete log problem? So ACK :) Good thinking. I believe when we pass the blinded Pedersen Commitment to the next layer up, we'd simply add this point, so it'd be done out of circuit which is quite nice. It does incur one extra addition to every hash output, but that should still be better than incurring the permissibility.

@tevador

tevador commented May 4, 2024

In the FCMP++ PDF, chapter 6.1, where you write "where all members of branches are zero by default", there should at least be a footnote explaining that this is safe because Selene and Helios don't contain a point with x = 0 (that could be used to fake the presence of a member in the tree).

For this reason, we also don't need any impermissible dummy values that are suggested in the Curve Trees paper in Appendix E.
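For a curve in short Weierstrass form y^2 = x^3 + ax + b, the x = 0 condition reduces to whether y^2 = b is solvable, i.e. whether b is zero or a square mod p. A minimal sketch using Euler's criterion, assuming short Weierstrass form; the actual Helios and Selene parameters are not reproduced here and would need to be plugged in.

```python
def has_point_with_x_zero(p: int, b: int) -> bool:
    """Does y^2 = x^3 + a*x + b over F_p (p an odd prime) have a point with x = 0?

    At x = 0 the equation becomes y^2 = b, independent of a. b = 0 gives the
    point (0, 0); otherwise Euler's criterion decides whether b is a square.
    """
    b %= p
    return b == 0 or pow(b, (p - 1) // 2, p) == 1
```

For a curve where this returns False, no point encodes as x = 0, so 0 cannot collide with a real member when used as the default branch value.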

@kayabaNerve
Author

For what it's worth, I did write out the circuit and verify the constraints pass for an honest prover. With that, we effectively have a correctness proof (not a soundness proof to anyone who is reading this without a formal background), letting us move to review/audits/formal verification. At least, we do once I update the paper with the fixes I made on the code side of things...

Re: development, it's been much smoother than expected. I was expecting notable challenges/frustrations based on my prior work on FCMPs. It was quite cumbersome to work with. While I did carry the code from it, as part of "productionizing", I did rewrite vast swaths of it (the existing code providing a reference on what works, the new code doing what works how it should be done).

Part of my thoughts on the amount of rewriting from prior work can be seen in this gist and in my CCS.

Implement an amenable framework for building arithmetic circuits.

is a milestone here, yet not in my CCS. The generic arithmetic circuit framework was fine, yet became incredibly bogged down when we added challenges/Pedersen Vector Commitments. This new work scraps it entirely for a minimal framework (enough to write circuits safely and without scratching my eyes out), then treats the PVCs as outside of the framework. This specification, only implementing what we need as we need, without a layer of abstraction around it, definitively helped. This decision was made prior to the CCS (hence it not being a milestone there).

The explicit document also helped, as it corrected a lot of development before it began. I'm hoping to be able to start work on the exposed API, and the integration side of things, in a couple weeks (letting jberman and co start on integration while I continue work on the FCMP++ side of things).

@kayabaNerve
Author

ACK re: zero. The Helioselene library I published earlier today does have such tests.

@tevador

tevador commented May 5, 2024

I ran some tests with Ristretto and unfortunately, it doesn't solve all of our problems.

Ristretto abstracts torsion away by choosing one representative from each of the 8-torsion cosets. This choice is made during serialization. However, the chosen representative is not always the point in the prime-order subgroup. For example, the base point deserialized from the ristretto representation e2f2ae0a6abc4e71a884a961c500515f58e30b6aa582dd8db6a65945e08d2d76 will have a torsion component (the y-coordinate is not equal to the canonical value of 4/5).

This has three consequences for us:

  1. In order to get the legacy point representation for hashing (to get the key image base), torsion clearing still needs to be done after deserializing K from the Ristretto format.
  2. Key images serialized in Ristretto format may still have torsion, which must be cleared to get the legacy representation.
  3. When converting a Ristretto point to Wei25519, we will still have torsion.

The first point could be overcome by redefining the key image base for post-fork output keys from Hp(legacy_repr) to Hp(ristretto_repr). This should be safe because each output will still have only one well-defined key image, so double spending will not be possible.

For the second point, torsion clearing with mul8 is unavoidable even with Ristretto.

For the third point, this is probably not a real problem because each K and C will still have only one well-defined x-coordinate in Wei25519 format. We would need to avoid any equality comparisons in Wei25519 format and use strictly Ed25519 format and Ristretto comparison (e.g. to check for identity) and only convert to Wei25519 immediately prior to hashing.

@tevador

tevador commented May 5, 2024

For the second point, torsion clearing with mul8 is unavoidable even with Ristretto.

To clarify: Torsion clearing is avoidable only if we also migrate all historical key images to the Ristretto format. It would be a bit tricky though, with the proposed change to have the same representation for both KI and -KI.

@kayabaNerve
Author

kayabaNerve commented May 5, 2024

... that may raise the new question of can we convert old key images?

If Ristretto has a random torsion component, and we have the full Ed25519 points (with sign data), within 8 additions we can presumably find its Ristretto-decoded equivalent. I don't love this idea and we'd need more specification to further comment, but I don't think it's impossible.

EDIT: As I posted this, the page refreshed and showed tevador's comment. Seems we're on the same page there.


I removed permissibility. I'll move to updating the paper to recent developments next. I'm unsure if my Helioselene library is 3 or 50x slower than dalek. My computer is unreliable for benchmarks :/ I'm getting its field arithmetic is somehow faster than dalek (despite not using a tailored impl), yet point addition is ~3x slower, yet the multiexp is ~50x slower. My honest guess is it's on the scale of 5-10x slower, but I'm only planning for a 2x performance increase to the entire proof while considering efficiency.

I truly can't say I have the experience nor interest in making a faster library. If anyone wants to do the field arithmetic in C and have me port, I'd be fine doing so. While I don't believe the learning curve for Rust on this amount of low-level code atrocious, I respect lack of interest in working on Rust (as I would a lack of interest in working on any topic, really).


We do need a test confirming Wei25519's x-coordinate of 0 does not have a valid point associated or a distinct representation for null leafs.

@tevador

tevador commented May 5, 2024

can we convert old key images?

Yes, it's definitely possible.

  1. Decode the key image into extended coordinates using the legacy method.
  2. Encode the point in extended coordinates into the Ristretto format.

I think it should be possible to tweak the Ristretto encoding function so that both P and -P get the same representation (this would only be used for key images).

I truly can't say I have the experience nor interest in making a faster library. If anyone wants to do the field arithmetic in C and have me port, I'd be fine doing so. While I don't believe the learning curve for Rust on this amount of low-level code atrocious, I respect lack of interest in working on Rust (as I would a lack of interest in working on any topic, really).

It's on my TODO list. I want to take it as an opportunity to learn Rust.

We do need a test confirming Wei25519's x-coordinate of 0 does not have a valid point associated or a distinct representation for null leafs.

Wei25519 has a point with x = 0, but we'd have to check if the point can be reached from a deserialized Ristretto point. There is a 7/8 chance it won't be reachable.

@kayabaNerve
Author

It's on my TODO list. I want to take it as an opportunity to learn Rust.

:)

Wei25519 has a point with x = 0, but we'd have to check if the point can be reached from a deserialized Ristretto point. There is a 7/8 chance it won't be reachable.

That's if we move to Ristretto, which still requires someone to step up there in a timely fashion and not be blocked by larger sentiment/time to review it. For now, I'll add a note that we use the lowest x-coordinate which cannot be used within a point for the leaves (and 0 for the branches on Helios/Selene per reasoning you noted).

@tevador

tevador commented May 6, 2024

Wei25519 has a point with x = 0, but we'd have to check if the point can be reached from a deserialized Ristretto point. There is a 7/8 chance it won't be reachable.

I confirm that the two Wei25519 points with x = 0 are unreachable, so we are lucky. Ristretto only adds a random 4-torsion element, but these two points have an order of 8*ℓ, so they can never be produced from valid Ristretto points.

@tevador

tevador commented May 6, 2024

For the third point, this is probably not a real problem because each K and C will still have only one well-defined x-coordinate in Wei25519 format. We would need to avoid any equality comparisons in Wei25519 format and use strictly Ed25519 format and Ristretto comparison (e.g. to check for identity) and only convert to Wei25519 immediately prior to hashing.

So it turns out that this is a problem. The blinded key K' can have a different torsion than the original key K in the tree, in which case a membership proof would be impossible because after subtracting the blinding from K', the result would have a different x-coordinate. The same applies to C' and C.

A naive solution to adjust the blinding until K' and K have the same Ristretto torsion would be a disaster for privacy, because it would reduce the effective anonymity set to 1/4.

The only solution I see at the moment is to redefine the leaf-layer hash as H(K, I, C) := H(4*K, I, 4*C). This is effectively torsion clearing, except it only needs two doublings instead of three (thanks to Ristretto).

Since we can't avoid torsion clearing, this raises the question if we should forget Ristretto and go with the simple mul8 solution instead.
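A toy model of the torsion arithmetic discussed here, assuming only that the Ed25519 group is cyclic of order 8ℓ (so it can be modelled additively as Z_{8ℓ}) and using ELL = 101 as a tiny stand-in for ℓ. It shows why multiplying by 4 suffices to strip the 4-torsion component that Ristretto decoding can leave, why mul8 is needed for arbitrary torsion, and why a point of order 8ℓ (such as the Wei25519 x = 0 points) can never be a torsion-free point plus 4-torsion. This is integer arithmetic, not curve code.

```python
from math import gcd

# Toy model: a cyclic group of order 8*l, written additively as Z_N.
# ELL = 101 is a tiny stand-in prime for l (the real l is about 2^252).
ELL = 101
N = 8 * ELL

def order(e: int) -> int:
    """Order of e in the additive group Z_N."""
    return N // gcd(e % N, N)

def mul(k: int, e: int) -> int:
    """Scalar multiplication k*e in Z_N."""
    return k * e % N

K = mul(8, 37)                                        # torsion-free: in the order-l subgroup
four_torsion = [mul(2 * ELL, j) for j in range(4)]    # the subgroup of order 4
eight_torsion = [mul(ELL, j) for j in range(8)]       # the full 8-torsion subgroup

assert order(K) == ELL
# Two doublings (x4) erase any 4-torsion component...
assert all(mul(4, K + t) == mul(4, K) for t in four_torsion)
# ...but not a general 8-torsion component; that needs mul8.
assert any(mul(4, K + t) != mul(4, K) for t in eight_torsion)
assert all(mul(8, K + t) == mul(8, K) for t in eight_torsion)
# Torsion-free point plus 4-torsion has order dividing 4*l, so it never has order 8*l.
assert all((4 * ELL) % order(K + t) == 0 for t in four_torsion)
assert order(1) == N
```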

@kayabaNerve
Author

I can't claim I'm happy with these torsion discussions and want to continue with them, especially if the end result would be separation of the privacy pools. While we can save a single doubling (two instances per output) without separation (so that definitely won't be the end result), I don't believe the complexity is worth it. The entire point of my torsion clearing in the document was to avoid spending days on these discussions of what's safe and what isn't and to establish a safe, clear, simple policy. While I don't want to say we shouldn't spend days on say, a 5% performance increase, I do believe this discussion is far from that.

I will say I'm happy if Wei25519's 0 x-coordinate is torsioned, as that means this new policy of torsion clearing effectively bans 0, allowing us to use 0 for null leafs. Non-0 for null is yet another such annoyance I'd like to be able to simplify out.

@kayabaNerve
Author

kayabaNerve commented May 6, 2024

@tevador For hash to point, do you have a better candidate than the IETF spec and SSWU for the mapping? I'm unsure if you had a specific research effort in mind for these curves.

@tevador

tevador commented May 6, 2024

I can't claim I'm happy with these torsion discussions and want to continue with them, especially if the end result would be separation of the privacy pools. While we can save a single doubling (two instances per output) without separation (so that definitely won't be the end result), I don't believe the complexity is worth it. The entire point of my torsion clearing in the document was to avoid spending days on these discussions of what's safe and what isn't and to establish a safe, clear, simple policy. While I don't want to say we shouldn't spend days on say, a 5% performance increase, I do believe this discussion is far from that.

The Ristretto solution was my personal investigation and I think it was important to do it before committing to a possibly suboptimal solution. Given my comments above, I think it's sensible to move forward with the original torsion clearing solution.

I will say I'm happy if Wei25519's 0 x-coordinate is torsioned, as that means this new policy of torsion clearing effectively bans 0, allowing us to use 0 for null leafs.

Correct. Torsion-cleared Wei25519 points will never have x = 0.

@kayabaNerve
Author

The Ristretto solution was my personal investigation and I think it was important to do it before committing to a possibly suboptimal solution.

Understood and respected. I participated out of belief it may produce a notably better solution. I just wanted to note my lack of active belief (which quickly started fading when we discussed promoting existing key images to Ristretto representations).

@tevador

tevador commented May 6, 2024

For hash to point, do you have a better candidate than the IETF spec (which may still be a draft? I forget if it was ever finalized) and SSWU for the mapping? I'm unsure if you had a specific research effort in mind for these curves.

Do we need hash to point for Selene and Helios? For the generators, we can use the simple method of hashing to a bitstring and trying to decompress it to a point.
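A minimal sketch of the "hash to a bitstring and try to decompress" approach (try-and-increment) on a toy curve. Assumptions: the curve y^2 = x^3 + 2x + 3 over F_1019 and the 10-bit truncation are illustrative only, and the function names are hypothetical. A real version would hash to the curve's full encoding length and reuse its actual decompression routine. The loop is not constant-time, which is acceptable for public generators but not for secret-dependent inputs.

```python
import hashlib

# Toy short-Weierstrass curve y^2 = x^3 + A*x + B over F_P (P % 4 == 3). INSECURE sizes.
P = 1019
A, B = 2, 3

def decompress(x: int, y_is_odd: bool):
    """Recover the point with this x and y parity, or None if the encoding is invalid."""
    if x >= P:
        return None
    rhs = (x ** 3 + A * x + B) % P
    y = pow(rhs, (P + 1) // 4, P)          # square root candidate since P % 4 == 3
    if y * y % P != rhs or (y == 0 and y_is_odd):
        return None
    if (y & 1) != y_is_odd:
        y = P - y                          # the other root has the opposite parity
    return (x, y)

def hash_to_generator(label: bytes):
    """Try-and-increment: hash (label, counter), try to decompress, repeat on failure."""
    counter = 0
    while True:
        digest = hashlib.sha256(label + counter.to_bytes(4, "little")).digest()
        x = int.from_bytes(digest[:2], "little") & 0x3FF   # 10 bits; values >= P are rejected
        pt = decompress(x, bool(digest[2] & 1))
        if pt is not None:
            return pt
        counter += 1

G = hash_to_generator(b"G")
```

Roughly half of all candidate x-coordinates lie on the curve, so the expected number of attempts is small.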

@kayabaNerve
Author

kayabaNerve commented May 6, 2024

Yes, for Wei25519, Helios, and Selene (though we can of course hash to a birationally equivalent curve and then map, if that is cheaper). The divisor challenges require we sample two challenge points and then evaluate the divisor over their line.

Currently, I'm randomly sampling x's and recovering y. I don't actually want to claim that's optimal.

Also, the divisor challenge is the most expensive part of the FCMP right now (barring the multiexp, which executes in batch). It requires not only these hash to points, yet also creating hundreds of scalar powers (256 for the powers of x, ~256 for the powers of x multiplied by y). It still should be a small amount of work (hampered by the current Helioselene library not meeting performance expectations), that just doesn't change its percentage of the overall work. This final proof has ended up quite compact thanks to the current usage of divisors (taking the inner-product statement from 2048 or 4096 rows to just 256).
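A sketch of the per-challenge work described above, assuming the divisor is represented as f(x, y) = a(x) - y·b(x) with coefficient vectors for a and b (the exact representation in the implementation may differ; the names here are hypothetical). Once the powers of x and of y·x are precomputed for a challenge point, evaluating a divisor there is just two inner products with its coefficients.

```python
def challenge_powers(x: int, y: int, n: int, p: int):
    """Precompute x^0..x^(n-1) and y*x^0..y*x^(n-1) modulo p for one challenge point."""
    x_pows = [1]
    for _ in range(n - 1):
        x_pows.append(x_pows[-1] * x % p)
    y_x_pows = [y * v % p for v in x_pows]
    return x_pows, y_x_pows

def eval_divisor(a_coeffs, b_coeffs, x_pows, y_x_pows, p: int) -> int:
    """Evaluate f(x, y) = a(x) - y*b(x) as inner products with the precomputed powers."""
    a_part = sum(c * v for c, v in zip(a_coeffs, x_pows))
    b_part = sum(c * v for c, v in zip(b_coeffs, y_x_pows))
    return (a_part - b_part) % p
```

With n = 256 this is the ~512 precomputed values per challenge point mentioned above, which is why sharing challenges across divisors directly reduces this cost.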

@tevador
Copy link

tevador commented May 6, 2024

The divisor challenges require we sample two challenge points and then evaluate the divisor over their line.

OK, I wasn't aware. I think we can use RFC 9380 and Simplified SWU for hashing as you suggested. Wei25519 will need torsion clearing after the mapping.

@kayabaNerve
Author

ACK, and all good.

@kayabaNerve
Author

Slight correction. We should be able to reuse challenges. That puts us at a flat two hash to points per Bulletproof.

For context on challenge reuse, the divisors are all evaluated independently (there's no ability to create one which plays off another). They just require a challenge binding to their variables. All existing hashes are drawn from one transcript of every such variable, so every challenge right now is binding to every variable. Accordingly, we should be able to collapse challenging without issue.

That should roughly halve the time the FCMP circuit takes? (FCMP circuit -> GBP verification -> batched multiexp, so halving the FCMP circuit is not halving the entire proof, just a notable part of it)

@hinto-janai

Hello, I have some questions:

Outgoing View Keys

Is it as simple as adding this key + the current incoming key for full view-only wallets to exist? How would current wallets migrate to this? They can generate a new key and calculate o*G+y*T, but wouldn't this also mean providing 2 addresses?

Forward Secrecy [...] An adversary with a discrete log oracle also cannot distinguish between an unspent non-forward-secret output and a forward-secret output. Such an adversary can only calculate what the linking tag would be if the output isn’t forward secret, and wait to see if that appears

Does this mean this adversary can assert the existence of unspent non-FCMP outputs, and when they are spent? I.e. there's 2 output sets before entering the main FCMP set?

 /------------------------\        /------------------------------------\
| unspent pre-FCMP outputs |      |                                      |
 \------------------------/       |                                      |
             |                    |        main post-FCMP outputs        |
             v                    | (unknown origin, but is known to not |
 /------------------------\       |   be one from the previous 2 sets)   |
|    post-FCMP outputs     | ---> |                                      |
|     (known origin)       |      |                                      |
 \------------------------/        \------------------------------------/

If a post-FCMP output can be identified via the pre-FCMP output's linking tag, can a post-FCMP output be determined to be a member of either the known origin or main set by seeing if a pre-FCMP output's linking tag leads to it? Not sure if this really matters, just curious.

Tree

Maybe for @j-berman:

How is the tree planned to be stored? Additional tables in the main database? For unsynced nodes, would tree operations occur alongside block downloading + verification? For synced nodes, will new post-FCMP++ nodes have to stall on startup for a bit while generating the whole tree? Would this be done as a database migration?

> If an output has a time-based timelock prior to the activation of the FCMP++ hard fork, it is converted to an estimated block-based timelock

Assuming timelocks aren't banned before FCMP++, how exactly would this be done? Convert time to blocks by dividing by 120 and add onto the current block height?

DB migration

For context, the last time migration code was touched was by mooo 5 years ago. I'm unsure if there's anyone currently willing to do it, although as far as I can tell it's relatively straightforward.

@kayabaNerve
Author

> but wouldn't this also mean providing 2 addresses?

Under my proposal, only the new address would need to be provided. They're indistinguishable and have an identical sending process. I make no guarantees my proposal would not impact the relevant JAMTIS proposal which only declares itself compatible with the current addresses.

> Does this mean this adversary can assert the existence of unspent non-FCMP outputs, and when they are spent? I.e. there's 2 output sets before entering the main FCMP set?

Sorry, but this entire section isn't really sufficiently well formed for me to reply to its questions/comments. I'd rather clarify things from the start, and if you still have questions, follow up there.

There are currently the Ring outputs and the RingCT outputs. Both of these are non-FS and continue to be non-FS.

After FCMPs, FS outputs become possible. You may still make non-FS outputs yet may also make FS outputs (if there's an address for FS outputs or a protocol for FS outputs).

A historical non-FS output can have its linking tag calculated. Once calculated, an adversary can wait for it to appear on-chain.

A FS output cannot have its linking tag calculated. It has as many linking tag discrete logarithms as scalars for the curve. If you assume an output isn't FS, setting the new T term to have a coefficient of 0, you can obtain the linking tag it'd have if it wasn't FS. You can then wait and see if that linking tag appears on chain (confirming it isn't a FS output OR another entity has a solution for the discrete log problem at time of appearance on-chain).

Accordingly, an adversary with a discrete log oracle cannot differentiate unspent non-FS outputs and FS outputs. They both have linking tag discrete logarithms recoverable if you assume the T term has a coefficient of 0. Neither will have those linking tags actually appear on-chain. You can only differentiate spent non-FS outputs, as those will have their linking tags appear on-chain.
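This reasoning can be checked in a deliberately tiny group where brute force plays the role of the discrete log oracle. The Python sketch below assumes the FCMP++ shapes O = xG + yT and L = x·Hp(O) (written multiplicatively), a toy 509-element group, and a toy Hp(); all of these are stand-ins, nothing like the real curves. It shows that assuming y = 0 predicts the true linking tag only for a non-FS output.

```python
import hashlib

# Toy group: quadratic residues mod the safe prime P = 2*Q + 1 (order Q = 509).
# Small enough that a "discrete log oracle" is plain brute force. Illustrative only.
P, Q = 1019, 509
G, T = 4, 9  # two subgroup generators; in the real protocol nobody knows dlog_G(T)

def hash_to_point(point: int) -> int:
    """Toy Hp(): hash the point, square into the subgroup, reject 0 and 1."""
    counter = 0
    while True:
        digest = hashlib.sha256(f"{point}:{counter}".encode()).digest()
        h = pow(int.from_bytes(digest, "big") % P, 2, P)
        if h not in (0, 1):
            return h
        counter += 1

def output_key(x: int, y: int) -> int:
    """O = xG + yT, written multiplicatively as G^x * T^y (y = 0 means non-FS)."""
    return pow(G, x, P) * pow(T, y, P) % P

def linking_tag(x: int, O: int) -> int:
    """L = x * Hp(O), written multiplicatively as Hp(O)^x."""
    return pow(hash_to_point(O), x, P)

def dlog_oracle(target: int, base: int = G) -> int:
    """Brute-force discrete log: the adversary's assumed capability."""
    acc = 1
    for e in range(Q):
        if acc == target:
            return e
        acc = acc * base % P
    raise ValueError("target is not in the subgroup")

def adversary_guess(O: int) -> int:
    """Assume the T coefficient is 0, recover x' = dlog_G(O), and predict the tag."""
    return linking_tag(dlog_oracle(O), O)
```

For an FS output the predicted tag is simply never published, just as an unspent non-FS output's tag is not, so the adversary only learns something once a non-FS output is actually spent.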

This assumes we don't deploy FS at a hard fork boundary, which it sounds like we will with the above JAMTIS proposal. My proposal enables it to be done whenever without new wallet protocols (not to suggest it's better than the JAMTIS proposal or is still an active candidate; just to note how these discussions formed).

@j-berman

> How is the tree planned to be stored? Additional tables in the main database?

Yep, here's a draft schema:

/* DB schema:
 *
 * Table      Key          Data                              Properties
 * -----      ---          ----                              ----------
 * leaves     leaf_idx     {O.x, I.x, C.x}                   Integer key, no dups, fixed size, sorted by `leaf_idx`
 * branches   layer_idx    [{branch_idx, branch_hash}...]    Integer key, no dups, fixed size, sorted by `layer_idx` first, `branch_idx` second
 */
  • In the branches table:
    • The largest layer_idx in the db == tree root
    • The largest layer_idx in the db should only have a single record in the db, with branch_idx == 0
    • A branch refers to the hash of a chunk of w_c elements in layer_idx-1 (or, if layer_idx == 0, in the leaves table)
    • branch_idx == element idx in layer_idx-1 / w_c
    • branch_hash is 32 bytes and is either a Helios point or a Selene point
      • even layer_idx == selene
      • odd layer_idx == helios
  • This table approach should enable efficient queries for path by leaf_idx (will know all db reads needed immediately, can make parallel reads in theory).
    • The branches table uses layer_idx as its key and [{branch_idx, branch_hash}...] as its value to optimize the same way the output_amounts table does (which is optimized for reading outputs by amount and global output ID). The primary index is layer_idx (like amount) and the secondary index is branch_idx (like output_id). Planning on doing some more perf testing on this approach as well.
  • The block header stores the tree root hash at that block, and the end leaf_idx added to the tree for that block.
  • We'll also need a new table to keep track of locked outputs sorted by unlock_time.
  • We'll also want a migration for key images also described here.
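The point that all reads for a path are known immediately can be illustrated with a short Python sketch. It is hypothetical: it assumes one chunk width for every layer (the draft's w_c, passed as `width`; real Helios and Selene layers may use different widths) and returns, per layer, the index range of the chunk that must be read to rebuild the path for a leaf. Every range depends only on leaf_idx and the tree size, so all reads could be issued in parallel.

```python
def path_reads(leaf_idx: int, num_leaves: int, width: int) -> list[tuple]:
    """Hypothetical: chunk ranges to read for a leaf's path, from the leaves up to
    (but not including) the root layer, which holds a single branch."""
    if not 0 <= leaf_idx < num_leaves:
        raise IndexError("leaf_idx out of range")
    reads = []
    # Leaves: read the whole chunk containing leaf_idx.
    chunk = leaf_idx // width
    start = chunk * width
    reads.append(("leaves", start, min(start + width, num_leaves)))
    # Layer 0 holds ceil(num_leaves / width) branches.
    layer_size = -(-num_leaves // width)
    layer_idx = 0
    # Walk up while the current layer still has more than one element (the root).
    while layer_size > 1:
        idx_in_layer = chunk                 # the branch that hashes the chunk below
        chunk = idx_in_layer // width        # the chunk of this layer containing it
        start = chunk * width
        curve = "selene" if layer_idx % 2 == 0 else "helios"  # per the draft schema
        reads.append(("branches", layer_idx, curve, start, min(start + width, layer_size)))
        layer_size = -(-layer_size // width)
        layer_idx += 1
    return reads

print(path_reads(10, 64, 4))
# [('leaves', 8, 12), ('branches', 0, 'selene', 0, 4), ('branches', 1, 'helios', 0, 4)]
```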

> For unsynced nodes, would tree operations occur alongside block downloading + verification?

Yep

> For synced nodes, will new post-FCMP++ nodes have to stall on startup for a bit while generating the whole tree? Would this be done as a database migration?

I initially figured nodes would stall too, but @kayabaNerve proposed implementing the migration as an async background task, which would only stall the node if the tree is not finished constructing at the fork height where FCMPs begin. Sounds like a solid idea to me.

> For context, the last time migration code was touched was by mooo 5 years ago. I'm unsure if there's anyone currently willing to do it, although as far as I can tell it's relatively straightforward.

I included the migration of cryptonote outputs to the tree as part of my CCS proposal :)

> Assuming timelocks aren't banned before FCMP++, how exactly would this be done? Convert time to blocks by dividing by 120 and add onto the current block height?

To clarify, the timestamp timelocks are Unix timestamps for when an output should unlock. So we can do something like: take the timestamp from some block N, extrapolate block times into the future from that starting point using 120s per block, then convert each timestamp lock to unlock at its respective extrapolated block.

Thinking on it some more... it's probably fairly straightforward to have a separate table for timestamp-locked outputs sorted by unlock_time also. Then when adding a block, check the lowest unlock_time output in the table; if the output is unlocked using get_adjusted_time at that height, remove it from the table and add it to the tree, and continue iterating, removing outputs from the table until encountering an output that should remain locked at that block's get_adjusted_time. This way we wouldn't need to do the conversion from timestamp to block.
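The table-based approach above can be sketched in Python, with a min-heap standing in for the DB table sorted by unlock_time. The class and method names are hypothetical, and the sketch assumes an output counts as unlocked once unlock_time <= adjusted time; the exact comparison and leeway in monerod may differ.

```python
import heapq

class TimestampLockedOutputs:
    """Hypothetical stand-in for the proposed table of timestamp-locked outputs,
    kept sorted by unlock_time (a min-heap plays the role of the sorted table)."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int]] = []  # (unlock_time, output_id)

    def add(self, unlock_time: int, output_id: int) -> None:
        heapq.heappush(self._heap, (unlock_time, output_id))

    def release(self, adjusted_time: int) -> list[int]:
        """Pop every output unlocked at `adjusted_time`, lowest unlock_time first,
        stopping at the first output that should remain locked."""
        released = []
        while self._heap and self._heap[0][0] <= adjusted_time:
            _, output_id = heapq.heappop(self._heap)
            released.append(output_id)  # these would be added to the tree
        return released
```

Because released outputs leave the structure for good, a later block with a smaller get_adjusted_time (tevador's point that it is not monotonic) cannot re-lock them.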

@tevador

tevador commented May 13, 2024

> Is it as simple as adding this key + the current incoming key for full view-only wallets to exist? How would current wallets migrate to this? They can generate a new key and calculate o*G+y*T, but wouldn't this also mean providing 2 addresses?

Outgoing view keys are only planned for the new Jamtis address format. Legacy CryptoNote addresses will not have outgoing view keys.

Existing wallets cannot be safely migrated to support outgoing view keys because that would change all their existing addresses.

@kayabaNerve
Author

@tevador I'd actually like to follow up with you separately on whether a wallet whose current main addresses are (sG, vG) would survive JAMTIS's backwards compatibility if malleated to (sG + yT, vG), as I originally proposed. While I hear you that existing wallets cannot simply move, I'm personally curious about developing wallet software outside a HF boundary which achieves FS in such a manner for new wallets (though I don't want said addresses to bork at the time of JAMTIS).

@tevador

tevador commented May 13, 2024

Forward-secrecy can be achieved at the time of FCMP activation with a tiny change in the sender-receiver protocol. The sender would simply add a T-term to the one-time address. This would bring immediate forward secrecy for everyone and not break existing wallets.

I do not support creating any additional new wallet types before Jamtis.

@kayabaNerve
Copy link
Author

In that case, mind documenting the exact proposed change and can we circle in @j-berman so we can push for such a change? Presumably, it's just an extra derivation off the shared secret?

Slight caveat: that achieves weaker forward secrecy, which I note now in case such thoughts also impact JAMTIS. The lack of any binomial components in the existing addresses allows breaking known addresses (whereas with such an additional term, as in my original proposal, you can identify sends to such addresses yet cannot identify their spends).

@tevador

tevador commented May 13, 2024

> Slight caveat: that achieves weaker forward secrecy, which I note now in case such thoughts also impact JAMTIS. The lack of any binomial components in the existing addresses allows breaking known addresses (whereas with such an additional term, as in my original proposal, you can identify sends to such addresses yet cannot identify their spends).

Forward secrecy always assumes that the DLP-breaking adversary doesn't know your address. That applies to both of the discussed cases. Regardless of any T-terms present in the address, the adversary can simply extract the secret view keys from the address, which makes any attempts for forward secrecy moot unless we migrate to a PQ-secure key exchange.

@kayabaNerve
Copy link
Author

kayabaNerve commented May 13, 2024 via email

@tevador

tevador commented May 13, 2024

We should not overestimate the forward-secrecy capabilities in case of leaked addresses. For example, the adversary can detect spends to known addresses (they will see a change output going to the sending wallet and a payment output going to the receiving wallet).

I want to reiterate that there is no safe way to malleate the keys of legacy addresses. Adding T-terms to one-time addresses achieves what is generally called forward-secrecy, is backwards compatible with existing addresses and forward-compatible with Jamtis. I don't see any other viable solutions.

@kayabaNerve
Author

Fair. I'll circle back to

> In that case, mind documenting the exact proposed change and can we circle in @j-berman so we can push for such a change? Presumably, it's just an extra derivation off the shared secret?

For the potential at-time-of-hard-fork solution.

@tevador

tevador commented May 13, 2024

> In that case, mind documenting the exact proposed change

It should be relatively simple.

variable   description
K_d        "key derivation" (sender-receiver shared key)
idx        output index
K_1        (sub)address public spend key
K_o        "one-time address" (output key)

Current one-time address derivation:

k_g = hash_to_scalar(K_d || idx)
K_o = K_1 + k_g G

Proposed change after the FCMP-fork:

k_g = hash_to_scalar(G || K_d || idx)
k_t = hash_to_scalar(T || K_d || idx)
K_o = K_1 + k_g G + k_t T
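A minimal Python sketch of the proposed derivation, in a toy multiplicative group rather than ed25519. The group, the byte encodings, and `hash_to_scalar` (SHA-256 reduced mod the group order) are all assumptions for illustration. It also checks that the receiver, knowing the spend secret k_1, ends up with x = k_1 + k_g on G and y = k_t on T, i.e. an output of the form xG + yT.

```python
import hashlib

# Toy group (quadratic residues mod the safe prime 1019, order 509) standing in
# for ed25519. hash_to_scalar and the byte encodings below are assumptions too.
P, Q = 1019, 509
G, T = 4, 9

def enc(point: int) -> bytes:
    return point.to_bytes(2, "little")

def hash_to_scalar(*parts: bytes) -> int:
    return int.from_bytes(hashlib.sha256(b"".join(parts)).digest(), "little") % Q

def one_time_address(K_1: int, K_d: int, idx: int) -> tuple[int, int, int]:
    """Proposed post-FCMP derivation K_o = K_1 + k_g G + k_t T (multiplicative form)."""
    i = idx.to_bytes(8, "little")               # real Monero encodes idx differently
    k_g = hash_to_scalar(enc(G), enc(K_d), i)   # G-prefixed, domain-separated
    k_t = hash_to_scalar(enc(T), enc(K_d), i)   # T-prefixed, domain-separated
    K_o = K_1 * pow(G, k_g, P) * pow(T, k_t, P) % P
    return K_o, k_g, k_t

# Receiver side: spend secret k_1, so the output opens as x = k_1 + k_g, y = k_t.
k_1 = 77
K_1 = pow(G, k_1, P)
K_d = pow(G, 200, P)
K_o, k_g, k_t = one_time_address(K_1, K_d, 0)
x, y = (k_1 + k_g) % Q, k_t
assert pow(G, x, P) * pow(T, y, P) % P == K_o
```

Prefixing G and T gives the two scalars independent hash outputs from the same shared key and index.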

@kayabaNerve
Author

We can also eliminate the burning bug while we're at it by prefixing the first input's key image (modifying the shared key derivation). I'll leave further thoughts/discussions on this to @j-berman for now.

@tevador

tevador commented May 13, 2024

> We can eliminate the burning bug while we're at it

While I'm not saying I'm against this, we need to be careful about scope creep.

@tevador

tevador commented May 13, 2024

> Thinking on it some more... it's probably fairly straightforward to have a separate table for timestamp locked outputs sorted by unlock_time also. Then when adding a block, check the lowest unlock_time output in the table; if the output is unlocked using get_adjusted_time at that height, remove from table and add to tree, and continue iterating removing outputs from the table until encountering an output that should remain locked at that block's get_adjusted_time. This way we wouldn't need to do the conversion from timestamp to block.

The problem is that get_adjusted_time is not monotonic. Adding a new block to the blockchain can in some cases result in outputs becoming locked again, unnecessarily increasing complexity.

I propose the following to be implemented at the time of the fork:

  1. Reduce the coinbase lock time to the default of 10 blocks that applies to all spends. This was discussed in research-lab#104.
  2. Ban new time locks. This would be done by a consensus rule mandating unlock_time to be zero as a natural extension of monero#9151.
  3. Convert all time-based legacy locks to height-based locks as unlock_height = height + (unlock_time - get_adjusted_time(height)) / DIFFICULTY_TARGET_V2, where height is the block height where the time-locked transaction was confirmed.
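Point 3 can be sketched in Python. The formula is the one above; DIFFICULTY_TARGET_V2 = 120 and the rule that an unlock_time below CRYPTONOTE_MAX_BLOCK_NUMBER (500000000) is already a block height are existing Monero constants/behavior. Clamping already-expired locks to the confirmation height is an assumption of this sketch (in C++ with unsigned integers the subtraction would instead wrap around), and `adjusted_time_at_height` stands in for get_adjusted_time(height).

```python
DIFFICULTY_TARGET_V2 = 120                  # seconds per block
CRYPTONOTE_MAX_BLOCK_NUMBER = 500_000_000   # unlock_time below this is a block height

def legacy_unlock_height(unlock_time: int, height: int, adjusted_time_at_height: int) -> int:
    """Sketch of point 3: convert a legacy timestamp lock into a height lock.

    `height` is the block where the time-locked transaction was confirmed and
    `adjusted_time_at_height` stands in for get_adjusted_time(height).
    """
    if unlock_time < CRYPTONOTE_MAX_BLOCK_NUMBER:
        return unlock_time  # already a height-based lock; nothing to convert
    remaining = unlock_time - adjusted_time_at_height
    if remaining <= 0:
        return height  # assumption: an already-expired lock unlocks at the confirmation height
    # Integer division rounds down, so the converted lock never unlocks later than extrapolated.
    return height + remaining // DIFFICULTY_TARGET_V2

print(legacy_unlock_height(1_700_001_259, 3_000_000, 1_700_000_000))  # 3000010
```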

@j-berman

j-berman commented May 13, 2024

> I propose the following to be implemented at the time of the fork

I'm fine with this plan. Getting rid of the feature is still a +1 in my view since, given how it's been used, it's more likely to cause harm than benefit.

> Adding a new block to the blockchain can in some cases result in outputs becoming locked again, unnecessarily increasing complexity.

Clarifying that with the approach to maintain the existing time-based locks, the new rule at the fork would become: the first block whose get_adjusted_time satisfies the time-based timelock is the block in which the output unlocks, and the output is guaranteed unlocked from that point on. Either way we're introducing a new rule dictating how the output is unlocked.

Considering how little the feature is used and there's rough agreement to deprecate it, I think the optimal route is the path of least resistance and least complexity.

In that vein, I think converting time-based to block-based may still be optimal. I think the picture will be clearer upon implementing handling block-based timelocks first.

@hinto-janai

> I'd rather clarify things from the start, and if you still have questions, follow up there.

Sorry, not sure how to word things correctly. Given output movement like this:

non-FS output (A) ---> FS output (B) ---> output (C)

are these statements correct?:

  • A is known to link to B
  • B is known to come from A
  • C is known to not come from A
  • C is known to come from some FS output (but not necessarily B)

@j-berman thanks, details are appreciated.

> We'll also want a migration for key images also described here

Ah okay, I thought the discussion here led to this not being needed.

> which would only stall the node if the tree is not finished constructing at the fork height where FCMPs begin.

So there will be leeway before the fork height to give nodes time to build the tree?

> Thinking on it some more... it's probably fairly straightforward to have a separate table for timestamp locked outputs sorted by unlock_time also. Then when adding a block, check the lowest unlock_time output in the table; if the output is unlocked using get_adjusted_time at that height, remove from table and add to tree, and continue iterating removing outputs from the table until encountering an output that should remain locked at that block's get_adjusted_time. This way we wouldn't need to do the conversion from timestamp to block.

> The problem is that get_adjusted_time is not monotonic. Adding a new block to the blockchain can in some cases result in outputs becoming locked again, unnecessarily increasing complexity.

There's already up to 120 seconds of leeway for timelocks; would this be enough for the proposed table to be accurate enough?

@tevador

tevador commented May 13, 2024

> Clarifying that with the approach to maintain the existing time-based locks, the new rule at the fork would become first get_adjusted_time where time-based timelock is unlocked == block in which the output unlocks. Either way we're introducing a new rule dictating how the output is unlocked.

Fair point. Since we are already changing the rule, it makes sense to go with the simpler option that doesn't need a call to get_adjusted_time for every future block. I would also expect the code to be simpler if all historical time-locks are treated the same.

@j-berman

> So there will be leeway before the fork height to give nodes time to build the tree?

Ideally we'd release monerod containing the update well in advance of the fork height (the v18 release for the last fork came out ~1 month in advance, IIRC). Users could then update their nodes, which would start building the tree in the background on startup.

> it makes sense to go with the simpler option

I lean toward converting time-based to block-based as well.

> There's already up to 120 seconds of leeway for timelocks, would this be enough for the proposed table to be accurate enough?

We could also add an extra block of leeway like that to the extrapolated block, but IMO the answer to "is it accurate enough" is likely yes. It's probably worth qualifying that with some analysis of blocks' adjusted times in practice, but the get_adjusted_time code appears fairly reasonable to me.

@kayabaNerve
Copy link
Author

kayabaNerve commented May 13, 2024 via email

@kayabaNerve
Copy link
Author

kayabaNerve commented May 13, 2024 via email

@kayabaNerve
Copy link
Author

Presumably too late now, but we could've squashed $R$ into $\tilde{O}$. It'd save 32 bytes from the input tuple, save a point subtraction when prepping the GSP out-of-circuit. In-circuit, we'd save a discrete log proof (and with it, two VCs, which are 64 bytes and non-trivial re: GBP perf, prob a few percent).

We can ignore this for now and follow up as things continue (it'd be extending the current composition review, yet we don't have a requirement to do things perfectly contiguously and can revisit in a few weeks or even couple months).
