@kayabaNerve
Last active May 1, 2024 21:13
FCMP+SA+L

Full-Chain Membership Proofs + Spend Authorization + Linkability

This proposes an extension to FCMPs to make them a drop-in replacement for the existing CLSAG. In order to be such a replacement, the proof must handle membership (inherent to FCMPs), spend authorization, and linkability.

Terminology

  • FCMP: Full Chain Membership Proof
  • GBP: Generalized Bulletproof, a modified Bulletproof supporting Pedersen Vector Commitments in its arithmetic circuit proof. Security proofs are currently being worked on by Cypher Stack.
  • GSP: Generalized Schnorr Protocol, a protocol proven in 2009, for proving the following (a runnable sketch follows this list): for an m*n matrix
    [
        [A, B, C],
        [D, E, F]
    ]
    
    and an n-length row of scalars
    [x, y, z]
    
    the m-length output column
    [
      T,
      U,
    ]
    
    is
    [
      xA + yB + zC,
      xD + yE + zF,
    ]
    
    This is useful to prove knowledge of the opening of a vector commitment and consistency between vector commitments.
  • BP+: Bulletproof+. A weighted-inner-product proof currently in use in Monero.
  • F-S: Forward secret. An adversary who can solve the elliptic curve discrete logarithm problem cannot find the spent output.
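
To make the GSP concrete, here is a minimal runnable sketch over a toy Schnorr group, with multiplicative notation standing in for elliptic curve points. The group, generators, and transcript hashing are illustrative assumptions, not production parameters.

```python
# Toy GSP: prove knowledge of scalars [x, y, z] such that, row by row,
# matrix * [x, y, z] equals the public outputs. Multiplicative notation
# stands in for elliptic curve points (g**s here plays the role of sG).
import hashlib
import secrets

p, q = 1019, 509  # toy safe-prime group: p = 2q + 1, subgroup order q
g = 4             # generator of the order-q subgroup

def mul(P, s):    # "scalar multiplication" sP
    return pow(P, s % q, p)

def add(P, Q):    # "point addition" P + Q
    return (P * Q) % p

def msm(row, scalars):  # multi-scalar multiplication over one matrix row
    out = 1             # 1 is the group identity ("0")
    for P, s in zip(row, scalars):
        out = add(out, mul(P, s))
    return out

# Public 2x3 matrix of group elements (toy generators with known dlogs;
# real generators must be independently sampled).
matrix = [[mul(g, 3), mul(g, 7), mul(g, 11)],
          [mul(g, 5), mul(g, 13), mul(g, 17)]]
witness = [secrets.randbelow(q) for _ in range(3)]   # [x, y, z]
outputs = [msm(row, witness) for row in matrix]      # [T, U]

# Prover: commit with nonces, derive a Fiat-Shamir challenge, respond.
nonces = [secrets.randbelow(q) for _ in range(3)]
commits = [msm(row, nonces) for row in matrix]
c = int.from_bytes(hashlib.sha256(
    str(outputs + commits).encode()).digest(), "big") % q
responses = [(k + c * s) % q for k, s in zip(nonces, witness)]

# Verifier: per row i, msm(row_i, responses) == commits_i + c * outputs_i.
for row, R, T in zip(matrix, commits, outputs):
    assert msm(row, responses) == add(R, mul(T, c))
print("GSP verified")
```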

Recap

FCMPs, when instantiated with GBPs, are an O(n) proof of an O(log m) program (where m is the number of members in the set). The program is a Merkle tree proof, where the same layer program is executed multiple times. The layer program is summarizable as follows:

  1. Unblind the Pedersen Commitment input.
  2. Verify the unblinded variable is present in the layer being hashed.
  3. Hash the layer, outputting a new Pedersen Vector Commitment (blinding the hash for the layer).

For the first layer, under Seraphis, the input would be the re-randomized squashed-enote (effectively a Pedersen Vector Commitment). The "unblinding" would remove the re-randomization and continue.

The final layer would use randomness = 0 such that the output Pedersen Vector Commitment is the tree root (or the protocol would take the difference from the tree root and require a discrete-log PoK for it, forcing the user to show the difference consists solely of blinding).
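
As a rough illustration of the layer program's three steps, here is a toy sketch. The real program is an arithmetic circuit statement; the group, blinding generator, and layer hash below are stand-in assumptions.

```python
# Toy walk-through of the three steps for one layer (the real version
# runs inside the GBP circuit with a circuit-friendly hash; both are
# crudely faked here).
import hashlib
import secrets

p, q, g = 1019, 509, 4         # toy Schnorr group as before
H = pow(g, 9, p)               # toy blinding generator (known dlog; toy only)

def mul(P, s): return pow(P, s % q, p)
def add(P, Q): return (P * Q) % p
def neg(P): return pow(P, -1, p)

def hash_layer(members):       # stand-in for the in-circuit layer hash
    return int.from_bytes(hashlib.sha256(str(members).encode()).digest(), "big") % q

layer = [11, 22, 33, 44]       # toy layer members
v = layer[2]                   # the member whose presence is being proven
r = secrets.randbelow(q)
blinded_input = add(mul(g, v), mul(H, r))   # Pedersen commitment input

# 1. Unblind the Pedersen commitment input.
unblinded = add(blinded_input, neg(mul(H, r)))
# 2. Verify the unblinded variable is present in the layer being hashed.
assert unblinded in [mul(g, m) for m in layer]
# 3. Hash the layer, outputting a new blinded commitment for the next layer.
next_blind = secrets.randbelow(q)
next_input = add(mul(g, hash_layer(layer)), mul(H, next_blind))
print("layer proven (toy); next layer input:", next_input)
```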

Difficulty

The naive solution would be to open K, the output key, inside the proof, then prove for the key image inside the proof. This would require hardware wallets to perform the FCMP proof, which is extremely undesirable/entirely unacceptable. Accordingly, the goal is for the membership proof to perform re-randomization without knowledge of x, letting a second proof prove for spend authorization and the key image.

Solution

Instead of building a Merkle tree of squashed enotes, we build a Merkle tree with elements (K, hash_to_point(K), C), where C is the amount commitment. We output re-randomized versions of all three and then prove the key image was correctly formed.

Originally Proposed Modifications for Deployment During RingCT

EDIT: These should not be moved forward with. The alternative modifications in the next section achieve much better properties and are the basis of the forward-secret variant posted below. This is only present as it was discussed below and therefore would be confusing to remove entirely.

For the first layer, we don't take in the re-randomized squashed-enote (as this is pre-Seraphis), yet (r K, r**-1 H(K), C + a G). We denote this (K', H', C').

The first layer does not perform the traditional unblinding (addition of a G for a known a) for K', H'. Instead, it multiplies K by r (checking it's equal to K') and H' by r (checking it's equal to H(K)). We then perform the traditional unblinding of C' and check membership of (K, H(K), C).

This does not prove the spend is authorized nor that any claimed linking tag is correctly formed. It solely shows that the K', H' are derived as expected from a member of the tree.

We then extend this with a Discrete Log Equality proof for K', I over G, H'. The discrete logarithm of K' is r x, which we notate x'. x' H' expands to r x r**-1 H(K), with the r terms cancelling out for x H(K), the expected key image. This introduces linkability into the proof. By having the DLEq proof's transcript include the message signed, it provides spend authorization as well.
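
A minimal sketch of this DLEq step, over the same style of toy group. Note the toy hash_to_point has a known discrete log, which a real instantiation must avoid.

```python
# Toy DLEq (Chaum-Pedersen) over bases G and H', binding a message for
# spend authorization.
import hashlib
import secrets

p, q, g = 1019, 509, 4

def mul(P, s): return pow(P, s % q, p)
def add(P, Q): return (P * Q) % p

def hash_to_point(P):  # TOY ONLY: real hash-to-point has an unknown dlog
    h = int.from_bytes(hashlib.sha256(str(P).encode()).digest(), "big")
    return mul(g, h)

x = secrets.randbelow(q - 1) + 1          # output key secret
r = secrets.randbelow(q - 1) + 1          # re-randomizer
K = mul(g, x)
Hp = hash_to_point(K)
K_prime = mul(K, r)                       # K' = r K
H_prime = mul(Hp, pow(r, -1, q))          # H' = r**-1 H(K)
I = mul(Hp, x)                            # expected key image x H(K)

# Prove log_G(K') == log_{H'}(I), both equal to x' = r x.
x_prime = (r * x) % q
k = secrets.randbelow(q)
R1, R2 = mul(g, k), mul(H_prime, k)
msg = b"tx being signed"                  # transcripting this authorizes the spend
c = int.from_bytes(hashlib.sha256(
    str((K_prime, I, R1, R2)).encode() + msg).digest(), "big") % q
s = (k + c * x_prime) % q

assert mul(g, s) == add(R1, mul(K_prime, c))   # s G  == R1 + c K'
assert mul(H_prime, s) == add(R2, mul(I, c))   # s H' == R2 + c I
print("DLEq verified; key image:", I)
```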

Proposed Modifications

With newly introduced generators T, U, V, and ignoring the amount commitment for now:

  • K' = K + aT
  • I' = hash_to_point(K) + bU
  • B = bV

The FCMP would only perform three standard unblinds (the third being C', ignored here), and show consistency between the bU term in I' with B. Instead of performing a DLEq proof after, we'd provide a proof of the following.

For matrix:

[
  [G,  T,  0], // Open K'
  [0,  0,  V], // Open B
  [U,  0,  0], // Create xU as a hint, Z, to prove for bxU without revealing bU
  [I', 0, -Z]  // x (H(K) + bU) - bxU = x H(K)
]

The multi-scalar-multiplication by the consistent row of scalars:

[x, a, b]

The output is:

[
  K',
  B,
  Z,
  I
]

where Z is part of the proof data (formed as x U) and I is the key image.

This proves linkability, and again performs spend-authorization so long as the message is part of the proof's transcript.
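
A toy check of this matrix relation, under the same stand-in group assumptions as the earlier sketches. The group identity plays the role of 0, and group negation forms -Z.

```python
# Toy check that the matrix times [x, a, b] yields [K', B, Z, I], and in
# particular that row four collapses to the key image x H(K).
import hashlib
import secrets

p, q, g = 1019, 509, 4

def mul(P, s): return pow(P, s % q, p)
def neg(P): return pow(P, -1, p)

def add(*points):
    out = 1
    for P in points:
        out = (out * P) % p
    return out

def msm(row, scalars):
    return add(*(mul(P, s) for P, s in zip(row, scalars)))

def hash_to_point(P):  # TOY ONLY
    return mul(g, int.from_bytes(hashlib.sha256(str(P).encode()).digest(), "big"))

G, T, U, V = g, mul(g, 3), mul(g, 5), mul(g, 7)  # toy generators

x = secrets.randbelow(q - 1) + 1
a, b = secrets.randbelow(q), secrets.randbelow(q)

K = mul(G, x)
Hp = hash_to_point(K)
K_prime = add(K, mul(T, a))      # K' = K + aT
I_prime = add(Hp, mul(U, b))     # I' = H(K) + bU
B = mul(V, b)                    # B  = bV
Z = mul(U, x)                    # hint Z = xU
I = mul(Hp, x)                   # key image x H(K)

IDENT = 1                        # the group identity plays the role of "0"
matrix = [
    [G,       T,     IDENT],     # opens K'
    [IDENT,   IDENT, V],         # opens B
    [U,       IDENT, IDENT],     # produces the hint Z
    [I_prime, IDENT, neg(Z)],    # x (H(K) + bU) - bxU = x H(K)
]
assert [msm(row, [x, a, b]) for row in matrix] == [K_prime, B, Z, I]
print("matrix relation holds; row four equals the key image")
```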

Integrity

Obviously, implementation must be done correctly, and the proving system (GBPs) must be proven secure. Forging a member requires breaking the Merkle tree OR r, r**-1 consistency (for the original proposal) or bU, bV consistency (for the current proposal). The original proposal also assumes the trivial r = 0 case is banned, which it already should be by the rule against key images of identity (though we'd still explicitly prevent it).

Privacy

The privacy of the originally proposed scheme relies on it being infeasible to link K', H' to K, H(K). The most trivial way to solve this is by checking if DH(K, H(K)) == DH(K', H'), which is presumably reducible to the Decisional Diffie-Hellman problem. The Decisional Diffie-Hellman game asks if DH(A, B) == C. One who can check whether two Diffie-Hellmans are equivalent can check whether a single Diffie-Hellman is correct via DH(A, B) == DH(C, G).

The second scheme's privacy is much more clear-cut. Given the commitment bV, one must calculate bU, which is the Computational Diffie-Hellman (with the caveat Monero key images are currently considered private under the weaker Decisional Diffie-Hellman problem, so this being stronger is irrelevant).

Features

Not only would this introduce FCMPs to Monero, it would also enable transaction chaining and post-proved membership if we don't bind the membership proof into the secondary proof's (whether DLEq or GSP) transcript.

Relation to Seraphis

Seraphis offers more efficient FCMPs, should provide forward privacy, and moves from hacked-together formal properties inherent to RingCT to formalized-from-the-start properties.

Unfortunately, Seraphis has been estimated as 2 to 5 years out by UkoeHB, with inclusion of FCMPs making it closer to 5 years. I disagree with that estimate, yet jberman estimated 3 years themselves. The goal of this is to solve the privacy issues solvable under RingCT, buying time to do better regarding design, performance/scalability, and the unsolvable issues (forward privacy). This may also be of independent merit.

Performance

I previously estimated Full-Chain Membership Proofs to be 35ms in a batch of 10. This was evaluated over the "pasta curves" with Bulletproofs+. The arithmetic-circuit argument for Bulletproofs, which we are currently using (or rather, Generalized Bulletproofs, a derivative of it), should be twice as fast due to its relation to the underlying proof. This would mean ~18ms within a batch of 10.

Generalized Bulletproofs also notably reduces the number of gates we use in circuit. Prior, we spent several gates entering the divisor into the circuit (a bunch of numbers which had to be entered somehow). Now, with Generalized Bulletproofs, we can simply provide an additional Pedersen Vector Commitment (32 bytes) to enter the divisor. This slashes the size of the circuit.

By adding SA+L, we'll have the additional performance costs within the first layer of:

  1. Proving for a tri-set membership.
  2. Doing multiple unblinds.
  3. Proving consistency of bU, bV or r, r**-1.

I'm expecting this makes the first layer ~3x more expensive.

So: from 1+1+1+1 at 35ms, to 1+1+1+1 at 18ms, to 3+1+1+1 (~31ms). That's ignoring the reduction in gates offered by GBPs.

Steps and Timeline

There are several different tracks along which we can move, and we should move along in parallel.

Implementation

  1. Implement GBPs.

This has already been done and solely needs optimizations performed.

  2. Implement an amenable framework for building arithmetic circuits.

This has technically been done, yet needs some love.

  3. Implement a library for elliptic curve divisors.

Technically done, yet with an edge case that should be fixed.

  4. Implement the various gadgets (set membership, index within set, PoK of DLog, proof of DLog).

This was done as needed for FCMPs, not as needed for FCMP+SA+L which has some distinctions (multi-set membership, which is done via getting the index of each item within each set and checking equality, and proof of DLog).

  5. Implement the FCMP+SA+L circuit.

This was done for FCMPs yet not FCMP+SA+L.

  6. Implement the secondary proof.

This isn't too notable, with a polished implementation hopefully being within a few days of work for the second case. The first case was previously implemented.

  7. Implement the necessary towering curves.

This is trivial to do over a generic integer library, yet performance effectively requires a dedicated implementation be done.

  8. Integrate into Monero.

Formal Security

  1. Formally prove the soundness, completeness, and zero-knowledge properties of GBPs.

We can do this now. Diego claimed a deliverable (or a report of failure, if one occurs) could happen within one month.

  2. Have the divisor-technique, previously published by Eagen, reviewed.

This is possible now.

  1. Formally prove the correctness of the "gadgets", as logical.

This is possible with just a formal description of the circuit, and the gadgets contained within. The Curve Trees paper did specify their gadgets, yet we have a notably different construction due to the usage of divisors and the extensions proposed here.

  4. Formally prove the correctness of the secondary proof.

The first proposal uses a DLEq which has existed since Chaum-Pedersen's 1993 work. The second uses a proof I frequently see referred to as a "Generalized Schnorr Protocol", despite my personal dislike for the ascription. I wouldn't be surprised if it was already proven.

Practical Security

  1. Audit the implementation of GBPs.

This is possible once the further optimizations are done, and the formal proofs exist.

  2. Audit the circuit framework.

This is possible once the framework is cleaned to a point we're happy with.

  3. Audit the EC divisor library.

This is possible as soon as the standing edge case is resolved.

  4. Audit the implementation of the gadgets.

This should be trivial and possible as soon as the gadgets are formally proven.

  5. Audit the implementations of the towering curves.

This is possible as soon as the curves are implemented. If we implement them with a constant-time integer library, we'd solely audit that.

  6. Audit the circuit.

  7. Audit the secondary proof.

This could be audited as soon as it's implemented, which again, should be trivial to implement.

  8. Audit the integration.

Contribution to Seraphis

Creation of FCMP+SA+L effectively does all the work necessary for FCMPs with Seraphis with regards to FCMPs themselves. It'd provide the proving system, gadgets, review/audits, and most of the composition work. The only additions would be the curve cycle libraries (if we switched curves), the cross-group DLEq we'd move commitments with, and the new integration (into Seraphis instead of into RingCT).

Steps Forward

We'd have to agree on which proposal to move forward with, if we were to move forward.

While the first proposal was the one originally posited, it remains here solely for historic reasons. It appears to perform one fewer PoK of DLog, making it faster, yet it requires using two private points as generators within Proof of (Knowledge of) DLog statements. Such proofs are much more performant when the generator being used is public.

The second proposal also has the benefit of a much stronger argument for privacy, and uses additive randomness (instead of multiplicative).

We'd then need various implementation work done. I can personally implement the curve libraries using crypto-bigint, yet I cannot implement tailored implementations as likely needed to maintain the performance target. Someone else would need to step up.

I believe I can have all the implementation done, besides integration into Monero, within a few months. The actual integration into Monero would be:

  1. Having Monero build the tree.
  2. Having wallet2 call prove via the Rust library.
  3. Adding RPC routes to get the current tree root and the path of a member of the tree (where the DSA would be used to request multiple paths so as not to reveal the output being spent).
  4. Having monerod call verify via the Rust library.

There'd be no wallet/address/privacy set migration. All of the additional curves would be contained within the proof.

I cannot estimate/guarantee the timeline for the formal analysis side of things. I'd hope that, by clearly defining components, the rate at which review occurs is not bottlenecked by the rate at which development occurs. That doesn't change that we'd have to efficiently manage the timeline and definition of components. I'd hope for, in the best case, development + review/audits of the building blocks in 3-6 months, targeting deployment in ~12 months. Beyond myself, the only development necessary is of dedicated curve implementations (towering curves having already been found by tevador) and the integration into Monero (which jberman already started for FCMPs into Seraphis).

@Gingeropolous

this route of implementing a RingCT-compatible version of FCMP if it turns out to be feasible.

Do we know when the "if" in this statement can get resolved? From my perspective, this would introduce a key deciding factor. If it's known that FCMP-RCT is sound within a matter of weeks, and the deployment (including audits) can occur in <1 year, to me it makes sense to push this forward.

From where I sit, seraphis was first introduced primarily to permit an increase in ringsize, with the hopeful development of some non-ring based system that could be dropped in. Concurrently, the seraphis upgrade would usher in a new transaction system that is ultimately better than the one we have from cryptonote.

It turns out that non-ring based system was discovered prior to seraphis implementation. I think this was unexpected.

I know this is a technical thread, but I'll drop some analogies. I feel like Monero is currently a plug-in hybrid car. It's using the best of the technology that was available at the time to achieve the goals of the manufacturers, which is lower emissions (or something; don't take the analogy all the way). The ideal car in this analogy would be a fully electric car built from the ground up. This is Seraphis. However, in this hypothetical scenario, there's a crisis and there's no more gas and people still need cars. Thus, we can produce an electric car that's usable now by dropping in an electric motor, because an engineer just figured out how to do it. That's FCMP-RCT.

In real life, car manufacturers that attempted this drop in weren't successful, but I'd argue that's because there wasn't a gas supply crisis. Whereas car manufacturers that created pure electric cars from the ground up have done... relatively well.

Which I think brings us to the pertinent question. Is there a crisis that would warrant postponing the ideal product in lieu of something that could work now?

I would argue yes. It's been known that rings are the weakest link in Monero's function. Before the spam attacks it was known. It was known well before then.

@tevador commented Apr 18, 2024

Due to the ongoing debate if FCMPs should be implemented before Seraphis, I reread the Seraphis paper. The main premise of the new key image format that comes with Seraphis was to enable outgoing view keys (OVKs) and to separate the membership proof from the ownership proof.

Since both of these features can be achieved with RingCT-compatible key images, is there a good reason to go forward with the incompatible key image format?

Most, if not all, new features of Seraphis could be implemented with a RingCT-compatible key format:

  1. Membership/ownership proof separation can be done as described in this gist.
  2. Jamtis address format also works (with minor changes to the key derivation formulas), including all wallet tiers, encrypted address tags and Janus attack protections.
  3. Transaction structure changes are also applicable as they are independent of the key format.

If Seraphis was adapted to use the key formats mentioned here, the pre-Seraphis FCMP implementation would 100% roll over to Seraphis, which would address the concerns some people might have with this endeavor. Additional advantages would be:

  1. We could optionally support both old and new address formats at the same time if that was something deemed useful.
  2. There would be no need for migration-type transactions that spend old RingCT e-notes and create new Seraphis e-notes. All old e-notes would be part of the anonymity set of all future transactions, which would arguably be better for privacy.
  3. We would only have a single pool of spent key images instead of two.

@kayabaNerve

Do we know when the "if" in this statement can get resolved?

If you ignore performance enough, I'd say it's resolved. I initially posited 35ms at Monerokon, which was actually a 100ms variant that did several additional proofs GBPs supersede. If GBPs fall through, we'd fall back to that 100ms metric (with a new item to formally prove). That ignores the several performance benefits we're looking at. BPs, instead of BP+s, inherently should be ~2x faster (35ms -> 17.5ms), and I've come up with more efficient ways of architecting the circuit internally.

I also have high faith in the ability to prove GBPs. I'm expecting a soundness proof for the divisors to be more difficult, yet that'd 'only' double our cost for performing the elliptic curve cryptography in circuit (back to 35ms, if we use BPs and not BP+s).

Ideally, we should have an upper bound on time in around a month, when the current CS work completes.

It turns out that non-ring based system was discovered prior to seraphis implementation.

To be fair, FCMPs with Seraphis was discovered before FCMPs without Seraphis.

postponing the ideal product

necessitates discussion on how far from ideal this is.


Most, if not all,

I believe all features of Seraphis are accomplished when the above idea is fully executed (the F-S variant and supporting wallet code).

If Seraphis was adapted to use the key formats mentioned here

If we are to keep the Seraphis code (which I would support), I'd probably endorse deploying this over the existing codebase (with extensions), then deploying the Seraphis code as a new wallet library + new transaction version. I'm obligated to note that makes Seraphis more of a codebase than a protocol, as it is now, yet that also isn't necessarily an issue.

We could optionally support both old and new address formats at the same time

This does cause metadata leakage (wallet age revealed), yet doesn't invalidate old addresses which is great.


Distinctly, why weren't non-RCT outputs migrated into RCT? Can't we set the randomness in a Pedersen Commitment to 0 and make RCT outs out of non-RCT outs? I ask as that'd allow making all historical TX validation code, which is technically still active (due to CN-RCT migration TXs still being allowed, as they should be), solely historical (as new spends of original CN outputs would be treated as RCT outputs, which would be a subset of the new global pool). When thinking about removing the Seraphis migration, I had that thought about the old migration.

@tevador commented Apr 19, 2024

Distinctly, why wasn't non-RCT outputs migrated into RCT?

Currently, outputs are grouped by their value for the purposes of decoy selection. Ring-CT outputs are counted as having a value of 0. Old denominated outputs and unmixable outputs have transparent amounts, so presumably the real output (and the amount spent) would be leaked with a high probability if such an old output appeared in the ring.

I think that with FCMPs, we could easily treat all outputs from the genesis block as a single group by creating amount commitments for them with a known mask (AFAIK coinbase outputs use a mask value of 1, i.e. C = v*H + G).

@kayabaNerve commented Apr 19, 2024

I'd be quite interested if it really wasn't done historically due to side-channel attack commentary. I'll note this as yet another idea which can take FCMPs to their 'ultimate' form.

cc @jeffro256 on tech debt (it doesn't remove the verification code, yet would make it explicitly historical) and also @jeffro256 for their own comment on migrating RCT outputs to Seraphis (which I wouldn't see possible without an overtly complicated circuit/having every Seraphis TX also have a RCT input/etc).

@Gingeropolous

necessitates discussion on how far from ideal this is.

From my understanding, a large benefit of Seraphis was the modularization of all of the components (e.g., FCMP could just be dropped into the membership proof component). So I guess one question regarding this FCMP-RCT effort is how upgradable is it? E.g., in 5 years, if something faster and smaller comes out for component x, y, or z, how easy is it to slip into cryptonote-fcmp-rct, vs how easy would it be to slip into seraphis-fcmp?

Or maybe "future-proof" would be the right term.

If Seraphis was adapted to use the key formats mentioned here, the pre-Seraphis FCMP implementation would 100% roll over to Seraphis, which would address the concerns some people might have with this endeavor.

This seems like an ideal option imo.

@kayabaNerve commented Apr 19, 2024

from my understanding, a large benefit of seraphis was the modularization of all of the components.

This also modularizes the membership and spend-auth/linkability proof. The statements are distinct, as needed to safely satisfy the preserved RingCT design, yet it is modularized. Membership, spend-auth/linkability, balance (currently: sum of commitments is zero), and output range could also be independently upgraded.

This seems like an ideal option imo.

I believe that proposal is effectively to move forward with this now, then scrap Seraphis the cryptographic protocol. Seraphis the codebase would then be adapted to satisfy the system created here, argued to be fundamentally distinct only in the linking tag structure. That is equivalent to the general idea of FCMP+SA+L now + JAMTIS later + saner new TX formats later (incrementally upgrading without a hard migration), except the new TX formats would be the already-written Seraphis formats re-intended. I just want to be clear about my understanding there.

@kayabaNerve

https://github.com/kayabaNerve/fcmp-/blob/develop/fcmp%2B%2B.pdf is my work-in-progress formalization of this, with background. The circuit is detailed, yet I wouldn't claim it specified yet. It also covers how integration would be handled.

cc @tevador especially (who I don't believe I can reach on Matrix 😅 ), when you have the time.

@kayabaNerve

https://github.com/kayabaNerve/fcmp-ringct/blob/develop/fcmp%2B%2B.pdf should now be at the feedback phase, where the only expected changes are for clarity/detail where not enough detail was provided (such as on the integration side of things). Barring evolution with research (see the DLog PoK, which I played safe with here yet do posit a more efficient candidate for), only minor tweaks should occur.

@tevador commented Apr 30, 2024

I reviewed the PDF, version d946868.

  • 4.3 First bullet point should probably say "Hadamard product" instead of "vector multiplication" to avoid ambiguity.
  • 4.3.8 The section is missing a sentence explaining that permissible points are those where U(y) is a square and U(-y) is non-square. I'd also add that 1/4 of points are permissible.
  • 5.4, 6.8.1 You could add a reference to Jamtis-RCT, which is a compatible addressing scheme supporting outgoing view keys.
  • 6.8.2 Outgoing view keys are not a requirement for forward secrecy. It's sufficient to add a T-component to the one-time address when constructing an e-note.

Regarding section 6.4, where you propose to require output keys and commitments to be permissible, I've been thinking about it and it should be possible by adding two encrypted 1-byte fields to every e-note, which will contain the number of times the constructed value needs to be incremented to be equal to the published key, i.e. K_o = K_o' + i_1 T and C = C' + i_2 G, where K_o' and C' are the values derived from the sender-recipient shared secret. The integers i_1 and i_2 are in the range of 0-255, which gives a negligible chance of failure to find a permissible point (less than 1.4×10^-32).
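
A hedged toy sketch of this increment-and-publish idea. A small Schnorr group and a 1+y / 1-y style residuosity test stand in for Wei25519, and make_permissible is a hypothetical helper, not proposed wallet code:

```python
# Toy sketch: bump a derived key by T until it's permissible and record
# the count i_1 (one byte).
import secrets

p, q, g = 1019, 509, 4
T = pow(g, 3, p)                     # toy second generator

def legendre(a):
    return pow(a % p, (p - 1) // 2, p)

def permissible(P_elem):             # toy 1+y / 1-y style residuosity test
    return legendre(1 + P_elem) == 1 and legendre(1 - P_elem) == p - 1

def make_permissible(K_derived):     # hypothetical helper
    K = K_derived
    for i in range(256):             # i_1 must fit in one encrypted byte
        if permissible(K):
            return K, i
        K = (K * T) % p              # K += T in additive notation
    raise ValueError("no permissible point within 256 increments")

K_derived = pow(g, secrets.randbelow(q - 1) + 1, p)  # from the shared secret
K_o, i_1 = make_permissible(K_derived)
# The recipient re-derives K_derived and recomputes K_o = K_derived + i_1 T.
print("published K_o:", K_o, "with increment byte i_1 =", i_1)
```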

@kayabaNerve

  1. ACK.
  2. I don't spell out the square commentary, yet did comment permissible(Y) && !permissible(-Y). I'll elaborate.
  3. ACK as a cite under future expansion :) Thanks for putting that together. I have yet to review it, unfortunately.
  4. Agreed they're not technically required, yet the addition of the T term is what simultaneously creates the outgoing view key and offers forward secrecy. I'll rewrite it so it's clearer that outgoing view keys, which are a bunch of wallet protocol changes, aren't implied as required.
  5. I don't require outputs be permissible. I require they be made permissible before being added to the tree. It's a few additions by the node (1.5 on average?), so it shouldn't notably impact performance, and prevents requiring changes to the scanning process. We can also publish such values, yet I believe post-incrementing is the simpler solution. The only argument against the node doing it is if one posits a pit of non-permissible values (say, thousands of them in a row) which forms an effective DoS.

Distinctly, even without an extra item, could we not force wallets to resample randomness until permissibility happens to be achieved? That'd not increase bandwidth (negligible) nor change the scanning protocol (which is a bit annoying, and likely requires a table or 1-byte double-and-add to be properly performant?).

@tevador commented Apr 30, 2024

the addition of the T term is what simultaneously creates the outgoing view key and offers forward secrecy

Not necessarily. If the T term is derived from the shared secret, it's not safe to use the G term as the OVK. However, the resulting output key is forward-secret nonetheless.

It's a few additions by the node (1.5 on average?)

Actually, it's on average 4 point additions and 8 Legendre symbol calculations.

The only argument against the node doing it is if one posits a pit of non-permissible values (say, thousands of them in a row) which forms an effective DoS.

Not thousands, but it's quite easy to find keys that will take 80+ point additions, so it can be considered to be a DoS vector. We can go with it for the initial FCMP rollout, but I'd like to add permissible keys and commitments to the Jamtis-RCT specs. That would reduce the verifier workload to just 2 Legendre symbol calculations per key.

could we not force wallets to resample randomness until permissibility happens to be achieved

The chance that a random key is permissible is 1/4. Two-output transactions only have one public key and a total of 4 keys derived from it. That means you would need, on average, 256 attempts (= 4^4) to make a valid 2-output transaction. Impractical.

likely requires a table or 1-byte double-and-add to be properly performant

You can have a precomputed table of 256 multiples of T and G, which would only take 32 KB. Then you'd only need one table lookup and one point addition to recompute the permissible key. In any case, thanks to view tags, this would only happen for a small fraction of e-notes, so the precomputed tables are probably not even worth it.

@kayabaNerve commented Apr 30, 2024

Fair. I always presumed deployment via the public spend key as that's indistinguishable, with no changes to scanning/sending. Apologies for being so imprecise.

If 1/4 of points are permissible, I would've assumed we'd reach a permissible point in 1.5 additions (as the current one can be checked without any additions). The average distance to a permissible point being 4, not 2, when they're presumably evenly spaced by 4 sounds off. I haven't written the code to literally show it, however. Agreed there's overhead for the check itself.

80+ point additions definitely isn't great. It's still less work than a scalar multiplication though. In that case, I'd go back to recommending resampling randomness, yet that can only be discussed when each output has independent randomness, which isn't currently the case. If we're shifting send/scan changes to a future fork, would not that future fork presumably use independent randomness per output? I'd have to re-read how Seraphis + JAMTIS proposed handling change outputs. If the extra byte is the best solution, along with a table or one-byte double-and-add in the scan code (which I never meant to suggest is impractical, solely an annoyance, as all three of these paths have), ACK.

@tevador commented Apr 30, 2024

Re: torsion (6.2.1 and 6.4): What can happen if we allow keys with a torsion component to be added to the tree? I can't think of any exploit as long as the SA+L proof is handled correctly.

@kayabaNerve commented Apr 30, 2024

I can't think of any. I just no longer care to ask the question and argue security despite the question. It's not worth the mental effort and the potentially surprising results. I'm still frustrated I still have to consider some aspects re: torsion which are fundamentally not removable. This just really isn't the place to fuck around, find out.

The requirement of torsion clearing historical points is adding a non-negligible amount of work however, which I will concede. To go in depth, the divisor-based DLog proof shouldn't pass for torsioned elements. No sum of elements without torsion will add with a torsioned element to the identity element, which is what the divisor is supposed to check. That means the worst you get is torsion in, torsion out, on the output key and commitment.

We already argue spending torsioned input commitments is fine. You can torsion them on-chain, and we verify them as prime order elements (emphasis on as prime order elements, not that they are prime order elements), but at time of spend they'll be torsioned. For a torsioned binomial output key x'G + aT, where ' denotes torsion, we'd have to argue it's fine.

I'm not sure that is? I proposed a Bulletproof+ for the F-S SA+L. While we can argue if it's actually a BP+ or not (as it's solely the one-round protocol), that doesn't change that we currently ensure all elements passed to a BP+ are without torsion via torsion clears. This would not have its torsion cleared and would be explicitly torsioned. We'd have to inv-eight, mul-eight (currently proposed for outputs, now done for inputs) or otherwise increase the tolerances of our policy re: BP+.

I'd rather be done with it, and if we're concerned about performance (just over one additional scalar multiplication), implement Pornin's halving method which I believe is just 30% the cost.

@tevador commented Apr 30, 2024

If 1/4 points are permissible, I would've assumed we'd reach a permissible point in 1.5 additions

We have:

| Chance | Additions | Legendre symbols |
| --- | --- | --- |
| 1/4 | 0 | 2 |
| (3/4)*(1/4) | 1 | 3.5 |
| (3/4)^2*(1/4) | 2 | 5 |
| ... | ... | ... |

So the average number of additions will be:

$$A = \sum_{i=0}^{\infty} i p^i (1-p) = p (1-p) \sum_{i=0}^{\infty} i p^{i-1} = \frac{p (1-p)}{(1-p)^2} = \frac{p}{1-p}$$

With p=0.75, we get A=3. Similarly, for the number of Legendre symbols, we get an average of L=6.5.

I'm estimating that a Legendre symbol calculation will take time comparable to at least 50 point additions. So the average cost is ~300, dominated by Legendre symbol calculations. For "1 in 10 million" crafted DoS points, we get A=51 and L=78.5 and a total cost of ~4000 point additions.

Edit: fixed the number of Legendre symbol calculations. The average is 1.5 per addition plus 2 for the final successful point.
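
A quick sanity check of these expectations, assuming each candidate point is independently permissible with probability 1/4:

```python
# Additions until a permissible point follow a geometric distribution.
import random

p_fail = 0.75                       # chance a candidate is not permissible

print("closed form A:", p_fail / (1 - p_fail))   # 3.0
print("closed form L:", 1.5 * 3 + 2)             # 6.5

trials, total = 200_000, 0
for _ in range(trials):
    while random.random() < p_fail:  # candidate failed: one more addition
        total += 1
print("simulated A:", total / trials)            # ~3.0
```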

@kayabaNerve commented Apr 30, 2024 via email

@kayabaNerve

Pushed edits in response, though not yet noting the above re: permissibility as it should likely be incorporated into the JAMTIS proposal.

Also, please feel free to further push back on my anti-torsion stance. I do get that change is non-negligible when optimizing this. My justification to myself was the fact I proposed broadcasting keys and commitments as inverse eight, decreasing current node resource use, making this a one-time payment for our history. I'm obligated to concede, however, that wallets would have to perform two triple doublings, which should be negligible yet is an additional cost we'd pay into the future (not solely historically).

@tevador commented May 1, 2024

broadcasting keys and commitments as inverse eight, decreasing current node resource use [...]. I'm obligated to concede however that wallets would have to perform two triple doublings, which should be negligible yet is an additional cost we'd pay into the future

Napkin math tells me that it's better to use the Ristretto format rather than pre-multiplication by (1/8) and post-multiplication by 8.

Ignoring the permissibility check and Ed25519 -> Wei25519 conversion, to get the canonical tuple (K, I, C) to insert into the tree, you need the following operations:

A. With pre- and post- multiplication

A.1 Transaction builder

  1. K' = ge_scalarmult(inv8, K)
  2. Compress K'
  3. C' = ge_scalarmult(inv8, C)
  4. Compress C'

A.2 Transaction verifier

  1. Decompress K'
  2. 3x ge_dbl
  3. fe_invert to convert to affine
  4. I = hash_to_ec(K)
  5. Decompress C'
  6. 3x ge_dbl
  7. fe_invert to convert to affine

B. With Ristretto

B.1 Transaction builder

  1. Compress K
  2. Compress C

B.2 Transaction verifier

  1. Decompress K
  2. I = hash_to_ec(K)
  3. Decompress C

Even if Ristretto point compression and decompression is slightly slower, it's more than compensated for by avoiding the extra group operations for both the transaction builder and the verifier.

Additionally, for commitments, the post-multiplication by 8 is more risky because if we are not careful, we might allow users to spend 8x the amount in the output.

The requirement of torsion clearing historical points is adding a non-negligible amount of work however, which I will concede.

As for historical outputs, some transactions have been confirmed to have torsioned keys. I see three possible solutions:

  1. Ignore torsion if we can prove it's safe.
  2. What you are proposing, i.e. clearing the torsion by multiplying by 1/8 and then 8 for all keys.
  3. Using the point halving method to detect torsioned keys and clear the cofactor only for those keys. This should be about 2x faster.

@kayabaNerve commented May 1, 2024

It'd be better to use Ristretto despite there being no explicit conversion provided, as it'd be trivial to define, yet I'm not happy with the proposed scope at this time to move Monero to Ristretto despite calling for it a couple years ago. If others are willing to do the time and effort and it's not too contentious, then yes, I'd support it. I also get how proposing it for a future hard fork is silly though, as it'd re-incur building the tree (which is notable). That isn't to say it wouldn't be worthwhile in a future hard fork, solely bad to put off to one.

Additionally, for commitments, the post-multiplication by 8 is more risky because it we are not careful, we might allow users to spend 8x the amount in the output.

This seems like the thing we'd catch with basic testing (have every single point added to the tree go through a single fn, so either all commitments or none are so mutated), and it's also already a concern present in Monero? If the BP+ forgets to multiply by eight, the created commitment may be a value that's only in range when divided by eight? (Though that variance isn't enough to trigger an overflow regardless.)

Using the point halving method to detect torsioned keys and clear the cofactor only for those keys. This should be about 2x faster.

Would be my advocacy at this time, though I'll note halving doesn't detect torsioned keys. 3x halving, then 3x doubling to a different point does. We can't save the second half of those operations for most keys.


Distinctly, re: Wei25519: I implemented Wei25519 from the IETF draft for the divisor library. The map from Montgomery is trivial (one addition of a constant), where Wei25519.2 and Wei25519.-3 are non-trivial (multiplications by constants, plural). The only potential benefit I could see is if we optimized the hell out of divisor construction, as the curve formula is a modulus and a coefficient of 2 may theoretically clean that math up? But that'd force the node to do the more expensive map (and it only does the map; it does not do any arithmetic for Wei25519 points, nor does it multiply/divide/mod any polynomials). Feel free to confirm I'm not missing any considerations :) Really appreciate the feedback and collaboration here.

EDIT: We do two hash to curves and one Weierstrass addition per DLog claim on the first layer. I'll need to review hash to curve candidates and performance implications for choice of A before I can so write off the remaining question of my initial choice.

@tevador commented May 1, 2024

It'd be better to use Ristretto despite there being no explicit conversion provided, as it'd be trivial to define, yet I'm not happy with the proposed scope at this time to move Monero to Ristretto despite calling for it a couple years ago. If others are willing to do the time and effort and it's not too contentious, then yes, I'd support it. I also get how proposing it for a future hard fork is silly though, as it'd re-incur building the tree (which is notable). That isn't to say it wouldn't be worthwhile in a future hard fork, solely bad to put off to one.

I think it's worth it to implement in parallel with FCMPs if we can find someone willing to do so. We should at least analyze the amount of work needed for this. Ristretto is basically only an alternative serialization/deserialization method and produces points in extended coordinates (struct ge_p3 as used by Monero), so it should not be too much work.

I implemented Wei25519 from the IETF draft for the divisor library. The map from Montgomery is trivial (one addition of a constant), where Wei25519.2 and Wei25519.-3 are non-trivial

I started implementing a concept for permissible points and the canonical Wei25519 representation is the most suitable one because the y coordinate in Wei25519 format is equal to the v coordinate in Montgomery format, which makes it easier to check for permissibility.

@tevador commented May 1, 2024

Here is my reference implementation for point permissibility: https://gist.github.com/tevador/f9e6d4f262aba1c3de1eb7a20f40347d

I'm using the criterion that 1+y is square and 1-y is non-square, where y is the point coordinate in Wei25519 form. Point addition is still done in Ed25519 form, we just convert to Curve25519 coordinates for the permissibility test.

G + 3 T is a permissible point (the example used in the script).

The script shows that we actually need 3 Legendre symbol calculations to check permissibility due to the use of projective coordinates (affine coordinates are not used in practice).
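
A minimal sketch of the criterion itself over the Curve25519 base field, using Euler's criterion for the Legendre symbol. This is field arithmetic only, illustrating the test rather than reproducing the linked script:

```python
# Permissibility test: 1 + y must be square and 1 - y non-square.
import random

P = 2**255 - 19                    # Curve25519 base field prime

def legendre(a):
    # a^((P-1)/2) mod P: 1 for squares, P - 1 for non-squares, 0 for 0.
    return pow(a % P, (P - 1) // 2, P)

def permissible(y):                # y: the Wei25519 y coordinate
    return legendre(1 + y) == 1 and legendre(1 - y) == P - 1

# Negating a point negates y, which swaps the two conditions, so at most
# one of P, -P is ever permissible.
y = random.randrange(P)
assert not (permissible(y) and permissible(-y % P))

# Roughly 1/4 of random values pass, matching the expected rate.
count = sum(permissible(random.randrange(P)) for _ in range(4000))
print(f"{count}/4000 permissible (expect ~1000)")
```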

@kayabaNerve

so it should not be too much work.

https://ristretto.group is clear enough that I was able to implement it 2y ago, when I really did not know what I was doing, into an old project of mine: https://github.com/MerosCrypto/Meros/blob/master/e2e/Libs/Ristretto/Ristretto.py#L18-L111

It's less than 100 lines of Python for encode/decode (though that code isn't well-written and is ugly; please refer to the Ristretto specification).


I agree the y coordinate directly mapping is pleasant. I'm not convinced the saved cost of the map (multiplication by one constant) is worth the cost of the (potentially) worse hash-to-point/incomplete addition functions (which could incur multiple multiplications, if not exponentiations). I'm refraining from forming a definite comment until I poke around on the matter more.


Clarifying, is 1+y simply meant as a demo where a=1, b=1 for simplicity, or are you proposing using a=1, b=1? I'm not convinced the uniform distribution 'achieved' is of too much benefit given the a/b are constant. As one can brute force ys, they can brute force ay + bs. We may want to legitimately discuss removing the 'hash' element.

@tevador commented May 1, 2024

I agree the y coordinate directly mapping is pleasant. I'm not convinced the saved cost of the map (multiplication by one constant) is worth [...]

You are correct. After publishing the permissibility script, I realized that there is no performance difference between Wei25519 and Wei25519.2. They simply differ in one constant that needs to be multiplied in both cases.

worse hash to points/incomplete addition functions

Incomplete addition doesn't use the curve constants. Only incomplete doubling uses a, but AFAIK we will not use doubling in the circuit. In case we need doublings in the circuit, it's better to use Wei25519.2. On-curve check might also be a little bit faster with Wei25519.2 because you can replace the multiplication by a with an addition, but I'm not sure if that makes any difference in the circuit.

Clarifying, is 1+y simply meant as a demo where a=1, b=1 for simplicity, or are you proposing using a=1, b=1? I'm not convinced the uniform distribution 'achieved' is of too much benefit given the a/b are constant. As one can brute force ys, they can brute force ay + bs. We may want to legitimately discuss removing the 'hash' element.

I'm proposing 1+y as the simplest hash that works. Not using a hash is out of the question because our field has p = 1 mod 4, so y and -y always have the same quadratic residuosity (i.e. there would be no permissible points at all).

@kayabaNerve commented May 1, 2024 via email

@tevador commented May 1, 2024

The implemented incomplete addition doesn't use the constant a but some iadd functions only work for specific values of a (and may be more performant).

You can check the Explicit Formulas Database. The formulas for addition are the same for all values of a with the curve equation y^2 = x^3 + a*x + b. There are specific doubling formulas for some coordinate systems that depend on the value of a.

However, I'm fine with using Wei25519.2. I don't recommend Wei25519.-3.

It's a performance/'security' benefit tradeoff.

There is no security tradeoff here. You might be confused by the term "hash" they used in the Curve Trees paper. "Surjective map" would be a better term. We simply want a map that produces the expected 1/4 of permissible points. The simplest surjective map that works is the one I'm proposing. The main purpose of all this is to make sure only one of the pair [P, -P] is permissible for adding to the tree.

Note that if we had p = 3 mod 4 (like Bitcoin, for example), we could use simply y and -y and get 1/2 of permissible points.

@kayabaNerve

You are right re: iadd, sorry. I missed that only doubling had distinct formulas.

I do understand it's not a proper hash. The security argument I was attempting to reference would be if scaling by a uniform element achieves a distribution closer to uniform. I don't inherently see why such a scaling would benefit security, hence security being in quotes, but I also don't consider myself capable of commenting on the statistical distribution and the impact of scaling vs solely including a fixed offset (b). I'll also note a term better than security may be "integrity" or "quality". If you also don't see a benefit for the distribution, it's solely a performance detriment we can skip by setting a=1.

@tevador commented May 1, 2024

AFAIK there is no relationship between the residuosity of 1+y and 1-y. This test is giving a good distribution. For example, G + x T is permissible for the following values of x:

3
5
8
10
12
15
16
21
22
31
35
39
46
47
48
50
51
60
71
77
81
89
90
91

Overall, out of the first 100 000 points, 25 228 are permissible, roughly 1/4 as expected.

In the circuit, you will only prove the first part, i.e. w^2 = 1 + y. This is sufficient because for any point P in the tree, -P will not be permissible, so this check prevents the use of negated points for double spending.

@kayabaNerve commented May 1, 2024 via email
