@kayabaNerve
Last active May 13, 2024 19:24
FCMP+SA+L

Full-Chain Membership Proofs + Spend Authorization + Linkability

This proposes an extension to FCMPs to make them a drop-in replacement for the existing CLSAG. In order to be such a replacement, the proof must handle membership (inherent to FCMPs), spend authorization, and linkability.

Terminology

  • FCMP: Full Chain Membership Proof
  • GBP: Generalized Bulletproof, a modified Bulletproof supporting Pedersen Vector Commitments in its arithmetic circuit proof. Security proofs are currently being worked on by Cypher Stack.
  • GSP: Generalized Schnorr Protocol, a proven protocol from 2009 for proving: For an m*n matrix
    [
        [A, B, C],
        [D, E, F]
    ]
    
    And an n-length row
    [x, y, z]
    
    The m-length output column:
    [
      T,
      U,
    ]
    
    is
    [
      xA + yB + zC,
      xD + yE + zF,
    ]
    
    This is useful to prove knowledge of the opening of a vector commitment and consistency between vector commitments.
  • BP+: Bulletproof+. A weighted-inner-product proof currently in use in Monero.
  • F-S: Forward secret. An adversary who can solve for the elliptic curve discrete log problem cannot find the spent output.
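The GSP statement above can be sketched in a few lines. This is a minimal illustration, not the implementation referenced in this document: the "group" is a toy stand-in (integers modulo a prime under addition, where discrete logs are trivial), and the names `msm`, `prove`, `verify`, and `challenge` are hypothetical. It only shows the algebra: commit to a random row, derive a challenge, respond, and check every output row at once.

```python
import hashlib
import secrets

# Toy group (assumption, for illustration only): integers mod a prime Q under
# addition, with "scalar multiplication" being multiplication mod Q.
# Discrete logs are trivial here, so this demonstrates correctness, not security.
Q = 2**61 - 1

def msm(matrix, scalars):
    """Multiply an m*n matrix of group elements by an n-length row of scalars."""
    return [sum(s * e for s, e in zip(scalars, row)) % Q for row in matrix]

def challenge(matrix, outputs, nonce_outputs, message):
    """Fiat-Shamir challenge over the statement, the commitments, and a message."""
    data = repr((matrix, outputs, nonce_outputs, message)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def prove(matrix, witness, message=b""):
    """Prove knowledge of `witness` such that msm(matrix, witness) == outputs."""
    outputs = msm(matrix, witness)
    nonces = [secrets.randbelow(Q) for _ in witness]
    nonce_outputs = msm(matrix, nonces)
    c = challenge(matrix, outputs, nonce_outputs, message)
    responses = [(r + c * x) % Q for r, x in zip(nonces, witness)]
    return outputs, (nonce_outputs, responses)

def verify(matrix, outputs, proof, message=b""):
    """Check matrix * responses == nonce_outputs + c * outputs, row by row."""
    nonce_outputs, responses = proof
    c = challenge(matrix, outputs, nonce_outputs, message)
    expected = [(r + c * o) % Q for r, o in zip(nonce_outputs, outputs)]
    return msm(matrix, responses) == expected

# The 2*3 example from the definition: [[A, B, C], [D, E, F]] with row [x, y, z].
matrix = [[3, 5, 7], [11, 13, 17]]
witness = [2, 4, 6]
outputs, proof = prove(matrix, witness, b"message")
```

Because the same responses are checked against every row, the proof forces one consistent row of scalars across all output elements, which is the property used to tie vector commitments together.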

Recap

FCMPs, when instantiated with GBPs, are an O(n) proof of an O(log m) program (where m is the number of members in the set). The program is a Merkle tree proof, where the same layer program is executed multiple times. The layer program is summarizable as follows:

  1. Unblind the Pedersen Commitment input.
  2. Verify the unblinded variable is present in the layer being hashed.
  3. Hash the layer, outputting a new Pedersen Vector Commitment (blinding the hash for the layer).

For the first layer, under Seraphis, the input would be the re-randomized squashed-enote (effectively a Pedersen Vector Commitment). The "unblinding" would remove the re-randomization and continue.

The final layer would use randomness = 0 such that the output Pedersen Vector Commitment is the tree root (or the protocol would take the difference and require a discrete log PoK, forcing the user ).
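A rough sketch of the layer program, executed in the clear rather than inside a proof. Everything here is an assumption made for illustration: the group is a toy (integers modulo a prime under addition), a commitment is reused directly as the scalar hashed by the next layer (a real tree would move between curves), and `vector_commit` and `layer` are hypothetical names. In the actual FCMP these steps are circuit constraints and the verifier only ever sees the blinded commitments.

```python
import secrets

Q = 2**61 - 1  # toy prime modulus (assumption); group elements are integers mod Q

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

WIDTH = 4
GENS = [rand_scalar() for _ in range(WIDTH)]  # one generator per child position
H = rand_scalar()                             # blinding generator

def vector_commit(values, blind):
    """Toy Pedersen Vector Commitment: sum(v_i * G_i) + blind * H."""
    return (sum(v * g for v, g in zip(values, GENS)) + blind * H) % Q

def layer(blinded_input, input_blind, children, output_blind):
    """One run of the layer program."""
    unblinded = (blinded_input - input_blind * H) % Q   # 1. unblind the input
    if unblinded not in children:                       # 2. check membership
        raise ValueError("input is not a member of this layer")
    return vector_commit(children, output_blind)        # 3. hash, re-blinded

# Build a two-layer toy tree. Tree nodes are stored unblinded (blind = 0).
leaves = [rand_scalar() for _ in range(WIDTH)]
branch = vector_commit(leaves, 0)
top_children = [branch] + [rand_scalar() for _ in range(WIDTH - 1)]
root = vector_commit(top_children, 0)

# Walk from a re-randomized leaf to the root.
a, b = rand_scalar(), rand_scalar()
blinded_leaf = (leaves[2] + a * H) % Q
blinded_branch = layer(blinded_leaf, a, leaves, b)
computed_root = layer(blinded_branch, b, top_children, 0)  # final layer: randomness = 0
```

With the final layer's randomness set to 0, `computed_root` equals the public `root`, while every intermediate value handed between layers stays blinded.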

Difficulty

The naive solution would be to open K, the output key, inside the proof, then prove for the key-image inside the proof. This would require hardware wallets to perform the FCMP proof, which is extremely undesirable/entirely unacceptable. Accordingly, the goal is for the membership proof to perform re-randomization without knowledge of x, letting a second proof prove for spend authorization and the key image.

Solution

Instead of building a Merkle tree of squashed enotes, we build a Merkle tree with elements (K, hash_to_point(K), C) where C is the amount commitment. We output re-randomized versions of all three and then prove the key image is correctly formed.

Originally Proposed Modifications for Deployment During RingCT

EDIT: These should not be moved forward with. The alternative modifications in the next section achieve much better properties and are the basis of the forward-secret variant posted below. This is only present as it was discussed below and therefore would be confusing to remove entirely.

For the first layer, we don't take in the re-randomized squashed-enote (as this is pre-Seraphis), yet (r K, r**-1 H(K), C + a G). We denote this (K', H', C').

The first layer does not perform the traditional unblinding (addition of a G for a known a) for K', H'. Instead, it multiplies K by r (checking it's equal to K') and H' by r (checking it's equal to H(K)). We then perform the traditional unblinding of C' and check membership of (K, H(K), C).

This does not prove the spend is authorized nor that any claimed linking tag is correctly formed. It solely shows that the K', H' are derived as expected from a member of the tree.

We then extend this with a Discrete Log Equality proof for K', I over G, H'. The discrete logarithm of K' is r x, which we notate x'. x' H' expands to r x r**-1 H(K), with the r terms cancelling out for x H(K), the expected key image. This introduces linkability into the proof. By having the DLEq proof's transcript be over the message signed, it provides spend authorization as well.
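The cancellation of r can be checked numerically. This is only a sanity check of the algebra under loud assumptions: the group is a toy (integers modulo a prime under addition), and `G` and `H_K` are arbitrary stand-ins for the generator and for hash_to_point(K). It is not a DLEq proof.

```python
import secrets

Q = 2**61 - 1  # toy prime group order (assumption)

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1  # non-zero, so r = 0 is excluded

G = rand_scalar()    # stand-in for the generator G
H_K = rand_scalar()  # stand-in for hash_to_point(K)

x = rand_scalar()    # spend key
K = x * G % Q        # output key K = x G
r = rand_scalar()    # multiplicative re-randomizer

K_prime = r * K % Q                 # K' = r K
H_prime = pow(r, -1, Q) * H_K % Q   # H' = r**-1 H(K)

x_prime = r * x % Q                 # discrete log of K' over G
key_image = x_prime * H_prime % Q   # x' H'
```

`K_prime` equals `x_prime * G`, and `key_image` equals `x * H_K`, so a DLEq over (G, H') with witness x' yields the same key image the unblinded key would have produced.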

Proposed Modifications

With newly introduced generators T, U, V, and ignoring the amount commitment for now:

  • K' = K + aT
  • I' = hash_to_point(K) + bU
  • B = bV

The FCMP would only perform three standard unblinds (the third being C', ignored here), and show consistency between the bU term in I' with B. Instead of performing a DLEq proof after, we'd provide a proof of the following.

For matrix:

[
  [G,  T,  0], // Open K'
  [0,  0,  V], // Open B
  [U,  0,  0], // Create xU as a hint, Z, to prove for bxU without revealing bU
  [I', 0, -Z]  // x (H(K) + bU) + -bxU = x H(K)
]

The multi-scalar-multiplication by the consistent row of scalars:

[x, a, b]

The output is:

[
  K',
  B,
  Z,
  I
]

where Z is part of the proof data (formed as x U) and I is the key image.

This proves linkability, and again performs spend-authorization so long as the message is part of the proof's transcript.
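To make the matrix concrete, the following sketch evaluates the multi-scalar-multiplication and confirms the four outputs. It assumes a toy group (integers modulo a prime under addition) with random stand-ins for G, T, U, V and hash_to_point(K), and it only checks the statement being proven; the GSP proof itself is not produced here.

```python
import secrets

Q = 2**61 - 1  # toy prime group order (assumption)

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def msm(matrix, scalars):
    """Multiply a matrix of group elements by a row of scalars."""
    return [sum(s * e for s, e in zip(scalars, row)) % Q for row in matrix]

G, T, U, V = (rand_scalar() for _ in range(4))  # stand-ins for the generators
H_K = rand_scalar()                             # stand-in for hash_to_point(K)

x, a, b = rand_scalar(), rand_scalar(), rand_scalar()

K_prime = (x * G + a * T) % Q   # K' = K + aT, with K = xG
I_prime = (H_K + b * U) % Q     # I' = hash_to_point(K) + bU
B = b * V % Q                   # B = bV
Z = x * U % Q                   # the hint, part of the proof data
key_image = x * H_K % Q         # I = x hash_to_point(K)

matrix = [
    [G,       T, 0],        # Open K'
    [0,       0, V],        # Open B
    [U,       0, 0],        # Create Z = xU
    [I_prime, 0, -Z % Q],   # x (H(K) + bU) - b (xU) = x H(K)
]
outputs = msm(matrix, [x, a, b])
```

The last row is where the blinding term disappears: x I' contributes x b U, and multiplying -Z by b removes exactly that amount, leaving the key image.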

Integrity

Obviously, implementation must be done correctly, and the proving system (GBPs) must be proven secure. Forging a member requires breaking the Merkle tree OR r, r**-1 consistency (for the original proposal) or bU, bV consistency (for the current proposal). The original proposal also assumes the trivial r = 0 case is banned, which it already should be by the rule against key images of identity (though we'd still explicitly prevent it).

Privacy

The privacy of the originally proposed scheme relies on it being infeasible to link K', H' to K, H(K). The most trivial way to solve this is by checking if DH(K, H(K)) == DH(K', H'), which is presumably reducible to the Decisional Diffie-Hellman. The Decisional Diffie-Hellman game asks if DH(A, B) == C. One who can check if two Diffie-Hellmans would be equivalent can check if a single Diffie-Hellman is equivalent via DH(A, B) == DH(C, G).

The second scheme's privacy is much more clear-cut. Given the commitment bV, one must calculate bU, which is the Computational Diffie-Hellman (with the caveat Monero key images are currently considered private under the weaker Decisional Diffie-Hellman problem, so this being stronger is irrelevant).

Features

Not only would this introduce FCMPs to Monero, it would also enable transaction chaining and post-proved membership if we don't bind the membership proof into the secondary proof's (whether DLEq or GSP) transcript.

Relation to Seraphis

Seraphis offers more efficient FCMPs, should provide forward privacy, and moves from hacked-together formal properties inherent to RingCT to formalized-from-the-start properties.

Unfortunately, Seraphis has been estimated as 2 to 5 years out by UkoeHB, with inclusion of FCMPs making it closer to 5 years. I disagree with that estimate, yet jberman estimated 3 years themselves. The goal of this is to solve the privacy issues solvable under RingCT, buying time to do better regarding design, performance/scalability, and the unsolvable issues (forward privacy). This may also be of independent merit.

Performance

I prior estimated Full-Chain Membership Proofs to be 35ms in a batch of 10. This was evaluated over the "pasta curves" with Bulletproofs+. The arithmetic-circuit argument for Bulletproofs, which we are currently using (or rather, Generalized Bulletproofs, a derivative of), should be twice as fast due to the relation to the underlying proof. This would mean ~18ms within a batch of 10.

Generalized Bulletproofs also notably reduces the amount of gates we use in circuit. Prior, we spent several gates entering the divisor into the circuit (a bunch of numbers which had to be entered somehow). Now, with Generalized Bulletproofs, we can simply provide an additional Pedersen Vector Commitment (32 bytes) to enter the divisor. This slashes the size of the circuit.

By adding SA+L, we'll have the additional performance costs within the first layer of:

  1. Proving for a tri-set membership.
  2. Doing multiple unblinds.
  3. Proving consistency of bU, bV or r, r**-1.

I'm expecting this makes the first layer ~3x more expensive.

So from 1+1+1+1 at 35ms, to 1+1+1+1 at 18ms, to 3+1+1+1 (~31ms). That's ignoring the reduction in gates offered by GBPs.

Steps and Timeline

There are several different tracks along which we can move, and we should move along in parallel.

Implementation

  1. Implement GBPs.

This has already been done and solely needs optimizations performed.

  2. Implement an amenable framework for building arithmetic circuits.

This has technically been done, yet needs some love.

  3. Implement a library for elliptic curve divisors.

Technically done, yet with an edge case that should be fixed.

  4. Implement the various gadgets (set membership, index within set, PoK of DLog, proof of DLog).

This was done as needed for FCMPs, not as needed for FCMP+SA+L which has some distinctions (multi-set membership, which is done via getting the index of each item within each set and checking equality, and proof of DLog).

  5. Implement the FCMP+SA+L circuit.

This was done for FCMPs yet not FCMP+SA+L.

  6. Implement the secondary proof.

This isn't too notable, with a polished implementation hopefully being within a few days of work for the second case. The first case is prior implemented.

  7. Implement the necessary towering curves.

This is trivial to do over a generic integer library, yet performance effectively requires a dedicated implementation be done.

  8. Integrate into Monero.

Formal Security

  1. Formally prove the soundness, completeness, and zero-knowledge properties of GBPs.

We can do this now. Diego claimed a deliverable, or lack thereof if a failure occurs, could happen in one month.

  2. Have the divisor-technique reviewed, having been prior published by Eagen.

This is possible now.

  3. Formally prove the correctness of the "gadgets", as logical.

This is possible with just a formal description of the circuit, and the gadgets contained within. The Curve Trees paper did specify their gadgets, yet we have a notably different construction due to the usage of divisors and the extensions proposed here.

  4. Formally prove the correctness of the secondary proof.

The first proposal uses a DLEq which has existed since Chaum-Pedersen's 1993 work. The second uses a proof I frequently see referred to as a "Generalized Schnorr Protocol", despite my personal dislike for the ascription. I wouldn't be surprised if it was already proven.

Practical Security

  1. Audit the implementation of GBPs.

This is possible once the further optimizations are done, and the formal proofs exist.

  2. Audit the circuit framework.

This is possible once the framework is cleaned to a point we're happy with.

  3. Audit the EC divisor library.

This is possible as soon as the standing edge case is resolved.

  4. Audit the implementation of the gadgets.

This should be trivial and possible as soon as the gadgets are formally proven.

  5. Audit the implementations of the towering curves.

This is possible as soon as the curves are implemented. If we implement them with a constant-time integer library, we'd solely audit that.

  6. Audit the circuit.

  7. Audit the secondary proof.

This could be audited as soon as it's implemented, which again, should be trivial to implement.

  8. Audit the integration.

Contribution to Seraphis

Creation of FCMP+SA+L effectively does all the work necessary for FCMPs with Seraphis with regards to FCMPs themselves. It'd provide the proving system, gadgets, review/audits, and most of the composition work. The only additions would be the curve cycle libraries (if we switched curves), the cross-group DLEq we'd move commitments with, and the new integration (into Seraphis instead of into RingCT).

Steps Forward

We'd have to agree on which proposal to move forward with, if we were to move forward.

While the first proposal was the one originally posited, it remains here solely for historic reasons. It appears to perform one less PoK of DLog, making it faster, yet it requires using two private points as generators within Proof of (Knowledge of) DLog statements. Such proofs are much more performant when the generator being used is public.

The second proposal also has the benefit of a much stronger argument for privacy, and uses additive randomness (instead of multiplicative).

We'd then need various implementation work done. I can personally implement the curve libraries using crypto-bigint, yet I cannot implement tailored implementations as likely needed to maintain the performance target. Someone else would need to step up.

I believe I can have all the implementation done, besides integration into Monero, within a few months. The actual integration into Monero would be:

  1. Having Monero build the tree.
  2. Having wallet2 call prove via the Rust library.
  3. Adding RPC routes to get the current tree root and the path of a member of the tree (where the DSA would be used to request multiple paths as to not reveal the output being spent).
  4. Having monerod call verify via the Rust library.

There'd be no wallet/address/privacy set migration. All of the additional curves would be contained within the proof.

I cannot estimate/guarantee the timeline for the formal analysis side of things. I'd hope by clearly defining components, the rate at which review occurs is not bottlenecked by the rate at which development occurs. That doesn't change we'd have to efficiently manage the timeline and definition of components. I'd hope for, in the best case, development + review/audits of the building blocks in 3-6 months, targeting deployment in ~12 months. Beyond myself, the only development necessary is of dedicated curve implementations (towering curves having already been found by tevador) and the integration into Monero (which jberman already started for FCMPs into Seraphis).

@tevador

tevador commented Apr 30, 2024

the addition of the T term is what simultaneously creates the outgoing view key and offers forward secrecy

Not necessarily. If the T term is derived from the shared secret, it's not safe to use the G term as the OVK. However, the resulting output key is forward-secret nonetheless.

It's a few additions by the node (1.5 on average?)

Actually, it's on average 4 point additions and 8 Legendre symbol calculations.

The only argument against the node doing it is if one posits a pit of non-permissible values (say, thousands of them in a row) which forms an effective DoS.

Not thousands, but it's quite easy to find keys that will take 80+ point additions, so it can be considered to be a DoS vector. We can go with it for the initial FCMP rollout, but I'd like to add permissible keys and commitments to the Jamtis-RCT specs. That would reduce the verifier workload to just 2 Legendre symbol calculations per key.

could we not force wallets to resample randomness until permissibility happens to be achieved

The chance that a random key is permissible is 1/4. Two-output transactions only have one public key and a total of 4 keys derived from it. That means you would need, on average, 256 attempts (=4^4) to make a valid 2-output transaction. Impractical.

likely requires a table or 1-byte double-and-add to be properly performant

You can have a precomputed table of 256 multiples of T and G, which would only take 32 KB. Then you'd only need one table lookup and one point addition to recompute the permissible key. In any case, thanks to view tags, this would only happen for a small fraction of e-notes, so the precomputed tables are probably not even worth it.

@kayabaNerve
Author

kayabaNerve commented Apr 30, 2024

Fair. I always presumed deployment via the public spend key as that's indistinguishable, with no changes to scanning/sending. Apologies for being so imprecise.

If 1/4 points are permissible, I would've assumed we'd reach a permissible point in 1.5 additions (as the current one can be checked without any additions). The average distance to a permissible point being 4, not 2, when they're presumably evenly spaced by 4 sounds off. I haven't written the code to literally show however. Agreed there's overhead for the check itself.

80+ point additions definitely isn't great. It's still less work than a scalar multiplication though. In that case, I'd go back to recommending resampling randomness yet that only can be discussed when each output has an independent randomness which isn't currently the case. If we're shifting send/scan changes to a future fork, would not that future fork presumably use independent randomness per outputs? I'd have to re-read on how Seraphis + JAMTIS proposed handling change outputs. If the extra byte is the best solution, along with a table or one-byte double-and-add in the scan code (which I never meant to suggest impractical, solely an annoyance like all three of these paths have), ACK.

@tevador

tevador commented Apr 30, 2024

Re: torsion (6.2.1 and 6.4): What can happen if we allow keys with a torsion component to be added to the tree? I can't think of any exploit as long as the SA+L proof is handled correctly.

@kayabaNerve
Author

kayabaNerve commented Apr 30, 2024

I can't think of any. I just no longer care to ask the question and argue security despite the question. It's not worth the mental effort and the potentially surprising results. I'm still frustrated I still have to consider some aspects re: torsion which are fundamentally not removable. This just really isn't the place to fuck around, find out.

The requirement of torsion clearing historical points is adding a non-negligible amount of work however, which I will concede. To go in depth, the divisor-based DLog proof shouldn't pass for torsioned elements. No sum of elements without torsion will add with a torsioned element to the identity element, which is what the divisor is supposed to check. That means the worst you get is torsion in, torsion out, on the output key and commitment.

We already argue spending torsioned input commitments is fine. You can torsion them on-chain, and we verify them as prime order elements (emphasis on as prime order elements, not that they are prime order elements), but at time of spend they'll be torsioned. For a torsioned binomial output key x'G + aT, where ' denotes torsion, we'd have to argue it's fine.

I'm not sure that is? I proposed a Bulletproof+ for the F-S SA+L. While we can argue if it's actually a BP+ or not (as it's solely the one-round protocol), that doesn't change we currently ensure all elements passed to a BP+ are without torsion via torsion clears. This would not have its torsion cleared and would be explicitly torsioned. We'd have to either multiply by inverse eight and then by eight (currently proposed for outputs, now also done for inputs) or increase the tolerances of our policy re: BP+.

I'd rather be done with it, and if we're concerned about performance (just over one additional scalar multiplication), implement Pornin's halving method which I believe is just 30% the cost.

@tevador

tevador commented Apr 30, 2024

If 1/4 points are permissible, I would've assumed we'd reach a permissible point in 1.5 additions

We have:

| Chance | Additions | Legendre symbols |
|---|---|---|
| 1/4 | 0 | 3 |
| (3/4)*(1/4) | 1 | 5.5 |
| (3/4)^2*(1/4) | 2 | 8 |
| ... | ... | ... |

So the average number of additions will be:

$$A = \sum_{i=0}^{\infty} i p^i (1-p) = p (1-p) \sum_{i=0}^{\infty} i p^{i-1} = \frac{p (1-p)}{(1-p)^2} = \frac{p}{1-p}$$

With p=0.75, we get A=3. Similarly, for the number of Legendre symbols, we get an average of L=10.5.

I'm estimating that a Legendre symbol calculation will take time comparable to at least 50 point additions. So the average cost is ~500, dominated by Legendre symbol calculations. For "1 in 10 million" crafted DoS points, we get A=51 and L=130.5 and a total cost of ~6500 point additions.

Update (2024-04-30): fixed the number of Legendre symbol calculations. The average is 1.5 per addition plus 2 for the final successful point.

Update (2024-05-03): fixed the number of Legendre symbol calculations to account for the use of projective coordinates. The average is 2.5 per addition plus 3 for the final successful point.
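The figures above can be reproduced with a short calculation. This is only a restatement of the numbers in this comment: the 2.5 Legendre symbols per addition plus 3 for the final point come from the update, and the cost of 50 point additions per Legendre symbol is the estimate stated above, not a measurement. The function names are made up for this example.

```python
from fractions import Fraction

def expected_additions(p_fail):
    """Mean number of failures before the first success: p / (1 - p)."""
    return p_fail / (1 - p_fail)

def legendre_symbols(additions):
    """2.5 Legendre symbols per addition, plus 3 for the final successful point."""
    return Fraction(5, 2) * additions + 3

def total_cost(additions, legendre_cost=50):
    """Cost in units of point additions, with one Legendre symbol ~ 50 additions."""
    return additions + legendre_cost * legendre_symbols(additions)

p = Fraction(3, 4)               # chance a given point is NOT permissible
A = expected_additions(p)        # 3
L = legendre_symbols(A)          # 21/2 = 10.5
average_cost = total_cost(A)     # 528, i.e. ~500
dos_cost = total_cost(51)        # 6576, i.e. ~6500 (with L = 130.5)

# Numerical check of the series sum_{i>=0} i * p^i * (1 - p).
series = sum(i * 0.75**i * 0.25 for i in range(1000))
```

The average of 3 differs from the 1.5 guessed earlier in the thread because permissible points are not evenly spaced: each candidate succeeds independently with probability 1/4, so the number of failed attempts follows a geometric distribution.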

@kayabaNerve
Author

kayabaNerve commented Apr 30, 2024 via email

@kayabaNerve
Author

Pushed edits in response, though not yet noting the above re: permissibility as it should likely be incorporated into the JAMTIS proposal.

Also, please feel free to push back further on my anti-torsion stance. I do get that change is non-negligible when optimizing this. My justification to myself was the fact I proposed broadcasting keys and commitments as inverse eight, decreasing current node resource use, making this a one time payment for our history. I'm obligated to concede however that wallets would have to perform two triple doublings, which should be negligible yet is an additional cost we'd pay into the future (not solely historically).

@tevador

tevador commented May 1, 2024

broadcasting keys and commitments as inverse eight, decreasing current node resource use [...]. I'm obligated to concede however that wallets would have to perform two triple doublings, which should be negligible yet is an additional cost we'd pay into the future

Napkin math tells me that it's better to use the Ristretto format rather than pre-multiplication by (1/8) and post-multiplication by 8.

Ignoring the permissibility check and Ed25519 -> Wei25519 conversion, to get the canonical tuple (K, I, C) to insert into the tree, you need the following operations:

A. With pre- and post- multiplication

A.1 Transaction builder

  1. K' = ge_scalarmult(inv8, K)
  2. Compress K'
  3. C' = ge_scalarmult(inv8, C)
  4. Compress C'

A.2 Transaction verifier

  1. Decompress K'
  2. 3x ge_dbl
  3. fe_invert to convert to affine
  4. I = hash_to_ec(K)
  5. Decompress C'
  6. 3x ge_dbl
  7. fe_invert to convert to affine

B. With Ristretto

B.1 Transaction builder

  1. Compress K
  2. Compress C

B.2 Transaction verifier

  1. Decompress K
  2. I = hash_to_ec(K)
  3. Decompress C

Even if Ristretto point compression and decompression is slightly slower, it's more than compensated for by avoiding the extra group operations for both the transaction builder and the verifier.

Additionally, for commitments, the post-multiplication by 8 is more risky because if we are not careful, we might allow users to spend 8x the amount in the output.

The requirement of torsion clearing historical points is adding a non-negligible amount of work however, which I will concede.

As for historical outputs, some transactions have been confirmed to have torsioned keys. I see three possible solutions:

  1. Ignore torsion if we can prove it's safe.
  2. What you are proposing, i.e. clearing the torsion by multiplying by 1/8 and then 8 for all keys.
  3. Using the point halving method to detect torsioned keys and clear the cofactor only for those keys. This should be about 2x faster.
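The effect of multiplying by 1/8 and then by 8 can be illustrated without any curve arithmetic. The Ed25519 point group is cyclic of order 8*l, so the sketch below models points as integers modulo 8*l under addition. That modeling choice is an assumption made purely for illustration (it is not how `ge_scalarmult` or `ge_dbl` work internally), and the helper names are hypothetical.

```python
# Order of the prime-order subgroup of Ed25519.
L = 2**252 + 27742317777372353535851937790883648493
ORDER = 8 * L          # full group order, cofactor 8
INV8 = pow(8, -1, L)   # the scalar 1/8 mod l

def scalarmult(k, point):
    return k * point % ORDER

def clear_torsion(point):
    """Multiply by 1/8 (as a scalar mod l), then by 8 (three doublings)."""
    return scalarmult(8, scalarmult(INV8, point))

def is_prime_order(point):
    return scalarmult(L, point) == 0

G = 8        # generates the prime-order subgroup in this model
TORSION = L  # an element of order 8

clean = scalarmult(123456789, G)
dirty = (clean + 3 * TORSION) % ORDER

# The commitment concern: if the builder skips the 1/8 step but the verifier
# still multiplies by 8, a commitment to `amount` becomes one to 8 * amount.
H_GEN = scalarmult(5, G)  # stand-in for the amount generator
amount, mask = 10, 777
commitment = (scalarmult(amount, H_GEN) + scalarmult(mask, G)) % ORDER
inflated = scalarmult(8, commitment)
```

In this model `clear_torsion(clean)` returns `clean` unchanged, while `clear_torsion(dirty)` also returns `clean`: the torsion component is removed and the prime-order component survives.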

@kayabaNerve
Author

kayabaNerve commented May 1, 2024

It'd be better to use Ristretto despite there being no explicit conversion provided, as it'd be trivial to define, yet I'm not happy with the proposed scope at this time to move Monero to Ristretto despite calling for it a couple years ago. If others are willing to do the time and effort and it's not too contentious, then yes, I'd support it. I also get how proposing it for a future hard fork is silly though, as it'd re-incur building the tree (which is notable). That isn't to say it wouldn't be worthwhile in a future hard fork, solely bad to put off to one.

Additionally, for commitments, the post-multiplication by 8 is more risky because if we are not careful, we might allow users to spend 8x the amount in the output.

This seems like the thing we'd catch with basic testing (have every single point added to the tree go through a single fn, so either all commitments or none are so mutated) and also already a concern present in Monero? If the BP+ forgets to multiply by eight, the created commitment may be a value that's only in range when divided by eight? (though that variance isn't enough to trigger an overflow regardless)

Using the point halving method to detect torsioned keys and clear the cofactor only for those keys. This should be about 2x faster.

Would be my advocacy at this time, though I'll note halving doesn't detect torsioned keys. 3x halving, then 3x doubling to a different point does. We can't save the second half of those operations for most keys.


Distinctly, re: Wei25519, I implemented Wei25519 from the IETF draft for the divisor library. The map from Montgomery is trivial (one addition of a constant), where Wei25519.2 and Wei25519.-3 are non-trivial (multiplications by constants, plural). The only potential benefit I could see is if we optimized the hell out of divisor construction, as the curve formula is a modulus and a coefficient of 2 may theoretically clean that math up? But that'd force the node to do the more expensive map (and it only does the map. it does not do any arithmetic for Wei25519 points, nor does it multiply/divide/mod any polynomials). Feel free to confirm I'm not missing any considerations :) Really appreciate the feedback and collaboration here.

EDIT: We do two hash to curves and one Weierstrass addition per DLog claim on the first layer. I'll need to review hash to curve candidates and performance implications for choice of A before I can so write off the remaining question of my initial choice.

@tevador

tevador commented May 1, 2024

It'd be better to use Ristretto despite there being no explicit conversion provided, as it'd be trivial to define, yet I'm not happy with the proposed scope at this time to move Monero to Ristretto despite calling for it a couple years ago. If others are willing to do the time and effort and it's not too contentious, then yes, I'd support it. I also get how proposing it for a future hard fork is silly though, as it'd re-incur building the tree (which is notable). That isn't to say it wouldn't be worthwhile in a future hard fork, solely bad to put off to one.

I think it's worth it to implement in parallel with FCMPs if we can find someone willing to do so. We should at least analyze the amount of work needed for this. Ristretto is basically only an alternative serialization/deserialization method and produces points in extended coordinates (struct ge_p3 as used by Monero), so it should not be too much work.

I implemented Wei25519 from the IETF draft for the divisor library. The map from Montgomery is trivial (one addition of a constant), where Wei25519.2 and Wei25519.-3 are non-trivial

I started implementing a concept for permissible points and the canonical Wei25519 representation is the most suitable one because the y coordinate in Wei25519 format is equal to the v coordinate in Montgomery format, which makes it easier to check for permissibility.

@tevador

tevador commented May 1, 2024

Here is my reference implementation for point permissibility: https://gist.github.com/tevador/f9e6d4f262aba1c3de1eb7a20f40347d

I'm using the criterion that 1+y is square and 1-y is non-square, where y is the point coordinate in Wei25519 form. Point addition is still done in Ed25519 form, we just convert to Curve25519 coordinates for the permissibility test.

G + 3 T is a permissible point (the example used in the script).

The script shows that we actually need 3 Legendre symbol calculations to check permissibility due to the use of projective coordinates (affine coordinates are not used in practice).

@kayabaNerve
Author

so it should not be too much work.

https://ristretto.group is clear enough I was able to implement it 2y ago, when I really did not know what I was doing, into an old project of mine: https://github.com/MerosCrypto/Meros/blob/master/e2e/Libs/Ristretto/Ristretto.py#L18-L111

It's less than 100 lines of Python for encode/decode (though that code isn't well-written and is ugly, please refer to the Ristretto specification).


I agree the y coordinate directly mapping is pleasant. I'm not convinced the saved cost of the map (multiplication by one constant) is worth the cost of the (potentially) worse hash to points/incomplete addition functions (which could incur multiple multiplications if not exponentiations). I'm refraining from forming a definite comment until I poke around on the matter more.


Clarifying, is 1+y simply meant as a demo where a=1, b=1 for simplicity, or are you proposing using a=1, b=1? I'm not convinced the uniform distribution 'achieved' is of too much benefit given the a/b are constant. As one can brute force ys, they can brute force ay + bs. We may want to legitimately discuss removing the 'hash' element.

@tevador

tevador commented May 1, 2024

I agree the y coordinate directly mapping is pleasant. I'm not convinced the saved cost of the map (multiplication by one constant) is worth [...]

You are correct. After publishing the permissibility script, I realized that there is no performance difference between Wei25519 and Wei25519.2. They simply differ in one constant that needs to be multiplied in both cases.

worse hash to points/incomplete addition functions

Incomplete addition doesn't use the curve constants. Only incomplete doubling uses a, but AFAIK we will not use doubling in the circuit. In case we need doublings in the circuit, it's better to use Wei25519.2. On-curve check might also be a little bit faster with Wei25519.2 because you can replace the multiplication by a with an addition, but I'm not sure if that makes any difference in the circuit.

Clarifying, is 1+y simply meant as a demo where a=1, b=1 for simplicity, or are you proposing using a=1, b=1? I'm not convinced the uniform distribution 'achieved' is of too much benefit given the a/b are constant. As one can brute force ys, they can brute force ay + bs. We may want to legitimately discuss removing the 'hash' element.

I'm proposing 1+y as the simplest hash that works. Not using a hash is out of question because our field has p = 1 mod 4, so y and -y always have the same quadratic residuosity (i.e. there would be no permissible points at all).

@kayabaNerve
Author

kayabaNerve commented May 1, 2024 via email

@tevador

tevador commented May 1, 2024

The implemented incomplete addition doesn't use the constant a but some
iadd functions only work for specific values of a (and may be more
performant).

You can check Explicit Formulas Database. The formulas for addition are the same for all values of a with the curve equation y^2 = x^3 + a*x + b. There are specific doubling formulas for some coordinate systems that depend on the value of a.

However, I'm fine with using Wei25519.2. I don't recommend Wei25519.-3.

It's a performance/'security' benefit tradeoff.

There is no security tradeoff here. You might be confused by the term "hash" they used in the Curve Trees paper. "Surjective map" would be a better term. We simply want a map that produces the expected 1/4 of permissible points. The simplest surjective map that works is the one I'm proposing. The main purpose of all this is to make sure only one of the pair [P, -P] is permissible for adding to the tree.

Note that if we had p = 3 mod 4 (like Bitcoin, for example), we could use simply y and -y and get 1/2 of permissible points.
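A small check of the residuosity argument, as a sketch rather than the reference script linked earlier. It samples random field elements for y instead of real Wei25519 point coordinates, which is an assumption made to keep the example short, and `is_square` and `permissible` are names invented for this example.

```python
import random

P = 2**255 - 19  # the field prime

def is_square(a):
    """True if a is a non-zero quadratic residue mod P (Euler's criterion)."""
    return pow(a % P, (P - 1) // 2, P) == 1

def permissible(y):
    """The proposed criterion: 1 + y is square and 1 - y is non-square."""
    return is_square(1 + y) and not is_square(1 - y)

# P = 1 mod 4 means -1 is a square, so y and -y always share residuosity.
p_mod_4 = P % 4
minus_one_is_square = is_square(-1)

rng = random.Random(0)
samples = [rng.randrange(2, P - 1) for _ in range(2000)]

same_residuosity = all(is_square(y) == is_square(-y) for y in samples)
never_both = not any(permissible(y) and permissible(-y) for y in samples)
fraction = sum(permissible(y) for y in samples) / len(samples)
```

`same_residuosity` shows why testing y alone cannot separate P from -P in this field. `never_both` holds because negating y swaps the roles of 1 + y and 1 - y, so at most one of the pair can pass. `fraction` should land near 1/4.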

@kayabaNerve
Author

You are right re: iadd, sorry. I missed only doubling had distinct formulas.

I do understand it's not a proper hash. The security argument I was attempting to reference would be if scaling by a uniform element achieves a distribution closer to uniform. I don't inherently see why such a scaling would benefit security, hence security being in quotes, but I also don't consider myself capable of commenting on the statistical distribution and the impact of scaling vs solely including a fixed offset (b). I'll also note a term better than security may be "integrity" or "quality". If you also don't see a benefit for the distribution, it's solely a performance detriment we can skip by setting a=1.

@tevador

tevador commented May 1, 2024

AFAIK there is no relationship between the residuosity of 1+y and 1-y. This test is giving a good distribution. For example, G + x T is permissible for the following values of x:

3
5
8
10
12
15
16
21
22
31
35
39
46
47
48
50
51
60
71
77
81
89
90
91

Overall, out of the first 100 000 points, 25 228 are permissible, roughly 1/4 as expected.

In the circuit, you will only prove the first part, i.e. w^2 = 1 + y. This is sufficient because for any point P in the tree, -P will not be permissible, so this check prevents the use of negated points for double spending.
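A minimal sketch of the test being discussed, assuming (as this thread implies, not as a confirmed spec) that a point is permissible when 1 + y is a square and 1 - y is not, over the field 2^255 - 19. The function names are hypothetical.

```python
# Sketch of the permissibility test on a y-coordinate.
P = 2**255 - 19


def is_square(a: int, p: int = P) -> bool:
    """Euler's criterion; zero is treated as a square (w = 0)."""
    a %= p
    return a == 0 or pow(a, (p - 1) // 2, p) == 1


def permissible(y: int, p: int = P) -> bool:
    """Assumed rule: 1 + y is a square and 1 - y is not."""
    return is_square(1 + y, p) and not is_square(1 - y, p)


def both_signs_permissible(y: int, p: int = P) -> bool:
    """Would be True only if a point and its negation (y -> -y) both passed."""
    return permissible(y, p) and permissible(-y, p)


if __name__ == "__main__":
    print(permissible(3), permissible(-3))
    print(any(both_signs_permissible(y) for y in range(1, 200)))
```

Under that rule, permissible(y) requires 1 - y to be a non-square while permissible(-y) requires it to be a square, so at most one of the pair can pass.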

@kayabaNerve
Author

kayabaNerve commented May 1, 2024 via email

@kayabaNerve
Author

kayabaNerve commented May 3, 2024

@tevador Thoughts on re-defining key images to just their x coordinates?

If we let the prover malleate the sign of their key image, we avoid 25% of the scalars on the first layer (which is by far the widest part of the tree). Since the tree building is non-negligible (four scalar multiplications per output + updating all higher branches), I wanted to consider it. It'd trivially be possible now without rebuilding the key image store if we do two lookups, one per sign of the key image (until an eventual/never migration to an x-only store).
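The "two lookups, one per sign" idea could look roughly like the sketch below. It assumes key images are stored as 32-byte compressed Ed25519 encodings, where the sign of x is the top bit of the last byte, and it uses a plain Python set as a stand-in for the key image table. Function names are hypothetical, and the edge case of points with x = 0 is ignored.

```python
# Sketch: sign-agnostic key image lookup against an existing full-point store.
def flip_sign(key_image: bytes) -> bytes:
    """Return the encoding of the negated point by flipping the x sign bit."""
    if len(key_image) != 32:
        raise ValueError("expected a 32-byte compressed point")
    return key_image[:31] + bytes([key_image[31] ^ 0x80])


def is_spent(store: set, key_image: bytes) -> bool:
    """One lookup per sign, so KI and -KI count as the same key image."""
    return key_image in store or flip_sign(key_image) in store


if __name__ == "__main__":
    ki = bytes(range(32))
    store = {ki}
    print(is_spent(store, ki), is_spent(store, flip_sign(ki)))
```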

I think it'd only be the one-bit loss to security? I can't think of further ramifications immediately. I'm still mulling this over though.

EDIT: If we keep this going, we don't need permissibility checks on the output key/key image, as we allow sign malleation. That prevents requiring tweaking, and makes resampling randomness better than prior (as it's now 1 point per output which may trigger resampling, not 2). Whether or not resampling randomness is optimal remains debatable. I'm also just not really happy with the questions raised and how invasive this entire idea would be though...

@tevador

tevador commented May 3, 2024

Eliminating the y coordinate of I from the hash would mean you can prove both (K, I, C) and (K, -I, C) are in the tree. Negating the key image base would result in a negated key image -KI. Not considering the sign bit of the key image would fix that and prevent double spending.

I think rebuilding the key image table would be relatively easy (could be done during a common DB migration). We would probably need to check the current set of spent outputs if there are two key images that only differ by the sign bit (extremely unlikely). Otherwise I can't immediately see any major issues. If it significantly speeds up the hash, I support this change.

@tevador

tevador commented May 3, 2024

If we keep this going, we don't need permissibility checks on the output key/key image, as we allow sign malleation. That prevents requiring tweaking, and makes resampling randomness better than prior (as it's now 1 point per output which may trigger resampling, not 2).

When I'm thinking about it, do we even need the amount commitment C to be permissible? Not checking for the sign of C would at worst mean someone can 'spend' the negative amount in the output, which would only introduce a new method of burning Monero, but I can't immediately see issues with it.

@kayabaNerve
Author

In order to currently have a difference by the sign bit, you'd have a solution to the discrete log problem for two outputs of the hash function. For xH_1 == yH_2, x * y**-1 is the relationship of H_1, H_2.

I think rebuilding the key image table would be relatively easy (could be done during a common DB migration).

Assuming key images are unordered (LMDB prefix iteration means they aren't), it's a one-time migration to avoid 2x impact to reading. I'm unsure that's worthwhile vs simply having new nodes build the DB in the faster manner, but I'm unsure of the migration policy.

This would save 25% off the first layer of the tree, and literally over 100m scalarmuls from the initial tree building (due to having more outputs than that). Those scalarmuls aren't full cost, due to being part of a multiexp, yet still non-negligible (especially since we need the multiexp output for a small batch, say 100 points at the high end, limiting the cost saved).


I also considered it for C. Since the formula is sum(inputs) - sum(outputs), having a negative input does not increase the amount spendable. It does allow underflowing inputs, yet wrapping around the curve order would require 2**192 inputs and/or outputs since this is additive.

I didn't want to suggest it, as while it removes the concept of permissibility once and for all, it's cursed. I appreciate optimizations where they exist, yet allowing negative inputs to save a tiny bit (ideally, just one sqrt per output) seems overly optimizing. I'm debating not endorsing the key image commentary due to how it'd affect the protocol proper (not just the TX format, yet fundamentally how we track double spends) and the need to be extra careful with checking historical key images, in-block key images, and in-TX key images. Not to mention, we can always generalize protocol rules, yet we can never re-specify them.

But yes, I believe that's technically valid and would remove permissibility.


I'll keep the protocol as-is for now, and we can discuss both these items at the next NWLB/MRL to see sentiment.

@tevador

tevador commented May 3, 2024

It does allow underflowing inputs, yet wrapping around the curve order would require 2**192 inputs and/or outputs since this is additive.

That's nothing new, it already applies to outputs (i.e. you could create counterfeit Monero today if transactions with 2^192 outputs were possible).

I didn't want to suggest it, as while it removes the concept of permissibility once and for all, it's cursed. I appreciate optimizations where they exist, yet allowing negative inputs to save a tiny bit (ideally, just one sqrt per output) seems overly optimizing.

I don't think it's a tiny optimization.

  1. It removes the need to migrate old commitments to be permissible (a one-time cost of 100M+ commitments for every node).
  2. It removes the DoS vector of permissibility (see above). It's not one square root per commitment. The average is 10.5 square roots (or Legendre symbols).
  3. It simplifies both the protocol and the implementation (everything regarding permissibility can be removed).
  4. No need to modify Jamtis (sender-receiver protocol) to account for permissibility.

The only con of the change would be the possibility of purposefully burning double the amount in your own e-note (you'd need to provide extra e-notes to make the result non-negative).

@kayabaNerve
Author

ACK. Will discuss at aforementioned meetings and see how other developers/researchers feel.

@kayabaNerve
Author

@tevador We still need permissibility for the output of the hash function, unless we lose collision resistance for branch hashes, yet that should be DoS-mitigated as the hash is by branch (no one output can cause thousands of operations unless the only item in its batch).

@tevador

tevador commented May 4, 2024

we lose collision resistance

Do we?

Let's take the simplest case of a binary tree and Pedersen hash H(a, b) = x(H_0 + a H_1 + b H_2), where x() means we only take the x-coordinate of the point as the hash result.

Suppose that someone can find an algorithm that takes (H_0, H_1, H_2) as input and produces two different tuples (a, b) and (a', b') such that H(a, b) = H(a', b').

We can turn that algorithm into a discrete log oracle because:

  1. If a H_1 + b H_2 = a' H_1 + b' H_2, then H_1 = (b' - b)/(a - a') H_2.
  2. If H_0 + a H_1 + b H_2 = - H_0 - a' H_1 - b' H_2, then H_0 = -(a + a')/2 H_1 - (b + b')/2 H_2 and we still get a discrete log oracle if we input for example H_2 = r H_1.

So finding a collision with an x-only Pedersen hash should still be as hard as solving the discrete log. It's important to have H_0 != O, otherwise you get an easy collision (a, b) and (-a, -b).
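To make the H_0 != O remark concrete, here is a toy sketch. It uses the textbook curve y^2 = x^3 + 2x + 2 over F_17 (prime order 19, base point (5, 1)) rather than Helios or Selene, and generators with known discrete logs, so it only illustrates the trivial (a, b) / (-a, -b) collision, not real security. All names are made up for the example.

```python
# Toy x-only Pedersen hash on y^2 = x^3 + 2x + 2 over F_17 (group order 19).
P_MOD, A, ORDER = 17, 2, 19
INF = None  # point at infinity
G = (5, 1)


def add(p, q):
    """Affine short Weierstrass addition."""
    if p is INF:
        return q
    if q is INF:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def mul(k, p):
    """Double-and-add scalar multiplication, scalar reduced mod the group order."""
    k %= ORDER
    acc = INF
    while k:
        if k & 1:
            acc = add(acc, p)
        p = add(p, p)
        k >>= 1
    return acc


H_1, H_2 = mul(2, G), mul(3, G)  # toy generators; real ones have unknown relations


def hash_x(a, b, h_0):
    """x(h_0 + a*H_1 + b*H_2); None if the result is the point at infinity."""
    pt = add(h_0, add(mul(a, H_1), mul(b, H_2)))
    return None if pt is INF else pt[0]


if __name__ == "__main__":
    print(hash_x(1, 1, INF) == hash_x(-1, -1, INF))  # H_0 = O: trivial collision
    print(hash_x(1, 1, G) == hash_x(-1, -1, G))      # H_0 = G: negation no longer collides
```

With H_0 = O, negating both inputs negates the output point and the x-coordinate is unchanged. With a fixed nonzero H_0, the negated inputs land on a different point unless a H_1 + b H_2 is itself the identity.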

@kayabaNerve
Author

It is (a, b) and (-a, -b) which I was concerned about. That produces outputs (x, y), (x, -y). I'm unsure of the impact of having such a known collision when said -a would still need to be the hash of some branch, and finding such an opening would still be the discrete log problem? Yet it's best not to mess with it.

If you include an initialization of G_0 which is fixed with a coefficient of 1... then this explicitly becomes under the discrete log problem to find a collision? So ACK :) Good thinking. I believe when we pass the blinded Pedersen Commitment to the next layer up, we'd simply add this point, so it'd be done out of circuit which is quite nice. It does incur one extra addition to every hash output, but that should still be better than incurring the permissibility.

@tevador

tevador commented May 4, 2024

In the FCMP++ PDF, chapter 6.1, where you write "where all members of branches are zero by default", there should at least be a footnote explaining that this is safe because Selene and Helios don't contain a point with x = 0 (that could be used to fake the presence of a member in the tree).

For this reason, we also don't need any impermissible dummy values that are suggested in the Curve Trees paper in Appendix E.

@kayabaNerve
Author

For what it's worth, I did write out the circuit and verify the constraints pass for an honest prover. With that, we effectively have a correctness proof (not a soundness proof to anyone who is reading this without a formal background), letting us move to review/audits/formal verification. At least, we do once I update the paper with the fixes I made on the code side of things...

Re: development, it's been much smoother than expected. I was expecting notable challenges/frustrations based on my prior work on FCMPs. It was quite cumbersome to work with. While I did carry the code from it, as part of "productionizing", I did rewrite vast swaths of it (the existing code providing a reference on what works, the new code doing what works how it should be done).

Part of my thoughts on the amount of rewriting from prior work can be seen in this gist and in my CCS.

Implement an amenable framework for building arithmetic circuits.

is a milestone here, yet not in my CCS. The generic arithmetic circuit framework was fine, yet became incredibly bogged down when we added challenges/Pedersen Vector Commitments. This new work scraps it entirely for a minimal framework (enough to write circuits safely and without scratching my eyes out), then treats the PVCs as outside of the framework. This specification, only implementing what we need as we need, without a layer of abstraction around it, definitively helped. This decision was made prior to the CCS (hence it not being a milestone there).

The explicit document also helped, as it corrected a lot of development before it began. I'm hoping to be able to start work on the exposed API, and the integration side of things, in a couple weeks (letting jberman and co start on integration while I continue work on the FCMP++ side of things).

@kayabaNerve
Author

ACK re: zero. The Helioselene library I published earlier today does have such tests.

@tevador

tevador commented May 5, 2024

I ran some tests with Ristretto and unfortunately, it doesn't solve all of our problems.

Ristretto abstracts torsion away by choosing one representative from each of the 8-torsion cosets. This choice is made during serialization. However, the chosen representative is not always the point in the prime-order subgroup. For example, the base point deserialized from the ristretto representation e2f2ae0a6abc4e71a884a961c500515f58e30b6aa582dd8db6a65945e08d2d76 will have a torsion component (the y-coordinate is not equal to the canonical value of 4/5).

This has three consequences for us:

  1. In order to get the legacy point representation for hashing (to get the key image base), torsion clearing still needs to be done after deserializing K from the Ristretto format.
  2. Key images serialized in Ristretto format may still have torsion, which must be cleared to get the legacy representation.
  3. When converting a Ristretto point to Wei25519, we will still have torsion.

The first point could be overcome by redefining the key image base for post-fork output keys from Hp(legacy_repr) to Hp(ristretto_repr). This should be safe because each output will still have only one well-defined key image, so double spending will not be possible.

For the second point, torsion clearing with mul8 is unavoidable even with Ristretto.

For the third point, this is probably not a real problem because each K and C will still have only one well-defined x-coordinate in Wei25519 format. We would need to avoid any equality comparisons in Wei25519 format and use strictly Ed25519 format and Ristretto comparison (e.g. to check for identity) and only convert to Wei25519 immediately prior to hashing.

@tevador

tevador commented May 5, 2024

For the second point, torsion clearing with mul8 is unavoidable even with Ristretto.

To clarify: Torsion clearing is avoidable only if we also migrate all historical key images to the Ristretto format. It would be a bit tricky though, with the proposed change to have the same representation for both KI and -KI.

@kayabaNerve
Author

kayabaNerve commented May 5, 2024

... that may raise the new question of can we convert old key images?

If Ristretto has a random torsion component, and we have the full Ed25519 points (with sign data), within 8 additions we can presumably find its Ristretto-decoded equivalent. I don't love this idea and we'd need more specification to further comment, but I don't think it's impossible.

EDIT: As I posted this, the page refreshed and showed tevador's comment. Seems we're on the same page there.


I removed permissibility. I'll move to updating the paper to recent developments next. I'm unsure if my Helioselene library is 3 or 50x slower than dalek. My computer is unreliable for benchmarks :/ My measurements say its field arithmetic is somehow faster than dalek's (despite not using a tailored impl), yet point addition is ~3x slower and the multiexp is ~50x slower. My honest guess is it's on the scale of 5-10x slower, but I'm only planning for a 2x performance increase to the entire proof while considering efficiency.

I truly can't say I have the experience nor interest in making a faster library. If anyone wants to do the field arithmetic in C and have me port, I'd be fine doing so. While I don't believe the learning curve for Rust on this amount of low-level code atrocious, I respect lack of interest in working on Rust (as I would a lack of interest in working on any topic, really).


We do need a test confirming Wei25519's x-coordinate of 0 does not have a valid point associated or a distinct representation for null leafs.

@tevador

tevador commented May 5, 2024

can we convert old key images?

Yes, it's definitely possible.

  1. Decode the key image into extended coordinates using the legacy method.
  2. Encode the point in extended coordinates into the Ristretto format.

I think it should be possible to tweak the Ristretto encoding function so that both P and -P get the same representation (this would only be used for key images).

I truly can't say I have the experience nor interest in making a faster library. If anyone wants to do the field arithmetic in C and have me port, I'd be fine doing so. While I don't believe the learning curve for Rust on this amount of low-level code atrocious, I respect lack of interest in working on Rust (as I would a lack of interest in working on any topic, really).

It's on my TODO list. I want to take it as an opportunity to learn Rust.

We do need a test confirming Wei25519's x-coordinate of 0 does not have a valid point associated or a distinct representation for null leafs.

Wei25519 has a point with x = 0, but we'd have to check if the point can be reached from a deserialized Ristretto point. There is a 7/8 chance it won't be reachable.

@kayabaNerve
Author

It's on my TODO list. I want to take it as an opportunity to learn Rust.

:)

Wei25519 has a point with x = 0, but we'd have to check if the point can be reached from a deserialized Ristretto point. There is a 7/8 chance it won't be reachable.

If we move to Ristretto, which still requires someone step up there in a timely fashion and isn't blocked by larger sentiment/time to review it. For now, I'll add a note that we use the lowest x-coordinate which cannot be used within a point for the leaves (and 0 for the branches on Helios/Selene per reasoning you noted).

@tevador

tevador commented May 6, 2024

Wei25519 has a point with x = 0, but we'd have to check if the point can be reached from a deserialized Ristretto point. There is a 7/8 chance it won't be reachable.

I confirm that the two Wei25519 points with x = 0 are unreachable, so we are lucky. Ristretto only adds a random 4-torsion element, but these two points have an order of 8*ℓ, so they can never be produced from valid Ristretto points.

@tevador

tevador commented May 6, 2024

For the third point, this is probably not a real problem because each K and C will still have only one well-defined x-coordinate in Wei25519 format. We would need to avoid any equality comparisons in Wei25519 format and use strictly Ed25519 format and Ristretto comparison (e.g. to check for identity) and only convert to Wei25519 immediately prior to hashing.

So it turns out that this is a problem. The blinded key K' can have a different torsion than the original key K in the tree, in which case a membership proof would be impossible because after subtracting the blinding from K', the result would have a different x-coordinate. The same applies to C' and C.

A naive solution to adjust the blinding until K' and K have the same Ristretto torsion would be a disaster for privacy, because it would reduce the effective anonymity set to 1/4.

The only solution I see at the moment is to redefine the leaf-layer hash as H(K, I, C) := H(4*K, I, 4*C). This is effectively torsion clearing, except it only needs two doublings instead of three (thanks to Ristretto).

Since we can't avoid torsion clearing, this raises the question if we should forget Ristretto and go with the simple mul8 solution instead.

@kayabaNerve
Author

I can't claim I'm happy with these torsion discussions and want to continue with them, especially if the end result would be separation of the privacy pools. While we can save a single doubling (two instances per output) without separation (so that definitely won't be the end result), I don't believe the complexity is worth it. The entire point of my torsion clearing in the document was to avoid spending days on these discussions of what's safe and what isn't and to establish a safe, clear, simple policy. While I don't want to say we shouldn't spend days on say, a 5% performance increase, I do believe this discussion is far from that.

I will say I'm happy if Wei25519's 0 x-coordinate is torsioned, as that means this new policy of torsion clearing effectively bans 0, allowing us to use 0 for null leafs. Non-0 for null is yet another such annoyance I'd like to be able to simplify out.

@kayabaNerve
Author

kayabaNerve commented May 6, 2024

@tevador For hash to point, do you have a better candidate than the IETF spec and SSWU for the mapping? I'm unsure if you had a specific research effort in mind for these curves.

@tevador

tevador commented May 6, 2024

I can't claim I'm happy with these torsion discussions and want to continue with them, especially if the end result would be separation of the privacy pools. While we can save a single doubling (two instances per output) without separation (so that definitely won't be the end result), I don't believe the complexity is worth it. The entire point of my torsion clearing in the document was to avoid spending days on these discussions of what's safe and what isn't and to establish a safe, clear, simple policy. While I don't want to say we shouldn't spend days on say, a 5% performance increase, I do believe this discussion is far from that.

The Ristretto solution was my personal investigation and I think it was important to do it before committing to a possibly suboptimal solution. Given my comments above, I think it's sensible to move forward with the original torsion clearing solution.

I will say I'm happy if Wei25519's 0 x-coordinate is torsioned, as that means this new policy of torsion clearing effectively bans 0, allowing us to use 0 for null leafs.

Correct. Torsion-cleared Wei25519 points will never have x = 0.

@kayabaNerve
Author

The Ristretto solution was my personal investigation and I think it was important to do it before committing to a possibly suboptimal solution.

Understood and respected. I participated out of belief it may produce a notably better solution. I just wanted to note my lack of active belief (which quickly started fading when we discussed promoting existing key images to Ristretto representations).

@tevador

tevador commented May 6, 2024

For hash to point, do you have a better candidate than the IETF spec (which may still be a draft? I forget if it was ever finalized) and SSWU for the mapping? I'm unsure if you had a specific research effort in mind for these curves.

Do we need hash to point for Selene and Helios? For the generators, we can use the simple method of hashing to a bitstring and trying to decompress it to a point.
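The "hash to a bitstring and try to decompress" method for generators can be sketched as try-and-increment. Everything concrete here is an assumption for illustration: SHA-256 as the hash, a 4-byte counter, the even-y convention, and secp256k1's field and equation as a stand-in curve (picked only because p = 3 mod 4 makes the square root a single exponentiation). It is not constant time, which is fine for public generators but is one reason a proper map is preferable for challenge points.

```python
# Sketch: derive a public generator by hashing until an x-coordinate decompresses.
import hashlib

# Stand-in curve y^2 = x^3 + 7 over the secp256k1 field; Helios/Selene
# parameters would replace these in a real implementation.
P = 2**256 - 2**32 - 977
A, B = 0, 7


def hash_to_generator(label: bytes):
    """Return ((x, y), counter) for the first counter whose hash is a valid x."""
    counter = 0
    while True:
        digest = hashlib.sha256(label + counter.to_bytes(4, "little")).digest()
        x = int.from_bytes(digest, "big")
        if x < P:
            rhs = (pow(x, 3, P) + A * x + B) % P
            y = pow(rhs, (P + 1) // 4, P)  # candidate square root, valid since P % 4 == 3
            if y * y % P == rhs:
                if y & 1:
                    y = P - y  # assumed convention: always pick the even root
                return (x, y), counter
        counter += 1


if __name__ == "__main__":
    (x, y), tries = hash_to_generator(b"example generator")
    print(hex(x), hex(y), tries)
```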

@kayabaNerve
Author

kayabaNerve commented May 6, 2024

Yes, for Wei25519, Helios, and Selene (though we can of course hash to a birationally equivalent curve and then map, if that is cheaper). The divisor challenges require we sample two challenge points and then evaluate the divisor over their line.

Currently, I'm randomly sampling x's and recovering y. I don't actually want to claim that's optimal.

Also, the divisor challenge is the most expensive part of the FCMP right now (barring the multiexp, which executes in batch). It requires not only these hash to points, yet also creating hundreds of scalar powers (256 for the powers of x, ~256 for the powers of x multiplied by y). It still should be a small amount of work (hampered by the current Helioselene library not meeting performance expectations), that just doesn't change its percentage of the overall work. This final proof has ended up quite compact thanks to the current usage of divisors (it's 2048 or 4096 rows in the inner-product statement to just 256).

@tevador

tevador commented May 6, 2024

The divisor challenges require we sample two challenge points and then evaluate the divisor over their line.

OK, I wasn't aware. I think we can use RFC 9380 and Simplified SWU for hashing as you suggested. Wei25519 will need torsion clearing after the mapping.

@kayabaNerve
Author

ACK, and all good.

@kayabaNerve
Author

Slight correction. We should be able to reuse challenges. That puts us at a flat two hash to points per Bulletproof.

For context on challenge reuse, the divisors are all evaluated independently (there's no ability to create one which plays off another). They just require a challenge binding to their variables. All existing hashes are drawn from one transcript of every such variable, so every challenge right now is binding to every variable. Accordingly, we should be able to collapse challenging without issue.

That should roughly halve the time the FCMP circuit takes? (FCMP circuit -> GBP verification -> batched multiexp, so halving the FCMP circuit is not halving the entire proof, just a notable part of it)

@hinto-janai

Hello, I have some questions:

Outgoing View Keys

Is it as simple as adding this key + the current incoming key for full view-only wallets to exist? How would current wallets migrate to this? They can generate a new key and calculate o*G+y*T, but wouldn't this also mean providing 2 addresses?

Forward Secrecy [...] An adversary with a discrete log oracle also cannot distinguish between an unspent non-forward-secret output and a forward-secret output. Such an adversary can only calculate what the linking tag would be if the output isn’t forward secret, and wait to see if that appears

Does this mean this adversary can assert the existence of unspent non-FCMP outputs, and when they are spent? I.e. there's 2 output sets before entering the main FCMP set?

 /------------------------\        /------------------------------------\
| unspent pre-FCMP outputs |      |                                      |
 \------------------------/       |                                      |
             |                    |        main post-FCMP outputs        |
             v                    | (unknown origin, but is known to not |
 /------------------------\       |   be one from the previous 2 sets)   |
|    post-FCMP outputs     | ---> |                                      |
|     (known origin)       |      |                                      |
 \------------------------/        \------------------------------------/

If a post-FCMP output can be identified via the pre-FCMP output's linking tag, can a post-FCMP output be determined to be a member of either the known origin or main set by seeing if a pre-FCMP output's linking tag leads to it? Not sure if this really matters, just curious.

Tree

Maybe for @j-berman:

How is the tree planned to be stored? Additional tables in the main database? For unsynced nodes, would tree operations occur alongside block downloading + verification? For synced nodes, will new post-FCMP++ nodes have to stall on startup for a bit while generating the whole tree? Would this be done as a database migration?

If an output has a time-based timelock prior to the activation of the FCMP++ hard fork, it is converted to an estimated block-based timelock

Assuming timelocks aren't banned before FCMP++, how exactly would this be done? Convert time to blocks by dividing by 120 and add onto the current block height?

DB migration

For context, the last time migration code was touched was by mooo 5 years ago. I'm unsure if there's anyone currently willing to do it, although as far as I can tell it's relatively straightforward.

@kayabaNerve
Author

but wouldn't this also mean providing 2 addresses?

Under my proposal, only the new address would need to be provided. They're indistinguishable and have an identical sending process. I make no guarantees my proposal would not impact the relevant JAMTIS proposal which only declares itself compatible with the current addresses.

Does this mean this adversary can assert the existence of unspent non-FCMP outputs, and when they are spent? I.e. there's 2 output sets before entering the main FCMP set?

Sorry, but this entire section isn't really sufficiently well formed for me to reply to its questions/comments. I'd rather clarify things from the start, and if you still have questions, follow up there.

There's currently the Ring outputs and the RingCT outputs. Both of these are non-FS and continue to be non-FS.

After FCMPs, FS outputs become possible. You may still make non-FS outputs yet may also make FS outputs (if there's an address for FS outputs or a protocol for FS outputs).

A historical non-FS output can have its linking tag calculated. Once calculated, an adversary can wait for it to appear on-chain.

A FS output cannot have its linking tag calculated. It has as many linking tag discrete logarithms as scalars for the curve. If you assume all outputs aren't FS, setting the new T term to have a coefficient of 0, you can obtain the linking tag it'd have if it wasn't FS. You can then wait and see if that linking tag appears on chain (confirming it isn't a FS output OR another entity has a solution for the discrete log problem at time of appearance on-chain).

Accordingly, an adversary with a discrete log oracle cannot differentiate unspent non-FS outputs and FS outputs. They both have linking tag discrete logarithms recoverable if you assume the T term has a coefficient of 0. Both will not have those linking tags actually appear on-chain. You can only differentiate spent non-FS outputs as those will have their linking tags appear on-chain.
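A toy model of that argument, working directly with discrete logs in a small prime-order group to mimic an adversary who has the oracle. The group order, the value used for T, and the function names are all made up for illustration. A one-time key O = xG + yT is represented by its discrete log o = x + y*t.

```python
# Toy model: what a discrete log oracle learns about O = x*G + y*T.
Q = 101   # toy prime group order
T_LOG = 7  # discrete log of T with respect to G, known to the oracle adversary


def one_time_key_log(x: int, y: int) -> int:
    """Discrete log of O = x*G + y*T with respect to G."""
    return (x + y * T_LOG) % Q


def candidate_openings(o: int) -> set:
    """Every (x, y) consistent with O: one candidate x per choice of y."""
    return {((o - y * T_LOG) % Q, y) for y in range(Q)}


def guessed_spend_scalar(o: int) -> int:
    """The adversary's only concrete guess: assume the T coefficient is 0."""
    return o


if __name__ == "__main__":
    non_fs = one_time_key_log(12, 0)
    fs = one_time_key_log(12, 34)
    print(guessed_spend_scalar(non_fs) == 12)  # guess is right when y = 0
    print(guessed_spend_scalar(fs) == 12)      # guess is wrong when y != 0
    print(len(candidate_openings(fs)))         # as many candidates as scalars
```

Since the linking tag is derived from x, the guess only yields the real tag when y = 0, and the adversary cannot tell which case applies until that tag does or does not show up on-chain.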

This assumes we don't deploy FS at a hard fork boundary, which it sounds like we will with the above JAMTIS proposal. My proposal enables it to be done whenever without new wallet protocols (not to suggest it's better than the JAMTIS proposal or still an active candidate. Just to note how these discussions formed).

@j-berman

How is the tree planned to be stored? Additional tables in the main database?

Yep, here's a draft schema:

/* DB schema:
 *
 * Table      Key         Data                             Properties
 * -----      ---         ----                             ----------
 * leaves     leaf_idx    {O.x, I.x, C.x}                  Integer key, no dups, fixed size, sorted by `leaf_idx`
 * branches   layer_idx   [{branch_idx, branch_hash}...]   Integer key, no dups, fixed size, sorted by `layer_idx` first, `branch_idx` second
 */
  • In the branches table:
    • The largest layer_idx in the db == tree root
    • The largest layer_idx in the db should only have a single record in the db, with branch_idx == 0
    • A branch refers to the hash of a chunk w_c elements in layer_idx-1 (or if layer_idx == 0, in the leaves table)
    • branch_idx == element idx in layer_idx-1 / w_c
    • branch_hash is 32 bytes that can either be a Helios point or Selene point
      • even layer_idx == selene
      • odd layer_idx == helios
  • This table approach should enable efficient queries for path by leaf_idx (will know all db reads needed immediately, can make parallel reads in theory).
    • The reason the branches table uses layer_idx as a key, and uses value [{branch_idx, branch_hash}...] is to optimize the same way output_amounts table does it (which is optimized for reading outputs by amount and global output ID). The primary index is layer_idx (like amount) and secondary index is branch_idx (like output_id). Planning on doing some more perf testing on this approach as well.
  • The block header stores the tree root hash at that block, and the end leaf_idx added to the tree for that block.
  • We'll also need a new table to keep track of locked outputs sorted by unlock_time.
  • We'll also want a migration for key images also described here.
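The "will know all db reads needed immediately" point follows from the index rule above. A small sketch, assuming a single chunk width w_c for every layer (the real widths may differ between the Helios and Selene layers) and that leaves are chunked by leaf_idx. The function name is hypothetical.

```python
# Sketch: compute every (layer_idx, branch_idx) key on the path from a leaf to the root.
def path_keys(leaf_idx: int, width: int, n_layers: int) -> list:
    """Apply branch_idx = element idx in layer below // width, layer by layer."""
    keys = []
    idx = leaf_idx
    for layer_idx in range(n_layers):
        idx //= width
        keys.append((layer_idx, idx))
    return keys


if __name__ == "__main__":
    # With width 4 and 3 layers, leaf 37 sits under branches 9, 2 and the root (0).
    print(path_keys(37, 4, 3))
```

Since no key depends on the result of another read, the lookups could be issued in parallel, as the note above suggests.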

For unsynced nodes, would tree operations occur alongside block downloading + verification?

Yep

For synced nodes, will new post-FCMP++ nodes have to stall on startup for a bit while generating the whole tree? Would this be done as a database migration?

I initially figured nodes would stall too; @kayabaNerve proposed implementing the migration as an async background task, which would only stall the node if the tree is not finished constructing at the fork height where FCMPs begin. Sounds like a solid idea to me.

For context, the last time migration code was touched was by mooo 5 years ago. I'm unsure if there's anyone currently willing to do it, although as far as I can tell it's relatively straightforward.

I included the migration of cryptonote outputs to the tree as part of my CCS proposal :)

Assuming timelocks aren't banned before FCMP++, how exactly would this be done? Convert time to blocks by dividing by 120 and add onto the current block height?

To clarify, the timestamp timelocks are Unix timestamps for when an output should unlock. So we can do something like: take the timestamp from some block N, extrapolate block times into the future from that starting point using 120s for each block, then convert timestamps to unlock at their respective extrapolated block.
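A rough sketch of that extrapolation. Rounding up (so an output never unlocks before its timestamp under the extrapolated schedule) and unlocking immediately for timestamps already in the past are assumptions, as is the function name.

```python
# Sketch: convert a timestamp timelock to an estimated block height.
BLOCK_TIME_SECONDS = 120


def estimated_unlock_height(unlock_ts: int, ref_height: int, ref_ts: int) -> int:
    """Extrapolate 120s blocks forward from reference block N (ref_height, ref_ts)."""
    if unlock_ts <= ref_ts:
        return ref_height
    delta = unlock_ts - ref_ts
    blocks = -(-delta // BLOCK_TIME_SECONDS)  # integer ceiling division
    return ref_height + blocks


if __name__ == "__main__":
    print(estimated_unlock_height(1_000 + 241, ref_height=100, ref_ts=1_000))
```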

Thinking on it some more.. it's probably fairly straightforward to have a separate table for timestamp locked outputs sorted by unlock_time also. Then when adding a block, check the lowest unlock_time output in the table; if the output is unlocked using get_adjusted_time at that height, remove from table and add to tree, and continue iterating removing outputs from the table until encountering an output that should remain locked at that block's get_adjusted_time. This way we wouldn't need to do the conversion from timestamp to block.
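The table-based alternative could be sketched like this, with an in-memory heap standing in for a DB table sorted by unlock_time. The class and method names are hypothetical, the adjusted time is passed in rather than computed, and treating unlock_time <= adjusted time as unlocked is an assumption about the exact comparison.

```python
# Sketch: release timestamp-locked outputs as each block's adjusted time advances.
import heapq


class TimeLockedOutputs:
    def __init__(self):
        self._heap = []  # (unlock_time, output_id), smallest unlock_time first

    def lock(self, unlock_time: int, output_id: str) -> None:
        heapq.heappush(self._heap, (unlock_time, output_id))

    def pop_unlocked(self, adjusted_time: int) -> list:
        """Remove and return outputs unlocked at this block, in unlock order.

        Stops at the first output that should remain locked.
        """
        unlocked = []
        while self._heap and self._heap[0][0] <= adjusted_time:
            unlocked.append(heapq.heappop(self._heap)[1])
        return unlocked


if __name__ == "__main__":
    table = TimeLockedOutputs()
    table.lock(300, "c")
    table.lock(100, "a")
    table.lock(200, "b")
    print(table.pop_unlocked(150))  # outputs to add to the tree at this block
    print(table.pop_unlocked(300))
```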

@tevador

tevador commented May 13, 2024

Is it as simple as adding this key + the current incoming key for full view-only wallets to exist? How would current wallets migrate to this? They can generate a new key and calculate o*G+y*T, but wouldn't this also mean providing 2 addresses?

Outgoing view keys are only planned for the new Jamtis address format. Legacy CryptoNote addresses will not have outgoing view keys.

Existing wallets cannot be safely migrated to support outgoing view keys because that would change all their existing addresses.

@kayabaNerve
Author

@tevador I actually would like to distinctly follow up with you on whether a wallet whose current main addresses are (sG, vG) will survive JAMTIS's backwards compatibility if malleated to (sG + yT, vG), as I originally proposed. While I hear you that existing wallets cannot simply move, I'm personally curious about developing wallet software outside a HF boundary which achieves FS in such a manner for new wallets (though I don't want to have said addresses bork at time of JAMTIS).

@tevador

tevador commented May 13, 2024

Forward-secrecy can be achieved at the time of FCMP activation with a tiny change in the sender-receiver protocol. The sender would simply add a T-term to the one-time address. This would bring immediate forward secrecy for everyone and not break existing wallets.

I do not support creating any additional new wallet types before Jamtis.

@kayabaNerve
Author

In that case, mind documenting the exact proposed change and can we circle in @j-berman so we can push for such a change? Presumably, it's just an extra derivation off the shared secret?

Slight caveat: that achieves weaker forward secrecy, which I note now in case such thoughts also impact JAMTIS. The lack of any binomial components in the existing addresses allows breaking known addresses (whereas if there was such an additional term as in my original proposal, you can identify sends to such addresses, yet you cannot identify their spends).

@tevador

tevador commented May 13, 2024

Slight caveat: that achieves weaker forward secrecy, which I note now in case such thoughts also impact JAMTIS. The lack of any binomial components in the existing addresses allows breaking known addresses (whereas if there was such an additional term as in my original proposal, you can identify sends to such addresses, yet you cannot identify their spends).

Forward secrecy always assumes that the DLP-breaking adversary doesn't know your address. That applies to both of the discussed cases. Regardless of any T-terms present in the address, the adversary can simply extract the secret view keys from the address, which makes any attempts at forward secrecy moot unless we migrate to a PQ-secure key exchange.

@kayabaNerve
Author

kayabaNerve commented May 13, 2024 via email

@tevador

tevador commented May 13, 2024

We should not overestimate the forward-secrecy capabilities in case of leaked addresses. For example, the adversary can detect spends to known addresses (they will see a change output going to the sending wallet and a payment output going to the receiving wallet).

I want to reiterate that there is no safe way to malleate the keys of legacy addresses. Adding T-terms to one-time addresses achieves what is generally called forward-secrecy, is backwards compatible with existing addresses and forward-compatible with Jamtis. I don't see any other viable solutions.

@kayabaNerve
Author

Fair. I'll circle back to

In that case, mind documenting the exact proposed change and can we circle in @j-berman so we can push for such a change? Presumably, it's just an extra derivation off the shared secret?

For the potential at-time-of-hard-fork solution.

@tevador

tevador commented May 13, 2024

In that case, mind documenting the exact proposed change

It should be relatively simple.

| variable | description |
|----------|-------------|
| K_d | "key derivation" (sender-receiver shared key) |
| idx | output index |
| K_1 | (sub)address public spend key |
| K_o | "one-time address" (output key) |

Current one-time address derivation:

k_g = hash_to_scalar(K_d || idx)
K_o = K_1 + k_g G

Proposed change after the FCMP-fork:

k_g = hash_to_scalar(G || K_d || idx)
k_t = hash_to_scalar(T || K_d || idx)
K_o = K_1 + k_g G + k_t T
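As a rough illustration of the proposed derivation, here is a sketch in a toy group. Everything concrete in it is an assumption for demonstration only: the tiny order-1019 subgroup of the integers mod 2039 replaces the real curve (so point addition becomes modular multiplication and scalar multiplication becomes exponentiation), SHA-256 with length-prefixed inputs stands in for hash_to_scalar, and the helper names `enc` and `one_time_address` are hypothetical.

```python
import hashlib

# Toy, insecure parameters (assumptions for illustration, not the real curve).
P = 2039  # safe prime, P = 2*Q + 1
Q = 1019  # prime order of the subgroup of squares mod P
G = 4     # stand-in for generator G (a square mod P, so it has order Q)
T = 9     # stand-in for generator T (also a square mod P)


def enc(x: int) -> bytes:
    """Fixed-width encoding of a group element or index."""
    return x.to_bytes(32, "little")


def hash_to_scalar(*parts: bytes) -> int:
    """Stand-in hash: SHA-256 over length-prefixed parts, reduced mod Q."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(4, "little") + part)
    return int.from_bytes(h.digest(), "little") % Q


def one_time_address(K_1: int, K_d: bytes, idx: int):
    """K_o = K_1 + k_g G + k_t T, written multiplicatively in the toy group."""
    k_g = hash_to_scalar(enc(G), K_d, enc(idx))  # G-prefixed domain
    k_t = hash_to_scalar(enc(T), K_d, enc(idx))  # T-prefixed domain
    K_o = (K_1 * pow(G, k_g, P) * pow(T, k_t, P)) % P
    return K_o, k_g, k_t
```

A receiver holding the spend secret x (with K_1 = xG) and the shared key can open the output as (x + k_g) on G and k_t on T, which is the opening the spend-authorization proof would need.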

@kayabaNerve
Author

We can eliminate the burning bug while we're at it by also prefixing the first input's key image (modifying the shared key derivation). I'll leave further thoughts/discussions on this to @j-berman for now.

@tevador

tevador commented May 13, 2024

We can eliminate the burning bug while we're at it

While I'm not saying I'm against this, we need to be careful about scope creep.

@tevador

tevador commented May 13, 2024

Thinking on it some more.. it's probably fairly straightforward to have a separate table for timestamp locked outputs sorted by unlock_time also. Then when adding a block, check the lowest unlock_time output in the table; if the output is unlocked using get_adjusted_time at that height, remove from table and add to tree, and continue iterating removing outputs from the table until encountering an output that should remain locked at that block's get_adjusted_time. This way we wouldn't need to do the conversion from timestamp to block.

The problem is that get_adjusted_time is not monotonic. Adding a new block to the blockchain can in some cases result in outputs becoming locked again, unnecessarily increasing complexity.

I propose the following to be implemented at the time of the fork:

  1. Reduce the coinbase lock time to the default of 10 blocks that apply to all spends. This was discussed in research-lab#104.
  2. Ban new time locks. This would be done by a consensus rule mandating unlock_time to be zero as a natural extension of monero#9151.
  3. Convert all time-based legacy locks to height-based locks as unlock_height = height + (unlock_time - get_adjusted_time(height)) / DIFFICULTY_TARGET_V2, where height is the block height where the time-locked transaction was confirmed.
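The conversion in step 3 could be sketched as below. The rounding direction and the handling of locks that had already expired at confirmation are not specified above, so this sketch assumes rounding up (so an output never unlocks earlier than its timestamp implies) and clamping to the confirmation height. The function name `time_lock_to_height` is hypothetical, and `adjusted_time` is passed in as the value of get_adjusted_time(height).

```python
DIFFICULTY_TARGET_V2 = 120  # target seconds per block


def time_lock_to_height(height: int, unlock_time: int, adjusted_time: int) -> int:
    """Convert a time-based lock to a height-based lock.

    height        -- block height where the time-locked transaction was confirmed
    unlock_time   -- the legacy Unix-timestamp lock
    adjusted_time -- get_adjusted_time(height), supplied by the caller
    """
    remaining = unlock_time - adjusted_time
    if remaining <= 0:
        # Assumption: a lock already expired at confirmation adds no extra blocks.
        return height
    # Assumption: ceiling division, so a partial block interval counts as a full block.
    blocks = -(-remaining // DIFFICULTY_TARGET_V2)
    return height + blocks
```

Because the result depends only on data fixed at confirmation time, it can be computed once during migration and never needs to be revisited when later blocks arrive.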

@j-berman

j-berman commented May 13, 2024

I propose the following to be implemented at the time of the fork

I'm fine with this plan. Getting rid of the feature is still a +1 in my view as it's more likely to cause harm than benefit as it's been used.

Adding a new block to the blockchain can in some cases result in outputs becoming locked again, unnecessarily increasing complexity.

Clarifying that with the approach to maintain the existing time-based locks, the new rule at the fork would become first get_adjusted_time where time-based timelock is unlocked == block in which the output unlocks and is guaranteed unlocked from that point on. Either way we're introducing a new rule dictating how the output is unlocked.

Considering how little the feature is used and there's rough agreement to deprecate it, I think the optimal route is the path of least resistance and least complexity.

In that vein, I think converting time-based to block-based may still be optimal. I think the picture will be clearer upon implementing handling block-based timelocks first.

@hinto-janai

I'd rather clarify things from the start, and if you still have questions, follow up there.

Sorry, not sure how to word things correctly. Given output movement like this:

non-FS output (A) ---> FS output (B) ---> output (C)

are these statements correct?:

  • A is known to link to B
  • B is known to come from A
  • C is known to not come from A
  • C is known to come from some FS output (but not necessarily B)

@j-berman thanks, details are appreciated.

We'll also want a migration for key images also described here

Ah okay, I thought the discussion here led to this not being needed.

which would only stall the node if the tree is not finished constructing at the fork height where FCMPs begin.

So there will be leeway before the fork height to give nodes time to build the tree?

Thinking on it some more.. it's probably fairly straightforward to have a separate table for timestamp locked outputs sorted by unlock_time also. Then when adding a block, check the lowest unlock_time output in the table; if the output is unlocked using get_adjusted_time at that height, remove from table and add to tree, and continue iterating removing outputs from the table until encountering an output that should remain locked at that block's get_adjusted_time. This way we wouldn't need to do the conversion from timestamp to block.

The problem is that get_adjusted_time is not monotonic. Adding a new block to the blockchain can in some cases result in outputs becoming locked again, unnecessarily increasing complexity.

There's already up to 120 seconds of leeway for timelocks, would this be enough for the proposed table to be accurate enough?

@tevador

tevador commented May 13, 2024

Clarifying that with the approach to maintain the existing time-based locks, the new rule at the fork would become first get_adjusted_time where time-based timelock is unlocked == block in which the output unlocks. Either way we're introducing a new rule dictating how the output is unlocked.

Fair point. Since we are already changing the rule, it makes sense to go with the simpler option that doesn't need a call to get_adjusted_time for every future block. I would also expect the code to be simpler if all historical time-locks are treated the same.

@j-berman

So there will be leeway before the fork height to give nodes time to build the tree?

Ideally we'd release monerod containing the update well in advance of the fork height (the v18 release for the last fork came out ~1 month in advance IIRC). So users could update their nodes, and it would start building the tree in the background on startup.

it makes sense to go with the simpler option

I lean toward converting time-based to block-based as well.

There's already up to 120 seconds of leeway for timelocks, would this be enough for the proposed table to be accurate enough?

We can also add an extra block like that for the extrapolated block, but I think the answer to "is it accurate enough" is likely yes. It's probably worth qualifying that with some analysis of blocks' adjusted times in practice, but the get_adjusted_time code appears fairly reasonable to me.

@kayabaNerve
Author

kayabaNerve commented May 13, 2024 via email

@kayabaNerve
Author

kayabaNerve commented May 13, 2024 via email
