UkoeHB/preliminary_seraphis_design_overview.md Secret

Preliminary Seraphis Design Overview

Seraphis is a next-generation transaction protocol that builds on the many lessons learned about protocol design since CryptoNote was first made public in 2013. After a year of development and refinement with the help of tevador (who proposed the Jamtis addressing scheme for Seraphis), Justin Berman (jberman), and Luke Parker (kayabanerve), the current Seraphis implementation is a solid protocol that addresses many long-standing issues with Monero and opens a path to possible future privacy improvements.
The main improvements include:

A new key image design that enables proofs with much larger ring sizes (e.g. 128 instead of 16) and full balance recovery with a view key (instead of only partial recovery which we have with CryptoNote view keys).
A new addressing scheme (Jamtis) that permits greatly increased privacy for light wallet users, eliminates the 'subaddress lookahead' issue while enabling robust random address generation, mitigates the Janus attack, and has strong forward secrecy against a possible future quantum adversary.
Improved modularity of the protocol with good support for future membership proof upgrades and the ability to remove timing information that can expose multisig transactions.
Many transaction uniformity upgrades.

Note: In this document 'Seraphis' refers loosely to both the core transaction protocol abstraction presented in the Seraphis paper and also the specific implementation that has been developed in my seraphis_lib Monero branch.

Monero core design goals

Since Seraphis is a protocol designed for Monero, it is important to consider Monero's design goals. Any proposed change to Monero's consensus rules must be justified based on the following core design goals.

Privacy: Minimize the amount of information that an observer of the Monero system (including the network, blockchain, transactions, addresses, etc.) can learn about people other than themselves.
Security: The Monero system should behave as you expect it to (funds cannot be stolen, privacy attributes are as advertised/implied, etc.).
Scalability: The Monero network and blockchain should be able to handle as many new transactions per second as possible without degrading performance or security. An individual Monero user should be able to send and receive funds as efficiently as possible.
Longevity: Monero should be able to stand the test of time. New protocol rules should be forward-thinking and design decisions should be oriented toward reducing the frequency of hard-forks and guided by the assumption that future hard-forks will be impossible.

Designers need to be careful not to lump a useful feature into 'longevity' just because it satisfies some future use case. Adding complexity makes it harder to understand/validate/implement the protocol, which is a longevity cost (analogous to technical debt - protocol debt). Moreover, adding pure features can set a precedent that may undermine the goal of reduced hard-fork frequency.
As an example, let's look at the 'random address generation without duplication' feature that is enabled by Jamtis address tags (definitely useful). Is it just a pure feature?

Privacy: Random address generation allows a user to have two address generators that are very unlikely to ever produce the same address. If those generators could produce the same address, that address could be used to link the generators together.
Security: When a user says 'generate me a new random address', they will always get one that hasn't existed anywhere else (except with negligible probability), which is what they expect.
Scalability: A user can generate many random or pseudo-random addresses and efficiently recover their funds, which is good for large-scale enterprise wallets. Note that Jamtis address tags increase the length of public addresses and add bytes to the blockchain, which is a scalability cost (protocol features often have scalability tradeoffs).
Longevity: Jamtis address indices are 16 bytes, a standard length for robust industrial-grade identifiers (e.g. see the UUID scheme). We hope 16 bytes is sufficient to satisfy most use-cases and never needs to be changed (notice that CryptoNote addresses did not stand the test of time - both subaddresses and integrated addresses added functionality).

Seraphis design overview

With Monero's core design goals in mind, here is an overview of the current Seraphis implementation. I emphasize how the protocol is different from the RingCT protocol that is live on mainnet as of v16. Note that all the parameters (e.g. Seraphis reference set size 128) can be easily changed.

Terminology


Enote: A message containing an amount and destination (aka 'output' or 'Txo').
Input image: Representing an enote being spent, an input image contains commitments to parts of the enote plus a key image. Input proofs operate on input images instead of the original enotes.
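As a rough picture of these two objects, the sketch below collects the fields that the Transactions section lists for Seraphis output enotes and Seraphis input images. The class names, field names, and all byte sizes except the 18-byte address tag are assumptions made for illustration; the actual seraphis_lib types may look different.

```python
from dataclasses import dataclass

KEY_BYTES = 32            # assumption: keys, commitments and key images are 32-byte points
ADDRESS_TAG_BYTES = 18    # 16-byte address index + 2-byte hint (from the Jamtis section)
ENCODED_AMOUNT_BYTES = 8  # assumption: amounts are 64-bit values
VIEW_TAG_BYTES = 1        # assumption: a single-byte view tag


def _check_sizes(obj, expected_sizes):
    """Raise ValueError if any named field does not have the expected length."""
    for field_name, size in expected_sizes.items():
        if len(getattr(obj, field_name)) != size:
            raise ValueError(f"{field_name} must be {size} bytes")


@dataclass(frozen=True)
class SeraphisEnote:
    """Hypothetical layout of a Seraphis output enote."""
    onetime_address: bytes        # destination, constructed using Jamtis
    amount_commitment: bytes      # commitment to the amount
    encoded_amount: bytes         # amount, hidden from observers
    view_tag: bytes               # short hint used to speed up scanning
    encrypted_address_tag: bytes  # hides the address index of the owning address

    def __post_init__(self):
        _check_sizes(self, {
            "onetime_address": KEY_BYTES,
            "amount_commitment": KEY_BYTES,
            "encoded_amount": ENCODED_AMOUNT_BYTES,
            "view_tag": VIEW_TAG_BYTES,
            "encrypted_address_tag": ADDRESS_TAG_BYTES,
        })


@dataclass(frozen=True)
class SeraphisInputImage:
    """Hypothetical layout of a Seraphis input image (what input proofs operate on)."""
    key_image: bytes
    masked_address: bytes
    masked_commitment: bytes

    def __post_init__(self):
        _check_sizes(self, {
            "key_image": KEY_BYTES,
            "masked_address": KEY_BYTES,
            "masked_commitment": KEY_BYTES,
        })
```

Note that the input image carries masked versions of the enote's address and commitment rather than the enote itself, which is why input proofs never need to point at the original enote directly.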

Transactions

Seraphis transactions look like you would expect. There are inputs spending old funds, outputs redistributing those funds, and proofs about those inputs and outputs. A Seraphis tx can spend both CryptoNote/RingCT (legacy) and Seraphis enotes in the same transaction.

Legacy input images: Legacy input images include a key image and masked commitment (aka pseudo-output commitment), the same as in RingCT.
Legacy input proofs (membership and ownership): These use CLSAG in the same way as Monero v16. Legacy ring signatures can reference all legacy enotes, from genesis to today, by handling early plaintext-amount legacy enotes the same way RingCT coinbase enotes are currently handled.
Seraphis input images: Seraphis input images include a key image, masked address, and masked commitment.
Seraphis input membership proofs: These use the Grootle proving structure (a simpler version of the Triptych and Lelantus Spark proofs using the optimization from section 1.3 here), using the squashed enote model from Appendix A of the Seraphis paper.

Grootle reference sets (ring members) use binning for a compact representation and bi-modal distribution of decoys (local and global).
Important: Seraphis was originally created with the goal of increasing ring sizes 'all the way'. The ideal membership proof would reference all enotes on the chain. Although no specific proposal has been published, some initial thoughts on how to achieve that can be found here.


Seraphis input ownership proofs: These use the lightweight Seraphis composition proof structure from Appendix B of the Seraphis paper to authorize the transfer of funds in a transaction and prove each input's key image was properly computed.

The current implementation uses per-input Seraphis composition proofs to keep the proof design simple and leave some flexibility during tx construction. It is theoretically possible to 'merge' all the composition proofs together into a smaller and slightly more efficient aggregate proof.


Output enotes: Seraphis enotes are very similar to RingCT enotes. They include a onetime address (constructed using Jamtis), an amount commitment, an encoded amount, a view tag, and an encrypted address tag (a Jamtis feature, it encodes the address index of the address that owns the enote).
Balance proof: A balance proof involves checking that the sum of input masked commitments equals the sum of output amount commitments plus the fee (the same way RingCT does it).

Transaction input masked commitments all use unique masks (the 'leftover' bit from adding and subtracting output and input commitment masks is recorded explicitly and included in the balance check), unlike in RingCT where the sum of input and output masks must equal zero. This means input images are not tied to the output set, allowing Seraphis membership proofs to be easily pre-built before making a tx proposal.


Range proofs: These are implemented with Bulletproofs+. Range-proofing Seraphis inputs is required by the squashed enote model, so Seraphis input masked amount commitments are range proofed in addition to output amount commitments like in RingCT.

Bulletproofs++ are currently being explored as a possible improvement over BP+.


Fees: Fees are stored in 'discretized' form, see below for more on that.
Tx supplement: Enote ephemeral pubkeys for creating sender-receiver Diffie-Hellman secrets (used in Jamtis for identifying owned enotes and decrypting their contents), and the tx_extra field (a mostly random memo field) are stored in a 'tx supplement' within transactions. Unlike in the current Monero protocol, enote ephemeral pubkeys have consensus rules (see below), so they are no longer embedded in the tx_extra.
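The balance proof with an explicit 'leftover' mask, described in the list above, can be illustrated with toy Pedersen-style commitments. This is a minimal sketch and not the seraphis_lib implementation: it works in a multiplicative group modulo a prime instead of an elliptic curve (so sums of commitments become products), and the modulus, generators, and function names are assumptions chosen for illustration. The parameters are not secure.

```python
import secrets

P = 2**127 - 1   # toy prime modulus (assumption; real code uses an elliptic curve)
G, H = 3, 5      # toy bases for masks and amounts (assumption, not secure)
ORDER = P - 1    # exponents can be reduced modulo P - 1 (Fermat's little theorem)


def commit(mask: int, amount: int) -> int:
    """Toy Pedersen-style commitment: G^mask * H^amount mod P."""
    return pow(G, mask, P) * pow(H, amount, P) % P


def product(values) -> int:
    """Multiply commitments together (the multiplicative analogue of summing them)."""
    result = 1
    for value in values:
        result = result * value % P
    return result


def balance_check(input_commitments, output_commitments, fee, leftover_mask) -> bool:
    """Inputs must equal outputs plus fee, with the mask difference made explicit."""
    lhs = product(input_commitments)
    rhs = product(output_commitments) * pow(H, fee, P) * pow(G, leftover_mask, P) % P
    return lhs == rhs


# Every commitment gets its own independent random mask.
in_amounts, out_amounts, fee = [70, 50], [100, 15], 5
in_masks = [secrets.randbelow(ORDER) for _ in in_amounts]
out_masks = [secrets.randbelow(ORDER) for _ in out_amounts]
inputs = [commit(m, a) for m, a in zip(in_masks, in_amounts)]
outputs = [commit(m, a) for m, a in zip(out_masks, out_amounts)]

# The leftover is recorded in the tx instead of being forced to zero.
leftover = (sum(in_masks) - sum(out_masks)) % ORDER

print(balance_check(inputs, outputs, fee, leftover))      # amounts and masks line up
print(balance_check(inputs, outputs, fee + 1, leftover))  # a wrong fee breaks the check
```

Because the leftover is computed after the fact, the input masks never have to be chosen to cancel the output masks, which is what lets input images (and their membership proofs) be prepared before the output set is known.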

An important part of the transaction design is that you can make a partial transaction that has everything except Seraphis membership proofs. This enables tx chaining, where you make a chain of off-chain partial transactions that connect to the blockchain. It also means multisigs only need to make partial transactions. The multisig signer who finishes a multisig partial tx can add membership proofs for Seraphis inputs and submit the completed tx to the chain. Delaying the membership proof construction removes timing information about when the tx started being constructed, which could otherwise expose the tx as a multisig tx (if there are no legacy inputs - CLSAGs can't be delayed).

Transaction uniformity

Over the years, a number of 'fingerprintability' issues with Monero transactions have been highlighted (mainly by the Noncesense Research Lab, see this excellent video). Many of those have been addressed with the current Seraphis implementation.

Fees: Instead of fees with arbitrary precision, which leak a lot of timing and wallet-implementation information, Seraphis uses 'discretized fees'. Only fees that are a power of 1.5 (rounded to 1 significant digit; includes 0, 1, and the max amount) are permitted in transactions. Discretized fees can be represented with just 1 byte by recording the exponent directly.
Address distinctions: Jamtis does not have a distinction between normal and 'sub' addresses - all addresses behave like Monero subaddresses. As a consequence, all enotes must have an associated enote ephemeral pubkey recorded in transactions (called the 'tx pubkey' or 'additional tx pubkeys' currently). This is a uniformity win because observers will no longer see some txs with 'additional tx pubkeys' (which means at least one output destination is a subaddress) and some without (which means there may or may not be a subaddress destination).

Optimization: A 2-output normal transaction may only have one enote ephemeral pubkey. The vast majority of transactions have only two outputs, so this rule is a major optimization for balance recovery (you only need to compute one Diffie-Hellman exchange for 2-output transactions, instead of two). All other transactions (coinbase txs and normal txs with 3+ outputs) must have one enote ephemeral pubkey per output.


Semantics: Seraphis tx inputs and outputs are sorted. In the current protocol, output ordering is implementation-defined because onetime addresses are a function of output index (Seraphis does not need this).
tx_extra: The tx_extra field now enforces sorted TLV (type-length-value) format (without the 'restricted tags' recommended in that link).
unlock_time: The much-maligned unlock_time field has been completely removed from Seraphis. Coinbase transactions use the same minimum spendable age as normal transactions (10 blocks).

There has been some inconclusive discussion about eliminating or reducing the 10-block-lock as well.


Payment IDs: Payment IDs have been deprecated in favor of Jamtis address tags (more on those below).
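To make the discretized fee idea from the list above concrete, here is a minimal sketch of a fee table built from powers of 1.5 rounded to one significant digit. Several details are assumptions for illustration and may not match seraphis_lib: rounding half up, treating amounts as unsigned 64-bit integers, reserving the two byte values after the last exponent for the max amount and for 0, and rounding a requested fee up to the next permitted level.

```python
from fractions import Fraction

MAX_AMOUNT = 2**64 - 1  # assumption: amounts are unsigned 64-bit integers
BASE = Fraction(3, 2)   # fee levels are powers of 1.5


def round_one_sig_digit(value: Fraction) -> int:
    """Round a value >= 1 to one significant decimal digit (half rounds up)."""
    digits = len(str(int(value)))
    scale = 10 ** (digits - 1)
    return int(value / scale + Fraction(1, 2)) * scale


def build_fee_levels() -> list:
    """Index k holds round(1.5^k); two extra slots hold MAX_AMOUNT and 0."""
    levels = []
    power = Fraction(1)
    while round_one_sig_digit(power) < MAX_AMOUNT:
        levels.append(round_one_sig_digit(power))
        power *= BASE
    levels.append(MAX_AMOUNT)  # hypothetical slot for the max amount
    levels.append(0)           # hypothetical slot for a zero fee
    return levels


def encode_fee(fee: int, levels: list) -> int:
    """Return the byte for the smallest permitted fee that is >= the requested fee."""
    candidates = [(value, index) for index, value in enumerate(levels) if value >= fee]
    return min(candidates)[1]  # raises ValueError if fee exceeds MAX_AMOUNT


def decode_fee(encoded: int, levels: list) -> int:
    """Look up the fee amount for an encoded byte."""
    return levels[encoded]


levels = build_fee_levels()
encoded = encode_fee(7, levels)
print(len(levels), encoded, decode_fee(encoded, levels))
```

The table is small enough that every permitted fee fits in a single byte, and two wallets that want to pay roughly the same fee end up with exactly the same on-chain value.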
Transaction parameters:


| Parameter | Monero v16 | Seraphis proposed |
| --- | --- | --- |
| In counts (legacy + Seraphis) | 1 to unlimited | 1 to 112 |
| Out counts (normal) | 2 to 16 | 2 to 16 |
| Out counts (coinbase) | 1 to unlimited | 1 to unlimited |
| Reference set size (legacy) | 16 | 16 |
| Reference set size (Seraphis) | N/A | 2^7 = 128 |
| Enote ephemeral pubkey count (normal) | no rule | 1 for 2-outs, 1:1 for >2-outs |
| Enote ephemeral pubkey count (coinbase) | no rule | 1:1 for all |
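The proposed count rules in the table can be expressed as a small validity check. This is a sketch under stated assumptions, not consensus code: the function and constant names are hypothetical, and the input count is simply ignored for coinbase transactions.

```python
MAX_INPUTS = 112                 # legacy + Seraphis inputs, proposed limit
MIN_NORMAL_OUTPUTS = 2
MAX_NORMAL_OUTPUTS = 16


def expected_ephemeral_pubkeys(num_outputs: int, is_coinbase: bool) -> int:
    """2-output normal txs share one ephemeral pubkey; everything else is 1:1."""
    if not is_coinbase and num_outputs == 2:
        return 1
    return num_outputs


def check_tx_counts(num_inputs: int, num_outputs: int,
                    num_ephemeral_pubkeys: int, is_coinbase: bool = False) -> bool:
    """Check input, output, and enote ephemeral pubkey counts against the table."""
    if is_coinbase:
        # Assumption: coinbase txs have no input-count rule here, only >= 1 output.
        counts_ok = num_outputs >= 1
    else:
        counts_ok = (1 <= num_inputs <= MAX_INPUTS
                     and MIN_NORMAL_OUTPUTS <= num_outputs <= MAX_NORMAL_OUTPUTS)
    return counts_ok and (
        num_ephemeral_pubkeys == expected_ephemeral_pubkeys(num_outputs, is_coinbase))


print(check_tx_counts(1, 2, 1))   # typical 2-output tx with one shared pubkey
print(check_tx_counts(1, 2, 2))   # rejected: 2-output txs may only have one pubkey
print(check_tx_counts(3, 3, 3))   # 3+ outputs need one pubkey per output
```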


There are a few remaining areas where transactions can look different from each other.

Fees: Fees are necessarily user-defined, which means a user who consistently makes txs with a high fee will stand out relative to other users.
Ins/outs: Txs may have variable numbers of inputs and outputs. Input/output counts can be used to fingerprint users.

Note that eliminating this issue for outputs by adding dummy outputs would reduce the average number of times a given enote is referenced by membership proofs, which would be a privacy degradation. It would have to be combined with adding dummy inputs, a more technically challenging change.


Reference set size: Input membership proofs reference a small subset of on-chain enotes (both legacy CLSAG and Seraphis Grootle membership proofs). This means the transaction graph is not fully obscured. If a user receives multiple enotes and spends them all in the same transaction, then anyone who can attach relatedness to those enotes will be able to guess with high certainty that the tx referencing them (in separate input membership proofs) is spending them, because the probability of such an event occurring by random chance is very low. Using a global membership proof that references all enotes on-chain would eliminate this big privacy defect; however, creating such a proof is very challenging to do efficiently and securely.

The behavior of coinbase enotes highlights the problem with small reference set sizes. Coinbase enotes are distinct from normal enotes, so it is more obvious when they are spent compared to normal enotes. The true spends in txs that only spend coinbase enotes are often unambiguous because coinbase enote ownership is largely public information due to the ubiquity of pool-based mining. If each input in a transaction references one coinbase enote owned by the same organization, that is a strong statistical sign. It may seem at first glance that you could isolate coinbase enotes from normal enotes one way or another; however, that would only be a partial solution to a more general problem. It is perfectly feasible for a Monero use-case to arise where some organization(s) decide to publicize their normal enote ownership. That case would have the same problematic characteristics as the coinbase situation. Using a membership proof with a global anonymity set would render reference set analysis useless.


Reference set selection: Since membership proof reference sets are small, it is important to select decoy references carefully. However, the reference sets for CLSAG and Grootle are currently implementation-defined (Seraphis binned reference sets are only partially deterministic). It might therefore be appealing to enforce a decoy selection algorithm for all transactions. In practice, there is no ideal algorithm for selecting decoys, and any heuristic-based algorithm would almost certainly require periodic adjustments via hardfork, which would degrade protocol longevity.

There is ongoing research into improving decoy selection beyond the currently predominant gamma distribution approach, and an open proposal to enforce decoy selection at consensus.
Membership proofs can't use decoys from the future, so there is no way to completely remove timing information about the delay between when membership proofs are constructed and when a transaction is submitted to the Monero network. However, as mentioned before, being able to delay Seraphis membership proofs means users that need a non-trivial delay between tx proposal definition and completing a tx can reduce the membership proof timing delay significantly (e.g. multisig txs or interactive protocols like atomic swaps).


tx_extra: The tx_extra field is, aside from the format, completely implementation-defined. An important purpose of this field is as an 'outlet' for features that are unable to be added to the protocol. Many possible features do not satisfy Monero design goals, and in the long run changing the protocol at all may become impossible.
Steganography: Many of the bytes in a transaction are cryptographic elements that a tx author can influence to record specific messages (think of 'vanity addresses' for example). It is impossible to guarantee (without unreliable heuristics) that a transaction contains no non-random patterns (note that encrypted steganography is not observable).

Jamtis addressing scheme

Seraphis introduces a new key image design that is incompatible with CryptoNote addresses. This presents an opportunity to re-design Monero's address scheme from the ground up - i.e. Jamtis.
Jamtis public addresses have three public keys (one Ed25519 key and two X25519 keys) and an 18-byte address tag. All Jamtis addresses are analogous to Monero subaddresses - there is no 'normal/main address'. The address tag includes a 16-byte encrypted address index and a 2-byte decryption hint (used like a view tag to speed up balance recovery). When sending funds to a Jamtis address, you encrypt its address tag with the sender-receiver secret and include the resulting encrypted address tag in the new enote. Using an address tag means Jamtis addresses don't need an 'index lookahead' like Monero subaddresses (however the lookahead is still needed for legacy balance recovery). This opens the possibility of random address generation and embedding custom information in the index (e.g. payment IDs).
Using three public keys facilitates a multi-tier authority structure, among other benefits discussed in the Balance recovery section.

Generate-address: The address generator can make addresses for the user (using any address index) and decipher the address tags of existing addresses.

Using a 16-byte address index means multiple non-communicating address generators can freely generate random addresses in parallel with negligible risk of colliding (both self-collisions and cross-collisions).


Find-received: This tier can perform view-tag checks on all enotes and recover the address tags of normal enotes. It cannot recover the address tags of self-send enotes (e.g. change enotes), nor can it view the amount of any enote.

This tier was designed and optimized for remote scanners. The client of a remote find-received scanner that pre-scanned the chain can very quickly recover their entire balance (mostly limited by the bandwidth needed to download all view tag matches from the remote server). A user has massively improved privacy when using a find-received scanner compared to current view-key-based remote scanners. Note that the only other tier with a practical benefit for remote scanning is view-all, which would allow the scanner to store significantly less data. It is my hope that the privacy gap between the find-received and view-all tiers is so large that no one will be tempted to implement a third-party view-all scanning service. The remaining tiers offer no efficiency gain because they cannot recover selfsends, so they are unlikely to be implemented for third-party scanning either.


Generate-address + find-received: This combo tier can identify all received normal enotes (no amounts), decipher their address indices, and validate that the onetime addresses in those enotes are well-formed. It is unlikely to be very useful in practice because it cannot read amounts nor identify self-sends beyond view tag checks.
Payment-validator (find-received + generate-address + unlock-amounts): A payment validator can identify all received normal enotes (with amounts) and decipher their address indices.

The unlock-amounts key is only useful in combination with the find-received and generate-address keys, and is necessary to ensure you can't get a payment validator from the 'generate-address + find-received' combo tier. The unlock-amounts key only works with three-key addresses.


View-all: The core view tier can recover all balance-related information (identify normal and self-send enotes, recover address indices, read amounts, and compute all key images).

The client of a remote find-received scanner will usually have at least view-all authority (although a payment validator client may also be useful).


Master: The master tier has view-all powers and can spend all owned enotes.
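The claim that independent address generators can pick random 16-byte address indices with negligible collision risk can be checked with the standard birthday bound. The sketch below assumes indices are drawn uniformly at random; the function name is hypothetical.

```python
from fractions import Fraction

INDEX_BITS = 16 * 8  # Jamtis address indices are 16 bytes


def collision_probability_bound(num_addresses: int) -> Fraction:
    """Birthday upper bound: P(any collision) <= (n choose 2) / 2^bits."""
    pairs = num_addresses * (num_addresses - 1) // 2
    return Fraction(pairs, 2**INDEX_BITS)


# Total addresses generated across all of a user's generators combined.
for n in (10**6, 10**9, 2**32):
    bound = collision_probability_bound(n)
    print(f"{n:>12} addresses: p <= {float(bound):.3e}")
```

The bound covers both self-collisions and cross-collisions, since it only depends on the total number of indices drawn, not on which generator drew them.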

Balance recovery

Jamtis introduces multiple improvements to the balance recovery process.

Janus mitigation: When recovering amounts, a reverse Diffie-Hellman is used when making amount commitment masks. Doing this mitigates the Janus attack by ensuring a user will fail to reconstruct a Janus-attacked enote's amount commitment. A three-key address is required to enable that reverse Diffie-Hellman technique.

Some care has to be taken here because a timing analysis can theoretically be performed on balance recovery, with the same result as a Janus attack. If you have a side channel on A, then send funds to B, you can watch A perform balance recovery to see how far the enote sent to B goes through A's balance recovery process. If the enote goes all the way through, then A probably owns B's address.


Enote burning: In CryptoNote, a user can receive multiple enotes with the same onetime address, but only one of those can be spent - a very hairy issue for wallets to handle (wallets have to make sure to only try to spend the highest-amount legacy enote among legacy enotes with the same onetime address). To solve this, Jamtis embeds an 'input context' in sender-receiver secrets to ensure that during balance recovery a user will only find enotes with unique onetime addresses.

The input context is a hash of a tx's key images for normal transactions, or a hash of the block height for coinbase transactions.
Enotes with duplicate onetime addresses may exist on-chain, but only one of them will ever be discovered by a user during balance recovery (assuming they follow the Jamtis spec). It is not feasible to ban duplicate onetime addresses on-chain because that would allow someone in an interactive protocol like multisig or atomic swaps to 'kill' in-progress transactions.
Note that onetime address sender extensions hash the amount commitment to ensure even if a malicious third-party find-received scanner sends an invalid input context to the user, they will still only recover enotes with the same exact amounts (all but the oldest of those can be trivially ignored).


X25519: Jamtis uses X25519 for the expensive part of enote scanning (computing Diffie-Hellman exchanges for all enote ephemeral pubkeys on-chain), for a ~40-60% speedup depending on hardware. X25519 is enabled by having three-key addresses.
Selfsends: All Seraphis transactions must have at least one selfsend output (either a dummy selfsend, change, or self-spend enote), which enables a major optimization for clients of third-party find-received scanners. Such clients only need to download key images from txs with view tag matches, instead of all key images.

Caveat: Two-output txs should not have two selfsend output enotes of the same selfsend type. In that case the sender-receiver secret would be the same for both enotes, so the amount blinding factors would be the same, XOR between the encoded amounts would equal XOR between the underlying amounts, and if the amounts are the same then subtracting the enotes' onetime addresses would equal the difference between the underlying user addresses.


Quantum resistance: Jamtis provides strong forward secrecy against an efficient DLP solver (e.g. privacy against a quantum adversary). This includes protecting amounts, membership proof anonymity, key image origins, and enote owners/recipients.*

A DLP solver with a user's public address can recover the user's find-received key and the amounts of any normal enotes received to that address (i.e. identify all normal enotes owned by the address plus amounts). If at least one of those normal enotes is spent, then they can recover all of their key images (actually they can generate key image candidates derived from all on-chain key images, among which will be the real key images - if two normal enotes are spent then the candidate set collapses). They cannot figure out anything about the user's self-send owned enotes (aside from view tag checks), and can't learn anything about the user's other addresses. However, if the DLP solver gets access to the user's generate-address secret in addition to an address that owns spent enotes, then they can unravel the user's entire key structure (including the master key).
DLP solvers cannot learn anything about users whose addresses they don't know (aside from weakening their membership proofs by learning the true spends of compromised users).
Here is an open proposal to embed a post-quantum 'switch' into Seraphis. While full post-quantum resilience is a valuable goal, my general stance is to take things one step at a time. It would be better to see and think about a comprehensive future-gen protocol that could allow Monero to evolve beyond Seraphis, if necessary.
*These claims are based on a short analysis and need to be verified carefully. Note that no analysis has yet been done on the forward-secrecy properties of Seraphis knowledge proofs, which are still WIP.
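The selfsend caveat in the Balance recovery list (two enotes that share a sender-receiver secret leak the XOR of their amounts) is easy to demonstrate. In this sketch the mask derivation is a hypothetical stand-in (a BLAKE2b hash with a made-up domain separator), not the Jamtis derivation; only the XOR structure is taken from the text.

```python
import hashlib


def amount_mask(sender_receiver_secret: bytes) -> int:
    """Derive an 8-byte amount mask from the shared secret (hypothetical derivation)."""
    digest = hashlib.blake2b(b"toy_amount_mask" + sender_receiver_secret,
                             digest_size=8).digest()
    return int.from_bytes(digest, "little")


def encode_amount(amount: int, sender_receiver_secret: bytes) -> int:
    """Hide the amount by XORing it with the secret-derived mask."""
    return amount ^ amount_mask(sender_receiver_secret)


shared_secret = b"\x11" * 32
other_secret = b"\x22" * 32
amount_a, amount_b = 1_000_000, 250_000

# Same secret for both enotes: the masks cancel and the XOR of amounts leaks.
encoded_a = encode_amount(amount_a, shared_secret)
encoded_b = encode_amount(amount_b, shared_secret)
print(encoded_a ^ encoded_b == amount_a ^ amount_b)

# Distinct secrets: the masks differ, so an observer learns nothing from the XOR.
encoded_c = encode_amount(amount_b, other_secret)
print(encoded_a ^ encoded_c == amount_a ^ amount_b)
```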