The protocol presented in this puzzle sets up a basic ZCash-style private cryptocurrency protocol, using the Groth16 SNARK and instantiated with the MNT cycle of curves (as used by Mina to allow for arbitrary-depth recursion with succinct proofs: see e.g. here).
Each "coin" (this roughly corresponds to the bitcoin concept of 'utxo' or "unspent transaction output") is associated with a public/private keypair on the MNT6_753
elliptic curve.
More precisely, the private key is an element x
of the finite scalar field MNT6BigFr
, and the public key is the element [x] * g
in the elliptic curve group MNT6
.
The protocol involves maintaining a Merkle tree, consisting of the public keys for all prior transaction outputs, which we can imagine to be maintained by a decentralized network of miners/validators. In a public blockchain like bitcoin, when someone wants to spend a coin, they publish a transaction signed with the coin's private key and including a pointer to the coin (e.g. its index in the Merkle tree). Then, blockchain observers/miners/etc. can simply remove the corresponding "coin" from the pool of unspent coins (utxos). If someone tries to spend the same coin again, miners will match the pointer to the coin to the pointer specified in the prior transaction, and reject the double-spend attempt.
In a private blockchain like ZCash, the goal is to allow for decentralized consensus and to avoid the possibility of double-spending without a public record of transactions.
In particular, the user should not have to reveal which prior transaction output they are spending, only that they have the secret key corresponding to some unspent transaction output.
In ZCash and its descendants, this is accomplished using a zero-knowledge proof: the Merkle root root
of the coin pool is a public input, and the user proves the statement
"I know some secret key x, along with a Merkle inclusion proof that the corresponding public key [x] * g
is in the Merkle tree with root root
".
Since the proof is zero-knowledge, validators can verify the proof without having to know which public key corresponds to the coin being spent. This achieves the goal of privacy, but does not address the double-spend problem. To address this, the ZCash protocol introduces the concept of 'nullifiers'. This is a cryptographic commitment to the secret key which cannot be linked to the corresponding public key: more precisely, the nullifier is the hash of the secret key. This is included in the public input of the SNARK proof along with the Merkle root, and the SNARK circuit verifies that it is indeed equal to the hash of the secret key used in the proof.
Now, the validators can maintain a list of the nullifiers for all previous transactions, and reject any submitted transaction whose nullifier appears on this list. Assuming that the hash function is secure, finding out which public key corresponds to a given nullifier is infeasible, so the privacy of the protocol is preserved.
Let's unpack what the arkworks circuit SpendCircuit
does in the puzzle code,
unwinding the various type aliases:
- The public inputs are:
root
: an element ofMNT4BigFr
holding the root hash of the Merkle tree.nullifier
: another element ofMNT4BigFr
holding the nullifier value corresponding to the coin's secret key.
- The private inputs are:
secret
: an element ofMNT4BigFr
holding the coin's secret keyx
.cw
: aPath<MNTMerkleTreeParams>
variable holding the Merkle inclusion proof of the coin's public key in the Merkle tree.
- The circuit includes constant values:
leaf_crh_params_var
, aPoseidonConfig<MNT4BigFr>
variable giving the configuration of the Poseidon hash to be applied to create the Merkle tree's leaves.two_to_one_crh_params_var
, also aPoseidonConfig<MNT4BigFr>
variable giving the configuration of the Poseidon hash to be applied inside the Merkle tree to compute the internal nodes' hashes.base
: the generator ofark_mnt6_753::G1Affine
.
- The circuit enforces the following relationships among these:
- The canonical bit representation of
secret
(i.e. as an integer less than the modulus ofMNT4BigFr
) is below the modulus ofMNT6BigFr
. nullifier
is equal to the Poseidon hash ofsecret
- with the Poseidon hash configured using
leaf_crh_params_var
- with the Poseidon hash configured using
cw
is a valid Merkle inclusion proof for a Merkle tree with root equal toroot
for the valueleaf_g = [pk.x]
, wherepk
is equal to the scalar multiplication ofbase
bysecrets
.
- The canonical bit representation of
As we know, the implementation provided here has a vulnerability: a user can produce more than one nullifier corresponding to the same secret key, and valid transaction proofs for each of them. This would allow the user to spend the same coin twice: free money!
From the breakdown of the circuit above, we can make some observations about where the vulnerability lies:
- First, the
nullifier
variable is a fixed function ofsecret
: therefore, if we want proofs for two different nullifiers, we will also need two different secrets. - The configurations of the Poseidon hash and the Merkle tree
are hard-coded by the
SpendCircuit
, so there's nothing we can change there. - The code in
main()
enforces that both proofs have the sameroot
and the same Merkle proofcw
. - Assuming that the arkworks implementation of Merkle proofs is correct, the
circuit enforces that
cw
is an actual Merkle inclusion proof ofleaf_g
into the tree with rootroot
.
From the above, we can see that we are looking for a
malleability
bug in the circuit: given secret
and a Merkle inclusion proof cw
for secret
's public key in
the Merkle tree, Alice can produce a
different key secret'
such that the same proof cw
will show that
secret'
's public key is in the tree!
In theory, the Merkle tree construction provides a cryptographic vector
commitment to its list of leaves: it should be cryptographically infeasible to find more
than one vector of leaves with the same root
. (A commitment scheme that
satisfies this condition is said to be "binding").
Indeed, if we were to find two different values of
leaf_g
and valid Merkle inclusion proofs for both of them with the same
root
, we would have found a Poseidon hash collision!
Let's see how this works, following the cw.verify_membership()
function call:
- The Merkle inclusion proof
cw
consists of a list of hash values for the "siblings" of the path from the leaf to the root. - Assuming (without loss of generality) that
leaf_g
is the leftmost leaf, theverify_membership()
function does something like:- set
nodes[0] = LeafH(leaf_g)
- recursively set
nodes[i] = CompressH(nodes[i-1], cw[i-1])
fori = 1..=cw.len()
. - check
nodes[cw.len()] == root
- set
So, starting at the root, we see that either nodes[cw.len()-1]
is equal
for both leaves, or we have found a hash collision for compressH
.
In the former case, we continue downwards: repeating this until we reach
the leaf, we see that we either have a CompressH
hash collision somewhere
along the way or we have a hash collision for LeafH
.
Since we expect finding a Poseidon hash collision to be a difficult problem, we're left with the
conclusion that the attack involves finding two different
values of secret
corresponding to the same value of leaf_g
.
My first guess was that the relationship between secret
and
pk
was insufficiently constrained. The circuit enforces
that pk = [secret] * base
, so maybe we need to find a secret'
such that
[secret] * base == [secret'] * base
. There's two different elliptic curves and
two different fields floating around: maybe Bob mixed them up somewhere?
Looking at the type signatures, we see that base
and pk
are elements
of the elliptic curve group ark_mnt6_753::G1Affine
, and secret
is an element
of the field MNT4BigFr
: aha! This is the scalar field of
the wrong curve, MNT4_753
rather than MNT6_753
. Looking
at the docs, we see that indeed MNT4BigFr
is a larger field
than MNT6BigFr
, so perhaps it does indeed contain two
elements that differ by a multiple of the modulus of MNT6BigFr
. This modulus
is equal to the order of the elliptic curve group ark_mnt6_753::G1Affine
, so
two such elements would indeed give two different values of secret
producing
the same value of pk
.
I was excited at this possibility, and went ahead and wrote
a loop to add a bunch of multiples of MNT6BigFr::MODULUS
to
leaked_secret
and see if they solved the puzzle. But no dice!
The reason this attempt did not work is the range-check in the circuit:
- This takes
secret
, an element ofMNT4BigFr
, and first computessecret_bits
, its canonical representation as an integer greater than or equal to0
and less thanMNT4BigFr::MODULUS
. - Then it constrains the integer
secret_bits
to be belowMNT6BigFr::MODULUS
.
Now, secret_bits
is an integer greater than or equal to 0
and less than
MNT6BigFr::MODULUS
, and this is what is fed into the
arkworks scalar multiplication circuit. So there's no room for any sneaky
modular arithmetic: no two such integers can differ by a multiple of
MNT6BigFr::MODULUS
. Darn.
Now, we are led to the actual vulnerability: the Merkle tree's leaves are
computed from leaf_g = pk.x
, and multiple different curve points pk
can
correspond to the same pk.x
!
Indeed, the MNT curves are represented in short
Weierstrass
form: i.e. curves given as solutions pk.x
is not sufficient to encode the
actual point pk
!
Now that we have found the problem, it seems like a pretty silly
mistake on Bob's part, doesn't it? However, they are following the Zcash spec
here. See
ZCash uses the elliptic curve Jubjub
: the curve given by the solutions
-
$a, d, q$ have specific values, which are recorded in the spec. - This is a different type of equation from the short Weierstrass
equations
$y^2 = x^3 + ax + b$ ! Elliptic curves written in this form are called twisted Edwards curves
At first glance, this seems to have the same problem as we encountered above
with short Weierstrass curves: indeed, if
However, there is a crucial difference: the public keys in their scheme live in
a subgroup of points of the elliptic curve. They specify a large prime number
- The private keys are then members of the field $\mathbb{F}{r\mathbb{J}}$,
and scalar multiplication by this field only makes sense for points in this
subgroup (as
$r_\mathbb{J} \equiv 0$ in $\mathbb{F}{r\mathbb{J}})$. - The generator
$g$ of$\mathbb{J}^{(r)}$ used to map private keys to public keys is chosen to be a member of this subgroup: this ensures that for any integer$x$ ,$x * g$ is contained in$\mathbb{J}^{(r)}$ . - When implementing their spec, they need to be sure to actually check that any claimed elliptic curve points provided by the prover are actually in this subgroup! Arkworks does this here.
As they explain in Lemma
Finally, we will see how Alice actually implements their attack. She starts with
secret
and needs to find secret'
such that the two points pk = [secret] * base
and pk' = [secret'] * base
have the same x
coordinate.
Since the MNT curves are in short Weierstrass form, these two points are related
by algebra: if
This is because the
identity point of short Weierstrass curves is the "point at infinity" where all
vertical lines meet. The elliptic curve group operation is defined by setting
This means that to find secret'
, we need [secret'] * base = -[secret] * base
. Now, as we discussed above, the circuit in Bob's code
enforces that secret
and secret'
are both elements of MNT4BigFr
whose
canonical integer representation lies between 0
and
MNT6BigFr::MODULUS
. This means that literally taking secret' = -secret
won't work: since secret
is an element of MNT4BigFr
, the canonical integer
representation of -secret
will be MNT4BigFr::MODULUS - secret
.
However, since MNT6BigFr
is the scalar field of our elliptic curve, we know
that -[secret] * base = [MNT6BigFr::MODULUS - secret] * base
. Since secret
is between 0
and MNT6BigFr::MODULUS
, the same will be true of
MNT6BigFr::MODULUS - secret
.
So this is our answer! Indeed, if we write
let secret_hack = MNT4BigFr::from_bigint(MNT6BigFr::MODULUS).unwrap() - leaked_secret;
let nullifier_hack = <LeafH as CRHScheme>::evaluate(&leaf_crh_params, vec![secret_hack]).unwrap();
we get a valid solution to the puzzle.
To fix Bob's code, we could take the leaf_g
to include the full public key
(pk.x, pk.y)
. However, this creates some inefficiency:
now we have to hash two field elements instead of just one, and hashing inside
of a circuit is an expensive operation. Indeed, avoiding
this extra hash is likely why ZCash uses just the x
value in the first place.
Including the full y
value is wasteful: there are only
two possible values of y
for a given value of x
, so we only really need an
extra bit of data. For this reason, in many implementations of elliptic curve
cryptography, "compressed" forms of elliptic curve points are used. For example,
the curve bn254
has a base field of size y
with x
in a 256
-bit integer.
However, this trick does not quite work in the circuit context here: the value of secret
is stored
within an element of MNT4BigFr
. This is larger than MNT6BigFr
, but not quite
large enough: elements of both fields take
Is there a trick Bob can use to avoid needing to hash two field elements for each public key? I don't know if this is possible.