bkomuves/sumcheck_writeup.md Secret

## sumcheck_writeup.md

      
    Raw
  

              sumcheck_writeup.md
            
          
    ZK Hack III, first puzzle (univariate sum-check protocol)

Introduction

The puzzle description was pretty minimal. We can infer the following
pieces of information from it:

we are given a secret polynomial f, and the ZK argument should prove that
the sum of its values on some specific subset is zero
we should find a way to cheat, and "prove" this while the sum is actually
nonzero; and also keep the actual sum hidden from the verifier

Looking at the source code, we can see the general parameters:

it's a univariate polynomial (over the BLS12-381 scalar field, but this won't matter);
it has degree 30, and the evaluation domain, on which we calculate the sum,
has 16 elements: it's a multiplicative subgroup H of size N = 16 = 2⁴
(so a standard FFT friendly domain);
the polynomial commitment scheme is Marlin's batch version of the KZG scheme,
but this won't matter either.

So we are looking at a univariate sum-check protocol (as opposed to the more
popular multilinear sum-check protocol, as described for example in Thaler's book).
Looking more deeply, the prover is not implemented at all (that's our task!), but
the verifier is there, and we shall find an issue in that which we can exploit
with a malicious prover.
The verifier looks pretty simple:

it receives the statement, which consists of the evaluation domain,
a commitment to our polynomial f(x), and the expected sum
(which will be just zero in this task)
it also receives the proof, which consists of four openings
of four polynomials: f, s, h, g, together with commitments to the latter
three (f was already commited in the statement), and a batch evaluation proof
for the openings.
and then it checks the evaluation proofs, and some equation

The sum-check protocol

Ok, time to figure out what exactly this protocol is supposed to be! The linked
reference material is not really helpful, but after a few minutes of google-ing, I
found this review paper:

Adrian Hamelink:
Understanding Zero-Knowledge Proof Systems

which describes (on page 25, section 4.1.1) a univariate sum-check protocol
which now looks eerily familiar! Even most of the symbols are the same, though for
some extra confusion, what is our g, he calls p, and as a bonus he also
uses g for something else.
So how does this protocol work? The simpler, non-ZK version proves that the sum
of f(x) over x∈H is a given field element γ ∈ 𝔽.
To do this, first divide (using polynomial long division, or in practice,
a specialized version of it) our polynomial f(x) by the vanishing polynomial
Z_H, which is defined as the smallest (nonzero) polynomial which
takes zero on all x∈H:
     Z_H(x)  :=  ∏_h∈H (x-h) = x^N - 1
The arkworks function for this division, is helpfully implemented as divide_by_vanishing_poly(),
and it computes the quotient h(x) and the remainder r(x):
     f(x)  =  h(x) ⋅ Z_H(x) + r(x)
with deg(r) < N.
Now, the first observation is that of course the first term, by definition,
vanishes on H, so it doesn't contribute to the sum:
    γ := ∑_h∈H f(h) = ∑_h∈H r(h)
More interestingly, the higher degree terms of r(x) do not contribute either!
That's because of the following fact:
    ∑_h∈H h^k = 0    for 0 < k < N
Why is this true? Well, H is a multiplicative subgroup, that is, it is generated by some
generator u∈𝔽 (the letter g is unfortunately already used):
    H = {   uⁱ   |   i∈{0..N-1}    }
Rewriting the sum, we get
    ∑_h∈H h^k    =    ∑_i=0^N-1 (uⁱ)^k    =     ∑_i=0^N-1 (u^k)ⁱ
but u∈H, hence also all its powers u^k too, are N-th roots of unity, whose powers
sum to zero (to have some intuition about this, imagine the complex plane instead of
finite fields: they just go around the unit circle symmetrically, and sum to zero because of this symmetry).
The only exception is when k is divisible by N=16 (which in our case
means k=0), in which case each term of the sum is u^N = 1, hence the final sum is just N
(but interpreted as a field element).
After this little detour, we can get back to our protocol! What we just convinced
ourselves about means that only the constant term of the remainder polynomial r(x)
actually contributes to the sum! So let's rewrite
    r(x) = c + x ⋅ g(x)
separating the constant term c and the rest. Now our equation looks like this:
    f(x) = h(x) ⋅ Z_H(x) + x ⋅ g(x) + c
and the sum is simply:
    γ    =    ∑_h∈H f(h)    =    ∑_h∈H ( h(x) ⋅ Z_H(x) + x ⋅ g(x) + c )    =    0 + 0 + N ⋅ c   =    N ⋅ c
In other words, c = γ/N. Now we are ready to put together the sum-check protocol:

the prover computes, using polynomial division, h(x) and g(x) as above
commits to these polynomials
then the verifier checks the equation f(x) = h(x) ⋅ Z_H(x) + x ⋅ g(x) + γ/N
by evaluating both sides at a random pont x_i∈𝔽 (this works because of the Schwartz–Zippel lemma)

Looking at the verifier source code, it implements almost exactly this, but there is
also an additional polynomial s(x), and the equation is modified to
    f(x) + s(x) = h(x) ⋅ Z_H(x) + x ⋅ g(x) + γ/N
Cheating using the masking polynomial

The problem description vaguely hints that we should keep the actual sum
secret, or else bad things may happen in real life. Presumably the role
of the extra polynomial s(x) is to hide information; it's even referred
(implicitly) as the "masking polynomial" in the problem description.
However, there is actually no check whatsoever about s(x) in the code!
In fact, the only check apart from the above equation is a degree bound
check on g(x):
     deg(g) ≤ N - 2
This is a strong hint that we can abuse s(x) to cheat, and indeed we can!
The first idea to simply cancel out the real sum γ_real
by defining s(x) to be the constant polynomial:
     s(x) := - γ_real / N
However, with this definition, the verifier can deduce the actual sum, as it
evaluates s(x_i) = -γ_real/N. So this does not
yet fully satisfy the puzzle description.
No worries though, we can choose s(x) to be whatever we want, as there are
absolutely no extra checks about it in the verifier! For example we can
just add some random higher-order terms, masking this information:
     s(x) := - γ_real / N + x ⋅ t(x)
     t(x) := ∑_i=0^N-2 (t_i ⋅ xⁱ)
where t_i∈𝔽 are random field elements.
Of course we have to fix the equation too, but that's easy: just roll the freshly
introduced "error terms" into g(x) on the right side, too:
    g'(x) := g(x) + t(x)
Now the equation looks like this:
    f(x) + s(x)    =   f(x) - γ_real / N + x ⋅ t(x)   =   h(x) ⋅ Z_H(x) + x ⋅ ( g(x) + t(x) ) + γ_fake / N
with γ_fake = 0 in our case (but we could fake any other sum, too). Cancelling out t(x) from the two sides,
and moving γ_real to the other side, we get back the original equation, that is, we were able to cheat:
    f(x) =   h(x) ⋅ Z_H(x) + x ⋅ g(x) + γ_real / N
We could also introduce even higher order masking terms, and carefully roll them into h(x),
but unless s(x) is evaluated at as many points as the size of domain N, this
should be enough to hide the information. The problem description hints at s(x) being
evaluated twice (as part of a larger of a larger, unspecified protocol), hence
this is already more than enough.
Corrected ZK protocol

After solving the problem, one wonders how the correct protocol with
masking polynomial would look like? Well, it is very easy to eliminate the above
cheat: Just ensure that the constant term of s(x) is zero. The simplest way
to do this is to modify the verifier's equation to
    f(x) + x ⋅ s'(x) = h(x) ⋅ Z_H(x) + x ⋅ g(x) + γ/N
where s'(x) is the new, "corrected" masking polynomial (replacing the old s). We should also check the degree bounds:

deg(g) ≤ N - 2
deg(s') ≤ N - 2
deg(h) ≤ deg(f) - N

Note that if for example s'(x) would have a nonzero term of degree N-1, then
after the multiplication by x it would become a term of degree N, which would contribute a
nonzero amount to the sum, again enabling the prover to cheat! This is also the reason
why we need all this division by Z_H business in the first place.