In this document, we explore an approach to uncovering a prover's secrets by analyzing the content of their proofs. This method leverages scenarios where the proof system fails to conceal information, or simply doesn't bother to: namely, when the proofs are not zero-knowledge.
Note: This article is not a formal technical paper. Instead, it aims to present complex and advanced concepts in a (hopefully) accessible way. While some simplifications will be made for clarity, I will strive to point out when the actual details are more intricate than described.
Shortcuts for Advanced Readers: Each section includes a quick TL;DR summary for those already familiar with the basics. Feel free to skip ahead if the summary indicates nothing new or significant for you.
TL;DR: We work with a statement F(x) = y, where F is a known function, x is the private input, and y is the publicly known result.
Let’s start by establishing a shared understanding of the paradigm. The basic scenario in the world of zero-knowledge (ZK) proofs involves the following components:
- A known function F that represents some computation.
- A known value y, which is the result of applying F to some input.
- Two actors:
  - The prover, who claims to know an x such that F(x) = y.
  - The verifier, who needs to be convinced that the prover is telling the truth.
The prover’s goal is to convince the verifier that they indeed know the correct x. The simplest solution would be to reveal x outright, but this has two major drawbacks:
- The verifier must recompute F(x) to verify the claim, which might be computationally expensive.
- The prover is forced to disclose their private knowledge, which they might prefer to keep secret.
Fortunately, modern cryptographic proof systems offer solutions to these problems. These systems, essentially protocols, enable the prover to generate an artifact called a proof. The verifier can use this proof to validate the prover’s claim:
- Efficiency: Verification is often much faster than computing F(x), addressing the first problem.
- Zero-Knowledge: Certain protocols allow the prover to hide their private input entirely, ensuring that the verifier gains no knowledge about x beyond its existence. This is the essence of the ZK (zero-knowledge) property, which solves the second problem.
Clarification: Formally, this is generalized in terms of relations instead of functions. In this context, we define a relation R, a public input y, and a private input x, such that R(x, y) = 0.
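For example, the functional statement above fits this framing with one natural choice of relation: R(x, y) = F(x) - y, so that R(x, y) = 0 holds exactly when F(x) = y.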
TL;DR: PLONK tables have three types of columns: fixed, instance, and advice; each can be filled only by a party with the appropriate knowledge. For the sake of this document, we completely ignore constraining cells in the table.
Many popular proof systems are based on the PLONK paper. The idea there is to represent the computation F(x) = y in the form of a peculiar rectangular table. This table consists of three types of columns: fixed, instance, and advice. Why three? There is a good reason.
In the equation F(x) = y, there are three symbols: F, x, and y. A crucial aspect is understanding what is known to whom and when:
- F is known by both the verifier and the prover, and is common for any pair (x, y) they might use during the protocol (since the procedure may need to be repeated for multiple input-result pairs).
- y is known by both the verifier and the prover but is specific to a single protocol run.
- x is known only by the prover and is also specific to a single protocol run.
This knowledge distinction explains why the table requires three types of columns:
- Values in fixed columns represent F. Once F is agreed upon, these columns can be filled by everyone, and their values remain independent of any specific (x, y) pair.
- Values in instance columns represent y. These columns can be filled by both the verifier and the prover once the protocol starts.
- Values in advice columns represent x and the computation of F(x). Only the prover knows how to fill these columns, making them akin to a private scratchpad for the prover.
For example, during proof preparation, the prover will fill the entire table, including the advice columns. Meanwhile, the verifier will only be able to fill the fixed and instance columns, treating the advice area as a blurred region.
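To make these three column types concrete in halo2 terms, they correspond directly to the constraint-system API. Below is a minimal, hypothetical configuration sketch; the `MyConfig` struct and the column roles are made up for illustration (the puzzle's real circuit is examined later).

```rust
use halo2_proofs::pasta::Fp;
use halo2_proofs::plonk::{Advice, Column, ConstraintSystem, Fixed, Instance};

// Hypothetical config struct holding one column of each type.
#[derive(Clone)]
struct MyConfig {
    fixed: Column<Fixed>,       // represents F: identical for every proof of this circuit
    instance: Column<Instance>, // represents y: public values for a single run
    advice: Column<Advice>,     // represents x and intermediate values of F(x)
}

fn configure(meta: &mut ConstraintSystem<Fp>) -> MyConfig {
    MyConfig {
        fixed: meta.fixed_column(),
        instance: meta.instance_column(),
        advice: meta.advice_column(),
    }
}
```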
Clarification: What is missing from this description is the actual heart of the PLONK protocol. The values in the fixed columns alone are far from sufficient to represent F. In fact, the table is accompanied by a set of constraints between specific cells. For example, a constraint might state that the second cell in the first advice column must be equal to the sum of its left and right neighbors. However, for the purposes of this article, we won’t delve into this topic.
TL;DR: In non-interactive protocols, the prover computes the verifier's challenges while simulating the communication. Proofs are essentially transcripts of this simulation and contain a compressed content of the advice columns.
Let’s examine how the prove-verify protocol works. The procedure specifies the communication between the prover and the verifier:
- The prover sends an initial message.
- The verifier responds with a challenge.
- The prover sends another message, typically dependent on the verifier’s challenge.
- The verifier responds with another challenge.
- After a few such rounds, the verifier decides whether they are convinced of the prover's honesty and either accepts or rejects the proof.
The verifier’s confidence in the prover’s claim stems from these challenges. Challenges are typically random, ensuring the prover cannot precompute or forge satisfactory responses. This mechanism enforces that only honest provers can produce valid responses.
However, this multi-round communication can be inconvenient, particularly when a non-interactive protocol is desired—where the prover publishes a proof, and the verifier validates it offline without interaction. Thanks to the Fiat-Shamir transformation, this is achievable. Instead of generating random challenges, they are computed as a hash of all prior messages. Since hashes produce pseudo-random outputs, this approach works effectively.
Using this trick, the prover simulates the entire communication. At each step where they would expect a challenge from the verifier, they compute the hash of the communication so far and proceed. At the end of the simulation, they have a complete transcript of the communication, which serves as the proof. The verifier then follows the transcript to ensure the protocol was properly simulated.
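Here is a minimal, self-contained sketch of that simulation pattern. It is not halo2's actual transcript machinery (halo2 uses a Blake2b-based transcript that maps hashes into field elements, with proper domain separation); the message contents and the choice of SHA-256 are illustrative assumptions. Only the structure matters: every challenge is the hash of everything written so far.

```rust
use sha2::{Digest, Sha256};

/// Derive a pseudo-random challenge from the transcript accumulated so far.
/// (Illustrative only: real systems map the hash output into a field element.)
fn squeeze_challenge(transcript: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(transcript);
    hasher.finalize().into()
}

fn main() {
    // The prover simulates the interaction: append a message, hash to obtain
    // the "verifier's" challenge, append the response, and so on.
    let mut transcript: Vec<u8> = Vec::new();

    transcript.extend_from_slice(b"prover message 1"); // first prover message
    let challenge_1 = squeeze_challenge(&transcript);  // stands in for the verifier
    transcript.extend_from_slice(&challenge_1);

    transcript.extend_from_slice(b"prover message 2"); // response to challenge_1
    let challenge_2 = squeeze_challenge(&transcript);
    transcript.extend_from_slice(&challenge_2);

    // The final `transcript` byte string is what gets published as the proof.
    println!("proof is {} bytes long", transcript.len());
}
```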
To validate a transcript, the verifier must ensure it corresponds to the function F
. Embedding F
directly into the proof would make it unnecessarily large, as proofs are designed to be much smaller than the description of F
.
The solution is a preprocessing round, performed once for a given F
. This generates a verification key, which is a compact representation of F
's table—specifically, the content of the fixed columns. This compression is designed to work seamlessly with the proof structure, enabling the verifier to efficiently validate proofs.
Another artifact from this round is a proving key, which serves a similar purpose to the verification key but is used for proof generation.
Note: The verification key also includes the number and indices of other columns in the table, as well as the constraints between the table's cells.
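In halo2, this preprocessing round is the key-generation step. A rough sketch of how the puzzle's keys would be produced, assuming the IPA backend over the Pasta curves that halo2_proofs ships with (the circuit type is left generic; in the puzzle it would be the circuit from main.rs):

```rust
use halo2_proofs::pasta::{EqAffine, Fp};
use halo2_proofs::plonk::{keygen_pk, keygen_vk, Circuit, ProvingKey, VerifyingKey};
use halo2_proofs::poly::commitment::Params;

// Generic over the circuit type; in the puzzle this would be the circuit from main.rs.
fn setup<Circ: Circuit<Fp>>(circuit: &Circ) -> (Params<EqAffine>, ProvingKey<EqAffine>) {
    const K: u32 = 6; // table height is 2^K = 64 rows (matches `const K: u32 = 6;` in main.rs)
    let params: Params<EqAffine> = Params::new(K);

    // Verification key: a compact commitment to the fixed columns (plus the
    // table shape and constraints), used by the verifier.
    let vk: VerifyingKey<EqAffine> = keygen_vk(&params, circuit).expect("keygen_vk failed");

    // Proving key: the same preprocessed information, packaged for proof generation.
    let pk: ProvingKey<EqAffine> = keygen_pk(&params, vk, circuit).expect("keygen_pk failed");

    (params, pk)
}
```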
Now, the key question: what does the prover include in the proof? Naturally, they must include the information they know that the verifier does not—specifically, the values in the advice columns.
Since the advice area of the table can be enormous, the prover compresses it, just as the verification key compresses the fixed columns. In later sections, we will explore how this compression is performed (an essential step for solving our puzzle!). For now, what matters is that the compression must be compatible with the verification key.
TL;DR: Skip to the Crucial observation section.
Up to this point, we have focused on theory. In our puzzle, however, we deal with a concrete instantiation of the equation F(x) = y. Here:
- x is a pair (secret, nonce); our ultimate goal, as a dishonest verifier, is to learn the value of secret.
- y is a pair (commitment, nullifier).
- F is the function F((a, b)) = (hash(a), hash(a, b)) (sketched in plain code right after this list).
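Stripped of the circuit machinery, F has the following shape. This is an illustrative sketch only: `hash1` and `hash2` are hypothetical stand-ins for the two Poseidon instances the circuit actually uses, passed in as closures to keep the example self-contained.

```rust
/// Illustrative sketch of F((secret, nonce)) = (hash(secret), hash(secret, nonce)).
/// `hash1` and `hash2` stand in for the circuit's concrete Poseidon instances.
fn f<T: Copy>(
    secret: T,
    nonce: T,
    hash1: impl Fn(T) -> T,
    hash2: impl Fn(T, T) -> T,
) -> (T, T) {
    let commitment = hash1(secret);       // public; identical across all 64 proofs
    let nullifier = hash2(secret, nonce); // public; differs per proof, since the nonce changes
    (commitment, nullifier)
}
```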
Let’s analyze the corresponding PLONK table. The first 290 lines of the main.rs file implement F as a halo2 circuit. For those unfamiliar with the halo2 framework, the code may seem daunting, so we’ll focus on the most relevant aspects.
To determine the columns in our table, we examine the Circuit::configure implementation. For example:
// We have one advice column. Here, the prover will put `secret` and `nonce`.
let witness = meta.advice_column();
// We have one instance column. Here, the `commitment` and `nullifier` values will be put.
let instance = meta.instance_column();
<a batch of columns is created>
// We define a chip (subcircuit) for hashing `secret`, using the columns we have just created.
let witness_hash_config = Pow5Chip::configure(...)
<another batch of columns is created>
// We define another chip (subcircuit) for hashing `secret` and `nonce`, using the new batch of columns.
let hash_config = Pow5Chip::configure(...)
Once the table shape is defined, we can observe how the prover populates it with values in the Circuit::synthesize function:
// Put `secret` and `nonce` into the witness advice column we created previously.
let x = input_chip.load_witness("load secret", self.witness)?;
let nonce = input_chip.load_nonce("load nonce", self.nonce)?;
// Copy the cell containing `secret` to the hashing area and compute `hash(secret)`
let commitment_digest =
commitment_hasher.hash("commitment", [x.0.clone(), c.0])?;
Recall that we are given 64 proofs generated with the same secret but 64 different nonce values. Each proof uses:
- The same table shape (number and indices of columns).
- The same fixed column values.
Additionally, each proof produces the same commitment. You can confirm this by comparing commitments (e.g., using `diff commitments/commitment_x.bin commitments/commitment_y.bin`).
Thus, the part of the table used for commitment computation remains unchanged across proofs. This "constant" area always contains the secret value in a specific cell. The question is: where exactly?
Print the table contents. Change the value on this line to something non-zero (e.g., 9), add a log instruction to the proof generation code, and compare table contents across different nonces to identify the columns that do not change.
One such column (index 1) consistently contains values like:
values: [
0x0000000000000000000000000000000000000000000000000000000000000000,
0x0000000000000000000000000000000000000000000000000000000000000000,
0x0000000000000000000000000000000000000000000000000000000000000009,
0x0000000000000000000000000000000000000000000000000000000000000009,
0x0000000000000000000000000000000000000000000000000000000000000009,
0x0c36e2f6b1a32bd253301b616a7283469f82bf56477a821f1a81e4bfe1679d80,
0x1be4aef7a23c21521e80bb5aac54967d5124bd292146710fad41ba927813893e,
0x17952565d8cff13b4adb4736dae95ac22774effaa821a7de9e331066128cdd43,
0x2824c23fcb9422eb32a20248bac1d385ef621bf511bbf236dba2b211767b45e9,
0x1d3cb3943c7d97c19b57881db6d719bc3d2e61b895fea4f3dcbeb0cdffe3e5b3,
0x156604dad626e7954001a21c0ec6fe29d275e98dcec8a56828b711ad0abd71f4,
0x0548f847f54298ae8ae28e26d41df70da661e66cdcae5535782341c66f4939f0,
...
Therefore, the interesting cell resides at, e.g., index 3.
If we:
- spot `type FloorPlanner = SimpleFloorPlanner;` in the circuit configuration,
- examine the instruction order in the Circuit::synthesize method,
- either check the Pow5Chip implementation or use common sense,

we will deduce that `secret` will be written to the first advice column used by the chip, at the 3rd (absolute) row (since the first 3 rows are occupied by prior region allocations). While effective, this method is error-prone and requires deep framework knowledge.
Bruteforce all table cells. Since at least one cell always contains the secret, you can test each cell by verifying whether its hash matches commitment. Although slightly inefficient, this method works universally.
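Below is a sketch of that brute-force check. It assumes two things not shown in the original code: `dumped_cells` is the list of cell values you printed from the table, and `poseidon_hash_single` is a hypothetical helper computing the same single-input Poseidon hash the circuit uses for the commitment.

```rust
use halo2_proofs::pasta::Fp;

/// Hypothetical helper: the same single-input Poseidon hash the circuit uses
/// to compute `commitment = hash(secret)`.
fn poseidon_hash_single(_x: Fp) -> Fp {
    unimplemented!("instantiate the circuit's Poseidon parameters here")
}

/// Scan every dumped cell and return the one whose hash equals the public commitment.
fn find_secret(dumped_cells: &[Fp], commitment: Fp) -> Option<Fp> {
    dumped_cells
        .iter()
        .copied()
        .find(|&cell| poseidon_hash_single(cell) == commitment)
}
```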
TL;DR: This section is short enough to just read it :).
We have learned that:
- Every time the prover prepares a proof, they always place their secret value in the advice column with index 1 at row with index 3.
- This column contains the same values every time.
- The proof must include a compression of the advice area.
Therefore, our task is to decompress the proof, restore the advice column, and extract the secret from it.
TL;DR: Evaluations of our polynomial on the verifier's challenge are at offset 640 in the proof.
In order to understand what is contained in the proof, we need to examine the massive, 800-line-long function create_proof. One of its arguments is transcript, which starts out empty but gradually records the entire communication simulation. Once complete, this object becomes the final proof, and the proofs/ directory contains its raw byte encoding.
Recall that this transcript must include compressed information about the advice column. Therefore, we should analyze the code for anything related to advice.
After scrolling through dozens of helper functions, the first relevant detail appears on line 397:
let (advice, challenges) = { ... }
After a few jumps and a closer look, we find that ultimately, the advice variable is set as:
advice.advice_polys[*column_index] = advice_values;
Where advice_values is essentially witness.advice_vec. But what is witness.advice_vec? Its elements' type is Polynomial<Assigned<F>, LagrangeCoeff>.
If you are even slightly familiar with PLONK, this should already ring a bell. These are simply the values of an advice column, interpreted as coefficients of a polynomial over the Lagrange basis (this is the "L" in the PLONK acronym).
Next stop is at line 670:
// Simplified to:
let advice = advice.map(|poly| domain.lagrange_to_coeff(poly));
This is nothing more than applying the Inverse Fast Fourier Transform to convert our polynomials from the Lagrange representation to the standard coefficient form.
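If you want to reproduce this conversion in isolation, halo2_proofs exposes the same machinery. A small sketch, assuming the Pasta Fp field and the 2^6-row domain used by the puzzle (the toy column values are made up):

```rust
use halo2_proofs::pasta::Fp;
use halo2_proofs::poly::EvaluationDomain;

fn main() {
    // j = 1 suffices for this demo (no custom-gate degree needed);
    // k = 6: the table has 2^6 = 64 rows.
    let domain = EvaluationDomain::<Fp>::new(1, 6);

    // A toy "advice column": 64 cell values, interpreted as evaluations of a
    // polynomial over the Lagrange basis.
    let column_values: Vec<Fp> = (0..64u64).map(Fp::from).collect();
    let lagrange_poly = domain.lagrange_from_vec(column_values);

    // The IFFT step from create_proof: Lagrange basis -> coefficient form.
    let coeff_poly = domain.lagrange_to_coeff(lagrange_poly);
    let _ = coeff_poly;
}
```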
The last interesting part can be spotted at line 738:
// Compute and hash advice evals for each circuit instance
for advice in advice.iter() {
    // Evaluate polynomials at omega^i x
    let advice_evals: Vec<_> = meta
        .advice_queries
        .iter()
        .map(|&(column, at)| {
            eval_polynomial(
                &advice.advice_polys[column.index()],
                domain.rotate_omega(*x, at),
            )
        })
        .collect();
So we see that we will be evaluating our column polynomials again (eval_polynomial()). But where?
This is a slightly more advanced topic, closely tied to the functionality of custom gates in halo2
. While it's beyond the scope of our main interest, here’s a very brief intuition to avoid leaving too many black boxes:
Normally, the verification process simulates going through the table row-by-row, ensuring that all constraints are satisfied. However, some constraints might require checking a cell outside the current row. In essence, an advice query is just a relative coordinate that the verifier checks during proof validation. This coordinate is obviously two-dimensional (because of the 2D table structure) and consists of the column `column` and a relative row offset `at`.
You might have noticed another unknown in the snippet above: x. Tracing its origin, we find that:
let x: ChallengeX<_> = transcript.squeeze_challenge_scalar();
x is some challenge from the verifier that is computed based on the transcript so far.
For every relevant advice query (column, at), we evaluate the polynomial corresponding to column at the point domain.rotate_omega(*x, at), where domain = &pk.vk.domain. Fortunately, we don’t need to delve deeper into the details of this domain.
The important question is: Which advice query is relevant to us? The most effective way to find out is by printing all advice queries. Upon doing this, we discover that one query (with index 1) takes the form (column = 1, at = 0)
. This query corresponds precisely to the cell we are interested in: the one at the current row in the advice column containing our desired secret.
The final step in this process involves appending all these evaluations to the transcript:
// Hash each advice column evaluation
for eval in advice_evals.iter() {
transcript.write_scalar(*eval)?;
}
Since the transcript we are using is a simple byte array, we can determine the exact offset in this loop where our evaluation will be stored. By printing this offset, we find that the evaluation corresponding to the query (column = 1, at = 0) is written to bytes 640..672.
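Extracting that evaluation from a proof file then amounts to slicing 32 bytes and decoding them as a field element. A sketch, assuming the field is Pasta's Fp (whose canonical 32-byte little-endian encoding is what write_scalar emits), that the ff crate matching halo2_proofs is available, and a hypothetical proof file name:

```rust
use std::fs;

// `ff` is assumed to be the crate version matching halo2_proofs.
use ff::PrimeField;
use halo2_proofs::pasta::Fp;

/// Read the advice evaluation stored at bytes 640..672 of a serialized proof.
fn read_evaluation(proof_path: &str) -> Fp {
    let proof = fs::read(proof_path).expect("failed to read proof file");

    let mut repr = [0u8; 32];
    repr.copy_from_slice(&proof[640..672]);

    // `write_scalar` serializes the evaluation via the field element's canonical
    // representation, so decoding is just the inverse operation.
    Option::from(Fp::from_repr(repr)).expect("bytes 640..672 are not a valid field element")
}

fn main() {
    // Hypothetical file name; adjust to however the 64 proofs are actually stored.
    let eval = read_evaluation("proofs/proof_0.bin");
    println!("evaluation: {:?}", eval);
}
```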
We are working with polynomials, but we haven’t discussed their degrees yet. Since these polynomials originally encode columns of the table, their degrees are bounded by the table's height.
To determine the table height, we can examine the parameter used to set up the circuit and keys. In the main.rs file, we see: const K: u32 = 6;. This indicates that the table has 2^6 = 64 rows. Consequently, the polynomials have a degree of at most 63.
An alternative way of learning this is, again, to print the polynomials.
Let’s summarize how we can recover the secret from the proofs:
- The values in the target column C are encoded as coefficients of a polynomial P over the Lagrange basis.
- P is converted to the standard basis (coefficient form).
- A verifier’s challenge x is computed during proof generation.
- For the advice query (C, 0), P is evaluated at a point derived from x.
- This evaluation (32 bytes) is saved into the proof transcript at offset 640.
Given 64 proofs:
- The polynomial P has degree at most 63.
- We have 64 evaluations of P at different points (derived from the challenges x).
Using these evaluations, we can reconstruct P via Lagrange interpolation. Once P is recovered, we extract the secret as follows:
- Convert P back to the Lagrange basis and read the third coefficient.
- Alternatively, evaluate P at ω^2, where ω is the domain's primitive root.
- Extract Evaluations
  - Read bytes 640–672 from all 64 proofs to gather the P(x_i) evaluations.
- Extract Evaluation Points
  - Modify the verifier code to log the challenge x values during verification. Specifically, add logging where x is regenerated (verifier code, line 180).
- Reconstruct Polynomial P
  - Use halo2_proofs::arithmetic::lagrange_interpolate to interpolate P from the evaluations and the corresponding points.
- Recover the Secret
  - Either:
    - Extract the third Lagrange coefficient of P, or
    - Evaluate P at ω^2 (with ω from domain.get_omega()).
This process fully recovers the secret embedded in the column. 🎉
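Putting the pieces together, here is a sketch of the whole recovery. The proof and challenge file names are hypothetical (the challenges being whatever you logged from the verifier, dumped as raw 32-byte field-element encodings), and ff is assumed to be the crate version matching halo2_proofs.

```rust
use std::fs;

// `ff` is assumed to be the crate version matching halo2_proofs.
use ff::PrimeField;
use halo2_proofs::arithmetic::{eval_polynomial, lagrange_interpolate};
use halo2_proofs::pasta::Fp;
use halo2_proofs::poly::EvaluationDomain;

/// Decode a canonical 32-byte little-endian field element encoding.
fn fp_from_bytes(bytes: &[u8]) -> Fp {
    let mut repr = [0u8; 32];
    repr.copy_from_slice(bytes);
    Option::from(Fp::from_repr(repr)).expect("invalid field element encoding")
}

fn main() {
    let mut points = Vec::new(); // the challenge-derived evaluation points x_i
    let mut evals = Vec::new();  // the evaluations P(x_i), read from the proofs

    for i in 0..64 {
        // Hypothetical file layout; adjust to how the proofs and logged challenges are stored.
        let proof = fs::read(format!("proofs/proof_{i}.bin")).expect("missing proof");
        evals.push(fp_from_bytes(&proof[640..672]));

        let challenge = fs::read(format!("challenges/challenge_{i}.bin")).expect("missing challenge");
        points.push(fp_from_bytes(&challenge));
    }

    // Reconstruct P in coefficient form from the 64 (point, evaluation) pairs.
    let coeffs = lagrange_interpolate(&points, &evals);

    // Evaluate P at omega^2 to read the third Lagrange coefficient, where omega is
    // the generator of the 2^6-row evaluation domain (j = 1 is enough here, since
    // only the base domain's omega is needed).
    let domain = EvaluationDomain::<Fp>::new(1, 6);
    let omega = domain.get_omega();
    let secret = eval_polynomial(&coeffs, omega * omega);

    println!("recovered secret: {:?}", secret);
}
```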
One might wonder: how can we protect ourselves and ensure that the verifier cannot restore our secrets? The typical approach in PLONK is to reserve the last few rows of each column and populate them with random values. This is a crucial step in ensuring that even if parts of the computation remain the same for different instances, the overall contents of the column will vary. This randomness breaks the pattern and prevents the verifier from easily deducing any secret values from the proof.
The added randomization helps to obscure the structure of the proof and hides the relationship between the prover’s inputs and the proof’s contents. As a result, while the verifier can check the validity of the proof and confirm that it corresponds to the given function F, they cannot extract any sensitive information, like the secret, without breaking the cryptographic guarantees provided by the system.