@pmikolajczyk41
Created December 1, 2024 14:34

ZK Hack V: Zeitgeist write-up

In this document, we explore an approach to uncovering a prover's secrets by analyzing the contents of their proofs. This method exploits scenarios where the proof system fails to conceal information, or simply doesn't bother to: specifically, when the proofs are not zero-knowledge.

Note: This article is not a formal technical paper. Instead, it aims to present complex and advanced concepts in a (hopefully) accessible way. While some simplifications will be made for clarity, I will strive to point out when the actual details are more intricate than described.

Shortcuts for Advanced Readers: Each section includes a quick TL;DR summary for those already familiar with the basics. Feel free to skip ahead if the summary indicates nothing new or significant for you.

Basics of (ZK-)Proving

TL;DR: We work with a statement F(x)=y, where F is a known function, x is the private input, and y is the publicly known result.

Let’s start by establishing a shared understanding of the paradigm. The basic scenario in the world of zero-knowledge (ZK) proofs involves the following components:

  • A known function F that represents some computation.
  • A known value y, which is the result of applying F to some input.
  • Two actors:
    • The prover, who claims to know an x such that F(x) = y.
    • The verifier, who needs to be convinced that the prover is telling the truth.

The prover’s goal is to convince the verifier that they indeed know the correct x. The simplest solution would be to reveal x outright, but this has two major drawbacks:

  1. The verifier must recompute F(x) to verify the claim, which might be computationally expensive.
  2. The prover is forced to disclose their private knowledge, which they might prefer to keep secret.

Fortunately, modern cryptographic proof systems offer solutions to these problems. These systems, essentially protocols, enable the prover to generate an artifact called a proof. The verifier can use this proof to validate the prover’s claim:

  • Efficiency: Verification is often much faster than computing F(x), addressing the first problem.
  • Zero-Knowledge: Certain protocols allow the prover to hide their private input entirely, ensuring that the verifier gains no knowledge about x beyond its existence. This is the essence of the ZK (zero-knowledge) property, which solves the second problem.

Clarification: Formally, this is generalized in terms of relations instead of functions. In this context, we define a relation R, a public input y, and a private input x, such that R(x, y) = 0.
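To make these roles concrete, here is a toy sketch of the naive approach described above, where the prover simply reveals x (the function, prime, and values are illustrative only):

```rust
// Toy statement: F(x) = x * x mod P, for a small illustrative prime P.
const P: u64 = 101;

fn f(x: u64) -> u64 {
    (x * x) % P
}

// The naive "proof": the prover reveals x outright and the verifier
// recomputes F(x). This is potentially expensive and not zero-knowledge.
fn naive_verify(x: u64, y: u64) -> bool {
    f(x) == y
}
```

A real proof system replaces naive_verify with a protocol that is both cheaper than recomputing F and reveals nothing about x.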

PLONK Table

TL;DR: PLONK tables have three types of columns: fixed, instance, and advice; each can be filled only by a party with the appropriate knowledge. For the sake of this document, we completely ignore constraining cells in the table.

Many popular proof systems are based on the PLONK paper. The idea there is to represent the computation F(x) = y in the form of a peculiar rectangular table. This table consists of three types of columns: fixed, instance, and advice. Why three? There is a good reason.

Three Levels of Knowledge

In the equation F(x) = y, there are three symbols: F, x, and y. A crucial aspect is understanding what is known to whom and when:

  1. F is known by both the verifier and the prover, and is common for any pair (x, y) they might use during the protocol (since the procedure may need to be repeated for multiple input-result pairs).
  2. y is known by both the verifier and the prover but is specific to a single protocol run.
  3. x is known only by the prover and is also specific to a single protocol run.

This knowledge distinction explains why the table requires three types of columns:

  1. Values in fixed columns represent F. Once F is agreed upon, these columns can be filled by everyone, and their values remain independent of any specific (x, y) pair.
  2. Values in instance columns represent y. These columns can be filled by both the verifier and the prover once the protocol starts.
  3. Values in advice columns represent x and the computation of F(x). Only the prover knows how to fill these columns, making them akin to a private scratchpad for the prover.

For example, during proof preparation, the prover will fill the entire table, including the advice columns. Meanwhile, the verifier will only be able to fill the fixed and instance columns, treating the advice area as a blurred region.

Clarification: What is missing from this description is the actual heart of the PLONK protocol. The values in the fixed columns alone are far from sufficient to represent F. In fact, the table is accompanied by a set of constraints between specific cells. For example, a constraint might state that the second cell in the first advice column must be equal to the sum of its left and right neighbors. However, for the purposes of this article, we won’t delve into this topic.
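As a mental model (this is not halo2's actual API; the layout below is purely illustrative), the three column types can be sketched as a simple data structure:

```rust
// Illustrative only: a PLONK-style table with the three column kinds.
// Real systems also attach constraints between cells, which we ignore here.
struct PlonkTable {
    fixed: Vec<Vec<u64>>,    // represents F; fillable by everyone
    instance: Vec<Vec<u64>>, // represents y; fillable once a protocol run starts
    advice: Vec<Vec<u64>>,   // represents x and intermediate values; prover-only
}

impl PlonkTable {
    // What the verifier can fill in themselves: everything except
    // the advice area, which they treat as a blurred region.
    fn verifier_view(&self) -> (&[Vec<u64>], &[Vec<u64>]) {
        (&self.fixed, &self.instance)
    }
}
```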

Proofs and Challenges

TL;DR: In non-interactive protocols, the prover computes the verifier's challenges while simulating the communication. Proofs are essentially transcripts of this simulation and contain the advice columns in compressed form.

Protocol

Let’s examine how the prove-verify protocol works. The procedure specifies the communication between the prover and the verifier:

  1. The prover sends an initial message.
  2. The verifier responds with a challenge.
  3. The prover sends another message, typically dependent on the verifier’s challenge.
  4. The verifier responds with another challenge.
  5. After a few such rounds, the verifier decides whether they are convinced of the prover's honesty and either accepts or rejects the proof.

The verifier’s confidence in the prover’s claim stems from these challenges. Challenges are typically random, ensuring the prover cannot precompute or forge satisfactory responses. This mechanism enforces that only honest provers can produce valid responses.

However, this multi-round communication can be inconvenient, particularly when a non-interactive protocol is desired—where the prover publishes a proof, and the verifier validates it offline without interaction. Thanks to the Fiat-Shamir transformation, this is achievable. Instead of generating random challenges, they are computed as a hash of all prior messages. Since hashes produce pseudo-random outputs, this approach works effectively.

Using this trick, the prover simulates the entire communication. At each step where they would expect a challenge from the verifier, they compute the hash of the communication so far and proceed. At the end of the simulation, they have a complete transcript of the communication, which serves as the proof. The verifier then follows the transcript to ensure the protocol was properly simulated.
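A toy model of this simulation is sketched below. Note the hedges: a real transcript uses a cryptographic hash (halo2 uses Blake2b), while this sketch uses Rust's std DefaultHasher purely to stay self-contained:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A toy Fiat-Shamir transcript: challenges are derived by hashing
// everything recorded so far, so the prover can simulate the verifier.
struct Transcript {
    bytes: Vec<u8>,
}

impl Transcript {
    fn new() -> Self {
        Transcript { bytes: Vec::new() }
    }

    // The prover appends a protocol message.
    fn write(&mut self, msg: &[u8]) {
        self.bytes.extend_from_slice(msg);
    }

    // A "challenge" is a hash of all prior messages: pseudo-random,
    // so the prover cannot tailor earlier messages to a known challenge.
    fn squeeze_challenge(&mut self) -> u64 {
        let mut h = DefaultHasher::new();
        self.bytes.hash(&mut h);
        let c = h.finish();
        // Record the challenge too, so later challenges depend on it.
        self.bytes.extend_from_slice(&c.to_le_bytes());
        c
    }
}
```

The final bytes vector plays the role of the proof: it is the full transcript of the simulated communication.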

Verification Key

To validate a transcript, the verifier must ensure it corresponds to the function F. Embedding F directly into the proof would make it unnecessarily large, as proofs are designed to be much smaller than the description of F.

The solution is a preprocessing round, performed once for a given F. This generates a verification key, which is a compact representation of F's table—specifically, the content of the fixed columns. This compression is designed to work seamlessly with the proof structure, enabling the verifier to efficiently validate proofs.

Another artifact from this round is a proving key, which serves a similar purpose to the verification key but is used for proof generation.

Note: The verification key also includes the number and indices of other columns in the table, as well as the constraints between the table's cells.

Content of the Proof

Now, the key question: what does the prover include in the proof? Naturally, they must include the information they know that the verifier does not—specifically, the values in the advice columns.

Since the advice area of the table can be enormous, the prover compresses it, just as the verification key compresses the fixed columns. In later sections, we will explore how this compression is performed (an essential step for solving our puzzle!). For now, what matters is that the compression must be compatible with the verification key.

Case study

TL;DR: Skip to the Crucial observation section.

Up to this point, we have focused on theory. In our puzzle, however, we deal with a concrete instantiation of the equation F(x) = y. Here:

  • x is a pair (secret, nonce); our ultimate goal, as a dishonest verifier, is to learn the value of secret
  • y is a pair (commitment, nullifier)
  • F is a function: F((a,b)) = (hash(a), hash(a, b))

Let’s analyze the corresponding PLONK table. The first 290 lines of the main.rs file implement F as a halo2 circuit. For those unfamiliar with the halo2 framework, the code may seem daunting, so we’ll focus on the most relevant aspects.

Creating columns

To determine the columns in our table, we examine the Circuit::configure implementation. For example:

// We have one advice column. Here, the prover will put `secret` and `nonce`.
let witness = meta.advice_column();

// We have one instance column. Here, the `commitment` and `nullifier` values will be put.
let instance = meta.instance_column();

<a batch of columns is created>
// We define a chip (subcircuit) for hashing `secret`, using the columns we have just created.
let witness_hash_config = Pow5Chip::configure(...);

<another batch of columns is created>
// We define another chip (subcircuit) for hashing `secret` and `nonce`, using the new batch of columns.
let hash_config = Pow5Chip::configure(...);

Filling columns

Once the table shape is defined, we can observe how the prover populates it with values in the Circuit::synthesize function:

// Put `secret` and `nonce` to the witness advice column we have created previously.
let x = input_chip.load_witness("load secret", self.witness)?;
let nonce = input_chip.load_nonce("load nonce", self.nonce)?;

// Copy the cell containing `secret` to the hashing area and compute `hash(secret)`
let commitment_digest =
    commitment_hasher.hash("commitment", [x.0.clone(), c.0])?;

Crucial observation

Recall that we are given 64 proofs generated with the same secret but 64 different nonce values. Each proof uses:

  • The same table shape (number and indices of columns).
  • The same fixed column values.

Additionally, each proof produces the same commitment. You can confirm this by comparing commitments (e.g., using diff commitments/commitment_x.bin commitments/commitment_y.bin).

Thus, the part of the table used for commitment computation remains unchanged across proofs. This "constant" area always contains the secret value in a specific cell. The question is: where exactly?

Localizing the cell with secret

The good way (recommended)

Print the table contents. Modify this line to a non-zero value (e.g., 9) and add a log instruction to the proof generation code. Compare table contents across different nonces to identify unchanging columns.

One such column (index 1) consistently contains values like:

values: [
    0x0000000000000000000000000000000000000000000000000000000000000000,
    0x0000000000000000000000000000000000000000000000000000000000000000,
    0x0000000000000000000000000000000000000000000000000000000000000009,
    0x0000000000000000000000000000000000000000000000000000000000000009,
    0x0000000000000000000000000000000000000000000000000000000000000009,
    0x0c36e2f6b1a32bd253301b616a7283469f82bf56477a821f1a81e4bfe1679d80,
    0x1be4aef7a23c21521e80bb5aac54967d5124bd292146710fad41ba927813893e,
    0x17952565d8cff13b4adb4736dae95ac22774effaa821a7de9e331066128cdd43,
    0x2824c23fcb9422eb32a20248bac1d385ef621bf511bbf236dba2b211767b45e9,
    0x1d3cb3943c7d97c19b57881db6d719bc3d2e61b895fea4f3dcbeb0cdffe3e5b3,
    0x156604dad626e7954001a21c0ec6fe29d275e98dcec8a56828b711ad0abd71f4,
    0x0548f847f54298ae8ae28e26d41df70da661e66cdcae5535782341c66f4939f0,
    ...

And therefore, the interesting cell resides, e.g., at index 3 (any of the rows holding the secret will do).

The bad way (for halo2 natives)

If we:

  • spot type FloorPlanner = SimpleFloorPlanner; in the circuit configuration
  • examine instruction order in Circuit::synthesize method
  • either check the Pow5Chip implementation or use common sense

we will deduce that secret will be written to the first advice column used by the chip, at the 3rd (absolute) row (since the first 3 rows are occupied by prior region allocations). While effective, this method is error-prone and requires deep framework knowledge.

The ugly way (for the lazy)

Bruteforce all table cells. Since at least one cell always contains the secret, you can test each cell by verifying if its hash matches commitment. Although slightly inefficient, this method works universally.
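A sketch of this brute force, with the hash function passed in as a parameter (the puzzle's actual hash is a Poseidon instance, which we do not reproduce here; the types and names below are illustrative):

```rust
// Scan every cell of the table and test whether hashing it reproduces
// the known commitment. `hash` stands in for the circuit's real hash.
fn find_secret_cell<F>(table: &[Vec<u64>], commitment: u64, hash: F) -> Option<(usize, usize)>
where
    F: Fn(u64) -> u64,
{
    for (col, column) in table.iter().enumerate() {
        for (row, &cell) in column.iter().enumerate() {
            if hash(cell) == commitment {
                return Some((col, row));
            }
        }
    }
    None
}
```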

Attack idea

TL;DR: This section is short enough to read it :).

We have learned that:

  1. Every time the prover prepares a proof, they always place their secret value in the advice column with index 1 at row with index 3.
  2. This column contains the same values every time.
  3. The proof must include a compression of the advice area.

Therefore, our task is to decompress the proof, restore the advice column, and extract the secret from it.

Analyzing Proof

TL;DR: Evaluations of our polynomial on the verifier's challenge are at offset 640 in the proof.

In order to understand what is contained in the proof, we need to examine the massive, 800-line-long function create_proof. One of its arguments is transcript, which starts out empty but gradually records the entire communication simulation. Once complete, this object becomes the final proof, and the proofs/ directory contains its raw byte encoding.

Recall that this transcript must include compressed information about the advice column. Therefore, we should analyze the code for anything related to advice.

Column Encoding

After scrolling through dozens of helper functions, the first relevant detail appears on line 397:

let (advice, challenges) = { ... }

After a few jumps and a closer look, we find that ultimately, the advice variable is set as:

advice.advice_polys[*column_index] = advice_values;

Where advice_values is essentially witness.advice_vec. But what is witness.advice_vec? Its elements' type is Polynomial<Assigned<F>, LagrangeCoeff>.

If you are even slightly familiar with PLONK, this should already ring a bell. This is simply the values of an advice column, interpreted as coefficients of a polynomial over a Lagrange basis (this is the "L" in the PLONK acronym).

Standard coefficient form

Next stop is at line 670:

// Simplified to:
let advice = advice.map(|poly| domain.lagrange_to_coeff(poly));

This is nothing more than applying an inverse Fast Fourier Transform to convert our polynomials from the Lagrange representation to the standard coefficient form.
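Conceptually, lagrange_to_coeff is an inverse discrete Fourier transform over the field: given column values v_i, interpreted as evaluations at the n-th roots of unity ω^i, the standard coefficients are c_j = n⁻¹ · Σ_i v_i · ω^(-ij). Here is a naive O(n²) sketch over a toy prime field (the real field and root of unity come from the halo2 domain; P = 17 and ω = 4 below are illustrative):

```rust
// Toy prime field parameters (illustrative; the real field is ~255 bits).
const P: u64 = 17;

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut r = 1;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { r = r * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    r
}

fn inv_mod(a: u64) -> u64 {
    pow_mod(a, P - 2) // Fermat's little theorem
}

// Horner evaluation, used to check the round trip below.
fn eval_poly(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| (acc * x % P + c) % P)
}

// Naive inverse DFT: values v[i] = poly(omega^i)  ->  coefficients c[j].
fn lagrange_to_coeff(values: &[u64], omega: u64) -> Vec<u64> {
    let n = values.len() as u64;
    let n_inv = inv_mod(n);
    let omega_inv = inv_mod(omega);
    (0..values.len())
        .map(|j| {
            let mut acc = 0u64;
            for (i, &v) in values.iter().enumerate() {
                acc = (acc + v * pow_mod(omega_inv, (i * j) as u64)) % P;
            }
            acc * n_inv % P
        })
        .collect()
}
```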

Evaluation

The last interesting part can be spotted at line 738:

// Compute and hash advice evals for each circuit instance
for advice in advice.iter() {
    // Evaluate polynomials at omega^i x
    let advice_evals: Vec<_> = meta
        .advice_queries
        .iter()
        .map(|&(column, at)| {
            eval_polynomial(
                &advice.advice_polys[column.index()],
                domain.rotate_omega(*x, at),
            )
        })
        .collect();
}

So we see that we will be evaluating our column polynomials again (eval_polynomial()). But where?

Advice Queries

This is a slightly more advanced topic, closely tied to the functionality of custom gates in halo2. While it's beyond the scope of our main interest, here’s a very brief intuition to avoid leaving too many black boxes:

Normally, the verification process simulates going through the table row-by-row, ensuring that all constraints are satisfied. However, some constraints might require checking a cell outside the current row. In essence, an advice query is just a relative coordinate that the verifier checks during proof validation. This coordinate is two-dimensional (matching the 2D table structure) and consists of a column (`column`) and a relative row offset (`at`).

x? What is x?

You might have noticed another unknown in the snippet above: x. Tracing its origin, we find that:

let x: ChallengeX<_> = transcript.squeeze_challenge_scalar();

x is some challenge from the verifier that is computed based on the transcript so far.

Evaluation for advice queries

For every relevant advice query (column, at), we evaluate the polynomial corresponding to column at the point domain.rotate_omega(*x, at), where domain = &pk.vk.domain. Fortunately, we don’t need to delve deeper into the details of this domain.

The important question is: Which advice query is relevant to us? The most effective way to find out is by printing all advice queries. Upon doing this, we discover that one query (with index 1) takes the form (column = 1, at = 0). This query corresponds precisely to the cell we are interested in: the one at the current row in the advice column containing our desired secret.

Saving to Transcript

The final step in this process involves appending all these evaluations to the transcript:

// Hash each advice column evaluation
for eval in advice_evals.iter() {
    transcript.write_scalar(*eval)?;
}

Since the transcript we are using is a simple byte array, we can determine the exact offset in this loop where our evaluation will be stored. By printing this offset, we find that the evaluation corresponding to the query (column = 1, at = 0) is written to bytes 640..672.
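Extracting the evaluation is then plain byte slicing. A sketch (the offset 640 and the 32-byte scalar width follow from the analysis above; the helper for reading the puzzle's proofs/ directory is illustrative):

```rust
use std::fs;

// Offset of the evaluation for advice query (column = 1, at = 0),
// as determined by walking through create_proof's transcript writes.
const EVAL_OFFSET: usize = 640;
const SCALAR_BYTES: usize = 32;

// Slice the 32-byte field element encoding out of a raw proof.
fn extract_eval(proof: &[u8]) -> [u8; 32] {
    proof[EVAL_OFFSET..EVAL_OFFSET + SCALAR_BYTES]
        .try_into()
        .expect("proof shorter than expected")
}

// Read every proof file and collect the evaluations.
fn collect_evals(paths: &[String]) -> std::io::Result<Vec<[u8; 32]>> {
    paths
        .iter()
        .map(|p| fs::read(p).map(|bytes| extract_eval(&bytes)))
        .collect()
}
```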

Degrees

We are working with polynomials, but we haven’t discussed their degrees yet. Since these polynomials originally encode columns of the table, their degrees are bounded by the table's height.

To determine the table height, we can examine the parameter used to set up the circuit and keys. In the main.rs file, we see: const K: u32 = 6;. This indicates that the table has 2^6 = 64 rows. Consequently, the polynomials have a degree of at most 63.

An alternative way of learning this is, again, to print the polynomials.

Final Attack

Let’s summarize how we can recover the secret from the proofs:

Recap

  1. The values in the target column C are encoded as coefficients of a polynomial P over the Lagrange basis.
  2. P is converted to the standard basis (coefficient form).
  3. A verifier’s challenge x is computed during proof generation.
  4. For the advice query (C, 0), P(x) is evaluated at a point derived from x.
  5. This evaluation (32 bytes) is saved into the proof transcript at offset 640.

Exploiting the Proofs

Given 64 proofs:

  • The polynomial P has degree 63.
  • We have 64 evaluations of P at different points (derived from x).

Using these evaluations, we can reconstruct P via Lagrange interpolation. Once P is recovered, we extract the secret as follows:

  • Convert P back to the Lagrange basis and read the third coefficient.
  • Alternatively, evaluate P at ω^2, where ω is the domain's primitive root.

Step-by-Step Attack

  1. Extract Evaluations

    • Read bytes 640–672 from all 64 proofs to gather P(x_i) evaluations.
  2. Extract Evaluation Points

    • Modify the verifier code to log the challenge x values during verification.
      Specifically, add logging where x is regenerated (verifier code line 180).
  3. Reconstruct Polynomial P

    • Use halo2_proofs::arithmetic::lagrange_interpolate to interpolate P from the evaluations and the corresponding points.
  4. Recover the Secret

    • Either:
      • Extract the third Lagrange coefficient of P, or
      • Evaluate P at ω^2 (with ω from domain.get_omega()).
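The core of step 3, Lagrange interpolation, can be sketched over a toy prime field (the real attack uses halo2_proofs::arithmetic::lagrange_interpolate over halo2's scalar field; the prime P = 97 below is illustrative). Given enough point/evaluation pairs, the interpolating polynomial is fully determined, and we can evaluate it anywhere, such as at ω^2:

```rust
// Toy prime field; the real field is ~255 bits.
const P: u64 = 97;

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut r = 1;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { r = r * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    r
}

fn inv_mod(a: u64) -> u64 {
    pow_mod(a, P - 2) // Fermat's little theorem
}

// Evaluate the interpolating polynomial through (xs[i], ys[i]) at `x`,
// using the Lagrange formula directly (no coefficient form needed).
fn interpolate_eval(xs: &[u64], ys: &[u64], x: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..xs.len() {
        let mut term = ys[i];
        for j in 0..xs.len() {
            if i != j {
                let num = (x + P - xs[j]) % P;
                let den = (xs[i] + P - xs[j]) % P;
                term = term * num % P * inv_mod(den) % P;
            }
        }
        acc = (acc + term) % P;
    }
    acc
}
```

In the actual attack, xs are the 64 points derived from the challenges, ys are the 64 evaluations read from the proofs, and the final evaluation point is ω^2 from domain.get_omega().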

Result

This process fully recovers the secret embedded in the column. 🎉

Are we in danger?

One might wonder: how can we protect ourselves and ensure that the verifier cannot restore our secrets? The typical approach in PLONK is to reserve the last few rows of each column and populate them with random values. This is a crucial step in ensuring that even if parts of the computation remain the same for different instances, the overall contents of the column will vary. This randomness breaks the pattern and prevents the verifier from easily deducing any secret values from the proof.

The added randomization helps to obscure the structure of the proof and hides the relationship between the prover’s inputs and the proof’s contents. As a result, while the verifier can check the validity of the proof and confirm that it corresponds to the given function F, they cannot extract any sensitive information, like the secret, without breaking the cryptographic guarantees provided by the system.
