roynalnaruto/writeup.md Secret

## writeup.md

      
    Raw
  

              writeup.md
            
          
    Solution - Let's Hash it Out

Last week I attempted to solve the Puzzle 1 of the ZK Hack event, and I hope to put together a write-up decribing the puzzle, my learnings as I attempted to solve it and of course the solution.
A big thank you to Kobi Gurkan for designing such an interesting puzzle.
The Puzzle

The puzzle was shared as a repository on GitHub. Running the binary prompts with the following message:
Alice designed an authentication system in which users gain access by presenting it a signature on a username, which Alice provided.
One day, Alice discovered 256 of these signatures were leaked publicly, but the secret key wasn't. Phew.
The next day, she found out someone accessed her system with a username she doesn't know! This shouldn't be possible due to existential unforgeability, as she never signed such a message.

Can you find out how it happend and produce a signature on your username?

The dataset has the Alice's public key, the above mentioned 256 messages and the corresponding 256 BLS signatures on them (signed by Alice's secret key).
The binary uses the above dataset and asserts that each of those signatures are in fact signed by a secret key that's associated with the provided public key.
BLS Signature

I had in the past read this explainer by Ben Edgington and I believe its written masterfully, for cryptography beginners/newbies like me to get a really good introduction to BLS signatures.
To summarise:

Each one of Alice's messages is first hashed to a point on the BLS12-381 elliptic curve
$$
m \rightarrow H(m)
$$
This point is added to itself $sk$ times, where $sk$ is Alice's secret key, to construct the BLS signature
$$
S = [sk]H(m)
$$
Now seems like a good time to check how the message is hashed to a point on the curve.

Pedersen Hash (hash-to-curve)

Wild Attempts

I didn't really get a good picture of the Pedersen Hash on reading the supporting link provided along with the puzzle. So I decided to dig deeper.
I first tried searching for vulnerabilities/security concerns while implementing the Pedersen Hash. And while skimming through the Zcash Sapling Specs I noticed an interesting  point in section 5.4.1.7:

These hash functions are not collision-resistant for variable length inputs

Temporary Joy

I immediately jumped to the arkworks-rs source code for the Pedersen Collision-Resistant Hash (CRH). Notice here how every message's bytes are padded with the 0 byte for any length less than the NUM_WINDOWS * WINDOW_SIZE bytes.
I thought I've figured out the solution. Well, I still remembered having seen every message (in the puzzle's data) being of the same length (roughly the same length as a 32-bytes hex encoded string). So I looked for a message with the last 2 characters being 00, meaning the last byte would be a 0 byte. For such a message, removing that byte would yield the same hash-to-curve point. But it was only when I opened the hash.rs file in the puzzle repository that my joy was cut short. Every message was being hashed to 256-bits before being supplied as input to the Pedersen Hash evaluation. Sigh... Kobi Gurkan wasn't gonna set that as the challenge, right?
Re-attempt

I re-started studying the ark-crypto-primitives source code, and put the equation down on paper.


Instantiating the Pedersen Hash Generators (source)
At this point in time I had also checked that the Pedersen parameters were configured by setting WINDOW_SIZE to 1 and NUM_WINDOWS to 256. Makes sense, the product is 256 which is also the number of bits in the input message (to be precise, BLAKE2S(raw_message)). So in our case, this loop runs only once.
Say the base points generated at random are termed $G1$, $G2$, $G3$, ... and so on, we have:
$$
Generators = [[G_{1}], [G_{2}], [G_{3}], ..., [G_{256}]]
$$


Evaluating the Hash-to-curve (source)
For a 256-bit input $m$ with bits ${b_{1}, b_{2}, ..., b_{256}}$, we have:
$$
H(m) = \sum_{i=1}^{i=256} G_{i} \hspace{1em} \forall b_{i} \in {1}
$$
This means, for inputs $m_{1}, m_{2}, ..., m_{256}$, we have:
$$
H_{1} = [b_{1,1}]G_{1} + [b_{1,2}]G_{2} + ... + [b_{1,256}]G_{n}
$$
$$
H_{2} = [b_{2,1}]G_{1} + [b_{2,2}]G_{2} + ... + [b_{2,256}]G_{n}
$$
so on... and
$$
H_{256} = [b_{256,1}]G_{1} + [b_{256,2}]G_{2} + ... + [b_{256,256}]G_{n}
$$


Realization

One can notice the beautiful symmetry in the above equations. Since the secret key $sk$ is multiplied to every $H_{i}$ this also means that for an arbitrary input message $m_{arb}$ we have, if $AX = C$ then $S_{arb} = [x_{1}]S_{1} + [x_{2}]S_{2} + ... + [x_{256}]S_{256}$, where:
$$
A = \begin{bmatrix}
b_{1,1} & b_{2,1} ... & b_{256,1} \
b_{1,2} & b_{2,2} ... & b_{256,2} \
...                             \
b_{1,256} & b_{2,256} ... & b_{256,256}
\end{bmatrix}
$$
$$
X = \begin{bmatrix}
x_{1} \
x_{2} \
...   \
x_{256}
\end{bmatrix}
$$
and $C$ represents the bits of the arbitrary message:
$$
C = \begin{bmatrix}
b_{arb,1} \
b_{arb,2} \
...       \
b_{arb,256}
\end{bmatrix}
$$
In short, if the arbitrary message can be represented as a linear combination of the 256 messages (effectively its hash-to-curve $H(m_{arb})$ being represented as a linear combination of the 256 $H_{i}$'s), then we can translate the same linear combination to its 256 signatures $S_{i}$ to derive the signature $S_{arb}$ for the arbitrary message.
Attempts with Rust and Python

First I tried solving the above equation in Rust using the nalgebra crate. I got the code compiling but the results weren't correct. That's because I was using the f64 64-bit floating type that couldn't give me the precision needed to solve the equation.
Next, I moved on to using numpy in Python, but well, the same is true there (64-bit floating type).
I had already spent a few hours trying to compile the Rust code, get it to work, only to realise the lack of precision, then moving on to Python, which I had not written in a long long time. I was just about to call it a day when I messaged Kobi that I think I know how to solve it, but I can't get the required precision to arrive at the solution.
Kobi replied: "Sage :)"
Enter Sage

I had never used SageMath but the fact that university servers were mirrors for downloading it, meant that SageMath was used for serious stuff!
A little duck-duck-going helped me find this. Its pretty cool that one can save:
a=matrix([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])

to matrix.sage and use the load('./matrix.sage') command in Sage.
At this point I only had to:

Dump the 256 message inputs to a matrix.sage file in the $256x256$ matrix format. Since the dump format swapped rows and columns, note that we need to transpose the matrix before operating on it.
Dump the arbitrary input (I used "1234") to a vector.sage file in the $256x1$ matrix format.
Load those in sage.

Sage Commands

sage: load('matrix.sage')
sage: load('vector.sage')
sage: A = A.transpose()
sage: Ainv = A.inverse()
sage: X = Ainv * C

X.str() at this point is in fact the appropriate linear combination $X$ that we were hoping to solve and find.
Putting it altogether

Sage outputs the elements as rational fields, which also means that negative elements are printed with a - sign. But we wish to bring them within the Fr field, which is BLS12-381 curve's scalar field.
I wrote a tiny Python scipt to read every negative number and replace it with its modulus with the modulus of the Fr field. I am not sure if there was an easier way to achieve this (may be in Sage or Rust):
import os

modulus = 52435875175126190479447740508185965837690552500527637822603658699938581184513

x = open("./x", "r")
lines = x.readlines()
numerators = []

for line in lines:
    line = line.lstrip()
    if line.startswith("-"):
        numerators.append(int(line) % modulus)
    else:
        numerators.append(int(line))
x.close()

And I loaded these into a test function in Rust to calculate $S_{arb}$:
#[test]
fn verify_sig_construction() {
    let f = std::fs::File::open("./x").unwrap();
    let reader = BufReader::new(f);
    
    // these are the rational fractions represented as
    // the scalar field of bls12-381
    let mut xs = Vec::with_capacity(256);
    
    for line in reader.lines() {
        // clean up whitespaces and prepare to parse
        let line = line.unwrap();
        let line = line
            .trim_start_matches("[")
            .trim_end_matches("]")
            .trim();
        
        // read the `numerator/denominator` format
        let num_den = line
            .split("/")
            .collect::<Vec<&str>>();
        assert!(num_den.len() == 2);
        
        // parse them to Fr and populate the multipliers
        let n = Fr::from_str(num_den[0]).unwrap();
        let d = Fr::from_str(num_den[1]).unwrap();
        xs.push(n / d);
    }

    // get the puzzle data
    let (pk, _msgs, sigs) = puzzle_data();
    
    // initialise a forgery signature
    let mut forgery = G1Projective::zero();
    
    // compute summation of [x_i]S_i
    for (sig, x) in sigs.iter().zip(xs.iter().cloned()) {
        let mut si = sig.into_projective();
        si.mul_assign(x);
        forgery.add_assign(&si);
    }
    
    // The arbitrary message I had used was 1234
    let arb_msg = hex::decode("1234").unwrap();
    
    // validate that the forgery does in fact verify
    verify(pk, &arb_msg, forgery.into_affine());
    
    // print the hex-encoded signature
    let mut hexsig = vec![0u8; 48];
    G1Affine::serialize(
        &forgery.into_affine(),
        hexsig.as_mut_slice(),
    ).unwrap();
    println!("forgery = {:?}", hexsig.encode_hex::<String>());
}

How to fix this

In the explainer by Ben Edgington, the first (and the simplest) approach to hash a message to the curve is "Hash-and-check", which could potentially be implemented as:


$x \leftarrow BLAKE2S(m) \mod q$, where $m$ is the message and $q$ is the field modulus

$x \leftarrow x + 1 \mod q$, if $x$ does not satisfy the curve equation (there is no point on the curve with $x$ as the x-coordinate)
If $x$ is the x-coordinate of a point on the curve, return $(x, y)$


So we rely on the collision resistance of the hash function, while the avalanche effect means a slight change in the input $m$ would produce a completely uncorrelated $x$.
The same is not true for a Pedersen Hash. Although it is collision resistant for the same input size, past public data (messages and signatures) allow reconstruction of new valid signatures on arbitrary inputs.

Acknowledgements: Many thanks to Kobi Gurkan for the sage hint and insightful conversations