Last week I attempted to solve the Puzzle 1 of the ZK Hack event, and I hope to put together a write-up decribing the puzzle, my learnings as I attempted to solve it and of course the solution. A big thank you to Kobi Gurkan for designing such an interesting puzzle.
The puzzle was shared as a repository on GitHub. Running the binary prompts with the following message:
Alice designed an authentication system in which users gain access by presenting it a signature on a username, which Alice provided.
One day, Alice discovered 256 of these signatures were leaked publicly, but the secret key wasn't. Phew.
The next day, she found out someone accessed her system with a username she doesn't know! This shouldn't be possible due to existential unforgeability, as she never signed such a message.
Can you find out how it happend and produce a signature on your username?
The dataset has the Alice's public key, the above mentioned 256 messages and the corresponding 256 BLS signatures on them (signed by Alice's secret key).
The binary uses the above dataset and asserts that each of those signatures are in fact signed by a secret key that's associated with the provided public key.
I had in the past read this explainer by Ben Edgington and I believe its written masterfully, for cryptography beginners/newbies like me to get a really good introduction to BLS signatures. To summarise:
- Each one of Alice's messages is first hashed to a point on the BLS12-381 elliptic curve $$ m \rightarrow H(m) $$
- This point is added to itself
$sk$ times, where$sk$ is Alice's secret key, to construct the BLS signature $$ S = [sk]H(m) $$ Now seems like a good time to check how the message is hashed to a point on the curve.
I didn't really get a good picture of the Pedersen Hash on reading the supporting link provided along with the puzzle. So I decided to dig deeper. I first tried searching for vulnerabilities/security concerns while implementing the Pedersen Hash. And while skimming through the Zcash Sapling Specs I noticed an interesting point in section 5.4.1.7:
- These hash functions are not collision-resistant for variable length inputs
I immediately jumped to the arkworks-rs source code for the Pedersen Collision-Resistant Hash (CRH). Notice here how every message's bytes are padded with the 0 byte for any length less than the NUM_WINDOWS * WINDOW_SIZE
bytes.
I thought I've figured out the solution. Well, I still remembered having seen every message (in the puzzle's data) being of the same length (roughly the same length as a 32-bytes hex encoded string). So I looked for a message with the last 2 characters being 00
, meaning the last byte would be a 0
byte. For such a message, removing that byte would yield the same hash-to-curve point. But it was only when I opened the hash.rs file in the puzzle repository that my joy was cut short. Every message was being hashed to 256-bits before being supplied as input to the Pedersen Hash evaluation. Sigh... Kobi Gurkan wasn't gonna set that as the challenge, right?
I re-started studying the ark-crypto-primitives
source code, and put the equation down on paper.
-
Instantiating the Pedersen Hash Generators (source) At this point in time I had also checked that the Pedersen parameters were configured by setting
WINDOW_SIZE
to1
andNUM_WINDOWS
to256
. Makes sense, the product is256
which is also the number of bits in the input message (to be precise,BLAKE2S(raw_message)
). So in our case, this loop runs only once. Say thebase
points generated at random are termed$G1$ ,$G2$ ,$G3$ , ... and so on, we have: $$ Generators = [[G_{1}], [G_{2}], [G_{3}], ..., [G_{256}]] $$ -
Evaluating the Hash-to-curve (source) For a 256-bit input
$m$ with bits${b_{1}, b_{2}, ..., b_{256}}$ , we have: $$ H(m) = \sum_{i=1}^{i=256} G_{i} \hspace{1em} \forall b_{i} \in {1} $$ This means, for inputs$m_{1}, m_{2}, ..., m_{256}$ , we have: $$ H_{1} = [b_{1,1}]G_{1} + [b_{1,2}]G_{2} + ... + [b_{1,256}]G_{n} $$ $$ H_{2} = [b_{2,1}]G_{1} + [b_{2,2}]G_{2} + ... + [b_{2,256}]G_{n} $$ so on... and $$ H_{256} = [b_{256,1}]G_{1} + [b_{256,2}]G_{2} + ... + [b_{256,256}]G_{n} $$
One can notice the beautiful symmetry in the above equations. Since the secret key
First I tried solving the above equation in Rust using the nalgebra
crate. I got the code compiling but the results weren't correct. That's because I was using the f64
64-bit floating type that couldn't give me the precision needed to solve the equation.
Next, I moved on to using numpy
in Python, but well, the same is true there (64-bit floating type).
I had already spent a few hours trying to compile the Rust code, get it to work, only to realise the lack of precision, then moving on to Python, which I had not written in a long long time. I was just about to call it a day when I messaged Kobi that I think I know how to solve it, but I can't get the required precision to arrive at the solution.
Kobi replied: "Sage :)"
I had never used SageMath but the fact that university servers were mirrors for downloading it, meant that SageMath was used for serious stuff!
A little duck-duck-going helped me find this. Its pretty cool that one can save:
a=matrix([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
to matrix.sage
and use the load('./matrix.sage')
command in Sage.
At this point I only had to:
- Dump the 256 message inputs to a
matrix.sage
file in the$256x256$ matrix format. Since the dump format swapped rows and columns, note that we need to transpose the matrix before operating on it. - Dump the arbitrary input (I used
"1234"
) to avector.sage
file in the$256x1$ matrix format. - Load those in sage.
sage: load('matrix.sage')
sage: load('vector.sage')
sage: A = A.transpose()
sage: Ainv = A.inverse()
sage: X = Ainv * C
X.str()
at this point is in fact the appropriate linear combination
Sage
outputs the elements as rational fields, which also means that negative elements are printed with a -
sign. But we wish to bring them within the Fr
field, which is BLS12-381 curve's scalar field.
I wrote a tiny Python scipt to read every negative number and replace it with its modulus with the modulus of the Fr
field. I am not sure if there was an easier way to achieve this (may be in Sage or Rust):
import os
modulus = 52435875175126190479447740508185965837690552500527637822603658699938581184513
x = open("./x", "r")
lines = x.readlines()
numerators = []
for line in lines:
line = line.lstrip()
if line.startswith("-"):
numerators.append(int(line) % modulus)
else:
numerators.append(int(line))
x.close()
And I loaded these into a test function in Rust to calculate
#[test]
fn verify_sig_construction() {
let f = std::fs::File::open("./x").unwrap();
let reader = BufReader::new(f);
// these are the rational fractions represented as
// the scalar field of bls12-381
let mut xs = Vec::with_capacity(256);
for line in reader.lines() {
// clean up whitespaces and prepare to parse
let line = line.unwrap();
let line = line
.trim_start_matches("[")
.trim_end_matches("]")
.trim();
// read the `numerator/denominator` format
let num_den = line
.split("/")
.collect::<Vec<&str>>();
assert!(num_den.len() == 2);
// parse them to Fr and populate the multipliers
let n = Fr::from_str(num_den[0]).unwrap();
let d = Fr::from_str(num_den[1]).unwrap();
xs.push(n / d);
}
// get the puzzle data
let (pk, _msgs, sigs) = puzzle_data();
// initialise a forgery signature
let mut forgery = G1Projective::zero();
// compute summation of [x_i]S_i
for (sig, x) in sigs.iter().zip(xs.iter().cloned()) {
let mut si = sig.into_projective();
si.mul_assign(x);
forgery.add_assign(&si);
}
// The arbitrary message I had used was 1234
let arb_msg = hex::decode("1234").unwrap();
// validate that the forgery does in fact verify
verify(pk, &arb_msg, forgery.into_affine());
// print the hex-encoded signature
let mut hexsig = vec![0u8; 48];
G1Affine::serialize(
&forgery.into_affine(),
hexsig.as_mut_slice(),
).unwrap();
println!("forgery = {:?}", hexsig.encode_hex::<String>());
}
In the explainer by Ben Edgington, the first (and the simplest) approach to hash a message to the curve is "Hash-and-check", which could potentially be implemented as:
-
$x \leftarrow BLAKE2S(m) \mod q$ , where$m$ is the message and$q$ is the field modulus -
$x \leftarrow x + 1 \mod q$ , if$x$ does not satisfy the curve equation (there is no point on the curve with$x$ as the x-coordinate) - If
$x$ is the x-coordinate of a point on the curve, return$(x, y)$
So we rely on the collision resistance of the hash function, while the avalanche effect means a slight change in the input
The same is not true for a Pedersen Hash
. Although it is collision resistant for the same input size, past public data (messages and signatures) allow reconstruction of new valid signatures on arbitrary inputs.
Acknowledgements: Many thanks to Kobi Gurkan for the sage hint and insightful conversations