niooss-ledger/writeup.md

## writeup.md

      
    Raw
  

              writeup.md
            
          
    Write-up for zkHack puzzle #1: zkhack-bls-pedersen


Author: Nicolas IOOSS
Date: 2021-10-27
Puzzle: https://www.zkhack.dev/puzzle1.html
Repository with commit hash: https://github.com/niooss-ledger/zkhack2021-bls-pedersen/tree/288a1d5945d751603510b75fecf522f109670b77

Puzzle

The puzzle consists in a Rust project:
$ git clone https://github.com/kobigurk/zkhack-bls-pedersen
$ cd zkhack-bls-pedersen
$ cargo run --release
    Updating crates.io index
...
   Compiling bls-pedersen v0.1.0 (/zkhack-bls-pedersen)
    Finished release [optimized] target(s) in 55.75s
     Running `target/release/verify-bls-pedersen`

    ______ _   __  _   _            _
    |___  /| | / / | | | |          | |
       / / | |/ /  | |_| | __ _  ___| | __
      / /  |    \  |  _  |/ _` |/ __| |/ /
    ./ /___| |\  \ | | | | (_| | (__|   <
    \_____/\_| \_/ \_| |_/\__,_|\___|_|\_\

Alice designed an authentication system in which users gain access by presenting
it a signature on a username, which Alice provided.
One day, Alice discovered 256 of these signatures were leaked publicly, but the
secret key wasn't. Phew.
The next day, she found out someone accessed her system with a username she
doesn't know! This shouldn't be possible due to existential unforgeability, as
she never signed such a message.

Can you find out how it happend and produce a signature on your username?
This project contains 256 messages and their signatures in src/data.rs:
    let ms_strs = [
        "f2faa8b1bb0f06c6142e788ad836d1f7d1abf95458a08a55593c594056ac225d",
        "518385d7e176c291bab8f797025de4b26a2b86c8e10f57d2e21b5a742ecb6029",
        /* ... */
        "f7ec1334115b5fe74475f662d3d0190b4526b2bb5dcfb3e6f235f1e90f61d85a",
    ];
    let sigs_strs = [
        "487df6a1d9149e0c6f205c93deebbedb0605f73f1b0b66f0be2aca4f5b5f1c5c9a8aa226aaceb0b0f6c25a7ea9548106",
        "8d93bb04c4513d3115ed9d07d682b304752dabccde5ae1ab84a8c95ad7d0a796c2a5d60acc6d1aaab799800928d04a15",
        /* ... */
        "067ffcb122c43181eb4c525d2a7b56714262aae808ae24b62aa5ec6e1035a9f6ce6473f19dc470957afa98b437c68814",
    ];
When running the project, it runs a loop in src/bin/verify-bls-pedersen.rs which verifies these signatures using a public key pk and a function verify:
    let (pk, ms, sigs) = puzzle_data();
    for (m, sig) in ms.iter().zip(sigs.iter()) {
        verify(pk, m, *sig);
    }
This function implements a BLS12-381 signature verification in src/bls.rs using crate ark-bls12-381.
The objective of this puzzle is to forge a signature from the 256 known signatures.
How is this even possible?
Crash course on BLS12-381 signature scheme

BLS12-381 is a signature scheme which uses elliptic curve mathematics with three groups, named $G_1$ and $G_2$ and $G_T$.
This means for example that given a number $x$ and a point $P \in G_1$, it is possible to perform a scalar multiplication $x.P$ and the result is a point in the same group ($G_1$ in this example).
These groups have generators which are defined in the specification: $g_1, g_2, g_T$.
BLS12-381 relies on two functions:

A hashing to point function $H$ which maps messages $m$ to points $H(m)$ in $G_1$.
A bilinear pairing function $e$ which maps 2 points $P_1 \in G_1$ and $P_2 \in G_2$ to a point $e(P_1, P_2) \in G_T$.

The $e$ has a specific property of being bilinear: given 2 number $x, y$ and 2 points $P_1 \in G_1$ and $P_2 \in G_2$:
$$e(x.P_1, y.P_2) = (x \times y).e(P_1, P_2)$$
This is used to define a signature scheme:

A BLS12-381 private key is a number $priv$

The matching public key is a point on $G_2$: $Pub = priv.g_2$

To sign a message $m$, a point on $G_1$ is computed using the hashing to point function, $H(m)$

The signature of $m$ is the result of a scalar product $sig = priv.H(m)$. This is a point in $G_1$.
Verifying a signature consists in checking that $e(sig, g_2) = e(H(m), Pub)$


This works because:
$$e(sig, g_2) = e(priv.H(m), g_2) = priv.e(H(m), g_2) = e(H(m), priv.g_2) = e(H(m), Pub)$$
In the puzzle, the verification function is:
use ark_bls12_381::{Bls12_381, G1Affine, G2Affine};
use ark_ec::{AffineCurve, PairingEngine};
use std::ops::Neg;

use crate::hash::hash_to_curve;
use ark_ff::One;

pub fn verify(pk: G2Affine, msg: &[u8], sig: G1Affine) {
    let (_, h) = hash_to_curve(msg);
    assert!(Bls12_381::product_of_pairings(&[
        (
            sig.into(),
            G2Affine::prime_subgroup_generator().neg().into()
        ),
        (h.into(), pk.into()),
    ])
    .is_one());
}
This computes $e(sig, -g_2) + e(H(m), Pub))$ and checks that the result is the one of $G_T$, which is a valid way to verify a BLS12-381 signature.
The $e$ which is used in the challenge is the usual one, but not the hashing to point function.
There is possibly an issue in this specific function.
For more details, BLS12-381 is documented in several places:

https://en.wikipedia.org/wiki/BLS_digital_signature
https://eips.ethereum.org/EIPS/eip-2333
https://datatracker.ietf.org/doc/draft-irtf-cfrg-bls-signature/
https://docs.rs/ark-bls12-381/0.3.0/ark_bls12_381/
https://hackmd.io/@benjaminion/bls12-381

Puzzle hashing to point function

The puzzle uses a function defined in src/hash.rs to hash a message into a point.
The algorithm which is implemented is the following:

A deterministic pseudo-random number generator (PRNG) is initialized (using rand_chacha::ChaCha20Rng)
This PRNG is used to generate 256 random points $(Gen_0, Gen_1..., Gen_{255})$ in $G_1$ called generators, in crate ark_crypto_primitives.
The message $m$ is hashed using BLAKE2s algorithm to produce a 256-bit number $b2(m)$.
Each bit $i$ is named $b2(m)_i$.
This number is evaluated on the 256 random points linearly: $$H(m) = b2(m)0.Gen_0 + b2(m)1.Gen_1 + ... + b2(m){255}.Gen{255}$$.

Normally the hash bits are used with windows.
But in the puzzle, src/hash.rs defines the windows as:
impl Window for ZkHackPedersenWindow {
    const WINDOW_SIZE: usize = 1;
    const NUM_WINDOWS: usize = 256;
}
These parameters lead to the described construction, where each bit of $b2(m)$ is considered separately.
However these parameters break the security of the signature scheme when some signatures are known!
Indeed, if an attacker knows the signature $sig_0$ of a message $m_0$ which has a BLAKE2s hash $b2(m_0) = (1, 0, 0..., 0)$, the signature $sig_1$ of a message $m_1$ with $b2(m_1) = (0, 1, 0..., 0)$, it has:
$$sig_0 = priv.H(m_0) = priv.\left(1.Gen_0 + 0.Gen_1 + ... + 0.Gen_{255}\right) = priv.Gen_0$$
$$sig_1 = priv.H(m_1) = priv.\left(0.Gen_0 + 1.Gen_1 + ... + 0.Gen_{255}\right) = priv.Gen_1$$
$$...$$
$$\forall i \in {0, 1..., 255}, sig_i = priv.Gen_i$$
To forge the signature of an arbitrary message $m$, this attacker needs to compute:
$$sign(m) = priv.H(m)$$
$$sign(m) = priv.\left(b2(m)0.Gen_0 + b2(m)1.Gen_1 + ... + b2(m){255}.Gen{255}\right)$$
$$sign(m) = b2(m)0.(priv.Gen_0) + b2(m)1.(priv.Gen_1) + ... + b2(m){255}.(priv.Gen{255})$$
$$sign(m) = b2(m)0.sig_0 + b2(m)1.sig_1 + ... + b2(m){255}.sig{255}$$
So with a linear combination of the known signatures, the attacker can sign arbitrary messages.
However the puzzle does not provide the $m_0$ which are needed in this attack.
Nevertheless there are linear relationship everywhere, so it is possible to recover the needed $sig_{0...255}$ from the 256 provided signatures.
The matrix attack

The puzzle provides 256 messages $msg_i$ with their signatures $sign(msg_i)$.
When considering the bits of the BLAKE2s hash of each messages, we can craft a matrix where each hash is a row:
$$\forall i, j \in {0, 1..., 255}, M_{i,j} = b2(msg_i)_j$$
Doing so, the hashing to point and the signature functions look like a matrix multiplied by a vector, even though it consists in scalar products (contrary to products between integers):
$$\forall i \in {0, 1..., 255}, H(msg_i) = \sum_{j=0}^{255} b2(msg_i)j.Gen_j = \sum{j=0}^{255} M_{i,j}.Gen_j$$
$$\forall i \in {0, 1..., 255}, sign(msg_i) = priv.H(msg_i) = \sum_{j=0}^{255} M_{i,j}.(priv.Gen_j)$$
To forge the signature of an arbitrary message $m$, we need to compute:
$$sign(m) = priv.H(m) = \sum_{j=0}^{255} b2(m)_j.(priv.Gen_j)$$
As everything is linear, we can look for coefficients $(a_0, a_1..., a_{255})$ to combine the known signatures.
These coefficients would verify the following equation:
$$sign(m) = \sum_{i=0}^{255} a_i.sign(msg_i)$$
Therefore:
$$\sum_{j=0}^{255} b2(m)j.(priv.Gen_j) = \sum{i=0}^{255}\sum_{j=0}^{255} a_i.(M_{i,j}.(priv.Gen_j))$$
$$\sum_{j=0}^{255} b2(m)j.(priv.Gen_j) = \sum{j=0}^{255} \left(\sum_{i=0}^{255} a_i \times M_{i,j}\right).(priv.Gen_j)$$
A possible way to find such $(a_i)$ consists in having:
$$\forall j \in {0, 1..., 255}, b2(m)j = \sum{i=0}^{255} a_i \times M_{i,j}$$
This is a matrix multiplication $B = A \times M$ which can be solved by inverting $M$.
With $N$ the inverse of $M$:
$$(b2(m)_j) \times N = (a_i) \times M \times N = (a_i)$$
So we can define the $(a_i)$ with:
$$\forall i \in {0, 1..., 255}, a_i = \sum_{j=0}^{255} b2(m)j N{j, i}$$
Solution

The attack consists in these steps.

Choose an arbitrary message $m$.
Compute a matrix $M$ using the 256 messages provided by the puzzle.
Compute the inverse $N = M^{-1}$.
Compute BLAKE2s digest of the attack message $b2(m)$.
Compute the vector $a_i = \sum_{j=0}^{255} b2(m)j N{j, i}$.
Compute the forged signature $sign(m) = \sum_{i=0}^{255} a_i.sign(msg_i)$ using the 256 signatures provided by the puzzle.

In these explanations, computing inverse $N = M^{-1}$ seemed simple but these is one caveat: when considering the matrix as a matrix of integers, it is unlikely to be inversible.
Nevertheless the operations can be done modulo $r = 52435875175126190479447740508185965837690552500527637822603658699938581184513$, which is the order of the groups used by BLS12-381.
The matrix inversion modulo $r$ can be implemented using SageMath (apt-get install sagemath on Debian/Ubuntu):
import sage.all

BLS12_381_order = 52435875175126190479447740508185965837690552500527637822603658699938581184513
GF_SCALAR = sage.all.Zmod(BLS12_381_order)

basis = [encode_as_bit_vector(b2_m) for b2_m in hashed_messages]
sage_basis = sage.all.matrix(GF_SCALAR, basis)
sage_inv = sage_basis.inverse()
As SageMath does not provide Rust bindings, and the libraries used by the puzzle does not seem to implement scalar multiplication, the implementation worked around this by splitting the Python code in xor_linear_space.py and the Rust code in src/bin/verify-bls-pedersen.rs.
In the end, the signature for message "IooNag" (my usual username) is "1775cdf800a8f9788f2c0c3f46ec2b70fda71d5285284ab0a238b87c7a8e9fd7e6622b1e391edae16fdea6ad5e619c8a":
let m = b"IooNag";
let sig_hex = "1775cdf800a8f9788f2c0c3f46ec2b70fda71d5285284ab0a238b87c7a8e9fd7e
    6622b1e391edae16fdea6ad5e619c8a";
let sig = G1Affine::deserialize(&mut Cursor::new(hex::decode(sig_hex).unwrap())).unwrap();
verify(pk, m, sig);