defuse/bcrypt-h.txt

## bcrypt-h.txt
Sketch of a security proof for BCRYPT(H(X)). This probably contains errors.

UPDATE: Only assume BCRYPT is collision resistant for X <= 72.

Define the BCRYPT-H(S, X) algorithm as follows:

UPDATE: Gah... the whole 'byte' thing isn't necessary at all. I originally
intended to pass *either* the actual X (with a zero byte prefix) or H(X) with
a 0x01 byte prefix, to bcrypt. I forgot to do that, and instead always passed
the hash with the byte prefix based on the length. The proof is still valid,
however, and I don't feel like rewriting it.

BCRYPT-H(S, X):

    If length(X) > 72
        byte = 0x01
    Else
        byte = 0x00
    End If

    hash = H(X)
    return BCRYPT(S, byte || hash)

We begin with two assumptions:

A1. BCRYPT(S, X) is collision-resistant when length(X) <= 72. That is, there
does not exist an efficient adversary who can find a pair of inputs (S, X), (S',
X') such that BCRYPT(S, X) == BCRYPT(S', X') and length(X) <= 72 and length(X')
<= 72.

A2. H(X) is a collision-resistant hash function whose output length is strictly
less than 72 bytes (at most 71). There does not exist an efficient adversary who
can find a pair of inputs X, X' such that H(X) == H(X').

We want to prove that BCRYPT-H(S, X) is collision resistant. In other words, we
want to prove that there is no efficient way to find two inputs (S, X,) and (S',
X') such that BCRYPT-H(S, X) == BCRYPT-H(S', X').

We can prove this by contradiction. Suppose there were an efficient way to find
BCRYPT-H collisions.

Suppose we have a pair of inputs (S, X) and (S', X') such that BCRYPT-H(S, X) ==
BCRYPT-H(S', X'). We know show that this implies a collision on BCRYPT(S, X)
*or* a collision on H(X).

Case 1: S != S'

    In this case, (S, byte1 || H(X)) and (S', byte2 || H(X')) is a BCRYPT
    collision, since they both BCRYPT to the same value, and S and S' are
    different, and length(byte || H(X)) <= 72 and length(byte || H(X')) <= 71.

Case 2: S == S'

    This case is more complicated, since the collision could either be due to
    a collision in BCRYPT, or a collision in H. There are two possibilities:

    Case 2.1. H(X) == H(X')

        If X != X', then this is the definition of a collision on H. If X == X',
        then (S, X) == (S', X'), which contradicts the assumption (it's not
        a BCRYPT-H collision).

    Case 2.2. H(X) != H(X')

        This implies a BCRYPT collision, since if H(X) != H(X'), then

            byte1 || H(X) != byte2 || H(X'),

        where byte1 is 0x01 if length(X) > 72 or 0x00 otherwise and byte2 is
        0x01 if length(X') > 72 or 0x00 otherwise. This is true because
        prefixing something to two different strings cannot make them the same.

        So (S, byte1 || H(X)) != (S, byte2 || H(X')), but BCRYPT(S, byte1 ||
        H(X)) == BCRYPT(S, byte2 || H(X')), while length(byte1 || H(X)) <= 72
        and length(byte2 || H(X') <= 72). This is a BCRYPT collision.

So, a BCRYPT-H collision implies *either* a H collision or BCRYPT collision. If
there exists an adversary that can efficiently find BCRYPT-H collisions, then
either there exists an adversary that can efficiently find BCRYPT collisions, or
there exists an adversary that can efficiently find H collisions (to prove this
rigorously you could prove that at least 50% of the BCRYPT-H collisions have to
be at least one of the kinds, then run the adversary 128 times and it finds
a collision on either BCRYPT or H (whichever one it is) with probability
1 - 1/2^128).

We have shown that if there exists an efficient collision-finding adversary for
BCRYPT-H, then either A1 or A2 is false. This is a contradiction, so if A1 and
A2 are true, then BCRYPT-H is collision-resistant.
	Sketch of a security proof for BCRYPT(H(X)). This probably contains errors.

	UPDATE: Only assume BCRYPT is collision resistant for X <= 72.

	Define the BCRYPT-H(S, X) algorithm as follows:

	UPDATE: Gah... the whole 'byte' thing isn't necessary at all. I originally
	intended to pass either the actual X (with a zero byte prefix) or H(X) with
	a 0x01 byte prefix, to bcrypt. I forgot to do that, and instead always passed
	the hash with the byte prefix based on the length. The proof is still valid,
	however, and I don't feel like rewriting it.

	BCRYPT-H(S, X):

	If length(X) > 72
	byte = 0x01
	Else
	byte = 0x00
	End If

	hash = H(X)
	return BCRYPT(S, byte \|\| hash)

	We begin with two assumptions:

	A1. BCRYPT(S, X) is collision-resistant when length(X) <= 72. That is, there
	does not exist an efficient adversary who can find a pair of inputs (S, X), (S',
	X') such that BCRYPT(S, X) == BCRYPT(S', X') and length(X) <= 72 and length(X')
	<= 72.

	A2. H(X) is a collision-resistant hash function whose output length is strictly
	less than 72 bytes (at most 71). There does not exist an efficient adversary who
	can find a pair of inputs X, X' such that H(X) == H(X').

	We want to prove that BCRYPT-H(S, X) is collision resistant. In other words, we
	want to prove that there is no efficient way to find two inputs (S, X,) and (S',
	X') such that BCRYPT-H(S, X) == BCRYPT-H(S', X').

	We can prove this by contradiction. Suppose there were an efficient way to find
	BCRYPT-H collisions.

	Suppose we have a pair of inputs (S, X) and (S', X') such that BCRYPT-H(S, X) ==
	BCRYPT-H(S', X'). We know show that this implies a collision on BCRYPT(S, X)
	or a collision on H(X).

	Case 1: S != S'

	In this case, (S, byte1 \|\| H(X)) and (S', byte2 \|\| H(X')) is a BCRYPT
	collision, since they both BCRYPT to the same value, and S and S' are
	different, and length(byte \|\| H(X)) <= 72 and length(byte \|\| H(X')) <= 71.

	Case 2: S == S'

	This case is more complicated, since the collision could either be due to
	a collision in BCRYPT, or a collision in H. There are two possibilities:

	Case 2.1. H(X) == H(X')

	If X != X', then this is the definition of a collision on H. If X == X',
	then (S, X) == (S', X'), which contradicts the assumption (it's not
	a BCRYPT-H collision).

	Case 2.2. H(X) != H(X')

	This implies a BCRYPT collision, since if H(X) != H(X'), then

	byte1 \|\| H(X) != byte2 \|\| H(X'),

	where byte1 is 0x01 if length(X) > 72 or 0x00 otherwise and byte2 is
	0x01 if length(X') > 72 or 0x00 otherwise. This is true because
	prefixing something to two different strings cannot make them the same.

	So (S, byte1 \|\| H(X)) != (S, byte2 \|\| H(X')), but BCRYPT(S, byte1 \|\|
	H(X)) == BCRYPT(S, byte2 \|\| H(X')), while length(byte1 \|\| H(X)) <= 72
	and length(byte2 \|\| H(X') <= 72). This is a BCRYPT collision.

	So, a BCRYPT-H collision implies either a H collision or BCRYPT collision. If
	there exists an adversary that can efficiently find BCRYPT-H collisions, then
	either there exists an adversary that can efficiently find BCRYPT collisions, or
	there exists an adversary that can efficiently find H collisions (to prove this
	rigorously you could prove that at least 50% of the BCRYPT-H collisions have to
	be at least one of the kinds, then run the adversary 128 times and it finds
	a collision on either BCRYPT or H (whichever one it is) with probability
	1 - 1/2^128).

	We have shown that if there exists an efficient collision-finding adversary for
	BCRYPT-H, then either A1 or A2 is false. This is a contradiction, so if A1 and
	A2 are true, then BCRYPT-H is collision-resistant.