|This is an EXPERIMENTAL password hash with time and memory parameters,|
|such that the time parameter does not affect the memory required (but|
|does affect the number of memory accesses).|
|This was quickly designed, with no real test, so it's probably a silly|
|design and the code may be broken. Therefore, please:|
|- Do not use it to hash real passwords!|
|- Attack it! (circumvent the time/memory requirements, find biases...)|
|h Hash function with the hashlib common interface|
|(at least digest() and hexdigest() should be implemented)|
|pwd Password, a string|
|salt Salt, a string|
|ptime Time parameter, an integer > 0|
|pmem Memory parameter, an integer > 0|
|Back-of-the-enveloppe time and space requirements:|
|Time: ~ 1.5*ptime*pmem compression evaluations, for any of the functions|
|in hashlib; for other functions, it depends on the hash and block|
|lengths and on the padding rule|
|Memory: ~ hlen*pmem bytes, where hlen is the byte length of a digest|
|Examples with approximate timings measured on an Ivy Bridge:|
|phs( hashlib.sha256, 'password', 'salt', 2, 2**16 )|
|uses ~2MB, runs in ~250ms|
|phs( hashlib.sha256, 'password', 'salt', 1, 2**17 )|
|uses ~4MB, runs in ~250ms|
|(An optimized C implementation would probably be much faster.)|
|JP Aumasson / @veorq / firstname.lastname@example.org|
|def phs( h, pwd, salt, ptime, pmem ):|
|s = h( h( pwd + salt ).digest() + salt ).digest()|
|hlen = len(s)|
|for t in xrange(ptime):|
|for m in xrange(pmem):|
|s = s + h( s[-hlen:] ).digest()|
|s = h( s[::-1] ).digest()|
I like the simplicity.
If you were building a pipeline in hardware, where each stage in the pipeline computes one iteration of the outer loop, you'd only need to pass hlen bytes (length of state) to the next stage. It would probably be better to keep the big 'buf' across iterations of the outer loop, so that more info needs to be passed between pipeline stages. In other words, the memory doesn't need to be 'sustained' throughout the entire computation, it has a break after each iteration of the outer loop.
Reversing a string in hardware is just a matter of wiring, whereas in software it takes time. That may give hardware implementations an advantage.
Your memory-hardness isn't actually memory-hard. Witness that this:
gives the same result as phs, but takes only O(1) memory.
I strongly recommend that you read the scrypt paper; it's pretty easy to follow. Colin Percival is a very good writer. http://www.tarsnap.com/scrypt.html
Instead of byte inversion, you could store all of the hashes that make up the final S in a list, then go through that list backwards with .update(). Doing that solves the string concatenation problem at the same time. You could still do it in O(1) memory by computing just the hash you need every time, but would take O(pmem^2) time.
There is an advantageous time memory trade off where 1/2 the memory only takes 3/2 the time. Compute the hashes all the way through, and save the first, and middle through last hash. Then reverse the half you have saved, feed it in, and calculate the first half again (since you have the first hash), reverse that and feed it in. This involves 3/2 the time, but 1/2 the memory.
Generalizing the idea you can take 1/k+k/M of the memory at a cost of at most 2 time the time by storing every kth hash and computing only 1/k of the hashes at a time.