@bwindsor
Created August 6, 2019 10:19
Good story of password hashing methods from stack exchange

This was copied and pasted from here.

Many years ago, it was standard practice to store passwords in plaintext in databases. This was a bad idea because, when the database was compromised, the attacker got all the passwords. To combat this, we started hashing passwords in the database using one-way cryptographic hash algorithms. MD5 became popular, but weaknesses discovered in it (collisions, partial preimage attacks, etc.) mean it's no longer recommended. A lot of people moved on to SHA1, which was considered more secure at the time.
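For concreteness, here is a minimal sketch of that early unsalted approach using Python's hashlib (the function names are made up, and SHA-1 is shown only to mirror the text):

```python
import hashlib

# The old, now-discouraged approach: store a bare one-way hash of the password.
def hash_password_unsalted(password: str) -> str:
    # SHA-1 shown for illustration; MD5 was used in exactly the same way.
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def verify_unsalted(password: str, stored_hash: str) -> bool:
    # Verification just recomputes the hash and compares.
    return hash_password_unsalted(password) == stored_hash

stored = hash_password_unsalted("hunter2")
print(verify_unsalted("hunter2", stored))  # True - the same input always gives the same digest
```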

The problem with this approach is that it's possible to construct a huge table of hashes and their corresponding plaintexts. These are called rainbow tables. They work on the concept that it is more efficient to compute a huge list of hashes for all possible passwords (within a certain set) and then store it, so it can be quickly queried later on. So, instead of brute-forcing individual hashes, it became possible to just query the database for a hash and have its plaintext returned immediately.
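A toy illustration of the precomputation idea follows (a plain lookup table rather than a true rainbow table, which additionally uses hash chains and reduction functions to trade time for space; the candidate list is made up):

```python
import hashlib

# Precompute digests for a list of candidate passwords once...
candidates = ["password", "123456", "letmein", "hunter2", "qwerty"]
lookup = {hashlib.sha1(p.encode()).hexdigest(): p for p in candidates}

# ...then every stolen unsalted hash becomes a single dictionary lookup.
stolen_hash = hashlib.sha1(b"letmein").hexdigest()
print(lookup.get(stolen_hash))  # -> "letmein"
```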

In order to fix this, security nerds invented salts. Salts are large unique random values appended to passwords before they are hashed. The salt is stored alongside the hash so that the hash can be recomputed later during verification. So, we compute H(m+s) = h, then store h and s in the database. This provides significant protection against rainbow tables because it essentially requires a separate rainbow table to be generated for each salt.
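A sketch of the salted scheme H(m+s) = h, still using a fast hash purely to mirror the formula (a real system would use the slow functions discussed below):

```python
import hashlib
import os

def hash_with_salt(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)                                       # large unique random value
    digest = hashlib.sha256(password.encode() + salt).digest()  # H(m + s) = h
    return digest, salt                                         # store both h and s

digest, salt = hash_with_salt("hunter2")
# Verification recomputes H(candidate + s) with the stored salt and compares digests.
```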

So, the bad guys switched back to dictionary attacks and brute-force cracking. With the advent of GPU computing, it became possible to compute billions of hashes per second on a moderately powerful graphics card. In fact, people have built computers that can compute nearly 50 billion MD5 hashes per second - pretty impressive / scary. The reason GPUs are capable of this is that they're designed to do huge numbers of parallel scalar operations. Scalar operations are math and logic operations that do not involve branches - i.e. they don't need to do much/any "if x then do y". Cryptographic hash algorithms tend to fit into this model.
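To get a rough feel for raw hash rates, you can time a fast hash in a tight loop; a minimal sketch (this measures one Python-interpreted core, which is many orders of magnitude slower than the GPU figures quoted above):

```python
import hashlib
import time

# Rough single-core hash-rate measurement for a fast hash like MD5.
n = 1_000_000
start = time.perf_counter()
for i in range(n):
    hashlib.md5(str(i).encode()).digest()
elapsed = time.perf_counter() - start
print(f"{n / elapsed:,.0f} MD5 hashes per second on one interpreted core")
```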

In order to make this difficult, we have to make the hash operation slow enough to make brute forcing infeasible. Normal hash algorithms (e.g. SHA1) are designed to be fast, which makes them unsuitable for this purpose. HMAC adds very little overhead and no extra security margin, so it too isn't much use here.

Creating a slow cryptographic hash algorithm is easier said than done - it's very hard to come up with one that is slow, irreducible (i.e. cannot be optimised beyond its current state) and secure. Three popular functions achieve this: PBKDF2, bcrypt, and scrypt. These are known as adaptive key derivation functions, because they accept a work factor value along with the plaintext and salt. The work factor alters the amount of time it takes to compute the hash, and is designed to safeguard against future hardware improvements.
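A minimal sketch using PBKDF2 via Python's hashlib.pbkdf2_hmac, where the iteration count plays the role of the work factor (the specific count is illustrative, not a recommendation):

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)
work_factor = 600_000  # iterations; tune so one hash takes tens of milliseconds on your hardware

h = hashlib.pbkdf2_hmac("sha256", password, salt, work_factor)
print(h.hex())
```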

So, for an adaptive key derivation algorithm H, we compute H(m,s,w) = h, where m is the message (password), s is the salt, and w is the work factor. The resulting h usually contains s and w, so that a verification function can later compute the same hash using the same parameters. The work factor generally controls the number of iterations of an internal cryptographic primitive. The goal is to make the computation take long enough to make cracking infeasible, but not so long that it exceeds the resources we have available.
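A sketch of the verification side when s and w are stored alongside h (the record format here is made up for illustration; real libraries define their own encodings):

```python
import hashlib
import hmac
import os

def hash_password(password: str, work_factor: int = 600_000) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, work_factor)
    # Keep the parameters with the hash so verification can reuse them.
    return f"pbkdf2_sha256${work_factor}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    _, work_factor, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                 bytes.fromhex(salt_hex), int(work_factor))
    return hmac.compare_digest(digest.hex(), digest_hex)  # constant-time comparison

record = hash_password("hunter2")
print(verify_password("hunter2", record))  # True
print(verify_password("wrong", record))    # False
```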

In order to provide more security against dedicated hardware-based cracking, scrypt ensures that the hash computation is both CPU-hard and memory-hard, i.e. it requires significant CPU and memory resources in order to produce the hash value. This is important because dedicated cracking hardware such as FPGAs usually has access to very little fast memory.
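A minimal scrypt sketch via hashlib.scrypt (the cost parameters are illustrative and should be tuned to the hardware available):

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)

# n is the CPU/memory cost, r the block size, p the parallelism factor.
# With n=2**14 and r=8, each hash needs roughly 128 * n * r bytes, about 16 MiB.
h = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)
print(h.hex())
```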
