Brief summary of what I know about hash functions


Today, most people encounter cryptography mainly through the security of their passwords. A password is worthless if others have the resources to discover it.

Today most websites don't simply apply cryptographic hash functions like SHA256 or MD5 directly to the password. Instead, random bits (a salt) are added before hashing, so that even when two users choose the same password the generated hashes differ from each other.

Hashes like SHA1, SHA256 and MD5 are general-purpose hashes. They are designed to hash a large amount of data as quickly as possible.

A hash function used for securely storing passwords should have the following characteristics:

  1. Preimage resistance: Given h, it should be hard to find any value x with h = H(x).
  2. Second preimage resistance: Given x1, it should be hard to find x2 != x1 with H(x1) = H(x2).
  3. Collision resistance: It should be hard to find two values x1 != x2 with H(x1) = H(x2).

Another desirable characteristic of a hash algorithm used for passwords is slow execution speed: computing H(x) for a string x should require some minimum amount of resources.

There is a fundamental difference between collision attack and preimage attack:

  • Collision Attack: the attacker computes two messages M and M', distinct from each other, such that M and M' hash to the same value.
  • Preimage Attack: the attacker is given a goal (a hash value h) and finds a message M which hashes to h.

A secure hash is intended to prevent collisions, even when an effort is being made to cause collisions. A slow hashing function would make a dictionary attack difficult because of the amount of time it takes.
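The difference between the two attacks can be seen on a toy scale. The sketch below is only an illustration: it assumes a deliberately weak "hash" made by keeping the first 16 bits of SHA-256, and the names `tiny_hash`, `find_collision` and `find_preimage` are made up for this example. On such a small output a collision typically turns up after a few hundred hashes, while a preimage typically needs tens of thousands.

```python
import hashlib
from itertools import count


def tiny_hash(message: bytes) -> int:
    """Toy hash: the first 16 bits of SHA-256, as an integer."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest[:2], "big")


def find_collision():
    """Hash distinct messages until two of them share a hash value."""
    seen = {}
    for i in count():
        message = b"msg-%d" % i
        h = tiny_hash(message)
        if h in seen:
            return seen[h], message, i + 1
        seen[h] = message


def find_preimage(target: int):
    """Hash distinct messages until one of them hits the given target."""
    for i in count():
        message = b"guess-%d" % i
        if tiny_hash(message) == target:
            return message, i + 1


m1, m2, collision_tries = find_collision()
target = tiny_hash(b"some secret input")
preimage, preimage_tries = find_preimage(target)

print("collision after", collision_tries, "hashes:", m1, m2)
print("preimage after", preimage_tries, "hashes:", preimage)
```

In the collision search the attacker is free to pick both messages, which is why it succeeds so much sooner than the search for a message matching one fixed target.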

##Avalanche Effect

Taking into account the above characteristics of a good hashing algorithm, one would expect that if the input is changed even slightly (for example, by flipping a single bit), the output changes significantly. This is called the 'avalanche effect'. If a hash function does not exhibit the avalanche effect to a significant degree (about 50% of the output bits should change on average), then it has poor randomization, and given enough time a cryptanalyst may be able to make predictions about the input given only the output. The avalanche effect is one of the primary design objectives. It is also the reason why hash functions have large data blocks.
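A quick way to observe this is to flip one input bit and count how many output bits change. The following is a minimal sketch using SHA-256 from Python's standard library; the input string and the helper name `bit_difference` are arbitrary choices for this example.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = bytearray(b"hello world")
flipped = bytearray(original)
flipped[0] ^= 0x01  # flip the lowest bit of the first byte

d1 = hashlib.sha256(bytes(original)).digest()
d2 = hashlib.sha256(bytes(flipped)).digest()
changed = bit_difference(d1, d2)

print(f"{changed} of {len(d1) * 8} output bits changed")
```

For a hash with a good avalanche effect the count should land somewhere near half of the 256 output bits.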

The avalanche effect is helpful when one needs to use only a specific portion of the hash.

##Attacks On Hash Functions

For an appropriately good hashing function, removing some output bits demonstrably does not weaken it beyond the limit imposed by the number of remaining bits, with regard to collision resistance, preimage resistance and other useful properties. Collisions become likely once the number of hashed entries reaches about 2 to the power of half the number of bits in the hash, but they can occur much earlier. So if you take a 256-bit hash with the avalanche effect, like SHA-256, and keep only 8 characters of its hex digest (32 bits), you'll have a 50% chance of a collision with about 77,000 entries (roughly 2^16), a 1% chance once you have about 9,300 entries, a 0.1% chance with about 2,900 entries, and so on, even though the truncated hash has about 4 billion (2^32) possible values.
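These figures follow from the usual birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the number of possible hash values and p is the desired collision probability. A small sketch, assuming the hash behaves like a random function (the function name `entries_for_collision` is made up for this example):

```python
import math


def entries_for_collision(bits: int, probability: float) -> float:
    """Approximate number of random values needed to see a collision
    with the given probability in a space of 2**bits values."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - probability)))


for p in (0.5, 0.01, 0.001):
    n = entries_for_collision(32, p)
    print(f"{p:>6.1%} chance: about {n:,.0f} entries")
```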

Looking at other properties of hash functions, there is no proof that every output of these hash functions is reachable for some input, but it is expected to be true. About 2^134.5 (for MD5) or 2^166.8 (for SHA-1) distinct messages are expected to be required to reach all output values, on the assumption that these hashes behave as random functions. If you found that a hash function didn't generate some outputs, then this would be a flaw. It is at least a distinguisher, and most likely is indicative of some larger flaw, but how large the flaw is depends on many, many things.
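The two exponents match the coupon-collector estimate: to see all N = 2^n outputs of a random function, roughly N · ln(N) inputs are expected. The sketch below reproduces them under that random-function assumption; the function name is hypothetical, and the calculation is done in log space to avoid huge numbers.

```python
import math


def log2_messages_to_cover(output_bits: int) -> float:
    """Return log2(N * ln(N)) for N = 2**output_bits."""
    # ln(N) = output_bits * ln(2), so log2(N * ln(N)) splits into two terms.
    return output_bits + math.log2(output_bits * math.log(2))


for name, bits in (("MD5", 128), ("SHA-1", 160)):
    print(f"{name}: about 2^{log2_messages_to_cover(bits):.1f} messages")
```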

These properties of hash functions guide how passwords should be stored. One should keep in mind that the space the attackers search is not the output space of the hashing algorithm but only the password space, which is far smaller. A commonly recommended algorithm for storing passwords is 'bcrypt', which is deliberately slow.

##Bcrypt - The slow and secure hash function

Bcrypt uses a variant of the Blowfish encryption algorithm's key schedule, and introduces a work factor, which allows you to determine how expensive the hash function will be. Because of this, bcrypt can keep up with Moore's law: as computers get faster you can increase the work factor and the hash will get slower.
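The idea of a tunable work factor can be illustrated without a bcrypt library. The sketch below is not bcrypt: it swaps in PBKDF2-HMAC-SHA256 from Python's standard library and, as an assumption of this example, maps a work factor to 2^work_factor iterations so that each increment roughly doubles the cost. The name `slow_hash` and the chosen values are illustrative only.

```python
import hashlib
import os
import time


def slow_hash(password: bytes, salt: bytes, work_factor: int) -> bytes:
    """Derive a digest using 2**work_factor PBKDF2 iterations (stand-in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, 2 ** work_factor)


salt = os.urandom(16)
for work_factor in (10, 12, 14):
    start = time.perf_counter()
    slow_hash(b"correct horse", salt, work_factor)
    elapsed = time.perf_counter() - start
    print(f"work factor {work_factor}: {elapsed * 1000:.1f} ms")
```

The printed timings depend on the machine, but they should grow by roughly a factor of four between the steps shown, which is the knob that lets the defender keep pace with faster hardware.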

##Salting

In usual scenarios a salted hash should be used. A salt is stored with the password hash in the database and is hashed together with the password to produce the hash. One can also use a secret salt (often called a pepper) to be more secure. Secret salts are not stored in the same database but at a location not accessible to the potential attacker.
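A minimal sketch of storing and checking a salted hash follows. It assumes PBKDF2-HMAC-SHA256 from Python's standard library as the hash, and the function names and the iteration count are illustrative choices for this example, not recommendations.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value only


def hash_password(password: str):
    """Return (salt, digest); both would be stored in the user's database row."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


alice_salt, alice_digest = hash_password("hunter2")
bob_salt, bob_digest = hash_password("hunter2")

print(alice_digest != bob_digest)  # same password, different stored hashes
print(verify_password("hunter2", alice_salt, alice_digest))
print(verify_password("wrong", alice_salt, alice_digest))
```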

##Hardware Level Cryptography

Another possibility would be to use some kind of cryptographic hardware (a token) attached to the server with an embedded key, which performs a hashing operation on the salt, key and password to produce the hash. It should not have an interface to retrieve the secret key; otherwise an attacker who gains execute access on the server can retrieve it, too.
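The interface of such a token can be sketched in software. The class below is only a stand-in: `TokenStub` is a hypothetical name, the key lives in ordinary process memory (which a real token is meant to avoid), and HMAC-SHA256 is assumed as the keyed hashing operation.

```python
import hashlib
import hmac
import os


class TokenStub:
    """Software stand-in for a hardware token with an embedded key."""

    def __init__(self):
        # In real hardware the key is generated on the device and never leaves it.
        self.__key = os.urandom(32)

    def keyed_hash(self, salt: bytes, password: bytes) -> bytes:
        """Return HMAC-SHA256 over salt and password under the embedded key."""
        return hmac.new(self.__key, salt + password, hashlib.sha256).digest()


token = TokenStub()
salt = os.urandom(16)
stored = token.keyed_hash(salt, b"hunter2")

# Verification asks the token to recompute the hash; the key is never exported.
print(hmac.compare_digest(stored, token.keyed_hash(salt, b"hunter2")))
```

The point of the design is that the only exposed operation is `keyed_hash`: an attacker who copies the database cannot test password guesses offline without also having the token.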
