A lot of water has flowed under the bridge since then, when the theme of rainbow tables was at its peak, but, nevertheless, the idea is entertaining. It is about this kind of intricate tables that I want to tell you.
Let's figure out what, how and why. The task is: A certain hash is sent to the input of the program, at the output we must get the password from which this hash was taken.
To simplify the example, we will assume that passwords are only six characters long and can consist exclusively of lowercase Latin characters.
The most obvious hack is brute force. Its essence is to take a hash from every possible permutation of characters in the password until it matches the desired hash. Based on an experiment, my computer generated hashes for 100000000 ten character passwords approximately 1.11691 sec. Let's count the number of different ten character passwords consisting of upper and lower latin letters and numbers using the combinatorial formula for the number of placements.
k 10 n! 26!
A = A = ────────── = ───── = 21 ⋅ 22 ⋅ 23 ⋅ 24 ⋅ 25 ⋅ 26 = 165765600
n 62 (n - k)! 20!
Now let's calculate the maximum time it takes to crack:
1.11691
─────────── ⋅ 19275223968000 = 215287 seconds ≈ 60 hours
100000000
Not bad, but too long for a single password. Please note that our search alphabet consists of only 26 characters, this is essential reduces the number of combinations compared to real passwords.
The following thought comes to mind: what if you generate hashes for all passwords in advance and then search as you search words in the dictionary. No sooner said than counted. Let's assume that words are encoded in ASCII and a letter occupies one byte. Password will take 10 bytes. We will use md5 as a hash, it will take 32 bytes. By simple calculations we get: approximately 736 terabytes. We got into extremes again...
The rainbow table is a special variant of lookup tables for inverting cryptographic hash functions using the mechanism of the reasonable compromise between table lookup time and memory usage.
H R1 H R2 H R3
banana ───> 21e74d8ec2bde ───> friend ───> 009310b2075cf ───> potato ───> 2ef83e4ddb0aa ───> freeze
H R1 H R2 H R3
beezer ───> 21e74d8ec2bde ───> jumper ───> 009310b2075cf ───> expand ───> 2ef83e4ddb0aa ───> luxury
H R1 H R2 H R3
reflex ───> 21e74d8ec2bde ───> vertex ───> 009310b2075cf ───> anyway ───> 2ef83e4ddb0aa ───> mayday
Diagram of a rainbow table with a chain length of three. R1, R2, R3 - reduction functions, H - hashing function.
Suppose we have a finite set of P passwords. So, the idea is this - we will take a random password from P, then by we take a hash from it and apply another reduction function to the result. We repeat this operation k times.
H R1 H R2 H R3
qwerty ───> 21e74d9ec2bae ───> sdgcds ───> 0093272b205cf ───> kfpsen ───> 2ef82e4ddb1aa ───> zlfesp
The requirement for the reduction function is to return values from the same alphabet as passwords.
Now we have a chain that can be recalculated and contains k passwords. We construct n such chains and write only the opening and closing passwords to the table:
start_word | end_word |
---|---|
banana | freeze |
beezer | luxury |
reflex | mayday |
Now, let's figure out how to crack the notorious hash using this table.
For each hash, the value of which we want to reverse, calculate the sequence R (… R (H (R (h)))…). If any of intermediate values will coincide with some end of any chain, we take the beginning of this chain and we restore it completely. It is highly likely that the complete chain will contain the hash value h, and the preceding one the password will be the one he is looking for.
Now let's figure out what this means. We have a hash h that needs to be reversed. Our task is to assume that this hash is located on some of the reductions and build the chain to the end. We will go with the k reduction to the first. When we will get the end of the chain for each of the assumptions, we will look for the resulting password in our table among end_word. If there is a match, we will take start_word of the matched record in the table and restore the entire chain and look for the desired hash in it. The previous password in the chain is likely to be the desired one.
It is worth noting that the restored chain does not always contain the desired hash value h. This is possible with a collision occurs between the hashing function H or the reduction function R. In this case, there may be false triggers, they must be ignored and the search continued.
If, upon reaching the zero position, we still did not find the password, it means that it did not appear in the pre-calculated chains.
Simple hash chains have several disadvantages. The most serious is the possibility of merging two chains into one (generation the same value in different chains). All values generated after the merge will be the same in both chains, which narrows the number of passwords covered. Since the pre-generated chains are not completely saved, it is impossible to effectively compare all generated values with each other. As a rule, the absence of collisions in the hash function H is taken care of by the encryption side of the passwords, so the main problem lies in the reduction function R.
Another serious problem is the selection of such an R function that will generate passwords with the required coverage and distribution. The limitation of the output alphabet is a serious limitation to the choice of such a function.
First, let's define the length of the password to be cracked and the input alphabet. Offered as input the alphabet use lowercase latin letters. Their number is 26 letters. Password length, let it be equal to 6 characters. We will write the code in python for clarity.
import string
ALPHABET: list = list(string.ascii_lowercase)
WORD_LENGTH = 6
Using the combinatorial formula for determining the number of placements, we calculate the number of possible password combinations:
k 6 n! 26!
A = A = ────────── = ───── = 21 ⋅ 22 ⋅ 23 ⋅ 24 ⋅ 25 ⋅ 26 = 165765600
n 26 (n - k)! 20!
So, from a six-character password from lowercase letters of the Latin alphabet, you can get 165765600 different password combinations.
For example, we will use the md5 hashing algorithm.
About the reduction function. It should turn the hash into some password of a given length of characters in the input alphabet. It is proposed to read the hash bit by bit, as a number, and take modulo by the length of the alphabet. Thus, we get the number symbols in the alphabet. Further, in order to shift the values, it is necessary to completely divide the hash by the length of the alphabet. We do this until we collect characters for one word.
import string
def reduce(hash_string: str, iteration: int, alphabet=None, word_length: int = 6) -> str:
if alphabet is None:
alphabet = list(string.ascii_letters)
# Shifting input hash value by iteration and modulo by 2^40.
value = (int(hash_string, 16) + iteration) % (2 ** 40)
result = []
for i in range(word_length):
# Getting modulo by alphabet length. Result number will be between 0 and len(alphabet).
mod = value % len(alphabet)
# Dividing value by alphabet length.
value //= len(alphabet)
# Getting symbol from input alphabet by calculated value in range from 0 to len(alphabet).
result.append(alphabet[mod])
# Generating word from calculated symbols list.
return "".join(result)
First, let's create a function that will accept the password hash, the initial reduction iteration, and return the trailing password for the given chain.
import reduction
from collections import Callable
def get_end_of_chain(
hash_string: str,
hash_func: Callable,
chain_length: int = 1000,
start_step: int = 0,
word_length: int = 6,
alphabet=None
) -> str:
word: str = ""
for i in range(start_step, chain_length):
word = reduction.reduce(
hash_string=hash_string,
iteration=i,
alphabet=alphabet,
word_length=word_length
)
hash_string = hash_func(word)
return word
- hash_string: str - Input hash of the previous password in the chain.
- hash_func: Callable - Hash function.
- chain_length: int - Rainbow table chain length.
- start_step: int - The reduction iteration to start with. Default - 0.
- word_length: int - The length of the desired password.
- alphabet: list - Input alphabet.
So let's try to generate a rainbow table. We will generate a random six character password, which will become the beginning of the chain, and count the entire chain for it. The password obtained after calculating the chain will be the closing one. In the table, we need to put the opening chain and closing passwords.
from collections import Callable
from random import Random
import string
def generate(
quantity: int,
hash_func: Callable,
add_to_table: Callable,
chain_length: int = 1000,
alphabet=None,
word_length: int = 6
) -> None:
if alphabet is None:
alphabet = list(string.ascii_letters)
random = Random()
i = 0
while i < quantity:
start_word: str = "".join([alphabet[random.randint(0, len(alphabet) - 1)] for i in range(word_length)])
end_word: str = get_end_of_chain(
hash_string=hash_func(start_word),
hash_func=hash_func,
chain_length=chain_length,
word_length=word_length,
alphabet=alphabet
)
if add_to_table(start_word, end_word):
i += 1
- quantity: int - Target number of records in the table.
- hash_func: Callable - Hash function.
- add_to_table: Callable - Callback function to add record to table.
- chain_length: int - Rainbow table chain length.
- word_length: int - Target length of passwords.
- alphabet: list - Input alphabet.
- Alphabet - [a, b, c, d, e, f, g, h, i, j ,k, l, m, n, o, p, q, r, s, t u, v, w, x, y, z]
- Chain length - 5000
- Password length - 6
- Records quantity - 40000
The table was generated in 36 minutes 21 seconds on one thread.
The table is generated, the tea has already cooled down 10 times. It's time to use it. The password is calculated as follows:
- At the input, we receive the hash H, which must be reversed (find the password from which the hash was obtained).
- We assume that the resulting hash is the penultimate link in the chain of length n.
- Compute the
n-1
reduction for it. At the output of the reduction, we get a password. - We are looking for the received password in the table by trailing passwords (end_word).
- If there is a match, we take the password that opens the chain from the table record and build a full chain for it until we find the password we need in it. We do this for all matches until we find the password.
- If the password is not found, we apply a reduction to the input hash H
n-2
, hash the result and apply ann-1
reduction to it. We perform steps 4 and 5 again. And so on in a circle until we reach zero reduction or find a password - If the password is not found, the table does not contain the information required to reverse the input hash H.
from collections import Callable
import reduction
def get_password(
hash_string: str,
hash_func: Callable,
get_from_table: Callable,
chain_length: int = 1000,
word_length: int = 6,
alphabet=None
) -> str:
for i in range(chain_length - 1, -1, -1):
end_word: str = get_end_of_chain(
hash_string=hash_string,
hash_func=hash_func,
chain_length=chain_length,
start_step=i,
word_length=word_length,
alphabet=alphabet
)
entries: list = get_from_table(end_word)
if len(entries):
for word in entries:
for step in range(0, chain_length):
current_hash: str = hash_func(word)
if hash_string == current_hash:
return word
word = reduction.reduce(
hash_string=current_hash,
iteration=step,
alphabet=alphabet,
word_length=word_length
)
- hash_string: str - Target number of records in the table.
- hash_func: Callable - Hash function.
- get_from_table: Callable - Callback function for finding records in a table by end_word parameter.
- chain_length: int - Rainbow table chain length.
- word_length: int - Target length of passwords.
- alphabet: list - Input alphabet.
So, we randomly generate a six-character password from the characters of the input alphabet, calculate its hash and try to spin the password back. We do this 1500 times, along the way measuring some parameters.
- Average password computation time: 1 minute 24 seconds.
- Average time for successful password calculation: 37.5 seconds.
- Hacking probability: 47.34%.
The tests took approximately 36 hours.
In total, a rainbow table does not guarantee a 100% chance of hash cracking, but it significantly reduces costs. Fundamental part is a reduction function, the quality of the table depends on it.
Python is not the fastest language, but it is great for prototyping and learning algorithms. For maximum efficiency, you should implement this algorithm, for example, in C or C ++.
Also, it does not hurt to improve the reduction function so that it provides the most uniform distribution values.
In addition to that, here are some thoughts on how you can parallelize your key guessing process:
- It is possible to parallelize the steps of checking the n-th reduction.
- It is better to perform several reductions in one instance of the stream in order to optimize the cost of initializing the stream.
Thanks for attention! :)
Author: Danil Andreev