@BrianZanti
Last active May 30, 2019 19:58

Hashing in Practice

You can complete these exercises in JavaScript or Ruby. If you choose Ruby, you should probably work in irb. If you choose JavaScript, start with this Codepen.

We'll provide timing constraints on the board. If you're moving quickly and feeling confident, complete the Extension questions. But if you're not, make sure you complete all the non-Extension exercises first.

Hashing Fundamentals

You've done some hash building in the physical space. Let's practice doing it with the machine, using better algorithms. Collaborating with the person next to you, experiment and discuss answers to these questions:

  1. What is the MD5 digest of your snack string?
  2. What is the SHA-256 digest of the same string?
  3. Repeat both the MD5 and SHA-256 hashing, but change the input to include a one-letter-off typo. What do you notice about the output? What if you add a blank space to the end of the string? What if you change the capitalization of one letter?
  4. Multiply your string 1000x (like "chipschipschipschips...") and hash it again with each algorithm. What's notable about the output compared to previous runs?
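
If you're working in irb, a minimal sketch for running these experiments might look like the following. It uses Ruby's standard `digest` library. The `digests` helper and the "chips" string are illustrative stand-ins, not part of the exercise; swap in your own snack string.

```ruby
require "digest"

# Hypothetical helper: return both digests for one input string.
def digests(input)
  {
    md5: Digest::MD5.hexdigest(input),
    sha256: Digest::SHA256.hexdigest(input)
  }
end

# "chips" stands in for your own snack string.
snack = "chips"

# Variations from the questions above, to compare against the original.
inputs = {
  "original"       => snack,
  "typo"           => "chops",
  "trailing space" => snack + " ",
  "capitalized"    => snack.capitalize,
  "repeated 1000x" => snack * 1000
}

inputs.each do |label, input|
  result = digests(input)
  puts label
  puts "  MD5:     #{result[:md5]} (#{result[:md5].length} hex characters)"
  puts "  SHA-256: #{result[:sha256]} (#{result[:sha256].length} hex characters)"
end
```

Printing the digest lengths next to each digest makes it easier to compare runs side by side.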

Then, in your own notebook, jot quick notes to solidify your learning:

  1. How does a small change to the input of a hash change the output?
  2. Why does the answer to question 1 matter?
  3. How does a massive change to the size of the input change the output?
  4. Based on this analysis, can you come up with three potential use cases for hash functions *besides* passwords?

Extension

Feeling good about hashing? Try answering these questions:

  1. Which algorithm (MD5 or SHA-256) is faster? Can you prove it using a dataset of at least 200 inputs and calculating the percentage speed difference?
  2. Is the percentage difference consistent if you increase the size of each individual input data by 100 times?
  3. Is there a scenario where you'd want to intentionally choose a slower algorithm? Why?
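
One possible way to set up the timing comparison, sketched with Ruby's standard `benchmark` and `securerandom` libraries. The helper names `time_hashing` and `percent_difference`, the random inputs, and the choice of MD5 as the baseline are all assumptions made for illustration. Timings this small are noisy, so expect to run it several times before trusting a number.

```ruby
require "digest"
require "benchmark"
require "securerandom"

# Hypothetical helper: seconds taken to hash every input with one digest class.
def time_hashing(digest_class, inputs)
  Benchmark.realtime do
    inputs.each { |input| digest_class.hexdigest(input) }
  end
end

# Hypothetical helper: how much slower (positive) or faster (negative)
# `other` is than `baseline`, as a percentage of `baseline`.
def percent_difference(baseline, other)
  (other - baseline) / baseline * 100.0
end

# 200 random inputs; the count comes from the question, the size is arbitrary.
inputs = Array.new(200) { SecureRandom.hex(16) }

md5_seconds    = time_hashing(Digest::MD5, inputs)
sha256_seconds = time_hashing(Digest::SHA256, inputs)

puts "MD5:     #{md5_seconds} s"
puts "SHA-256: #{sha256_seconds} s"
puts "Difference relative to MD5: #{percent_difference(md5_seconds, sha256_seconds).round(1)}%"
```

To explore the second question, you could build the inputs with `SecureRandom.hex(16) * 100` and compare the percentages.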

Bad Secrets

Let's test your understanding by completing this section on your own. Use your pair as a resource only if you get stuck.

A hashing function is said to be "one-way", which is often useful in security, but it's not foolproof. Say you hack into my application and are able to retrieve all my users' hashed passwords. You find that the account with username boss@example.com has this hashed password:

3e40106b8f4332e18d76e94124d9c82a

Based on the length of the digest, you guess it's an MD5 hash. You know that some users, particularly bosses, are lazy and do dumb things like reusing their 4-digit ATM PIN as their password. But the application required a password of eight digits, so they might have repeated the PIN.

Work out answers to the following questions in your notebook:

  1. What's this user's password?
  2. Would the user's password have been "more secure" if they used eight letters rather than eight numbers? Explain your thinking.
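
If you get stuck on the first question, here is a rough sketch of a brute-force search. It assumes the guess above is right: the password is a 4-digit PIN written twice, hashed once with MD5 and no salt. The method name `crack_repeated_pin` is made up for this example, and the demo digest is one we compute ourselves rather than the boss's.

```ruby
require "digest"

# Try every 4-digit PIN written twice ("00000000" through "99999999")
# and return the candidate whose MD5 digest matches, or nil if none does.
def crack_repeated_pin(target_digest)
  (0..9999).each do |n|
    pin = format("%04d", n)  # keep leading zeros, e.g. 7 becomes "0007"
    candidate = pin * 2      # String#* repeats the string: "0007" * 2 is "00070007"
    return candidate if Digest::MD5.hexdigest(candidate) == target_digest
  end
  nil
end

# Demo against a digest we generate ourselves.
demo_digest = Digest::MD5.hexdigest("00420042")
puts crack_repeated_pin(demo_digest)
```

Note that the search space is only 10,000 candidates, which is worth keeping in mind for the second question.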

Extension

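A "rainbow table" makes this reverse engineering much faster. (Strictly speaking, the table you build in these questions is a precomputed lookup table; a true rainbow table stores chains of hashes to save space. The idea of trading storage for cracking time is the same.)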

  1. Can you generate a CSV file that has two columns, where the first column contains all possible 8-digit codes following the 4+4 rule above and the second column has the MD5 digest for that input?
  2. If you now have an MD5 digest for an input that is expected to follow the 4+4 rule, how long does it take you to "crack" a password using the rainbow table?
  3. Could you generate a similar table for all the words of eight or more letters in the dictionary? (Hint: you have a text file dictionary on your filesystem at /usr/share/dict/words.)
  4. Some developers choose to make passwords more obscure by applying the hashing algorithm more than once (i.e., original input into the algorithm, then that output into the algorithm again). Can you expand your dictionary table with columns for double-hashing, quad-hashing, and octo-hashing? As our time is limited, you might need to constrain your input set ;)
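
A starting-point sketch for the table exercises, using Ruby's standard `digest` and `csv` libraries. All of the method names, the column headers, and the output filename are invented for illustration. One assumption deserves attention: `iterated_md5` feeds the hex string of each digest back into MD5, while a real application might re-hash the raw bytes instead, which would produce different results.

```ruby
require "digest"
require "csv"

# Every 8-digit code made of a 4-digit PIN written twice.
def repeated_pin_codes
  (0..9999).map { |n| format("%04d", n) * 2 }
end

# Apply MD5 `rounds` times, feeding each hex digest back in as the next input.
# (Assumption: the hex string is re-hashed, not the raw digest bytes.)
def iterated_md5(input, rounds)
  rounds.times.reduce(input) { |value, _| Digest::MD5.hexdigest(value) }
end

# Write one row per code: the code itself, then its (possibly iterated) digest.
def write_table(path, codes, rounds: 1)
  CSV.open(path, "w") do |csv|
    csv << ["code", "md5"]
    codes.each { |code| csv << [code, iterated_md5(code, rounds)] }
  end
end

# Load the CSV into a Hash keyed by digest, so a lookup is a single step.
def load_table(path)
  table = {}
  CSV.foreach(path, headers: true) { |row| table[row["md5"]] = row["code"] }
  table
end

# Example usage (the filename is arbitrary):
#   write_table("pin_table.csv", repeated_pin_codes)
#   table = load_table("pin_table.csv")
#   table[some_md5_digest]  # returns the matching code, or nil
```

For the dictionary version, the same `write_table` could take words read from /usr/share/dict/words and filtered by length, and `rounds: 2`, `rounds: 4`, or `rounds: 8` would produce the extra columns one file at a time.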