Skip to content

Instantly share code, notes, and snippets.

@nficano
Created March 26, 2020 22:15
Show Gist options
  • Save nficano/aa599dc62c039f67ca8ac106f25a1629 to your computer and use it in GitHub Desktop.
Save nficano/aa599dc62c039f67ca8ac106f25a1629 to your computer and use it in GitHub Desktop.
hash_functions

Hash Functions

A hash table is a data structure that implements an dictionary abstract data type, and is used to map keys to values.

                         array (key)
                           ┌───┐
                         0 │ ⊙─┼─► value
                           ├───┤
                         1 │ ⊙─┼─► value
                           ├───┤
                         2 │   │  A Hash table is implemented by using a large array.
                           ├───┤  This is usually called the table's "backing array
                         3 │   │  because it backs the hash table.
                           ├───┤
To place an item A       4 │   │
into the backing array,    ├───┤
calculate a hash value.  5 │   │
                           ├───┤
                         6 │   │
                           ├╶╶╶│
               ┌──────┐    │   │
     "cab" ──► │ hash │    │   │
               │ func │    │   │
               └───┼──┘    │   │   Multiple items might have the same hash value.
                   │       │   │   This is called a collision. There are 2 ways
                   │       │   │   to handle collisions: chaining and linear probing.
                   │       ├╶╶╶│
                   │       │   │     linked-list
                   │       ├───┤    ┌───────┬───┐  ┌─────────┬───┐
                   └──► 98 │ ⊙─┼──► │ "cab" │ ⊙─┼──┼─► "cat" │ ⊙─┼────►
 the hash value is         ├───┤    └───────┴───┘  └─────────┴───┘
 then mod(%) with       99 │   │
 the array size.           └───┘

Additional Notes:

  • Load Factor is the Number of Entries in the Hash Table divided by the size of the backing array. The higher the load factor, the greater the probability of collisions.
  • A Load factor of 0.75 is generally considered a good point to double the size of the backing array.
  • When you resize the backing array, you have to remap all the entries in the table.

Requirements:

  1. Deterministic
  2. Evenly distributed
  3. Should use all values of the input

Hash Codes:

To generate hash codes, we use polynomials. We set x to a prime number, (e.g, 31, 53 or 101).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment