mtroiani/hashmaps.md

## hashmaps.md

      
    Raw
  

              hashmaps.md
            
          
    Hashmaps and Hashsets

What are hashmaps?

Hashmaps are a way to store key/value pairs with a fast (O(1) on average) lookup time.
We commonly use arrays for constant time lookups, but hashmaps give us more flexability as to what we can store. For example, you may have used an array's index to 'map' to character values:
Let's count the number of times each character appears:
"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus sed posuere nunc, sit amet blandit felis. Ut non orci eget."


Index
Data


97
5


98
1


99
5


100
4


101
12


102
1


Using an array we can look up the count of any particular character in constant time just by using its ASCII value. But what if we wanted to look up something that didn't have an ASCII value, say for example an entire word?
We could try to save the words alphabetically, but then how would we know which word maps to which index? And what would happen if we need to add a new word? We need a way to consistently map a word to a space in the array. We can do this using something called a hashing function.
Hashing Functions

Hash functions are a way to map a piece of data from one form to a number - typically an integer or a long. So, for example, let's take one of the words from our sample text: "Lorem". We already have one way of converting characters to numbers using the ASCII chart - let's start with that.
L (76) o (111) r (114) e (101) m (109)
There are many, many ways these could be combined, but let's keep it simple and just add them.
76 + 111 + 114 + 101 + 109 = 511
Great, the word "Lorem" could map to array index 511. This could make for a very long array though, so let's map it to a smaller index. Say our array is of length 5. We can use the modulo operator: 511 % 5 = 1. Now our word belongs in index 1. We can continue this for the next words in our string.

ipsum -> 558 % 5 = 3
dolor -> 544 % 5 = 4
sit -> 337 % 5 = 2


Index
Key
Value


0


1
Lorem
1


2
sit
1


3
ipsum
1


4
dolor
1


Collisions

amet -> 423 % 5 = 3
Oh no! The word "amet" should hash to index 3, but we already have the count for the word ipsum there. This is called a collision, and it happens when there are two or more hashed values that map to the same spot. What can we do about this? Here are two common approaches for resolving collisions:

Linear probing: In linear probing we just put the hashed key into the next clostest empty space. (In this particular example it would be index 0). Later, when we're searching for the key we start at the original hash location and keep searching until we either find the correct key or an empty space.
Linked Lists: Instead of putting the key/value pair directly into the array we could store a pointer to a linked list. This linked list continues to grow as necessary. While we'd need to search through the entire linked list for the value we're looking for as long as there are enough 'buckets' in the hashmap none of the lists would get too long.

Preventing Collisions

Our goal for using a hashmap was to have constant time O(1) lookups, but when there are too many collisions our lookup times can get longer if we need to search through a linked list, or if we need to search many locations due to linear probing. One of the main ways to prevent our lookup time from getting too long is by increasing the size of our hashtable. This is called dynamic resizing. With dynamic resizing once our table reaches a certain percentage of 'fullness' it automatically resizes. This typically occurs at around 2/3 load (Python) to 3/4 load (Java).
Resizing takes extra time to rehash the existing items in the hashmap and takes more space than the previous map, but it reduces the chance of future collisions. Resizing typically doesn't happen very often, so while our worst case insertion is O(n) the typical case is O(1).
Why would we use a hashmap?


Hashmaps are very useful - they allow us to map a key to a value.
Hashmaps have constant time insertion, deletion, and lookup in the average case (and O(n) in the worst case)

Why wouldn't we use a hashmap?


Items in a hashmap are unordered, so if you know your use case would include iterating through items in order you'd want to consider other options.

Hashsets

Hashmaps are such an important datastructure that they can lay the foundation for other data structures such as sets. A hashset is a set construction that uses a hashtable underneath to implement a set. There are other ways to construct a set, but we'll focus on hashsets.
What is a set?

A set is an unordered collection of items that cannot have duplicates. Sets typically have fast lookup for membership (hashsets do this in O(1) on the average case), and allow for set operations such as union, intersection, difference, and subset.
How are hashsets related to hashmaps?

Let's think back to the hashmap we made before:


Index
Key
Value


0


1
Lorem
1


2
sit
1


3
ipsum
1


4
dolor
1


What if, instead of counting the occurances of the words in our string we only wanted to check if they are present or not? In that case, the value becomes unnecessary - the key is what really matters. Instead, the value can be changed to constant dummy value.


Index
Key
Value


0


1
Lorem
Present


2
sit
Present


3
ipsum
Present


4
dolor
Present


In this case we use essentially the same hashing methods, collison prevention methods, and lookup methods as we do with hashmaps.
Why would we use a hashset?


Useful for checking membership
Constant time insertion, deletion, and lookup in the average case (and O(n) in the worst case)

Why wouldn't we use a hashset?


Items in a hashset are unordered (Note that some language have ordered sets, but they typically use a tree to do so - not a hashmap)

Where can I learn more?

Check your favorite language's docs!
Hashmaps


Python - https://docs.python.org/3/library/stdtypes.html#index-46
Java - https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html
Javascript - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map

Sets


Python - https://docs.python.org/3/library/stdtypes.html#set
Java - https://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html
Javascript - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set

Let's Use Hashmaps and Sets

Contains Duplicate

Given an array of integers, find if the array contains any duplicates. Your function should return true if any value appears a least twice in the array, and it should return false if every element is distinct.
Couple Sum

Given a List of integers, find two numbers contained in it such that they sum up to a specific target.
The function couple_sum should return the indices of the two numbers such that they add up to the target, where index1 must be less than index2.
Please note that the indices are not zero based, and you can assume that each input has exactly one solution.
Example: Input: [2, 3, 4, 7], Target = 7, Output = [2, 3]
Valid Anagram

Given two strings s and t, write a function to determine if t is an anagram of s.
For example,
s = "anagram", t = "nagaram", return true.
s = "rat", t = "car", return false.
Note:
You may assume the string contains only lowercase alphabets.
Follow up:
What if the inputs contain unicode characters? How would you adapt your solution to such case?
Keyboard Row

Given a List of words, return the words that can be typed using letters of alphabet on only one row's of American keyboard like the image below.
Example 1:
Input: ["Hello", "Alaska", "Dad", "Peace"]
Output: ["Alaska", "Dad"]
Note:
You may use one character in the keyboard more than once.
You may assume the input string will only contain letters of alphabet.
Find the Difference

Given two strings s and t which consist of only lowercase letters.
String t is generated by random shuffling string s and then add one more letter at a random position.
Find the letter that was added in t.
Example:
Input: s = "abcd", t = "abcde"
Output: e

Gist content by Mary Troiani for presentation she gave at the Algorithms and Technical Interview Study Group - Jan 24, 2018 meetup
This is a part of a series of presentations for Learn Teach Code - Algorithm Study Group