Skip to content

Instantly share code, notes, and snippets.

@grant-roy
Last active August 29, 2015 14:06
Show Gist options
  • Save grant-roy/04aa07e8160031e265d5 to your computer and use it in GitHub Desktop.
Save grant-roy/04aa07e8160031e265d5 to your computer and use it in GitHub Desktop.

####What is a hash?

A semi-famous joke in computer science goes like this:

There are only two hard things is computer science: naming things, cache invalidation, and off by one errors.

In terms of the first diffuculty, naming things, let's think about Hash. What does that bring to mind....breakfast potatoes?..something rockstars in the 1960's smoked in Morrocco?...Twitter hash tags?

Hash is actually short for HashMap...still not very descriptive, Lets see if we can do a bit better.

As an analogy, Lets look at a hash object as the computer science equivalent of a common dictionary. When you want to know the definition of a word, you find the word in the dictionary, and look at the definition paired with it. The fundamental idea of how data is stored in these structures is that of a pairing...or association.

#####We can think of the word as a key, and we use that word to look up the associated value(which in this case definition) of the word in the dictionary. The key contains information, information that can be used to locate the value joined with that key, so the key in essence acts as a map to the data, it has the info on how to find the value.

In the preceding paragraph, the italics highlight the terms often used interchangebly to name these data structures: Key-Value stores, Associative Arrays, Maps/HashMaps,Lookup Tables, and Dictionaries.

The name Hash actually is only descriptive of the iternal function (a hashing function) that works the behind the scenes magic which makes these data structures actually efficient for storing and retreving large amounts of data.

So I would urge to create in your mind a 'Key-Value' association, that is when you here Hash in the context of ruby..associate that word with a 'Dictionary', which is a more familiar, more concrete analogy of what the data structure is, and how it works. ####Lets look at how we create and uses Hashes in Ruby

#create a new Hash object of default size
dict = Hash.new

#crete a new Hash object of a certain size
dict = Hash.new(10)

#initialize a hash object with values
states = {
    :Oregon => 'OR',
    :Florida => 'FL',
    :California => 'CA',
    :New York => 'NY',
    :Michigan => 'MI'
}

#add another entry
cities[:NY] = 'New York'

#remove an entry
cities.delete(:NY)

#fetch the value of an entry
cities.fetch(:NY)

Symbols in Ruby

Symbols in ruby are essentially immutable strings, once we create a Symbol, we cannot change its value.

When the Ruby Interpreter(the program that actually translates your code into something transitors on a chip can execute) doesn't have to worry about you changing the value of a string, it can be sure of something...the value of that string, and in genral when a compiler or interpreter can be certain of something, it can optimize it.

So this is what happens, our code gets optimized and executes faster when we use symbols. Now, can we prove it?

####So what is this buisness of a hash function then? After all, they named a really important object after it right?

def h3(key,ht_size)
    x = 0
    for i in 0..key.length-1 do
      ascii_code = key[i]
      x = 128 * x + ascii_code
    end
    return x % ht_size
  end

####Performance charachteristics of Hash vs. Array ?

If we have an array of 1000 elements, and I put the information for 'Sal' into one of the array elements....how do I look at the array and find Sal? If it's an array I have no choice but to start at the first element, and look at each element in succesion, so we get performance of O(n).

If we have a hash table of 1000 elements, and I do the same thing, by computing a hash from the key...lets say one operation, I immediately know the index to lookup. So the performance of the Hash structure does not degrade, and in fact is not affected by the size or number of elements. This we say is O(1), or constant time performance...and in fact this is the pinnacle of algorithm performance.

####Take-aways.... what do I do with this knowledge?

The important thing to remember in day to day development is when to use a Hash vs an Array, and what the performance consequences are. We can look at some benchmarks to see the implications of this.

*Scenario 1

You are an early user of Itunes,and you're really annoyed by the fact that there isn't an easy way to simply press a button and remove dupicate song titles from your music collection. So you call up apple, complain, and they say well if you can figure out a function to do it that would be great, let us know.

*Scenario 2

You retrieve 300 names from a data base record and need to convert them all to uppercase? What data structure is better?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment