mwylde/hash.rb

## hash.rb
# Oftentimes, we want to create lists of things that have something in
# common. For example we may have a list of objects, each with a
# name. We want to count how many times each name occurs. We could do
# so with a hash, like this:

h = {}
a.each{|o|
  # make sure there's a value here
  h[o.name] ||= 0
  h[o.name] += 1
}

# But it turns out there's an easier way, using the less common
# Hash.new method:

h = Hash.new(0)
a.each{|o| h[o.name] += 1}

# (or using a reduce to get it down to one line)

h = a.reduce(Hash.new(0)){|h, o| h[o.name] += 1; h}

# What's going on here? When we supply an argument to Hash.new, we
# provide a default value that's used for any key that's not already
# in the hash. When we omit the argument (or use the short-hand form
# {}) the default value is set to nil.

g = Hash.new(5)
h = {}
g["nancy"] = 10

g["nancy"] # => 10
g["sean"]  # => 5
h["sean"]  # => nil

# However, care should be taken with using normal objects as default
# values, as the following example shows:

h = Hash.new([])
h["mary"] << 5
h["abigail"] << 10

h["mary"]    # => [5, 10]
h["abigail"] # => [5, 10]

# What's going on here? We're setting a particular object--an instance
# of the Array class--to the default value. However, this means that
# the exact same array is used for every index of the Hash. Needless
# to say, this probably isn't what you want.  But there is another,
# more powerful form of Hash.new that can solve this exact problem:
# Hash.new {|hash, key| ... }. Using the block form, we can fix our
# above example:

h = Hash.new{|hash, key| hash[key] = []}

h["mary"] << 5
h["abigail"] << 10

h["mary"]    # => [5]
h["abigail"] # => [10]

# Now we're producing a new array element for every key. This sort of
# thing can be useful if we have a list of objects each with names and
# values, and we want a list of the values corresponding to each
# name. We can now write:

h = Hash.new{|h, k| h[k] = []}
a.each{|o| h[o.name] << h.value}

# The possibilities are endless. Effectively the block form makes
# hashes into lightweight methods much like proc and lambda, with the
# added benefit that we get memoization for free. For example, lets
# say we want a function to calculate the fibonacci numbers. We all
# know the formula: f(n) = f(n-1) + f(n-2). We can compute it
# naievely like this:

def f(n)
  case n
  when 0 then 1
  when 1 then 1
  else
    f(n-1) + f(n-2)
  end
end

# This works fine for small values of n. f(20) takes less than three
# hundreths of a second on my MacBook Pro. But it quickly becomes
# useless: f(30) takes 0.34 seconds, but I got bored waiting for f(40)
# to finish. Is calculating the Fibbonaci sequence fundamently a hard
# problem, where we can't expect to find a good solution? Or do we
# just need to be smarter about it? Consider the following expansion,
# of the n = 5 case:

f(5) = f(4) + f(3)
     = f(3) + f(2) + f(2) + f(1)
     = f(2) + f(1) + f(1) + f(0) + f(1) + f(0) + f(1)
     = f(1) + f(0) + f(1) + f(1) + f(0) + f(1) + f(0) + f(1)
     = 8

# While we got the right answer, notice how we kept repeating the same
# calculations. Even in the trivial case of n = 5, we calculated f(3)
# twice, and f(2) three times. Since we know that larger values of the
# problem depend on the values of smaller values, we can save
# ourselves an enormous amount of time my memoizing the function,
# which simply means saving the results of our earlier runs for
# later. The Ruby hash is a natural at this. Here's a much more
# effecient and yet concise implementation of the above:

f = Hash.new do |h, k|
  case k
  when 0 then h[0] = 1
  when 1 then h[1] = 1
  else
    h[k] = h[k-2] + h[k-1]
  end
end

# With our new and improved Fib, we can easily compute f(5000):
def time; t = Time.now; yield ; Time.now - t; end
time{f[5000]} # => 0.033939

# I should take this time to note that there is a very simple
# iterative algorithm for fibbonaci, and you should never actually use
# a recursive algorithm to compute it. However, this technique can be
# very useful for the many recursive algorithms which can not be
# trivially made iterative.
	# Oftentimes, we want to create lists of things that have something in
	# common. For example we may have a list of objects, each with a
	# name. We want to count how many times each name occurs. We could do
	# so with a hash, like this:

	h = {}
	a.each{\|o\|
	# make sure there's a value here
	h[o.name] \|\|= 0
	h[o.name] += 1
	}

	# But it turns out there's an easier way, using the less common
	# Hash.new method:

	h = Hash.new(0)
	a.each{\|o\| h[o.name] += 1}

	# (or using a reduce to get it down to one line)

	h = a.reduce(Hash.new(0)){\|h, o\| h[o.name] += 1; h}

	# What's going on here? When we supply an argument to Hash.new, we
	# provide a default value that's used for any key that's not already
	# in the hash. When we omit the argument (or use the short-hand form
	# {}) the default value is set to nil.

	g = Hash.new(5)
	h = {}
	g["nancy"] = 10

	g["nancy"] # => 10
	g["sean"] # => 5
	h["sean"] # => nil

	# However, care should be taken with using normal objects as default
	# values, as the following example shows:

	h = Hash.new([])
	h["mary"] << 5
	h["abigail"] << 10

	h["mary"] # => [5, 10]
	h["abigail"] # => [5, 10]

	# What's going on here? We're setting a particular object--an instance
	# of the Array class--to the default value. However, this means that
	# the exact same array is used for every index of the Hash. Needless
	# to say, this probably isn't what you want. But there is another,
	# more powerful form of Hash.new that can solve this exact problem:
	# Hash.new {\|hash, key\| ... }. Using the block form, we can fix our
	# above example:

	h = Hash.new{\|hash, key\| hash[key] = []}

	h["mary"] << 5
	h["abigail"] << 10

	h["mary"] # => [5]
	h["abigail"] # => [10]

	# Now we're producing a new array element for every key. This sort of
	# thing can be useful if we have a list of objects each with names and
	# values, and we want a list of the values corresponding to each
	# name. We can now write:

	h = Hash.new{\|h, k\| h[k] = []}
	a.each{\|o\| h[o.name] << h.value}

	# The possibilities are endless. Effectively the block form makes
	# hashes into lightweight methods much like proc and lambda, with the
	# added benefit that we get memoization for free. For example, lets
	# say we want a function to calculate the fibonacci numbers. We all
	# know the formula: f(n) = f(n-1) + f(n-2). We can compute it
	# naievely like this:

	def f(n)
	case n
	when 0 then 1
	when 1 then 1
	else
	f(n-1) + f(n-2)
	end
	end

	# This works fine for small values of n. f(20) takes less than three
	# hundreths of a second on my MacBook Pro. But it quickly becomes
	# useless: f(30) takes 0.34 seconds, but I got bored waiting for f(40)
	# to finish. Is calculating the Fibbonaci sequence fundamently a hard
	# problem, where we can't expect to find a good solution? Or do we
	# just need to be smarter about it? Consider the following expansion,
	# of the n = 5 case:

	f(5) = f(4) + f(3)
	= f(3) + f(2) + f(2) + f(1)
	= f(2) + f(1) + f(1) + f(0) + f(1) + f(0) + f(1)
	= f(1) + f(0) + f(1) + f(1) + f(0) + f(1) + f(0) + f(1)
	= 8

	# While we got the right answer, notice how we kept repeating the same
	# calculations. Even in the trivial case of n = 5, we calculated f(3)
	# twice, and f(2) three times. Since we know that larger values of the
	# problem depend on the values of smaller values, we can save
	# ourselves an enormous amount of time my memoizing the function,
	# which simply means saving the results of our earlier runs for
	# later. The Ruby hash is a natural at this. Here's a much more
	# effecient and yet concise implementation of the above:

	f = Hash.new do \|h, k\|
	case k
	when 0 then h[0] = 1
	when 1 then h[1] = 1
	else
	h[k] = h[k-2] + h[k-1]
	end
	end

	# With our new and improved Fib, we can easily compute f(5000):
	def time; t = Time.now; yield ; Time.now - t; end
	time{f[5000]} # => 0.033939

	# I should take this time to note that there is a very simple
	# iterative algorithm for fibbonaci, and you should never actually use
	# a recursive algorithm to compute it. However, this technique can be
	# very useful for the many recursive algorithms which can not be
	# trivially made iterative.