Skip to content

Instantly share code, notes, and snippets.

@mboeh
Created July 19, 2012 23:23
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mboeh/3147594 to your computer and use it in GitHub Desktop.
Save mboeh/3147594 to your computer and use it in GitHub Desktop.
Memoization and Threads

This shows how different approaches to memoization work (or don't) in different Ruby engines.

Synopsis

If you're using the idiomatic Ruby approach to memoization, like this:

def data
  @memo ||= expensive_action
end

you might not get the behavior you expect in multi-threaded environments. If more than one thread calls #data above at the same time (or before the first expensive_action is completed, to be more specific), #expensive_action will run more than once. This might cause problems for you in the following scenarios:

  1. #expensive_action is very expensive (in terms of time, API limits, usage fees, locking requirements, etc.).
  2. #expensive_action has side effects.
  3. #expensive_action may return a different value each time, and it's important that all threads see the same value.

This is important to understand: only scenario #1 is actually memoization. Memoization is a time optimization applied to an idempotent task. If you need a task to be run only once, you need to combine synchronization with an indicator that the task has been completed. In single-threaded Ruby, ||= is a convenient tool to achieve both goals. In multi-threaded Ruby, you need to provide the synchronization yourself.

In all common Ruby environments, a 'sleepy' task (one which is IO-bound, usually) must be synchronized to be memoized efficiently. However, in GIL environments (like YARV), you can memoize a CPU-bound task without using synchronization.

Keep in mind that this is only a problem for the run-time of the expensive task. If you have a method you expect to be called intermittently by many threads, or if the cost of running it several times is not much worse than the cost of running it once, you don't need to worry about this.

A convenient way to synchronize and memoize at the same time (see below for caveats) is the following:

class SyncedAndMemoized
  include MonitorMixin
  
  def memoized
    synchronize { @memo ||= expensive_task }
  end
  # ...
end

The test

I have three classes demonstrating different tasks and synchronization approaches. For each one, I spin up 10 threads, each of which call a method which attempts to memoize the results of an expensive task. That method also prints a message, which allows you to see how many times the expensive task was executed.

SleepMemo attempts to memoize a 'sleepy' task without synchronization. The task is sleeping 1 second. In all Ruby environments, the memoization fails, because even Ruby 1.8 yields to other threads during a sleep (or during IO).

BusyMemo attempts to memoize a 'busy' task without synchronization. The task is computing the factorial of 10000. In GIL environments (REE 1.8, YARV 1.9) the task only runs once because the first thread to start the task blocks all other threads until it's done. In non-GIL environments (JRuby, Rubinius 2) the task runs more than once.

SyncMemo uses the standard Ruby library 'monitor' to provide synchronization to the memoization task. If you only have one method to memoize, or if it's acceptable to share the lock between all methods in the class, this is a good approach. Otherwise, you'll have to create your own Mutex or Monitor objects to do the job. The task runs only once in all Ruby environments, regardless of the threading model, existence of a GIL, length of the task, busyness/sleepiness of the task, etc. This approach incurs a synchronization cost, but it is probably cheaper than running your expensive task more than once.

Addendum

This is not necessarily simply a problem of threading. If you're using an evented concurrency system that might yield during the execution of the expensive task (em-synchrony, for instance), you're still at risk.

require 'monitor'
class SleepMemo
def get_me_that
@that ||= go_fetch_that
end
private
def go_fetch_that
print "SleepMemo: Fetching that\n"
sleep 1
:that
end
end
class BusyMemo
def get_me_that
@that ||= go_fetch_that
end
private
def go_fetch_that
print "BusyMemo: Fetching that\n"
j = 1
10000.times do |i|
j *= i
end
:that
end
end
class SyncMemo
include MonitorMixin
def get_me_that
synchronize do
@that ||= go_fetch_that
end
end
private
def go_fetch_that
print "SyncMemo: Fetching that\n"
sleep 1
:that
end
end
puts "Ruby version: #{RUBY_ENGINE} #{RUBY_VERSION}"
[SleepMemo, BusyMemo, SyncMemo].each do |klass|
puts "Sequential #{klass}:"
memo = klass.new
10.times do
memo.get_me_that
end
puts
puts "Threaded #{klass}:"
memo = klass.new
ths = []
10.times do
ths << Thread.new do
memo.get_me_that
end
end
ths.each(&:join)
puts
end
Ruby version: jruby 1.8.7
Sequential SleepMemo:
SleepMemo: Fetching that
Threaded SleepMemo:
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
Sequential BusyMemo:
BusyMemo: Fetching that
Threaded BusyMemo:
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
Sequential SyncMemo:
SyncMemo: Fetching that
Threaded SyncMemo:
SyncMemo: Fetching that
Ruby version: rbx 1.8.7
Sequential SleepMemo:
SleepMemo: Fetching that
Threaded SleepMemo:
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
Sequential BusyMemo:
BusyMemo: Fetching that
Threaded BusyMemo:
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
BusyMemo: Fetching that
Sequential SyncMemo:
SyncMemo: Fetching that
Threaded SyncMemo:
SyncMemo: Fetching that
Ruby version: ruby 1.8.7
Sequential SleepMemo:
SleepMemo: Fetching that
Threaded SleepMemo:
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
Sequential BusyMemo:
BusyMemo: Fetching that
Threaded BusyMemo:
BusyMemo: Fetching that
Sequential SyncMemo:
SyncMemo: Fetching that
Threaded SyncMemo:
SyncMemo: Fetching that
Ruby version: ruby 1.9.3
Sequential SleepMemo:
SleepMemo: Fetching that
Threaded SleepMemo:
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
SleepMemo: Fetching that
Sequential BusyMemo:
BusyMemo: Fetching that
Threaded BusyMemo:
BusyMemo: Fetching that
Sequential SyncMemo:
SyncMemo: Fetching that
Threaded SyncMemo:
SyncMemo: Fetching that
#!/bin/sh
for ng in ree-1.8.7-2011.03 jruby-1.6.6 rbx-head ruby-1.9.3-p125; do
rvm $ng do ruby memo-threads.rb > results-$ng.txt;
done
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment