@miloops
Created October 13, 2010 13:12
class WeakHash
  def initialize(cache = Hash.new)
    @cache = cache
    @key_map = {}
    @rev_cache = Hash.new{|h,k| h[k] = {}}
    @reclaim_value = lambda do |value_id|
      if value = @rev_cache.delete(value_id)
        value.each_key{|key| @cache.delete key}
      end
    end
  end

  def [](key)
    value_id = @cache[key]
    value_id && ObjectSpace._id2ref(value_id)
  rescue RangeError
    nil
  end

  def []=(key, value)
    key2 = case key
           when Fixnum, Symbol, true, false, nil
             key
           else
             key.dup
           end
    @rev_cache[value.object_id][key2] = true
    @cache[key2] = value.object_id
    @key_map[key.object_id] = key2
    ObjectSpace.define_finalizer(value, @reclaim_value)
  end

  def clear
    @cache.clear
  end

  def delete(key)
    @cache.delete(key)
  end
end

module ActiveRecord
  # = Active Record Identity Map
  #
  # Ensures that each object gets loaded only once by keeping every loaded
  # object in a map. Looks up objects using the map when referring to them.
  #
  # More information on Identity Map pattern:
  # http://www.martinfowler.com/eaaCatalog/identityMap.html
  #
  # == Configuration
  #
  # In order to disable IdentityMap, set <tt>config.active_record.identity_map = false</tt>
  # in your <tt>config/application.rb</tt> file.
  #
  # IdentityMap is enabled by default.
  #
  module IdentityMap
    extend ActiveSupport::Concern

    class << self
      attr_accessor :repositories
      attr_accessor :current_repository_name
      attr_accessor :enabled

      def current
        repositories[current_repository_name] ||= Hash.new { |h,k| h[k] = WeakHash.new }
      end

      def with_repository(name = :default)
        old_repository = self.current_repository_name
        self.current_repository_name = name
        yield if block_given?
      ensure
        self.current_repository_name = old_repository
      end

      def without
        old, self.enabled = self.enabled, false
        yield if block_given?
      ensure
        self.enabled = old
      end

      def get(class_name, primary_key)
        if obj = current[class_name.to_s.to_sym][primary_key]
          return obj if obj.id == primary_key
        end
        nil
      end

      def add(record)
        current[record.class.name.to_s.to_sym][record.id] = record
      end

      def remove(record)
        current[record.class.name.to_s.to_sym].delete(record.id)
      end

      def clear
        current.clear
      end

      alias enabled? enabled
      alias identity_map= enabled=
    end

    self.repositories ||= Hash.new
    self.current_repository_name ||= :default
    self.enabled = true

    module InstanceMethods
      # Reinitialize an Identity Map model object from +coder+.
      # +coder+ must contain the attributes necessary for initializing an empty
      # model object.
      def reinit_with(coder)
        @attributes_cache = {}
        dirty = @changed_attributes.keys
        @attributes.update(coder['attributes'].except(*dirty))
        @changed_attributes.update(coder['attributes'].slice(*dirty))
        @changed_attributes.delete_if{|k,v| v.eql? @attributes[k]}
        _run_find_callbacks
        self
      end
    end

    module ClassMethods
      def identity_map
        ActiveRecord::IdentityMap
      end
    end
  end
end

@jfirebaugh

Can #[] potentially return the wrong object in the following scenario?

w = WeakHash.new
k = Object.new
v = Object.new
w[k] = v
# Time passes, v is garbage collected (but not k), and then
# a new object is allocated at the same location as v
w[k] # Returns new object rather than nil

@jfirebaugh

BTW WeakHash has_a Hash not is_a Hash. Otherwise you need to override #values etc for correct semantics.

@jfirebaugh

Yeah, if I understand http://bit.ly/comdqj and http://bit.ly/aBd4x7 correctly, it's not possible to write a fully correct WeakHash using id2ref and finalizers. There's always a chance (tiny, but possible) that a new object will receive the same object id before the finalizer for the previous object id is run.

@miloops
Author

miloops commented Oct 13, 2010

Added a conditional check in get() that verifies the object's id matches the primary_key, to avoid running into that issue.

@jfirebaugh

id2ref could return any object at all in this situation. There's no guarantee that it's an object of the type you expect or that it responds to #id, or that you can do anything at all with it really.

IMHO, trying to avoid using WeakRef here is a bad idea. One of the reasons WeakRef exists is because there's no way to do something like this safely without it.
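
To illustrate the WeakRef suggestion, here is a minimal sketch of a value cache built on Ruby's standard `weakref` library. The class name `WeakValueCache` is hypothetical and not part of the gist; this is one possible shape under those assumptions, not a drop-in replacement for the WeakHash above.

```ruby
require 'weakref'

# Hypothetical sketch: a cache that holds its values through WeakRef
# instead of storing raw object ids and calling ObjectSpace._id2ref.
class WeakValueCache
  def initialize
    @cache = {}
  end

  # Returns the cached object, or nil if it was never stored or has
  # already been garbage collected.
  def [](key)
    ref = @cache[key]
    return nil unless ref
    ref.__getobj__
  rescue WeakRef::RefError
    # The referent is gone; drop the stale entry.
    @cache.delete(key)
    nil
  end

  def []=(key, value)
    @cache[key] = WeakRef.new(value)
  end

  def delete(key)
    @cache.delete(key)
  end

  def clear
    @cache.clear
  end
end
```

Because the lookup goes through the WeakRef rather than a bare object id, a dead reference surfaces as `WeakRef::RefError` instead of silently resolving to whatever object now owns the id.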

@miloops
Author

miloops commented Oct 13, 2010

Not really. In this case we are looking inside the WeakHash for that class, hash[:Post][1]. We are searching inside [:Post], so it's a Post.

@jfirebaugh

Not necessarily.

This is the situation I'm talking about:

w = WeakHash.new
k = 1
v = Post.new
w[k] = v

At this point it's true that w[k] == v. But what about just after v is garbage collected? At that point you have a race condition. There are several possibilities:

  • The @reclaim_value finalizer is called, and then w[k] is called. In that case w[k] returns nil.
  • w[k] is called before the @reclaim_value finalizer is run, and before any other object fills the space that was occupied by v. That's the case where ObjectSpace._id2ref raises RangeError; w[k] returns nil.
  • w[k] is called before the @reclaim_value finalizer is run and after another object (not necessarily a Post, it could be anything at all) fills the space that was occupied by v (and therefore gets the same object ID v had). Now w[k] returns that other object.

You cannot assume that the finalizer runs immediately after v is garbage collected -- other objects can be allocated in the meantime, and they might receive the same object_id as you have stored in @cache, even though they are not even Posts.
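
The three outcomes above can be walked through with a deterministic toy model. `ToyHeap` and `lookup` are hypothetical stand-ins invented for this sketch: slot numbers play the role of reusable object ids, and the real garbage collector and `ObjectSpace._id2ref` are not involved.

```ruby
# Toy model only: ids are slot numbers that can be reused, mimicking the
# address-based object ids discussed in this thread. Not real MRI behaviour.
class ToyHeap
  def initialize
    @slots = {}
  end

  # Place an object in a slot; the slot number acts as its object id.
  def allocate(slot, obj)
    @slots[slot] = obj
    slot
  end

  # Simulate garbage collection of whatever lives in the slot.
  def collect(slot)
    @slots.delete(slot)
  end

  # Stand-in for ObjectSpace._id2ref: raises RangeError for an empty slot.
  def id2ref(slot)
    @slots.fetch(slot) { raise RangeError, "#{slot} is a recycled object" }
  end
end

# Stand-in for WeakHash#[]: look up the stored id, then dereference it.
def lookup(heap, cache, key)
  id = cache[key]
  id && heap.id2ref(id)
rescue RangeError
  nil
end

heap  = ToyHeap.new
cache = { 1 => heap.allocate(7, :post) }  # key 1 maps to the id of a "Post"

heap.collect(7)                           # v is collected, finalizer not yet run
before_reuse = lookup(heap, cache, 1)     # second bullet: RangeError, so nil

heap.allocate(7, "unrelated string")      # another object takes over the id
after_reuse = lookup(heap, cache, 1)      # third bullet: the unrelated object

cache.delete(1)                           # the finalizer finally runs
after_finalizer = lookup(heap, cache, 1)  # first bullet: nil
```

Only the middle lookup is harmful: the cache still holds the old id, so it hands back whatever object now occupies that slot.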

@miloops
Author

miloops commented Oct 14, 2010

The first and second cases you describe are not problems: it is perfectly fine for the object to no longer be in the map. In that case nil is returned and the record is fetched again.

For the last case, we could compare not only obj.id but also check obj.class.name == class_name. That would guard against a different object being given the same object_id after the first one was GCed (a very tiny chance) and having the same id while being from a different class (a "very tiny" * 10000 chance).
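
A minimal sketch of that extra check, written as a standalone helper. The name `verified_record` and the `Post` struct are hypothetical and used only for illustration. The class comparison runs first, so an unrelated object that does not respond to `id` is rejected before `id` is called. This narrows the window described earlier in the thread but is still a heuristic, not a guarantee.

```ruby
# Hypothetical stand-in for a model class with an id attribute.
Post = Struct.new(:id)

# Returns obj only if it is of the expected class and has the expected id.
# The class check comes first so that obj.id is never called on an object
# of some other type.
def verified_record(obj, class_name, primary_key)
  return nil if obj.nil?
  return nil unless obj.class.name == class_name.to_s
  obj.id == primary_key ? obj : nil
end

post = Post.new(1)
matching    = verified_record(post, :Post, 1)         # the post itself
wrong_class = verified_record("stray", :Post, 1)      # nil, not a Post
wrong_id    = verified_record(Post.new(2), :Post, 1)  # nil, id differs
```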
