Skip to content

Instantly share code, notes, and snippets.

@richmolj
Last active December 18, 2015 14:08
Show Gist options
  • Save richmolj/5794539 to your computer and use it in GitHub Desktop.
Save richmolj/5794539 to your computer and use it in GitHub Desktop.
Dependency Injection
# Everyone should like some amount of Dependency Injection in Ruby.
# How much you like usually depends on how much importance you place on
# two competing values:
#
# 1. Build extensibly, loosely coupled components so you can properly
# adapt as your application scales. And things that start simple usually
# end up complex, so keep a solid foundation (read: solid OOP foundation
# in these concepts early on).
#
# and
#
# 2. Build the simplest thing possible, because YAGNI.
#
# I tend to agree with Jeff Atwood's perspective:
# http://www.codinghorror.com/blog/2004/10/kiss-and-yagni.html
# And with probably the seminal post on DI in ruby by Jamis Buck:
# http://weblog.jamisbuck.org/2008/11/9/legos-play-doh-and-programming
# And follow-up http://weblog.jamisbuck.org/2007/7/29/net-ssh-revisited
#
# DI isn't without benefits, but I feel proponents tend to turn a blind
# eye to the tradeoffs.
#
# DI is an obtrusive level of indirection
# Example from Objects on Rails:
class Post
def publish(clock = DateTime)
self.pubdate = clock.now
blog.add_entry(self)
end
end
# Set the published date on the post when adding to a blog.
# This is great because if we want to add an alternate strategy
# we just sub-out the injected class with a different one. For instance,
# publish after a review period:
class DelayClock
def now
DateTime.now + 24.hours
end
end
@post.publish(DelayClock.new)
# So that's the upside. The downside is our mental model goes from
# "A post becomes published in 24 hours after it is added" to
# "A blog entry could be a post, or something else. In the event it's
# a post, it will be published some period after saving. Look up the
# injected strategy to figure out when".
#
# This is only going to get more complex as we keep adding classes and
# treat everything as an abstraction. Abstraction == level of indirection
# == added complexity. I recommend looking at Net::SSH 1.1.4 (pointed
# out in the blog post above) for a larger, real-world example
#
# When this is a good thing
class Post
def initialize(backend)
@backend = backend
end
def all
@backend.all
end
end
Post.new(DB::Post).new
Post.new(Service::Post).new
# This is great because we have a subbable backend we can inject
# *and we actually need to use one or the other*. It's the difference
# between "we need to ways of finding Posts" and "we MIGHT need two ways
# of finding posts". Not only that, but the obtrusiveness of DI is actually a
# benefit: the code is screaming, "HEY! This is specially configured to support
# multiple strategies!"
#
# Service Locator Pattern
# I was disapponted Objects on Rails didn't touch on this, because a) it's the
# alternative to DI in acheiving the D in SOLID and b) it's more common in the
# ruby community.
#
# It's also usually less heavy-handed than DI. This post has a great
# example that progresses from DI to Service Locator:
# http://kresimirbojcic.com/2011/11/19/dependency-injection-in-ruby.html
#
# Let's say we want all Net::HTTP requests to have a configurable timeout. We could
# inject this:
class LongTimeout
def self.setup
http = Net::HTTP.new
http.read_timeout = 99999
http
end
end
SomeClass.new(LongTimeout.method(:setup))
# But we have to change the constructor to support this injection, and
# it's a little clunky. We can get the same modularily from
class Configuration
@settings = {}
def self.[](key)
@settings[key]
end
def self.[]=(key, value)
@settings[key] = value
end
end
Configuration.settings[:http] = proc {
http = Net::HTTP.new
http.read_timeout = 99999
http
}
class SomeClass
def query_google
Configuration[:http].get 'http://google.com'
end
end
# Which is basically the same thing as DI but less obtrusive and more commonly seen.
# This will also help when we want to use 3rd-party libraries that aren't expecting
# you to dependency inject everything. Carrierwave suggests overriding store_dir
# within an Uploader:
class MyUploader < CarrierWave::Uploader::Base
def store_dir
Configuration[:store_dir]
end
end
# Instead of subclassing MyUploader, modifying the constructor, and passing it
# in (or using a factory) we just reference Configuration. In addition,
# CarrierWave associates an Uploader with an ActiveRecord class by:
mount_uploader :avatar, AvatarUploader
# We don't have to look at mount_uploader and make sure our custom instance will
# work correctly. We just use Carrierwave like normal.
# This pattern is so popular you'll often see this abstracted in many gems:
# http://robots.thoughtbot.com/post/344833329/mygem-configure-block
#
# Service Locator vs DI
#
# Once again, I use DI when I WANT its obtrusiveness. If I really do have an
# abstracted domain where things are being subbed in and out, I want to call
# attention to this. If I just want a damn timeout or per_page setting, I'll
# use Service Locator.
#
# The Problem With Service Locator
#
# Configuration is a mutable global state, so if we want this value dependent
# on the request, we'll have clashes.
#
# Sidekiq (think multithreaded Resque) has this problem. It starts a bunch of
# workers in the same process. Each worker should be able to write to a common log,
# but it would be useful to include some metadata about the worker
# and about the job it's working on, so we can get something like this:
"worker 1 on job foo : Hello World!"
# It'd be pretty heavy-handed to pass around a custom logger to every class that would
# like to call Sidekiq.logger.info. This is how it's done (altered for brevity):
module Sidekiq
module Logging
class Pretty < Logger::Formatter
def call(message)
"#{Thread.current[:sidekiq_context]} : #{message}"
end
end
def self.with_context(msg)
begin
Thread.current[:sidekiq_context] = msg
yield
ensure
Thread.current[:sidekiq_context] = nil
end
end
end
end
Sidekiq::Logging.with_context("#{worker.class.to_s} JobID #{item['jid']}") do
Sidekiq.logger "Hello world!"
end
# with_context sets a thread local, a value only accessible to the current thread.
# we no longer have global mutable state, but we do have configuration we can reference
# anywhere. Take the same concept and apply to our Carrierwave example:
class ConfigurationMiddleware
def call(env)
begin
Thread.current[:__configuration__] = {:store_dir = something_via_request}
@app.call(env)
ensure
Thread.current[:__configuration__] = nil
end
end
end
class Configuration
def self.[](key)
Thread.current[:__configuration__][key]
end
end
class MyUploader < CarrierWave::Uploader::Base
def store_dir
Configuration[:store_dir]
end
end
# So now we have a Service Locator that *does not have mutable global state*.
# We have created a request local.
#
# Time Zones
#
# Rails takes the Thread.current idea and uses it without either pattern. This
# is a great example of how run-time configuration can be so simple you don't
# even have to think about it. Let's say we want time zones to be specific to
# the user. Straight from the Rails docs:
around_filter :set_time_zone
def set_time_zone
begin
old_time_zone = Time.zone
Time.zone = current_user.time_zone if logged_in?
yield
ensure
Time.zone = old_time_zone
end
end
# Underneath the hood you'll find (altered for simplicity):
def self.zone=(val)
Thread.current[:time_zone] = val
end
def self.zone
Thread.current[:time_zone]
end
# DI is often useful. But I don't usually find that the problem with a piece
# of legacy code is that it's well-written and understandable but tightly coupled.
# I do often find hard-to-navigate levels of indirection and abstraction (often
# without even bringing DI into the mix). That's why my experience has me value
# the KISS principle over dogmatic OOP. While both have their problems, I personally
# have less with the format than the latter, which is why I value it more.
@AndrewO
Copy link

AndrewO commented Jun 18, 2013

# Thread local variables are still global mutable state, they just happen to be local to the current thread.
# They don't have the same concurrency problems as $global variables, but they have the exact same visibility
# problems. Take the following code.

class KittenSidekicks
  def self.all
    kittens = load_a_bunch_of_cute_kittens
    # Using a funny spelling of "sidekick" so one else clobers it.
    Thread.current[:sidekiq_context] = kittens
    kittens
  end
end

@AndrewO
Copy link

AndrewO commented Jun 18, 2013

Re the CarrierWave example: we can do all this without a thread local global variable in a way
that leads to smaller classes with fewer responsibilities:

Step 1: Put the configuration on the Rack Env rather than a thread local

class ConfigurationMiddleware
  def call(env)
    env[:__configuration__] = {:store_dir = something_via_request}
    @app.call(env)
  end
end

class MyUploader < CarrierWave::Uploader::Base
  def store_dir
    model.store_dir
    # If this seems messy because you're wondering where "model" came
    # from, you're right: CarrierWave::Uploader::Base is a super
    # object pulling in 15 modules, one of which has an initialize
    # method and "model" accessor in it.
  end
end

class SomeClass < Struct.new(:store_dir, :avatar)
  # For the sake of demonstration, let's assume
  # this works just like ActiveRecord's mount command
  mount_uploader :avatar, :my_uploader
end

class MyApp
  def call(env)
    SomeClass.new(env[:__configuration__][:store_dir], some_input)
  end
end

Step 2: Split data access from data representation

If we abandon the idea that uploading needs to be done by what's otherwise a dumb data structure, we start to
centralize our stateful code.

Note that we also now have a domain object that can be reused in any context (console, tests, rake tasks, messaging, persistance) with no setup.

(ConfigurationMiddleware is the same as above and assumed to be in the rack stack. Forget all others.)

SomeClass = Struct.new(:avatar)

class MyUploader < CarrierWave::Uploader::Base
  # Turns out, CarrierWave defines store_dir as an accessor that
  # defaults to a class configuration.
end

class MyApp
  def call(env)
    data = SomeClass.new(some_input)
    get_uploader.call(
      env[:__configuration__][:store_dir]
    ).store!(date.avatar) # Or we could pass the uploader into a method on SomeClass
  end

  private
  def get_uploader(store_dir)
    MyUploader.new.tap {|u| u.store_dir = store_dir }
  end
end

Step 3: Set up the uploader upstream from the app

Why don't we just create the uploader in the place where it can vary?

(SomeClass and MyUploader are the same as the last example)

class SetupUploader
  def call(env)
    env[:avatar_uploader] = MyUploader.new.tap {|u| u.store_dir = something_via_request }
    @app.call(env)
  end
end

class MyApp
  def call(env)
    data = SomeClass.new(some_input)
    env[:avatar_uploader].store!(data.avatar)
    some_response
  end
end

The end result is that we have a few small classes that can be tested and reused in complete isolation, using only their parameters and method calls.

@ljsc
Copy link

ljsc commented Jun 18, 2013

I think I'm starting out pretty firmly in the +1 DI column. Response/thoughts inline below:

# lr> DI isn't without benefits, but I feel proponents tend to turn a
# lr> blind eye to the tradeoffs. DI is an obtrusive level of
# lr> indirection
#
# But DI has nothing to do with indirection, per se. You already have
# indirection b/c you are delegating to a collaborator. You can make the
# argument whether that is an appropriate use of indirection or not, but
# it has nothing to do with instanciating the dependency, which is
# what DI is all about.

# Example from Objects on Rails:

class Post
  def publish(clock = DateTime)
    self.pubdate = clock.now
    blog.add_entry(self)
  end
end

class DelayClock
  def now
    DateTime.now + 24.hours
  end
end
@post.publish(DelayClock.new)

# Granted that you are correct about the overhead of the delegation
# here; however, it's use seems entirely appropriate to me in this
# instance. Consider you mentioned SOLID a few paragraphs down. The
# "extra" abstraction in this example can be argued for citing the SRP,
# and the OCP, (and possibly LSP if you consider ducktypes subtypes).

# lr> The downside is our mental model goes from "A post becomes published
# lr> in 24 hours after it is added" to "A blog entry could be a post, or
# lr> something else. In the event it's a post, it will be published some
# lr> period after saving. Look up the injected strategy to figure out
# lr> when".

# Again, DI doesn't do anything to change the mental model. If you had
# hard coded the first line of the publish method to use DateTime
# directly, you'd still have the indirection.

# lr> This is only going to get more complex as we keep adding classes and
# lr> treat everything as an abstraction. Abstraction == level of indirection
# lr> == added complexity.
#
# I disagree. Abstraction does not mean added complexity, because
# abstraction is not the same thing as indirection. The whole purpose of
# creating an abstraction is to remove complexity, not add it. If the
# particular abstraction is complicating things, I would suggest that
# perhaps we have the wrong abstraction.
#
# In this case, I think it is appropriate. When you're looking at this
# code you shouldn't be thinking about pubdates at all. If you are, then
# you are trying to poke a hole in the abstraction---The post shouldn't
# care, that's the responsibity of the clock.
#
# Is there a cost to to this indirection? Sure. Obviously, there's a
# cost/benefit analysis that has to be taken up here, but I think that
# generally these types of abstractions are worth it because they enable
# composibility. Work out the interface for the collaboration, identify
# the invariants, then forget about the extra complexity b/c you've
# sealed that away in a black box.
#
# This will help us in our quest to have more reusable code.

class Post
  def initialize(backend)
    @backend = backend
  end

  def all
    @backend.all
  end
end
Post.new(DB::Post).new
Post.new(Service::Post).new

# lr> This is great because we have a subbable backend we can inject
# lr> *and we actually need to use one or the other*. It's the difference
# lr> between "we need to ways of finding Posts" and "we MIGHT need two ways
# lr> of finding posts". Not only that, but the obtrusiveness of DI is actually a
# lr> benefit: the code is screaming, "HEY! This is specially configured to support
# lr> multiple strategies!"
#
# In one hundred percent agreement here. I only contend that this way of
# being explicit about your dependencies is almost always worth the
# cost, IMNSHO.

## Service Locator Pattern

class Configuration
  @settings = {}

  def self.[](key)
    @settings[key]
  end

  def self.[]=(key, value)
    @settings[key] = value
  end
end

Configuration.settings[:http] = proc {
  http = Net::HTTP.new
  http.read_timeout = 99999
  http
}

class SomeClass
  def query_google
    Configuration[:http].get 'http://google.com'
  end
end

# lr> Which is basically the same thing as DI but less obtrusive and more commonly seen.

# Sure, but this forces you to only have one value for the
# configuration. There are some ways around it, as pointed out below,
# but it means having to deal with spooky action at a distance.

# lr> This will also help when we want to use 3rd-party libraries that aren't expecting
# lr> you to dependency inject everything.

# How so? Either they use DI or a service locator, it shouldn't have
# anything to do with how your code handles dependencies, no?

# lr> This pattern is so popular you'll often see this abstracted in many gems:
# lr> http://robots.thoughtbot.com/post/344833329/mygem-configure-block

# Agreed, it works for things that are truely one time settings. If it's
# really a configuration option for the system as a whole, it's a good
# pattern. But it certainly hampers the reusability of the code if you
# need more than one setting. See comments below.

## Service Locator vs DI
#
# The Problem With Service Locator
#
# lr> Configuration is a mutable global state, so if we want this value dependent
# lr> on the request, we'll have clashes.
#
# Yes. This not a minor drawback.
#
# lr> Sidekiq (think multithreaded Resque) has this problem. It starts a bunch of
# lr> workers in the same process. Each worker should be able to write to a common log,
# lr> but it would be useful to include some metadata about the worker
# lr> and about the job it's working on, so we can get something like this:

"worker 1 on job foo : Hello World!"

# lr> It'd be pretty heavy-handed to pass around a custom logger to every class that would
# lr> like to call Sidekiq.logger.info. This is how it's done (altered for brevity):

module Sidekiq
  module Logging
    class Pretty < Logger::Formatter
      def call(message)
        "#{Thread.current[:sidekiq_context]} : #{message}"
      end
    end

    def self.with_context(msg)
      begin
        Thread.current[:sidekiq_context] = msg
        yield
      ensure
        Thread.current[:sidekiq_context] = nil
      end
    end
  end
end


Sidekiq::Logging.with_context("#{worker.class.to_s} JobID #{item['jid']}") do
  Sidekiq.logger "Hello world!"
end

# This is not something I think we should be doing. The problem is
# that the `with_context` call is not guarenteed to be located lexically
# close to the point where you call the logger. First of all, you have
# no clue that the context call is actually modifying #logger without
# a priori knowledge of it's implementation. Secondly, whenever you see a
# call to logger, to know what it's doing you actually have to trace the
# call stack all the way back to the entry point of the program to find
# out where the configuration was last set. Moreover, this is impossible
# to do this statically.

# lr> with_context sets a thread local, a value only accessible to the current thread.
# lr> we no longer have global mutable state, but we do have configuration we can reference
# lr> anywhere... So now we have a Service Locator that *does not have
# lr> mutable global state*.  # We have created a request local.

# The variable most certainly is still global, we've just created a way
# to give it dynamic extent. Most modern languages implement
# lexical scoping over dynamic scoping b/c it's very easy to shoot
# yourself in the foot with the latter.

# lr> DI is often useful. But I don't usually find that the problem with a piece
# lr> of legacy code is that it's well-written and understandable but tightly coupled.
# lr> I do often find hard-to-navigate levels of indirection and abstraction (often
# lr> without even bringing DI into the mix). That's why my experience has me value
# lr> the KISS principle over dogmatic OOP. While both have their problems, I personally
# lr> have less with the format than the latter, which is why I value it more.

# I agree that you can go overboard with countless levels of
# indirection, but in my experiance the more common problem is using the
# wrong/non-sensible abstractions. DI encourages thinking about the
# contracts between collaborating objects, which I think more often than
# not encourages better factored code.

@richmolj
Copy link
Author

@ljsc - I'll try to address this more in-depth, but I think

http://therealadam.com/2013/01/03/design-for-test-vs-design-for-api

sums it my thoughts. I don't care too much about his test-vs-api point, but I do strongly agree with his examples that DI adds complexity (side point: we have different definitions of 'indirection', this illustrates what mine is) and is therefore often not worth it.

I expect you disagree with the same example, which is fine as this is pretty subjective. How strongly you agree/disagree with his point likely has a high correlation with each point above.

@ljsc
Copy link

ljsc commented Jun 18, 2013

Sure, I think that's right. I guess it's all what you optimize for, and my preference is to aim for composability wherever reasonably possible. But I think that post offers up some nice points for enriching the discussion. Namely, what do we think of the descriptions the author offers of his own API examples? Personally I think he botched it, and here's why:

  1. This one is definitely arguable, but I think that he's digging too much into implementation details. Here:

Publishes the current post. The created_at timestamp is set to the current time. Returns the created_at timestamp.

should setting the created_at timestamp be documented here? I'm skeptical. If our goal is to document the api, we shouldn't be describing the guts of the method itself. What happens when we change the implementation? This mindset leads to the abstraction hole-poking I was describing above.
2. Failure to identify and document types (abstractions):

Publishes the current post. By default, created_at is set to the current time. Optionally, callers may pass in a Time object, or any object that returns a Time object when sent the now message. The created_at column is set according to that Time value. Returns the value of the created_at timestamp.

This is the wrong place to be describing the interface for the parameter. That's why you have types. We should just say it takes a Timeish and be done with it. Elsewhere in the documentation, you say what a Timeish does. This might seem like splitting hairs, but I think it's actually a really important distinction.

It speaks to I think the problem with Ruby's dynamic nature---yes it's convenient and can help get things done, but it also tends to lure programmers into the trap that there are no types at all. Even though there aren't any annotations, the types are still there, and not thinking about them leads to ignoring the contracts, and thus abstraction Swiss cheese.

This is normally the point where somebody says, hey duck-types to the rescue. This never really prevents things from running off the tracks though, b/c rather than documenting unambiguously somewhere in the system that a duck #quacks, and it #swims, we instead just throw the duck in the water and see if it drowns. See:

There’s some further potential trouble lurking in this API. What if a caller passes in the wrong kind of Time object? What if sending the now message raises an exception?

When I'm looking at any level of abstraction, I want to know what the inputs are to the thing, what the outputs are, and most importantly the invariants. I suggest that those are the most important items for being able to efficiently reason about code. Having DI makes us think about our dependencies, where they're used, and what we expect from them, explicitly.

Gregory Brown gives a really good look into some of these issues in Practicing Ruby article. It's a really good read, so I'll finish by offering that up and then step back and let some others offer thoughts.

@richmolj
Copy link
Author

@AndrewO - I had a response I accidentally deleted, because the iPhone has taught me to immediately click OK whenever I get any sort of confirmation pop-up. In any case:

The Kitten example is true, but I can also have conflicting middleware that both set the same key on env, or conflicting gem modules that both add the same method name. In any case, gems like Rails, Sunspot, Authlogic, Paper Trail, Sidekiq, etc are all using this same pattern. It's not stopping them from being widely used.

For the Carrierwave example, I feel like we're talking about different things. Yes, I can add middleware/initialization, use the env hash. And also add some fake env setup/handling for when we're not running through Rack (console, cron job, etc). My point was: I just want to use Carrierwave, not reconstruct my app and learn the innards of Carrierwave.

I believe you note this when referring to how 'model' comes out of nowhere. Whether we like Carrierwave's implementation seems irrelevant to me. It's an example of a popular gem we may want to use, and Joe Developer now needs to open up the source code and figure out how to rejigger the system, instead of just referring to Configuration and moving on.

@richmolj
Copy link
Author

I had a side-discussion with Lou, which I think is the best, most concise summary of each viewpoint. Please see https://gist.github.com/richmolj/5815196, I think it really distills the debate to its core concepts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment