danielsdeleo/chef-3694-proposal.md Secret

## chef-3694-proposal.md

      
    Raw
  

              chef-3694-proposal.md
            
          
    Solving CHEF-3694

Problem(s)

Resource Naming Collisions

The main problem here is that something has to happen when the user
creates two resources with the same type and name:
log "danger" do
  message "deleting all the backups!"
end

log "danger" do
  message "restarting a service, you might get paged"
end

Currently, Chef will copy the first resource, omitting some elements,
but this is pretty confusing for non-power users (especially the
question about whether not_if and only_if guards get copied). It
also violates an implicit assumption that resources are unique by type
and name:
file "/tmp/whatever" do
  content rand.to_s
  notifies :write, "log[danger]"
end

What message does this print? How would you get it to print the other
one?
Resource Namespacing

Another problem we may want to address at the same time as CHEF-3694 is
that there is a single namespace for resources (the resource
collection). The example above (with the two log resources) is silly
because they're right next to each other, but as it stands now, they
could be in different cookbooks (perhaps community cookbooks for
different pieces of software, written by different authors and combined
by the user). In the current implementation, you probably won't notice
unless you're trying to notify the duplicated resource and your
notifications are going to the wrong resource. Depending on how the
primary question (what to do with resource collisions) is answered, we
may want to do something to reduce the "collision domain" for resources.
Multiple Resource Actions at Different Times

The current ("resource cloning") behavior is the way it is in order to
provide a feature like "notifications as a recipe-level construct." From
the original ticket:
service "monkeypants" do
  supports :reload => true
  action :enable
end

# configure "monkeypants"

notify :start, resources(:service => "monkeypants"), :immediately

Given that you can now notify resources where the notify-ee is defined
after the notify-er resource, the above example would be better written
as action :enable, :start on the service resource, which would be
declared after the templates, etc. that configure it.
There are more complex cases where this feature is handy. For example:
service "mysql" do
  action :stop
end

execute "mv /var/lib/mysql /mnt/mysql"

service "mysql" do
  action :start
end

This code is in the EC2 recipe for mysql. The mysql service may be
configured with particular stop/start/restart commands, etc., so this
can only work correctly when the service[mysql] resource knows about
those settings.
SRP Side Discussion

If we look at this problem with our OO-design hat on, we'll see that
this problem looks like a violation of the single responsibility
principle that's baked deeply into Chef. When we declare a resource in a
recipe, we're doing two things:

Creating a data structure describing some configurable aspect of the
system.
Adding an item to Chef's ordered list of configuration actions to
take.

We see this in resource declarations, where we specify an action on the
resource itself; we also see this in the resource collection, where we
have an ordered list of resources combined with their associated
actions. To use the cooking analogy, we've combined the ingredients list
with the preparation.
Thought Experiment

Let's imagine we've completely separated the ingredients from the
recipe. It could look like this:
# "ingredients.rb"

file "/tmp/foo" do
  content "blah"
  mode "0644"
  # action :nothing (implicit)
end

execute "run-a-command" do
  # action :nothing (implicit)
end

service "apache2" do
  restart_command "sv 1 apache2"
  # action :nothing (implicit)
end

# "recipe.rb"

converge(:create, "file[/tmp/foo]").and_if_changed do
  # These replace notifications
  converge(:execute, "execute[run-a-command]")
  converge(:reloade, "service[apache2]
end
This has some neat properties, but has big downsides:

For the 80% (90% ?) use case, it's more verbose than Chef is now,
since you only need this level of control over ordering in a small
subset of Chef use cases.
It breaks all Chef code ever.

Though I'd personally enjoy exploring this design space, these two
downsides are too enormous to seriously consider implementing the above.
That said, we can implement Chef-as-it-is-now on top of these ideas.
Proposed Solution

Separate resource declaration from adding action to the ordered list:

Internally separate resource declaration from
append-a-resource-to-the-todo-list. The current resource collection
will be split into a "resource set" and "action list."
Normal resource declarations create a resource, add it to the
"resource set", and insert a [resource, action] pair into the "action
list." action :nothing could be a shortcut for not inserting the
resource into the action list, or there could be some other means to
accomplish this, such as a generic declare_resource method.
Add a DSL method for appending a [resource, action] pair into the
"action list." This replaces the functionality provided by resource
cloning.

This leaves the question of what to do when conflicting resources are
declared. I am strongly in favor or this raising an error; however,
looking at the CHEF-3694 warnings in our own Chef runs, this could be
considered overly strict. Furthermore, for heavy users of community
cookbooks, it could be annoying to deal with conflicts between different
externally sourced cookbooks. If desired, this could be mitigated by
namespacing resources by cookbook. Duplicating a resource within a
cookbook would be an error, while duplicating a resource from a
different cookbook would work but possibly emit a warning. Resource
lookup (for notifications, etc.) would prefer a resource from the
current cookbook in the case of multiples. Additional syntax would be
added to fully qualify a resource for the purpose of notifying a
resource from an outside cookbook when duplicates are present.
Justification

There's obviously a much quick-n-dirtier solution to the
bug-as-reported, which is to create a notification resource. Here's why
I think we should do a deeper refactor:

As a matter of philosophy plus pragmatism, I want to embrace
reprogrammability
in a way that is stable, supportable, and has a lower barrier to entry
than we have now. Reprogrammability in Chef currently involves lots of
magic, code duplication, and (ab)use of internal code that can easily
be broken by an innocent refactor.
Splitting the resource collection's responsibilities into two
components opens lots of opportunities to fix issues and limitations
in the existing code:

The run_action hack is useful for some circumstances, but its
limitations can lead to running entire recipes during the compile
phase, for example to install build tools, database libraries, and
then a database gem. To some degree this problem could be addressed
by splitting cookbooks into multiple components, but this is
frustrated by the way run_lists are composed (you can't make a role
insert some items before and some items after the rest of the
run_list). By exposing the action list for user manipulation, it's
easy to write code to insert {resource, action} pairs in other
locations.
Nested Chef runs (recipe_eval in the deploy resource, LWRPs with
inline compilation enabled) can access resources in the main
resource set, but have their own action list. This means we can make
inline compilation the default for LWRPs with no downsides.


Design quality. We solve the problems in CHEF-3694 by making the
internal and external models of resource creation and action ordering
consistent with their intended use and design.