ravibhure/Chef Metal_Configuration_and_Drivers.md

## Chef Metal_Configuration_and_Drivers.md

      
    Chef Metal, Configuration and Drivers

As Chef Metal approaches 1.0, we've landed a huge configuration and driver interface improvement intended to enable:

A standard way to specify credentials and keys that keeps them out of recipes and allows them to be used in multiple places
External commands (like "metal execute") that can look up information and manage a node independent of the original Metal recipe
Environmental and directory-specific configuration
Make the drivers easily usable in test-kitchen and knife

This post covers the Driver interface from three angles: how users configure drivers, how driver implementors write them, and how provisioning programs such as the machine resource or test-kitchen call them.
Configuring and using Metal drivers

Metal's machine resource has not changed, but the way you specify drivers has changed significantly. No longer are you encouraged to specify the driver name and credentials in the recipe (though it is still possible); instead, it is preferred to configure this information through Chef configuration files and the environment.
Basic recipes

In a recipe, you use the machine and machine_batch resources to manipulate machines:
machine 'webserver' do
  recipe 'apache'
end
You'll notice you don't specify where the machine should be or what OS it should have.  This configuration happens in the environment, configuration files or in recipes, described below.
(There are many things you can do with the machine resource, but we'll cover that in another post.)
Installing drivers

chef-metal drivers are generally named chef-metal-<drivername>.  To install one, use gem install (for instance, gem install chef-metal-docker).  Fog and Vagrant come pre-installed when you install chef-metal.
Using a driver

To specify where the machine should be (AWS, Vagrant, etc.), you need a driver. There are several drivers out there, including:

Fog (which connects with AWS EC2, OpenStack, DigitalOcean and SoftLayer)
IBM VSphere
Vagrant (VirtualBox and VMWare Fusion)
LXC
Docker
Raw SSH (with a list of already-provisioned servers)

(Note: as of this writing, only Fog and Vagrant are up to date with the new Driver interface, but that will change very quickly.)
Setting the driver with a driver URL

The driver you want is specified by URLs.  The first part of the URL, the scheme, identifies the Driver class that will be used.  The rest of the URL uniquely identifies the account or location the driver will work with.  Some examples of driver URLs:

fog:AWS:default: connect to the AWS default profile (in ~/.aws/config)
fog:AWS:123514512344: connect to the AWS account # 123514512344
vagrant: a vagrant directory located in the default location (<configuration directory>/vms)
vagrant:~/machinetest: a vagrant directory at ~/machinetest
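As a rough illustration of how a driver URL breaks down: the scheme is everything before the first colon, and the remainder is left for the driver to interpret. The helper name split_driver_url is made up for this sketch and is not part of chef-metal:

```ruby
# Hypothetical helper (not a chef-metal API): split a driver URL into
# its scheme and the driver-specific remainder.
def split_driver_url(driver_url)
  scheme, rest = driver_url.split(':', 2)
  [scheme, rest]
end

split_driver_url('fog:AWS:default')        # scheme "fog", rest "AWS:default"
split_driver_url('vagrant:~/machinetest')  # scheme "vagrant", rest "~/machinetest"
split_driver_url('vagrant')                # scheme "vagrant", rest nil
```

Splitting with a limit of 2 matters here, because the remainder can itself contain colons (as in fog:AWS:default).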

To set the driver that will be used by default, you can place the following in your Chef or Knife config (such as .chef/knife.rb):
local_mode true
log_level :debug
driver 'vagrant:~/machinetest'
You can also set the CHEF_DRIVER environment variable:
CHEF_DRIVER=fog:AWS:default chef-client -z my_cluster.rb

Driver options (credentials)

Driver options contain the credentials and necessary information to connect to the driver.
To specify driver_options, put something like this in your Chef config:
driver 'fog:AWS:default'
driver_options :aws_profile => 'jkeiser_work'
If you alternate between many drivers, you can also set options that are "glued" to a specific driver by putting this in your Chef config:
# Parentheses are needed so Ruby parses the braces as a hash, not a block
drivers({
  'fog:AWS:123445315112' => {
    :driver_options => {
      :aws_profile => 'jkeiser_work'
    }
  }
})
machine_options can be attached to a specific driver in the same way.  We'll talk about those more later.
There will be easier ways to specify this as Chef profiles and configuration evolve in the near future, as well.
Machine options

Machine options can be specified in Chef configuration or in recipes.  In Chef config, it looks like this:
driver 'vagrant:'
# This will apply to all machines that don't override it
machine_options :vagrant_options => {
  :bootstrap_options => {
    'vm.box' => 'precise64'
  }
}
And with the with_machine_options directive to affect multiple machines:
with_driver 'vagrant:'

with_machine_options :vagrant_options => {
  'vm.box' => 'precise64'
}

machine 'webserver' do
  recipe 'apache'
end
machine 'database' do
  recipe 'mysql'
end
Or directly on the machines:
machine 'webserver' do
  driver 'vagrant:'
  machine_options :vagrant_options => {
    'vm.box' => 'precise64'
  }
  recipe 'apache'
end
This sort of mixing of physical and logical location is often not advisable, but there are situations where it's expedient or even required, so it's supported.
NOTE: with_machine_options can also take a do block that will apply to all machines inside it.
As before, you can even attach options to specific drivers (defaults for specific drivers and accounts can be useful):
# Parentheses are needed so Ruby parses the braces as a hash, not a block
drivers({
  'fog:AWS:123445315112' => {
    :driver_options => {
      :aws_profile => 'jkeiser_work'
    },
    :machine_options => {
      :bootstrap_options => {
        :region => 'us-east-1'
      }
    }
  },
  'vagrant:/Users/jkeiser/vms' => {
    :machine_options => {
      :vagrant_options => {
        'vm.box' => 'precise64'
      }
    }
  }
})
Using Chef profiles

You can set the CHEF_PROFILE environment variable to identify the profile you want to load.
In Chef config:
# Parentheses are needed so Ruby parses the braces as a hash, not a block
profiles({
  'default' => {
  },
  'dev' => {
    :driver => 'vagrant:',
    :machine_options => {
      :vagrant_options => {
        'vm.box' => 'precise64'
      }
    }
  },
  'test' => {
    :driver => 'fog:AWS:test',
    :machine_options => {
      :bootstrap_options => {
        :flavor_id => 'm1.small'
      }
    }
  },
  'staging' => {
    :driver => 'fog:AWS:staging',
    :driver_options => {
      :bootstrap_options => {
        :flavor_id => 'm1.small'
      }
    }
  }
})
This will get better tooling and more integrated Chef support in the future, but it is a good start.  You can set the current profile using the CHEF_PROFILE environment variable:
CHEF_PROFILE=dev chef-client -z my_cluster.rb
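To make the idea concrete, here is a minimal sketch of how a selected profile could override top-level settings. It assumes a simple shallow merge; apply_profile is a hypothetical helper, and the real profile handling in Cheffish may behave differently (for example, by merging nested hashes):

```ruby
# Hypothetical profile resolution (an assumption, not the Cheffish
# implementation): values in the selected profile override the
# top-level configuration.
def apply_profile(config, profile_name)
  profile = (config[:profiles] || {})[profile_name] || {}
  config.merge(profile)
end

config = {
  :driver => 'vagrant:',
  :profiles => {
    'default' => {},
    'test' => { :driver => 'fog:AWS:test' }
  }
}

# Fall back to the 'default' profile when CHEF_PROFILE is not set
profile_name = ENV['CHEF_PROFILE'] || 'default'
effective = apply_profile(config, profile_name)
effective[:driver]
```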

The Driver interface

The Driver interface is a set of four objects that allow provisioning programs to communicate with drivers:

Driver: Represents a "machine warehouse"--an AWS account, a set of vagrant machines, a PXE machine registry. You can ask it for new machines, power machines on and off, and get rid of machines you are no longer using.
Machine: Represents a ready, connectable machine.  The machine interface lets you run commands, upload and download files, and converge recipes.  This is returned by Driver methods that create and connect to machines.
MachineSpec: Represents the saved information about a Machine.  Drivers use this to save information about how to locate and manipulate individual machines (like the AWS instance ID, PXE MAC address, or Vagrantfile location).
ActionHandler: This is how Metal communicates back to the host provisioning program (like the machine resource, test-kitchen, or knife/metal command line). Metal primarily uses it to report the actions it performs and their progress, so that the host can print pretty output informing the user.

Writing Drivers

When you need to access a new PXE or cloud service, you need to write a new Driver. (For cloud services, often modifying chef-metal-fog will be sufficient rather than creating a whole new driver.)
Picking a URL scheme

Every driver instance must be identified uniquely by a URL.  This generally describes where the list of servers lives.  For cloud providers this will generally be an account or a server. For VMs and containers it will either be a directory or global to the machine.
Example URLs from real drivers:
fog:AWS:1231241212          # account ID (canonical)
fog:AWS:myprofile           # profile in ~/.aws/config
fog:AWS                     # implies default profile
vagrant:/Users/jkeiser/vms  # path to vagrant vm (canonical)
vagrant:~/vms               # path to vagrant vm (non-canonical)
vagrant                     # implies <chef config dir>/vms

The bit before the colon--the scheme--is the identifier for your driver gem.
Some of these URLs are canonical and some are not.  When you create a driver with one of these URLs, the driver_url on the resulting driver must be the canonical URL.  For example, ChefMetal.driver_for_url("fog:AWS").driver_url would equal "fog:AWS:12312412312" (or whatever your account is).  This is important because the canonical URL is what gets stored with the machine, and it may be used by different people on different workstations with different profile names.
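A minimal sketch of what canonicalization could look like for vagrant-style URLs, where the canonical form is an absolute path. The helper canonical_vagrant_url and its config_dir argument are assumptions made for illustration; this is not necessarily how chef-metal-vagrant does it:

```ruby
# Hypothetical canonicalizer for vagrant-style driver URLs.
# config_dir stands in for the Chef configuration directory.
def canonical_vagrant_url(driver_url, config_dir)
  _scheme, path = driver_url.split(':', 2)
  # A bare "vagrant" or "vagrant:" implies <config dir>/vms
  path = File.join(config_dir, 'vms') if path.nil? || path.empty?
  # expand_path resolves "~" and ".." so equivalent URLs compare equal
  "vagrant:#{File.expand_path(path)}"
end

canonical_vagrant_url('vagrant', '/home/me/.chef')
canonical_vagrant_url('vagrant:/srv/vms/../vms', '/unused')
```

Two users who write the URL differently (a bare vagrant versus the full path) then end up with the same driver_url, which is the point of having a canonical form.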
from_url

To instantiate the driver, you must implement Driver.from_url.  This method's job is to canonicalize the URL, and to make an instance of the Driver.  For example:
require 'chef_metal/driver'

class MyDriver < ChefMetal::Driver
  def self.from_url(url, config)
    MyDriver.new(url, config)
  end

  def initialize(url, config)
    super(url, config)
  end

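  def cloud_url
    # Everything after the scheme, e.g. "http://the_ultimate_server.com:8080"
    _scheme, cloud_url = driver_url.split(':', 2)
    cloud_url
  end

  def the_ultimate_cloud
    # Keys are symbols, matching how driver_config is set in the Chef config
    TheUltimateCloud.connect(cloud_url, driver_config[:username], driver_config[:password])
  end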
end
driver_config and credentials

As you can see in the previous example, driver_config is where credential information is passed to your driver. It ultimately comes from config[:driver_config] passed to the from_url method. For example, our hypothetical driver could allow the user to specify this in their Chef config:
driver 'mydriver:http://the_ultimate_server.com:8080'
driver_config :username => 'me', :password => 'mypassword'
This is the standard place for users to put credentials.  It is a freeform hash, so you should document what keys you expect users to put there to help you connect.
Please feel free to work with any files or environment variables that drivers typically support (like ~/.aws/config), so that you can share configuration with standard tools for that cloud/VM/whatever.
allocate_machine

Allocate machine is the first method called when creating a machine.  Its job is to reserve the machine, and to return quickly.  It may start the machine spinning up in the background, but it should not block waiting for that to happen.
allocate_machine takes an action_handler, machine_spec, and a machine_options argument.  action_handler is where the method should report any changes it makes.  machine_spec.location will contain the current known machine information, loaded from persistent storage (like from the node).  machine_options contains the desired options for creating the machine.  Both machine_spec.location and machine_options are freeform hashes owned by the driver.  You should document what options the user can pass in your driver's documentation.
By the time the method is finished, the machine should be reserved and its information stored in machine_spec.location.  If it is not feasible to do this quickly, then it is acceptable to defer this to ready_machine.
  def allocate_machine(action_handler, machine_spec, machine_options)
    if machine_spec.location
      if !the_ultimate_cloud.server_exists?(machine_spec.location['server_id'])
        # It doesn't really exist
        action_handler.perform_action "Machine #{machine_spec.location['server_id']} does not really exist.  Recreating ..." do
          machine_spec.location = nil
        end
      end
    end
    if !machine_spec.location
      action_handler.perform_action "Creating server #{machine_spec.name} with options #{machine_options}" do
        private_key = get_private_key('bootstrapkey')
        server_id = the_ultimate_cloud.create_server(machine_spec.name, machine_options, :bootstrap_ssh_key => private_key)
        machine_spec.location = {
          'driver_url' => driver_url,
          'driver_version' => MyDriver::VERSION,
          'server_id' => server_id,
          'bootstrap_key' => 'bootstrapkey'
        }
      end
    end
  end
In all methods, you should wrap any substantive changes in action_handler.perform_action.  Progress can be reported with action_handler.report_progress.  NOTE: action_handler.perform_action will not actually execute the block if the user passed --why-run to chef-client.  Why Run mode is intended to simulate the actions it would perform, but not actually perform them.
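The why-run behavior can be sketched with a small stand-in class. SketchActionHandler is invented for this example and is not the chef-metal ActionHandler; it only mimics the contract described above, where the block is skipped when the run is a simulation:

```ruby
# Stand-in for an action handler; NOT the chef-metal class.
class SketchActionHandler
  attr_reader :log

  def initialize(why_run)
    @why_run = why_run
    @log = []
  end

  # Record the action; only run the block when not in why-run mode.
  def perform_action(description)
    if @why_run
      @log << "Would #{description}"
      nil
    else
      result = yield
      @log << description
      result
    end
  end
end

created = []

real = SketchActionHandler.new(false)
real.perform_action('create server web1') { created << 'web1' }

dry = SketchActionHandler.new(true)
dry.perform_action('create server web2') { created << 'web2' }

created  # only "web1" was actually created
```

This is why any code with side effects belongs inside the block: anything outside it still runs under --why-run.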
If you notice the user wants the machine to be different than it is now--for example, to have more RAM or disk or processing power--you should either safely move the data over to a new instance, or warn the user that you cannot fulfill their desire.
Working with private keys

You'll notice the service is passed a private key for bootstrap.  This is the bootstrap key, and in our example, TheUltimateCloud will allow you to ssh to the machine as the root user using that private key after it is bootstrapped.  (Several cloud services already work this way.)
The difficulty is that the user needs to be able to pass you these keys.  chef-metal introduces the configuration variables :private_keys and :private_key_paths to allow users to tell us about their keys.  We then refer to the keys by name (rather than path) in drivers, and look them up from configuration.
Here is what the get_private_key method looks like:
  def get_private_key(name)
    if config[:private_keys] && config[:private_keys][name]
      # A String value is a path to the key file; anything else is a key object
      if config[:private_keys][name].is_a?(String)
        IO.read(config[:private_keys][name])
      else
        config[:private_keys][name].to_pem
      end
    elsif config[:private_key_paths]
      # Look for a file called <name> or <name>.pem in each configured directory
      config[:private_key_paths].each do |private_key_path|
        Dir.entries(private_key_path).each do |key|
          ext = File.extname(key)
          if ext == '' || ext == '.pem'
            key_name = key[0..-(ext.length+1)]
            if key_name == name
              return IO.read("#{private_key_path}/#{key}")
            end
          end
        end
      end
      nil # no key with that name was found (otherwise each would return the path list)
    end
  end
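For experimenting with the directory lookup outside a driver, here is a standalone variation. It takes config as an argument instead of reading it from the driver, covers only the :private_key_paths case, and skips directories that do not exist; find_private_key is a name chosen for this sketch:

```ruby
require 'tmpdir'

# Standalone variation on the lookup above (hypothetical helper).
# Returns the key contents, or nil when no matching file exists.
def find_private_key(config, name)
  (config[:private_key_paths] || []).each do |dir|
    next unless File.directory?(dir)
    # Accept both "<name>" and "<name>.pem"
    ['', '.pem'].each do |ext|
      path = File.join(dir, name + ext)
      return IO.read(path) if File.file?(path)
    end
  end
  nil
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'bootstrapkey.pem'), 'FAKE KEY DATA')
  config = { :private_key_paths => [dir] }
  find_private_key(config, 'bootstrapkey')  # contents of bootstrapkey.pem
  find_private_key(config, 'missing')       # nil
end
```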
ready_machine

ready_machine is the other half of the machine creation story. This method will do what it needs to bring the machine up. When the method finishes, the machine must be warm and connectable.  ready_machine returns a Machine object.  An example:
  def ready_machine(action_handler, machine_spec, machine_options)
    server_id = machine_spec.location['server_id']
    if the_ultimate_cloud.machine_status(server_id) == 'stopped'
      action_handler.perform_action "Powering up machine #{server_id}" do
        the_ultimate_cloud.power_on(server_id)
      end
    end

    if the_ultimate_cloud.machine_status(server_id) != 'ready'
      action_handler.perform_action "wait for machine #{server_id}" do
        the_ultimate_cloud.wait_for_machine_to_have_status(server_id, 'ready')
      end
    end

    # Return the Machine object
    machine_for(machine_spec, machine_options)
  end
ready_machine takes the same arguments as allocate_machine, and machine_spec.location will contain any information that was placed in allocate_machine.
Creating the Machine object

The Machine object contains a lot of the complexity of connecting to and configuring a machine once it is ready.  Happily, most of the work is already done for you here.
require 'chef_metal/transport/ssh_transport'
require 'chef_metal/convergence_strategy/install_cached'
require 'chef_metal/machine/unix_machine'

  def machine_for(machine_spec, machine_options)
    server_id = machine_spec.location['server_id']
    hostname = the_ultimate_cloud.get_hostname(server_id)
    ssh_options = {
      :auth_methods => ['publickey'],
      # :key_data takes key contents; :keys would expect file paths
      :key_data => [ get_private_key('bootstrapkey') ],
    }
    transport = ChefMetal::Transport::SSHTransport.new(hostname, ssh_options, {}, config)
    convergence_strategy = ChefMetal::ConvergenceStrategy::InstallCached.new(machine_options[:convergence_options])
    ChefMetal::Machine::UnixMachine.new(machine_spec, transport, convergence_strategy)
  end
WindowsMachine and WinRMTransport are also available for Windows machines.  You can look at how these are instantiated in the chef-metal-vagrant driver.
destroy_machine

The destroy_machine function is fairly straightforward:
  def destroy_machine(action_handler, machine_spec, machine_options)
    if machine_spec.location
      server_id = machine_spec.location['server_id']
      action_handler.perform_action "Destroy machine #{server_id}" do
        the_ultimate_cloud.destroy_machine(server_id)
        machine_spec.location = nil
      end
    end
  end
stop_machine

Same with stop_machine:
  def stop_machine(action_handler, machine_spec, machine_options)
    if machine_spec.location
      server_id = machine_spec.location['server_id']
      action_handler.perform_action "Power off machine #{server_id}" do
        the_ultimate_cloud.power_off(server_id)
      end
    end
  end
connect_to_machine

This method should return the Machine object for a machine, without spinning it up.  Because of how we coded ready_machine, we can just do this:
  def connect_to_machine(machine_spec, machine_options)
    machine_for(machine_spec, machine_options)
  end
Creating the init file

Drivers are automatically loaded based on their driver URL.  The way Metal does this is by extracting the scheme from the URL, and then doing require 'chef_metal/driver_init/schemename'. So for our driver to load when driver is set to mydriver:http://theultimatecloud.com:80, we need to create a file named chef_metal/driver_init/mydriver.rb that looks like this:
require 'chef_metal_mydriver/mydriver'
ChefMetal.register_driver_class("mydriver", ChefMetalMyDriver::MyDriver)
After this require, chef-metal will call ChefMetalMyDriver::MyDriver.from_url('mydriver:http://theultimatecloud.com:80', config) and will have a driver!
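The registration and lookup flow can be sketched as follows. SketchMetal and SketchDriver are stand-ins written for this example; the real chef-metal code also performs the require step described above, which this sketch leaves out:

```ruby
# Stand-in registry; chef-metal's real implementation may differ.
module SketchMetal
  @driver_classes = {}

  def self.register_driver_class(scheme, klass)
    @driver_classes[scheme] = klass
  end

  def self.driver_for_url(driver_url, config)
    scheme = driver_url.split(':', 2).first
    klass = @driver_classes[scheme]
    raise ArgumentError, "No driver registered for scheme '#{scheme}'" unless klass
    klass.from_url(driver_url, config)
  end
end

# Stand-in driver with the same from_url entry point as the example above
class SketchDriver
  attr_reader :driver_url, :config

  def self.from_url(driver_url, config)
    new(driver_url, config)
  end

  def initialize(driver_url, config)
    @driver_url = driver_url
    @config = config
  end
end

SketchMetal.register_driver_class('mydriver', SketchDriver)
driver = SketchMetal.driver_for_url('mydriver:http://theultimatecloud.com:80', {})
driver.driver_url  # the URL that was passed in
```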
Publishing it all as a gem

For users to actually use your driver, you need to release it as a gem on rubygems.org, so that people can run gem install chef-metal-mydriver.  Instructions for publishing a gem are available on rubygems.org.
Parallelism!  allocate_machines

By default Chef Metal provides parallelism on top of your driver by calling allocate_machine and ready_machine in parallel threads.  But many providers have interfaces that let you spin up many machines at once.  If you have one of these, you can implement the allocate_machines method.  It takes the action_handler you love and know, plus a specs_and_options hash (keys are machine_spec and values are machine_options), and a parallelizer object you can optionally use to run multiple ruby blocks in parallel.
  def allocate_machines(action_handler, specs_and_options, parallelizer)
    private_key = get_private_key('bootstrapkey')
    servers = []
    server_names = []
    specs_and_options.each do |machine_spec, machine_options|
      if !machine_spec.location
        servers << [ machine_spec.name, machine_options, :bootstrap_ssh_key => private_key]
        server_names << machine_spec.name
      end
    end

    # Tell the cloud API to spin them all up at once
    action_handler.perform_action "Allocating servers #{server_names.join(',')} from the cloud" do
      the_ultimate_cloud.create_servers(servers)
    end
  end
You can also implement ready_machines, destroy_machines and stop_machines.
Using Metal drivers directly in your programs

There are many programs that could benefit from creating and manipulating machines with Metal.  For example, the machine and machine_batch resources in Chef recipes, test-kitchen, and knife all use the Metal Driver interface for provisioning.  This is an explanation of how the Driver interface is used.
Configuration

The fundamental bit of Metal is the configuration, which is passed in to the driver.  This is a hash, with symbol keys for the important top level things:
{
  :driver => 'fog:AWS:default',
  :driver_options => { <credentials here, if you must> },
  :machine_options => { <options here> },
  :chef_server_url => 'https://api.opscode.com/organizations/myorg',
  :node_name => 'jkeiser', # Client or username to connect to Chef server
  :client_key => '/Users/jkeiser/.chef/keys/jkeiser.pem'
}
Getting the Chef config

To get the Chef config, you can use this code:
require 'chef/config'
require 'chef/knife'
require 'chef/config_fetcher'
require 'cheffish'

chef_config = begin
  Chef::Config.config_file = Chef::Knife.locate_config_file
  config_fetcher = Chef::ConfigFetcher.new(Chef::Config.config_file, Chef::Config.config_file_jail)
  if Chef::Config.config_file.nil?
    Chef::Log.warn("No config file found or specified on command line, using command line options.")
  elsif config_fetcher.config_missing?
    Chef::Log.warn("Did not find config file: #{Chef::Config.config_file}, using command line options.")
  else
    config_content = config_fetcher.read_config
    config_file_path = Chef::Config.config_file
    begin
      Chef::Config.from_string(config_content, config_file_path)
    rescue Exception => error
      Chef::Log.fatal("Configuration error #{error.class}: #{error.message}")
      filtered_trace = error.backtrace.grep(/#{Regexp.escape(config_file_path)}/)
      filtered_trace.each {|line| Chef::Log.fatal("  " + line )}
      Chef::Application.fatal!("Aborting due to error in '#{config_file_path}'", 2)
    end
  end
  Cheffish.profiled_config # This adds support for Chef profiles
end
This will handle everything including environment variables.
You may also want to do this:
Chef::Config.local_mode true
If you have your own configuration mechanism, you can either merge it with the Chef config using Cheffish::MergedConfig.new(my_config, chef_config), or just pass it directly and ignore Chef.
Respecting local mode

If you want to work with local mode (spin up a chef-zero server), you will need to spin it up.  You can use this code to do that:
require 'chef/application'

Chef::Application.setup_server_connectivity
The Chef server URL will be in Chef::Config.chef_server_url.
Use this code to stop it when you are done with it:
Chef::Application.destroy_server_connectivity
Listening to Metal: implementing ActionHandler

ActionHandler is how Metal communicates back to your application. It will report progress and tell you when it updates things, so that you can print that information to the user (whether it be to the console or to a UI). To create an ActionHandler, you implement these methods:
require 'chef_metal/action_handler'

class MyActionHandler < ChefMetal::ActionHandler
  # Loads node (which is a hash with a bunch of attributes including 'name')
  def initialize(name, my_storage)
    @node = my_storage.load(name) || { 'name' => name }
    super(@name)
    @my_storage = my_storage
  end

  # Globally unique identifier for this machine.  For Chef, we use
  # <chef_server_url>/nodes/#{name}.  Does not have to be a URL.
  def id
    "#{@my_storage.url}/#{name}"
  end

  def save(action_handler)
    # much-vaunted idempotence
    if @my_storage.node_is_different(name, @node)
      action_handler.perform_action "save #{name} to storage" do
        @my_storage.save(@node)
      end
    end
  end
end
Storing machine data: implementing MachineSpec

MachineSpec is the way you communicate the persisted state of a machine to metal (including save and load).
MachineSpec has a save() method that saves the machine location data (like its instance ID or Vagrantfile) to persistent storage for later retrieval. For chef-client, this location is a Chef node. For other applications, you may prefer to store this sort of persistent data elsewhere (test-kitchen has its own server state storage). To do that, you will override MachineSpec and implement the save method (as well as create a method to instantiate YourMachineSpec by loading it back in).
In many Chef-centric cases, you will be OK with just storing the nodes in the Chef server.  If so, you can use ChefMachineSpec to take care of saving and loading:
require 'cheffish'
require 'chef_metal/chef_machine_spec'

chef_server = Cheffish.chef_server_for(config)
machine_spec = ChefMetal::ChefMachineSpec.new(machine_name, chef_server)
Instantiating a driver

When you want to work with machines, you need a driver.  There are two principal reasons to get a driver.  First, for connect, stop and destroy type operations, you may want to work with an existing machine, defined by a machine_spec.  Second, to create a desired machine (allocate_machine and ready_machine), you will want to create a driver straight from configuration or from a driver URL.
To get a driver from config:
require 'chef_metal'
driver = ChefMetal.driver_for_url(chef_config[:driver], chef_config)
To get a driver from a machine spec:
if machine_spec.driver_url
  driver = ChefMetal.driver_for_url(machine_spec.driver_url, chef_config)
end
Creating a machine

To create a machine, you do this:
machine_options = ChefMetal.config_for_url(driver.driver_url, chef_config)
# allocate_machine and ready_machine are Driver methods, so call them on the driver
driver.allocate_machine(action_handler, machine_spec, machine_options)
driver.ready_machine(action_handler, machine_spec, machine_options)
Creating multiple machines in parallel

driver = ChefMetal.driver_for_url(chef_config[:driver], chef_config)
machine_options = ChefMetal.config_for_url(driver.driver_url, chef_config)
machine_options = Cheffish::MergedConfig.new(machine_options, { :convergence_options => { :chef_server => Cheffish.default_chef_server(chef_config) } })
specs_and_options = {}
machine_specs.each do |machine_spec|
  specs_and_options[machine_spec] = machine_options
end
driver.allocate_machines(action_handler, specs_and_options)
driver.ready_machines(action_handler, specs_and_options)
NOTE: if you have specific options for each individual machine, you can use Cheffish::MergedConfig.new({ :machine_options => new_options }, machine_options) instead of machine_options inside the loop.
Connecting to, destroying or stopping a machine

# connect_to_machine takes no action_handler (see its definition above)
driver.connect_to_machine(machine_spec, machine_options)
driver.destroy_machine(action_handler, machine_spec, machine_options)
driver.stop_machine(action_handler, machine_spec, machine_options)
driver.destroy_machines(action_handler, specs_and_options)
driver.stop_machines(action_handler, specs_and_options)