@jkeiser
Last active August 29, 2015

Chef Metal and LXC: Best Friends!

Chef Metal now supports LXC! All the benefits of Metal, including idempotent, versioned, reusable cluster definitions, can now be had with LXC containers. You can use them for testing, or as part of your application stack.

(A quick note before we dive in: chef-metal is still alpha software. People are experimenting with it and contributing to it, but it is still evolving.)

You can now write a Metal recipe like so:

require 'chef_metal_lxc'
with_provisioner ChefMetalLXC::LXCProvisioner.new
with_provisioner_options :template => 'download', :template_options => %w(-d ubuntu -r trusty -a amd64)
machine 'a' do
  recipe 'apache'
end
machine 'b' do
  recipe 'mysql'
end

Run this and you've got a couple of LXC machines!

gem install chef-metal-lxc
chef-client -z myrecipe.rb

Testing, Testing, 1 2 3

Why is LXC so important to us? Because one of chef-metal's primary requirements is to bring up automated tests in Continuous Integration environments. Bringing up automated tests means creating VMs, cloud instances, or containers with your application running on them. The Vagrant provider was already working, though; why not use that?

Many CI environments are already sitting in VirtualBox or VMWare virtual machines, which cannot be nested but which can run LXC containers. (Travis, sadly, is not one of these, but others are.) You could bring up instances on EC2 from your CI, but there are concerns about money, credentials, and time-to-start. Thus: LXC!

There are other reasons, of course--LXC containers are smaller and more efficient than their VM counterparts--but the ability to run in a CI environment is the primary requirement.

The rest of this blog post is mostly about how we did it, so if you were just interested in what it is, you can stop reading now :)

Making the LXC Provisioner

Ranjib Dey and I decided to get an LXC driver up in working order, using ruby-lxc. To do that, we had to make a provisioner, which is ultimately responsible for absolutely everything Metal does around creating and talking to machines. This is the process we went through to make the LXC provisioner:

Create a provisioner class that extends from Provisioner, and implement the four methods there:

  • acquire_machine: idempotently creates the desired machine. The general form of acquire_machine is:
    - create the machine if it doesn't already exist, using provisioner_options (a freeform hash) to get input from the user
    - start the machine if it exists but is stopped
    - record information about the machine (like EC2 instance id or Vagrantfile location) in node['normal']['provisioner_output'], another freeform hash
    - create a Machine object that knows how to contact and set up the machine. (This code can usually use the common SSHTransport and UnixMachine / WindowsMachine.)
  • connect_to_machine: returns a Machine object just like acquire_machine, but does not try to create the machine.
  • destroy_machine: idempotently destroys the machine.
  • stop_machine: idempotently stops the machine.
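The interface above can be sketched as a skeleton class. This is an illustration, not the real chef-metal source: the real class extends ChefMetal::Provisioner, and the exact argument lists are assumptions here; only the method names and responsibilities come from the description above.

```ruby
# Sketch of the provisioner interface described above. In real code this
# class would extend ChefMetal::Provisioner; it stands alone here so the
# shape of the four methods is visible. Bodies are placeholders, and the
# (action_handler, node) signatures are assumed for illustration.
class SketchProvisioner
  # Idempotently create (or start) the machine described by `node`,
  # using provisioner_options for user input, record provisioner_output,
  # and return a Machine object.
  def acquire_machine(action_handler, node)
    raise NotImplementedError, 'create/start the machine, then return a Machine'
  end

  # Return a Machine for an existing node without trying to create it.
  def connect_to_machine(node)
    raise NotImplementedError, 'build a Machine from recorded provisioner_output'
  end

  # Idempotently destroy the machine.
  def destroy_machine(action_handler, node)
    raise NotImplementedError
  end

  # Idempotently stop the machine.
  def stop_machine(action_handler, node)
    raise NotImplementedError
  end
end
```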

In general, provisioner_options is the input to the provisioner. The Machine object returned from acquire_machine and connect_to_machine is responsible for running commands on a machine, setting up and converging Chef, and manipulating files and forwarding ports. (Generally it uses a transport and a convergence strategy to actually implement these operations.) In the case of LXC, we didn't need to create any new Machine objects, at least at first, so we just reused UnixMachine, SSHTransport, and the InstallSh convergence strategy:

  def connect_to_machine(node)
    ChefMetal::Machine::UnixMachine.new(node,
      ChefMetal::Transport::SSHTransport.new(hostname, username, ssh_options, options),
      ChefMetal::ConvergenceStrategy::InstallSh.new)
  end

LXC Transport: Who Needs SSH Anyway?

At this point we had a working provisioner--as long as the LXC machine came with SSH preinstalled and networking configured. Sad. Panda.

So Ranjib sat down and built a new LXC transport that would bypass the need for an SSH login. A transport's job is running things on a machine, transferring files, and (if available) forwarding ports. Right now SSH, WinRM and LXC transports exist.

The LXC transport executes programs by using lxc-extra's container.execute function to fork a process into the container and run some Ruby there. It reads and writes files to/from the container by accessing the container's filesystem directly through the host. At this point, it did not forward ports, because that is Hard.

def connect_to_machine(node)
  ChefMetal::Machine::UnixMachine.new(node,
    ChefMetal::Transport::LXCTransport.new(lxc_path, node['name']),
    ChefMetal::ConvergenceStrategy::InstallSh.new)
end

This got us working LXC!
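The host-filesystem trick the transport uses for files can be sketched independently of ruby-lxc: a container's root filesystem lives under <lxc_path>/<container>/rootfs on the host, so reading and writing files "in" the container is just path translation. The class and method names below are illustrative, not the actual LXCTransport API.

```ruby
require 'fileutils'

# Illustrative sketch (not the real LXCTransport): translate a path inside
# the container to the corresponding path on the host, then use ordinary
# file operations. Assumes the conventional <lxc_path>/<name>/rootfs layout.
class SketchLXCFileAccess
  def initialize(lxc_path, container_name)
    @rootfs = File.join(lxc_path, container_name, 'rootfs')
  end

  # Map a container-internal path to its location on the host.
  def host_path(container_path)
    File.join(@rootfs, container_path)
  end

  def read_file(container_path)
    File.read(host_path(container_path))
  end

  def write_file(container_path, content)
    FileUtils.mkdir_p(File.dirname(host_path(container_path)))
    File.write(host_path(container_path), content)
  end
end
```

Running programs inside the container is the part that genuinely needs lxc-extra's container.execute; file access needs nothing more than the mapping above.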

New Convergence Strategy: Caching the Installer

But we still had a problem: minimal containers don't have the interpreters or command line tools needed to run Chef's install.sh (no wget, curl or ruby), which is one of the primary ways to detect and install Chef. Another problem in a similar vein: even with wget, some containers have a local network but cannot connect to the Internet. We'd like to be able to set up containers with no network at all!

The solution was something Dan DeLeo suggested a while back: a bootstrap that downloads the Chef installer on the host, copies it up into the container, and runs it from there. Now our setup looked like this:

def connect_to_machine(node)
  ChefMetal::Machine::UnixMachine.new(node,
    ChefMetal::Transport::LXCTransport.new(lxc_path, node['name']),
    ChefMetal::ConvergenceStrategy::InstallCached.new)
end

We made this new convergence strategy the default for all provisioners (including EC2 and Vagrant), since in many cases it will be more efficient (only one download and verify of the installer) and unburdens us of some of our dependence on host capabilities like wget.
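The cached-installer idea boils down to: download once on the host into a cache, then copy from the cache into each machine. The sketch below is a standalone illustration with made-up names, not the actual InstallCached implementation; the downloader is injected so it does not hit the network.

```ruby
require 'fileutils'

# Standalone illustration of the "cache the installer on the host" idea
# (made-up names; not the actual InstallCached code). The downloader block
# is called at most once per platform, however many machines are staged.
class SketchInstallerCache
  def initialize(cache_dir, &downloader)
    @cache_dir = cache_dir
    @downloader = downloader
    FileUtils.mkdir_p(cache_dir)
  end

  # Download the installer for `platform` once; reuse it afterwards.
  def installer_path(platform)
    path = File.join(@cache_dir, "chef-installer-#{platform}")
    File.write(path, @downloader.call(platform)) unless File.exist?(path)
    path
  end

  # "Upload" into a machine: here, copy into the machine's filesystem.
  def stage_installer(platform, machine_root)
    dest = File.join(machine_root, 'tmp', 'chef-install')
    FileUtils.mkdir_p(File.dirname(dest))
    FileUtils.cp(installer_path(platform), dest)
    dest
  end
end
```

This is also why the strategy unburdens us of host capabilities like wget: the host process does the one download, and the machines only ever see a local file copy.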

LXC Insanity: Forwarding the Ports

Now that we had a shiny new LXC we had a further problem: if you are running chef-zero (a common way to run Chef Metal), it binds to localhost on the host. We automatically tunnel that through SSH, which supports port forwarding, so that EC2 and Vagrant machines can talk to your local chef-zero; but the LXC transport didn't support port forwarding.

So we wrote it.

Now, in lxc-extra, you will find an LXCProxy that proxies TCP traffic over IPC from a process inside the container to the chef-client process running on the host. So when chef-client in the container connects to 127.0.0.1:8889, our LXCProxy listens for the connection and shuttles the data over IPC to an LXCProxy on the host, which delivers it to 127.0.0.1:8889 there.
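The shape of that proxy can be sketched with plain TCP: a listener accepts connections and pumps bytes in both directions to an upstream address. In the real LXCProxy the middle hop is IPC across the container boundary rather than a second TCP socket; the function below is an illustration, not the lxc-extra code.

```ruby
require 'socket'

# Minimal TCP forwarding sketch (illustrative; not the real LXCProxy, which
# shuttles bytes over IPC between container and host). Accepts connections
# on listen_port and pipes bytes both ways to upstream_host:upstream_port.
# Returns the listening TCPServer so callers can inspect the bound port.
def sketch_proxy(listen_port, upstream_host, upstream_port)
  server = TCPServer.new('127.0.0.1', listen_port)
  Thread.new do
    loop do
      client = server.accept
      upstream = TCPSocket.new(upstream_host, upstream_port)
      # One pump thread per direction, standing in for the IPC hop.
      [[client, upstream], [upstream, client]].each do |from, to|
        Thread.new do
          begin
            IO.copy_stream(from, to)
          rescue IOError
            # peer went away; fall through to half-close below
          ensure
            (to.close_write rescue nil)
          end
        end
      end
    end
  end
  server
end
```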

With this, you can now run your Metal recipes with chef-client -z and still use LXC with Chef Metal.

Another fun fact: you can now "inception" machine creation with metal. To wit, with this recipe:

with_fog_ec2_provisioner ...
machine 'ec2_vm' do
  recipe 'install_lxc_containers'
end

cookbooks/install_lxc_containers/recipes/default.rb:

with_provisioner ChefMetalLXC::LXCProvisioner.new
with_provisioner_options :template => 'download', :template_options => %w(-d ubuntu -r trusty -a amd64)
# Create two LXC machines inside the container!
machine 'a' do
  recipe 'install_lxc_containers::hello_world'
end
machine 'b' do
  recipe 'install_lxc_containers::hello_world'
end

cookbooks/install_lxc_containers/recipes/hello_world.rb:

file '/tmp/hi.txt' do
  content 'hello world'
end

This will create an EC2 instance, and inside that instance create two LXC containers, which converge against chef-zero on the original host by forwarding ports all the way back down the chain.

Thread Friendliness and the Ruby GVL

This wasn't quite enough; we were seeing hangs when using the LXC proxy. It turns out this is due to Ruby's global VM lock (GVL, better known as the GIL). ruby-lxc was not releasing the GVL when it did long-running operations like start() or wait().

What this boils down to is, when you called container.start(), nothing else in Ruby could run until start() was finished. This is very thread-unfriendly; what if you want (for example) to listen for signals from the container in another thread? Thus, our hang.

The solution: we went through all of ruby-lxc and added GVL-friendliness (it releases the lock before it does something that takes a long time, and reacquires it after). LXC in Ruby will be a much happier thing now that ruby-lxc 1.0.2 has been released!
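You can see the effect from pure Ruby: blocking operations that release the GVL (like sleep or IO) let other threads run concurrently, whereas a native call that held the lock for the same duration would serialize everything. A small timing illustration:

```ruby
# Illustration of why releasing the GVL matters. `sleep` releases the GVL,
# so two 0.2-second sleeps on separate threads overlap and finish in about
# 0.2 seconds total. A C extension holding the GVL for 0.2 seconds (as
# ruby-lxc's start/wait once did) would force the serialized ~0.4 seconds
# instead, starving threads like our proxy's signal listener.
def timed_parallel_sleeps(thread_count, seconds)
  start = Time.now
  thread_count.times.map { Thread.new { sleep(seconds) } }.each(&:join)
  Time.now - start
end
```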

Leftover Processes ...

The final thing we needed to deal with: runs of chef-client that created LXC containers would sometimes leave processes hanging around. This is because creating containers and running code inside containers will fork your current process. If you don't kill the processes, they can leave server ports and files open, blocking you from doing anything else on the machine!

This is why you see this code at the end of LXCTransport:

at_exit do
  ChefMetalLXC::LXCTransport.disconnect_active_transports
end

It kills the "proxy" processes that may have been hanging around, so that your machine stays clean.
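The bookkeeping behind disconnect_active_transports can be sketched as a class-level registry (illustrative names, not the actual LXCTransport internals): each transport registers itself on creation, and the at_exit hook disconnects whatever is still live.

```ruby
# Sketch of the cleanup pattern (illustrative; not the real LXCTransport):
# transports register themselves at creation, and one class method
# disconnects everything still live, suitable for calling from at_exit.
class SketchTransport
  @active = []

  class << self
    attr_reader :active

    def disconnect_active_transports
      active.each(&:disconnect)
      active.clear
    end
  end

  def initialize(name)
    @name = name
    @connected = true
    self.class.active << self
  end

  def connected?
    @connected
  end

  def disconnect
    # The real code would kill leftover forked/proxy processes here.
    @connected = false
  end
end

# Registering the hook once keeps every exit path (including exceptions)
# from leaking processes.
at_exit { SketchTransport.disconnect_active_transports }
```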

The Journey's End

LXC is an early technology and we had to do a lot of learning to work with it effectively. At this point we're in pretty good shape, though. I hope you enjoyed reading about the journey!

If you have questions, you can file issues at the GitHub repository, send an email to jkeiser@getchef.com, or ping us on Twitter at @jkeiser2 and @RanjibDey.

@smith commented Apr 1, 2014

Is GVL the same thing as GIL?

@mmzyk commented Apr 2, 2014

@smith It is. It stands for "global VM lock", apparently. I'd advised John he should update to GIL or change over to using "global VM lock" so that GVL makes sense.

@adamhjk commented Apr 3, 2014

:shipit:

@jdblack commented Jul 21, 2014

I haven't figured out yet how to use chef-metal with a remote LXC server. Is such a thing possible?
