lusis/autostart.md

## autostart.md

      
For a moment, I'm going to throw away my automation and configuration management hat. I'll let you know when I put it back on.
Also, let's ignore that we're talking about Riak specifically for a moment.
Also also, let's ignore (for a brief moment) the proper role of a package manager.

### What's your target market?

If you're writing server software, you have two target markets: the system administrator/operations team and the developer.

### Why do you want autostart?

The main reason you want autostart is to get people up and running quickly. To do this, you need to ship safe and sane defaults. This means something like a default configuration that listens only on localhost.
This is a great goal. People can easily install a package and BOOM, they can start using it. No additional thought required.
However, let's look at what you've just done. You've started a program running in the background (or worse - foreground) that, in the case of Riak, is now writing persistent data to the file system. You've just made some pretty arrogant assumptions, not only about whether the user actually wants it running full time but also about where that data should go.
So maybe you think, I'll add a post-install dialog for the user so they can tell me where they want the data. If autostarting hadn't hinted that the market you're catering to is the developer, requiring human interaction pretty much cemented it.
But wait! Debian package files support preseeding. This works for system admins too! Except for those who happen not to be using a Debian-based distro. So now you've narrowed your target market to developers on Debian-based systems.

### Doing yourself and your users a disservice

We're going to move firmly back into Riak territory now. An autostart configuration with localhost only listening does Riak a disservice. The ONLY valid use case for running in localhost-only mode is local testing/development. Even then it's probably not a valid use case. How many times has the Riak list seen people "benchmarking" Riak with a single node only to be told "You really should be using more than one node for this"?
The fact is that Riak is not a localhost-only single node system. It's a complex system (not in the difficulty sense). Designing a Riak cluster SHOULD require thought. Which backend do I want to use? How many nodes do I need to start? What should my ring creation size be? That last one in particular is a big one.
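
To make the ring size question concrete, here is a rough back-of-the-envelope sketch in plain Ruby. It assumes only that the ring size should be a power of two and that partitions are spread roughly evenly across nodes. The helper names are hypothetical and are not part of Riak or its tooling.

```ruby
# Hypothetical helpers for illustration; not part of Riak or its tooling.

# True when n is a positive integer power of two (64, 128, 256, ...).
def power_of_two?(n)
  n.is_a?(Integer) && n.positive? && (n & (n - 1)).zero?
end

# Average number of partitions each node would own if the ring were
# divided evenly. Real ownership is only roughly even.
def partitions_per_node(ring_size, node_count)
  raise ArgumentError, "ring size should be a power of two" unless power_of_two?(ring_size)
  raise ArgumentError, "need at least one node" unless node_count.is_a?(Integer) && node_count.positive?

  ring_size.fdiv(node_count)
end

# Print the average for a few cluster sizes with a 64-partition ring.
[1, 3, 5, 12].each do |nodes|
  puts format("64 partitions over %2d nodes: %.1f per node", nodes, partitions_per_node(64, nodes))
end
```

The point is not the arithmetic itself: a ring size picked by a package default on one node is a number you then have to live with (or undo) when that node is supposed to join a real cluster.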
The reality is you can't reasonably take a single node Riak install that has had no additional configuration done and add it to a production cluster as is. You HAVE to make changes.
Mind you, Riak is wonderfully friendly to operations staff, but making your official packages autostart with what is essentially a developer config makes it LESS friendly to operations folks.

### Adding automation into the mix

A bit of a side note. At my last company we were testing using Datastax Enterprise. The official packages behaved in exactly the way you're proposing now.
It was one of the single biggest pain points for me. When I went to automate the entire thing with Chef, I ended up jumping through so many hoops cleaning up after the package's default behavior that it would have been LESS work to have Chef install from a tarball and set up all the additional environment variables and symlinks!
I've attached two files that made up our DSE Analytics node install. Note the "cleanup_default_install" resource. This was necessary because when the default install started up, configuration data was written not only to a Cassandra keyspace but also to the filesystem. Essentially, to add the node to a cluster, I had to blow away the data dir AND clean up the locally cached settings. Note those recipes don't even deal with having to rebalance the ring. Let's not even get into the clusterfuck that we ran into because Datastax treated /tmp as persistent storage (which Ubuntu happily blows away by default at each restart).
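
For anyone who doesn't read Chef, the guard logic in that cleanup step boils down to something like the following plain Ruby sketch. It is an approximation, not the recipe itself: the function name and arguments are hypothetical, and it folds the marker-file creation (a separate `file` resource in the recipe) into the same function.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper mirroring the recipe's guard: wipe whatever the
# autostarted service wrote under <data_dir>/system, but only when the
# marker file does not exist yet.
def cleanup_default_install(data_dir, marker)
  return false if File.exist?(marker) # already cleaned on an earlier run

  # Same effect as `rm -rf <data_dir>/system/*` in the recipe.
  FileUtils.rm_rf(Dir.glob(File.join(data_dir, "system", "*")))
  FileUtils.touch(marker) # remember that cleanup happened
  true
end

# Tiny demonstration against a throwaway directory.
Dir.mktmpdir do |root|
  data = File.join(root, "data")
  FileUtils.mkdir_p(File.join(data, "system", "leftover"))
  marker = File.join(root, ".va")

  puts cleanup_default_install(data, marker) # first run performs the wipe
  puts cleanup_default_install(data, marker) # later runs are skipped
end
```

The marker matters because the cleanup is destructive: without it, a later Chef run would happily delete the system data of a node that has since joined the cluster.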
Mind you, Basho is not Datastax.
Basho should be encouraging customers down the proper path - that of automating node installation. Autostarting and shipping developer-only defaults does not encourage that. I would hate to manage a Riak cluster without automation tools. Technically I hate to manage ANYTHING without automation. You guys already point your customers in the right direction by being so operationally friendly. You HAVE a working developer-friendly setup in the 3-node quick start. Shit, you guys have even made EVERY Erlang packager's life easy with Rebar.

### Proper role of the package manager

Debian packages should NOT be looked to as best practices for how to install your software. Pre and Post steps in RPM packages should have never been invented.
IMHO (a very strong O), the role of the package manager should be to lay bits on the disk. Nothing more. I would even argue that creating users is not even the role of the package manager. Let's not even get into creating the user as a system user or not (useradd -r) which some packages do and some don't. I might not want /etc/skel copied over for that user.
Package managers' opinionated workflows are great for desktop systems. Debian packages are fairly decent at ensuring they never trample all over user changes. However, servers are not desktops. The configuration is (or at least should be) a known state before the software is even installed. Default configuration files are useless for pretty much everyone.
Lay the bits on disk, ship well-documented example configs and support storing persistent data in customizable locations. If you really want to make it easier for developers and the riak-curious to get started, ship a shell script that enables "developer-mode" - copy localhost-only configs over to real configs and start the service for the user (just don't add it to the system startup scripts!).
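
A "developer-mode" helper along those lines could be as small as the sketch below. It is written in Ruby rather than shell to match the attached recipes, and everything in it is an assumption for illustration: the `.example` suffix, the function name and the optional `start_command` are hypothetical, not anything the Riak packages actually ship.

```ruby
require "fileutils"

# Hypothetical layout: the package ships "<name>.example" files next to
# where the real configs belong. Copy each example into place unless a real
# config already exists, then optionally start the service once, by hand.
def enable_developer_mode(config_dir, start_command: nil)
  copied = []
  Dir.glob(File.join(config_dir, "*.example")).sort.each do |example|
    target = example.sub(/\.example\z/, "")
    next if File.exist?(target) # never trample a config the user already wrote

    FileUtils.cp(example, target)
    copied << target
  end

  # Start the service for this session only; nothing is registered with
  # the system startup scripts.
  system(*start_command) if start_command
  copied
end
```

It is opt-in, it refuses to overwrite existing configs, and it leaves boot-time startup alone, so the operations folks never have to clean up after it.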
As someone who's going to be standing up a large multi-datacenter Riak cluster that will be automated with Chef in the near future, please don't make my job any harder ;)

  
## default.rb
include_recipe "xfs"
include_recipe "apt"
include_recipe "jdk::sun"

package "mdadm"
# Clean up ubuntu default mount on Natty AMI
if node[:platform] == "ubuntu" && node[:lsb][:release] == "11.04"
  mount "/mnt" do
    device "/dev/xvdb"
    action [:umount, :disable]
  end

  mdadm "/dev/md0" do
    devices [ "/dev/xvdb", "/dev/xvdc" ]
    level 0
    action [:create, :assemble]
  end

  execute "build cassandra filesystem" do
    command "mkfs.xfs /dev/md0 -L data"
    not_if { File.exist?("/mnt/va/data/cassandra") }
    not_if "mount  | grep '/mnt'"
  end

  mount "/mnt" do
    device "/dev/md0"
    action [:mount, :enable]
  end
end

user "cassandra" do
  system true
  action [:create, :modify, :manage]
  home "/var/lib/cassandra"
  shell "/bin/bash"
  supports :manage_home => true
end

directory "/mnt/va/"
directory "/mnt/va/data"

directory "/mnt/va/data/cassandra" do
  owner "cassandra"
  group "cassandra"
  mode "0750"
end

directory "/var/log/cassandra" do
  owner "cassandra"
  group "cassandra"
  mode "0755"
end

link "/var/lib/cassandra/data" do
  to "/mnt/va/data/cassandra"
end

apt_repository "datastax" do
  uri "http://debian.datastax.com/enterprise"
  components ["stable", "main"]
  key "http://debian.datastax.com/debian/repo_key"
  action [:add]
end

execute "apt-get update"

bash "cleanup_default_install" do
  user "root"
  cwd "/var/lib/cassandra/data"
  code <<-EOH
  rm -rf /var/lib/cassandra/data/system/*
  EOH
  action :nothing
  not_if "test -f /var/lib/cassandra/.va"
end

service "dse" do
  action :nothing
  notifies :run, resources(:bash => "cleanup_default_install"), :immediately
end

package "dse-full" do
  action [:install]
  version "#{node[:dse][:cassandra_version]}"
  notifies :stop, resources(:service => "dse"), :immediately
end

file "/var/lib/cassandra/.va" do
  action :create_if_missing
end

if node.run_list.roles.include?("dse_hadoop_node")
  include_recipe "dse::hadoop"
end

if node.run_list.roles.include?("dse_cassandra_node")
  include_recipe "dse::cassandra"
end

## hadoop.rb
tokens = search(:cassandra, "id:tokens").first

cluster_name = node[:dse][:cluster_name]
node_token = tokens['nodes'][node.name] || ""
endpoint_snitch = node[:dse][:endpoint_snitch]

template "/etc/dse/cassandra/cassandra.yaml" do
  mode "0644"
  owner "root"
  group "root"
  action :create
  source "cassandra.yaml.erb"
  variables({:endpoint_snitch => endpoint_snitch, :cluster_name => cluster_name, :node_token => node_token, :seed => node.dse.seed})
end

runit_service "cassandra-hadoop" do
  action [ :enable, :start ]
  subscribes :restart, "template[/etc/dse/cassandra/cassandra.yaml]"
end