@lusis
Created February 2, 2012 05:36
Why autostarting default config packages are a bad thing

For a moment, I'm going to throw away my automation and configuration management hat. I'll let you know when I put it back on. Also, let's ignore that we're talking about Riak specifically for a moment. Also also, let's ignore (for a brief moment) the proper role of a package manager.

What's your target market?

If you're writing server software, you have two target markets. The system administrator/operations team and the developer.

Why do you want autostart?

The main reason you want autostart is to get people up and running quickly. To do this, you need to ship safe and sane defaults. This means something like a default configuration that listens only on localhost.

This is a great goal. People can easily install a package and BOOM, they can start using it. No additional thought required.

However, let's look at what you've just done. You've started a program running in the background (or worse - foreground) that, in the case of Riak, is now writing persistent data to the file system. You've just made some pretty arrogant assumptions about not only whether or not the user actually wants it to run full time but also about where that data should go.

So maybe you think, I'll add a post-install dialog for the user so they can tell me where they want the data. If autostarting hadn't hinted that the market you're catering to is the developer, requiring human interaction pretty much cemented it.

But wait! Debian package files support preseeding. This works for system admins too! Except for those who happen to not be using a Debian-based distro. So now you've narrowed your target market to developers on Debian-based systems.

Doing yourself and your users a disservice

We're going to move firmly back into Riak territory now. An autostart configuration with localhost only listening does Riak a disservice. The ONLY valid use case for running in localhost-only mode is local testing/development. Even then it's probably not a valid use case. How many times has the Riak list seen people "benchmarking" Riak with a single node only to be told "You really should be using more than one node for this"?

The fact is that Riak is not a localhost-only single node system. It's a complex system (not in the difficulty sense). Designing a Riak cluster SHOULD require thought. Which backend do I want to use? How many nodes do I need to start? What should my ring creation size be? That last one in particular is a big one.

The reality is you can't reasonably take a single node Riak install that has had no additional configuration done and add it to a production cluster as is. You HAVE to make changes.

Mind you, Riak is wonderfully friendly to operations staff, but making your official packages autostart with what is essentially a developer config makes it LESS friendly to operations folks.

Adding automation into the mix

A bit of a side note. At my last company we were testing using Datastax Enterprise. The official packages behaved in exactly the way you're proposing now.

It was one of the single biggest pain points for me. When I went to automate the entire thing with Chef, I ended up having to jump through so many hoops cleaning up after the default behavior of the package that it would have been LESS work to just have Chef install from a tarball and set up all the additional environment variables and symlinks!

I've attached two files that made up our DSE Analytics node install. Note the "cleanup_default_install" resource. This was necessary because when the default install started up, configuration data was actually written not only to a Cassandra keyspace itself but to the filesystem. Essentially, to add the node to a cluster, I had to blow away the data dir AND clean up the local cached settings stuff. Note those recipes don't even deal with having to rebalance the ring. Let's not even get into the clusterfuck that we ran into because Datastax treated /tmp as persistent storage (which Ubuntu happily blows away by default at each restart).

Mind you, Basho is not Datastax.

Basho should be encouraging customers down the proper path - that of automating node installation. Autostarting and shipping developer-only defaults does not encourage that. I would hate to manage a Riak cluster without automation tools. Technically I hate to manage ANYTHING without automation. You guys already point your customers in the right direction with being so operationally friendly. You HAVE a working developer-friendly setup in the 3-node quick start. Shit, you guys have even made EVERY Erlang packager's life easy with Rebar.

Proper role of the package manager

Debian packages should NOT be looked to as best practices for how to install your software. Pre and Post steps in RPM packages should have never been invented.

IMHO (a very strong O), the role of the package manager should be to lay bits on the disk. Nothing more. I would even argue that creating users is not even the role of the package manager. Let's not even get into creating the user as a system user or not (useradd -r) which some packages do and some don't. I might not want /etc/skel copied over for that user.

Package managers' opinionated workflows are great for desktop systems. Debian packages are fairly decent at ensuring they never trample all over user changes. However, servers are not desktops. The configuration is (or at least should be) a known state before the software is even installed. Default configuration files are useless for pretty much everyone.

Lay the bits on disk, ship well documented example configs and support storing persistent data in customizable locations. If you really want to make it easier for developers and the riak-curious to get started, ship a shell script that enables "developer-mode" - copy localhost-only configs over to real configs and start the service for the user (just don't add it to the system startup scripts!).
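
A minimal sketch of such a "developer-mode" helper, written in Ruby to match the attached recipes rather than as a shell script. The directory layout, the `.example` suffix and the `enable_developer_mode` name are assumptions made for illustration, not anything Riak ships. The start command is supplied by the caller, and nothing here touches the system startup scripts.

```ruby
require "fileutils"

# Hypothetical helper: copy shipped example configs (e.g. app.config.example)
# into the live config directory, then optionally run a start command once.
# It deliberately never registers the service for start-on-boot.
def enable_developer_mode(example_dir, config_dir, start_command: nil)
  FileUtils.mkdir_p(config_dir)
  installed = []

  Dir.glob(File.join(example_dir, "*.example")).sort.each do |example|
    target = File.join(config_dir, File.basename(example, ".example"))
    # Never trample a config the user has already written.
    next if File.exist?(target)

    FileUtils.cp(example, target)
    installed << target
  end

  # Start the service for this session only. Assumption: the caller knows
  # the right command for their init system.
  system(*start_command) if start_command
  installed
end
```

The function returns the list of files it created, so a wrapper script can tell the user exactly what was put in place and what was left alone.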

As someone who's going to be standing up a large multi-datacenter Riak cluster that will be automated with Chef in the near future, please don't make my job any harder ;)

include_recipe "xfs"
include_recipe "apt"
include_recipe "jdk::sun"
package "mdadm"

# Clean up ubuntu default mount on Natty AMI
if node[:platform] == "ubuntu" && node[:lsb][:release] == "11.04"
  mount "/mnt" do
    device "/dev/xvdb"
    action [:umount, :disable]
  end

  # Stripe the two ephemeral disks into a single RAID 0 device
  mdadm "/dev/md0" do
    devices [ "/dev/xvdb", "/dev/xvdc" ]
    level 0
    action [:create, :assemble]
  end

  execute "build cassandra filesystem" do
    command "mkfs.xfs /dev/md0 -L data"
    # Skip if the data directory already exists or /mnt is already mounted
    not_if { File.exist?("/mnt/va/data/cassandra") }
    not_if "mount | grep '/mnt'"
  end

  mount "/mnt" do
    device "/dev/md0"
    action [:mount, :enable]
  end
end

user "cassandra" do
  system true
  action [:create, :modify, :manage]
  home "/var/lib/cassandra"
  shell "/bin/bash"
  supports :manage_home => true
end

directory "/mnt/va/"
directory "/mnt/va/data"

directory "/mnt/va/data/cassandra" do
  owner "cassandra"
  group "cassandra"
  mode "0750"
end

directory "/var/log/cassandra" do
  owner "cassandra"
  group "cassandra"
  mode "0755"
end

# Point the package's default data dir at the RAID volume
link "/var/lib/cassandra/data" do
  to "/mnt/va/data/cassandra"
end

apt_repository "datastax" do
  uri "http://debian.datastax.com/enterprise"
  components ["stable", "main"]
  key "http://debian.datastax.com/debian/repo_key"
  action [:add]
end

execute "apt-get update"

# Wipe the system keyspace data written by the package's autostarted node.
# Only runs when notified, and never once the .va marker file exists.
bash "cleanup_default_install" do
  user "root"
  cwd "/var/lib/cassandra/data"
  code <<-EOH
  rm -rf /var/lib/cassandra/data/system/*
  EOH
  action :nothing
  not_if "test -f /var/lib/cassandra/.va"
end

service "dse" do
  action :nothing
  notifies :run, resources(:bash => "cleanup_default_install"), :immediately
end

# Installing the package autostarts dse, so stop it straight away
package "dse-full" do
  action [:install]
  version "#{node[:dse][:cassandra_version]}"
  notifies :stop, resources(:service => "dse"), :immediately
end

# Marker file: cleanup has been handled for this node
file "/var/lib/cassandra/.va" do
  action :create_if_missing
end

if node.run_list.roles.include?("dse_hadoop_node")
  include_recipe "dse::hadoop"
end

if node.run_list.roles.include?("dse_cassandra_node")
  include_recipe "dse::cassandra"
end

# Look up the "tokens" item and pick out this node's pre-assigned token
tokens = search(:cassandra, "id:tokens").first
cluster_name = node[:dse][:cluster_name]
node_token = tokens['nodes'][node.name] || ""
endpoint_snitch = node[:dse][:endpoint_snitch]

template "/etc/dse/cassandra/cassandra.yaml" do
  mode "0644"
  owner "root"
  group "root"
  action :create
  source "cassandra.yaml.erb"
  variables({
    :endpoint_snitch => endpoint_snitch,
    :cluster_name => cluster_name,
    :node_token => node_token,
    :seed => node.dse.seed
  })
end

# Restart the service whenever the rendered config changes
runit_service "cassandra-hadoop" do
  action [ :enable, :start ]
  subscribes :restart, "template[/etc/dse/cassandra/cassandra.yaml]"
end
@jaredmorrow

My, put down the flaming pitch forks, preface!

I'm going to comment here, and @lusis, if you want to delete this, by all means do so. Let me first say this... holy crap did I hit a hot topic!

So, let's start by clarifying one thing: I never suggested that we were going to have Riak auto-start on install. What I asked in this tweet was if people want packages to auto-start for them on reboot.

The Background

In previous releases, we had a bug in our packaging where not all of the debhelper scripts were properly running. If you know debian packaging, you know that misplacing one dh_ in a rules file somewhere means that lots of things don't happen that you might expect. Moving on. For our next major release 1.1, I had a self-imposed task to turn on lintian to clean up some known issues we had in our debian packages. Why was lintian off in the first place? We had it off because embedding erlang in a package auto-fails lintian because erlang includes things like Windows .bat files that are executable, and lintian for some odd reason takes offense, as do I. Sidenote: I've heard Erlang R15B fixes this, but alas we are shipping R14B04 with 1.1 of Riak due to 15B not making it in time for proper testing. So I turn on lintian, I get 400 errors or some ridiculous number like that, I spend a couple of days fixing issues, and bam, now we have like 20 issues and they are all related to stuff in the erlang install I can't fix.

But what about auto-start

I'm getting there, sheesh (I have kids, I use "sheesh" get over it!). When I "fixed" all of our deb issues, one magically great/sucky thing happened. All of the stuff that Debian does by default now works. So the auto-dependencies of shared libraries listing that didn't work before now works... yay! The auto-start on package install now works... booooo! So if you didn't know this already, as a package maintainer you don't need to do anything extra to get the magic of auto-start on install working. If you think to yourself, "hey, I think it sucks to make people create all the rc.d links themselves, I'll just call dh_installinit and have debhelper create all those links for them!" well then you also just decided to have that package start on install. Yes they are linked as one, yes that sucks.

TL;DR I "fixed" our packaging, it creates sane rc.d links to avoid one more annoyance, it also added auto-start when I didn't want it, crap.

Eating our own dogfood

This came up because with every release now we are going to make sure our Riak chef cookbook is up-to-date with the latest release and Sean Carey (not to be confused with our Ruby client dev Sean Cribbs) is finishing the cookbook for 1.1 and testing it thoroughly. So Sean says in our group chat, "so this is an issue, when we install riak with dpkg it start riak before the true app.config is set". WAT? Discussion ensues, I figure out why stuff is auto-starting, he explains how bad that sucks for a proper Chef cookbook, etc., etc.

The proposed solution

So we are going to fix that, but that brings us all the way back to my actual tweet: do you, as a user / ops guy / dev, want the startup scripts linkified in rc.d for you?

Assuming it doesn't auto-start on install (I really don't like that, and get that you all HATE that), do you want something like Riak to start up on reboot, or would you rather take care of all that configuration yourself?

One solution that was proposed, so I don't have to pass the -n flag to dh_installinit to disable it, is to go the route of Pound, Haproxy, and Postgres (at least once-upon-a-time) and create an /etc/default/riak file that contains ENABLED=false; our init script reads that file and doesn't start up if it is false. This seems to be a common practice to get around auto-start on install, while still having everything in place for start-on-boot. So I ask, is this reasonable? It will force you to configure the app.config properly, change that flag, and everything is good to go. As you point out @lusis, it might frighten some users who are just trying things out, so it doesn't come without some downsides.
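
The guard described above might look roughly like this. It is sketched in Ruby to match the recipes earlier in this gist; a real init script would do the same check in shell by sourcing the file. The `ENABLED` key comes from the proposal itself; the parsing rules, the `service_enabled?` name, and treating a missing flag as disabled are assumptions.

```ruby
# Parse the text of an /etc/default-style file (KEY=value lines, '#' comments)
# and report whether the service has been explicitly enabled.
def service_enabled?(defaults_text)
  settings = {}
  defaults_text.each_line do |line|
    line = line.sub(/#.*/, "").strip
    next if line.empty?

    key, value = line.split("=", 2)
    next if value.nil?

    # Drop optional shell-style quotes around the value.
    settings[key.strip] = value.strip.delete(%q("'))
  end
  # Assumption: anything other than an explicit "true" keeps the service off.
  settings.fetch("ENABLED", "false").downcase == "true"
end

# Usage: a freshly installed package ships ENABLED=false, so the init
# script would refuse to start until the operator flips the flag.
shipped = "# Set to true once app.config has been reviewed\nENABLED=false\n"
puts(service_enabled?(shipped) ? "starting riak" : "riak is disabled, not starting")
```

Defaulting to "off" when the flag is absent keeps the failure mode safe: a missing or mangled defaults file never results in an unconfigured node starting.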

So yeah, let me know here (if that's okay with @lusis?), tweet at me at _jared, message @jaredmorrow on github, ping me on Freenode in #riak (I'm Spyplane in there, it was an old throwback name I had, don't taze me bro). Thanks for all the feedback on twitter btw, I'm sure glad I didn't actually suggest auto-starting on install! Phew.

@jordansissel

@jaredmorrow - seeing more lame crap you end up dealing with in debian packages (dh_installinit, etc) only makes me happier that I made fpm. Building debian packages without any debian tools was the best move I've made in packaging ;)

Regarding startup scripts, I tend to split things up a bit judiciously, so I would have a 'riak' package for riak itself, then a 'riak-startup' (swap 'startup' for your favorite word) for the init script. This is mainly so you could do things like have riak-upstart, riak-sysvinit, riak-daemontools, and folks who use a particular service launcher can install the specific thing they need.

This is a good compromise between folks who use config management and folks who don't, I think.

@tels7ar

tels7ar commented Feb 2, 2012

Jared, I think your proposed solution is just fine. Set up the framework in a way that allows people or automation to easily enable autostart if they choose. Set all the defaults to no autostart.

Those of us who deal with packaging all the time are a prickly bunch because software packaging sucks SO HARD. There are two factors here:

  1. the open ended nature of packaging tools
  2. the gap between devs and ops

Traditionally, developers (disconnected from operations) create software packages. This is wrong. The purpose of packages is to move software into production. How can you write package scripts if you don't have an understanding of the operational environment? This is further compounded by the open ended nature of packaging tools. They don't enforce good policy about things like autostart. That leaves the package creators (developers) to guess.

The result of this is broken packages that are not automatable.

Note that I'm not advocating for ops to create packages without developer involvement. I'm saying that the devops approach is needed here. You have to open up communication channels. Set up a culture where sysadmins are involved in the packaging process. Do cross-team code reviews. Have sysadmins give presentations to developers about how the operational environment works. There are lots more ways to encourage this sort of communication. The result of this will be much improved software packages.

Now I realize that much (most?) software isn't packaged with the intent of installing it in the same organization or company. If that's your use case, that's fine too. I'm sure you have ops folks in your company, right? Invite them into the process. Even if they aren't directly installing your software, they will still be able to help with operability issues.

Bottom line: operations must be involved in the process of crafting software packages. Let's devops this whole thing, people!

Edit: Seeing Jordan's comment reminds me that fpm might often be the right way to go here. fpm concentrates on just creating the package, not implementing policy around it. I think a world where everyone used fpm to create packages would be just fine because we wouldn't end up with all these horrible package scripts. :)

I'm philiph on twitter.

@siebenmann

As a sysadmin I prefer packages to not autostart things but to set themselves up so that fully enabling them is easy once they've been configured. Ideally it should take a single standard-for-the-system command to enable all of the daemon starting (update-rc.d on Debian, chkconfig or the like on CentOS, etc), and this enabling should definitely include starting on reboot.
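
As a rough illustration of that "single standard command", here is a Ruby sketch in the style of the recipes above that picks the enabling command per platform without running it. `update-rc.d` and `chkconfig` are the tools named in this comment; the `enable_command` helper, the platform symbols, and the `riak` service name are assumptions for the example.

```ruby
# Hypothetical helper: return the command that enables start-on-boot for a
# service on the given platform. It only builds the command; running it is
# left to the operator or the automation tool.
def enable_command(platform, service)
  case platform
  when :debian, :ubuntu
    ["update-rc.d", service, "defaults"]
  when :centos, :redhat
    ["chkconfig", service, "on"]
  else
    raise ArgumentError, "no known enable command for #{platform}"
  end
end

puts enable_command(:ubuntu, "riak").join(" ")
```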

@jaredmorrow

@jordansissel I watched your FPM talk, and have looked at it heavily for inspiration. It was long past due and a great gift to the community btw! We have some internal workings of something similar that is more heavily based around erlang and rebar so we can sanely package multiple erlang apps in a similar fashion.

@tels7ar packaging is hard. That's it.

No, seriously, we have several different users of riak. We have paying customers who mostly go the route of using our Chef cookbook or our profserv people to do the install. Understandably, it is a big priority to get things 100% good there, and that is how we figured out that Something Changed (tm) in our packages this go around. We also have OSS users of riak who use it in production: some use Chef, some take our packages and know exactly what to do with them, and some even take our source and build their own packages (the NetBSD community did this). Lastly, we have people who download the OSS version to just try it out. They will most likely put it on an EC2 free micro instance that's 32bit ubuntu. Getting things working perfectly in all these places is hard, possibly impossible, but it still needs to be done. I think the latter group wants it to autostart and just work, but they are probably the only group who wants that. So understandably, when there are clashes in wants and needs, we will lean towards what we think is sane and what will make sense for the majority of our users. Auto-start on install is not wanted by 98% of people, and it doesn't make sense for riak, so we won't do that. Creating rc.d startup scripts though is something I think many people DO want, but we need to put a stop gap in there to make sure it has been configured first. We have ops / profserv people and they are all involved, but I asked on twitter mostly to get a view from outside our company. IMHO that is a REALLY good thing to do if you have to make dogmatic decisions like what a package is going to do when installed and configured. I'm certainly glad I did and got the feedback that I did. So I appreciate your comments.

@tjake

tjake commented Feb 3, 2012

@lusis We did disable autostart in the DSE 1.0.1 release. I'm not aware of anything using /tmp in DSE.
