sa2ajj/braindump.rst

## braindump.rst

      
Riak with Docker: Brain Dump


Note
For the actual food for thought, see https://gist.github.com/sa2ajj/5323326#file-summary-rst

This document _tries_ to outline the important items that need to be covered in
order to get riak running with docker.
Please note that this is an outline of what I'm trying to do, not a
step-by-step guide (though it might become one someday).
(This document may end up somewhere else, but for now it
just lives here on gist.github.com.)
This document is loosely based on the excellent documentation offered by Basho.

Overview

Pre-requisites
Initial Setup
    Prepare the initial image
    Perform common configuration
        /etc/riak/app.config
        /etc/riak/vm.args
    Perform node specific configuration
        /etc/riak/app.config
        /etc/riak/vm.args
    Set up the actual cluster
Normal Operation
Perform a node upgrade


Pre-requisites

docker offers a different approach to using Linux containers.
The main difference is that a docker container does not need a full
installation of a guest OS: you can install as little as necessary.
Docker is under very active development, so grab the latest version from GitHub:
$ git clone git://github.com/dotcloud/docker.git


Initial Setup

Please bear in mind that this might be a suboptimal setup.
The goal is to create a riak cluster with 5 nodes (the minimum number of
nodes recommended for riak).
Prepare the host directory structure:
$ mkdir ~/riak-cluster
$ cd ~/riak-cluster
$ for i in $(seq 1 5); do mkdir -p node$i/{etc,data,log}; done

Things to check:

directory ownership
mapping of user IDs between host and container(?)


Prepare the initial image

One way to start is to use the base image available from docker's registry.  (Please note that, at least for now, the registry
server does not offer any fancy UI.)

Warning
The command below will actually fail, as riak packages are not available
from the standard repositories.
The next version of this document will address this properly.

$ docker pull base
$ C_ID=`docker run -d base apt-get install -y riak`
$ docker wait $C_ID
$ docker commit -m 'added riak package' $C_ID my/riak-base


Perform common configuration

Riak has two configuration files: /etc/riak/app.config and
/etc/riak/vm.args.  Both files contain parameters that you'd probably like to
share among all nodes, as well as node-specific ones.  (Detailed information
about the available configuration parameters can be found at
http://docs.basho.com/riak/latest/references/Configuration-Files/.)
How you actually perform the configuration is not covered here (for now).
For example, suppose a script called riak-magic has magically appeared in
your image and does all the configuration for you.  After you run
it, create a new image:
$ C_ID1=`docker run -d my/riak-base /usr/sbin/riak-magic`
$ docker wait $C_ID1
$ docker commit -m 'common configuration is applied' $C_ID1 my/riak-configured

At this point, I'd like to extract the configuration files to the host (as I
don't really know how else to maintain them):
TODO
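One possible way to do this extraction, assuming a docker version that provides `docker cp` (it copies files out of a container, stopped ones included). The helper name `extract_riak_config` is hypothetical, and /tmp is used only because the next step copies from there:

```shell
# extract_riak_config CONTAINER DEST_DIR   (hypothetical helper, bash)
# Copies /etc/riak/app.config and /etc/riak/vm.args out of CONTAINER
# (e.g. the stopped riak-magic container $C_ID1) into DEST_DIR.
extract_riak_config() {
    local container=$1 dest=$2 f
    for f in app.config vm.args; do
        docker cp "$container:/etc/riak/$f" "$dest/" || return 1
    done
}

# Usage: extract_riak_config "$C_ID1" /tmp
```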

And place the files in the host hierarchy:
$ for i in $(seq 1 5); do cp /tmp/app.config /tmp/vm.args node$i/etc/; done

BIG QUESTION: does this have to be done inside the container?

/etc/riak/app.config

The important parameters (for this use case) are the various directories where riak stores its data and logs.
The most notable are /var/lib/riak and /var/log/riak.  These do not have to
be changed (fewer changes, easier maintenance).
The other important parameter is the IP address.  You can make riak accept
connections coming from anywhere (which is not a problem if you run it in a
dedicated network): use 0.0.0.0 as the IP address for the various services:

http(s) interface ({riak_core, http | https})
protobuffer api interface ({riak_api, pb_ip})
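A minimal sketch of that change (bash), assuming the stock app.config binds these services to the quoted address "127.0.0.1"; check your copy first, since the hypothetical helper `listen_on_all_interfaces` rewrites every occurrence of that address:

```shell
# listen_on_all_interfaces APP_CONFIG   (hypothetical helper, bash)
# Keeps a backup as APP_CONFIG.orig, then rewrites every quoted
# "127.0.0.1" in APP_CONFIG to "0.0.0.0" (e.g. the pb_ip value and the
# address inside the http listener tuple, if they use that default).
listen_on_all_interfaces() {
    cp "$1" "$1.orig" &&
        sed 's/"127\.0\.0\.1"/"0.0.0.0"/g' "$1.orig" > "$1"
}

# Usage: listen_on_all_interfaces /tmp/app.config
```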


/etc/riak/vm.args

The Erlang VM lets nodes communicate with each other provided they share a
common cookie, hence the -setcookie parameter is the
most important common one.
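A minimal sketch (bash) for setting the shared cookie; the helper `set_riak_cookie` is hypothetical and assumes vm.args contains a line starting with `-setcookie `:

```shell
# set_riak_cookie VM_ARGS COOKIE   (hypothetical helper, bash)
# Rewrites the "-setcookie ..." line of VM_ARGS to use COOKIE. Every node
# of one cluster needs the same value. COOKIE is spliced into a sed
# expression, so it must not contain '/', '&', '\' or newlines.
set_riak_cookie() {
    sed "s/^-setcookie .*/-setcookie $2/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Usage: set_riak_cookie /tmp/vm.args "$(openssl rand -hex 16)"
```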

Perform node specific configuration


/etc/riak/app.config

If you used 0.0.0.0 as the address to accept connections on, nothing needs to be done at this step.

/etc/riak/vm.args

The -name parameter specifies the node's name.  The part after @ is the node's
IP address, so if each node has its own IP address, only that part needs to be
modified.  If some nodes share the same IP address, then the name part (before @)
must be modified as well.
So modify the extracted vm.args for each node in node<I>/etc/vm.args.
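The per-node edit can be sketched like this (bash), assuming the common vm.args was extracted to /tmp/vm.args, that the current directory is ~/riak-cluster, and a hypothetical `riak<N>@HOST` naming scheme; `write_node_vm_args` and `VM_ARGS_TEMPLATE` are made up for illustration:

```shell
# write_node_vm_args N HOST   (hypothetical helper, bash)
# Writes node<N>/etc/vm.args from the common template (default
# /tmp/vm.args, override with VM_ARGS_TEMPLATE), replacing the -name line
# with "-name riak<N>@HOST". Putting N into the name part keeps names
# unique even if several nodes end up sharing one HOST.
write_node_vm_args() {
    local n=$1 host=$2
    sed "s/^-name .*/-name riak$n@$host/" "${VM_ARGS_TEMPLATE:-/tmp/vm.args}" \
        > "node$n/etc/vm.args"
}

# Usage (from ~/riak-cluster): write_node_vm_args 1 <node 1's IP address>
```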

Set up the actual cluster

Start the first node (from ~/riak-cluster):
$ NODE1=`docker run -v "$PWD/node1/data:/var/lib/riak:rw" \
                    -v "$PWD/node1/log:/var/log/riak:rw" \
                    -d my/riak-configured ...`

Good question: how do I get the container's IP address?
Another good question: do I really need that address?  Maybe I could resort to a
locally resolvable FQDN?  (In that case, how would docker handle it?)
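For the first question, newer docker releases can report the address through `docker inspect` with a `--format` Go template. A sketch (bash) with a hypothetical helper name; note that this is simply the address docker assigned for this run and it is not guaranteed to survive a restart:

```shell
# container_ip CONTAINER   (hypothetical helper, bash)
# Prints the IP address docker assigned to CONTAINER on the default
# bridge network.
container_ip() {
    docker inspect --format '{{ .NetworkSettings.IPAddress }}' "$1"
}

# Usage: NODE1_IP=$(container_ip "$NODE1")
```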
For each other node:

Start the container:
$ NODEX=`docker run ...`


Add the node to the cluster (riak-admin has to run on the joining node, i.e. inside its container):
$ riak-admin cluster join riak@first-node-ip-address


After all nodes are added, review and commit your changes to the cluster:
$ riak-admin cluster plan
$ riak-admin cluster commit

At this point the cluster should be ready.
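With newer docker releases, the join steps above could be scripted roughly as below (bash). It assumes `docker exec` is available to run riak-admin inside each container; `join_riak_cluster` is a hypothetical helper, and it deliberately stops at `plan` so the commit happens only after someone has reviewed the plan:

```shell
# join_riak_cluster FIRST_NODE CONTAINER...   (hypothetical helper, bash)
# Stages "riak-admin cluster join FIRST_NODE" inside each given container,
# then prints the resulting plan from the last one. Commit afterwards,
# once the plan looks right:
#   docker exec <container> riak-admin cluster commit
join_riak_cluster() {
    local first=$1 c=
    shift
    [ "$#" -gt 0 ] || return 1   # need at least one container to join
    for c in "$@"; do
        docker exec "$c" riak-admin cluster join "$first" || return 1
    done
    # $c still holds the last container here
    docker exec "$c" riak-admin cluster plan
}
```

For example, `join_riak_cluster riak@172.17.0.2 "$NODE2" "$NODE3" "$NODE4" "$NODE5"`, where 172.17.0.2 stands in for the first node's actual address.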

Normal Operation

Just run the thing:
$ docker run -d ...

The important bit is that we need to retain certain things between runs:

IP address
content of /var/lib/riak (or whichever location was configured for riak's data)
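For the second item, each node's host directories from the initial setup can simply be mounted again on every run. A minimal sketch (bash); the helper name, the `RIAK_CLUSTER_DIR` variable and the exact set of mounts are assumptions, app.config is taken from the image, and the command that keeps riak running inside the container is left to the caller, just as the runs above leave it as `...`. It does nothing about the first item, the IP address:

```shell
# start_riak_node N IMAGE [CMD...]   (hypothetical helper, bash)
# Starts node N detached, re-mounting its host directories so that ring
# state, data, logs and the node-specific vm.args survive between runs.
# The container's IP address is NOT preserved.
start_riak_node() {
    local n=$1 image=$2 dir
    shift 2
    dir=${RIAK_CLUSTER_DIR:-$HOME/riak-cluster}/node$n
    docker run -d \
        -v "$dir/data:/var/lib/riak" \
        -v "$dir/log:/var/log/riak" \
        -v "$dir/etc/vm.args:/etc/riak/vm.args" \
        "$image" "$@"
}

# Usage: NODE1=$(start_riak_node 1 my/riak-configured <command>)
```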


Perform a node upgrade

Nothing special:

Stop the node
Upgrade my/riak-configured
If necessary, update common configuration
Start the node
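A sketch of one such step (bash), assuming the upgraded my/riak-configured image is already built and that each node's data, log and vm.args live under ~/riak-cluster/node<N> as in the initial setup. `upgrade_riak_node` and `RIAK_CLUSTER_DIR` are hypothetical; run it for one node at a time:

```shell
# upgrade_riak_node OLD_CONTAINER N IMAGE [CMD...]   (hypothetical helper, bash)
# Stops node N's running container, then starts a new one from the
# already-upgraded IMAGE with the same host directories mounted, so ring
# state and data carry over. Prints the new container's ID.
upgrade_riak_node() {
    local old=$1 n=$2 image=$3 dir
    shift 3
    dir=${RIAK_CLUSTER_DIR:-$HOME/riak-cluster}/node$n
    docker stop "$old" > /dev/null || return 1
    docker run -d \
        -v "$dir/data:/var/lib/riak" \
        -v "$dir/log:/var/log/riak" \
        -v "$dir/etc/vm.args:/etc/riak/vm.args" \
        "$image" "$@"
}
```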


## summary.rst

      
Riak on Docker: Use Case

Having dumped everything on my mind related to this use case into the
braindump.rst file, and having thought about what's written there, here's
(I hope) a summary of what's important for the use case.
The assumption is that there are two kinds of things:

those that all riak containers must share
those that are unique and, once configured/established, must persist
thereafter

Shared things:

riak system itself
riak subsystem configuration (like backend configuration, ring creation size)
any riak-based application (e.g. map/reduce functions that are part of that
application)
erlang cookie (otherwise, riak nodes won't be able to talk to each other)

Note

Unique things that live forever (i.e. across container runs, or across the
'run' steps of a run+commit sequence):

riak node id (see Node ID for more concerns)
content of /var/lib/riak directory (ring state, storage content)

It seems that the unique things would require some sort of persistent
storage (gh#111)


Life Cycle

A riak cluster would have the following life cycle elements:

prepare initial image

this image would have:

riak installed
app related components installed (e.g. map/reduce functions)

common configuration would include (in order of importance):

ring creation size
erlang cookie
app related paths configured for riak
storage backend configuration

More information about riak configuration is available at Configuration
Files (http://docs.basho.com/riak/latest/references/Configuration-Files/)


actually create the cluster

for each node:

configure node id
if it's not the first node, run riak-admin cluster join <first-node-id>

finally:

review the cluster configuration (riak-admin cluster plan)
commit the changes (riak-admin cluster commit)

Important: see note above


normal run

if a node dies, restart it (automatically would be preferred)
if necessary, stop node, start node


riak upgrade/common riak configuration changes

for each node:

stop node
perform the upgrade
start node

in some cases, it should be enclosed in (riak-admin cluster leave + riak-admin
cluster plan + riak-admin cluster commit) and (riak-admin cluster join +
riak-admin cluster plan + riak-admin cluster commit)
Important: see note above


app related components are updated

pretty much the same as the previous element, except that leaving/joining is
most likely not required
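For the upgrades that do need the leave/rejoin wrapping mentioned above, the leave half could look like this (bash), assuming `docker exec` is available. The helper name is hypothetical; like a join, it stops at `plan` so that the commit stays a reviewed, manual step:

```shell
# leave_riak_cluster CONTAINER   (hypothetical helper, bash)
# Stages "leave" for the riak node running in CONTAINER and prints the plan.
# After reviewing it, commit with
#   docker exec CONTAINER riak-admin cluster commit
# and wait for the node to finish handing off its data (riak-admin
# member-status lists it as leaving until then) before stopping it.
leave_riak_cluster() {
    docker exec "$1" riak-admin cluster leave &&
        docker exec "$1" riak-admin cluster plan
}
```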


Other notes


Network Configuration

Network Security and Firewall Configurations
discusses standard configurations and port settings to use when thinking about
how to secure a Riak Cluster.
(Based on the IRC discussion):

it would be a good idea to have support for cross-host shared network (@unclejack)
it might also be a good idea to be able to pick the bridge to put the container on at runtime (@unclejack)


Node ID

There are two ways to specify it: name@ip and name@f.q.d.n
In the first case, the IP address must stay with that node throughout its life.
In the second case, there must be a way to always resolve that f.q.d.n to the node's current IP address.
Neither seems to be possible at the moment (an RFC is at moby/moby#353)