bwalsh/dockerized-tools-checklist.md

Dockerized Tool Installation Checklist

Note: the following instructions have been validated with a CentOS 6.6 virtual machine.

### Overview ###

This document outlines the installation of Docker clients (v1.6) and a Docker registry (v2.0), approaches to updating Nginx configurations, and methods for verifying that Dockerized tools work in Galaxy.

### Installing Docker on compute and gateway nodes ###


Enable the EPEL Yum repository. This allows us to install Docker 1.5.
> sudo yum install epel-release


Install Docker 1.5.
> sudo yum install docker-io


Upgrade the Docker binary to version 1.6.
The repositories only provide version 1.5, but we want version 1.6 so that we can take advantage of the new Docker Registry architecture.
> curl -sSL -O https://get.docker.com/builds/Linux/x86_64/docker-1.6.2
> chmod +x docker-1.6.2
> sudo mv docker-1.6.2 /usr/bin/docker
> sudo chown root:root /usr/bin/docker
Why replace only the binary? The service scripts that come with the 1.5 distribution (i.e., service docker start and chkconfig) keep working with the 1.6 binary.


Start the Docker daemon and make sure it starts on boot.
> sudo service docker start
Starting cgconfig service:                [  OK  ]
Starting docker:                          [  OK  ]
> sudo chkconfig docker on
I tested this by restarting the VM and confirming that the Docker daemon came back up.


Verify install.
> sudo docker version
Client version: 1.6.2
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 7c8fca2
OS/Arch (client): linux/amd64
Server version: 1.6.2
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 7c8fca2
OS/Arch (server): linux/amd64
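
When several nodes need checking, the comparison above can be scripted. The following is a minimal sketch and not part of the validated procedure: `version_ge` and `extract_server_version` are hypothetical helper names, the parsing assumes the 1.6-era `docker version` output format shown above, and the comparison relies on GNU `sort -V`.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to B.
# Relies on the version-sort option (-V) of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# extract_server_version: read `docker version` output on stdin and print
# the value of the "Server version:" line.
extract_server_version() {
    sed -n 's/^Server version: *//p'
}

# Example:
#   v="$(sudo docker version | extract_server_version)"
#   version_ge "$v" 1.6.0 || echo "Docker $v is too old for the v2 registry"
```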


### Installing Docker Registry on gateway node ###

Note that there is a "development" image available that lets you set up a registry quickly with docker pull registry. For production purposes, the Docker documentation recommends that you build your own:

Docker's public registry maintains a default registry image to assist you in the deployment process. This registry image is sufficient for running local tests but is insufficient for production. For production you should configure and build your own custom registry image from the docker/distribution code.

We're going to build our own.


Download the 2.0 release of the Docker Registry.
The new and improved registry is a Docker image that we will build ourselves. First, download the registry distribution:
> cd /opt
> curl -L https://github.com/docker/distribution/archive/v2.0.0.tar.gz | sudo tar xz
> sudo mv distribution-2.0.0 docker-registry


Build our private registry.
Following the official documentation for setting up a registry, let's jump to our registry location:
> cd /opt/docker-registry
Make a directory to store our certs.
> sudo mkdir certs
Create our SSL certificates.
> sudo openssl req \
     -newkey rsa:2048 -nodes -keyout certs/domain.key \
     -x509 -days 365 -out certs/domain.crt
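
The command above prompts interactively for the certificate fields. For repeatable setups, the subject can be passed on the command line with `-subj`. This is a sketch under assumptions: `make_registry_cert` is a hypothetical helper name, and the Common Name `ccc` in the example is only an illustration; it should match whatever hostname clients will use to reach the registry.

```shell
#!/bin/sh
# make_registry_cert DIR HOST: create a self-signed key/cert pair in DIR
# whose Common Name is HOST, without any interactive prompts.
make_registry_cert() {
    dir="$1"; host="$2"
    mkdir -p "$dir" &&
    openssl req \
        -newkey rsa:2048 -nodes -keyout "$dir/domain.key" \
        -x509 -days 365 -out "$dir/domain.crt" \
        -subj "/CN=$host"
}

# Example (from a root shell inside /opt/docker-registry):
#   make_registry_cert certs ccc
#   openssl x509 -in certs/domain.crt -noout -subject -enddate
```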

Note: If you want to use non-self-signed certs, place them in certs/ prior to building the image.

Next, following the directions for self-signed certs, point the registry at the cert and key we just created.
Run sudo vim cmd/registry/config.yml and update the http section to be:
http:
    addr: :5000
    secret: asecretforlocaldevelopment
    debug:
        addr: localhost:5001
    tls:
        certificate: /go/src/github.com/docker/distribution/certs/domain.crt
        key: /go/src/github.com/docker/distribution/certs/domain.key

Note: We may want to revisit this configuration in the future. It has nifty options for using Redis, better logging solutions, notifications, etc.

Build the image:
> sudo docker build -t registry .

Note: Do NOT push this image up to the registry: it contains the self-signed certs that we just created.

Make a directory (on the host system) where we're going to store images. We'll be mounting this when we run the registry.
> sudo mkdir /opt/docker-registry-images
Run the image in the background:
> sudo docker run -d -v /opt/docker-registry-images:/tmp/registry-dev -p 5000:5000 registry:latest
We should be able to hit the /v2/ API endpoint:
> curl -k https://localhost:5000/v2/
{}
We can even take a public image, like ubuntu, and push it to our private registry. Note that we first have to re-tag the image with the registry location, then push it.
> sudo docker pull ubuntu
> sudo docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
registry            latest              c63473db2ed9        11 minutes ago      545.8 MB
golang              1.4                 ca0f230b927e        2 weeks ago         517.2 MB
ubuntu              latest              07f8e8c5e660        2 weeks ago         188.3 MB
> sudo docker tag ubuntu:latest localhost:5000/ubuntu
> sudo docker push localhost:5000/ubuntu
We can verify that the image is on the private registry:
> curl -k https://localhost:5000/v2/ubuntu/tags/list
{"name":"ubuntu","tags":["latest"]}
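
The same check can be scripted, for example to confirm a push from a deployment script. The sketch below is an assumption-based helper rather than part of the original procedure: `has_tag` is a hypothetical name, and it uses a plain `grep` on the compact JSON shown above instead of a real JSON parser, so treat it as a rough check only.

```shell
#!/bin/sh
# has_tag JSON TAG: succeed if TAG appears in the "tags" array of a
# /v2/<name>/tags/list response. TAG is used as a grep pattern, so a dot
# in the tag matches any character; good enough for a smoke test.
has_tag() {
    printf '%s' "$1" | grep -q "\"tags\":\[[^]]*\"$2\""
}

# Example:
#   resp="$(curl -sk https://localhost:5000/v2/ubuntu/tags/list)"
#   has_tag "$resp" latest && echo "ubuntu:latest is in the registry"
```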
You'll also notice that since we mounted a volume from the host system to the container to store images, we'll have our images stored in /opt/docker-registry-images:
> ls /opt/docker-registry-images/docker/registry/v2/repositories
ubuntu
K! We have the private registry set up! Just a few more steps...


Configuring the registry to run as a service.
At this point, we can stop our Registry container (docker stop XXX, where XXX is the container ID from docker ps.)
Here is a sample init.d script that allows us to start and stop the registry as a service. To install:
> curl -sSL https://gist.githubusercontent.com/slnovak/e82ace6b5f323da4cdb5/raw/33bd8d6a3efa76a7f5845b79c6830aa75b995931/registry | sudo tee /etc/init.d/registry > /dev/null
> sudo chmod 755 /etc/init.d/registry
After that, ensure that the registry gets started at boot:
> sudo chkconfig registry on
(Note that in that startup script, we're loading it after the Docker init script and unloading before when shutting down.)

Note: I still haven't been able to get chkconfig to load the registry successfully after a reboot in my VM. This needs to be investigated more.
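
One alternative that may be worth trying while the chkconfig issue is open is to let the Docker daemon restart the container itself through a restart policy (`docker run --restart=always`). This has not been validated in this setup. In the sketch below, the container name `registry`, the `start_registry` function, and the `DOCKER` override variable are assumptions for illustration; the volume and port match the run command used earlier.

```shell
#!/bin/sh
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the command
# instead of running it.
DOCKER="${DOCKER:-sudo docker}"

# start_registry: run the registry container with a restart policy so the
# Docker daemon brings it back up whenever the daemon itself starts.
start_registry() {
    $DOCKER run -d --restart=always --name registry \
        -v /opt/docker-registry-images:/tmp/registry-dev \
        -p 5000:5000 registry:latest
}

# Example:
#   start_registry
```

With this approach only the Docker service needs to be enabled through chkconfig; the registry container would follow the daemon.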


Update /etc/hosts on compute nodes so they can access the gateway node under a common network name.
This is a pretty crucial step. Since we tag images based on the network location of the private registry, that same network name is going to be used in the Galaxy tool configuration.
I recommend the alias ccc. Why? When a user creates a Galaxy tool and creates their .xml file, if they're specifying a Docker container, they'll need to include the domain and port information. It'd be easiest if we could do something along the lines of:
<tool id="smalt_wrapper (docker)" name="SMALT" version="0.0.3">
  <requirements>
    <container type="docker">ccc/smalt-galaxy/latest</container>
  </requirements>
  <description>maps query reads onto the reference sequences</description>
  <command>
    smalt_wrapper.py 
      --threads="4"
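Note the container name ccc/smalt-galaxy/latest. This assumes that the registry is available at https://ccc:443/v2/.... If we have a clumsy name like ccc_gateway_node and we're running the registry on a non-standard port, all Galaxy tool .xml configurations would have to use a container name that matches that network configuration, like ccc_gateway_node:5000/smalt-galaxy/latest. One caveat to verify before settling on a name: Docker treats the first component of an image name as a registry host only when it contains a dot or a colon, or is localhost, so a bare alias such as ccc may be resolved against Docker Hub instead, and a form such as ccc:443/smalt-galaxy may be needed.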
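
Adding the alias on each compute node can be done idempotently, so the step is safe to re-run. This is a minimal sketch with assumed values: `add_host_alias` is a hypothetical helper, and `10.0.0.10` stands in for the gateway node's real address.

```shell
#!/bin/sh
# add_host_alias IP ALIAS [FILE]: append "IP ALIAS" to FILE (default
# /etc/hosts) unless some line already carries that alias.
# Writing to /etc/hosts requires root.
add_host_alias() {
    ip="$1"; alias_name="$2"; file="${3:-/etc/hosts}"
    if grep -Eq "[[:space:]]$alias_name([[:space:]]|\$)" "$file"; then
        echo "alias $alias_name already present in $file"
    else
        printf '%s\t%s\n' "$ip" "$alias_name" >> "$file"
    fi
}

# Example (as root on a compute node; 10.0.0.10 is a placeholder):
#   add_host_alias 10.0.0.10 ccc
```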


Configure Nginx to act as a reverse proxy for SSL traffic to the Registry container.
Given that the above is running, we need to update our Nginx configuration so that we can proxy SSL connections from the host to the container. The trick here, I think, is to use the proxy_ssl_session_reuse option, which the Nginx documentation describes as follows:

Determines whether SSL sessions can be reused when working with the proxied server. If the errors “SSL3_GET_FINISHED:digest check failed” appear in the logs, try disabling session reuse.

A sample Nginx configuration that could be helpful is:
server
{
    listen      443 default ssl;
    server_name galaxy;
    access_log  /tmp/nginx_reverse_access.log;
    error_log   /tmp/nginx_reverse_error.log;
    root        /usr/local/nginx/html;
    index       index.html;

    ssl_session_cache    shared:SSL:1m;
    ssl_session_timeout  10m;
    ssl_certificate      /etc/nginx/ssl/example.com.crt;
    ssl_certificate_key  /etc/nginx/ssl/example.com.key;
    ssl_verify_client    off;
    # SSLv3 and RC4 are left out: both are considered insecure.
    ssl_protocols        TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers          HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    location /
    {
        # Forward everything to the Registry container's TLS listener.
        proxy_pass  https://localhost:5000;
        # If "SSL3_GET_FINISHED:digest check failed" shows up in the error
        # log, try adding: proxy_ssl_session_reuse off;
    }
}


Update firewall configurations to whitelist traffic to Registry.
Only HTTPS traffic from gateway and compute nodes should be able to access the Registry.
After that, Docker clients should be able to access our Registry! We can test by SSHing into a compute node and running:
> curl -k https://ccc/v2/
{}
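
The firewall step is not spelled out above. One possible shape for it on CentOS 6 is a set of iptables rules that accept tcp/443 only from known nodes. The sketch below only prints candidate rules so they can be reviewed first; `print_registry_rules` is a hypothetical helper, the addresses are placeholders, and how the rules interact with the ones already in the INPUT chain needs to be checked on the actual host.

```shell
#!/bin/sh
# print_registry_rules IP...: print iptables commands that accept HTTPS
# (tcp/443) from each listed address and drop it from everyone else.
# Nothing is applied here; the output is meant to be reviewed first.
print_registry_rules() {
    for src in "$@"; do
        # -I inserts at the top of the chain, ahead of any existing REJECT.
        echo "iptables -I INPUT -p tcp -s $src --dport 443 -j ACCEPT"
    done
    echo "iptables -A INPUT -p tcp --dport 443 -j DROP"
}

# Example (placeholder addresses for the gateway and one compute node):
#   print_registry_rules 10.0.0.10 10.0.0.11
# After review, the printed commands can be run as root and persisted
# with `service iptables save`.
```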


### Testing ###
We can easily wrap one of the pre-existing tools with a base Docker image to verify that our private registry is working as expected.

1. **Pull a public base image.**

    From the gateway node, run:
   
    ```bash
    > sudo docker pull centos:6.6
    ```
   
2. **Push that image to our private registry.**

    On the gateway node, run:
    
    ```bash
    > sudo docker tag centos:6.6 ccc/mytool/latest
    ```
    
    Once that's done, we should be able to push that image to the registry:
    
    ```bash
    > sudo docker push ccc/mytool/latest
    ```
    
    > *Note*: If you weren't able to get the `/etc/hosts` entry working and you're able to test with local workers, use `sudo docker tag centos:6.6 localhost:5000/mytool/latest` instead.
    
3. **Update a tool to use that Docker image.**

    Pick a simple tool, like `galaxy/tools/filter/cutWrapper.pl`. Modify the top of the tool's XML file so that it reads:
    
    ```xml
    <tool id="Cut1" name="Cut" version="1.0.2">
      <requirements>
        <container type="docker">ccc/mytool/latest</container>
      </requirements>
      <description>columns from a table</description>
      <command interpreter="perl">cutWrapper.pl $input "$columnList" $delimiter $out_file1</command>
      <inputs>
        <param name="columnList" size="10" type="text" value="c1,c2" label="Cut columns"/>
        <param name="delimiter" type="select" label="Delimited by">
          <option value="T">Tab</option>
          <option value="Sp">Whitespace</option>
    ```
    
    (Note the added `<requirements>` block.)
    
    Push this tool to Galaxy and run the tool to see if it works. The compute node should pull the correct image from `ccc/mytool/latest` on the gateway registry.
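
The pull, tag, and push steps above can be wrapped in a small helper so the same sequence works against either `ccc` or `localhost:5000`. This is a sketch, not a validated script: `publish_image`, `DOCKER`, and `REGISTRY` are hypothetical names, and the image naming simply follows the scheme used in this checklist.

```shell
#!/bin/sh
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the commands.
DOCKER="${DOCKER:-sudo docker}"
# Registry location; use localhost:5000 when testing on the gateway node.
REGISTRY="${REGISTRY:-ccc}"

# publish_image SOURCE NAME: pull SOURCE, re-tag it under the private
# registry, and push it. Stops at the first step that fails.
publish_image() {
    target="$REGISTRY/$2"
    $DOCKER pull "$1" &&
    $DOCKER tag "$1" "$target" &&
    $DOCKER push "$target"
}

# Example:
#   publish_image centos:6.6 mytool/latest
```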
    
### Next steps ###

1. Identify software requirements for each Docker tool. Some tools may require custom software (bwa, picard, etc.) or custom Python packages (`pip install numpy`, etc.).

2. Identify what pipelines we're going to want to Dockerize.

3. Fine-tune the Registry configuration. How can we have better logging? Can we use Redis-based caching to improve performance? Do we want notifications when new images are pushed up? Do we want authentication on pushing new images?