@lusis /index.md
Last active Aug 25, 2017

Docker "Best Practices"

This is a copy/paste from an internal wiki on how we should use Docker.

This guide serves as an outline of internal best practices for using Docker. The idea is to give enough information to allow engineers to create containers for new stack components while minimizing the cleanup required to make them production ready.

Concepts

Before we get to the practices, it's important to understand WHY we are doing this. The key concepts we're concerned about with broad Docker usage are provenance, determinism, repeatability and auditability.

Provenance

Provenance refers to knowing WHERE something comes from. Generally with the end use of software, this is easy:

  • Maven artifacts are pulled from central
  • OS Packages are pulled from the distro vendor

With Docker, this gets more difficult. You may simply be using the foo/bar image to get a baseline install of bar, but because of Docker's layered design, that image could actually be built on 5 other images from multiple other sources. You have to trace the layers all the way back and audit each one. To solve this, we have our own base image that EVERYTHING should be built upon as the origin. For example:

  • the java8 container imports from docker.internal/debian:wheezy and not debian:wheezy (which would pull from Docker Hub) or, worse, a bare FROM debian
  • the tomcat:7.0.54 container is built on the docker.internal/java8:8u51 container
  • IAM application containers are built on the docker.internal/tomcat:7.0.54 container

This guarantees a clear line of provenance from the final container that actually runs the application all the way back to the origin base image. It also helps reduce the amount of disk space used per container, since images that share base layers store those layers only once.
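
As a rough illustration of the first bullet above, the top of the java8 layer's Dockerfile might look something like the sketch below. The registry host and tag come from the list above; the rest of that Dockerfile is not shown here, so treat this as an outline rather than the actual file.

```dockerfile
# Sketch of the first line of the java8 layer's Dockerfile.
# The base comes from the internal registry and carries an explicit tag.
FROM docker.internal/debian:wheezy

# Forms to avoid, per the list above:
#   FROM debian:wheezy   (pulls from Docker Hub instead of the internal registry)
#   FROM debian          (Docker Hub again, plus an implicit "latest" tag)
```

The same pattern repeats up the chain: each layer names its internal parent with an explicit tag.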

Determinism

Docker images are artifacts, the same as jar files or system packages. They are intended to be deterministic, versioned resources. For this reason you should NEVER EVER EVER use latest or unversioned images in Docker FROM statements. You may have noticed that in the Provenance section we used explicit versions for everything. This was intentional. Using only our base images plus version constraints ensures that we get the same deterministic resource every single time. When you use unversioned Docker images of questionable provenance, you're doing the exact same thing as running unsigned SNAPSHOT jars from a non-authoritative Maven source. Don't do this.

Repeatability

This is mostly covered in determinism, but we want to ensure that we can rebuild an EXACT duplicate of the final Docker image from any point in time. You should be able to check out a repo, build the Docker image and get the same resultant image as the one being run elsewhere. You should never run system package upgrades/updates in a Dockerfile anywhere but the base layer. You should never curl or wget unversioned artifacts from anywhere in a Dockerfile. Doing either means that the next time someone runs a docker build, they will get something different than you did. If you MUST pull in an additional system package, you have to use the versioned statement format of the tool (e.g. apt-get install foo=1.1.1-1). If you must pull in third-party artifacts and they are unversioned, you should mirror them internally in a versioned form and pull from there instead.
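
A minimal sketch of what these rules could look like in a Dockerfile follows. The package name and version are the placeholders from the paragraph above, and the mirror host, path and artifact name are hypothetical, made up purely for illustration.

```dockerfile
FROM docker.internal/debian:wheezy

# Pin the package to an exact version ("foo=1.1.1-1" is the placeholder from the text).
# Assumption: the base layer already provides a usable package index. If it does not,
# an index refresh would be needed first, which is a separate decision from upgrading.
RUN apt-get install -y foo=1.1.1-1

# Pull a third-party artifact from an internal, versioned mirror.
# The host "artifacts.internal" and the file "bar-2.3.4.tar.gz" are hypothetical.
ADD https://artifacts.internal/thirdparty/bar/2.3.4/bar-2.3.4.tar.gz /opt/bar-2.3.4.tar.gz
```

The point of both instructions is that the version appears in the Dockerfile itself, so a rebuild months later asks for the same bits.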

Auditability

This is also important. By using Docker, you've taken a problem that was already O(n) and made it worse.

As an example: with individual systems, you know that you have 15 systems running CentOS and 10 systems running Ubuntu. With containers, you may have reduced that to 10 systems running Ubuntu, but now you have n containers per host, and those containers may be using 4 different base distros. You check the news and discover there's a new openssl vulnerability. Patching the base Ubuntu systems is "easy", but how do you know which containers are vulnerable? You don't.

Docker has solved this problem by FINALLY introducing metadata in the form of labels. Combined with clean provenance, you can add metadata to the final artifact that can be queried. This metadata also stacks with the layers. From our chain described above, we have the following labels per layer:

stormcloud/debian:wheezy

  • LABEL stormpath.distro.name="debian"
  • LABEL stormpath.distro.release="wheezy"

stormcloud/java8

  • LABEL stormpath.java.version="8u51"
  • LABEL stormpath.java.provider="sun"

stormcloud/tomcat:7.0.54

  • LABEL stormpath.tomcat.version="7.0.54"

This means that I can semi-easily query all of my containers and find out what versions of critical software are installed. If I need to upgrade a system package (such as openssl) or something like the JDK, I can test that upgrade all the way up the chain.
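
To make the label stacking concrete, here is a sketch of how the tomcat layer might declare its label. The FROM line uses the docker.internal name from the Provenance section, which is an assumption, since the label listing above uses stormcloud/ names.

```dockerfile
# Sketch of the tomcat layer (parent image name assumed from the Provenance section).
FROM docker.internal/java8:8u51

# Labels set in parent images are inherited, so this layer only adds its own.
# The resulting image carries the distro, java and tomcat labels together.
LABEL stormpath.tomcat.version="7.0.54"
```

Running containers can then be filtered with something like docker ps --filter "label=stormpath.tomcat.version=7.0.54", and the full label set of an image can be dumped with docker inspect --format '{{ json .Config.Labels }}' followed by the image name.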

This was also in that section, regarding security.

Short list for now:

  • Don't run services in the container as root. Create a dedicated user and run the service as that user. Root in the container is root on the host.
  • Don't run Docker Hub images in prod ever. (Trust me)

If you must use a Docker Hub image, build it yourself from the source Dockerfile after auditing it.
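
For the first item in the short list, a dedicated user could be set up roughly as follows. The user and group name "app" is hypothetical, and a real service would also need its files and directories to be readable or writable by that user.

```dockerfile
FROM docker.internal/tomcat:7.0.54

# Create an unprivileged system group and user (the name "app" is made up for this sketch).
RUN groupadd --system app \
 && useradd --system --gid app --no-create-home app

# Every instruction after this one, and the container's main process, runs as "app".
USER app
```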

Things to look for in a Dockerfile that should make you run screaming

  • curl -k
  • no dedicated user
  • hardcoded passwords
  • binary files pulled in from non-ssl sites