@d4nyll
Created March 15, 2017 13:57
# The Comprehensive Introduction to Docker
Docker is an open-source project that provides the tools and ecosystem to build and run applications inside containers.
In this article, we will first give you a conceptual overview of what a container is and how it differs from a virtual machine (VM). Then we'll look at what a Docker container is specifically. Lastly, we will do a walk-through of deploying a simple application using Docker.
The application we'll be deploying is based on Meteor, which uses MongoDB as its database. We will deploy the application in one container and MongoDB in another, and set them up so they can communicate with each other. Don't worry if you're not familiar with Meteor or MongoDB - we don't assume any prior knowledge, and we won't spend any time explaining how they work. Our focus is solely on understanding Docker.
> After going through this tutorial, you'll be able to deploy your own app.
> Don't worry if some of the conceptual stuff sounds alien to you at first, that's normal - read through the article once, follow the walk-through, and then revisit the concepts - it'll make a lot more sense after!
Without further ado, let's get started!
## What problems does Docker solve?
Whenever we decide to use a tool, the first question we should ask is "What problem is it trying to solve?".
Docker, Kubernetes, and 'DevOps' have been all the hype over the last few years, and the temptation is to use a tool simply because everyone else is using it. This [Hype-Driven Development](https://blog.daftcode.pl/hype-driven-development-3469fc2e9b22#.17wxqo305) is harmful and should be avoided.
For example, our team decided to use [Redux Thunk](https://github.com/gaearon/redux-thunk) over [`redux-saga`](https://github.com/redux-saga/redux-saga) despite popular recommendations, because there is less to learn and both can achieve the same goals. Likewise, we picked [Radium](https://github.com/FormidableLabs/radium) over [Styled Components](https://styled-components.com/).
To understand the problems Docker is solving, let's quickly think about the typical workflow of developing and deploying an application *without* Docker.
###### Development and Deployment Workflow without Docker
First, each member of your team would set up their local machine to develop on your chosen platform. Some of your team might develop on Ubuntu, some on macOS, and others on Windows. So to ease set-up, you might have written several `setup.sh` scripts, one per environment, that new team members can simply run to set up their local environment.
When your application is ready to be tested, you'd spin up a new <abbr title="Virtual Private Server">VPS</abbr> instance, upload your source code and run `setup.sh` to set up the staging environment. However, you've most likely forgotten to include several dependencies that you assumed were present (like `curl`), or, because the server runs a different Linux distribution from your machine, the shell script doesn't work at all. You'd then have to spend hours installing the required packages and setting the right environment variables. And if you're disciplined, you'd update your `setup.sh` script with the new steps.
And after a few weeks of testing and bug-fixing, it's finally ready to be deployed to production. You'd repeat the same steps again.
A few months have passed and your platform has really taken off all over the world. Now, in order to reduce latency, you want to run your application from different physical locations around the world. So you repeat the server set-up for each location and place a load balancer in front of them.
The above workflow is manageable, but there are many deficiencies:
* Although we have a `setup.sh` script, it does not guarantee that the resulting environment is going to be the same; this is especially relevant if you do not lock the versions of the packages you are using. For example, running the script in January 2017 might have installed version `1.2.3` of package `X`; doing the same in March 2017 might have defaulted to version `1.3.0`. It might turn out that another package breaks with `1.3.0`.
* Updates, new software installations and security patches would have to be performed manually on each individual instance, in a consistent manner, to ensure the staging environment accurately reflects the production environment.
###### How Docker resolves these issues
Docker solves the above issues by:
* Instead of sharing the set-up script, you'd use Docker to *run* the set-up script, and share the *image* that was generated. This image contains all the dependencies the application needs to run, including the platform itself (such as Ubuntu). So when you run a container based on this image, it doesn't need to download anything, and the versions of software used when building the image are the ones used everywhere.
* Since every container is based on the same image, to make updates you only have to update your build script, build the image, test that the image still behaves as expected, deploy new containers using this new image, and switch the old container instances over. This ensures your containers are all consistent with each other.
Overall, using Docker ensures consistency across your deployed application instances, minimizes errors and, with tools like Kubernetes, makes managing scalable infrastructure much easier.
## Introduction to Containers
Docker uses existing *container* technology under the hood, so let's take a look at containers!
#### Conceptual Overview
Software developers like to put things into boxes - be it functions, classes, or modules/packages - this is known as modularization.
A good architecture also ensures that each box only performs actions in a single domain - e.g. a module that sends SMS messages should not also be responsible for processing payments - this is known as [Separation of Concerns](https://en.wikipedia.org/wiki/Separation_of_concerns).
By modularizing our code into standalone units with a single concern, our code becomes much more reusable and easier to manage. If we want to replace an old feature, all we need to do is remove the old module and replace it with a new one, without affecting the entire codebase.
![](http://blog.brew.com.hk/content/images/2016/11/cordova-dependency-tree.png)
> Modern code employs a modular structure. Above is the dependency graph for the [Cordova npm package](https://www.npmjs.com/package/cordova), where each node is a separate module.
Likewise, when deploying applications, we should keep each application isolated inside its own *container*, and ensure each application provides a single service, following the [*Service-Orientated Architecture (SOA)*](https://en.wikipedia.org/wiki/Service-oriented_architecture) principle.
SOA keeps each component as lightweight as possible by removing redundant programs, and makes the containers independent and portable:
* If we need to spawn a new instance of the web application because of increased traffic, we can do so without also spawning a new instance of the database or web server
* Since each component is independent, we can apply version control to them individually
<table>
<tr>
<th></th>
<th>Application Structure</th>
<th>Deployment Architecture</th>
</tr>
<tr>
<td>Modularization</td>
<td>Packages / Modules</td>
<td>Containers</td>
</tr>
<tr>
<td>Separation of Concerns</td>
<td>Single Responsibility Principle</td>
<td>Service-Orientated Architecture (SOA)</td>
</tr>
</table>
For example, a simple social media platform application may be split into four parts:
* The core application, written in Python, or Node, or any other language
* A MongoDB database that our core application talks to
* A search service that uses ElasticSearch to return search results
* An NGINX web server to handle requests/responses
You'd have four containers, one for each part of the application.
#### Implementation
Now that we understand why we use containers, let's dig a little deeper and understand, at a high level, how they are implemented.
Linux Containers (LXC) rely on two Linux kernel mechanisms - *control groups* and *namespaces*.
###### Control Groups
Control groups (cgroups) separate processes into groups and attach them to different subsystems, which restrict the resource usage of each group. For example, we can place our application's process into the `foo` cgroup, attach it to the `memory` subsystem, and restrict our application to using, at most, 50% of the host's memory.
![](http://blog.brew.com.hk/content/images/2017/02/RMG-rule1.png)
<small>Taken from [Red Hat Enterprise Linux's Resource Management Guide - ⁠1.2. Relationships Between Subsystems, Hierarchies, Control Groups and Tasks](https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-Relationships_Between_Subsystems_Hierarchies_Control_Groups_and_Tasks.html)</small>
There are many different subsystems, each responsible for a different type of resource, such as CPU, block I/O, and network bandwidth.
> Read more about control groups in our article - [Control Groups in Linux](http://blog.brew.com.hk/control-groups-in-linux/)
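To make this concrete, here is a rough, hedged sketch of what creating a memory-limited cgroup by hand can look like on a system using the cgroup v1 layout, with the `memory` subsystem mounted at `/sys/fs/cgroup/memory` (the group name `foo` and the 50 MB cap are just illustrative values):
```
# create a cgroup named 'foo' under the memory subsystem (assumes cgroup v1 layout)
sudo mkdir /sys/fs/cgroup/memory/foo
# cap the group at roughly 50 MB of memory
echo $((50 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/memory/foo/memory.limit_in_bytes
# move the current shell into the group; processes it spawns inherit the limit
echo $$ | sudo tee /sys/fs/cgroup/memory/foo/cgroup.procs
```
Docker drives these same kernel interfaces for you, through flags such as `--memory` on `docker run`.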
###### Namespaces
Namespaces package system resources, such as filesystems, network access etc., and present them to a process. From the process's point of view, it does not even know there are resources outside of its allocation.
One of the resources that can be namespaced is process IDs (PIDs). In Linux, processes are organized as a tree, with the system's init process (e.g. `systemd`) given PID `1` and located at the root of the tree.
When we namespace PIDs, we mask a child process off from the rest of the processes by making it the root of its own process tree, with PID `1`. Its descendant processes treat it as if it were the root, and have no knowledge of any processes beyond that point.
![](http://blog.brew.com.hk/content/images/2017/02/linux-kernel-namespace.png)
<small>From [Separation Anxiety: A Tutorial for Isolating Your System with Linux Namespaces](https://www.toptal.com/linux/separation-anxiety-isolating-your-system-with-linux-namespaces)</small>
> You can view your system's process tree by running `pstree` in your terminal.
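If you want to see a PID namespace in action, the `unshare` utility from util-linux offers a quick, illustrative experiment (this is not something Docker requires you to do by hand):
```
# start a shell in new PID and mount namespaces; it sees itself as PID 1
sudo unshare --pid --fork --mount-proc /bin/bash
# inside the namespaced shell, only processes in this namespace are visible
ps aux
```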
The combination of these two Linux kernel mechanisms allows us to have containers which are isolated from each other (using namespaces) and restricted in resources (using control groups).
> **Containers vs Virtualization**
> Processes which run inside a container are isolated by namespaces and control groups, and *not* by an entire operating system running on emulated hardware. This means processes in a container run directly on the kernel of the host system - this is as efficient as it can get.
> Processes which run inside a virtual machine run on the kernel of the virtual machine, which is itself a process running on the host system. This is inefficient because many processes required by the guest operating system must run before the processes we actually care about can. This reduces performance and consumes significant CPU time and memory.
## Docker
Linux Containers (LXC) have been around for a decade. Docker isn't reinventing the wheel (nor is it trying to), but is providing a **standard** way to define, build and run containers. Docker has also nurtured the container ecosystem by providing tools that abstract low-level processes (like managing control groups) away from the end user.
For example, Docker allows you to define your container's configuration in a Dockerfile, and 'extend' from other containers. It also provides a hub where you can share container images so people don't have to build their own from scratch.
> See [What does Docker technology add to just plain LXC?](https://docs.docker.com/engine/faq/#/what-does-docker-technology-add-to-just-plain-lxc) for more details.
#### Containers, Images and Layers
When dealing with individual containers (i.e. not concerning ourselves with clusters), the picture is very simple and consists of three concepts - containers, images and layers.
###### Containers
Docker containers, apart from being easy to share, have an extra feature/constraint that standard Linux containers do not - they must be self-contained. This means all dependencies required by the container, including the platform and the actual application code, are packaged inside the container.
Those dependencies are provided by the *image* that the container runs on top of. So you can view a container as just a running instance of an image.
###### Images
An image is simply an ordered list of layers, each of which represents a set of changes to the filesystem.
Since everything in Linux is a file, what these filesystem changes really represent are operations, such as the running of installation scripts. Therefore, an image is really just the environment that results from a sequence of operations run to set that environment up.
###### Layers
For example, for our application to run, we'd need an environment which has NVM, Node.js v7.4.0 and the yarn package manager installed. Furthermore, we'd need an isolated filesystem as well as some environment variables set.
So, the operations we need to set up this environment include:
* Install NVM
* Use NVM to install Node and npm
* Use npm to install yarn
* Set environment variables
Each of these operations produces a layer, which can be viewed as a snapshot of the image at that point in the set-up process. The next operation then operates on the last layer and builds on top of it.
In the end, you get an ordered list of sequentially-dependent layers, which makes up the image.
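Once you have an image locally (we pull `ubuntu:yakkety` later in the walkthrough), you can see this list of layers for yourself with `docker history`:
```
# each row is one layer, newest first, along with the instruction that created it
$ sudo docker history ubuntu:yakkety
```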
#### Running a Container
When running a container, a new writable *container layer* is created on top of the read-only image (composed of read-only layers). Any file changes are contained within the container layer.
> If a file from the image is needed to be changed, the diff is stored in the container layer - **the image layers are never changed**.
Summing it up with a real-world example - the Ubuntu 15.04 image is composed of 4 layers, each of which is read-only. When we start a container based on this image, Docker creates a new container layer. We can mess around inside this container - add files, change files, install new software, break things etc. When we exit and remove the container, all those changes, being contained in the container layer, are discarded. Neither the image nor the host system is affected.
![](http://blog.brew.com.hk/content/images/2017/02/container-layers.jpg)
> We won't go into too much detail here, but you *can* persist files from a container by writing to a mounted volume, and you *can* keep the changes in your current container by creating a new image based on those changes using [`docker commit`](https://docs.docker.com/engine/reference/commandline/commit/).
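As a minimal sketch of those two options (the host path `/tmp/mydata`, the placeholder `<container-id>` and the image name `myimage:snapshot` are just examples):
```
# mount a host directory into the container; files written to /data outlive the container
$ sudo docker run -it -v /tmp/mydata:/data ubuntu:yakkety
# ...or turn an existing container's writable layer into a new image
$ sudo docker commit <container-id> myimage:snapshot
```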
#### Creating an image
The last piece of the puzzle is how to actually create an image - this is done by the Docker daemon.
The Docker daemon takes in a *Dockerfile*, which is simply a list of *instructions*. The daemon would execute those instructions in order, to build up the image, where each instruction in the Dockerfile corresponds to a layer in the image.
> Since the Dockerfile is a text file, we are, in essence, defining a Docker container with code.
## Walkthrough
That's all the theory you'll need for now. Let's actually create an application, build the images required to run it, and deploy it.
> Docker can be run on any major operating system. From here on, we will only show the steps for an Ubuntu 16.04 machine.
> Installation instructions for other platforms can be found in the [official documentation](https://docs.docker.com/engine/installation/)
#### Installation
Docker is in the official Ubuntu repository, but that version is likely to be out of date. So instead, we will download Docker from Docker's own official repository.
First, ensure packages can be downloaded over HTTPS:
$ sudo apt-get install -y --no-install-recommends apt-transport-https ca-certificates curl software-properties-common
Next, add the Docker [<abbr title="GNU Privacy Guard">GPG</abbr>](https://www.gnupg.org/) key, which simply enables you to verify that the Docker package you downloaded has not been corrupted.
$ curl -fsSL https://apt.dockerproject.org/gpg | sudo apt-key add -
The above command uses `curl` to download the GPG key from `https://apt.dockerproject.org/gpg` (the official source), and then adds it to apt's keyring.
We then verify that the key has the fingerprint `58118E89F3A912897C070ADBF76221572C52609D` (this is published publicly on the Docker website).
```
$ apt-key fingerprint 58118E89F3A912897C070ADBF76221572C52609D
pub 4096R/2C52609D 2015-07-14
Key fingerprint = 5811 8E89 F3A9 1289 7C07 0ADB F762 2157 2C52 609D
uid Docker Release Tool (releasedocker) <docker@docker.com>
```
After we have added the GPG key and verified it, we can add Docker's official repository to `apt`'s own list of repositories.
$ sudo add-apt-repository "deb https://apt.dockerproject.org/repo ubuntu-$(lsb_release -cs) main"
We have now added the Docker repository to `apt`'s list of repositories. Now we need to update the local package list with those from this new repository.
$ sudo apt-get update
Next, use `apt` to install Docker from Docker's repository that we just added.
$ sudo apt-get -y install docker-engine
And Docker is installed!
###### Docker Engine, daemon and client
When we talk about 'Docker', what we actually mean is [*Docker Engine*](https://www.docker.com/products/docker-engine). The Docker Engine consists of:
* the Docker daemon (running as a background process), which provides:
  * a lightweight container runtime that runs your containers
  * the tools you need to build your images
  * tools to handle a cluster of containers, such as networking and load balancing
* the Docker client - a command-line interface that allows you to interact with the Docker daemon

![](http://blog.brew.com.hk/content/images/2017/02/engine-dia.png)
The Docker daemon and client together make up the Docker Engine. This is similar to how `npm` and `node` get bundled together.
The Docker daemon exposes a REST API, which the Docker client uses to interact with it. This is similar to how the `mysql` client interacts with the `mysqld` daemon, or how your terminal shell provides you with an interface to interact with your machine.
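You can talk to that REST API yourself over the daemon's Unix socket (assuming the default socket path and a `curl` build that supports `--unix-socket`, i.e. version 7.40 or later):
```
# ask the daemon for its version over the same API the Docker client uses
$ sudo curl --unix-socket /var/run/docker.sock http://localhost/version
```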
![](http://blog.brew.com.hk/content/images/2017/02/docker-engine.png)
If you're on Linux, the Docker client is an application that runs on your command line; otherwise there are native applications for Mac ([Docker for Mac](https://docs.docker.com/docker-for-mac/)) and Windows ([Docker for Windows](https://docs.docker.com/docker-for-windows/)).
#### Verifying the install
If we installed the Docker client correctly, `docker` should now be registered as an executable. And if our daemon is installed correctly, we should be able to send commands and receive output.
We can check they are both installed properly by simply running a Docker command. Let's check Docker's version by running `docker version`.
```
$ sudo docker version
Client:
Version: 1.13.1
API version: 1.26
Go version: go1.7.5
Git commit: 092cba3
Built: Wed Feb 8 06:50:14 2017
OS/Arch: linux/amd64
Server:
Version: 1.13.1
API version: 1.26 (minimum version 1.12)
Go version: go1.7.5
Git commit: 092cba3
Built: Wed Feb 8 06:50:14 2017
OS/Arch: linux/amd64
Experimental: false
```
Great, Docker was installed successfully! Now, let's go build our application.
#### Downloading and Installing Containers
We are going to run our application on Ubuntu, so we need to get the Ubuntu 16.10 image and add more layers to it. But before we do, let's take a very quick look at where our images are stored in our system.
Our images are stored at `/var/lib/docker/aufs` (the exact path depends on the storage driver; here it is `aufs`). If we `cd` into it (as root), we'll see that it is empty. This is because we have no images yet.
```
# cd /var/lib/docker/aufs; tree
.
├── diff
├── layers
└── mnt
```
We'll come back later after we've downloaded an image.
Since Ubuntu is a very common image, we can find it on [Docker Hub](https://hub.docker.com/) - the official repository for Docker images.
Go to [`hub.docker.com`](https://hub.docker.com), search for `ubuntu`, and pick the one with the most pulls ([this one](https://hub.docker.com/_/ubuntu/)).
![](http://blog.brew.com.hk/content/images/2017/02/docker-hub-search.png)
We can download and install an image from a repository to our local environment by running [`docker pull <image-name>`](https://docs.docker.com/engine/reference/commandline/pull/). So we can install our Ubuntu image by running:
$ sudo docker pull ubuntu
However, if we did that, it would download Ubuntu 16.04 instead of 16.10. This is because the `latest` tag currently points to the `16.04` release, and `docker pull ubuntu` is really just shorthand for `docker pull ubuntu:latest`.
> `latest` is the *tag* for that image
![](http://blog.brew.com.hk/content/images/2017/02/docker-ubuntu-tags.png)
To download our desired version, we just have to specify the corresponding tag - `yakkety`.
```
$ sudo docker pull ubuntu:yakkety
yakkety: Pulling from library/ubuntu
3a635c0fcefb: Pull complete
bf3f7e9b4869: Pull complete
ad323864e1f8: Pull complete
b4d3fc870200: Pull complete
4e69d6ff0e56: Pull complete
Digest: sha256:609c1726180221d95a66ce3ed1e898f4a543c5be9ff3dbb1f10180a6cb2a6fdc
Status: Downloaded newer image for ubuntu:yakkety
```
We can see that our `ubuntu:yakkety` image consists of 5 layers: `3a635c0fcefb`, `bf3f7e9b4869`, `ad323864e1f8`, `b4d3fc870200` and `4e69d6ff0e56`.
We can verify that the image is downloaded properly by running [`docker images`](https://docs.docker.com/engine/reference/commandline/images/).
```
$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu yakkety 31005225a745 3 weeks ago 103 MB
```
And if we go back to `/var/lib/docker/aufs`, we can indeed see that the layers of this image have been downloaded and stored.
```
# cd /var/lib/docker/aufs; tree -L 2
.
├── diff
│   ├── 762e4923ec0604527e40412ea2932acbdf6be75c978c2c08dc8f2beee5940f4f
│   ├── 88541e3592919a42b38a3587c9f4414c85591329de1af122bd63bc82af3630cd
│   ├── d8d2f38cf52ee3abf887f438e7d0ac7cfbcbd1ef6bdfef466fc285c631095c55
│   ├── de9e2aa6bdc4623f6f6481097fd52ee000961bc02f4258d5ad2da618f13a2a16
│   └── f16feac2df8f22371e9e516fd614bb71f12b5f5227c8cd42f7040b9460637fb4
├── layers
│   ├── 762e4923ec0604527e40412ea2932acbdf6be75c978c2c08dc8f2beee5940f4f
│   ├── 88541e3592919a42b38a3587c9f4414c85591329de1af122bd63bc82af3630cd
│   ├── d8d2f38cf52ee3abf887f438e7d0ac7cfbcbd1ef6bdfef466fc285c631095c55
│   ├── de9e2aa6bdc4623f6f6481097fd52ee000961bc02f4258d5ad2da618f13a2a16
│   └── f16feac2df8f22371e9e516fd614bb71f12b5f5227c8cd42f7040b9460637fb4
└── mnt
├── 762e4923ec0604527e40412ea2932acbdf6be75c978c2c08dc8f2beee5940f4f
├── 88541e3592919a42b38a3587c9f4414c85591329de1af122bd63bc82af3630cd
├── d8d2f38cf52ee3abf887f438e7d0ac7cfbcbd1ef6bdfef466fc285c631095c55
├── de9e2aa6bdc4623f6f6481097fd52ee000961bc02f4258d5ad2da618f13a2a16
└── f16feac2df8f22371e9e516fd614bb71f12b5f5227c8cd42f7040b9460637fb4
```
#### Running our container
Now that we have our image installed, let's run it using [`docker run`](https://docs.docker.com/engine/reference/commandline/run/).
$ sudo docker run ubuntu:yakkety
You'll see that the command ran, but produced no output of any kind.
The container was actually created from the image, but because we didn't attach a terminal or give it anything to do, the default command (`/bin/bash`) exited immediately, taking the container with it.
We can see this by running [`docker ps`](https://docs.docker.com/engine/reference/commandline/ps/) with the `-a` flag, which lists all containers that are running or have run in the past.
```
$ sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
134987627b66 ubuntu:yakkety "/bin/bash" 2 minutes ago Exited (0) 2 minutes ago clever_einstein
```
As you can see, it was created and exited almost at the same time.
To actually interact with the container, we need to keep it from exiting, and also request a terminal so we can run commands inside it. We can do this by passing the `--interactive` and `--tty` flags, or `-it` for short.
```
$ sudo docker run -it ubuntu:yakkety
root@dc8255d00fe1:/# ls
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys tmp usr var
```
As you can see, we're now inside the container, acting as the `root` user. We can run nasty commands, like `rm -rf /` and all the damage would be contained within the container.
```
$ sudo docker run -it ubuntu:yakkety
root@ff22e03b15c6:/# rm -rf --no-preserve-root /
...
root@ff22e03b15c6:/# ls
bash: ls: command not found
root@ff22e03b15c6:/# exit
```
<!--
We can even run a fork bomb inside it, but we just need to make sure we limit the resource usage (CPU and memory) of our container first. (**I wouldn't actually run this** because you can't exit from it)
```
$ sudo docker run --cpus=0.3 -m=300m -it ubuntu:yakkety
root@d63601cfec6e:/# :(){ :|:& };:
[1] 12
```
This will freeze up your container, but your host machine will continue to work as normal, because the container only consumes, at most, 300 megabytes of memory, and 0.3 core equivalent of my 8-core machine.
```
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d63601cfec6e ubuntu:yakkety "/bin/bash" 4 minutes ago Up 4 minutes frosty_mcnulty
```
It takes a long time to `exit` from a fork bomb, so if you ran the above, you might want to restart your machine.
-->
## Overview of Dockerfile
So far, you've learnt how to download an image from Docker Hub and run it. That's sufficient if you just want to run off-the-shelf software like a database, but what if you want to containerize your own application?
To do that, you'd need to write a *Dockerfile*. A Dockerfile is a text file, where each line consists of an *instruction* followed by one or more *arguments*.
```
INSTRUCTION arguments
```
There are many types of instructions available; we will briefly go through each one before moving on to writing our own Dockerfile.
> For full details of all instructions and arguments, read the official [Dockerfile reference](https://docs.docker.com/engine/reference/builder/)
* [`FROM`](https://docs.docker.com/engine/reference/builder/#/from) - specifies which Docker image this image is based on (the *base image*). Each Dockerfile ***must*** have a `FROM` instruction as the first instruction. For example, if we intend for our application to run on a Ubuntu machine, then we'd specify `FROM ubuntu`
* [`RUN`](https://docs.docker.com/engine/reference/builder/#/run) - specifies the command(s) to run *at build time*. Each `RUN` instruction produces a new layer in our image.
* [`ENTRYPOINT`](https://docs.docker.com/engine/reference/builder/#/entrypoint) - specifies the path to the executable (along with its arguments) that should be run when the container is started with `docker run <your-image>`. If this is not specified, it defaults to the shell (`/bin/sh -c`)
* [`CMD`](https://docs.docker.com/engine/reference/builder/#/cmd) - specifies the default command/arguments to pass to the `ENTRYPOINT` when you run `docker run`. There should be one, and only one, `CMD` instruction in a Dockerfile; if multiple are provided, only the last one takes effect.
> `ENTRYPOINT` and `CMD` both determine what gets executed when the image is run. `ENTRYPOINT` specifies the *binary* that should be executed, and `CMD` specifies the default arguments that are passed to that entrypoint (see the small example after this list).
> You may find some Dockerfiles with just a single `CMD` instruction specifying an executable, e.g. `CMD acommand`. In that shell form, what is actually being executed is `/bin/sh -c acommand`, so in practice you can't tell the difference.
* [`ADD`](https://docs.docker.com/engine/reference/builder/#/add) - copies files and directories from the context (more on this later) to the container image
* [`COPY`](https://docs.docker.com/engine/reference/builder/#/copy) - very similar to `ADD`, except it does not support remote URLs, it does not unpack archive files and it does not invalidate cached `RUN` instructions (even if the contents have changed). View `COPY` as a light-weight version of `ADD`. You should use `COPY` over `ADD` whenever possible.
* [`WORKDIR`](https://docs.docker.com/engine/reference/builder/#/workdir) - changes the working directory for any `RUN`, `CMD`, `ENTRYPOINT`, `COPY` and `ADD` instructions that come after it
* [`ENV`](https://docs.docker.com/engine/reference/builder/#/env) - set environment variables that are available during build- *and* run-time
* [`ARG`](https://docs.docker.com/engine/reference/builder/#/arg) - define variables that can be defined at build-time (not run-time) by passing the `--build-arg <varname>=<value>` flag into `docker build`
> `ENV` and `ARG` both provide variables during build time, but `ENV` values also persist into the built image. In cases where `ENV` and `ARG` variables share the same name, the `ENV` variable takes precedence
* [`EXPOSE`](https://docs.docker.com/engine/reference/builder/#/expose) - informs Docker which port(s) the container listens to at runtime. **N.B. Despite its name, `EXPOSE` does not expose the port from the container to the host, it merely tells Docker that the container would be listening to that port.**
There are other, less commonly used instructions:
* [`ONBUILD`](https://docs.docker.com/engine/reference/builder/#/onbuild) - allows you to add commands that are to be run by child images (images which use the current image as a base image). The commands will run immediately after the `FROM` instruction in the child image.
* [`LABEL`](https://docs.docker.com/engine/reference/builder/#/label) - Allows you to attach arbitrary metadata, in the form of key-value pairs, to the image. Any containers loaded with the image would also carry that label. Uses for labels are very broad; for example, you can use it to enable load balancers to identify containers based on their labels.
* [`VOLUME`](https://docs.docker.com/engine/reference/builder/#/volume) - specify a mount point in the host's filesystem where you can persist data, even after the container is destroyed
* [`HEALTHCHECK`](https://docs.docker.com/engine/reference/builder/#/healthcheck) - specifies commands that are run at regular intervals to check that the container is not just alive, but functional. For example, if a web server process is running but unable to receive requests, it would be deemed 'unhealthy'.
* [`SHELL`](https://docs.docker.com/engine/reference/builder/#/shell) - Overrides the default shell used by commands specified using the shell form
* [`USER`](https://docs.docker.com/engine/reference/builder/#/user) - specify the user name or UID to use when building / running the image
* [`STOPSIGNAL`](https://docs.docker.com/engine/reference/builder/#/stopsignal) - specify the system call signal that will be sent to the container to exit
> Instructions are case-insensitive. However, convention is to use UPPERCASE.
> You can also add comments in Dockerfiles using hashes `#`. E.g.
> # This is a docker comment
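To make the relationship between `ENTRYPOINT` and `CMD` concrete, here is a tiny, hypothetical Dockerfile (unrelated to our walkthrough app) that you could build and experiment with:
```
FROM ubuntu:yakkety
# the binary that always runs
ENTRYPOINT ["echo"]
# default arguments, used when `docker run <image>` is called without extra arguments
CMD ["hello from the default CMD"]
```
Running the resulting image with no arguments prints the default message; running `docker run <image> something-else` keeps the `ENTRYPOINT` but replaces the `CMD` arguments.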
#### Writing your own Dockerfile
Now that we have a rough idea of what each instruction means, let's try to write our own Dockerfile.
We want to base our application on Ubuntu 16.10, so our first line would be:
FROM ubuntu:yakkety
Our application is based on Meteor, which has a handy installation script provided on its [install](https://www.meteor.com/install) page.
curl https://install.meteor.com/ | sh
But our Ubuntu instance wouldn't have `curl` installed, so we need to update our instance's apt package list and install `curl`. So our next two lines would look something like this:
RUN apt update && apt install -y curl
RUN curl https://install.meteor.com/ | sh
After those two lines are run, we'd expect Meteor to be installed. So let's use it to create a demo app. The command we'd normally run is `meteor create <my-app-name>`, so the equivalent instruction would, again, be `RUN`.
RUN meteor create myapp
Next, Meteor requires MongoDB (which we will install and run later in its own container), so we need to tell Meteor where it can expect to find a MongoDB instance. We do this through setting an environment variable.
ENV MONGO_URL=mongodb://localhost:27017/my_app
After our environment variable is set, we can then run the install script that will download the npm packages that Meteor depends on.
RUN cd myapp && meteor npm install
Lastly, we need to specify the command to run when we run the container.
CMD cd myapp && meteor
Now, there were some Meteor-specific packages, arguments and environment variables that we had to set which are irrelevant to how Docker works, so we've omitted explaining them. Putting together all of the above, we arrive at something like this:
```
FROM ubuntu:yakkety
RUN apt update && apt install -y curl locales && locale-gen en_US.UTF-8 && localedef -i en_GB -f UTF-8 en_US.UTF-8
RUN curl https://install.meteor.com/ | sh
RUN meteor create myapp --allow-superuser
ENV LC_ALL=POSIX MONGO_URL=mongodb://localhost:27017/your_db
RUN cd myapp && meteor npm install
CMD cd myapp && meteor --allow-superuser
```
Obviously, before writing this article, we went through many versions of the Dockerfile before we got it to work - we missed an environment variable, or forgot to pass in an argument, etc. But that's the point of Docker containers - if you miss a step, you realize it at build time. And you're guaranteed that if one container works, all containers based on that image will work.
> This Dockerfile does not follow best practices; once you've completed this walkthrough, we'd highly encourage you to read [best practices for writing Dockerfiles](https://docs.docker.com/engine/userguide/eng-image/dockerfile_best-practices/)
#### Building your image
The Dockerfile is only a set of instructions on *how* to build an image; it's not the image itself. Generating the image, however, is extremely easy - just run `docker build`.
The `docker build` command builds an image based on the Dockerfile and a *context*. The context is a set of files that are needed to build the image.
```
$ docker build [context] -f [path/to/Dockerfile]
```
For example:
```
$ sudo docker build . -f ./Dockerfile
```
By default, if you don't specify the location of the Dockerfile, Docker will try to find it at the root of the context. So if you are in the root directory of the context, you can simply run:
```
$ sudo docker build .
```
Furthermore, if you don't specify the context, Docker will default to the current directory you are running the `docker` command from.
The Docker daemon performs some preliminary checks to ensure the Dockerfile is valid and contains no syntax errors. After this check, the Docker daemon begins processing the Dockerfile.
We don't need any additional files for our application, so we're just going to run `sudo docker build .`
```
$ sudo docker build .
Sending build context to Docker daemon 2.048 kB
Step 1/7 : FROM ubuntu:yakkety
yakkety: Pulling from library/ubuntu
3a635c0fcefb: Pull complete
bf3f7e9b4869: Pull complete
ad323864e1f8: Pull complete
b4d3fc870200: Pull complete
4e69d6ff0e56: Pull complete
Digest: sha256:609c1726180221d95a66ce3ed1e898f4a543c5be9ff3dbb1f10180a6cb2a6fdc
Status: Downloaded newer image for ubuntu:yakkety
---> 31005225a745
Step 2/7 : RUN apt update && apt install -y curl locales && locale-gen en_US.UTF-8 && localedef -i en_GB -f UTF-8 en_US.UTF-8
---> Running in b4431ff167d2
Generation complete.
---> 30c3d0d98485
Removing intermediate container b4431ff167d2
Step 3/7 : RUN curl https://install.meteor.com/ | sh
---> Running in 48b06f6882b0
---> 7dedac65d957
Removing intermediate container 48b06f6882b0
Step 4/7 : RUN meteor create myapp --allow-superuser
---> Running in 4c6a44c06edc
---> 1ff7f3e6288c
Removing intermediate container 4c6a44c06edc
Step 5/7 : ENV LC_ALL POSIX MONGO_URL mongodb://localhost:27017/your_db
---> Running in a460d9a11506
---> b2d0ae114e5f
Removing intermediate container a460d9a11506
Step 6/7 : RUN cd myapp && meteor npm install
---> Running in c95c4617640e
---> c13b0fd21209
Removing intermediate container c95c4617640e
Step 7/7 : CMD cd myapp && meteor --allow-superuser
---> Running in afe5ee6cdd7b
---> f8594829d730
Removing intermediate container afe5ee6cdd7b
Successfully built f8594829d730
```
Let's go through the output together.
The Docker daemon looks at the first instruction (which must always be a `FROM` instruction) and pulls the base image specified (`ubuntu:yakkety`). This gives us the image with ID `31005225a745` to build on.
```
Sending build context to Docker daemon 2.048 kB
Step 1/7 : FROM ubuntu:yakkety
yakkety: Pulling from library/ubuntu
3a635c0fcefb: Pull complete
bf3f7e9b4869: Pull complete
ad323864e1f8: Pull complete
b4d3fc870200: Pull complete
4e69d6ff0e56: Pull complete
Digest: sha256:609c1726180221d95a66ce3ed1e898f4a543c5be9ff3dbb1f10180a6cb2a6fdc
Status: Downloaded newer image for ubuntu:yakkety
---> 31005225a745
```
The daemon then creates an intermediate (temporary) container from that image and runs the second instruction inside it. The result is *committed* as a new image (with ID `30c3d0d98485`), whose unique ID is returned to `stdout`. The intermediate container's job is done, so it is removed.
```
Step 2/7 : RUN apt update && apt install -y curl locales && locale-gen en_US.UTF-8 && localedef -i en_GB -f UTF-8 en_US.UTF-8
---> Running in b4431ff167d2
Generation complete.
---> 30c3d0d98485
Removing intermediate container b4431ff167d2
```
When the third instruction is run, the daemon again creates a new intermediate container based on the image output by the previous step, runs the instruction, and outputs another image. This carries on until all instructions have been run. The last image output by this process becomes the built image.
Creating an image for each step in the Dockerfile allows us to create containers from any point in the image's history, similar to source control.
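For example, we could start a shell from the image committed after Step 3 above - the point at which Meteor had just been installed - by using its ID from the build output:
```
# drop into the state of the build right after Step 3 (Meteor installed, no app yet)
$ sudo docker run -it 7dedac65d957 /bin/bash
```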
Our image is now built! We can check this by running `docker images`
```
$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> f8594829d730 5 minutes ago 859 MB
ubuntu yakkety 31005225a745 4 weeks ago 103 MB
```
You'll see the `ubuntu:yakkety` image our Meteor image is based on, and our Meteor image itself. We're going to be working with another image in a moment, and remembering which image has which ID is cumbersome, so it's best to give our images easy-to-remember names.
```
$ sudo docker build -t myapp .
Sending build context to Docker daemon 2.048 kB
Step 1/7 : FROM ubuntu:yakkety
---> 31005225a745
Step 2/7 : RUN apt update && apt install -y curl locales && locale-gen en_US.UTF-8 && localedef -i en_GB -f UTF-8 en_US.UTF-8
---> Using cache
---> 30c3d0d98485
Step 3/7 : RUN curl https://install.meteor.com/ | sh
---> Using cache
---> 7dedac65d957
Step 4/7 : RUN meteor create myapp --allow-superuser
---> Using cache
---> 1ff7f3e6288c
Step 5/7 : ENV LC_ALL POSIX MONGO_URL mongodb://localhost:27017/your_db
---> Using cache
---> b2d0ae114e5f
Step 6/7 : RUN cd myapp && meteor npm install
---> Using cache
---> c13b0fd21209
Step 7/7 : CMD cd myapp && meteor --allow-superuser
---> Using cache
---> f8594829d730
Successfully built f8594829d730
```
This time, our command returned almost instantly. If you look carefully, you'll see `---> Using cache` printed in the output. Docker realizes that the instructions in the Dockerfile match exactly those it has already run before, and so it uses the layers cached in `/var/lib/docker/aufs` instead of running the instructions again.
If we check `docker images` again, we can see our image now has a name.
```
$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
myapp latest f8594829d730 9 minutes ago 859 MB
ubuntu yakkety 31005225a745 4 weeks ago 103 MB
```
#### Linking containers
Next, we need to download and run another container running MongoDB, and link it with our application container.
```
$ sudo docker run -d --name MeteorMongo mongo
Unable to find image 'mongo:latest' locally
latest: Pulling from library/mongo
5040bd298390: Pull complete
ef697e8d464e: Pull complete
67d7bf010c40: Pull complete
bb0b4f23ca2d: Pull complete
8efff42d23e5: Pull complete
3df9f20d1d07: Pull complete
7b43ac0a1517: Pull complete
010dcda0f65b: Pull complete
ec68d17240b3: Pull complete
Digest: sha256:0d4453308cc7f0fff863df2ecb7aae226ee7fe0c5257f857fd892edf6d2d9057
Status: Downloaded newer image for mongo:latest
1fbb6288c2f408351ef71b58241e5c66a353cbe3ef4e3771f4bc886928aa0f49
```
Here, we simply ran `docker run` without running `docker pull` first, because `docker run` automatically detects whether the image exists locally and, if not, downloads it for us. We used `docker pull` previously just to go more in-depth into each step.
We passed in the `--name` argument to give the container a name.
```
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7f1abd1baf5c mongo "/entrypoint.sh mo..." 16 seconds ago Up 15 seconds 27017/tcp MeteorMongo
```
> Don't get confused! We used `-t` when building the image to give the *image* a name. We are passing in the `--name` argument into `docker run` to give the *container* a name.
> Remember - a container is just a running instance of an image.
We're nearly there! The last step is to run our Meteor application, and link it to our `MeteorMongo` container, so they can talk to each other.
$ sudo docker run --link=MeteorMongo:mongodb myapp
The above command runs our Meteor application on port `3000` inside the container, and allows it to communicate with the MongoDB container. But because it runs inside a container, the outside world (i.e. the host system's `localhost`) cannot reach the container's internal port `3000`.
To make our running application available to the outside, we can specify the [`-p` (publish) flag](https://docs.docker.com/engine/reference/commandline/run/#/publish-or-expose-port--p---expose-1), which maps a host port (here `4567`) to the container's port `3000`.
$ sudo docker run -p 127.0.0.1:4567:3000 --link=MeteorMongo:mongodb myapp
It takes a while to start the first time, but once it's up, we can access our demo Meteor application at `127.0.0.1:4567`!
![](http://blog.brew.com.hk/content/images/2017/02/meteor-demo-app.png)
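As a quick sanity check, you can confirm from the host that both containers are running and that the published port answers (the exact output will vary on your machine):
```
# the myapp and MeteorMongo containers should both show up as running
$ sudo docker ps
# the Meteor app should respond on the published host port
$ curl -I http://127.0.0.1:4567
```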
## Exploring Further
We've gone through *a lot* in this article. It might be a good time to take a break and re-read the beginning, which will hopefully make a lot more sense to you now.
We have yet to touch on the vast Docker ecosystem, so we encourage you to explore those tools yourself:
* [Docker Cloud](https://cloud.docker.com/)
* [Docker Trusted Registry (DTR)](https://docs-stage.docker.com/docker-trusted-registry/)
* Docker Universal Control Plane (UCP)
* [Docker Compose](https://docs.docker.com/compose/overview/)
* [Docker Machine](https://docs.docker.com/machine/overview/)
And if you're interested in hosting Docker containers in a cluster, then you'll have to use even more tools like:
* [CoreOS](https://coreos.com/) - a lightweight Linux distribution aimed at cluster environments, where every application running on it is expected to be a container
* [etcd](https://github.com/coreos/etcd) - service discovery tool
* [Kubernetes](https://kubernetes.io/) - scheduler
We hope this article gave you a comprehensive insight into working with containers! Thank you for reading!