Docker

Docker CLI basics

The Docker CLI is usually referenced as the "docker engine", see docs at https://docs.docker.com/engine/reference/commandline/cli/

docker has plenty of subcommands, which are listed when you just run docker without any arguments

docker info

Let's see what's happening in our docker environment

$ docker info

gives information about our docker daemon (server); from this output we can see our current server version

Server Version: 17.09.1-ce

and that we have something like

CPUs: 1
Total Memory: 1.952GiB

available for our containers. On Linux these numbers will match the host machine, but on Mac/Win this will be the number of the CPUs/RAM in the virtual machine.

mac/win only:

Before moving forward, let's change the defaults: open Docker preferences from the top menu icon. Then, under "advanced", set the number of CPUs to max or max-1 and hit apply & restart. While the machine is restarting, docker info will return:

Error response from daemon: Bad response from Docker engine

Which means that the docker CLI cannot connect to the server. After the machine has restarted, we can see that docker info matches our settings.
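
As a side note, the docker CLI picks the daemon it talks to from the DOCKER_HOST environment variable; when it's unset, the local socket is used. A minimal sketch, assuming a daemon actually listens at that address (it doesn't by default):

$ DOCKER_HOST=tcp://127.0.0.1:2375 docker info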

Now let's enter the virtual machine, either with:

$ screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty

and hit enter to get something on the screen.

Or, we can also get in with a docker container from the internets:

$ docker run --rm -it --privileged --pid=host walkerlee/nsenter -t 1 -m -u -i -n sh

and then we can see with

$ uname -a
  Linux moby 4.9.49-moby #1 SMP Fri Dec 8 19:40:02 UTC 2017 x86_64 Linux

that this host "moby" is based on Alpine Linux, as we can see from (DEPRECATED IN BETA):

$ cat /etc/alpine-release
  3.5.0

Enough of this VM stuff, let's get started!

Docker images

Command

$ docker images

should return hello-world, which was docker run at install time.

Let's run it a couple more times

$ docker run hello-world
$ docker run hello-world

Like docker info, the docker images command connects to the docker server. Let's verify that quickly:

On a mac we can see that /var/run/docker.sock is linked to the VM's socket:

$ ls -l /var/run/docker.sock
  lrwxr-xr-x  1 root  daemon  56 Dec 13 15:24 /var/run/docker.sock -> /Users/mpa/Library/Containers/com.docker.docker/Data/s60

Luckily curl supports unix sockets nowadays. So to do what docker images just did we can:

$ curl --unix-socket /var/run/docker.sock http://localhost/images/json

we can also list /containers/json etc, see https://docs.docker.com/engine/api/v1.35/#operation/ContainerList

$ curl --unix-socket /var/run/docker.sock http://localhost/containers/json
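
If you happen to have jq installed, the raw JSON is easier to skim - for example listing just the tags of local images (a small aside, not needed for anything later):

$ curl --unix-socket /var/run/docker.sock http://localhost/images/json | jq '.[].RepoTags'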

Let's use proper tools instead of curl and try removing the image by...

$ docker rmi hello-world

...which should fail with an error

Error response from daemon: conflict: unable to remove repository reference "hello-world" (must force) - container 3d4bab29dd67 is using its referenced image f2a91732366c

because the container created from the image is still present as you can (not) see in

$ docker ps

which lists, by default, only the running containers; since that container is no longer running (it's designed just to print its output and exit) we need to say

$ docker ps -a

to show all containers in the daemon. Notice that containers have a container ID and an autogenerated container name, something like "confident_golick"

When we have a lot of different containers, we can filter the list with grep or similar tools

$ docker ps -a | grep hello-world

or by docker ps filters (https://docs.docker.com/engine/reference/commandline/ps/#filtering)

$ docker ps -a -f "ancestor=hello-world"

Now we could remove the image by force with docker rmi --force hello-world, but it would not remove our containers. If we run

$ docker ps -a | grep hello-world
$ docker ps -a -f "ancestor=hello-world"

It just appears that the containers were removed: with grep we were matching "hello-world", and our ps filter uses ancestor matching, but that ancestor was removed with rmi --force. If you check docker ps -a (or docker ps -a | grep hello) you can see that our containers are still there.

We could remove the containers one by one with docker rm <container name or id>, but let's first train our xargs skills instead, because you're going to need those with the Docker CLI later on. If you ran the rmi --force, re-run docker run hello-world first.

$ docker ps -a -q -f "ancestor=hello-world"

The -q flag outputs only the container IDs, which we can then pipe to xargs like:

$ docker ps -aq -f "ancestor=hello-world" | xargs docker rm

This works because the container was stopped. If the container was still running it wouldn't work.

Let's start a container that doesn't exit on its own, for example

$ docker run -d nginx

This will download the nginx image and start a container from it in the background with -d (detach), which can be seen with

$ docker ps

Now if we try to rm it, it will fail:

$ docker rm $(docker ps -q)
  Error response from daemon: You cannot remove a running container f72c583c982ca686b0826fdc447f04710e78ff6c25dc1ddc7c427cc35eadf5f0. Stop the container before attempting removal or force remove

Now we can either docker rm --force it or docker stop <container id or name> and then docker rm it.

The xargs trick works because docker rm accepts multiple arguments, so you can also run docker rm id1 id2 id3 directly

It's common that over time the docker daemon is clogged with images and containers lying around, because it's not natural to clean up everything all the time.
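
When that happens, a brute-force cleanup along these lines removes every container and image on the machine (a sketch - use with care, it really deletes everything):

$ docker ps -aq | xargs docker rm --force
$ docker images -q | xargs docker rmi --force

Recent Docker versions also have docker system prune, which removes stopped containers, dangling images and unused networks in one go.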

Where do the images come from?

We can search for the images with

$ docker search hello-world

that searches images from https://hub.docker.com/

We get plenty of results like

hello-world
kitematic/hello-world-nginx
tutum/hello-world
...

The hello-world image has a web page at https://hub.docker.com/_/hello-world/ - these images without a prefix (aka org/user) are built from git repositories in https://github.com/docker-library

We really can't know where kitematic/hello-world-nginx is built, since the page https://hub.docker.com/r/kitematic/hello-world-nginx/ has no links to any repos. The only thing we know is that the image is 3 years old.

Also notice that there are no visible guarantees that https://hub.docker.com/_/hello-world/ comes from https://github.com/docker-library/hello-world. The "Full Description" has links to that repo, but it may not be true.

In the third result, tutum/hello-world, you can see that it's Automated, so on https://hub.docker.com/r/tutum/hello-world/ the "Source Repository" is linked AND in the "Build Details" tab we can actually see what happened during the builds: https://hub.docker.com/r/tutum/hello-world/builds/

There are also other Docker registries, such as https://quay.io/ that competes with Docker Hub. Naturally docker search can not be used to search from these registries, so we have to use the site https://quay.io/search?q=hello and select a result like https://quay.io/repository/nordstrom/hello-world where it's shown how to pull from this registry:

$ docker pull quay.io/nordstrom/hello-world

So by default, if the host (here: quay.io) is omitted, docker will pull from Docker Hub. From the docker engine's (daemon/server/CLI) point of view it doesn't matter where the image comes from, it just needs to be stored (pulled) locally.
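
For example, these two commands pull exactly the same image - the first just spells out the default registry host and the library namespace:

$ docker pull docker.io/library/hello-world
$ docker pull hello-world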

Let's move on to inspect something more relevant than 'hello-world', for example Ubuntu: https://hub.docker.com/r/library/ubuntu/ - that is one of the most common Docker images to use as a base for your own image.

The description/readme says:

What's in this image? This image is built from official rootfs tarballs provided by Canonical (specifically, https://partner-images.canonical.com/core/).

From the links we can guess (not truly know) that the image is built from https://github.com/tianon/docker-brew-ubuntu-core - so by a guy named "Tianon Gravi" who describes himself with "bash, debian, father, gentoo, go, perl, tron, vim, vw; basically nine years old" in his GitHub profile.

In that git repository's README in https://github.com/tianon/docker-brew-ubuntu-core/tree/1637ff264a1654f77807ce53522eff7f6a57b773#scripts-to-prepare-updates-to-the-ubuntu-official-docker-images it says:

Some more Jenkins happens

which means that somewhere™ there's a Jenkins server that runs this script and publishes the image to the registry - we have no way of knowing whether this is true or not.

Anyway, let's pull this beast:

$ docker pull ubuntu
  Using default tag: latest
  latest: Pulling from library/ubuntu

Since we didn't specify a tag, we got latest, which is usually the most recently built and pushed image in the registry, but in this case the repo readme says that

The ubuntu:latest tag points to the "latest LTS", since that's the version recommended for general use.

From https://hub.docker.com/r/library/ubuntu/tags/ we can see that there are tags like 16.04 which (should) give us the guarantee that the image is based on Ubuntu 16.04. Let's pull that now:

$ docker pull ubuntu:16.04
  16.04: Pulling from library/ubuntu
  c2ca09a1934b: Downloading [============================================>      ]  34.25MB/38.64MB
  d6c3619d2153: Download complete
  0efe07335a04: Download complete
  6b1bb01b3a3b: Download complete
  43a98c187399: Download complete

Images are composed of different layers that are downloaded in parallel to speed up the download.

(PRO-TIP: For command line fetching of all available tags we can do something like https://stackoverflow.com/questions/28320134/how-to-list-all-tags-for-a-docker-image-on-a-remote-registry)
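
A rough sketch of that approach against Docker Hub's registry API (this assumes jq is installed; other registries use slightly different auth endpoints):

$ TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:library/ubuntu:pull" | jq -r .token)
$ curl -s -H "Authorization: Bearer $TOKEN" https://registry-1.docker.io/v2/library/ubuntu/tags/list | jq .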

We can tag images locally if we wish, for example

$ docker tag ubuntu:16.04 ubuntu:best_version

But actually tagging is also a way to "rename" the image:

$ docker tag ubuntu:16.04 best_distro:best_version

Now we create a new container with uptime as the command by saying

$ docker run best_distro:best_version uptime
  18:09:26 up 55 min,  0 users,  load average: 0.00, 0.01, 0.00

Mac/win only: Again, notice how the uptime is the uptime of your moby virtual machine.

Let's see how our image was really built from https://hub.docker.com/r/_/ubuntu/ by clicking our 16.04 Dockerfile link: https://github.com/tianon/docker-brew-ubuntu-core/blob/85822fe532df3854da30b4829c31878ac51bcb91/xenial/Dockerfile

We get to the Dockerfile that specifies all the commands that were used to create this image.

The first line states that the image starts from a special image "scratch" that is just empty. Then a file ubuntu-xenial-core-cloudimg-amd64-root.tar.gz is added to the root from the same directory: https://github.com/tianon/docker-brew-ubuntu-core/tree/85822fe532df3854da30b4829c31878ac51bcb91/xenial

This file should be the "..official rootfs tarballs provided by Canonical" mentioned earlier, but it's not actually coming from https://partner-images.canonical.com/core/xenial/current/, it's copied to the repo owned by "tianon". We could verify the checksums of the file if we are interested.

Notice how the Dockerfile never explicitly extracts the file; this is because the ADD documentation at https://docs.docker.com/engine/reference/builder/#add states that "If <src> is a local tar archive in a recognized compression format (identity, gzip, bzip2 or xz) then it is unpacked as a directory."

We can be pretty sure that the ubuntu:16.04 that we just downloaded is this image, because

$ docker history --no-trunc best_distro:best_version

matches the directives specified in the Dockerfile. We could also build the image ourselves if we really wanted - there is nothing special in the "official" image and the build process is, as we saw, truly open.

Running containers

Let's run a container in the background

$ docker run -d --name looper ubuntu:16.04 sh -c 'while true; do date; sleep 1; done'
  2a49df3ba735c8a9b813c11f1c842606c1e94a6265c7c0bd5bd988cf942b8149

And check that it's running

$ docker ps
  CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
  2a49df3ba735        ubuntu:16.04        "sh -c 'while true..."   6 seconds ago       Up 1 second                             looper

Because we gave --name looper to the container, we can now reference it easily:

$ docker logs -f looper
  Mon Jan 15 19:25:53 UTC 2018
  Mon Jan 15 19:25:54 UTC 2018
  Mon Jan 15 19:25:55 UTC 2018
  ...

Now, in another terminal try

$ docker pause looper

And see how the logs -f output pauses. Then unpause it:

$ docker unpause looper

Attach to the container:

$ docker attach looper
  Mon Jan 15 19:26:54 UTC 2018
  Mon Jan 15 19:26:55 UTC 2018
  ...

Now you have the logs (STDOUT) running in two terminals. In the attach window, press control+c. The container is killed, because the ^C is passed on to the process with pid 1 (sh).

Start the container again and attach to it with --sig-proxy=false, which disables signal proxying. Then when you hit ^C ...

$ docker start looper
$ docker attach --sig-proxy=false looper
  Mon Jan 15 19:27:54 UTC 2018
  Mon Jan 15 19:27:55 UTC 2018
  ^C

The container stays running; you are just disconnected from the STDOUT.

To enter our container, we can start a new process in it.

$ docker exec -it looper bash
  root@2a49df3ba735:/# ps aux
  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  root         1  0.0  0.0   4496  1716 ?        Ss   10:31   0:00 sh -c while true; do date; sleep 1; done
  root       271  0.0  0.0   4496   704 ?        Ss   10:33   0:00 sh
  root       300  0.0  0.0  18380  3364 pts/0    Ss   10:33   0:00 bash
  root       386  0.0  0.0   4368   672 ?        S    10:33   0:00 sleep 1
  root       387  0.0  0.0  36836  2900 pts/0    R+   10:34   0:00 ps aux

In our command -it is short for -i and -t, where -i is "interactive, connect STDIN" and -t "allocate a pseudo-TTY". From the ps aux listing we can see that our bash process got pid 300. We can kill the container from the inside with kill 1, or exit the shell and run:

$ docker kill looper
$ docker rm looper

The previous two commands would be basically the same as docker rm --force looper

Let's start another process with -it and also with --rm to remove the container automatically after it has exited. This means that there are no garbage containers left behind, but also that docker start cannot be used to start the container after it has exited.

$ docker run -d --rm -it --name looper-it ubuntu:16.04 sh -c 'while true; do date; sleep 1; done'
  7d4b4e097931e2aafc62ee9be31bbc58f47b631ead04bd7d2c8dba3abc148137

Now let's attach to the container and hit control+p, control+q, which detaches us from the STDOUT. The detach sequence can be changed with --detach-keys.

$ docker attach looper-it
  Mon Jan 15 19:50:42 UTC 2018
  Mon Jan 15 19:50:43 UTC 2018
  ^P^Qread escape sequence

Note that hitting ^C would still kill (and remove due to --rm) the process because the docker attach was done without --sig-proxy=false

My first image

Let's create a file called myfirst/Dockerfile (https://docs.docker.com/engine/reference/builder/)

FROM ubuntu:16.04

WORKDIR /mydir
RUN touch hello.txt
COPY local.txt .
RUN wget http://example.com/index.html
  • WORKDIR will create and set the current working directory to /mydir after this directive
  • RUN will execute a command with /bin/sh -c prefix - Because of WORKDIR this is essentially same as RUN touch /mydir/hello.txt
  • COPY adds a local file to the second argument. It's preferred to use COPY instead of ADD when you are just adding files (ADD has all kinds of magic behaviour attached to it)

Then we'll build it by running docker build with the context argument ., which means that we have to be in the same directory (we could run the build from another directory and give the path here instead)

$ docker build .

This fails in the COPY because the local.txt doesn't exist. Fix that and build again to see the next error.

Before fixing the next error, notice how the steps that were already executed now say ---> Using cache - this is because the Docker daemon caches build steps for speed. Changing any build directive invalidates the cache for that line and every line after it.

Now we will find out that wget doesn't exist in the Ubuntu base image. We'll need to add it with apt-get as this is Ubuntu. But, if we just add:

RUN apt-get install -y wget

It will fail because the apt sources are not part of the image to bring down the size (and they would be old anyway). When we add lines

RUN apt-get update
RUN apt-get install -y wget

the image should build nicely, and at the end it will say something like Successfully built 66b527252f32, where 66b527252f32 is the generated ID of our image. This is not ideal, because now we would need to separately docker tag 66b527252f32 myfirst to have a sensible name for it, so let's run the build again and tag it at the same time:

$ docker build -t myfirst .

Before running our image there is a looming problem: apt-get update is run in a separate step that gets cached. If we add another package to the apt-get install -y line some other day, the cached (and by then stale) sources are used and the install may fail. When one command depends on another, it's best practice to run them together, like this:

RUN apt-get update && apt-get install -y wget

Now let's run our image - note that we don't have to give a command (to be run in the container) after the image since the ubuntu base image sets it to bash at the last line: https://github.com/tianon/docker-brew-ubuntu-core/blob/1637ff264a1654f77807ce53522eff7f6a57b773/artful/Dockerfile#L47

$ docker run -it myfirst
  root@accf99660aeb:/mydir# ls
  hello.txt  index.html  local.txt

Our WORKDIR was last set to /mydir, so our inherited bash command is started in that directory. Also note how our hostname accf99660aeb equals the container ID. Before exiting the container, let's create one more file (in addition to the files created by our Dockerfile)

$ touch manually.txt
$ exit

Now we can use diff to compare changes between our image myfirst and container:

$ docker diff accf99660aeb
  C /mydir
  A /mydir/manually.txt
  C /root
  A /root/.bash_history

What we discover is that in addition to our manually.txt file, bash "secretly" created a history file. We could create a new image from these changes (myfirst + changes = newimage) with

$ docker commit accf99660aeb myfirst-pluschanges

Let's try creating a new container from the new image, this time setting the command to "ls -l". Also notice how we don't have to allocate a pseudo-TTY or connect STDIN since our command is not interactive (and will exit immediately anyway)

$ docker run myfirst-pluschanges ls -l
  total 4
  -rw-r--r-- 1 root root    0 Jan  5 11:59 hello.txt
  -rw------- 1 root root 1270 Aug  9  2013 index.html
  -rw-r--r-- 1 root root    0 Jan  5 12:18 manually.txt

And as expected, our manually.txt file is now in the image.

Now let's start moving towards a more meaningful image. youtube-dl is a program that downloads youtube videos (https://rg3.github.io/youtube-dl/download.html). Let's add it to the image - but this time, instead of editing the Dockerfile and rebuilding to see if it works, let's try an approach that is sometimes easier: open an interactive session, test things there first, and only then "store" them in our Dockerfile. By following the youtube-dl install instructions blindly we'll see that...

$ docker run -it myfirst
  root@8c587232a608:/mydir# sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
  bash: sudo: command not found

..sudo is not installed, but since we are root we don't need it now, so let's try again without...

root@8c587232a608:/mydir# curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
bash: curl: command not found

..and we see that curl is not installed either - we could just revert to using wget, but as an exercise, let's add curl with apt-get since we already have the apt sources in our image (which hopefully are still valid)

$ apt-get install -y curl
$ curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl

Then we'll add permissions and run it:

$ chmod a+rx /usr/local/bin/youtube-dl
$ youtube-dl
  /usr/bin/env: 'python': No such file or directory

Okay - at the top of the youtube-dl download page we'll notice that

Remember youtube-dl requires Python version 2.6, 2.7, or 3.2+ to work except for Windows exe.

So let's add python

$ apt-get install -y python

And let's run it again

$ youtube-dl
  WARNING: Assuming --restrict-filenames since file system encoding cannot encode all characters. Set the LC_ALL environment variable to fix this.
  Usage: youtube-dl [OPTIONS] URL [URL...]

  youtube-dl: error: You must provide at least one URL.
  Type youtube-dl --help to see a list of all options.

It works (we just need to give it a URL), but we notice that it outputs a warning about LC_ALL. In a regular Ubuntu desktop/server install the localization settings are (usually) set, but in this image they are not, as we can see by running env in our container. According to https://unix.stackexchange.com/questions/87745/what-does-lc-all-c-do just setting LC_ALL=C might be a good fix, so let's try that.

$ LC_ALL=C youtube-dl

Nope, same error. By Googling around you might end up in this thread: https://stackoverflow.com/questions/28405902/how-to-set-the-locale-inside-a-docker-container/41648500, but the best answer is not the most upvoted. To fix this without installing additional locales, see this: https://stackoverflow.com/a/41648500

$ LC_ALL=C.UTF-8 youtube-dl

And it works! Let's persist it for our session and try downloading a video:

$ export LC_ALL=C.UTF-8
$ youtube-dl https://www.youtube.com/watch?v=UFLCdmfGs7E

Now that we know what to do, let's add these to the bottom of our Dockerfile - by adding the instructions to the bottom we preserve our cached layers - a handy practice that speeds up creating the initial version of a Dockerfile when it has time-consuming operations like downloads.

...
RUN apt-get install -y curl python
RUN curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
RUN chmod a+x /usr/local/bin/youtube-dl
ENV LC_ALL=C.UTF-8
CMD ["/usr/local/bin/youtube-dl"]
  • Instead of using RUN export LC_ALL=C.UTF-8 we'll store the env directly in the image
  • We'll also override bash as our image command (set on the base image) with youtube-dl itself. This won't work, but let's see why.

When we build this as youtube-dl

$ docker build -t youtube-dl .

And run it:

$ docker run youtube-dl https://www.youtube.com/watch?v=UFLCdmfGs7E
  Usage: youtube-dl [OPTIONS] URL [URL...]

  youtube-dl: error: You must provide at least one URL.
  Type youtube-dl --help to see a list of all options.

So far so good, but now the natural way to use this image would be to give the URL as an argument:

$ docker run youtube-dl http://www.youtube.com
  /usr/local/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "exec: \"http://www.youtube.com\": stat http://www.youtube.com: no such file or directory": unknown.
  ERRO[0001] error waiting for container: context canceled

Now our URL became the command (CMD). Luckily we have another way to do this: we can use ENTRYPOINT to define the main executable, and docker will then append our run arguments to it.

ENTRYPOINT ["/usr/local/bin/youtube-dl"]

And now it works like it should:

$ docker build -t youtube-dl .
$ docker run youtube-dl https://www.youtube.com/watch\?v\=UFLCdmfGs7E
  [youtube] UFLCdmfGs7E: Downloading webpage
  [youtube] UFLCdmfGs7E: Downloading video info webpage
  [youtube] UFLCdmfGs7E: Extracting video information
  [download] Destination: Short introduction to Docker (Scribe)-UFLCdmfGs7E.mp4
  [download] 100% of 3.02MiB in 00:03

Now there's one more thing in ENTRYPOINT vs CMD that might be confusing - there are two ways to set them: exec form and shell form. We've been using the exec form where the command itself is executed. In shell form the command that is executed is wrapped with /bin/sh -c - it's useful when you need to evaluate environment variables in the command like $MYSQL_PASSWORD or similar.

In the shell form the command is provided as a string without brackets. In the exec form the command and its arguments are provided as a list (with brackets); see the table below:

| Dockerfile | Resulting command |
| --- | --- |
| ENTRYPOINT /bin/ping -c 3 <br> CMD localhost | /bin/sh -c '/bin/ping -c 3' /bin/sh -c localhost |
| ENTRYPOINT ["/bin/ping","-c","3"] <br> CMD localhost | /bin/ping -c 3 /bin/sh -c localhost |
| ENTRYPOINT /bin/ping -c 3 <br> CMD ["localhost"] | /bin/sh -c '/bin/ping -c 3' localhost |
| ENTRYPOINT ["/bin/ping","-c","3"] <br> CMD ["localhost"] | /bin/ping -c 3 localhost |

Now we have two problems:

  • Minor: Our container build process creates many layers resulting in increased image size
  • Major: The downloaded files stay in the container

Let's fix the major issue first.

By inspecting docker ps -a we can see all our previous runs. When we filter this list with

$ docker ps -a --last 3
  CONTAINER ID        IMAGE               COMMAND                   CREATED                  STATUS                          PORTS               NAMES
  be9fdbcafb23        youtube-dl          "/usr/local/bin/yout…"    Less than a second ago   Exited (0) About a minute ago                       determined_elion
  b61e4029f997        f2210c2591a1        "/bin/sh -c \"/usr/lo…"   Less than a second ago   Exited (2) About a minute ago                       vigorous_bardeen
  326bb4f5af1e        f2210c2591a1        "/bin/sh -c \"/usr/lo…"   About a minute ago       Exited (2) 3 minutes ago                            hardcore_carson

We'll see that the last container was be9fdbcafb23 or determined_elion for us humans.

$ docker diff determined_elion
  C /mydir
  A /mydir/Short introduction to Docker (Scribe)-UFLCdmfGs7E.mp4

Let's try the docker cp command to copy the file out (notice the quotes, because our filename has spaces)

$ docker cp "determined_elion://mydir/Short introduction to Docker (Scribe)-UFLCdmfGs7E.mp4" .

And now we have our file locally. This doesn't really fix our issue, so let's continue:

Volumes: bind mount

By bind mounting a host (our machine) folder into the container we can get the file directly onto our machine. Let's start another run with the -v option, which requires an absolute path. We mount our current folder as /mydir in our container, overriding everything that we have put into that folder in our Dockerfile.

$ docker run -v $(pwd):/mydir youtube-dl https://www.youtube.com/watch\?v\=UFLCdmfGs7E

Note: the Docker for Mac/Win has some magic so that the directories from our host become available for the moby virtual machine allowing our command to work as it would on a Linux machine.

Optimizing the Dockerfile

Now we'll fix the minor problem of our Dockerfile being illogically ordered and not very space efficient. In the first version we just rearrange the commands so that the build process reads logically:

FROM ubuntu:16.04
ENV LC_ALL=C.UTF-8

RUN apt-get update && apt-get install -y \
    curl python
RUN curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
RUN chmod a+x /usr/local/bin/youtube-dl

WORKDIR /app
ENTRYPOINT ["/usr/local/bin/youtube-dl"]

We have also changed the WORKDIR to /app, as it's a fairly common convention in public docker images to put your own stuff there. For this image, where we essentially download videos, a WORKDIR of /videos or similar might also make sense.

In the next phase we'll glue all RUN commands together to reduce the number of layers we are making in our image.

FROM ubuntu:16.04
ENV LC_ALL=C.UTF-8

RUN apt-get update && apt-get install -y \
    curl python && \
    curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl && \
    chmod a+x /usr/local/bin/youtube-dl

WORKDIR /app
ENTRYPOINT ["/usr/local/bin/youtube-dl"]

As a sidenote not directly related to docker: remember that, if needed, it is possible to pin packages to exact versions with e.g. curl=1.2.3 - this makes it more likely that the image still builds and works at a later date, because the versions are exact. On the other hand the packages will eventually be old and have security issues.

With docker history we can see that our single RUN layer adds 85.2 megabytes to the image:

$ docker history youtube-dl
  IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
  295b16d6560a        30 minutes ago      /bin/sh -c #(nop)  ENTRYPOINT ["/usr/local...   0B
  f65f66bbae17        30 minutes ago      /bin/sh -c #(nop) WORKDIR /app                  0B
  89592bae75a8        30 minutes ago      /bin/sh -c apt-get update && apt-get insta...   85.2MB
  ...

The next step is to remove everything that is not needed in the final image. We don't need the apt source lists anymore, so we'll glue the next line to our single RUN

.. && \
rm -rf /var/lib/apt/lists/*

Now when we build, we'll see that the size of the layer is 45.6 megabytes. We can optimize even further by removing curl and all the dependencies it installed:

.. && \
apt-get purge -y --auto-remove curl && \
rm -rf /var/lib/apt/lists/*

..which brings us down to 34.9MB

Now our slimmed down container should work, but:

$ docker run -v "$(pwd):/app" youtube-dl https://www.youtube.com/watch\?v\=EUHcNeg_e9g
  [youtube] EUHcNeg_e9g: Downloading webpage
  ERROR: Unable to download webpage: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:661)> (caused by URLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:661)'),))

Because --auto-remove also removed dependencies, like:

Removing ca-certificates (20170717~16.04.1) ...

We can now see that our youtube-dl worked previously only because of curl's dependencies. If youtube-dl had been installed as a package, it would have declared ca-certificates as its own dependency.

Now what we could do is first purge --auto-remove and then add ca-certificates back with apt-get install, or just install ca-certificates along with the other packages before removing curl:

FROM ubuntu:16.04
ENV LC_ALL=C.UTF-8

RUN apt-get update && apt-get install -y \
    curl python ca-certificates && \
    curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl && \
    chmod a+x /usr/local/bin/youtube-dl && \
    apt-get purge -y --auto-remove curl && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
ENTRYPOINT ["/usr/local/bin/youtube-dl"]

From the build output we can see that ca-certificates also adds openssl

The following additional packages will be installed:
openssl
The following NEW packages will be installed:
ca-certificates openssl

and this brings us to 36.4 megabytes in our RUN layer (from the original 87.4 megabytes)

Our process (youtube-dl) could in theory escape the container due to a bug in docker or the kernel. To mitigate this we'll add a non-root user to our container and run our process as that user. Another option would be to map the root user to a high, non-existing user id on the host with https://docs.docker.com/engine/security/userns-remap/, but this is a fairly new feature and not enabled by default.

&& \
useradd -m app

And then we change user with the directive USER app - so all commands after this line will be executed as our new user, including the CMD.

FROM ubuntu:16.04
ENV LC_ALL=C.UTF-8

RUN apt-get update && apt-get install -y \
    curl python ca-certificates && \
    curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl && \
    chmod a+x /usr/local/bin/youtube-dl && \
    apt-get purge -y --auto-remove curl && \
    rm -rf /var/lib/apt/lists/* && \
    useradd -m app

USER app
WORKDIR /app
ENTRYPOINT ["/usr/local/bin/youtube-dl"]

When we run this image without bind mounting our local directory:

$ docker run youtube-dl https://www.youtube.com/watch\?v\=UFLCdmfGs7E
  [youtube] UFLCdmfGs7E: Downloading webpage
  [youtube] UFLCdmfGs7E: Downloading video info webpage
  [youtube] UFLCdmfGs7E: Extracting video information
  ERROR: unable to open for writing: [Errno 13] Permission denied: 'Short introduction to Docker (Scribe)-UFLCdmfGs7E.mp4.part'

We'll see that our app user cannot write to /app - this can be fixed with chown, or left as is if the intended usage is to always have /app mounted from the host.
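
A minimal sketch of the chown approach, assuming we keep the single-RUN structure: create /app ourselves and hand it over to the app user inside the same RUN.

.. && \
useradd -m app && \
mkdir -p /app && chown app:app /app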

Publishing to Dockerhub

If we want to share our image publicly, we need to tag it as our_dockerhub_username/youtube-dl and log in to Docker Hub with docker login.

$ docker tag youtube-dl mattipaksula/youtube-dl
$ docker push mattipaksula/youtube-dl
  The push refers to a repository [docker.io/mattipaksula/youtube-dl]
  582af28a5d38: Pushed
  22c7a6ee7548: Pushed
  3ff70ce53dac: Mounted from library/ubuntu
  b8e5935ae7cc: Mounted from library/ubuntu
  ba76b502dc9b: Mounted from library/ubuntu
  803030df23c1: Mounted from library/ubuntu
  db8686e0ca43: Mounted from library/ubuntu
  latest: digest: sha256:ad1038acd11ed87ec013b5b7251a02ce4c0e9e8c08acd7070458cc085f3f53ee size: 1775

From the output we can see that the existing shared Ubuntu layers are re-used (mounted) from the library/ubuntu.

Alpine Linux variant

Our Ubuntu base image adds the most megabytes to our image (approx 113MB). Alpine Linux provides a popular alternative base at https://hub.docker.com/_/alpine/ that is around 4 megabytes. It's based on musl (an alternative libc implementation) and busybox binaries, so not all software runs well (or at all) with it, but our python-based setup should run just fine. We'll create the following Dockerfile.alpine file:

FROM alpine:3.7
ENV LC_ALL=C.UTF-8

RUN apk add --no-cache curl python ca-certificates && \
    curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl && \
    chmod a+x /usr/local/bin/youtube-dl && \
    apk del curl && \
    adduser -D app

USER app
WORKDIR /app
ENTRYPOINT ["/usr/local/bin/youtube-dl"]

Notes:

  • The package manager is apk and it can work without downloading sources (caches) first with --no-cache
  • useradd is missing, but adduser exists.
  • Most of the package names are the same - there's a good package browser at https://pkgs.alpinelinux.org/packages

Now we build this file with :alpine-3.7 as the tag:

$ docker build -t youtube-dl:alpine-3.7 -f Dockerfile.alpine .

It seems to run fine:

$ docker run -v "$(pwd):/app" youtube-dl:alpine-3.7 https://www.youtube.com/watch\?v\=EUHcNeg_e9g

From the history we can see that our single RUN layer size is 41.1MB:

$ docker history youtube-dl:alpine-3.7
  IMAGE...
  ...
  14cfb0b531fb        20 seconds ago         /bin/sh -c apk add --no-cache curl python ca…   41.1MB
  ...
  <missing>           3 weeks ago         /bin/sh -c #(nop) ADD file:093f0723fa46f6cdb…   4.15MB

So in total our Alpine variant is about 45 megabytes, significantly less than our Ubuntu based image.
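
We can compare the variants side by side with the docker images listing (the exact sizes will vary a little between builds):

$ docker images youtube-dl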

We can publish both variants by publishing this tag as well:

$ docker tag youtube-dl:alpine-3.7 mattipaksula/youtube-dl:alpine-3.7
$ docker push mattipaksula/youtube-dl:alpine-3.7

Or, we could just replace our Ubuntu-based image for everybody - including anyone who might be depending on it being Ubuntu:

$ docker tag youtube-dl:alpine-3.7 mattipaksula/youtube-dl

Also remember that, unless otherwise specified, the :latest tag always just refers to the latest image built and pushed - which can basically contain anything.
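
If a consumer of the image needs a guarantee of the exact contents, a safer option than a tag is pulling by digest - for example with the digest our earlier push printed (a sketch; any digest shown by docker push or docker images --digests works):

$ docker pull mattipaksula/youtube-dl@sha256:ad1038acd11ed87ec013b5b7251a02ce4c0e9e8c08acd7070458cc085f3f53ee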

docker-compose

Even with a simple image, we've already been dealing with plenty of command line options in both building+pushing and running the image:

$ docker build -t youtube-dl:alpine-3.7 -f Dockerfile.alpine .
$ docker run -v "$(pwd):/app" youtube-dl:alpine-3.7 https://youtube...

Now we'll switch to a tool called docker-compose to manage these with YAML. We'll create a file called myfirst/docker-compose.yml:

version: '3.4'

services:
    youtube-dl-ubuntu:
      image: mattipaksula/youtube-dl:ubuntu-16.04
      build: .
    youtube-dl-alpine:
      image: mattipaksula/youtube-dl:alpine-3.7
      build:
        context: .
        dockerfile: Dockerfile.alpine

The version setting is not very strict, it just needs to be above 2 because otherwise the syntax is significantly different. See https://docs.docker.com/compose/compose-file/ for more info. The build: key can be set to a path (as in the Ubuntu variant) or to an object with context and dockerfile keys.

Now we can build and push both variants with just these commands:

$ docker-compose build
$ docker-compose push

To run the image as we did previously, we'll need to add the volume bind mounts. Compose can work without an absolute path:

version: '3.4'

services:
    youtube-dl-ubuntu:
      image: mattipaksula/youtube-dl:ubuntu-16.04
      build: .
      volumes:
        - .:/app
    youtube-dl-alpine:
      image: mattipaksula/youtube-dl:alpine-3.7
      build:
        context: .
        dockerfile: Dockerfile.alpine
      volumes:
        - .:/app

Now we can run it:

$ docker-compose run youtube-dl-ubuntu https://www.youtube.com/watch\?v\=EUHcNeg_e9g

Web services

Compose is really meant for running web services, so let's move from simple binary wrappers to running HTTP services.

https://github.com/jwilder/whoami is a simple service that prints the current container id (hostname).

$ docker run -d -p 8000:8000 jwilder/whoami
  736ab83847bb12dddd8b09969433f3a02d64d5b0be48f7a5c59a594e3a6a3541
$ curl localhost:8000
  I'm 736ab83847bb

Take down the container so that it's not blocking our port 8000:

$ docker rm -f 736ab83847bb

Let's create whoami/docker-compose.yml from the command line options (you can also use something like https://github.com/magicmark/composerize)

version: '3.4'

services:
    whoami:
      image: jwilder/whoami
      ports:
        - 8000:8000

Test it:

$ docker-compose up -d
$ curl localhost:8000

Compose can scale the service to run multiple instances:

$ docker-compose up --scale whoami=3
  WARNING: The "whoami" service specifies a port on the host. If multiple containers for this service are created on a single host, the port will clash.
  Starting whoami_whoami_1 ... done
  Creating whoami_whoami_2 ... error
  Creating whoami_whoami_3 ... error

But it fails with a port clash. If we don't specify the host port, a free port will be allocated instead.
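
A sketch of how the whoami service might look with only the container port listed, so that compose picks free host ports:

version: '3.4'

services:
    whoami:
      image: jwilder/whoami
      ports:
        - "8000"

After re-running docker-compose up -d --scale whoami=3, we can ask compose which host ports were picked: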

$ docker-compose port --index 1 whoami 8000
  0.0.0.0:32770
$ docker-compose port --index 2 whoami 8000
  0.0.0.0:32769
$ docker-compose port --index 3 whoami 8000
  0.0.0.0:32768

We can curl from these ports:

$ curl 0.0.0.0:32769
  I'm 536e11304357
$ curl 0.0.0.0:32768
  I'm 1ae20cd990f7

In a server environment you'd normally have a load balancer in front of the service. For a local environment (or a single server) one good solution is https://github.com/jwilder/nginx-proxy, which configures nginx based on the docker daemon's events as containers are started and stopped.

Let's add the proxy to our compose file and remove the port bindings from the whoami service. We'll mount our docker.sock inside the proxy container in :ro (read-only) mode.

version: '3.4'

services:
    whoami:
      image: jwilder/whoami
    proxy:
      image: jwilder/nginx-proxy
      volumes:
        - /var/run/docker.sock:/tmp/docker.sock:ro
      ports:
        - 80:80

When we start this and test

$ docker-compose up -d --scale whoami=3
$ curl localhost:80
  <html>
  <head><title>503 Service Temporarily Unavailable</title></head>
  <body bgcolor="white">
  <center><h1>503 Service Temporarily Unavailable</h1></center>
  <hr><center>nginx/1.13.8</center>
  </body>
  </html>

It's "working", but the nginx just doesn't know which service we want. The nginx-proxy works with two environment variables: VIRTUAL_HOST and VIRTUAL_PORT. VIRTUAL_PORT is not needed if the service has EXPOSE in it's docker image. We can see that jwilder/whoami sets it: https://github.com/jwilder/whoami/blob/master/Dockerfile#L9

The domain localtest.me is configured so that all subdomains point to 127.0.0.1 (at least at the time of writing) - let's use that:

version: '3.4'

services:
    whoami:
      image: jwilder/whoami
      environment:
       - VIRTUAL_HOST=whoami.localtest.me
    proxy:
      image: jwilder/nginx-proxy
      volumes:
        - /var/run/docker.sock:/tmp/docker.sock:ro
      ports:
        - 80:80

Now the proxy works:

$ docker-compose up -d --scale whoami=3
$ curl whoami.localtest.me
  I'm f6f85f4848a8
$ curl whoami.localtest.me
  I'm 740dc0de1954
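
To see the proxy spreading requests over all three instances, we can fire a few requests in a row - the responding container id should vary, since nginx load balances between them:

$ for i in 1 2 3 4 5; do curl -s whoami.localtest.me; done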

Let's add a couple more containers behind the same proxy. We can use the official nginx image to serve a simple static web page. We don't even have to build new container images, we can just mount the content into the containers. Let's prepare some content for two services called "hello" and "world".

$ echo "hello" > hello.html
$ echo "world" > world.html

Then add these services to the docker-compose.yml file where you mount just the content as index.html in the default nginx path:

    hello:
      image: nginx
      volumes:
        - ./hello.html:/usr/share/nginx/html/index.html:ro
      environment:
        - VIRTUAL_HOST=hello.localtest.me
    world:
      image: nginx
      volumes:
        - ./world.html:/usr/share/nginx/html/index.html:ro
      environment:
        - VIRTUAL_HOST=world.localtest.me

Now let's test:

$ docker-compose up -d --scale whoami=3
$ curl hello.localtest.me
  hello
$ curl world.localtest.me
  world
$ curl whoami.localtest.me
  I'm f6f85f4848a8
$ curl whoami.localtest.me
  I'm 740dc0de1954

Now we have a basic single machine hosting setup up and running.

Test updating the hello.html without restarting the container, does it work?

Wordpress

Next we'll set up Wordpress, which requires MySQL and a persisted volume.

In https://hub.docker.com/_/wordpress/ there is a massive list of different variants under Supported tags and respective Dockerfile links - most likely we can use any of the images for this testing. From "How to use this image" we can see that all variants require WORDPRESS_DB_HOST, which needs to be a MySQL server. So before moving forward, let's set that up.

In https://hub.docker.com/_/mysql/ there's a sample compose file under "via docker stack deploy or docker-compose" - Let's strip that down to

version: '3.4'

services:
    db:
      image: mysql
      restart: unless-stopped
      environment:
        MYSQL_ROOT_PASSWORD: example

Notes:

  • Version was updated to 3.4 - but that doesn't change anything in this case
  • restart: always was changed to unless-stopped, which keeps the container running unless it is explicitly stopped. With always, even a manually stopped container would be started again after a reboot, for example.

Under "Caveats - Where to Store Data" we can see that the /var/lib/mysql needs to be mounted separately to preserve data so that the container can be recreated. We could use a bind mount like previously, but this time let's create a separete volume for the data:

version: '3.4'

services:
    mysql:
      image: mysql
      restart: unless-stopped
      environment:
        - MYSQL_ROOT_PASSWORD=example
      volumes:
        - mysql-data:/var/lib/mysql

volumes:
    mysql-data:

$ docker-compose up
  Creating network "wordpress_default" with the default driver
  Creating volume "wordpress_mysql-data" with default driver
  Creating wordpress_mysql_1 ...
  Creating wordpress_mysql_1 ... done
  Attaching to wordpress_mysql_1
  mysql_1  | Initializing database
  ...
  mysql_1  | 2018-02-01T19:48:20.660859Z 0 [Warning] 'tables_priv' entry 'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode.
  mysql_1  | 2018-02-01T19:48:20.664811Z 0 [Note] Event Scheduler: Loaded 0 events
  mysql_1  | 2018-02-01T19:48:20.665236Z 0 [Note] mysqld: ready for connections.
  mysql_1  | Version: '5.7.21'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)

The image initializes the data files on the first start. Let's terminate the container with ^C

^CGracefully stopping... (press Ctrl+C again to force)
Stopping wordpress_mysql_1 ... done

Compose uses the current directory name as a prefix for container and volume names so that different projects don't clash. The prefix can be overridden with the COMPOSE_PROJECT_NAME environment variable if needed.
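
For example (myblog here is just a hypothetical project name):

$ COMPOSE_PROJECT_NAME=myblog docker-compose up -d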

Now that MySQL is running, let's add the actual Wordpress. The container seems to require just two environment variables.

    wordpress:
      image: 'wordpress:4.9.1-php7.1-apache'
      environment:
        - WORDPRESS_DB_HOST=mysql
        - WORDPRESS_DB_PASSWORD=example
      ports:
        - '9999:80'
      depends_on:
        - mysql

We also declare with depends_on that the mysql service should be started first; thanks to the default compose network, the MySQL server is accessible with the DNS name "mysql" from the Wordpress service.

Now when you run it:

$ docker-compose up -d
$ docker-compose logs wordpress
  Attaching to wordpress_wordpress_1
  wordpress_1  | WordPress not found in /var/www/html - copying now...
  wordpress_1  | Complete! WordPress has been successfully copied to /var/www/html
  ...

We see that the Wordpress image creates files at startup in /var/www/html, which also need to be persisted. The Dockerfile has this line https://github.com/docker-library/wordpress/blob/6a085d90853b8baffadbd3f0a41d6814a2513c11/php7.1/apache/Dockerfile#L44 where it declares that a volume should be created. Docker will create the volume, but it will be handled as an anonymous volume that is not managed by compose, so it's better to be explicit about the volume. With that in mind, our final file should look like this:

version: '3.4'

services:
    mysql:
      image: mysql
      restart: unless-stopped
      environment:
        - MYSQL_ROOT_PASSWORD=example
      volumes:
        - mysql-data:/var/lib/mysql
    wordpress:
      image: 'wordpress:4.9.1-php7.1-apache'
      environment:
        - WORDPRESS_DB_HOST=mysql
        - WORDPRESS_DB_PASSWORD=example
      volumes:
        - wordpress-data:/var/www/html
      ports:
        - '9999:80'
      depends_on:
        - mysql
volumes:
    mysql-data:
    wordpress-data:

Now open and configure the installation at http://localhost:9999

We can inspect the changes that happened in the container and ensure that no other meaningful files got written to it:

$ docker diff $(docker-compose ps -q wordpress)
C /run/apache2
A /run/apache2/apache2.pid
C /run/lock/apache2
C /tmp

Since plugins and image uploads will by default write to local disk at /var/www/html, this means that Wordpress can not be scaled in a real production deployment on multiple machines without somehow sharing this path. Some possible solutions:

- shared filesystem like NFS or AWS EFS
- Something like https://www.gluster.org/ or http://ceph.com/
- Two-way syncing daemons like https://www.cis.upenn.edu/~bcpierce/unison/index.html, https://syncthing.net/ or https://www.resilio.com - see http://blog.kontena.io/how-to-build-high-availability-wordpress-site-with-docker/
- User space FUSE solutions like https://github.com/kahing/goofys or https://github.com/googlecloudplatform/gcsfuse
- See https://lemag.sfeir.com/wordpress-cluster-docker-google-cloud-platform/

Backups and restore

We can test backing up:

$ docker-compose exec mysql mysqldump wordpress -uroot -pexample | less

Where we see that the first line is unexpected:

mysqldump: [Warning] Using a password on the command line interface can be insecure.

This is because docker-compose's exec has a bug (docker/compose#5207) where STDERR gets printed to STDOUT. As a workaround we can skip docker-compose:

$ docker exec -i $(docker-compose ps -q mysql) mysqldump wordpress -uroot -pexample > dump.sql
  mysqldump: [Warning] Using a password on the command line interface can be insecure.

Now STDERR is correctly printed to the terminal.

$ docker-compose down
  Stopping wordpress_wordpress_1 ... done
  Stopping wordpress_mysql_1     ... done
  Removing wordpress_wordpress_1 ... done
  Removing wordpress_mysql_1     ... done
  Removing network wordpress_default

Since our volumes are declared separately in the compose file, docker-compose down does not remove them, to prevent mistakes.

$ docker-compose down --volumes
  Removing network wordpress_default
  WARNING: Network wordpress_default not found.
  Removing volume wordpress_mysql-data
  Removing volume wordpress_wordpress-data

Then start the mysql service again (with fresh volumes) without the wordpress service

$ docker-compose up -d mysql

Since dumping with docker-compose exec did not work cleanly, let's see if importing would:

$ docker-compose exec mysql mysql -uroot -pexample < dump.sql
  mysql: [Warning] Using a password on the command line interface can be insecure.
  Traceback (most recent call last):
    File "docker-compose", line 6, in <module>
    File "compose/cli/main.py", line 71, in main
    File "compose/cli/main.py", line 124, in perform_command
    File "compose/cli/main.py", line 467, in exec_command
    File "site-packages/dockerpty/pty.py", line 338, in start
    File "site-packages/dockerpty/io.py", line 32, in set_blocking
  ValueError: file descriptor cannot be a negative integer (-1)
  Failed to execute script docker-compose

...and no, because of another bug in docker/compose#3352 - we'll bypass compose again with:

$ docker exec -i $(docker-compose ps -q mysql) mysql -uroot -pexample wordpress < dump.sql

And then start the wordpress:

$ docker-compose up -d wordpress

And our old site is back!
