Working With the Docker Image Cache

When building Docker images, you'll likely need to do at least a few iterations on your Dockerfile before you get everything perfect. If you know how to use the local image cache to your advantage, it can significantly speed up the Dockerfile code/build/test cycle.
In this article I'll discuss how the Docker image cache works and then give you some tips for using it effectively.
Caching Image Layers

Each instruction in your Dockerfile results in a new image layer being created and added to your local image cache. That image then becomes the parent for the image created by the next instruction (see my previous article for a detailed explanation of the image creation process). Let's look at an example:
FROM debian:wheezy

MAINTAINER brian.dehamer@centurylink.com
RUN apt-get update && apt-get install -y vim

ENTRYPOINT 'vim'

If we docker build this Dockerfile and inspect the local image cache we'll see something like this:
$ docker images --tree
Warning: '--tree' is deprecated, it will be removed soon. See usage.
└─511136ea3c5a Virtual Size: 0 B Tags: scratch:latest
  └─59e359cb35ef Virtual Size: 85.18 MB
    └─e8d37d9e3476 Virtual Size: 85.18 MB Tags: debian:wheezy
      └─c58b36b8f285 Virtual Size: 85.18 MB
        └─90ea6e05b074 Virtual Size: 118.6 MB
          └─5dc74cffc471 Virtual Size: 118.6 MB Tags: vim:latest

The FROM instruction in our Dockerfile corresponds to the image layer tagged with debian:wheezy. The three child layers shown underneath that correspond to the other three instructions from our Dockerfile.
Another way to look at this is with the docker history command:
$ docker history vim
IMAGE         CREATED         CREATED BY                              SIZE
5dc74cffc471  15 minutes ago  /bin/sh -c #(nop) ENTRYPOINT [/bin/sh   0 B
90ea6e05b074  15 minutes ago  /bin/sh -c apt-get update && apt-get    33.41 MB
c58b36b8f285  15 minutes ago  /bin/sh -c #(nop) MAINTAINER brian.de   0 B
e8d37d9e3476  2 weeks ago     /bin/sh -c #(nop) CMD [/bin/bash]       0 B
59e359cb35ef  2 weeks ago     /bin/sh -c #(nop) ADD file:1e2ba3d937   85.18 MB
511136ea3c5a  13 months ago  

With this view, the order is reversed (the child image appears before the parent) but you do get to see the Dockerfile instruction that was responsible for generating each layer.
After you've successfully built an image from your Dockerfile, you should notice that subsequent builds of the same Dockerfile finish significantly faster. Once Docker has cached the image layer for an instruction, that layer doesn't need to be rebuilt.
Let's look at an example for the Dockerfile above. We'll run the docker build command with the time utility so that we can see how long the initial build takes to complete.
$ time docker build -q -t vim .
Sending build context to Docker daemon  2.56 kB
Sending build context to Docker daemon 
Step 0 : FROM debian:wheezy
 ---> e8d37d9e3476
Step 1 : MAINTAINER brian.dehamer@centurylink.com
 ---> Running in 6b08074996d3
 ---> c58b36b8f285
Removing intermediate container 6b08074996d3
Step 2 : RUN apt-get update && apt-get install -y vim
 ---> Running in ef1603171a30
 ---> 90ea6e05b074
Removing intermediate container ef1603171a30
Step 3 : ENTRYPOINT 'vim'
 ---> Running in b3e0ad883ec5
 ---> 5dc74cffc471
Removing intermediate container b3e0ad883ec5
Successfully built 5dc74cffc471

real  0m21.917s
user  0m0.003s
sys   0m0.005s

You can see that it took 21 seconds to complete the build. Our example here is fairly trivial but it's not uncommon for builds to take several minutes once you start adding more instructions to your Dockerfile.
If we immediately execute the same command again, we should see something like this:
$ time docker build -q -t vim .
Sending build context to Docker daemon  2.56 kB
Sending build context to Docker daemon 
Step 0 : FROM debian:wheezy
 ---> e8d37d9e3476
Step 1 : MAINTAINER brian.dehamer@centurylink.com
 ---> Using cache
 ---> c58b36b8f285
Step 2 : RUN apt-get update && apt-get install -y vim
 ---> Using cache
 ---> 90ea6e05b074
Step 3 : ENTRYPOINT 'vim'
 ---> Using cache
 ---> 5dc74cffc471
Successfully built 5dc74cffc471

real  0m0.032s
user  0m0.003s
sys   0m0.002s

Note how each instruction was followed by the "Using cache" message, and the total build time dropped from 21 seconds to less than a second. Since we didn't change anything between the two builds there was really nothing for Docker to do -- everything was already in the cache.
Cache Invalidation

As Docker is processing your Dockerfile to determine whether a particular image layer is already cached, it looks at two things: the instruction being executed and the parent image.
Docker will scan all of the children of the parent image and look for one whose command matches the current instruction. If a match is found, Docker skips to the next instruction and repeats the process.
If a matching image is not found in the cache, a new image is created.
Since the cache relies on both the instruction being executed and the image generated from the previous instruction, it should come as no surprise that changing any instruction in the Dockerfile will invalidate the cache for all of the instructions that follow it. Invalidating an image also invalidates all the children of that image.
Let's make a change to our Dockerfile and see how it impacts the local image cache. We'll update the apt-get install instruction to install Emacs in addition to Vim:
FROM debian:wheezy

MAINTAINER brian.dehamer@centurylink.com
RUN apt-get update && apt-get install -y vim emacs

ENTRYPOINT 'vim'

Let's build our new image and see what happens:
$ time docker build -q -t vim .
Sending build context to Docker daemon  2.56 kB
Sending build context to Docker daemon 
Step 0 : FROM debian:wheezy
 ---> e8d37d9e3476
Step 1 : MAINTAINER brian.dehamer@centurylink.com
 ---> Using cache
 ---> c58b36b8f285
Step 2 : RUN apt-get update && apt-get install -y vim emacs
 ---> Running in 33824d0f33ff
 ---> d6e06afe57c5
Removing intermediate container 33824d0f33ff
Step 3 : ENTRYPOINT 'vim'
 ---> Running in 27b1ae56612d
 ---> c3bf7baa3f34
Removing intermediate container 27b1ae56612d
Successfully built c3bf7baa3f34

real  2m1.511s
user  0m0.003s
sys   0m0.005s

Since we didn't alter the MAINTAINER instruction, it was found in the cache and used as-is. However, because we did edit the apt-get line, it resulted in a completely new image layer being created.
Furthermore, even though we didn't change the ENTRYPOINT instruction at all, its layer also had to be rebuilt since its parent image changed.
If we look at the image tree again we can see the two new layers that we created alongside the layers that were generated from the previous version of our Dockerfile:
$ docker images --tree
Warning: '--tree' is deprecated, it will be removed soon. See usage.
└─511136ea3c5a Virtual Size: 0 B Tags: scratch:latest
  └─59e359cb35ef Virtual Size: 85.18 MB
    └─e8d37d9e3476 Virtual Size: 85.18 MB Tags: debian:wheezy
      └─c58b36b8f285 Virtual Size: 85.18 MB
        ├─d6e06afe57c5 Virtual Size: 320.5 MB
        │ └─c3bf7baa3f34 Virtual Size: 320.5 MB Tags: vim:latest
        └─90ea6e05b074 Virtual Size: 118.6 MB
          └─5dc74cffc471 Virtual Size: 118.6 MB

Note that the layer for the MAINTAINER instruction (c58b36b8f285) remained the same, but it now has two children. The layers generated from the previous version of our Dockerfile are still in the cache; they are just no longer part of the tree tagged as vim:latest.
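The lookup rule described in this section can be modeled in a few lines of Python. This is only a sketch of the idea, not Docker's implementation: the `build` function, the dictionary cache, and the way layer IDs are derived are all assumptions made for illustration.

```python
import hashlib


def build(instructions, base_id, cache):
    """Simulate a build; return (instruction, layer_id, cache_hit) per step."""
    parent = base_id
    steps = []
    for instruction in instructions:
        # A cached layer matches only if BOTH the parent and the instruction match.
        key = (parent, instruction)
        hit = key in cache
        if not hit:
            # Hypothetical ID scheme for the model; real Docker IDs are not derived this way.
            cache[key] = hashlib.sha256(f"{parent}|{instruction}".encode()).hexdigest()[:12]
        parent = cache[key]
        steps.append((instruction, parent, hit))
    return steps


cache = {}
v1 = [
    "MAINTAINER brian.dehamer@centurylink.com",
    "RUN apt-get update && apt-get install -y vim",
    "ENTRYPOINT 'vim'",
]
v2 = [v1[0], "RUN apt-get update && apt-get install -y vim emacs", v1[2]]

first = build(v1, "e8d37d9e3476", cache)   # nothing cached yet
second = build(v1, "e8d37d9e3476", cache)  # identical Dockerfile
third = build(v2, "e8d37d9e3476", cache)   # RUN line edited

print([hit for _, _, hit in first])   # [False, False, False]
print([hit for _, _, hit in second])  # [True, True, True]
print([hit for _, _, hit in third])   # [True, False, False]
```

In the third build the unchanged ENTRYPOINT instruction still misses the cache, because its parent layer is new. The entries from the first build remain in `cache`, just as the old layers remain in the image tree.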
Tips

Now that we know how the image cache works, let's discuss some strategies for making the most of it when working on your own Dockerfile.
Top-to-Bottom

This one should be pretty obvious by now, but as you're iterating on your Dockerfile you should try to keep the stable parts toward the top and make your additions at the bottom.
If you know you need to install a bunch of OS packages in your image (which is typically one of the slower parts of building an image) put your package installation instructions toward the top of the Dockerfile. That way you only need to sit through the installation process for those packages once as you go through the code/build/test/repeat cycle for your image.
Similarly, if you have a core set of instructions that you use across all of your images (like a MAINTAINER value you always use), it's best to keep those at the top of your Dockerfile and always in the same order. That way those cached layers can be shared between different images.
The Build Context

When executing docker build, the first line of output typically reads "Sending build context to Docker daemon . . ." The build context consists of everything in your build directory (the directory that you pass to the docker build command) and is used by Docker so that you can inject local files into your image using the ADD and COPY instructions.
This is the one place where the caching rules change slightly -- in addition to looking at the instruction and the parent image, Docker will also check to see if the file(s) being copied have changed.
Let's create a simple Dockerfile that uses ADD to copy a file into our image:
FROM debian:wheezy
ADD README.md /tmp/

Now let's docker build the image:
$ docker build -q -t readme .
Sending build context to Docker daemon 3.584 kB
Sending build context to Docker daemon 
Step 0 : FROM debian:wheezy
 ---> e8d37d9e3476
Step 1 : ADD README.md /tmp/
 ---> 09eabce38f39
Removing intermediate container 3e44a3b6eabe
Successfully built 09eabce38f39

If we were to execute the docker build again we'd see that no new images are created since we haven't changed anything.
However, let's update the README.md file and then build again:
$ touch README.md 

$ docker build -q -t readme .
Sending build context to Docker daemon 3.584 kB
Sending build context to Docker daemon 
Step 0 : FROM debian:wheezy
 ---> e8d37d9e3476
Step 1 : ADD README.md /tmp/
 ---> 03057a46a5c7
Removing intermediate container 989edbcf38ae
Successfully built 03057a46a5c7

Note that a new image was generated for the ADD instruction this time (compare the image ID here to the one from the previous run). We didn't change anything inside the Dockerfile, but we did update the timestamp on the README.md file itself.
For the most part, this is exactly the behavior we want when building images. If the file changes in some way, you would expect that the next build of the image would incorporate the changes to that file. However, things get a bit trickier when you start adding lots of files at once.
A common pattern is to inject an application's entire codebase into an image using an instruction like:
ADD . /opt/myapp

In this case we're injecting the entire build context into the image. If any single file changes in the entire build context, it will invalidate the cache and a new image layer will be generated on the next build.
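To see why a single changed file is enough, here is a small Python model of a cache key for ADD that also covers the copied files. It is a sketch under stated assumptions: `context_fingerprint` and `add_cache_key` are hypothetical names, the file paths are made up, and the model hashes file contents only. Which file attributes Docker actually compares (contents, timestamps, other metadata) depends on the Docker version.

```python
import hashlib


def context_fingerprint(files):
    """Hash every path and its contents in a stable (sorted) order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(files[path])
        digest.update(b"\0")
    return digest.hexdigest()


def add_cache_key(parent_id, instruction, files):
    """Model key for ADD/COPY: parent, instruction, and the files being copied."""
    return (parent_id, instruction, context_fingerprint(files))


# Hypothetical build context: one source file and one frequently updated log file.
context = {"app/main.py": b"print('hello')\n", "log/test.log": b"run 1\n"}
before = add_cache_key("e8d37d9e3476", "ADD . /opt/myapp", context)

context["log/test.log"] = b"run 1\nrun 2\n"  # only the log file changes
after = add_cache_key("e8d37d9e3476", "ADD . /opt/myapp", context)

print(before == after)  # False: the ADD layer would be rebuilt
```

The parent image and the instruction are identical in both keys; only the fingerprint of the copied files differs, and that alone is enough to miss the cache.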
If your build directory happens to include things like log files or test reports that are updated frequently, you may find that you're getting new image layers generated with every single docker build. You could work around this by ADDing only those files which are necessary for your application, but if you have many files spread across a number of directories this can be pretty tedious.
Luckily, Docker has a better solution in the form of the .dockerignore file. In much the same way that the .gitignore file works, the .dockerignore file allows you to specify a list of exclusion patterns. Any files/directories matching those patterns will be excluded from the build context.
If you have files in your build directory that change often and are not required by your image, you should consider adding them to your .dockerignore file. A good rule of thumb is that anything in your .gitignore is a good candidate for inclusion in your .dockerignore.
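The effect of exclusion patterns can be approximated in Python. This sketch uses the standard library's `fnmatch`, whose matching rules are not identical to Docker's, and the patterns and paths are hypothetical examples rather than a recommended .dockerignore.

```python
from fnmatch import fnmatch


def filter_context(paths, patterns):
    """Drop every path that matches at least one exclusion pattern."""
    return [p for p in paths if not any(fnmatch(p, pat) for pat in patterns)]


# Hypothetical exclusion patterns, one per line as they would appear in the file.
patterns = ["log/*", "*.tmp", ".git/*"]
paths = ["Dockerfile", "app/main.py", "log/test.log", "build.tmp", ".git/HEAD"]

print(filter_context(paths, patterns))  # ['Dockerfile', 'app/main.py']
```

Because the log file never reaches the build context, changes to it can no longer invalidate the cache for an ADD . instruction.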
One Catch-22 related to the use of ADD . is that the Dockerfile itself is also part of the build context -- so any changes you make to the Dockerfile result in a change to the build context, and you can't add the Dockerfile to the .dockerignore file because it needs to be part of the build context in order for Docker to read the build instructions. If you're using ADD . and making changes to your Dockerfile, don't be surprised to see new image layers generated every time you do a build.
Bust the Cache

For the most part, the image cache is incredibly helpful and can save you a lot of time while building your images. However, there are times when the caching can bite you if you aren't paying attention, so it's good to know how to selectively bust the cache.
Let's say we have a Dockerfile which contains the following:
RUN git clone https://github.com/bdehamer/dot_files.git
WORKDIR /dot_files
RUN git checkout v1.0.0

When I build this the first time, I'm going to get exactly what I expect -- it'll clone my Git repo and check out the v1.0.0 tag.
Now imagine I push some changes to my repo and tag it as v1.1.0. I'm going to update the Dockerfile to reference the new tag:
RUN git clone https://github.com/bdehamer/dot_files.git
WORKDIR /dot_files
RUN git checkout v1.1.0

When I go to build the image from the updated Dockerfile I get the following error:
. . .
Step 7 : RUN git clone https://github.com/bdehamer/dot_files.git
 ---> Using cache
 ---> 104e2ed02220
Step 8 : WORKDIR /dot_files
 ---> Using cache
 ---> 7d120a36b1a5
Step 9 : RUN git checkout v1.1.0
 ---> Running in 86dd626440ac
error: pathspec 'v1.1.0' did not match any file(s) known to git.
2014/08/05 20:26:11 The command [/bin/sh -c git checkout v1.1.0] returned a non-zero code: 1

I definitely pushed a v1.1.0 tag to my repo, yet Git is telling me that no such tag is found.
This is one of those times where the Docker image cache is being a little too helpful. In the output above note how the git clone step had already been cached from our previous build and was re-used in this run. When we get to the git checkout instruction we're still using a copy of the repo that doesn't have a v1.1.0 tag.
This is quite different from the example with the build context above. In this case the contents of the git repo are not part of the build context -- as far as Docker is concerned, our git clone is just another instruction that happens to match one that already exists in the cache.
The brute-force solution here is to simply run docker build with the --no-cache flag and force it to re-create all the layers. While that will work, it doesn't allow us to take advantage of any earlier instructions in the Dockerfile that were just fine to be pulled from the cache.
A better approach is to refactor our Dockerfile a bit to ensure that any future changes to the tag will force a fresh git clone as well:
WORKDIR /dot_files
RUN git clone https://github.com/bdehamer/dot_files.git . && \
  git checkout v1.1.0

Now we've combined the git clone and git checkout into a single instruction in the Dockerfile. If we later edit the file to change the tag reference it will invalidate the cache for that layer and we'll get a fresh clone when the new layer is generated.
Note also that I moved the WORKDIR instruction so that the directory would be created before cloning the repo. Then, by cloning into the current directory (note the . after the repo's URL), I was able to execute my clone and checkout without needing to switch directories in between.
When building images based on Debian/Ubuntu, you'll often see this same pattern applied to installing OS packages:
RUN apt-get update && apt-get install -y vim=2:7.3*

Here the apt-get update is like the git clone in the previous example -- we want to ensure that we've got access to all the latest packages anytime we add another package or update the version of vim.
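The reasoning behind combining dependent commands can be checked with a small Python model of the cache, using the git example from this section. As before, this is a sketch: `cache_hits` is a hypothetical helper, "base" stands in for whatever parent image precedes these steps, and the counter-based layer IDs are an assumption of the model.

```python
def cache_hits(instructions, base_id, cache):
    """Return one boolean per instruction: True if its layer was found in the cache."""
    parent, hits = base_id, []
    for instruction in instructions:
        key = (parent, instruction)
        hits.append(key in cache)
        # Hypothetical layer ID: a counter stands in for Docker's real image IDs.
        parent = cache.setdefault(key, f"layer-{len(cache)}")
    return hits


clone = "RUN git clone https://github.com/bdehamer/dot_files.git"
separate_v1 = [clone, "WORKDIR /dot_files", "RUN git checkout v1.0.0"]
separate_v2 = [clone, "WORKDIR /dot_files", "RUN git checkout v1.1.0"]

cache = {}
cache_hits(separate_v1, "base", cache)
separate_hits = cache_hits(separate_v2, "base", cache)
print(separate_hits)  # [True, True, False]: the stale clone is reused

combined = "RUN git clone https://github.com/bdehamer/dot_files.git . && git checkout {}"
cache = {}
cache_hits(["WORKDIR /dot_files", combined.format("v1.0.0")], "base", cache)
combined_hits = cache_hits(["WORKDIR /dot_files", combined.format("v1.1.0")], "base", cache)
print(combined_hits)  # [True, False]: clone and checkout rerun together
```

With separate instructions, editing the tag only invalidates the checkout step. With the combined instruction, the same edit changes the text of the step that performs the clone, so the clone runs again.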