@obeleh
Last active October 30, 2019 08:39

# About docker ADD...

Recently I was asked to review parts of an automated-tests PR that contained a `Dockerfile`. And even after using Docker for about 6 years, I learned something new when I found this [SO question](https://stackoverflow.com/questions/47726176/multi-stage-docker-run-wget-vs-add):

> Because image size matters, using ADD to fetch packages from remote URLs is strongly discouraged; you should use curl or wget instead. That way you can delete the files you no longer need after they’ve been extracted and you won’t have to add another layer in your image. For example, you should avoid doing things like:
>
>     ADD http://example.com/big.tar.xz /usr/src/things/
>     RUN tar -xJf /usr/src/things/big.tar.xz -C /usr/src/things
>     RUN make -C /usr/src/things all
>
> And instead, do something like:
>
>     RUN mkdir -p /usr/src/things \
>         && curl -SL http://example.com/big.tar.xz \
>         | tar -xJC /usr/src/things \
>         && make -C /usr/src/things all
>
> In addition, ADD has a major advantage over RUN wget: it detects when its target has changed. Am I missing something, or do multi-stage builds rehabilitate ADD?

My worry with Docker sometimes is that I've been using it for so long that I'm using it in an old, arcane way. Remember, these are internet years we're talking about. I remember a time when I still used [docker commit](https://docs.docker.com/engine/reference/commandline/commit/). Please avoid `docker commit`, because you will end up with irreproducible builds. Another thing you learn along the way is to use Docker [COPY](https://docs.docker.com/engine/reference/builder/#copy) in favour of [ADD](https://docs.docker.com/engine/reference/builder/#add). Read the "why" in this article about [The Difference between COPY and ADD in a Dockerfile](https://nickjanetakis.com/blog/docker-tip-2-the-difference-between-copy-and-add-in-a-dockerile).
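
To make the "why" concrete, here's a tiny sketch of the difference, assuming a made-up `rootfs.tar.gz` sitting in the build context:

```dockerfile
FROM alpine:3.10

# COPY does exactly one thing: copy files or directories from the build
# context into the image. The archive arrives as-is.
COPY rootfs.tar.gz /opt/archive/

# ADD does the same, plus magic: a recognised local tar archive is unpacked
# into the destination instead of being copied (and a URL source would be
# downloaded). The archive itself never shows up in /opt/rootfs/.
ADD rootfs.tar.gz /opt/rootfs/
```

With `COPY` there's no guessing about intent, which is why it's the safer default.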

The SO question makes a good point. We're so used to making our layers small that this has become a common habit. If someone had come up to me and asked why I do it, I would have said it's because you want to keep the number of layers to a minimum. But the honest answer is "because I've always been doing it that way". Because who cares if you have 20 layers? It doesn't hurt performance anymore. And as the article states, you can also minimize the number of layers by using [multistage builds](https://docs.docker.com/develop/develop-images/multistage-build/).
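
That's what the SO question is hinting at. Here's a minimal multi-stage sketch (URL, paths and base image are placeholders, not anything from the question): the tarball and the extra layer created by `ADD` only exist in the throwaway builder stage, so they never bloat the final image.

```dockerfile
FROM alpine:3.10 AS builder
# The downloaded tarball and this layer live only in the builder stage.
ADD http://example.com/big.tar.gz /usr/src/
RUN mkdir -p /usr/src/things \
    && tar -xzf /usr/src/big.tar.gz -C /usr/src/things

FROM alpine:3.10
# Only the extracted files are carried over into the final image.
COPY --from=builder /usr/src/things /opt/things
```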

Because I had been avoiding `ADD`, I didn't know that it downloads the file and then checks whether it has changed. If it's the same, it reuses the cached layers you already had. This detail wasn't clear to me until I did some tests on my machine, because I thought it did some fancy cache detection with headers like [If-None-Match](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match), but that doesn't appear to be the case.
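
A minimal way to see this for yourself (the URL is a placeholder; any stable file will do):

```dockerfile
FROM alpine:3.10
# On every build the file is downloaded again so its checksum can be compared
# with what's in the cache; only a changed file invalidates this layer.
ADD http://example.com/big.tar.gz /usr/src/things/

# Build it twice and watch the output:
#   docker build -t add-cache-test .
#   docker build -t add-cache-test .
# The second build still fetches the file, but if it's unchanged the
# existing cached layer is reused.
```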

Now I'm slowly gravitating back to `docker ADD`; I think it's clearer, and the `curl`-ing hides your intent. These are the considerations I now use to choose what to do (all three are sketched below the list):

- Adding files from your repo: `COPY`
- Adding a file from a URL, and you want a new image when the file changes: `ADD`
- Adding a file from a URL, and you don't really care: `RUN curl && extract && rm tarball`
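
And all three in one hypothetical Dockerfile, with base image, URLs and paths made up:

```dockerfile
FROM alpine:3.10

# 1. Files from your own repo: COPY
COPY ./app /opt/app

# 2. File from a URL, and you want a new image whenever it changes: ADD
#    (remote files are not auto-extracted, so the tarball lands here as a file)
ADD http://example.com/config.tar.gz /opt/downloads/

# 3. File from a URL, and you don't really care: fetch, extract and clean up
#    in a single layer so the tarball never ends up in the image
RUN wget -qO /tmp/big.tar.gz http://example.com/big.tar.gz \
    && tar -xzf /tmp/big.tar.gz -C /opt/app \
    && rm /tmp/big.tar.gz
```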

Thanks for reading, and I hope you also learned something ;)

ariejan commented Oct 30, 2019

👍 Nice write-up. You might want to blockquote relevant parts of the Stack Overflow question, as it's a very relevant part of your story. Now I have to click away from the article if I want to understand it fully.

pascalw commented Oct 30, 2019

diff --git a/about_docker_add.md b/about_docker_add.md
index 800cdc7..354c98b 100644
--- a/about_docker_add.md
+++ b/about_docker_add.md
@@ -1,12 +1,12 @@
 # About docker ADD...
 
-Recently I was asked to review parts of an automated tests PR that contained a Docker file. And even after using Docker for about 6 years I learned something new when I found this [SO question](https://stackoverflow.com/questions/47726176/multi-stage-docker-run-wget-vs-add)
+Recently I was asked to review parts of an automated tests PR that contained a `Dockerfile`. And even after using Docker for about 6 years I learned something new when I found this [SO question](https://stackoverflow.com/questions/47726176/multi-stage-docker-run-wget-vs-add)
 
-My worry with docker sometimes is that I've been using it for so long that I'm using it in an old arcane way. Remember these are internet years we're talking about. I remember a time where I still used [docker commit](https://docs.docker.com/engine/reference/commandline/commit/). Plz avoid docker commit because you will get into irreproducible builds. Another thing you learn along the way is to use Docker [copy](https://docs.docker.com/engine/reference/builder/#copy) in favour of [add](https://docs.docker.com/engine/reference/builder/#add). Read the "why" in this article about [The Difference between COPY and ADD in a Dockerfile](https://nickjanetakis.com/blog/docker-tip-2-the-difference-between-copy-and-add-in-a-dockerile)
+My worry with Docker sometimes is that I've been using it for so long that I'm using it in an old arcane way. Remember these are internet years we're talking about. I remember a time where I still used [docker commit](https://docs.docker.com/engine/reference/commandline/commit/). Plz avoid `docker commit` because you will get into irreproducible builds. Another thing you learn along the way is to use Docker [copy](https://docs.docker.com/engine/reference/builder/#copy) in favour of [add](https://docs.docker.com/engine/reference/builder/#add). Read the "why" in this article about [The Difference between COPY and ADD in a Dockerfile](https://nickjanetakis.com/blog/docker-tip-2-the-difference-between-copy-and-add-in-a-dockerile)
 
 The SO Question makes a good point. We're so used to making our layers small this has become a common habit. If someone would have come up to me and asked me why I do it I would say it was because you want to keep the number of layers to a minimum. But the honest answer is "because I've always been doing it that way". Because who cares if you have 20 layers? It doesn't hurt performance anymore. And as the article states, you can also minimize the number of layers by using [multistage builds](https://docs.docker.com/develop/develop-images/multistage-build/). 
 
-Because I refrained from `ADD`, what I didn't know is that it will download the file and then check if the file is different. If it's the same. It will use the cache layers you already had. This detail wasn't clear to me before I did some tests on my machine. Because I thought it did some fancy cache detection with headers like [If-None-Match](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match) but that doesn't appear to be the case.
+Because I refrained from `ADD`, what I didn't know is that it will download the file and then check if the file is different. If it's the same, it will use the cache layers you already had. This detail wasn't clear to me before I did some tests on my machine. Because I thought it did some fancy cache detection with headers like [If-None-Match](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match) but that doesn't appear to be the case.
 
 Now I'm slowly gravitating back to `docker ADD` I think it's more clear. The `curl`-ing hides your intent. These are the considerations I now use to choose what to do.

pascalw commented Oct 30, 2019

Would it be wrong to just always `ADD`? Does it hurt to use `ADD` to add files from your repo?
