@obeleh
Last active October 30, 2019 08:39

# About docker ADD...

Recently I was asked to review parts of an automated-tests PR that contained a `Dockerfile`. And even after using Docker for about 6 years, I learned something new when I found this [SO question](https://stackoverflow.com/questions/47726176/multi-stage-docker-run-wget-vs-add):

> Because image size matters, using ADD to fetch packages from remote URLs is strongly discouraged; you should use curl or wget instead. That way you can delete the files you no longer need after they’ve been extracted and you won’t have to add another layer in your image. For example, you should avoid doing things like:
>
>     ADD http://example.com/big.tar.xz /usr/src/things/
>     RUN tar -xJf /usr/src/things/big.tar.xz -C /usr/src/things
>     RUN make -C /usr/src/things all
>
> And instead, do something like:
>
>     RUN mkdir -p /usr/src/things \
>         && curl -SL http://example.com/big.tar.xz \
>         | tar -xJC /usr/src/things \
>         && make -C /usr/src/things all
>
> In addition, ADD has a major advantage over RUN wget: it detects when its target has changed. Am I missing something, or do multi-stage builds rehabilitate ADD?

My worry with Docker sometimes is that I've been using it for so long that I'm using it in an old, arcane way. Remember, these are internet years we're talking about. I remember a time when I still used [docker commit](https://docs.docker.com/engine/reference/commandline/commit/). Please avoid `docker commit`, because you will end up with irreproducible builds. Another thing you learn along the way is to use Docker [COPY](https://docs.docker.com/engine/reference/builder/#copy) in favour of [ADD](https://docs.docker.com/engine/reference/builder/#add). Read the "why" in this article about [The Difference between COPY and ADD in a Dockerfile](https://nickjanetakis.com/blog/docker-tip-2-the-difference-between-copy-and-add-in-a-dockerile).
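
To make the "why" concrete, here's a tiny sketch of the difference, assuming a made-up `rootfs.tar.gz` sitting in the build context:

```dockerfile
FROM alpine:3.10

# COPY does exactly one thing: copy files or directories from the build
# context into the image. The archive arrives as-is.
COPY rootfs.tar.gz /opt/archive/

# ADD does the same, plus magic: a recognised local tar archive is unpacked
# into the destination instead of being copied (and a URL source would be
# downloaded). The archive itself never shows up in /opt/rootfs/.
ADD rootfs.tar.gz /opt/rootfs/
```

With `COPY` there's no guessing about intent, which is why it's the safer default.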

The SO question makes a good point. We're so used to making our layers small that this has become a common habit. If someone had come up to me and asked why I do it, I would have said it's because you want to keep the number of layers to a minimum. But the honest answer is "because I've always been doing it that way". Because who cares if you have 20 layers? It doesn't hurt performance anymore. And as the article states, you can also minimize the number of layers by using [multistage builds](https://docs.docker.com/develop/develop-images/multistage-build/).
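
That's what the SO question is hinting at. Here's a minimal multi-stage sketch (URL, paths and base image are placeholders, not anything from the question): the tarball and the extra layer created by `ADD` only exist in the throwaway builder stage, so they never bloat the final image.

```dockerfile
FROM alpine:3.10 AS builder
# The downloaded tarball and this layer live only in the builder stage.
ADD http://example.com/big.tar.gz /usr/src/
RUN mkdir -p /usr/src/things \
    && tar -xzf /usr/src/big.tar.gz -C /usr/src/things

FROM alpine:3.10
# Only the extracted files are carried over into the final image.
COPY --from=builder /usr/src/things /opt/things
```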

Because I had been avoiding `ADD`, I didn't know that it downloads the file and then checks whether it has changed. If it's the same, it reuses the cached layers you already had. This detail wasn't clear to me until I did some tests on my machine, because I thought it did some fancy cache detection with headers like [If-None-Match](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match), but that doesn't appear to be the case.
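
A minimal way to see this for yourself (the URL is a placeholder; any stable file will do):

```dockerfile
FROM alpine:3.10
# On every build the file is downloaded again so its checksum can be compared
# with what's in the cache; only a changed file invalidates this layer.
ADD http://example.com/big.tar.gz /usr/src/things/

# Build it twice and watch the output:
#   docker build -t add-cache-test .
#   docker build -t add-cache-test .
# The second build still fetches the file, but if it's unchanged the
# existing cached layer is reused.
```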

Now I'm slowly gravitating back to `docker ADD`; I think it's clearer, and the `curl`-ing hides your intent. These are the considerations I now use to choose what to do (all three are sketched below the list):

- Adding files from your repo: `COPY`
- Adding a file from a URL, and you want a new image when the file changes: `ADD`
- Adding a file from a URL, and you don't really care: `RUN curl && extract && rm tarball`
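
And all three in one hypothetical Dockerfile, with base image, URLs and paths made up:

```dockerfile
FROM alpine:3.10

# 1. Files from your own repo: COPY
COPY ./app /opt/app

# 2. File from a URL, and you want a new image whenever it changes: ADD
#    (remote files are not auto-extracted, so the tarball lands here as a file)
ADD http://example.com/config.tar.gz /opt/downloads/

# 3. File from a URL, and you don't really care: fetch, extract and clean up
#    in a single layer so the tarball never ends up in the image
RUN wget -qO /tmp/big.tar.gz http://example.com/big.tar.gz \
    && tar -xzf /tmp/big.tar.gz -C /opt/app \
    && rm /tmp/big.tar.gz
```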

Thanks for reading, and I hope you also learned something ;)

ariejan commented Oct 30, 2019

👍 Nice write-up. You might want to blockquote relevant parts of the Stack Overflow question, as it's a very relevant part of your story. Now I have to click away from the article if I want to understand it fully.

pascalw commented Oct 30, 2019

diff --git a/about_docker_add.md b/about_docker_add.md
index 800cdc7..354c98b 100644
--- a/about_docker_add.md
+++ b/about_docker_add.md
@@ -1,12 +1,12 @@
 # About docker ADD...
 
-Recently I was asked to review parts of an automated tests PR that contained a Docker file. And even after using Docker for about 6 years I learned something new when I found this [SO question](https://stackoverflow.com/questions/47726176/multi-stage-docker-run-wget-vs-add)
+Recently I was asked to review parts of an automated tests PR that contained a `Dockerfile`. And even after using Docker for about 6 years I learned something new when I found this [SO question](https://stackoverflow.com/questions/47726176/multi-stage-docker-run-wget-vs-add)
 
-My worry with docker sometimes is that I've been using it for so long that I'm using it in an old arcane way. Remember these are internet years we're talking about. I remember a time where I still used [docker commit](https://docs.docker.com/engine/reference/commandline/commit/). Plz avoid docker commit because you will get into irreproducible builds. Another thing you learn along the way is to use Docker [copy](https://docs.docker.com/engine/reference/builder/#copy) in favour of [add](https://docs.docker.com/engine/reference/builder/#add). Read the "why" in this article about [The Difference between COPY and ADD in a Dockerfile](https://nickjanetakis.com/blog/docker-tip-2-the-difference-between-copy-and-add-in-a-dockerile)
+My worry with Docker sometimes is that I've been using it for so long that I'm using it in an old arcane way. Remember these are internet years we're talking about. I remember a time where I still used [docker commit](https://docs.docker.com/engine/reference/commandline/commit/). Plz avoid `docker commit` because you will get into irreproducible builds. Another thing you learn along the way is to use Docker [copy](https://docs.docker.com/engine/reference/builder/#copy) in favour of [add](https://docs.docker.com/engine/reference/builder/#add). Read the "why" in this article about [The Difference between COPY and ADD in a Dockerfile](https://nickjanetakis.com/blog/docker-tip-2-the-difference-between-copy-and-add-in-a-dockerile)
 
 The SO Question makes a good point. We're so used to making our layers small this has become a common habit. If someone would have come up to me and asked me why I do it I would say it was because you want to keep the number of layers to a minimum. But the honest answer is "because I've always been doing it that way". Because who cares if you have 20 layers? It doesn't hurt performance anymore. And as the article states, you can also minimize the number of layers by using [multistage builds](https://docs.docker.com/develop/develop-images/multistage-build/). 
 
-Because I refrained from `ADD`, what I didn't know is that it will download the file and then check if the file is different. If it's the same. It will use the cache layers you already had. This detail wasn't clear to me before I did some tests on my machine. Because I thought it did some fancy cache detection with headers like [If-None-Match](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match) but that doesn't appear to be the case.
+Because I refrained from `ADD`, what I didn't know is that it will download the file and then check if the file is different. If it's the same, it will use the cache layers you already had. This detail wasn't clear to me before I did some tests on my machine. Because I thought it did some fancy cache detection with headers like [If-None-Match](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match) but that doesn't appear to be the case.
 
 Now I'm slowly gravitating back to `docker ADD` I think it's more clear. The `curl`-ing hides your intent. These are the considerations I now use to choose what to do.

pascalw commented Oct 30, 2019

Would it be wrong to just always `ADD`? Does it hurt to use `ADD` to add files from your repo?
