@kekru
Last active February 28, 2024 05:44
Dockerfile: Stabilize build cache for COPY command, between different machines

Dockerfile: Remote build cache optimization for COPY (on Windows)

With Docker (especially with BuildKit), you can share your images as build cache with other computers.

When you run the following docker build command, BuildKit downloads cache information from the images referenced with --cache-from.

export IMAGE="my-registry.example.com/myproject/myapp:feature-1234"
export IMAGE_LATEST="my-registry.example.com/myproject/myapp:latest"
export DOCKER_BUILDKIT=1
docker build . \
  -t ${IMAGE} \
  --cache-from ${IMAGE} \
  --cache-from ${IMAGE_LATEST} \
  --build-arg BUILDKIT_INLINE_CACHE=1
docker push ${IMAGE}

If another machine has already built this Dockerfile and pushed the image to ${IMAGE} or ${IMAGE_LATEST}, the current machine will not build it again.
Instead, it will just download the layers from the referenced Docker registry.

However, the COPY and ADD commands can cause problems between machines, especially when Windows machines are involved.

Example: You have this Dockerfile

FROM ubuntu:20.04
RUN apt-get update && apt-get --no-install-recommends -y install \
    curl \
 && rm -rf /var/lib/apt/lists/*
COPY myfile.txt /
...

When trying to find a build cache entry for the COPY command, BuildKit checks whether myfile.txt is the same as in the previous build, the one that created the cache.

myfile.txt is considered different when one of the following has changed:

  • content of the file
  • file permissions
  • owner or group
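  • timestamps (Created at, Modified at), at least in my setup; the Docker documentation states that a changed modification time alone does not invalidate the cache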
  • SELinux permissions
  • possibly more

So if one of these has changed, the cache is invalidated and the build runs again from this step onwards.
This is mostly not a problem when running on Linux.
But when building from a Windows client, the cache is invalidated very often, because Windows has no Linux file permissions.
In my setup, I also got different timestamps on Windows than on my Linux machine after checking out the same git repo.
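To see which of the attributes listed above actually differ between two machines, it can help to print them and compare the output. The following is only a minimal sketch, assuming a Linux-like shell with GNU coreutils (`stat -c`, `sha256sum`), for example WSL or Git Bash on the Windows side. The function name `describe_file` is made up for this illustration and is not part of Docker or BuildKit.

```shell
#!/bin/sh
# describe_file is a hypothetical helper, not part of Docker or BuildKit.
# It prints content hash, permission bits, owner/group and mtime of one file.
# Assumes GNU coreutils; BSD/macOS stat uses different flags.
describe_file() {
  file="$1"
  # content of the file
  printf 'sha256=%s\n' "$(sha256sum "$file" | cut -d' ' -f1)"
  # file permissions as octal digits
  printf 'mode=%s\n' "$(stat -c '%a' "$file")"
  # numeric owner and group id
  printf 'owner=%s\n' "$(stat -c '%u:%g' "$file")"
  # modification time as seconds since the epoch
  printf 'mtime=%s\n' "$(stat -c '%Y' "$file")"
}
```

Running `describe_file myfile.txt` on both machines and diffing the two outputs shows which attribute is the likely reason for a cache miss.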

To solve this problem, you could add a small file cleanup stage in your Dockerfile:

FROM alpine:3.12.0 AS file-loader
COPY myfile.txt /data/
# set mode, owner and timestamps to fixed values
# so they don't depend on what Windows does
RUN chmod -R 555 /data \
 && chown -R root:root /data \
 && find /data -exec touch -a -m -t 201512180130.09 {} \;

FROM ubuntu:20.04
RUN apt-get update && apt-get --no-install-recommends -y install \
    curl \
 && rm -rf /var/lib/apt/lists/*
COPY --from=file-loader /data/myfile.txt /
...

Now we have a stage file-loader, which cleans up the file attributes, so they don't depend on Windows' behaviour.
The file-loader stage is NOT cached between machines. That's okay, because it always builds very fast.

Now the second build stage can be cached correctly between Linux and Windows machines.
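The effect of the cleanup step can also be tried outside of Docker. The sketch below mirrors the chmod and touch commands of the file-loader stage on a local directory and prints a comparable fingerprint. It assumes GNU coreutils; `normalize_tree` and `fingerprint_tree` are hypothetical names chosen for this example, and chown is left out because it needs root outside of a container.

```shell
#!/bin/sh
# normalize_tree is a hypothetical helper that mirrors the RUN step of the
# file-loader stage: fixed timestamps and a fixed mode for everything in a dir.
# chown root:root is omitted here because it requires root on the host.
normalize_tree() {
  dir="$1"
  # TZ=UTC pins how the timestamp is interpreted, independent of the machine
  TZ=UTC find "$dir" -exec touch -a -m -t 201512180130.09 {} \;
  chmod -R 555 "$dir"
}

# fingerprint_tree prints "path mode mtime" for every entry below a directory,
# sorted, so that two trees can be compared with a plain string comparison.
fingerprint_tree() {
  dir="$1"
  (cd "$dir" && find . -exec stat -c '%n %a %Y' {} \; | sort)
}
```

If two checkouts of the same file start with different modes or timestamps, their `fingerprint_tree` output should differ before `normalize_tree` is applied and match afterwards. That is the property the second build stage relies on.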

@jaredlockhart

Thank you @kekru this was very helpful!
