@UrsaDK
Last active March 28, 2024 07:16
Speed up your multistage builds in GitHub Actions

Caching multi-stage builds in GitHub Actions

Caching Docker builds in GitHub Actions is an excellent article by @dtinth which analyses various strategies for speeding up builds in GitHub Actions. The upshot of the article is a fairly decisive conclusion that the best two ways to improve build times are:

  1. Build images with a standard docker build command, using GitHub Packages' Docker registry as a cache. This gives a longer initial build but the fastest re-build times.

  2. Build images with Docker's integrated BuildKit (DOCKER_BUILDKIT=1 docker build), using a local registry and actions/cache to persist build caches. This gives the fastest initial build but slightly longer re-build times.

The problem

Unfortunately, when trying to implement these solutions in my own project, I discovered that both approaches fall short when it comes to multi-stage builds.

The problem is that the inlined build cache only includes layers directly involved in the creation of the final image. It excludes caches for the intermediate stages in my Dockerfile whose output is copied into the final image using

COPY --from=...

Using the above strategies did improve my image re-build time: it went down from ~3m 15s to about 1m 50s. However, the improvement was smaller than I would have liked...

The solution

Luckily, the developers behind Docker's integrated BuildKit also provide a standalone tool for building Docker images. The tool comes in two parts: buildkitd, a build daemon, and buildctl, a controller for the build daemon.

The feature I was most interested in is the ability of the BuildKit controller (buildctl) to export and import a full set of build caches, including caches for all multi-stage layers. Combining this ability with actions/cache gave me a nearly perfect solution for speeding up multi-stage Docker builds in GitHub Actions.

The time to re-build the image has now been reduced from ~3m 15s to ~38s!


PS: Interestingly, the BuildKit daemon (buildkitd) can run either locally or remotely. This opens up the possibility of hosting a standalone build process on your own infrastructure, which might further improve build times.

name: CI pipeline
on:
  push:
    branches: [master]
  pull_request:
defaults:
  run:
    shell: bash
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      DOCKER_IMAGE: ci/${{ github.job }}
      BUILD_CACHE: /home/runner/.docker/buildkit
    steps:
      - uses: actions/checkout@v2
      - uses: actions/cache@v1
        with:
          path: ${{ env.BUILD_CACHE }}
          key: ${{ hashFiles('Dockerfile') }}
      - name: Install the latest buildkit release
        run: |
          BUILDKIT_URL="$(curl -sL https://api.github.com/repos/moby/buildkit/releases \
            | jq -r 'map(select((.name|startswith("v")) and (.name|contains("rc")|not)))|sort_by(.published_at)[-1].assets[]|select(.name|endswith(".linux-amd64.tar.gz")).browser_download_url')"
          curl -L "${BUILDKIT_URL}" | sudo tar -xz -C /usr/local
      - name: Start buildkit daemon
        run: |
          sudo --non-interactive --shell <<END_SUDO
            install -d -m 0750 -o root -g docker /run/buildkit
            buildkitd &
            while ! test -S /run/buildkit/buildkitd.sock; do sleep 0.1; done
            chgrp docker /run/buildkit/buildkitd.sock
          END_SUDO
      - name: Build docker image
        run: |
          buildctl build \
            --frontend=dockerfile.v0 --local dockerfile=. --local context=. \
            --export-cache type=local,dest=${BUILD_CACHE},mode=max \
            --import-cache type=local,src=${BUILD_CACHE} \
            --output type=docker,name=${DOCKER_IMAGE} | docker load
          echo "Cache size: $(du -sh ${BUILD_CACHE})"
      - name: Launch a container based on the new image (example)
        run: ./bin/docker run --rm ${DOCKER_IMAGE} ...
@epicserve

@UrsaDK,

"However, if your Dockerfile doesn't change between branches then you can just reuse the cache from the old branch."

I don't think what you're saying is entirely true, because if the files in your COPY commands change, that invalidates the cache for that layer and a new layer is created. We do have files that change between branches: the files in the ./src directory, for example, which are used for building our JavaScript assets.

Thank you for your advice. I ended up going with this approach which is working well.
https://dev.to/pst418/speed-up-multi-stage-docker-builds-in-ci-cd-with-buildkit-s-registry-cache-11gi

@UrsaDK
Author

UrsaDK commented Feb 25, 2021

@epicserve, I'm glad you found a solution that works! I'm a little bit swamped with work at the moment, so my apologies for not getting back to you earlier.

For the benefit of those who might find this discussion in the future, let me address some of the points you raised:

"... if the files that are in your COPY commands change then that invalidates your cache for that layer ..."

Technically, you're absolutely right. However, cache invalidation via the COPY command is a bit of an edge case in the way Docker handles caches: during the cache lookup for a COPY or ADD instruction, Docker calculates a checksum for each file and compares it against the checksum in the existing image. If anything has changed, then the cache for the current layer, as well as for all succeeding layers, is invalidated. In all other cases, the cache check only looks at the command string itself, as written in the Dockerfile.

This is such a fundamental piece of behaviour that everyone I know covers both of these scenarios with a simple "if your Dockerfile hasn't changed".
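
The difference between the two kinds of cache check can be illustrated with a small shell sketch. This is only a model of the idea, not Docker's implementation: the helper names `run_key` and `copy_key`, the sample instructions, and the use of `sha256sum` are all assumptions made for the illustration.

```shell
#!/bin/sh
# Illustration only: models the idea behind the cache lookup, not Docker's code.
set -eu

# Key for a RUN-style instruction: depends on the command string alone.
run_key() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Key for a COPY-style instruction: depends on the command string
# plus the contents of every file under the copied directory.
copy_key() {
  {
    printf '%s\n' "$1"
    # Sort the paths so the key does not depend on directory listing order.
    find "$2" -type f -print0 | LC_ALL=C sort -z | xargs -0 sha256sum
  } | sha256sum | cut -d' ' -f1
}

workdir="$(mktemp -d)"
mkdir -p "$workdir/src"
echo 'console.log("v1")' > "$workdir/src/app.js"

run_before="$(run_key 'RUN npm ci')"
copy_before="$(copy_key 'COPY src /app/src' "$workdir/src")"

# Change a source file without touching any instruction text.
echo 'console.log("v2")' > "$workdir/src/app.js"

run_after="$(run_key 'RUN npm ci')"
copy_after="$(copy_key 'COPY src /app/src' "$workdir/src")"

if [ "$run_before" = "$run_after" ]; then echo "RUN key unchanged"; fi
if [ "$copy_before" != "$copy_after" ]; then echo "COPY key changed"; fi

rm -rf "$workdir"
```

In this model the RUN key survives the file edit while the COPY key does not, which is why a key such as hashFiles('Dockerfile') can report "unchanged" even though a COPY layer would be rebuilt.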

"... we do have files that change between branches, the files in the ./src directory for example ..."

Yes, that is a very common requirement. In fact, more complex projects will have multiple such directories, which need to be copied into the image during the build. There are two common strategies for improving performance here:

  1. Try to put the COPY command as far towards the end of your Dockerfile as possible. This will allow you to reuse the majority of the cache preceding the COPY command.

  2. Use multistage builds and, if you do, enable BuildKit. This can allow the build to skip unused stages efficiently, as well as build stages concurrently when possible.

@dmtpln

dmtpln commented Mar 23, 2021

Caching is not working now, because the cache is not rewritten on new builds. It always remains outdated.

You can change actions/cache to v2 and add restore-keys

- uses: actions/cache@v2
  with:
    path: ${{ env.BUILD_CACHE }}
    key: ${{ runner.os }}-buildkit-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildkit-

It worked for me.
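
To see why this helps, here is a rough shell model of the key lookup. It is a sketch under stated assumptions, not the actions/cache implementation: the `saved_keys` list and the `lookup` function are hypothetical, and the commit hashes are made up. The model encodes two documented behaviours: an exact hit on `key` is restored as-is (and the action then skips saving, so a key that never changes leaves the cache frozen), while a miss falls back to the most recently created cache whose key starts with a `restore-keys` prefix.

```shell
#!/bin/sh
# Hypothetical model of the cache store, oldest entry first.
saved_keys="Linux-buildkit-1111111
Linux-buildkit-2222222"

# Print the key that would be restored for a given key and restore prefix.
lookup() {
  key="$1"
  prefix="$2"
  match=""
  # 1. An exact match on the primary key wins.
  for k in $saved_keys; do
    if [ "$k" = "$key" ]; then
      echo "$k"
      return 0
    fi
  done
  # 2. Otherwise take the newest entry that starts with the prefix.
  for k in $saved_keys; do
    case "$k" in
      "$prefix"*) match="$k" ;;
    esac
  done
  if [ -n "$match" ]; then
    echo "$match"
    return 0
  fi
  return 1
}

# A new commit misses on the exact key and falls back to the newest cache.
lookup "Linux-buildkit-3333333" "Linux-buildkit-"
```

Because the key contains github.sha, every new commit misses on the exact key, restores the latest previous cache through the prefix, and then saves a fresh cache under its own key.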

@jshbrntt

You can also set --build-arg BUILDKIT_INLINE_CACHE=1 during the docker build. This adds cache metadata to the built image which, when the image is pushed to and pulled from a registry, allows you to cache the layers of the multi-stage build.

https://docs.docker.com/engine/reference/commandline/build/#specifying-external-cache-sources
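
A minimal sketch of that approach might look like the following. The image reference is a placeholder, and the `DOCKER` variable and the `build_with_inline_cache` function are hypothetical helpers added so the commands can be printed instead of executed. Note that, as the gist itself points out, the inlined cache only covers layers that end up in the final image, so intermediate stages may still be rebuilt.

```shell
#!/bin/sh
# IMAGE is a placeholder registry reference; replace it with your own.
IMAGE="${IMAGE:-ghcr.io/example/app:latest}"
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

build_with_inline_cache() {
  # Build with BuildKit, reusing cache metadata embedded in the previous
  # image and embedding fresh cache metadata in the new one.
  DOCKER_BUILDKIT=1 $DOCKER build \
    --cache-from "$IMAGE" \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    --tag "$IMAGE" .
  # Push the image so that the next build can use it as a cache source.
  $DOCKER push "$IMAGE"
}
```

Running `DOCKER=echo build_with_inline_cache` prints the two commands without needing a Docker daemon, which is a convenient way to check the flags before using them in a workflow.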

@gilcu2

gilcu2 commented Aug 4, 2021

It worked for me to take a hash of the directories of interest as the cache key, and to restore the previous cache if the key changes.

jobs:
  test:
    runs-on: ubuntu-latest
    env:
      BUILD_CACHE: /home/runner/.docker/buildkit

    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Compute cache key
        run: echo "CACHE_KEY=$(git ls-files -s dir1 dir2 | git hash-object --stdin)" >> $GITHUB_ENV
      - name: Cache docker layers
        uses: actions/cache@v1
        with:
          path: ${{ env.BUILD_CACHE }}
          key: cache-${{ env.CACHE_KEY }}
          restore-keys: |
            cache-

...
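
The behaviour of that key can be checked locally with a throwaway repository. This is only a sketch: the directory names `dir1`, `dir2` and `other`, the file names, and the `cache_key` helper are placeholders for the illustration. It relies on `git ls-files -s` listing the blob hash of every tracked file under the given paths, so the resulting key changes only when those files change.

```shell
#!/bin/sh
set -eu

# Create a temporary repository with two directories of interest and one other.
repo="$(mktemp -d)"
git -C "$repo" init -q
mkdir "$repo/dir1" "$repo/dir2" "$repo/other"
echo one > "$repo/dir1/a.txt"
echo two > "$repo/dir2/b.txt"
echo misc > "$repo/other/c.txt"
git -C "$repo" add .

# Same pipeline as in the workflow step above.
cache_key() {
  git -C "$repo" ls-files -s dir1 dir2 | git hash-object --stdin
}

key_initial="$(cache_key)"

# A change outside dir1 and dir2 leaves the key alone.
echo changed > "$repo/other/c.txt"
git -C "$repo" add .
key_unrelated="$(cache_key)"

# A change inside dir1 produces a new key.
echo changed > "$repo/dir1/a.txt"
git -C "$repo" add .
key_related="$(cache_key)"

echo "initial:   $key_initial"
echo "unrelated: $key_unrelated"
echo "related:   $key_related"
rm -rf "$repo"
```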

Thanks @UrsaDK

@dantonyuk

There is a minor error in sorting buildkit releases by version. It sorts them alphabetically rather than numerically. Also, it takes release candidates into account. So, it could pick 0.9-rc ahead of 0.10. To fix that, we could sort by the publication date and filter out all the release candidates:

BUILDKIT_URL="$(curl -sL https://api.github.com/repos/moby/buildkit/releases \
            | jq -r 'map(select((.name|startswith("v")) and (.name|contains("rc")|not)))|sort_by(.published_at)[-1].assets[]|select(.name|endswith(".linux-amd64.tar.gz")).browser_download_url')"
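
The sorting problem is easy to reproduce with plain shell tools. The sketch below uses a made-up list of tags and two hypothetical helpers, `naive_latest` and `stable_latest`. Note that it swaps in GNU `sort -V` (version sort) instead of sorting by `published_at` as the jq expression above does, so it is an alternative illustration rather than the fix used in the gist.

```shell
#!/bin/sh
# Made-up release tags, for illustration only.
releases='v0.8.3
v0.9.0-rc1
v0.9.0
v0.10.0-rc1
v0.10.0'

# Alphabetical sort: "v0.9..." sorts after "v0.10...", and release
# candidates are still included.
naive_latest() {
  printf '%s\n' "$1" | LC_ALL=C sort | tail -n 1
}

# Drop release candidates, then sort numerically by version (GNU sort -V).
stable_latest() {
  printf '%s\n' "$1" | grep -v -- '-rc' | sort -V | tail -n 1
}

naive_latest "$releases"   # prints v0.9.0-rc1
stable_latest "$releases"  # prints v0.10.0
```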

@UrsaDK
Author

UrsaDK commented Feb 24, 2023

Nice one! Thanks for this @dantonyuk 👍 I've updated the gist to include your fix. 🙂
