
Caching multi-stage builds in GitHub Actions

Caching Docker builds in GitHub Actions is an excellent article by @dtinth that analyses various strategies for speeding up builds in GitHub Actions. It reaches a fairly decisive conclusion: the two best ways to improve build times are:

  1. Build images via a standard docker build command, while using GitHub Packages' Docker registry as a cache = longer initial build but the fastest re-build times.

  2. Build your images via Docker's integrated BuildKit (DOCKER_BUILDKIT=1 docker build), while using a local registry and actions/cache to persist build caches = fastest initial build but slightly longer re-build times.

The problem

Unfortunately, when trying to implement these solutions in my own project, I discovered that both approaches fall short when it comes to multi-stage builds.

The problem is that the inlined build cache only includes layers directly involved in the creation of the final image. It excludes caches for the intermediate stages in my Dockerfile, whose results are copied into the final image using

COPY --from=...
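
As an illustration, consider a hypothetical two-stage Dockerfile (the project layout and image names below are made up). With inline caching, only the final stage's layers are exported, so the expensive builder stage is rebuilt from scratch:

```dockerfile
# Hypothetical example: the builder stage's layers are not part of
# the final image, so an inlined build cache omits them entirely.
FROM node:14 AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci          # expensive step whose cache is lost with inline caching
COPY src ./src
RUN npm run build

FROM nginx:alpine
# Only the result of this COPY makes it into the final image
COPY --from=builder /app/dist /usr/share/nginx/html
```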

Using the above strategies did improve my image re-build time: it went down from ~ 3m 15s to about 1m 50s. However, the improvement was not as large as I had hoped...

The solution

Luckily, the team behind Docker's integrated BuildKit also ship it as a standalone tool for building Docker images. The tool comes in two parts: buildkitd, a build daemon, and buildctl, a controller for the build daemon.

The feature I was most interested in is the ability of the BuildKit controller (buildctl) to export/import a full set of build caches, including caches for all multi-stage layers. Combining this ability with actions/cache gave me a nearly perfect solution for speeding up multi-stage Docker builds in GitHub Actions.

The final time to re-build the image has now been reduced from ~ 3m 15s to ~ 38s!


PS: Interestingly, the BuildKit daemon (buildkitd) can run either locally or remotely. This opens up the possibility of hosting a standalone build process on your own infrastructure, which might further improve build times.
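A remote setup might look roughly like this (a sketch only: the host name and port are placeholders, and a real deployment would want TLS on the daemon socket):

```sh
# On the remote build host: listen on TCP instead of the default unix socket
buildkitd --addr tcp://0.0.0.0:1234

# On the CI runner: point buildctl at the remote daemon
buildctl --addr tcp://build.example.com:1234 build \
  --frontend=dockerfile.v0 --local dockerfile=. --local context=. \
  --output type=docker,name=my-image | docker load
```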

```yaml
name: CI pipeline
on:
  push:
    branches: [master]
  pull_request:
defaults:
  run:
    shell: bash
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      DOCKER_IMAGE: ci/${{ github.job }}
      BUILD_CACHE: /home/runner/.docker/buildkit
    steps:
      - uses: actions/checkout@v2
      - uses: actions/cache@v1
        with:
          path: ${{ env.BUILD_CACHE }}
          key: ${{ hashFiles('Dockerfile') }}
      - name: Install the latest buildkit release
        run: |
          BUILDKIT_URL="$(curl -sL https://api.github.com/repos/moby/buildkit/releases \
            | jq -r 'map(select(.name|startswith("v")))|sort_by(.name)[-1].assets[]|select(.name|endswith(".linux-amd64.tar.gz")).browser_download_url')"
          curl -L "${BUILDKIT_URL}" | sudo tar -xz -C /usr/local
      - name: Start buildkit daemon
        run: |
          sudo --non-interactive --shell <<END_SUDO
          install -d -m 0750 -o root -g docker /run/buildkit
          buildkitd &
          while ! test -S /run/buildkit/buildkitd.sock; do sleep 0.1; done
          chgrp docker /run/buildkit/buildkitd.sock
          END_SUDO
      - name: Build docker image
        run: |
          buildctl build \
            --frontend=dockerfile.v0 --local dockerfile=. --local context=. \
            --export-cache type=local,dest=${BUILD_CACHE},mode=max \
            --import-cache type=local,src=${BUILD_CACHE} \
            --output type=docker,name=${DOCKER_IMAGE} | docker load
          echo "Cache size: $(du -sh ${BUILD_CACHE})"
      - name: Launch a container based on the new image (example)
        run: ./bin/docker run --rm ${DOCKER_IMAGE} ...
```
@UrsaDK UrsaDK commented May 8, 2020

@dtinth, thanks a lot for the great writeup. 👍 It saved me a lot of time on my quest to speed-up multi-stage builds.

@epicserve epicserve commented Jan 5, 2021

@UrsaDK, How would this work if you wanted separate caches for each Git branch?

@LarsFronius LarsFronius commented Jan 12, 2021

> @UrsaDK, How would this work if you wanted separate caches for each Git branch?

I think you'd need to embed the branch in the cache key on line 24.
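
For example (a sketch only; the exact key format is an assumption), the branch ref could be folded into the cache key:

```yaml
- uses: actions/cache@v1
  with:
    path: ${{ env.BUILD_CACHE }}
    # github.ref resolves to e.g. refs/heads/my-branch, giving each
    # branch its own cache entry (hypothetical key format)
    key: ${{ github.ref }}-${{ hashFiles('Dockerfile') }}
```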

@UrsaDK UrsaDK commented Jan 18, 2021

@epicserve, I think the first question to ask is: why do you want separate caches for different Git branches? As @LarsFronius pointed out, the cache key for this workflow is defined on line 24. However, if your Dockerfile doesn't change between branches, then you can just reuse the cache from the old branch.

The manual page for actions/cache explains it best: "The cache action first searches for cache hits for key and restore-keys in the branch containing the workflow run. If there are no hits in the current branch, the cache action searches for key and restore-keys in the parent branch and upstream branches."

That means that all you have to do is make this action run on all of your branches by removing the branch filter on line 5. I haven't used this code in a while, but from what I remember, removing line 5 altogether will make this action run on every push and pull_request to any branch.

@epicserve epicserve commented Jan 21, 2021

@UrsaDK,

> However, if your Dockerfile doesn't change between branches then you can just reuse the cache from the old branch.

I don't think that's entirely true: if the files referenced by your COPY commands change, the cache for that layer is invalidated and a new layer is created. We do have files that change between branches, for example the files in the ./src directory that are used for building our JavaScript assets.

Thank you for your advice. I ended up going with this approach which is working well.
https://dev.to/pst418/speed-up-multi-stage-docker-builds-in-ci-cd-with-buildkit-s-registry-cache-11gi

@UrsaDK UrsaDK commented Feb 25, 2021

@epicserve, I'm glad you found a solution that works! I'm a little bit swamped with work at the moment, so my apologies for not getting back to you earlier.

For the benefits of those who might find this discussion in the future, let me address some of the points you raised:

> ... if the files that are in your COPY commands change then that invalidates your cache for that layer ...

Technically, you're absolutely right. However, cache invalidation via the COPY command is a bit of an edge case in the way Docker handles caches: during the cache lookup for a COPY or ADD command, Docker calculates a checksum for each file and compares it against the checksum in the existing image. If anything has changed, the current layer, as well as all succeeding layers, is invalidated. For all other commands, the cache check only looks at the command string itself, as it appears in the Dockerfile.

This is such a fundamental piece of behaviour that everyone I know covers both of these scenarios with a simple "if your Dockerfile hasn't changed".

> ... we do have files that change between branches, the files in the ./src directory for example ...

Yes, that is a very common requirement. In fact, more complex projects will have multiple such directories, which need to be copied into the image during the build. There are two common strategies for improving performance here:

  1. Try to put the COPY command as close to the end of your Dockerfile as possible. This allows you to reuse the majority of the cache preceding the COPY command.

  2. Use multi-stage builds, and if you do, enable BuildKit. This allows the build to efficiently skip unused stages, as well as build independent stages concurrently where possible.
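
Strategy 1 might look like this in a hypothetical Node.js project (the file names are examples, not from the original workflow):

```dockerfile
FROM node:14
WORKDIR /app
# Rarely-changing dependency manifests first: this layer's cache
# survives changes to the application sources
COPY package.json package-lock.json ./
RUN npm ci
# Frequently-changing sources as late as possible: only this layer
# and the ones after it are rebuilt on a typical commit
COPY ./src ./src
RUN npm run build
```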

@teplenin teplenin commented Mar 23, 2021

Caching no longer works as written, because a cache entry saved under a given key is never rewritten on new builds, so the cached layers just go stale.

You can change actions/cache to v2 and add restore-keys:

```yaml
- uses: actions/cache@v2
  with:
    path: ${{ env.BUILD_CACHE }}
    key: ${{ runner.os }}-buildkit-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildkit-
```

It worked for me.

@joshua-barnett joshua-barnett commented Mar 27, 2021

You can also set --build-arg BUILDKIT_INLINE_CACHE=1 during the docker build. This adds the cache metadata to the built image, which, when pushed to and pulled from a registry, will allow you to cache the layers of the multi-stage build.

https://docs.docker.com/engine/reference/commandline/build/#specifying-external-cache-sources
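
A sketch of that approach (the image name registry.example.com/app is a placeholder):

```sh
# Build with BuildKit, embedding cache metadata into the image itself
export DOCKER_BUILDKIT=1
docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from registry.example.com/app:latest \
  --tag registry.example.com/app:latest .

# Push so the next CI run can pull the image back as a cache source
docker push registry.example.com/app:latest
```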
