Caching Docker builds in GitHub Actions is an excellent article by @dtinth which analyses various strategies for speeding up builds in GitHub Actions. The upshot of the article is a fairly decisive conclusion that the best two ways to improve build times are:
- Build images via a standard `docker build` command, while using GitHub Packages' Docker registry as a cache = Longer initial build but fastest re-build times.
- Build your images via docker integrated BuildKit (`DOCKER_BUILDKIT=1 docker build`), while using a local registry and actions/cache to persist build caches = Fastest initial build but slightly longer re-build times.
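As a rough sketch, the two strategies above might look like the shell functions below. The image reference and the function names are placeholders of mine, not taken from the article, and the second function pushes to whatever registry `IMAGE` points at, rather than reproducing the article's local registry plus actions/cache setup.

```shell
# Hypothetical image reference; replace with your own registry path.
IMAGE="${IMAGE:-registry.example.com/owner/app:latest}"

# Strategy 1: classic builder, with the previously pushed image as cache.
build_with_registry_cache() {
  # The pull may fail on the very first run, when no image exists yet.
  docker pull "$IMAGE" || true
  docker build --cache-from "$IMAGE" --tag "$IMAGE" .
  docker push "$IMAGE"
}

# Strategy 2: BuildKit, with inline cache metadata embedded in the image.
build_with_buildkit_inline_cache() {
  DOCKER_BUILDKIT=1 docker build \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    --cache-from "$IMAGE" \
    --tag "$IMAGE" .
  docker push "$IMAGE"
}
```

Calling `build_with_registry_cache` or `build_with_buildkit_inline_cache` from the directory that holds the `Dockerfile` runs the corresponding strategy.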
Unfortunately, when trying to implement these solutions in my own project, I discovered that both approaches fall short when it comes to multi-stage builds.
The problem is that the inlined build cache only includes layers that are directly part of the final image. It excludes the caches of the intermediate stages in my `Dockerfile` whose output is copied into the final image using `COPY --from=...`.
Using the above strategies did improve my image re-building time. It has gone down from ~ 3m 15s to about 1m 50s. However, this improvement was not as much as I would have liked...
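To make the limitation concrete, here is a small sketch that writes a two-stage `Dockerfile` into a scratch directory. The base images, paths, and the `builder` stage name are all hypothetical and only serve to show which layers an inline cache would and would not cover.

```shell
# Write a hypothetical two-stage Dockerfile into a scratch directory.
workdir="$(mktemp -d)"
cat > "$workdir/Dockerfile" <<'EOF'
# Stage 1: these layers are not part of the final image, so an inline
# cache exported together with the final image does not include them.
FROM node:18 AS builder
WORKDIR /src
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: only these layers end up in the final image (and its cache).
FROM nginx:alpine
COPY --from=builder /src/dist /usr/share/nginx/html
EOF
echo "Dockerfile written to $workdir/Dockerfile"
```

With a file like this, the expensive `npm ci` and `npm run build` steps live entirely in the first stage, which is exactly the part that gets rebuilt when only the final image's layers are cached.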
Luckily, the people behind Docker's integrated BuildKit also provide a standalone tool for building Docker images. The tool comes in two parts: `buildkitd`, a build daemon, and `buildctl`, a controller for the build daemon.
The feature I was most interested in is the ability of the BuildKit controller (`buildctl`) to export/import a full set of build caches, including caches for all multi-stage layers. Combining this ability with actions/cache gave me a nearly perfect solution for speeding up multi-stage Docker builds in GitHub Actions.
Final time to re-build the image has now been reduced from ~ 3m 15s to ~ 38s!
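A minimal sketch of such a build step is shown below, assuming `buildkitd` is already running and `buildctl` can reach it. The cache directory, image name, and function name are my own placeholders; in a workflow, `CACHE_DIR` would be the path that actions/cache saves and restores between runs.

```shell
# Hypothetical locations; adjust to your workflow.
CACHE_DIR="${CACHE_DIR:-/tmp/buildkit-cache}"
IMAGE="${IMAGE:-app:latest}"

build_with_full_cache() {
  # Flags shared by every build: Dockerfile frontend, context in ".".
  set -- --frontend dockerfile.v0 --local context=. --local dockerfile=.

  # Import only when an earlier run left a cache behind.
  if [ -f "$CACHE_DIR/index.json" ]; then
    set -- "$@" --import-cache "type=local,src=$CACHE_DIR"
  fi

  # mode=max also exports the layers of intermediate stages.
  # The image tarball goes to stdout and is loaded into Docker.
  buildctl build "$@" \
    --export-cache "type=local,dest=$CACHE_DIR,mode=max" \
    --output "type=docker,name=$IMAGE" | docker load
}
```

The `mode=max` setting is what makes the difference for multi-stage builds, since the default cache mode only covers layers of the resulting image.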
PS: Interestingly, BuildKit daemon (`buildkitd`) can run either locally or remotely. This opens up the possibility of hosting a standalone build process on your own infrastructure, which might further improve building time.
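For illustration, pointing `buildctl` at a remote daemon could look roughly like this. The address, image name, and function name are hypothetical, and a TCP endpoint exposed like this should be protected, for example with the TLS options that `buildctl` and `buildkitd` provide.

```shell
# Hypothetical address of a buildkitd instance on your own infrastructure.
BUILDKIT_ADDR="${BUILDKIT_ADDR:-tcp://buildkit.example.internal:1234}"
REMOTE_IMAGE="${REMOTE_IMAGE:-registry.example.com/owner/app:latest}"

build_on_remote_daemon() {
  # --addr selects the daemon; the result is pushed straight to a
  # registry, because the remote daemon has no access to local Docker.
  buildctl --addr "$BUILDKIT_ADDR" build \
    --frontend dockerfile.v0 \
    --local context=. \
    --local dockerfile=. \
    --output "type=image,name=$REMOTE_IMAGE,push=true"
}
```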
@epicserve, I'm glad you found a solution that works! I'm a little bit swamped with work at the moment, so my apologies for not getting back to you earlier.
For the benefit of those who might find this discussion in the future, let me address some of the points you raised:
Technically, you're absolutely right. However, cache invalidation via the COPY command is a bit of an edge case in the way Docker handles caches: during the cache lookup for a COPY or ADD command, Docker calculates a checksum for each file and compares it against the checksum in the existing image. If anything has changed, then the current layer, as well as all succeeding layers of cache, are invalidated. In all other cases, cache checking only looks at the command string itself, as used inside the `Dockerfile`. This is such a fundamental piece of behaviour that everyone I know covers both of these scenarios with a simple "if your Dockerfile hasn't changed".
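The content-based check can be illustrated with a small shell experiment. This is only an analogy built with GNU coreutils (`sha256sum`, `sort -z`), not Docker's actual implementation, and the `content_checksum` helper is a name I made up: it hashes file contents while ignoring timestamps.

```shell
# Illustrative only: hash file contents in a stable order, ignoring
# timestamps, similar in spirit to the COPY/ADD cache check.
content_checksum() {
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z \
    | xargs -0 sha256sum | sha256sum | cut -d' ' -f1)
}

demo="$(mktemp -d)"
echo "print('hello')" > "$demo/app.py"
before="$(content_checksum "$demo")"

touch "$demo/app.py"                   # timestamp change only
after_touch="$(content_checksum "$demo")"

echo "print('bye')" > "$demo/app.py"   # actual content change
after_edit="$(content_checksum "$demo")"

if [ "$before" = "$after_touch" ]; then
  echo "touch: checksum unchanged, cached layer could be reused"
fi
if [ "$before" != "$after_edit" ]; then
  echo "edit: checksum changed, this and later layers are rebuilt"
fi
```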
Yes, that is a very common requirement. In fact, more complex projects will have multiple such directories, which need to be copied into the image during the build. There are two common strategies for improving performance here:
- Try to put the COPY command as far towards the end of your `Dockerfile` as possible. This will allow you to reuse the majority of the cache preceding the COPY command.
- Use multi-stage builds and, if you do, enable BuildKit. This could allow you to efficiently skip unused stages during the build, as well as build stages concurrently when possible.
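The first strategy can be sketched as follows. The snippet writes a hypothetical single-stage `Dockerfile` (the Python base image, file names, and commands are assumptions for illustration) in which the dependency manifest is copied and installed before the frequently changing source tree.

```shell
# Write a hypothetical Dockerfile that keeps the broad COPY near the end.
workdir="$(mktemp -d)"
cat > "$workdir/Dockerfile" <<'EOF'
FROM python:3.11-slim
WORKDIR /app

# Rarely changing input first: this layer and the install step below
# stay cached as long as requirements.txt is unchanged.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Frequently changing source last: an edit only invalidates from here on.
COPY . .
CMD ["python", "app.py"]
EOF
echo "Dockerfile written to $workdir/Dockerfile"
```

Building it with `DOCKER_BUILDKIT=1 docker build "$workdir"` would then reuse the dependency layers on most source-only changes.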