@madmod
Last active April 25, 2018 00:19

Awesome Docker hack for ⚡️ fast builds

Why?

Because downloading half the internet takes a while, and we all (should) have better things to do.

What?

It's an awesome hack that lets a build cache (Maven, npm, Python, whatever) persist across Docker builds.

Awesome:

  • It still works with docker build without any extra arguments. (But you won't be able to update the build cache, only use a previous version.)
  • CI friendly and doesn't require any tools to turn it on/off.
  • You can push and pull the cache to/from a Docker repo so other people can download the internet from your (hopefully) fast Docker registry.
  • The cache can be shared globally across many projects, or be specific to a single image tag.
  • Doesn't create junk in your repos, so you don't need to .dockerignore or .gitignore anything.

How?

The build cache lives in its own Docker image. A standard FROM in the Dockerfile pulls that image in as a build stage, and the cache is copied from it into the build stage. After the build, the updated cache is copied out of the build stage into a separate intermediate stage, which can be saved back as an image to update the cache. Various levels of magic can be applied to automate the sourcing and saving of the build cache.

ARG BUILD_CACHE_IMAGE=scratch
# Get the build cache first.
# HACK: Pull the cache image in as a named stage, because using an ARG directly in
#   `COPY --from=$BUILD_CACHE_IMAGE` isn't supported yet. See: https://github.com/moby/moby/issues/34482
FROM $BUILD_CACHE_IMAGE as last-build-cache

# You need to name your build stage so you can save the updated build cache.
FROM alpine:3.7 as build

# Do anything not dependent on your build cache first like installing system packages,
# so you don't need to rebuild if your cache changes. ( Although this will work for 
# caching system packages also if you want! ;D )

# Use the build cache.
COPY --from=last-build-cache . /my/cache/path

# You might need to set up your package manager to use the cache with RUN or 
# maybe COPYing a config file in. You might also need to `mkdir -p /my/cache/path`
# so there aren't any errors if the cache is empty.

# Do your build here.

# Make a build stage with the updated build cache and name it so we can --target it.
FROM scratch as build-cache
COPY --from=build /my/cache/path .

# You could make a lighter runtime image here and COPY in your artifacts, or you 
# could just do `FROM build` if you want that to be your runtime.
FROM build
# If you choose to do `FROM build` you might want to remove the build cache from your
# final layer to reduce image size.
RUN rm -rf /my/cache/path

You might want to put your base image and your cache path in ARGs as well so you don't need to repeat them. Keep in mind that an ARG used in FROM must be defined before the first FROM in the Dockerfile, and that an ARG defined there must be redeclared (a bare ARG NAME) inside any stage that uses it.

No magic method.

Just run docker build without any extra arguments. BUILD_CACHE_IMAGE defaults to scratch, so you won't get any caching, and there is a very slight build time increase from processing the empty cache stages.

Less magic method.

First make an empty image for the build cache to be stored in with:

tar cv --files-from /dev/null | docker import - my-project-build-cache

Populate the build cache image:

docker build \
  --build-arg BUILD_CACHE_IMAGE=my-project-build-cache \
  --target build-cache \
  -t my-project-build-cache .

You might want to do this periodically to get newer dependencies so your package manager has less updating to do.
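One way to make "periodically" concrete is to check the age of the cache image before deciding to repopulate it. This is only a minimal sketch: the function names and the `MAX_CACHE_AGE_DAYS` threshold are hypothetical, and the commented usage assumes GNU `date` for parsing the timestamp that `docker inspect` reports.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the cache image is old enough to repopulate.
# MAX_CACHE_AGE_DAYS is a hypothetical threshold, not part of the original gist.
: "${MAX_CACHE_AGE_DAYS:=7}"

# Whole days between two epoch timestamps given in seconds.
age_in_days() {
  local created="$1" now="$2"
  echo $(( (now - created) / 86400 ))
}

# Succeeds (exit 0) when the cache is at least MAX_CACHE_AGE_DAYS days old.
cache_is_stale() {
  local created="$1" now="$2"
  [ "$(age_in_days "$created" "$now")" -ge "$MAX_CACHE_AGE_DAYS" ]
}

# Usage with a real image (GNU date assumed):
#   created="$(date -d "$(docker inspect -f '{{.Created}}' my-project-build-cache)" +%s)"
#   if cache_is_stale "$created" "$(date +%s)"; then
#     docker build \
#       --build-arg BUILD_CACHE_IMAGE=my-project-build-cache \
#       --target build-cache \
#       -t my-project-build-cache .
#   fi
```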

To use the cache just include the build arg with the cache image to use like this:

docker build \
  --build-arg BUILD_CACHE_IMAGE=my-project-build-cache \
  -t my-project .

Note that this method will not update the cache after the build. To refresh it, run the populate command above again.
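If you want the cache refreshed after every build without the label tricks described further down, the two commands above can simply be chained. A minimal sketch, assuming the image names used above; the function names and the `DOCKER` override variable (set `DOCKER=echo` for a dry run) are hypothetical helpers. The second build should mostly reuse Docker's layer cache from the first, so it ought to be cheap, though that is worth checking on your setup.

```shell
#!/usr/bin/env bash
# Sketch: build the project image, then save the updated build cache.
: "${DOCKER:=docker}"                          # DOCKER=echo prints the commands instead
: "${CACHE_IMAGE:=my-project-build-cache}"
: "${PROJECT_IMAGE:=my-project}"

# Normal build that reads from the cache image.
build_with_cache() {
  "$DOCKER" build \
    --build-arg "BUILD_CACHE_IMAGE=${CACHE_IMAGE}" \
    -t "${PROJECT_IMAGE}" .
}

# Rebuild only up to the build-cache stage and overwrite the cache image.
update_cache() {
  "$DOCKER" build \
    --build-arg "BUILD_CACHE_IMAGE=${CACHE_IMAGE}" \
    --target build-cache \
    -t "${CACHE_IMAGE}" .
}

# Run both steps; the cache is only updated if the build succeeded.
build_and_update_cache() {
  build_with_cache && update_cache
}
```

Call `build_and_update_cache` from the directory that contains the Dockerfile.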

You can now use standard Docker commands to version, export, push, pull, and erase your cache image, or just ignore it altogether and enjoy faster builds.
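For example, the standard commands could be wrapped like this. It is a rough sketch, not a prescribed workflow: `registry.example.com` is a placeholder registry host, the function names are made up, and `DOCKER` is a hypothetical override so the commands can be dry-run with `DOCKER=echo`.

```shell
#!/usr/bin/env bash
# Sketch: version, share, export, and erase the cache image with plain docker commands.
: "${DOCKER:=docker}"                          # DOCKER=echo prints the commands instead
: "${CACHE_IMAGE:=my-project-build-cache}"
: "${REGISTRY:=registry.example.com}"          # placeholder registry host

# Keep a snapshot of the current cache under a tag such as "$(date +%Y%m%d)".
snapshot_cache() {
  "$DOCKER" tag "${CACHE_IMAGE}" "${CACHE_IMAGE}:$1"
}

# Publish the local cache so other people can pull it.
push_cache() {
  "$DOCKER" tag "${CACHE_IMAGE}" "${REGISTRY}/${CACHE_IMAGE}" &&
    "$DOCKER" push "${REGISTRY}/${CACHE_IMAGE}"
}

# Fetch a published cache and give it the local name the build expects.
pull_cache() {
  "$DOCKER" pull "${REGISTRY}/${CACHE_IMAGE}" &&
    "$DOCKER" tag "${REGISTRY}/${CACHE_IMAGE}" "${CACHE_IMAGE}"
}

# Write the cache to a tarball; restore it elsewhere with `docker load -i FILE`.
export_cache() {
  "$DOCKER" save -o "$1" "${CACHE_IMAGE}"
}

# Throw the cache away.
erase_cache() {
  "$DOCKER" rmi "${CACHE_IMAGE}"
}
```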

More magic method.

Abuse labels and the Docker CLI to find and tag the intermediate layer of each build stage, so you can update the cache on each build.

TODO Explain this.

Dockerfile

ARG BUILD_CACHE_IMAGE=scratch

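# Defined before the first FROM because it is a "global". An ARG defined here is
# only visible in FROM lines, so a stage that uses it must redeclare it with a
# bare `ARG BUILD_CACHE_PATH` to pick up this default.
ARG BUILD_CACHE_PATH=/my/cache/path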

# A fancy way to not repeat the base image without needing an ARG is to make a
# build stage which does nothing.
FROM alpine:3.7 as base-image

# Get the build cache.
# HACK: Pull the cache image in as a named stage, because using an ARG directly in
#       `COPY --from=$BUILD_CACHE_IMAGE` isn't supported yet. See: https://github.com/moby/moby/issues/34482
FROM $BUILD_CACHE_IMAGE as last-build-cache

FROM base-image as build
# Redeclare the global ARG so its default is visible inside this stage.
ARG BUILD_CACHE_PATH

# Do anything not dependent on your build cache first like installing system packages, 
# so you don't need to rebuild if your cache changes. ( Although this will work for 
# caching system packages also if you want! ;D )

# Use the build cache.
COPY --from=last-build-cache . $BUILD_CACHE_PATH

# You might need to set up your package manager to use the cache with RUN or
# maybe COPYing a config file in. You might also need to `mkdir -p /my/cache/path`
# so there aren't any errors if the cache is empty.

# Do your build here.

# HACK: Label the stage so the magic build.sh can find the correct intermediate 
#       layer and tag it. This must be done after all commands you want in the tagged layer.
LABEL stage=build

# Make a build stage with the updated build cache and name it so we can --target it.
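FROM scratch as build-cache
# Redeclare the global ARG so its default is visible inside this stage.
ARG BUILD_CACHE_PATH
COPY --from=build $BUILD_CACHE_PATH .
# HACK: Label this stage too, because build.sh runs `tag_stage_layer build-cache`.
LABEL stage=build-cache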

# Here you could have other stages, like one with tools for development, a test 
# running stage, etc.

FROM base-image as runtime

COPY --from=build /my/artifacts /somewhere/sensible

# Do whatever to make your runtime. You could even apply this cache pattern to multiple 
# build stages and many package managers as long as you have `FROM runtime` at the end 
# so the default docker build works as expected.

# HACK: Label the stage so the magic build.sh can find the correct intermediate 
#       layer and tag it. This must be done after all commands you want in the tagged layer.
LABEL stage=runtime

build.sh

#!/usr/bin/env bash
# Build the project Docker image.

set -eo pipefail

: ${IMAGE_VERSION_TAG:="latest"}

: ${IMAGE_NAME:="my-cool-image"}

# NOTE: To disable the build cache set this to "scratch".
: ${BUILD_CACHE_IMAGE:="${IMAGE_NAME}:${IMAGE_VERSION_TAG}_build-cache"}


# Create an empty build cache image if needed.
if [[ "$(docker images -q "${BUILD_CACHE_IMAGE}" 2> /dev/null)" == "" ]]; then
  # Make our own empty image since we can't use scratch in docker tag and FROM 
  # scratch fails because docker build won't make empty images anymore.
  tar cv --files-from /dev/null | docker import - "${BUILD_CACHE_IMAGE}"
fi

# DOCKERFILE_NAME is optional and defaults to Dockerfile.
time docker build \
  -f "${DOCKERFILE_NAME:-Dockerfile}" \
  --build-arg BUILD_CACHE_IMAGE="${BUILD_CACHE_IMAGE}" \
  -t "${IMAGE_NAME}" .


function get_stage_layer () {
  local STAGE_NAME="$1"
  # TODO: Find a way to limit this based on more than time so that concurrent builds 
  #       using this pattern aren't broken. (Doesn't work -> --filter="reference=${DOCKER_CONTAINER_NAME}")
  # Newest image carrying the stage label: sort the timestamps as text, newest first.
  docker images --filter "label=stage=${STAGE_NAME}" --format '{{.CreatedAt}}\t{{.ID}}' | sort -r | head -n 1 | cut -f2
}


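function tag_stage_layer () {
  local STAGE_NAME="$1"
  local STAGE_LAYER
  STAGE_LAYER="$(get_stage_layer "${STAGE_NAME}")"
  # Fail with a clear message if no image carries the stage label.
  if [[ -z "${STAGE_LAYER}" ]]; then
    echo "No image found with label stage=${STAGE_NAME}" >&2
    return 1
  fi
  docker tag "${STAGE_LAYER}" "${IMAGE_NAME}:${IMAGE_VERSION_TAG}_${STAGE_NAME}"
  if [ "$IMAGE_VERSION_TAG" == "latest" ]; then
    docker tag "${STAGE_LAYER}" "${IMAGE_NAME}:_${STAGE_NAME}"
  fi
  echo "Successfully tagged intermediate build stage layer (${STAGE_LAYER}) ${IMAGE_NAME}:${IMAGE_VERSION_TAG}_${STAGE_NAME}"
}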


tag_stage_layer build
tag_stage_layer build-cache
tag_stage_layer runtime
# You could do this for many other layers, and they could even be in the same FROM
# so you don't need to COPY things around, just incrementally run commands and LABEL 
# after each of them.
#tag_stage_layer develop
#tag_stage_layer test
#tag_stage_layer release
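
The selection step inside get_stage_layer can be tried in isolation, which helps when debugging why the wrong layer got tagged. This is a minimal sketch, assuming input lines in the `CreatedAt<TAB>ID` shape that the `--format` string above produces; the function name `newest_image_id` and the sample IDs are made up for illustration. The whole approach also assumes the builder leaves each stage's last layer behind as an untagged image, which is how the classic builder behaves; that is worth verifying if your Docker builds with BuildKit.

```shell
#!/usr/bin/env bash
# Sketch: pick the newest image ID from "CreatedAt<TAB>ID" lines on stdin.
# Timestamps such as "2018-04-25 00:19:00 +0000 UTC" sort correctly as plain
# text as long as they share a timezone offset, so a reverse text sort is enough.
newest_image_id() {
  sort -r | head -n 1 | cut -f2
}

# Sample input with made-up IDs; in build.sh the lines would come from
#   docker images --filter "label=stage=build" --format '{{.CreatedAt}}\t{{.ID}}'
printf '%s\t%s\n' \
  '2018-04-24 10:00:00 +0000 UTC' 'aaa111' \
  '2018-04-25 00:19:00 +0000 UTC' 'bbb222' |
  newest_image_id
# prints "bbb222"
```

Because the only tie-breaker is the creation time, two builds running at once on the same Docker host can still pick up each other's layers, as the TODO in get_stage_layer points out.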

Very magic method.

TODO: Explain how to share the cache across projects
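
Until that explanation is written, here is one possible approach, offered as a guess rather than the author's method: point several projects at the same cache image through BUILD_CACHE_IMAGE. In this sketch the helper `cache_image_for` and the image name `shared-build-cache` are hypothetical; the per-project name follows the pattern build.sh already uses.

```shell
#!/usr/bin/env bash
# Sketch: choose which cache image a build should read from.
# "shared-build-cache" is a hypothetical name for a cache used by every project.
cache_image_for() {
  local scope="$1" image_name="$2" version_tag="${3:-latest}"
  case "$scope" in
    global)  echo "shared-build-cache" ;;
    project) echo "${image_name}:${version_tag}_build-cache" ;;
    *)       echo "unknown scope: ${scope}" >&2; return 1 ;;
  esac
}

# Usage: two projects reading from the same cache image.
#   BUILD_CACHE_IMAGE="$(cache_image_for global)" IMAGE_NAME=project-a ./build.sh
#   BUILD_CACHE_IMAGE="$(cache_image_for global)" IMAGE_NAME=project-b ./build.sh
```

Note that build.sh as written saves the updated cache under the per-project name, so keeping a shared cache current would need an extra step after each build, such as `docker tag project-a:latest_build-cache shared-build-cache`. With concurrent builds the last one to tag wins.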
