
@hermanbanken
Last active January 10, 2020 08:32

Reproducible Container Image SHAs with Kaniko

At Q42 we use Docker a lot nowadays. A few projects run full-fledged Kubernetes on GKE with Docker images. Other projects use Docker with App Engine Flex, which has become our standard for quickly deploying a standalone workload because it takes care of the boring things like hosting and SSL (Let's Encrypt) for us.

In one particularly large project (Hue) we have a multitude of micro-services that each have a separate Docker image. Combined with CI/CD releasing to our test and staging environments, this works beautifully. There is a catch to this CI/CD style, however: every release still needs to be approved manually, and every minor tweak, for example to README.md, triggers a new release. This creates a lot of overhead, which I would like to reduce by detecting identical releases: releases that ship the same 'binaries' (or source code + modules in the case of Node.js).

Docker's random SHAs

Unfortunately, our Docker builds produce non-reproducible SHAs. This is due to how multi-stage builds and COPY work: every COPY --from (in contrast to a local COPY) generates a random SHA. Examining the contents of the layers shows that they are 100% identical; the only difference is the random layer SHA.

```shell
# Create two builds
$ docker build . --no-cache -t build1
$ docker build . --no-cache -t build2

# Extract the images into separate directories to compare the contents
$ mkdir build1 build2
$ docker save build1 | gzip > build1.tar.gz; tar -xzf build1.tar.gz -C build1
$ docker save build2 | gzip > build2.tar.gz; tar -xzf build2.tar.gz -C build2

$ (cd build1 && find . -name layer.tar -exec shasum {} \;)
# SHASUM                                  FILE
6b2e943093910e2c255ffaca2525de3ea3220898  ./c2394e1953e70842a16dd159e399c1513a63201a1d13b83a0e3caec3864040ce/layer.tar
ede0b4a5744626532dc4aeb2bd0fbf84ce729017  ./8e635d6264340a45901f63d2a18ea5bc8c680919e07191e4ef276860952d0399/layer.tar

$ (cd build2 && find . -name layer.tar -exec shasum {} \;)
# SHASUM                                  FILE
6b2e943093910e2c255ffaca2525de3ea3220898  ./5cc38bc170a7225fdc43d8e67a388d1363ab924118a17e09b6bcfaac67f0f967/layer.tar
ede0b4a5744626532dc4aeb2bd0fbf84ce729017  ./8e635d6264340a45901f63d2a18ea5bc8c680919e07191e4ef276860952d0399/layer.tar

# The first layer contains our Go executable: its layer name differs between builds, but its SHASUM is identical.
# The second layer is the Alpine base image, which correctly has the same name in both builds.
```
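If you want to repeat this comparison for many builds, the same check can be scripted. Below is a minimal Python sketch, assuming the legacy `docker save` layout shown above, where each layer is stored as `<layer-id>/layer.tar` (newer Docker versions may write a different layout, in which case the glob pattern would need adjusting). The helper names `layer_digests` and `same_layer_contents` are hypothetical.

```python
import hashlib
from pathlib import Path


def layer_digests(extracted_dir):
    """Return the sorted SHA-256 hex digests of every layer.tar under extracted_dir.

    Only the bytes of each layer.tar are hashed; the <layer-id> directory
    names (which differ between the two builds above) are ignored.
    """
    return sorted(
        hashlib.sha256(path.read_bytes()).hexdigest()
        for path in Path(extracted_dir).rglob("layer.tar")
    )


def same_layer_contents(dir_a, dir_b):
    """True if both extracted images contain the same multiset of layer contents.

    Sorting makes the comparison order-insensitive; a stricter check would
    compare the layers in the order listed in the image manifest.
    """
    return layer_digests(dir_a) == layer_digests(dir_b)
```

For the two builds above, `same_layer_contents("build1", "build2")` should return `True`, since the `shasum` output already shows byte-identical layers under different names.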

Initially we suspected this might be related to mtime being taken into account for the SHA, since the mtimes of the Go executables differ between builds. However, even running touch --date=@0 /binary before the second stage starts does not prevent a different SHA, contrary to what the Layer Spec suggests.

```dockerfile
# A typical Dockerfile for our Go projects.
# Note: resetting the mtime (the touch below) does NOT keep the image SHA stable here.

FROM golang:1.13-alpine3.10 AS builder
RUN apk add --no-cache git
WORKDIR /go/src/github.com/Q42/somerepo
COPY . /go/src/github.com/Q42/somerepo
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o /binary .
# Reset the mtime, which should keep the build SHA stable for identical builds
RUN touch --date=@0 /binary

FROM alpine:3.10
# Copy only the compiled binary out of the builder stage
COPY --from=builder /binary /binary
CMD ["/binary"]
```

There are also some old, resolved issues about this in moby/moby: #9391 and #12031. The latter concerns a 'workaround' for the Docker cache, so that the mtime is ignored when it comes to caching. This makes sense if you have a single build host or a shared build cache, but for situations where you can't have a shared cache, I would like an option to strip all (or only the relevant) mtimes. Note that it is not feasible to strip mtimes unconditionally, because that might break applications that depend on the mtime of some files.
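To make the idea of stripping mtimes concrete: an image layer is essentially a tar archive, and every tar header stores the file's mtime, so two layers with byte-identical files but different mtimes hash differently. The Python sketch below illustrates this with the standard `tarfile` and `hashlib` modules; it is not how Docker or Kaniko build layers, and `layer_digest` with its `mtime` override is a hypothetical stand-in for an opt-in "strip timestamps" option.

```python
import hashlib
import io
import tarfile


def layer_digest(files, mtime=None):
    """Pack `files` into an in-memory tar "layer" and return its SHA-256 hex digest.

    files: dict mapping a path inside the layer to (content_bytes, mtime_seconds).
    mtime: optional override applied to every entry (e.g. 0), mimicking an
           opt-in "strip timestamps" option; None keeps each file's own mtime.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sort paths so the entry order (which also affects the digest) is stable.
        for path in sorted(files):
            content, file_mtime = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            info.mode = 0o755
            # The mtime is written into the tar header, so it is hashed too.
            info.mtime = file_mtime if mtime is None else mtime
            tar.addfile(info, io.BytesIO(content))
    return hashlib.sha256(buf.getvalue()).hexdigest()
```

With two copies of the same binary that differ only in mtime, `layer_digest` returns different digests unless `mtime=0` is passed. Keeping the override opt-in matches the concern above that some applications rely on mtimes.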

Kaniko to the rescue

Unfortunately, Docker does not have that option, but we're not completely out of luck: there are many alternatives for building an OCI (Open Container Initiative) image. Take Kaniko, for example, which has the additional benefit of running entirely in user space.

And Kaniko has a flag that does exactly what we want:

```text
--reproducible
	Set this flag to strip timestamps out of the built image and make it reproducible.
```

The Kaniko authors recommend running it via Docker (at least on macOS) like this:

```shell
# Build the context in $PWD with Kaniko and push the result to $destination,
# reusing the local gcloud credentials for the registry.
context=$PWD
dockerfile=Dockerfile
destination=eu.gcr.io/myproject/myrepo
docker run \
    -v "$HOME"/.config/gcloud:/root/.config/gcloud \
    -v "$context":/workspace \
    gcr.io/kaniko-project/executor:latest \
    --dockerfile "${dockerfile}" --destination "${destination}" --context dir:///workspace/ \
    --reproducible
```

With the --reproducible flag, we indeed get an image with the same SHA when running Kaniko twice on the same context, even with different caches or on different hosts. Switching from Docker to Kaniko is not trivial, so we're not there yet, but at least we are a bit closer to reducing the release overhead. Hopefully this helps you too!
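Once builds are reproducible, detecting the identical releases mentioned in the introduction boils down to comparing digests. A minimal Python sketch, assuming you can obtain the digest of each freshly built image and of the image currently deployed per service (how to fetch these is left out); the function `services_to_release` and the service names are hypothetical.

```python
def services_to_release(built, deployed):
    """Return the services whose freshly built image digest differs from the
    deployed one, or that are not deployed at all, in alphabetical order.

    built, deployed: dicts mapping a (hypothetical) service name to an image
    digest string such as "sha256:...". Services absent from `built` are ignored.
    """
    return sorted(
        name for name, digest in built.items()
        if deployed.get(name) != digest
    )
```

For example, if only a README.md changed, every rebuilt digest should match the deployed one and the function returns an empty list, so no manual approval is needed.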

