@dqminh
Created November 4, 2014 08:59

Builder v2

Current Approach

Right now, the builder is very tightly coupled with the Docker core. The builder has the following roles:

  • process the build context
  • parse the Dockerfile into s-expressions
  • evaluate individual s-expressions, which may involve the following jobs (see the sketch after this list):
    • persist configuration, e.g. ENV, EXPOSE, VOLUME
    • import external data into a new layer, e.g. ADD, COPY
    • run extra commands that create a new layer, e.g. RUN
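
As a rough Go sketch of that evaluation role (a simplification with hypothetical helper names, not the actual docker/builder code), evaluating amounts to dispatching each parsed instruction to one of those three jobs:

package builder

import "fmt"

// Builder stands in for the builder's state (image config, build context,
// daemon handle); the three helper methods below are hypothetical names for
// the jobs listed above.
type Builder struct{}

func (b *Builder) persistConfig(inst string, args []string) error { return nil }
func (b *Builder) addFromContext(args []string) error             { return nil }
func (b *Builder) runAndCommit(args []string) error               { return nil }

// dispatch maps one parsed instruction onto the three jobs.
func (b *Builder) dispatch(inst string, args []string) error {
	switch inst {
	case "ENV", "EXPOSE", "VOLUME":
		return b.persistConfig(inst, args) // persist configuration
	case "ADD", "COPY":
		return b.addFromContext(args) // import external data as a new layer
	case "RUN":
		return b.runAndCommit(args) // run a command and commit the result
	default:
		return fmt.Errorf("unknown instruction: %s", inst)
	}
}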

Problems

  • the current builder is not extensible. The only interface to the builder right now is the Dockerfile, which is in many cases clunky, hard to extend, and hard to deprecate.
  • it is hard to guarantee backward compatibility between different versions of the Dockerfile format and/or to deprecate old Dockerfile instructions. Ideally, a Dockerfile should always be able to build even with a new version of Docker and/or the builder.

Proposed Approach

The semantics of processing the build context, as well as parsing the Dockerfile, should be separated from the Docker core. The core should only concern itself with processing the build layers (ideally via a set of well-defined APIs). For example, a set of Dockerfile instructions can be translated into API calls as follows (all the calls are just examples and may not be correct at all):

--> POST /build
  47c0 # build session id

FROM golang:1.3.1
  --> POST /images/create?fromImage=golang:1.3.1
  asdf
  --> POST /build/47c0/create?fromImage=asdf
  31f1 # container id
  --> POST /build/47c0/commit/31f1
  31f0 # layer id

ADD . /go/src/app
  --> POST /build/47c0/add?fromImage=31f0&path=/go/src/app
  <tar stream>
  f50c # container id
  --> POST /build/47c0/commit/f50c
  f41c # layer id

BUILD /go/bin # nested build
  --> POST /containers?fromImage=f41c
  f401
  --> POST /containers/f401/copy?resource=/go/bin
  </go/bin tar stream> # saved somewhere else so we can reuse it later
  d405
  # no commit because we don't want caching here

FROM scratch
  --> POST /images/create?fromImage=scratch
  scratch
  --> POST /build/47c0/add?fromImage=scratch&path=/go/bin
  </go/bin tar stream> # previously extracted /go/bin
  d789

ADD dnsdock /
  --> POST /build/47c0/add?fromImage=d789&path=/
  <tar stream>
  e234 # container id
  --> POST /build/47c0/commit/e234
  qwer # layer id

ENV CGO_ENABLED 0
  --> POST /images/create?fromImage=qwer&change="ENV CGO_ENABLED 0"
  wert # layer id

By separating the core API from the builder, we also separate the semantics of building/processing the build context + Dockerfile. This means that a user can write his own build script that does not depend on the Dockerfile, but instead processes the build context programmatically and calls the build API directly. For example, we would be able to write an Ansible builder that builds Ansible playbooks into a container by writing a custom wrapper around Ansible connections.
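
A minimal sketch of such a programmatic builder in Go, assuming the illustrative endpoints traced above (none of them are real Docker API calls, and the daemon address is assumed; error handling is elided for brevity):

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
)

const daemon = "http://localhost:2375" // assumed daemon address

// post issues one build API call and returns the id the daemon answers with.
func post(path string, body io.Reader) (string, error) {
	res, err := http.Post(daemon+path, "application/x-tar", body)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	id, err := ioutil.ReadAll(res.Body)
	return string(id), err
}

func main() {
	session, _ := post("/build", nil) // open a build session
	container, _ := post("/build/"+session+"/create?fromImage=golang:1.3.1", nil)
	layer, _ := post("/build/"+session+"/commit/"+container, nil) // commit one layer
	fmt.Println("committed layer", layer)
}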

This also means that as long as a client has access to Docker's API endpoint, it can behave as a builder, which opens up several possibilities for decoupling build-file semantics from the core. For example, we can have a builder that builds Dockerfile v1 and that will be run as:

# this starts a builderv1 that parses Dockerfile v1 inside /buildcontext
# and translates those into API calls
docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /tmp/buildcontext:/buildcontext \
  docker/builderv1 /bin/builder

while Dockerfile v2 will be run as:

# this starts a builderv2 that parses Dockerfile v2 inside /buildcontext
# and translates those into API calls
docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /tmp/buildcontext:/buildcontext \
  docker/builderv2 /bin/builder

docker build -t image . will then be translated into the corresponding docker run invocation after the build context has been processed into /tmp/buildcontext.

In the same spirit, a custom builder would be (taken from https://gist.github.com/tonistiigi/c7b539c2a1a0568020c6):

# this starts a custom builder that processes /buildcontext
# and translates it into API calls
docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /tmp/buildcontext:/buildcontext \
  nitrousio/builder /bin/builder

> cat /bin/builder
#!/usr/bin/env nodejs
var docker = require('docker-builder')

var container = docker.New()

container.copy('/context/foo', '/bar')
container.commit() // also sets the committed image as the base for the next steps

container.env('FOO', 'bar')
container.workdir('/foo/bar')

container.commit()

docker.tag(container)

NOTE: however, the core API also depends on some of the Dockerfile instructions for changing image configuration (see commit --change, or import --change), so we might also have to maintain the (latest?) parser inside the core.

@tonistiigi

Are so many new Remote API endpoints even needed? I would have thought that after docker commit --change and an scp-compatible docker cp we would have all the use cases covered.

To keep the maintenance workload under control, we should avoid creating the same functionality in many places and try more to inherit it from where it's already implemented. Still, I think this is a better approach than the "separate binary for everything" proposed by @erikh. Even in that case, those binaries would just connect to some outer service, meaning that they can be bypassed and new builder libraries can be created (like the Node one in the example). We should see this as an inevitable thing that will happen and design only for the API. These binaries should be just one implementation on top of that.

I still think that for creating 99% of the images, the current Dockerfile format is the best solution. But I am in favor of having a lower-level implementation and making the current Dockerfile-based building a separate tool on top of that (at least when I don't take the loss of observability into account), as well as any other alternative tool to build Docker images.

It is important that the build process happens inside a container, like in your docker run examples. This should (somehow) be a hard requirement, to make sure images are always easy for anyone to rebuild. In my own write-up I suggested a shebang + comment for this.

One difficulty with this approach is caching. The build script will always need to run through (this is fast anyway), but the API needs to be designed so that every step that adds a new layer has a cacheable criterion (like it does at the moment). Turning everything into a single layer based on the script checksum should not be acceptable. Another, more complicated, problem is the cleanup of a cancelled build. This has very bad UX in the current implementation too, but there it could at least be added quite easily. Here it is more complicated, because if we just cancel at some random point, things could end up in a very broken state.

The nested builds example wasn't quite clear to me. But I guess the general idea is to have an endpoint for downloading a tar from a container and another for starting a new layer hierarchy. One of the key points of nested builds is that layers should be maintained; I do not want to use it just for static binaries. To me, the assumption that the best environment for building an image is also the best environment for running it is wrong and needs to be addressed.

@dqminh (Author) commented Nov 4, 2014

> Are so many new Remote API endpoints even needed? I would have thought that after docker commit --change and an scp-compatible docker cp we would have all the use cases covered.

> To keep the maintenance workload under control, we should avoid creating the same functionality in many places and try more to inherit it from where it's already implemented.

I very much agree with this. As a matter of fact, I don't think the example APIs I wrote above are good at all. At best they just serve as examples of how the client should interact with the daemon.

> One difficulty with this approach is caching. The build script will always need to run through (this is fast anyway), but the API needs to be designed so that every step that adds a new layer has a cacheable criterion (like it does at the moment). Turning everything into a single layer based on the script checksum should not be acceptable.

I think that caching should be controlled by the daemon. The caching behavior still has to be defined, though. I'm leaning more towards some sort of transactional behavior, where the client has to open/commit a transaction, every request in between counts towards the cache key, and a single layer is generated per transaction.
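
For illustration only, reusing the hypothetical post helper from the Go sketch earlier (the transaction endpoints are invented here, not part of any real API), that flow could look like:

// open a transaction on the build session (session and contextTar are
// assumed to exist already; error handling elided)
tx, _ := post("/build/"+session+"/transactions", nil)

// every request in between counts towards the cache key ...
post("/build/"+session+"/add?path=/go/src/app", contextTar)
post("/build/"+session+"/run?cmd=go%20build", nil)

// ... and committing the transaction generates a single layer
layer, _ := post("/build/"+session+"/transactions/"+tx+"/commit", nil)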

> But I guess the general idea is to have an endpoint for downloading a tar from a container and another for starting a new layer hierarchy.

Yes, I guess that would serve the purpose, right? The idea would be to extract part of a container and put it into a container based on another image.
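
A rough Go sketch of that idea, assuming the two hypothetical endpoints from the trace above (/containers/{id}/copy to stream a tar out of a container, /build/{session}/add to start a layer hierarchy based on another image); neither is a real Docker API call in this form:

package builder

import (
	"fmt"
	"net/http"
	"net/url"
)

// transplant extracts `resource` from container `from` and feeds it into a
// new layer based on `baseImage`, piping the tar stream between the two
// hypothetical endpoints.
func transplant(daemon, session, from, resource, baseImage string) error {
	// download a tar of `resource` from the source container
	res, err := http.Post(daemon+"/containers/"+from+"/copy?resource="+url.QueryEscape(resource), "", nil)
	if err != nil {
		return err
	}
	defer res.Body.Close()
	// add the stream into a layer hierarchy started from `baseImage`
	addURL := fmt.Sprintf("%s/build/%s/add?fromImage=%s&path=%s",
		daemon, session, baseImage, url.QueryEscape(resource))
	_, err = http.Post(addURL, "application/x-tar", res.Body)
	return err
}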
