@mpdude
Created September 25, 2019 19:18

Question on how to best build Docker images

Initial situation

In most of my projects I need to run a few steps after I check out the code from version control and before I can actually use (or work on) it. Examples include

  • run composer, npm, yarn, ... to fetch dependencies
  • run grunt, gulp or similar front-end build pipelines
  • run some legacy code generation tools 🙀

... and the list goes on.

Bonus points: SSH keys

To make things more interesting, I also need to access private repositories when fetching dependencies. That is, a suitable SSH key must be available for running any of the dependency management tools.

In my work environment this key is usually loaded into the ssh-agent, but not available as a plain file that I could copy into intermediate build stages.

However, mounting the ssh-agent socket into a running container is pretty straightforward. Building images is a different story, as bind mounts are not available at build time. Hacks using socat are rather ugly.
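For reference, mounting the agent socket into a running container boils down to something like this — a minimal sketch for a Linux host; the image name and command are placeholders:

    docker run --rm -it \
      -v "$SSH_AUTH_SOCK":/ssh-agent \
      -e SSH_AUTH_SOCK=/ssh-agent \
      my-build-image composer install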

What about dev-prod-parity?

The above steps need to happen for the image that will finally go to production as well as for local development. The front-end build pipeline will, in fact, be run over and over again during development.

Ideally, I would not have to ensure that the tools (and their versions) I use on my local machine for development match the ones a multi-stage Dockerfile uses to perform the same steps during image builds.

Wouldn't it be more Docker-style to just run a container for gulp, npm etc.? I could probably get away with the default base images in that case.

One way of doing this is to run the appropriate containers with my local workdir mounted into them, and to base multi-stage build steps FROM the same images. Both variants need to match each other, but one is a docker run -v ... on my machine while the other is a RUN instruction inside the Dockerfile.
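To make the comparison concrete, here is a minimal sketch of both variants (the node:12 tag and the npm script are just placeholders):

    # Variant 1: throwaway container, local workdir bind-mounted into it
    docker run --rm -v "$(pwd)":/app -w /app node:12 npm run build

    # Variant 2: the equivalent step as a multi-stage build stage
    # (inside the Dockerfile)
    #   FROM node:12 AS assets
    #   WORKDIR /app
    #   COPY . .
    #   RUN npm run build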

Alternatively: Is it really a good approach to have an additional Dockerfile.dev that installs all of those tools, effectively working as a "Vagrant Box disguised as a container"?
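For the sake of illustration, such a Dockerfile.dev might look roughly like this (base image and tool choices are arbitrary assumptions):

    # Dockerfile.dev (sketch): all build/dev tools baked into one image
    FROM php:7.3-cli
    RUN apt-get update \
     && apt-get install -y git unzip nodejs npm \
     && rm -rf /var/lib/apt/lists/*
    COPY --from=composer:1 /usr/bin/composer /usr/bin/composer
    WORKDIR /app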

Using the local workdir as a staging area?

What if I just use my local workdir (or a workdir on a CI server, FWIW) as a staging area?

  1. Check out the code
  2. Run all necessary tools either as local installs or as Docker containers, one at a time. Mount workdir + SSH agent socket into each of them.
  3. Vendors, build artifacts, ... end up in my local directory
  4. Build the final image by copying the workdir into a standard Apache/nginx/PHP/... base image (see the Dockerfile sketch after this list)
  5. For development: Run that image, additionally mounting the workdir into it
  6. For development: Repeat step 2 as necessary.
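With this approach, the Dockerfile for step 4 shrinks to little more than a COPY (base image and document root are placeholders):

    FROM php:7.3-apache
    COPY . /var/www/html/

Step 2 would reuse the docker run -v ... -v "$SSH_AUTH_SOCK" ... pattern shown above, once per tool.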

Possible issues and things people have mentioned on this:

  • The build context that is finally sent to Docker is huge when it contains vendors
  • "Don't mount your workdir into containers to get results out, that's an anti-pattern."
  • "The build process should entirely be described in the Dockerfile, not run in your local shell or depend on your workdir."
  • "Building the Docker image for production and development are different things anyway. Use different Dockerfiles or even stick with Vagrant for development."

Have you encountered similar issues? Are there alternative techniques that work well for you?

Please share your comments 👇🏻. 🙏🏻

@krumeich

Concerning the use of SSH keys, I was facing a similar problem a few months ago: I had created a Docker image that contains a customized TeX Live distribution. TeX Live is both huge and relatively stable, so it didn't make sense for me to rebuild the image completely whenever I want to compile the documents on our Jenkins server – just to get a fresh clone of the document repo into the image. What turned out to work very well for me is deferring the cloning of the document repo until the container's runtime, with the help of a second container that does nothing but fetch the documents. Here is a rough outline of the steps involved (a command-level sketch follows the list):

  • Create a volume that shall hold the data to be worked on ("the volume")
  • Run the "fetcher" container and mount the volume.
  • The fetcher container just clones the repo into the directory mapped to the volume and terminates
  • Run the TeX Live container and mount the volume that now contains the cloned repo
  • Process the documents and copy the PDF artifacts to their destination (on a vieux/sshfs-connected volume)
  • Remove all containers and volumes
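In command-level terms, the outline above might look roughly like this (image names, repository URL and build command are placeholders):

    docker volume create texlive-data

    # the "fetcher" container clones the document repo into the volume, then exits
    docker run --rm -v texlive-data:/data fetcher-image \
      git clone git@example.com:docs/documents.git /data

    # the TeX Live container processes whatever the fetcher left in the volume
    docker run --rm -v texlive-data:/data texlive-image \
      sh -c 'cd /data && latexmk -pdf main.tex'

    # clean up afterwards
    docker volume rm texlive-data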

Unfortunately, the code is company confidential for now, so I can't point you to its location.

As for the bonus question: On the Jenkins server I have created a volume that contains a private key and a known_hosts file. The key is not protected by a passphrase (it all runs internally, so we consider this safe enough; YMMV). I mount this volume into the "fetcher" container under /root/.ssh.
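Applied to the fetcher invocation above, that credentials mount might look like this (the volume name is a placeholder):

    # ssh-credentials holds an unencrypted id_rsa plus a known_hosts file
    docker run --rm \
      -v ssh-credentials:/root/.ssh \
      -v texlive-data:/data \
      fetcher-image git clone git@example.com:docs/documents.git /data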

To summarize: We solve the problem by separating data and programs and bridging them with a Docker volume. I'm fully aware that this scenario differs widely from the situation you described. However, it might give you further ideas for solving your problem.

Added bonus: Since everything happens in volumes and containers, we do not clutter the workspace on the CI/CD server.
