Note that these reflections are specifically tailored to a conversation about Docker we're having at 18F, and as such they have a few assumptions:
- All developers use OS X.
- Production deployments will be done via cloud.gov.
- Developers have access to AWS sandbox accounts.
Most development tools are built to assume that they are a "singleton" in the context of the system they're installed on. But as soon as you have two projects that require different versions (or configurations) of that tool, you need another tool that manages versions or configurations for you, so that each project can use the one it needs. This is how tools like `nvm` (for Node), `rvm` (for Ruby), and `virtualenv`/`pyenv` (for Python) come into existence. All of this adds a lot of cognitive overhead to the development process.
Containers get rid of this problem entirely--but not without introducing new cognitive overhead that developers need to understand. At least one benefit of the new overhead, though, is that it's generic enough to apply to all kinds of problems, rather than being specialized to a particular type of development tool.
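For example, each project can pin its own interpreter version in its own compose file (a hypothetical sketch; the service name and image tags are illustrative):

```yaml
# Hypothetical docker-compose.yml fragment: this project pins Python 2.7,
# while a sibling project's compose file can pin python:3.6 -- neither
# needs a system-wide version manager like pyenv.
version: "2"
services:
  app:
    image: "python:2.7"
    command: python app.py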
Installing Docker on OS X is easy, and as the CALC docker instructions attest, setup largely boils down to `git clone` followed by `docker-compose up`, peppered with a few manual tasks.
Without Docker, the more dependent services your project has, the harder it's generally going to be for someone to configure and start up a development environment. With Docker, this often isn't the case: as we've added redis, worker and scheduler processes to CALC, developers haven't had to change their environment, because `docker-compose up` does everything for them.
Another nice thing about `docker-compose up` is that it starts all services in a single terminal window and prefixes their output with their container name. This is already a lot more convenient than manually opening a separate terminal window for every dependent service, which is what non-Docker setups often make developers do.
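As a rough sketch of how this looks, a multi-service project might declare everything in one `docker-compose.yml` (the service names and images here are illustrative, not CALC's actual configuration):

```yaml
version: "2"
services:
  app:
    build: .
    depends_on:
      - db
      - redis
  db:
    image: "postgres:9.6"
  redis:
    image: "redis:3"
```

A single `docker-compose up` then starts all three, interleaving their logs with prefixes like `app_1 |`, `db_1 |`, and `redis_1 |`.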
Unlike Vagrant, a big promise of Docker is that it's not just intended for development--it's also intended for deployment, because containers are so good for process isolation and resource management. This means that, in theory, we can have great dev/prod parity. In practice, things are a bit more complicated, especially since cloud.gov currently calls its Docker support an experimental feature.
While we don't do it in CALC, using Docker containers on Travis CI (I'm not sure about CircleCI) is easy, and I've done it before. This makes it particularly easy to ensure dev/CI parity, so you don't have tests that work on your local machine but mysteriously fail in CI.
Because we're not allowed to use tools like ngrok to expose our development instances to coworkers at 18F, being able to conveniently deploy our work to a temporary Amazon EC2 instance becomes important. Fortunately, thanks to `docker-machine`, this isn't hard; see CALC's guide to deploying to cloud environments for more details.
Once one learns the fairly straightforward syntax of Dockerfiles and Docker Compose files, `Dockerfile` and `docker-compose.yml` become handy "recipes" for reliably setting up a development (or even production) environment from scratch. So even if one decides not to use Docker, they can still consult those files to figure out how everything is configured and connected.
This transparency also means that it wouldn't be too hard for us to migrate from Docker to a different containerization technology, if that ever becomes a need. It's the opposite of vendor lock-in.
The incredibly low cost of adding new containers to a Docker-based project means that it becomes very easy to add new developer tooling. For example, in CALC, we were able to trivially add mailcatcher support during development, despite the fact that it's Ruby-based (and CALC is a Python project).
Thinking of containers as "lightweight VMs" is a good way to get started with Docker, but sometimes that abstraction doesn't work very well, especially once you need to start debugging certain kinds of problems.
It's easier to understand how containers work when your Docker host is a Linux system; however, on OS X, there's a hypervisor running a Linux VM in the way (yes, even on Docker Native for OS X), along with a variety of other tricks that might make it a bit harder to develop a good mental model for how they work.
That said, developing a solid mental model for how containers work, especially how they work under the hood, is valuable knowledge that can help one in a wide variety of problem areas that go far beyond setting up development environments. Even if one doesn't ever use chroot, cgroups, or namespaces on their own, understanding what they do improves one's understanding of the foundational plumbing that underlies any application, irrespective of implementation language.
I'm not sure if it's still the case, but when Docker Native for OS X was first released, mounting a directory on one's Docker host to a volume on a container was really slow. As in, tests took 10 times longer to run.
I ended up using a third-party tool called dinghy, which works a bit like the old Docker Toolbox for OS X, to speed things up. However, like the old Docker Toolbox, it also introduces an extra layer of abstraction, because one now needs to use `docker-machine` to develop locally.
If this problem hasn't yet been fixed, it presents one with a frustrating trade-off: make development slow but less complicated, or make it fast but more complicated. Argh.
If your Docker containers never write new files to mounted volumes, this generally isn't a problem.
However, if they do--e.g. if your static asset build pipeline is running in a container--then by default, those files are owned by `root`. This means that deleting them from your Docker host becomes annoying.
To work around this, I've had to add an entrypoint script that creates a user in the container with the same UID as the owner of the project's root directory. I've literally written this script in JS, Ruby, and Python at this point, and it's really annoying.
That said, I started following this pattern in late 2015, and it's possible that Docker/docker-compose may have evolved new functionality to obviate the need for this.
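For reference, the core of that entrypoint pattern looks something like this in Python (a sketch with hypothetical names--the mounted project root is assumed to be `/app` and the created user `appuser`; the actual per-project scripts differ):

```python
# Sketch of a UID-matching container entrypoint (hypothetical names).
import os
import pwd
import subprocess


def host_uid(path):
    """Return the UID that owns `path` -- i.e. the host user's UID,
    since the directory is bind-mounted from the Docker host."""
    return os.stat(path).st_uid


def ensure_user(uid, username="appuser"):
    """Create `username` with the given UID, unless a user with that
    UID already exists in the container."""
    try:
        pwd.getpwuid(uid)
    except KeyError:
        subprocess.check_call(
            ["useradd", "--uid", str(uid), "--no-create-home", username])


def main(argv):
    uid = host_uid("/app")      # /app is the mounted project root
    ensure_user(uid)
    os.setuid(uid)              # drop to the host user's UID...
    os.execvp(argv[0], argv)    # ...then exec the real command
```

Files written by the exec'd command then carry the host user's UID instead of `root`'s, so they can be deleted normally on the host.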
Because `docker-compose up` starts all of a project's services at once, but has no way to detect when a service is actually ready, it's easy for race conditions to exist between dependent services.
To work around this, I've had to add an entrypoint script that waits for a project's database to be ready before running any commands. I've written this script in Ruby and Python at this point, and it's kind of annoying.
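The waiting logic itself is simple; a minimal Python version (hypothetical helper name, assuming the database is reachable over TCP, e.g. postgres on `db:5432`) might look like:

```python
# Minimal "wait until the database accepts connections" helper (sketch).
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Block until a TCP connection to (host, port) succeeds;
    raise RuntimeError if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return
        except OSError:
            if time.monotonic() >= deadline:
                raise RuntimeError("%s:%s never became ready" % (host, port))
            time.sleep(interval)
```

The entrypoint calls something like `wait_for_port("db", 5432)` before handing off to the real command, so migrations and servers never race the database's startup.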
Argh, I have had so many problems with this and I still barely understand it well enough to explain it to others. There's the `EXPOSE` directive in Dockerfiles, the `ports` directive in Docker Compose files, and then the fact that those ports aren't even exposed when you run `docker-compose run` instead of `docker-compose up`, unless you pass the `--service-ports` option to it...

Oh, and on top of all that, you'll also want to bind your server to all network interfaces (`0.0.0.0`) rather than just `127.0.0.1`, or else things still won't work.
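Put together, a working (and entirely illustrative) configuration ends up looking something like this:

```yaml
# Hypothetical docker-compose.yml fragment
services:
  app:
    build: .
    # Bind to all interfaces, not 127.0.0.1, or the mapping below
    # won't reach the server inside the container.
    command: python manage.py runserver 0.0.0.0:8000
    ports:
      - "8000:8000"  # published by `docker-compose up`, but NOT by
                     # `docker-compose run` unless you pass --service-ports
```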
It's not hard in theory, but it's an annoying chunk of cognitive overhead that one simply doesn't need to deal with when they're not using Docker/docker-compose.
Because of the way Dockerfiles work, and the fact that containers are so ephemeral, it can be quite annoying to change your project's dependencies. Often just editing a `requirements.txt`/`package.json`/`Gemfile`, as one would do in a non-Docker environment, causes all the packages listed in that file to be re-retrieved from the internet and installed, which starts taking a very long time once you've got lots of dependencies.
There are various ways to work around this; for instance, if I'm just tinkering with a dependency, I'll temporarily add a `RUN npm install foo` to the end of my `Dockerfile`. We're discussing this more in-depth in CALC#1230, but the point is that it's something that can become non-trivial once you move to Docker, and that can be annoying.
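Another common mitigation, which may or may not fit a given project, is ordering the `Dockerfile` so the dependency manifest is copied before the rest of the source, letting Docker's layer cache skip the reinstall when only application code changes (an illustrative Python-flavored sketch):

```dockerfile
FROM python:2.7
WORKDIR /app
# Copy only the manifest first: this layer (and the expensive install
# below it) stays cached until requirements.txt itself changes.
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copying the rest of the source afterwards doesn't bust the cache above.
COPY . .
```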
See CALC's `selenium.md` for more details.
This is something that our non-technical folks in particular frequently get tripped up on, but more experienced devs can too.
The safest approach is simply to re-run `docker-compose build` every time you `git pull`, but this can be hard to remember.
Then again, though, non-Dockerized environments have parallel problems: CALC devs who aren't using Docker actually have to run multiple commands like `pip install -r requirements.txt` and `npm install` to make sure their environment stays up-to-date, so perhaps Dockerization is still a net win in this regard.
If there's some sort of development or debugging tool that you like using, but which isn't currently part
of the Docker setup, figuring out how to add it to your setup and using it properly can be harder
than in a non-Docker environment, as one of our developers found when trying to use `ipdb` in CALC.
It can also be challenging to add such a tool to only your setup: you can modify your local `Dockerfile` or `requirements.txt` to ensure that your favorite debugger is used, but then you have to make sure you don't actually commit those changes to the repository.
Thanks to union file systems, Docker uses way less disk space than a virtual machine; but it's still using more than developing locally without Docker or a VM. This can be particularly unfortunate if you need to pull a Docker image and are, say, tethering from a coffee shop.
Freeing up disk space taken up by Docker can also be a chore, though the recently-introduced `docker system prune` command makes it easier than it used to be.
Speaking of "Changing python/node/ruby dependencies can be cumbersome": have you ever tried to use `npm link` with Docker? For instance, the cloud.gov dashboard runs its front-end build watch in Docker, but for its cloudgov-style dependencies in `package.json`, we need to `npm link` to the local cg-style on our dev computers. Is this possible/hard?