@teddziuba
Last active December 20, 2015 11:59
Docker Service Discovery Proposal

Mental Models

My mental model of a Docker image is "the environment in which a given application runs": from the application's perspective, it has the entire machine to itself.

My mental model of a service is a single entry point ("address") that dispatches incoming service requests to one of N different processes. Each process is running inside a Docker container.

A process is a program running inside a Docker container. A process has 0 to N service dependencies, and connects to these services by way of their addresses.

Running a Process with Dependencies

When a process with dependencies starts, it must know service addresses by either:

A) being explicitly told at invocation (docker run -e SERVICE_URL=xxx)
B) discovering service addresses on its own, before running the target program

It's worth noting that (A) can be accomplished by a (B) implementation, simply by reading the service addresses from the environment and returning them to the target process.
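As a minimal sketch of that point, a (B)-style discovery script can implement (A) by simply re-emitting a variable that `docker run -e` already set. The `passthrough` function name is mine; the proposal only requires an executable that prints the address.

```shell
#!/bin/bash
# Hypothetical body of an /environment/SERVICE_URL pass-through script:
# implement (A) via the (B) mechanism by re-emitting the variable that
# `docker run -e SERVICE_URL=...` already placed in the environment.
passthrough() {
  echo "$SERVICE_URL"
}
```

The real script body would be just the `echo` line; the function wrapper is only for illustration.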

Running an Addressable Process

Ideally, addressable services run inside Docker. An addressable service is composed of N "service processes". Each service process may have process-specific metadata that defines domain-specific information about how to connect to it. This metadata will be used in determining the service address.

When an addressable service process starts in Docker, one of the following must happen:

A) The service process announces its existence + metadata
B) Something discovers the existence of the service process + metadata

After the service's existence is known, all of the following must happen:

A) Something remembers this service process's existence
B) Something determines the address for the service,
   given all the service processes in existence.
C) Something must notice when the service process goes away,
   and re-define the service address accordingly (if at all).
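Steps A through C can be sketched as a toy in-process registry. All names here are mine, and the "first live process" address strategy is deliberately naive; a real implementation would back this with something like Zookeeper or dockerd.

```shell
#!/bin/bash
# Toy sketch of the lifecycle: remember service processes (A), derive a
# service address from the live set (B), and forget processes that go
# away so the address can be re-defined (C).
declare -A PROCESSES   # process id -> host:port

remember() { PROCESSES["$1"]="$2"; }     # (A) record a process's endpoint
forget()   { unset "PROCESSES[$1]"; }    # (C) drop a departed process

service_address() {                      # (B) derive the address
  # Naive strategy: any live process's endpoint is the address.
  local id
  for id in "${!PROCESSES[@]}"; do
    echo "${PROCESSES[$id]}"
    return 0
  done
  return 1   # no live processes: no address
}
```

A director watching real containers would call `remember`/`forget` on start/stop events and re-publish the result of `service_address` each time.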

Service Metadata

Simply knowing the existence of a collection of service processes is likely not enough information to determine the service address. For example, running a clustered PostgreSQL service requires that the entry-point load balancer (such as pgPool) know the authentication credentials for every back-end PostgreSQL instance. Clients will connect to pgPool as the "address", and be unaware of the back-end instances.

Any "director" type program that knows about all PostgreSQL instances and produces the address for the pgPool load balancer must have access to these credentials.

As such, service processes need to be able to declare process-specific metadata.

Implementation

Reading Dependency Addresses

When a process starts, it must be aware of its dependency addresses and their associated metadata. I propose the following:

A) An /environment directory in the process's container
B) An executable program for each environment variable/address to be discovered
C) Before executing the process (whatever the target of run is), sequentially execute the programs in /environment and add their outputs to the current environment.

Example:

/environment/DATABASE_URL:

#!/bin/bash

# In a large installation, reach out to Zookeeper to find my database.
curl http://zookeeper.vip.phx.ebay.com/znodes/v1/dbcluster/master?dataformat=utf8

And on execution:

root@46af61bef758:/# /environment/DATABASE_URL
postgresql://Ohm3quu7:TieJ3oom@postgres.vip.phx.ebay.com/Ieb8owee

root@46af61bef758:/#

And this ought to run in a harness, after which the environment variable is set:

DATABASE_URL=postgresql://Ohm3quu7:TieJ3oom@postgres.vip.phx.ebay.com/Ieb8owee
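A minimal harness along these lines could look like the sketch below. The `run_with_environment` name and the directory-argument handling are my assumptions; the proposal only specifies "run each program, export its output, then run the target".

```shell
#!/bin/bash
# Hypothetical environment harness: run every executable in the given
# directory, export its stdout under the executable's own name, then
# exec the target command with that enriched environment.
run_with_environment() {
  local env_dir="$1"; shift
  local script
  for script in "$env_dir"/*; do
    [ -x "$script" ] || continue
    export "$(basename "$script")=$("$script")"
  done
  exec "$@"
}
```

A container entrypoint would then be something like `run_with_environment /environment myapp --serve`.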

There are several key advantages to this approach:

A) It allows for deployment-specific code to determine service addresses and metadata, and this deployment-specific code runs in the application container, not in the host launching the docker process.
B) It can trivially work without central coordination. The Zookeeper example provided demonstrates that it can work with central coordination, but it's certainly reasonable for these environment-variable scripts to do trivial work, like pass through environment variables set by the docker invocation, or read data from the local dockerd.

Side Note: Dev/Test/Prod environments

Clearly, you will not be connecting to the production database from development. We usually have separate configs for development, testing, and production, which can be trivially implemented with this approach:

my-project/
  environments/
    production/
      DATABASE_URL
      MEMCACHE_SERVER
      PAYPAL_CLIENT_KEY
      PAYPAL_CLIENT_SECRET
    development/
      DATABASE_URL
      MEMCACHE_SERVER
      PAYPAL_CLIENT_KEY
      PAYPAL_CLIENT_SECRET

And when you run your app, define the "environment" with a volume mapping:

ted@workstation:~/my-project $ docker run -v $(pwd)/environments/development:/environment -d myproject

The "environment harness" then executes the environment programs before executing the main cmd of the container.

Declaring Service Process Existence and Metadata

On the other side of the coin, when a service process starts, it must make itself and its metadata known to the world.

I propose the mirror-image of the discovery method:

A) An /export directory in the service process's container
B) An executable program in /export for each variable to be declared
C) A special executable, /export/bootstrap, that is run before any of the declaration executables are run.
D) When the container starts, a harness runs /export/bootstrap and then runs each of the declaration executables.

Like above, these declaration executables are free to do what they please, for example, register the process with an instance of Zookeeper in production. They are not required to print anything to STDOUT, as this sort of metadata registration will be installation-specific.
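The mirror-image harness could be sketched as follows. The `run_exports` name and the loop ordering are my assumptions; the proposal only specifies that /export/bootstrap runs before the declaration executables.

```shell
#!/bin/bash
# Hypothetical export harness: run bootstrap first (if present), then
# every other executable in the directory to declare the service
# process and its metadata to the world.
run_exports() {
  local export_dir="$1"
  local script
  if [ -x "$export_dir/bootstrap" ]; then
    "$export_dir/bootstrap"
  fi
  for script in "$export_dir"/*; do
    [ "$(basename "$script")" = "bootstrap" ] && continue
    [ -x "$script" ] || continue
    "$script"
  done
}
```

A container entrypoint would call `run_exports /export` before (or alongside) starting the service process itself.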

The Single Machine Case

All of the above can be implemented without any changes to Docker, but the single-machine case can be well served by some minor changes:

A) Allow containers to POST arbitrary key/value pairs to dockerd
B) Allow containers to GET these key/value pairs from dockerd

This allows us to do the following:

Service Process Starting Locally

/export/DATABASE_URL:

#!/bin/bash

curl -X POST -d "postgresql://pgdeveloper:pgdeveloper@$CONTAINER_HOST/pgdeveloper" \
  http://$DOCKERD/environment/DATABASE_URL

Application Process Starting Locally

/environment/DATABASE_URL:

#!/bin/bash

curl http://$DOCKERD/environment/DATABASE_URL

These could easily be shortened with some syntactic sugar, but you get the idea.
