@spacejam
Last active August 29, 2015 14:26
Mesos Cluster on Docker with Bidirectional Communication for use with Fault Injectors

Step 1: DNS

/etc/default/docker:

DOCKER_OPTS="--bip=172.17.42.1/24 --dns 172.17.42.1"

Restart Docker:

service docker restart

Start the DNS container:

docker run -d -v /var/run/docker.sock:/var/run/docker.sock --name dnsdock -p 172.17.42.1:53:53/udp tonistiigi/dnsdock

Now containers resolve as <container name>.<image>.docker, where <image> is the image name stripped of its repository prefix and tag. If your image looks like mesosphere/mesos-slave:0.22.1-1.0.ubuntu1404 and your container name is blah_thing_1, then the container is resolvable by querying blah_thing_1.mesos-slave.docker.
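The naming rule above can be sketched in a few lines of Python. This only mirrors the rule as described here (drop the repository prefix and the tag, then join with the container name). It is not dnsdock's actual implementation, and the function name dnsdock_name is made up for illustration.

```python
def dnsdock_name(image: str, container: str, domain: str = "docker") -> str:
    """Build the name described above: <container name>.<image>.<domain>."""
    # Drop the repository prefix: "mesosphere/mesos-slave:tag" -> "mesos-slave:tag"
    short = image.rsplit("/", 1)[-1]
    # Drop the tag: "mesos-slave:tag" -> "mesos-slave"
    short = short.split(":", 1)[0]
    return f"{container}.{short}.{domain}"


if __name__ == "__main__":
    print(dnsdock_name("mesosphere/mesos-slave:0.22.1-1.0.ubuntu1404", "blah_thing_1"))
    # prints: blah_thing_1.mesos-slave.docker
```

Images referenced by digest (name@sha256:...) are not handled in this sketch.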

To query dnsdock from the host, add this line at the top of your host machine's /etc/resolv.conf so that it is tried first when looking up names:

nameserver 172.17.42.1

To test it out:

$ dig blah_thing_1.mesos-slave.docker
...
;; ANSWER SECTION:
blah_thing_1.mesos-slave.docker.        0   IN  A   172.17.42.24
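The address in the answer should fall inside the bridge subnet configured with --bip in Step 1. A small sanity check with Python's standard ipaddress module, using only the values that already appear in this document:

```python
import ipaddress

# Value passed to --bip in DOCKER_OPTS above.
bridge = ipaddress.ip_interface("172.17.42.1/24")

# Address returned by dig in the example above.
answer = ipaddress.ip_address("172.17.42.24")

print(bridge.ip)                  # 172.17.42.1
print(bridge.network)             # 172.17.42.0/24
print(answer in bridge.network)   # True
```

If the answer comes back outside that network, the container is most likely attached to a different bridge than the one dnsdock is listening on.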

Step 2: docker containers that don't shit the bed with AAAA record attempts

Ubuntu base images will try to query AAAA records, and when those don't resolve properly through dnsdock, programs fail to connect. This results in infuriating symptoms like being able to ping a container correctly, but not being able to connect to it with actual programs.
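One way to see the difference from inside a container is to look a name up once per address family. A minimal sketch, assuming Python is available in the container; lookup and compare are illustrative names, and socket.getaddrinfo goes through the system resolver (including /etc/hosts), so this shows what programs see rather than what dnsdock alone returns.

```python
import socket
import sys


def lookup(host: str, family: int) -> list:
    """Return the sorted, de-duplicated addresses of one family, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, None, family)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def compare(host: str) -> dict:
    """Look the name up as IPv4 (A) and IPv6 (AAAA) separately."""
    return {
        "A": lookup(host, socket.AF_INET),
        "AAAA": lookup(host, socket.AF_INET6),
    }


if __name__ == "__main__":
    # Usage: python check_dns.py blah_thing_1.mesos-slave.docker
    for host in sys.argv[1:]:
        print(host, compare(host))
```

An empty AAAA list next to a populated A list is expected here. The point of the check is whether the AAAA lookup returns promptly or stalls.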

Master:

FROM debian
RUN echo "deb http://repos.mesosphere.io/ubuntu/ trusty main" > /etc/apt/sources.list.d/mesosphere.list
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
RUN apt-get update && apt-get -y install mesos
CMD mesos-master

Slave:

FROM debian
RUN echo "deb http://repos.mesosphere.io/ubuntu/ trusty main" > /etc/apt/sources.list.d/mesosphere.list
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
RUN apt-get update && apt-get -y install mesos
CMD mesos-slave

Note that these are basically copy+pasted from mesosphere's docker template, but using debian instead of ubuntu for DNS correctness.

Step 3: Mesos containers that can talk to each other

Here's a docker-compose.yml file that I used with a framework to schedule an etcd cluster onto the Mesos cluster. The framework performs periodic health checks and requires bidirectional communication between containers, which is achieved using the dnsdock resolver sitting at 172.17.42.1.

docker-compose.yml:

zk:
  image: bobrik/zookeeper
  ports:
   - 2181:2181
   - 2888:2888
   - 3888:3888
  environment:
    ZK_CONFIG: tickTime=2000,initLimit=10,syncLimit=5,maxClientCnxns=128,forceSync=no,clientPort=2181
    ZK_ID: 1
  dns: 172.17.42.1

master:
  image: mesos-master
  ports:
   - 5050:5050
  links:
      - zk:zk
  dns: 172.17.42.1
  environment:
    MESOS_ZK: zk://zk:2181/mesos
    MESOS_LOG_DIR: /var/log/mesos
    MESOS_QUORUM: 1
    MESOS_WORK_DIR: /var/lib/mesos

etcd:
  image: etcd-mesos
  hostname: etcd
  ports:
   - 12300:12300
  links:
      - zk:zk
      - master:master
  dns: 172.17.42.1

slave0:
  hostname: etcd_slave0_1.mesos-slave.docker
  image: mesos-slave
  links:
      - zk:zk
      - master:master
      - etcd:etcd
  dns: 172.17.42.1
  ports:
    - 5051:5051
  environment:
    MESOS_HOSTNAME: etcd_slave0_1.mesos-slave.docker
    MESOS_PORT: 5051
    MESOS_MASTER: zk://zk:2181/mesos
    MESOS_LOG_DIR: /var/log/mesos
    MESOS_LOGGING_LEVEL: INFO

slave1:
  hostname: etcd_slave1_1.mesos-slave.docker
  image: mesos-slave
  links:
      - zk:zk
      - master:master
      - etcd:etcd
  dns: 172.17.42.1
  ports:
    - 5052:5052
  environment:
    MESOS_HOSTNAME: etcd_slave1_1.mesos-slave.docker
    MESOS_PORT: 5052
    MESOS_MASTER: zk://zk:2181/mesos
    MESOS_LOG_DIR: /var/log/mesos
    MESOS_LOGGING_LEVEL: INFO

slave2:
  hostname: etcd_slave2_1.mesos-slave.docker
  image: mesos-slave
  links:
      - zk:zk
      - master:master
      - etcd:etcd
  dns: 172.17.42.1
  ports:
    - 5053:5053
  environment:
    MESOS_HOSTNAME: etcd_slave2_1.mesos-slave.docker
    MESOS_PORT: 5053
    MESOS_MASTER: zk://zk:2181/mesos
    MESOS_LOG_DIR: /var/log/mesos
    MESOS_LOGGING_LEVEL: INFO

slave3:
  hostname: etcd_slave3_1.mesos-slave.docker
  image: mesos-slave
  links:
      - zk:zk
      - master:master
      - etcd:etcd
  dns: 172.17.42.1
  ports:
    - 5054:5054
  environment:
    MESOS_HOSTNAME: etcd_slave3_1.mesos-slave.docker
    MESOS_PORT: 5054
    MESOS_MASTER: zk://zk:2181/mesos
    MESOS_LOG_DIR: /var/log/mesos
    MESOS_LOGGING_LEVEL: INFO

slave4:
  hostname: etcd_slave4_1.mesos-slave.docker
  image: mesos-slave
  links:
      - zk:zk
      - master:master
      - etcd:etcd
  dns: 172.17.42.1
  ports:
    - 5055:5055
  environment:
    MESOS_HOSTNAME: etcd_slave4_1.mesos-slave.docker
    MESOS_PORT: 5055
    MESOS_MASTER: zk://zk:2181/mesos
    MESOS_LOG_DIR: /var/log/mesos
    MESOS_LOGGING_LEVEL: INFO
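The five slave entries differ only in their index and port, so they can be generated instead of copied. A rough sketch: the helper names are made up, and the "etcd" project prefix is an assumption inferred from the hostnames above (Compose names containers <project>_<service>_<n>, and dnsdock appends the image name).

```python
SLAVE_TEMPLATE = """\
slave{i}:
  hostname: {host}
  image: mesos-slave
  links:
    - zk:zk
    - master:master
    - etcd:etcd
  dns: 172.17.42.1
  ports:
    - {port}:{port}
  environment:
    MESOS_HOSTNAME: {host}
    MESOS_PORT: {port}
    MESOS_MASTER: zk://zk:2181/mesos
    MESOS_LOG_DIR: /var/log/mesos
    MESOS_LOGGING_LEVEL: INFO
"""


def slave_service(i: int, project: str = "etcd", base_port: int = 5051) -> str:
    """Render one slave entry; the hostname must match the name dnsdock will serve."""
    host = f"{project}_slave{i}_1.mesos-slave.docker"
    return SLAVE_TEMPLATE.format(i=i, host=host, port=base_port + i)


def slave_services(count: int) -> str:
    """Render `count` slave entries separated by blank lines."""
    return "\n".join(slave_service(i) for i in range(count))


if __name__ == "__main__":
    print(slave_services(5))
```

The important detail is that hostname and MESOS_HOSTNAME are set to the dnsdock name, so the address each slave advertises is one the other containers can resolve.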

Step 4: run fault injection!

You can now use a tool that basically performs docker pause, docker unpause, docker restart, iptables and tc for messing with your containers in a way that comes fairly close to simulating real problems encountered in data centers.
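As a rough illustration of what such a tool does, here is a sketch that builds a random pause/unpause schedule and prints it by default. It is not any particular fault injector: the function names are hypothetical, the container names are assumed from the hostnames in the compose file, and iptables and tc are left out because they need root or NET_ADMIN inside the container.

```python
import random
import subprocess
import time


def docker_command(action: str, container: str) -> list:
    """Build a `docker pause|unpause|restart <container>` command line."""
    if action not in ("pause", "unpause", "restart"):
        raise ValueError(f"unsupported action: {action}")
    return ["docker", action, container]


def pause_plan(containers: list, rounds: int, seed=None) -> list:
    """Pick a random victim each round and pair its pause with an unpause."""
    rng = random.Random(seed)
    plan = []
    for _ in range(rounds):
        victim = rng.choice(containers)
        plan.append(docker_command("pause", victim))
        plan.append(docker_command("unpause", victim))
    return plan


def run(plan: list, hold_seconds: float = 5.0, dry_run: bool = True) -> None:
    """Print the plan, or execute it when dry_run is False."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
            continue
        subprocess.run(cmd, check=True)
        if cmd[1] == "pause":
            time.sleep(hold_seconds)  # keep the victim frozen for a while


if __name__ == "__main__":
    # Container names assumed from the hostnames in the compose file above.
    slaves = [f"etcd_slave{i}_1" for i in range(5)]
    run(pause_plan(slaves, rounds=3, seed=42))
```

Passing a seed makes a failing schedule reproducible, which helps when a fault sequence exposes a bug.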
