@drzaeus77
Created November 4, 2015 07:11

Overview

This walkthrough shows how several open source projects integrate to implement a novel networking solution for docker containers within Mesos.

For a few diagrams of the networking provided here, see these slides.

Relevant repositories mentioned within this guide are:

https://github.com/iovisor/bcc.git
https://github.com/iovisor/bcc-fuse.git
https://github.com/drzaeus77/magnum.git
https://github.com/drzaeus77/docker.git
https://github.com/drzaeus77/libnetwork.git
https://github.com/drzaeus77/docker-plugin.git
https://github.com/drzaeus77/mesos.git
https://github.com/drzaeus77/netlink.git
https://github.com/drzaeus77/express-example.git

Create a devstack

The first step is to create a multinode devstack setup. I chose to create one with 1 controller and 2 compute nodes, using neutron with ml2+linuxbridge+vxlan for the networking.

This happens to be a nested-VM devstack, so you may notice that the network subnets are numbered the way libvirt numbers them by default. Magnum shouldn't care, so you may reuse these subnets or substitute your own as you wish.

Note, however, that a forked magnum is required (enabled via the enable_plugin line in the controller localrc below).

Controller localrc

HOST_IP=192.168.122.243
SERVICE_HOST=192.168.122.243
MYSQL_HOST=192.168.122.243
RABBIT_HOST=192.168.122.243
GLANCE_HOSTPORT=192.168.122.243:9292
MULTI_HOST=1
LOGFILE=/opt/stack/logs/stack.sh.log
ADMIN_PASSWORD=pg12345
MYSQL_PASSWORD=mysqlpass
RABBIT_PASSWORD=rabbitpass
SERVICE_PASSWORD=servicepass
SERVICE_TOKEN=servicetoken

disable_service n-net
disable_service n-cpu
enable_service q-svc
enable_service q-agt
enable_service q-dhcp
enable_service q-meta
enable_service q-l3
enable_service neutron

enable_plugin magnum https://github.com/drzaeus77/magnum.git

#Neutron
Q_USE_SECGROUP=False
Q_AGENT=linuxbridge
FLOATING_RANGE="192.168.124.1/24"
Q_FLOATING_ALLOCATION_POOL=start=192.168.124.3,end=192.168.124.253
PUBLIC_NETWORK_GATEWAY="192.168.124.1"
Q_L3_ENABLED=True
PUBLIC_INTERFACE=eth2
Q_USE_PROVIDERNET_FOR_PUBLIC=True
OVS_PHYSICAL_BRIDGE=
PUBLIC_BRIDGE=br-ex
ENABLE_TENANT_VLANS=True
PHYSICAL_NETWORK=public
Q_ML2_PLUGIN_EXT_DRIVERS=

LB_PHYSICAL_INTERFACE=eth2
Q_ML2_PLUGIN_MECHANISM_DRIVERS=linuxbridge,l2population
Q_ML2_PLUGIN_TYPE_DRIVERS=flat,vlan,vxlan
Q_ML2_TENANT_NETWORK_TYPE=vxlan
ENABLE_TENANT_TUNNELS=True
TUNNEL_ENDPOINT_IP=192.168.123.10

Compute localrc

HOST_IP=192.168.122.149
SERVICE_HOST=192.168.122.243
MYSQL_HOST=$SERVICE_HOST
RABBIT_HOST=$SERVICE_HOST
GLANCE_HOSTPORT=$SERVICE_HOST:9292
Q_HOST=$SERVICE_HOST
MULTI_HOST=1
LOGFILE=/opt/stack/logs/stack.sh.log
ADMIN_PASSWORD=pg12345
MYSQL_PASSWORD=mysqlpass
RABBIT_PASSWORD=rabbitpass
SERVICE_PASSWORD=servicepass
SERVICE_TOKEN=servicetoken
NOVA_VNC_ENABLED=True
NOVNCPROXY_URL="http://192.168.122.243:6080/vnc_auto.html"
VNCSERVER_LISTEN=$HOST_IP
VNCSERVER_PROXYCLIENT_ADDRESS=$VNCSERVER_LISTEN
ENABLED_SERVICES=n-cpu,neutron,q-agt

#Neutron
Q_USE_SECGROUP=False
FLOATING_RANGE="192.168.124.1/24"
Q_FLOATING_ALLOCATION_POOL=start=192.168.124.3,end=192.168.124.253
PUBLIC_NETWORK_GATEWAY="192.168.124.1"
Q_L3_ENABLED=True
PUBLIC_INTERFACE=eth0
Q_USE_PROVIDERNET_FOR_PUBLIC=True
OVS_PHYSICAL_BRIDGE=
PUBLIC_BRIDGE=br-ex
ENABLE_TENANT_VLANS=True
PHYSICAL_NETWORK=public
Q_ML2_PLUGIN_EXT_DRIVERS=
Q_PLUGIN=ml2
Q_AGENT=linuxbridge

LB_PHYSICAL_INTERFACE=
Q_ML2_PLUGIN_MECHANISM_DRIVERS=linuxbridge,l2population
Q_ML2_PLUGIN_TYPE_DRIVERS=flat,vlan,vxlan
Q_ML2_TENANT_NETWORK_TYPE=vxlan
ENABLE_TENANT_TUNNELS=True
TUNNEL_ENDPOINT_IP=192.168.123.11

[optional] Prepare dependencies

This demo exercises several pieces of code from various open source repositories. Prebuilt binaries are hosted for easy installation, but the steps to build them yourself are shown here.

If you choose to build from source, then the script in /opt/stack/magnum/templates/heat-mesos/elements/mesos/post-install.d/40-update-docker should be modified to point to a server where the custom binaries are hosted.
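For example, a minimal sketch of hosting the binaries yourself: serve them over plain HTTP and edit 40-update-docker to download from that host. The directory layout, port, and placeholder paths below are illustrative assumptions, not necessarily what the script expects.

# On a host reachable from the bay nodes; paths and port are illustrative
mkdir -p ~/iov-binaries && cd ~/iov-binaries
cp /path/to/docker-1.9.0-rc4 /path/to/mesos-<tag>.tar.gz .
python -m SimpleHTTPServer 8000    # serves the files at http://<this-host>:8000/
# Then point the URLs in 40-update-docker at http://<this-host>:8000/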

Docker

In order to customize the interfaces inside the docker network sandbox, the docker code is updated with two changes:

  1. Update to the latest version of the go netlink library with some additional features.
  2. Use new functionality in the go netlink library to add a tc qdisc+filter to each interface to enable iovisor functionality.

git clone --branch release/v1.9 https://github.com/drzaeus77/docker.git
cd docker
make
# Created binary: bundles/1.9.0-rc4/binary/docker-1.9.0-rc4

iovisor-docker-plugin

This plugin implements the libnetwork interface supported by docker 1.9. Rather than the standard veth interface, it creates an ipvlan device and configures settings in iovisor to enable packet tagging based on IP address.

go get github.com/drzaeus77/docker-plugin/iovplug
go install github.com/drzaeus77/docker-plugin/util/iov-plug
go install github.com/drzaeus77/docker-plugin/util/iovisor-docker-plugin
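To sanity-check the plugin outside of Mesos, something along these lines should work. This is a rough sketch, assuming the built binaries are on your PATH and that the plugin registers itself with docker as a libnetwork driver named iovisor:

# Run the custom docker daemon and the plugin (illustrative; adjust paths)
docker daemon &
iovisor-docker-plugin &
# Create a network backed by the iovisor driver and attach a test container
docker network create -d iovisor iovisor
docker run --rm --net iovisor busybox ip addr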

Mesos

In order to pass a custom network type to docker run (--net iovisor) as well as to support the new docker inspect network format in docker 1.9, some minor changes to Mesos are required.

git clone https://github.com/drzaeus77/mesos
cd mesos
tag=`git rev-parse --short HEAD`
docker build -t mesos/mesos:git-$tag .
docker run --rm -v `pwd`:/mnt mesos/mesos:git-$tag tar -C / -zcf /mnt/mesos-$tag.tar.gz usr/local
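The resulting tarball contains the patched Mesos installed under usr/local. If you want to try a manual install on a bay node instead of hosting the tarball for the image build, a minimal sketch (user and host are placeholders):

scp mesos-$tag.tar.gz ubuntu@<slave>:/tmp/
ssh ubuntu@<slave> "sudo tar -C / -zxf /tmp/mesos-$tag.tar.gz && sudo ldconfig"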

Prepare the Mesos image

Once devstack is ready, you will need to prepare the binary mesos image used by the magnum templates. These steps are adapted from [A Mesos cluster with Heat](http://docs.openstack.org/developer/magnum/dev/dev-heat-mesos.html).

ln -s /opt/stack/magnum
git clone https://git.openstack.org/openstack/diskimage-builder.git
git clone https://git.openstack.org/openstack/dib-utils.git
export PATH="${PWD}/dib-utils/bin:$PATH"
export ELEMENTS_PATH=magnum/magnum/templates/heat-mesos/elements
export DIB_RELEASE=trusty
diskimage-builder/bin/disk-image-create ubuntu vm docker mesos -o ubuntu-mesos.qcow2

Load the Mesos image into glance

glance image-create --name ubuntu-mesos --visibility public \
  --disk-format qcow2 --container-format bare \
  --os-distro ubuntu < ubuntu-mesos.qcow2
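Before creating the baymodel, you can confirm that the image uploaded successfully and is active:

glance image-list | grep ubuntu-mesos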

Prepare the Mesos bay and slaves

Create a Mesos baymodel and bay

You will need to create testkey using nova keypair-add. In devstack, public is the default external/floating network; if yours differs, specify it here.

magnum baymodel-create --name mesosbaymodel --image-id ubuntu-mesos \
  --keypair-id testkey --external-network-id public --dns-nameserver 8.8.8.8 \
  --flavor-id m1.small --coe mesos
magnum bay-create --name mesosbay --baymodel mesosbaymodel --node-count 2
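Bay creation takes several minutes while Heat builds the master and slave VMs. You can watch the bay until it reaches CREATE_COMPLETE, for example (assuming bay-show reports a status field, as this magnum release does):

magnum bay-show mesosbay | awk '/ status /{print $4}'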

Once the bay is created, you should be able to make calls to the Marathon REST API, as well as connect to the web dashboard at the master's IP on port 8080.

MASTER_IP=$(magnum bay-show mesosbay | awk '/ api_address /{print $4}')
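As a quick sanity check that Marathon is reachable, list the (initially empty) set of applications:

curl http://$MASTER_IP:8080/v2/apps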

Load the rules into each slave

Currently there is no automation or external API for defining the container policies. Here, we define several sets of rules with values that correspond to the tags of the containers that we will load into mesos.

  1. Group 100 can communicate with unclassified locations (external).
  2. Group 100 can communicate with itself.
  3. Groups 100 and 200 can communicate.

Group 100 will later be assigned to the DNS and Web containers. Group 200 will be assigned to the DB container.

echo "{1 0 0}" > /run/bcc/foo/maps/grp2policy/"{ 100 0 }" 
echo "{1 0 0}" > /run/bcc/foo/maps/grp2policy/"{ 0 100 }"
echo "{1 0 0}" > /run/bcc/foo/maps/grp2policy/"{ 100 100 }"
echo "{1 0 0}" > /run/bcc/foo/maps/grp2policy/"{ 200 100 }"
echo "{1 0 0}" > /run/bcc/foo/maps/grp2policy/"{ 100 200 }"

Start DNS on each slave

(This example uses http from the python httpie package; curl works fine too.)

First, create a local file with the description of your mesos application/group. See the Marathon REST API documentation for details on the JSON syntax.

cat > mesos-dns.json <<EOF
{
  "id" : "system",
  "apps":[
    {
      "id": "mesos-dns",
      "instances": 2,
      "cpus": 0.2,
      "mem": 50,
      "cmd": "/mesos-dns -config=/config.json -v=1",
      "constraints": [["hostname", "UNIQUE"]],
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "mesosphere/mesos-dns:latest",
          "network": "BRIDGE",
          "parameters": [
            {"key": "expose", "value": "100/0"}
          ]
        },
        "volumes": [
          {
            "containerPath": "/config.json",
            "hostPath": "/etc/mesos-dns/config.js",
            "mode": "RO"
          },
          {
            "containerPath": "/mesos-dns",
            "hostPath": "/usr/bin/mesos-dns",
            "mode": "RO"
          }
        ]  
      }
    }
  ]
}
EOF
http POST $MASTER_IP:8080/v2/groups < mesos-dns.json

In the above app description, notice that the number of instances matches the number of slaves in our mesos bay; together with the UNIQUE constraint, this ensures that each mesos slave has a local DNS resolver. Other deployment models exist; this is just one example.

The DNS containers also all start with an expose value of 100/0. This magic value is passed to the iovisor-docker-plugin when it creates the ipvlan device and causes all traffic from the container to be tagged correspondingly. The eBPF programs then enforce the policy for traffic to and from group 100 on that device. Future work is needed to come up with a cleaner way to specify these tags than the overloaded expose port field.
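To make this concrete, here is a rough sketch of the kind of docker run invocation that ends up being issued on the slave for one of the DNS containers. The exact arguments are generated by the patched Mesos docker containerizer, so treat this as illustrative only:

docker run --net iovisor --expose 100/0 \
  -v /etc/mesos-dns/config.js:/config.json:ro \
  -v /usr/bin/mesos-dns:/mesos-dns:ro \
  mesosphere/mesos-dns:latest /mesos-dns -config=/config.json -v=1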

Start your first app

Now tell Marathon to launch our example two-tier web application, which uses a node-express frontend with a postgres backend.

cat > express-example.json <<EOF
{
  "id" : "baz",
  "apps":[
    {
      "id": "web",
      "dependencies": ["../postgres"],
      "instances": 1,
      "cpus": 0.2,
      "mem": 50,
      "env": {
        "POSTGRES_PASSWORD": "postgrespass",
        "POSTGRES_HOST": "postgres-baz.marathon.mesos"
      },
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "drzaeus77/mesos-express-example",
          "network": "BRIDGE",
          "parameters": [
            { "key": "expose", "value": "100/0" },
            { "key": "dns", "value": "10.1.1.3"},
            { "key": "dns", "value": "10.1.0.3"}
          ]
        }
      }
    },
    {
      "id": "postgres",
      "instances": 1,
      "cpus": 0.2,
      "mem": 200,
      "env": {
        "POSTGRES_PASSWORD": "postgrespass"
      },
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "infoslack/alpine-postgres",
          "network": "BRIDGE",
          "parameters": [
            { "key": "expose", "value": "200/0" },
            { "key": "dns", "value": "10.1.1.3"},
            { "key": "dns", "value": "10.1.0.3"}
          ],
          "volumes": [
            {
              "containerPath": "/var/lib/postgresql/data",
              "hostPath": "/var/lib/postgresql/data",
              "mode": "RW"
            }
          ]
        }
      }
    }
  ]
}
EOF
http POST $MASTER_IP:8080/v2/groups < express-example.json

Here we launch the two apps in the group with different tags, 100 and 200. According to the loaded policy, only the 100-tagged containers will be able to reach the 200-tagged containers. Try reaching the IP of the DB container from the mesos master and see that it fails.
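One way to check this, as a sketch (the exact docker inspect fields depend on the docker 1.9 network format, and the container ID and IP are placeholders):

# On the slave hosting the postgres container, find its IP
docker ps | grep alpine-postgres
docker inspect <container-id> | grep -i ipaddress

# On the mesos master (unclassified traffic), both of these should fail
ping -c 3 <db-container-ip>
nc -zv -w 5 <db-container-ip> 5432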
