This walkthrough shows how several open source projects integrate to implement a novel networking solution for docker containers within Mesos.
For a few diagrams of the networking provided here, see these slides.
Relevant repositories mentioned within this guide are:
https://github.com/iovisor/bcc.git
https://github.com/iovisor/bcc-fuse.git
https://github.com/drzaeus77/magnum.git
https://github.com/drzaeus77/docker.git
https://github.com/drzaeus77/libnetwork.git
https://github.com/drzaeus77/docker-plugin.git
https://github.com/drzaeus77/mesos.git
https://github.com/drzaeus77/netlink.git
https://github.com/drzaeus77/express-example.git
The first step is to create a multinode devstack setup. I chose one controller and two compute nodes, using neutron with ml2+linuxbridge+vxlan for the networking.
This happens to be a nested-VM devstack, so the network subnets are numbered the way libvirt numbers them. Magnum shouldn't care, so you may reuse these or substitute your own as you wish.
Note, however, that a private magnum fork is required; it is pulled in by the enable_plugin line in the localrc below.
Controller localrc
HOST_IP=192.168.122.243
SERVICE_HOST=192.168.122.243
MYSQL_HOST=192.168.122.243
RABBIT_HOST=192.168.122.243
GLANCE_HOSTPORT=192.168.122.243:9292
MULTI_HOST=1
LOGFILE=/opt/stack/logs/stack.sh.log
ADMIN_PASSWORD=pg12345
MYSQL_PASSWORD=mysqlpass
RABBIT_PASSWORD=rabbitpass
SERVICE_PASSWORD=servicepass
SERVICE_TOKEN=servicetoken
disable_service n-net
disable_service n-cpu
enable_service q-svc
enable_service q-agt
enable_service q-dhcp
enable_service q-meta
enable_service q-l3
enable_service neutron
enable_plugin magnum https://github.com/drzaeus77/magnum.git
#Neutron
Q_USE_SECGROUP=False
Q_AGENT=linuxbridge
FLOATING_RANGE="192.168.124.1/24"
Q_FLOATING_ALLOCATION_POOL=start=192.168.124.3,end=192.168.124.253
PUBLIC_NETWORK_GATEWAY="192.168.124.1"
Q_L3_ENABLED=True
PUBLIC_INTERFACE=eth2
Q_USE_PROVIDERNET_FOR_PUBLIC=True
OVS_PHYSICAL_BRIDGE=
PUBLIC_BRIDGE=br-ex
ENABLE_TENANT_VLANS=True
PHYSICAL_NETWORK=public
Q_ML2_PLUGIN_EXT_DRIVERS=
LB_PHYSICAL_INTERFACE=eth2
Q_ML2_PLUGIN_MECHANISM_DRIVERS=linuxbridge,l2population
Q_ML2_PLUGIN_TYPE_DRIVERS=flat,vlan,vxlan
Q_ML2_TENANT_NETWORK_TYPE=vxlan
ENABLE_TENANT_TUNNELS=True
TUNNEL_ENDPOINT_IP=192.168.123.10
Compute localrc
HOST_IP=192.168.122.149
SERVICE_HOST=192.168.122.243
MYSQL_HOST=$SERVICE_HOST
RABBIT_HOST=$SERVICE_HOST
GLANCE_HOSTPORT=$SERVICE_HOST:9292
Q_HOST=$SERVICE_HOST
MULTI_HOST=1
LOGFILE=/opt/stack/logs/stack.sh.log
ADMIN_PASSWORD=pg12345
MYSQL_PASSWORD=mysqlpass
RABBIT_PASSWORD=rabbitpass
SERVICE_PASSWORD=servicepass
SERVICE_TOKEN=servicetoken
NOVA_VNC_ENABLED=True
NOVNCPROXY_URL="http://192.168.122.243:6080/vnc_auto.html"
VNCSERVER_LISTEN=$HOST_IP
VNCSERVER_PROXYCLIENT_ADDRESS=$VNCSERVER_LISTEN
ENABLED_SERVICES=n-cpu,neutron,q-agt
#Neutron
Q_USE_SECGROUP=False
FLOATING_RANGE="192.168.124.1/24"
Q_FLOATING_ALLOCATION_POOL=start=192.168.124.3,end=192.168.124.253
PUBLIC_NETWORK_GATEWAY="192.168.124.1"
Q_L3_ENABLED=True
PUBLIC_INTERFACE=eth0
Q_USE_PROVIDERNET_FOR_PUBLIC=True
OVS_PHYSICAL_BRIDGE=
PUBLIC_BRIDGE=br-ex
ENABLE_TENANT_VLANS=True
PHYSICAL_NETWORK=public
Q_ML2_PLUGIN_EXT_DRIVERS=
Q_PLUGIN=ml2
Q_AGENT=linuxbridge
LB_PHYSICAL_INTERFACE=
Q_ML2_PLUGIN_MECHANISM_DRIVERS=linuxbridge,l2population
Q_ML2_PLUGIN_TYPE_DRIVERS=flat,vlan,vxlan
Q_ML2_TENANT_NETWORK_TYPE=vxlan
ENABLE_TENANT_TUNNELS=True
TUNNEL_ENDPOINT_IP=192.168.123.11
This demo exercises several pieces of code from various open source repositories. The binaries are hosted for easy installation, but the steps to build them yourself are shown here.
If you choose to build from source, then the script in
/opt/stack/magnum/templates/heat-mesos/elements/mesos/post-install.d/40-update-docker
should be modified to point to a server where the custom binaries are hosted.
In order to customize the interfaces inside the docker network sandbox, the docker code is updated with two changes:
- Update to the latest version of the go netlink library with some additional features.
- Use new functionality in the go netlink library to add tc qdisc+filter to each interface to enable iovisor functionality.
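The qdisc+filter setup performed through the netlink library is roughly equivalent to the following manual tc commands. This is only a sketch: the interface name and the eBPF object file are placeholder assumptions, and the patched docker does the equivalent work programmatically.

```shell
# Hypothetical illustration: attach an eBPF classifier to a container-side
# interface. eth0 and filter.o are placeholders; the real setup is done
# inside docker via the updated go netlink library.
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: bpf obj filter.o
```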
git clone --branch release/v1.9 https://github.com/drzaeus77/docker.git
cd docker
make
# Created binary: bundles/1.9.0-rc4/binary/docker-1.9.0-rc4
This plugin implements the libnetwork interface supported by docker 1.9. Rather than the standard veth interface, it creates an ipvlan device and configures settings in iovisor to enable packet tagging based on IP address.
go get github.com/drzaeus77/docker-plugin/iovplug
go install github.com/drzaeus77/docker-plugin/util/iov-plug
go install github.com/drzaeus77/docker-plugin/util/iovisor-docker-plugin
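With the patched docker binary and the plugin installed, a container can be attached to the custom network by name. The following is a sketch; it assumes the plugin is running (startup flags omitted) and registered with docker under the name iovisor.

```shell
# Start the plugin built above (flags, if any, are omitted here), then
# launch a test container on the iovisor network; it should receive an
# ipvlan device instead of the usual veth pair.
iovisor-docker-plugin &
docker run --rm --net iovisor busybox ip addr
```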
In order to pass a custom network type to docker run (--net iovisor), as well as to support the new docker inspect network format in docker 1.9, some minor changes to Mesos are required.
git clone https://github.com/drzaeus77/mesos
cd mesos
tag=`git rev-parse --short HEAD`
docker build -t mesos/mesos:git-$tag .
docker run --rm -v `pwd`:/mnt mesos/mesos:git-$tag tar -C / -zcf /mnt/mesos-$tag.tar.gz usr/local
Once devstack is ready, you will need to prepare the binary mesos image used in the magnum templates. These steps are adapted from [A Mesos cluster with Heat][heat-mesos].

[heat-mesos]: http://docs.openstack.org/developer/magnum/dev/dev-heat-mesos.html
ln -s /opt/stack/magnum
git clone https://git.openstack.org/openstack/diskimage-builder.git
git clone https://git.openstack.org/openstack/dib-utils.git
export PATH="${PWD}/dib-utils/bin:$PATH"
export ELEMENTS_PATH=magnum/magnum/templates/heat-mesos/elements
export DIB_RELEASE=trusty
diskimage-builder/bin/disk-image-create ubuntu vm docker mesos -o ubuntu-mesos.qcow2
glance image-create --name ubuntu-mesos --visibility public \
--disk-format qcow2 --container-format bare \
--os-distro ubuntu < ubuntu-mesos.qcow2
You will need to provide testkey using nova keypair-add. In devstack, public is the default external/floating network; if yours differs, substitute it here.
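For example, to register an existing public key as testkey (the key path here is an assumption; any public key works):

```shell
# Upload the local public key under the name expected by the baymodel below.
nova keypair-add --pub-key ~/.ssh/id_rsa.pub testkey
```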
magnum baymodel-create --name mesosbaymodel --image-id ubuntu-mesos \
--keypair-id testkey --external-network-id public --dns-nameserver 8.8.8.8 \
--flavor-id m1.small --coe mesos
magnum bay-create --name mesosbay --baymodel mesosbaymodel --node-count 2
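Bay creation takes several minutes. One way to wait for it to finish, sketched against the status field of the magnum CLI output:

```shell
# Poll until heat finishes bringing up the bay.
until magnum bay-show mesosbay | grep -q CREATE_COMPLETE; do
    sleep 30
done
```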
Once the bay is created, you should be able to make calls to the Marathon REST API, as well as connect to the web dashboard, using the master's IP on port 8080.
MASTER_IP=$(magnum bay-show mesosbay | awk '/ api_address /{print $4}')
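As a quick sanity check, GET /v2/info is a standard Marathon endpoint that reports the framework version and leader:

```shell
# Confirm Marathon is answering on the master before posting app groups.
http GET $MASTER_IP:8080/v2/info
```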
Currently there is no automation or external API for defining the container policies. Here, we define several sets of rules with values that correspond to the tags of the containers that we will load into mesos:
- Group 100 can communicate with unclassified locations (external).
- Group 100 can communicate with itself.
- Groups 100 and 200 can communicate with each other.
Group 100 will later be assigned to the DNS and web containers; group 200 will be assigned to the DB container.
echo "{1 0 0}" > /run/bcc/foo/maps/grp2policy/"{ 100 0 }"
echo "{1 0 0}" > /run/bcc/foo/maps/grp2policy/"{ 0 100 }"
echo "{1 0 0}" > /run/bcc/foo/maps/grp2policy/"{ 100 100 }"
echo "{1 0 0}" > /run/bcc/foo/maps/grp2policy/"{ 200 100 }"
echo "{1 0 0}" > /run/bcc/foo/maps/grp2policy/"{ 100 200 }"
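The entries can be read back through the same bcc-fuse filesystem to confirm they were stored:

```shell
# List the configured policy keys and dump one of the stored values.
ls /run/bcc/foo/maps/grp2policy/
cat /run/bcc/foo/maps/grp2policy/"{ 100 200 }"
```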
(This example uses http from the python httpie package; curl works fine too.)
First, create a local file with the description of your mesos application/group. See the Marathon REST API for details on the JSON syntax.
cat > mesos-dns.json <<EOF
{
  "id": "system",
  "apps": [
    {
      "id": "mesos-dns",
      "instances": 2,
      "cpus": 0.2,
      "mem": 50,
      "cmd": "/mesos-dns -config=/config.json -v=1",
      "constraints": [["hostname", "UNIQUE"]],
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "mesosphere/mesos-dns:latest",
          "network": "BRIDGE",
          "parameters": [
            {"key": "expose", "value": "100/0"}
          ]
        },
        "volumes": [
          {
            "containerPath": "/config.json",
            "hostPath": "/etc/mesos-dns/config.js",
            "mode": "RO"
          },
          {
            "containerPath": "/mesos-dns",
            "hostPath": "/usr/bin/mesos-dns",
            "mode": "RO"
          }
        ]
      }
    }
  ]
}
EOF
http POST $MASTER_IP:8080/v2/groups < mesos-dns.json
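Marathon exposes deployment state through the same API. To confirm both mesos-dns instances came up (standard Marathon endpoints; the tasksRunning counter in the app response should reach 2):

```shell
# Inspect the group definition and the running-task count for the DNS app.
http GET $MASTER_IP:8080/v2/groups/system
http GET $MASTER_IP:8080/v2/apps/system/mesos-dns
```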
In the above app description, notice that the number of instances matches the number of slaves in our mesos bay, combined with the UNIQUE hostname constraint. This ensures that each mesos slave has a local DNS resolver. Other deployment models exist; this is just one example.
The DNS containers also all start with an expose value of 100/0. This magic value will be passed to the iovisor-docker-plugin when it creates the ipvlan device and will cause all traffic from the container to be tagged correspondingly. The eBPF programs will enforce the policy for traffic to and from group 100 on that device. Future work exists to come up with a cleaner way to specify these tags than an overloaded expose port field.
Now tell marathon to launch our example 2-tier web application. This uses a node-express frontend with a postgres backend.
cat > express-example.json <<EOF
{
  "id": "baz",
  "apps": [
    {
      "id": "web",
      "dependencies": ["../postgres"],
      "instances": 1,
      "cpus": 0.2,
      "mem": 50,
      "env": {
        "POSTGRES_PASSWORD": "postgrespass",
        "POSTGRES_HOST": "postgres-baz.marathon.mesos"
      },
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "drzaeus77/mesos-express-example",
          "network": "BRIDGE",
          "parameters": [
            {"key": "expose", "value": "100/0"},
            {"key": "dns", "value": "10.1.1.3"},
            {"key": "dns", "value": "10.1.0.3"}
          ]
        }
      }
    },
    {
      "id": "postgres",
      "instances": 1,
      "cpus": 0.2,
      "mem": 200,
      "env": {
        "POSTGRES_PASSWORD": "postgrespass"
      },
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "infoslack/alpine-postgres",
          "network": "BRIDGE",
          "parameters": [
            {"key": "expose", "value": "200/0"},
            {"key": "dns", "value": "10.1.1.3"},
            {"key": "dns", "value": "10.1.0.3"}
          ]
        },
        "volumes": [
          {
            "containerPath": "/var/lib/postgresql/data",
            "hostPath": "/var/lib/postgresql/data",
            "mode": "RW"
          }
        ]
      }
    }
  ]
}
EOF
http POST $MASTER_IP:8080/v2/groups < express-example.json
Here we launch the two apps in the group with different tags, 100 and 200. According to the loaded policy, only the 100-tagged containers will be able to reach the 200-tagged containers. Try reaching the IP of the DB container from the mesos master, and see that it fails.
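One way to check this from the master, sketched under two assumptions: the master can query one of the mesos-dns resolvers listed in the app definition, and postgres listens on its default port 5432.

```shell
# Resolve the DB container's address via mesos-dns (10.1.1.3 from the app spec).
DB_IP=$(host postgres-baz.marathon.mesos 10.1.1.3 | awk '/has address/{print $4}')
# Untagged (group 0) traffic to the group-200 container has no allow rule,
# so this connection attempt is expected to time out.
nc -zv -w 3 $DB_IP 5432
```

Repeating the same connection attempt from inside the web container (group 100) should succeed, since the 100-to-200 policy entry was loaded above.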