@psychemedia
Last active August 29, 2015 14:07
Example of running mongo replica set in containers via docker and then introducing network partitions
{
"metadata": {
"name": "",
"signature": "sha256:4356f27bda95676c5a0abfdc46ba76fa677f4963491713926457041f6aed95ab"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Dockering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook demonstrates how to set up a variety of MongoDB configurations using Linux containers and then experiment with failure modes.\n",
"\n",
"Containers are like virtual machines within a virtual machine. Each MongoDB instance will run inside its own container, isolated from other MongoDB instances and connected to them via a TCP/IP connection.\n",
"\n",
"To make setting up containerised MongoDB instances easier, we'll use an application called *Docker*, which is widely used for supporting virtualisation in cloud hosted systems."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Install the docker service - this will be preinstalled in final VM\n",
"!sudo apt-get install docker.io\n",
"#Install the python docker wrapper - this will be preinstalled in final VM\n",
"!pip3 install docker-py"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need to create some docker images: one to run the mongo database server (`mongod`), and another (for sharded databases) to run the `mongos` query router."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We're going to use the recipe described by https://sebastianvoss.com/docker-mongodb-sharded-cluster.html\n",
"#You should only run this cell once - once you have run it, change the cell from a Code cell to a Raw NBConvert cell\n",
"docker_config_mongod='''FROM ubuntu:latest\n",
"\n",
"# Add 10gen official apt source to the sources list\n",
"RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10\n",
"RUN echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | tee /etc/apt/sources.list.d/10gen.list\n",
"\n",
"# Install MongoDB\n",
"RUN apt-get update\n",
"RUN apt-get install -y mongodb-10gen\n",
"\n",
"# Create the MongoDB data directory\n",
"RUN mkdir -p /data/db\n",
"\n",
"EXPOSE 27017\n",
"ENTRYPOINT [\"usr/bin/mongod\"]'''\n",
"\n",
"docker_config_mongos='''FROM dev24/mongodb:latest\n",
"\n",
"EXPOSE 27017\n",
"ENTRYPOINT [\"usr/bin/mongos\"]'''\n",
"\n",
"!mkdir docker_mongodb_cluster\n",
"\n",
"!mkdir docker_mongodb_cluster/mongod\n",
"with open('docker_mongodb_cluster/mongod/Dockerfile','w') as f:\n",
" f.write(docker_config_mongod)\n",
" \n",
"!mkdir docker_mongodb_cluster/mongos\n",
"with open('docker_mongodb_cluster/mongos/Dockerfile','w') as f:\n",
" f.write(docker_config_mongos)\n",
"\n",
" \n",
"#Create the docker images\n",
"%cd docker_mongodb_cluster\n",
"!sudo docker.io build -t dev24/mongodb mongod\n",
"!sudo docker.io build -t dev24/mongos mongos\n",
"%cd /vagrant/notebooks"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import docker"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Connect to docker\n",
"c = docker.Client(base_url='unix://var/run/docker.sock',\n",
" version='1.10',\n",
" timeout=10)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We can run a mongo database server in a container using the following command:\n",
"#\n",
"##sudo docker run \\\n",
"## -P -name rs1_srv1 \\\n",
"## -d dev24/mongodb \\\n",
"## --replSet rs1 \\\n",
"## --noprealloc --smallfiles\n",
"#\n",
"#(The -P, -name and -d flags are docker flags; the rest are passed through to the mongodb server.)\n",
"#That is, the following command is run to start each mongo server using a configuration to keep it small for now...:\n",
"# usr/bin/mongod --replSet REPLICA_SET_NAME --noprealloc --smallfiles\n",
"#Add a -v flag to command to specify verbose stdio logging (increase the number of v's for more... eg -vv or -vvvv)\n",
"\n",
"def createReplicaSetNode(c,stub,num=0):\n",
" ''' Create and run a single mongo database server as a node in the named replica set '''\n",
" name='{stub}_srv{num}'.format(stub=stub,num=num)\n",
" command='--replSet {stub} --noprealloc --smallfiles'.format(stub=stub)\n",
" c.create_container('dev24/mongodb',name=name,command=command)\n",
" c.start(name,publish_all_ports=True)\n",
" return name"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 229
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"createReplicaSetNode(c,'test')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 105,
"text": [
"'test_srv0'"
]
}
],
"prompt_number": 105
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Some helper functions via https://github.com/docker/docker-py\n",
"\n",
"#Equivalent of docker ps\n",
"def docker_ps(c):\n",
" return c.containers(quiet=False, all=False, trunc=True, latest=False, since=None, before=None, limit=-1)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 106
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"docker_ps(c)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 107,
"text": [
"[{'Id': 'be65be35905b694f116a1e0f7e53a4f161a82d2c4ed08d8832b96dcb8c3fb628',\n",
" 'Ports': [{'PublicPort': 49206,\n",
" 'PrivatePort': 27017,\n",
" 'Type': 'tcp',\n",
" 'IP': '0.0.0.0'}],\n",
" 'Status': 'Up 8 seconds',\n",
" 'Image': 'dev24/mongodb:latest',\n",
" 'Command': 'usr/bin/mongod --replSet test --noprealloc --smallfiles',\n",
" 'Created': 1413577005,\n",
" 'Names': ['/test_srv0']}]"
]
}
],
"prompt_number": 107
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are three main ways of addressing each of the containers and the mongo server running within them.\n",
"\n",
"Firstly, we can call the container by name. In the naming scheme I'm going to use, this will typically take the form *rsN_srvM*, where *rsN* is the name of the replica set and *srvM* the server within the set.\n",
"\n",
"Secondly, we can look up the local port that is mapped to the mongo database server running within a particular container and then connect to that port number."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!docker.io ps | head -n 3"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\r\n",
"01cf0f7845f4 dev24/mongodb:latest usr/bin/mongod --rep About a minute ago Up About a minute 0.0.0.0:49215->27017/tcp rs4_srv2 \r\n",
"8b436ad8d083 dev24/mongodb:latest usr/bin/mongod --rep About a minute ago Up About a minute 0.0.0.0:49214->27017/tcp rs4_srv1 \r\n"
]
}
],
"prompt_number": 189
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Find the local port bound for 27017/tcp for each server in the replica set\n",
"def get27017tcp_port(c,container):\n",
" cConfig = c.inspect_container(container)\n",
" return int(cConfig['NetworkSettings']['Ports']['27017/tcp'][0]['HostPort'])\n",
"\n",
"def get27017tcp_ports(c,containers):\n",
" ports={}\n",
" for container in containers:\n",
" ports[container]= get27017tcp_port(c,container)\n",
" return ports"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 109
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"get27017tcp_ports(c,['test_srv0'])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 111,
"text": [
"{'test_srv0': 49206}"
]
}
],
"prompt_number": 111
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Thirdly, we can connect to the default port 27017 on the IP address associated with the container:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def getContainIPaddress(c,container):\n",
" cConfig = c.inspect_container(container)\n",
" return cConfig['NetworkSettings']['IPAddress']\n",
"\n",
"def getContainIPaddresses(c,containers):\n",
" ipaddresses={}\n",
" for container in containers:\n",
" ipaddresses[container]= getContainIPaddress(c,container)\n",
" return ipaddresses"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"getContainIPaddress(c,'test_srv0')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 112,
"text": [
"'172.17.0.2'"
]
}
],
"prompt_number": 112
},
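{
"cell_type": "markdown",
"metadata": {},
"source": [
"All three addressing routes can be read off a single container inspection result. The sketch below is illustrative only: `sample_inspect` is a hand-written dictionary assumed to have the same shape as the `inspect_container` results used above, and `mongo_endpoints` is a hypothetical helper, not part of docker-py.\n",
"\n",
"```python\n",
"# Hand-written stand-in for an inspect_container() result (an assumption)\n",
"sample_inspect = {\n",
"    'Name': '/test_srv0',\n",
"    'NetworkSettings': {\n",
"        'IPAddress': '172.17.0.2',\n",
"        'Ports': {'27017/tcp': [{'HostIp': '0.0.0.0', 'HostPort': '49206'}]},\n",
"    },\n",
"}\n",
"\n",
"def mongo_endpoints(info, private_port='27017/tcp'):\n",
"    ''' Return the three ways of reaching the mongod in one container '''\n",
"    net = info['NetworkSettings']\n",
"    # Port on the host that docker mapped to the container's private port\n",
"    host_port = int(net['Ports'][private_port][0]['HostPort'])\n",
"    return {\n",
"        'name': info['Name'].strip('/'),\n",
"        'via_host': ('localhost', host_port),\n",
"        'via_container': (net['IPAddress'], int(private_port.split('/')[0])),\n",
"    }\n",
"\n",
"print(mongo_endpoints(sample_inspect))\n",
"```"
]
},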
{
"cell_type": "code",
"collapsed": false,
"input": [
"def showContainers(c):\n",
" for xc in c.containers(quiet=False, all=False, trunc=True, latest=False, since=None,\n",
" before=None, limit=-1):\n",
" print(xc['Names'],xc['Status'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 92
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"showContainers(c)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 204
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Helper routines for shutting down and removing containers\n",
"def tidyAwayContainer(c,container):\n",
" container=container.strip('/')\n",
" c.stop(container)\n",
" c.remove_container(container)\n",
" \n",
"def tidyAwayContainers(c,containers):\n",
" for container in containers:\n",
" tidyAwayContainer(c,container)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tidyAwayContainer(c,'test_srv0')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 114
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a real replica set we want multiple servers running. Let's create a function to create several nodes in the same replica set."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Let's create a function that will create several nodes\n",
"def createReplicaSetNodes(c,stub,numNodes):\n",
" ''' Create and run a specified number of mongo database servers as a replica set '''\n",
" names=[]\n",
" for i in range(0,numNodes):\n",
" name=createReplicaSetNode(c,stub,i)\n",
" names.append(name)\n",
" return names"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 117
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can initialise the replica set with a particular configuration defined using an object that has the following form:\n",
"\n",
"```\n",
"rs_conf={\"_id\" : \"rs1\",\n",
" \"members\" : [{\"_id\" : 1, \"host\" : \"localhost:10001\"},\n",
" {\"_id\" : 2, \"host\" : \"localhost:10002\"},\n",
" {\"_id\" : 3, \"host\" : \"localhost:10003\"} ]}\n",
"```"
]
},
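{
"cell_type": "markdown",
"metadata": {},
"source": [
"A configuration object of this shape can be generated from a list of host strings. A minimal sketch, in which `build_rs_config` is a hypothetical helper name and the hosts are the placeholder values from the example above:\n",
"\n",
"```python\n",
"def build_rs_config(rsid, hosts, first_id=1):\n",
"    ''' Build a replica set configuration from a list of host:port strings '''\n",
"    # Number the members consecutively, starting from first_id\n",
"    members = [{'_id': i, 'host': host} for i, host in enumerate(hosts, first_id)]\n",
"    return {'_id': rsid, 'members': members}\n",
"\n",
"hosts = ['localhost:{0}'.format(port) for port in (10001, 10002, 10003)]\n",
"rs_conf = build_rs_config('rs1', hosts)\n",
"print(rs_conf)\n",
"```"
]
},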
{
"cell_type": "code",
"collapsed": false,
"input": [
"def rs_config(c,rsid,num=3):\n",
" ''' Create a replica set of nodes and then define a configuration object for that replica set '''\n",
" createReplicaSetNodes(c,rsid,num)\n",
" _rs_config={\"_id\" : rsid, 'members':[] }\n",
" #This is scrappy - should really return something better from the creation\n",
" for i in range(0,num):\n",
" name='{stub}_srv{num}'.format(stub=rsid,num=i)\n",
" #c.inspect_container(name)\n",
" #get IP and port\n",
" _rs_config['members'].append({\"_id\":i,\"host\":'{0}:{1}'.format(getContainIPaddress(c,name),27017)})\n",
" return _rs_config"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 115
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tidyAwayContainers(c,['rs4_srv0','rs4_srv1','rs4_srv2'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 230
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"rsc=rs_config(c,'rs4')\n",
"rsc"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 235,
"text": [
"{'_id': 'rs4',\n",
" 'members': [{'_id': 0, 'host': '172.17.0.2:27017'},\n",
" {'_id': 1, 'host': '172.17.0.3:27017'},\n",
" {'_id': 2, 'host': '172.17.0.4:27017'}]}"
]
}
],
"prompt_number": 235
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Initialise the replica set\n",
"from pymongo import MongoClient\n",
"\n",
"#We'll use the 0th server in the set as the node we connect to\n",
"mc = MongoClient('localhost', get27017tcp_port(c,'rs4_srv0'))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 236
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#In the mongo console, we would typically use the command rs.initiate() to initialise the replica set\n",
"#Here, we use the replSetInitiate admin command, applying it with the desired configuration\n",
"mc.admin.command( \"replSetInitiate\",rsc);"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 237
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We may need to wait a minute or two for the configuration to come up\n",
"#If you get an error message that suggests the configuration isn't up yet, wait a few seconds then rerun the cell\n",
"mc.admin.command('replSetGetStatus')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 240,
"text": [
"{'set': 'rs4',\n",
" 'myState': 1,\n",
" 'members': [{'health': 1.0,\n",
" 'stateStr': 'PRIMARY',\n",
" 'optime': Timestamp(1413580563, 1),\n",
" 'self': True,\n",
" 'uptime': 57,\n",
" '_id': 0,\n",
" 'optimeDate': datetime.datetime(2014, 10, 17, 21, 16, 3),\n",
" 'name': '172.17.0.2:27017',\n",
" 'state': 1},\n",
" {'stateStr': 'SECONDARY',\n",
" 'pingMs': 0,\n",
" '_id': 1,\n",
" 'health': 1.0,\n",
" 'name': '172.17.0.3:27017',\n",
" 'syncingTo': '172.17.0.2:27017',\n",
" 'uptime': 37,\n",
" 'optime': Timestamp(1413580563, 1),\n",
" 'lastHeartbeat': datetime.datetime(2014, 10, 17, 21, 16, 43),\n",
" 'optimeDate': datetime.datetime(2014, 10, 17, 21, 16, 3),\n",
" 'state': 2,\n",
" 'lastHeartbeatRecv': datetime.datetime(2014, 10, 17, 21, 16, 43)},\n",
" {'stateStr': 'SECONDARY',\n",
" 'pingMs': 0,\n",
" '_id': 2,\n",
" 'health': 1.0,\n",
" 'name': '172.17.0.4:27017',\n",
" 'syncingTo': '172.17.0.2:27017',\n",
" 'uptime': 37,\n",
" 'optime': Timestamp(1413580563, 1),\n",
" 'lastHeartbeat': datetime.datetime(2014, 10, 17, 21, 16, 43),\n",
" 'optimeDate': datetime.datetime(2014, 10, 17, 21, 16, 3),\n",
" 'state': 2,\n",
" 'lastHeartbeatRecv': datetime.datetime(2014, 10, 17, 21, 16, 43)}],\n",
" 'date': datetime.datetime(2014, 10, 17, 21, 16, 44),\n",
" 'ok': 1.0}"
]
}
],
"prompt_number": 240
},
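{
"cell_type": "markdown",
"metadata": {},
"source": [
"The status document can be condensed into a quick health check. The following sketch assumes a status dictionary shaped like the output above; `sample_status` is a trimmed, hand-copied subset of it, and `summarise_status` is a hypothetical helper.\n",
"\n",
"```python\n",
"# Trimmed, hand-copied subset of a replSetGetStatus result (an assumption)\n",
"sample_status = {\n",
"    'set': 'rs4',\n",
"    'members': [\n",
"        {'_id': 0, 'name': '172.17.0.2:27017', 'stateStr': 'PRIMARY', 'health': 1.0},\n",
"        {'_id': 1, 'name': '172.17.0.3:27017', 'stateStr': 'SECONDARY', 'health': 1.0},\n",
"        {'_id': 2, 'name': '172.17.0.4:27017', 'stateStr': 'SECONDARY', 'health': 1.0},\n",
"    ],\n",
"}\n",
"\n",
"def summarise_status(status):\n",
"    ''' Map each member name to its state and pick out the primary, if any '''\n",
"    states = {m['name']: m['stateStr'] for m in status['members']}\n",
"    primaries = [name for name, state in states.items() if state == 'PRIMARY']\n",
"    return states, (primaries[0] if primaries else None)\n",
"\n",
"states, primary = summarise_status(sample_status)\n",
"print(primary, states)\n",
"```\n",
"\n",
"With a live connection, the same function could be applied to the result of `mc.admin.command('replSetGetStatus')`."
]
},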
{
"cell_type": "code",
"collapsed": false,
"input": [
"from pymongo import MongoReplicaSetClient\n",
"\n",
"testclient= MongoReplicaSetClient('{0}:{1}'.format(getContainIPaddress(c,'rs4_srv0'),27017), replicaSet='rs4')\n",
"testdb=testclient.testdb\n",
"testcollection=testdb.testcollection"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 241
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"testcollection.insert({'name':'test1'})"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 242,
"text": [
"ObjectId('5441874630e3dd0a4a4ee018')"
]
}
],
"prompt_number": 242
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for x in range(0,10):\n",
" testcollection.insert({'name':'test'+str(x)})"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 266
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for ff in testcollection.find():\n",
" print(ff)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"{'_id': ObjectId('5441874630e3dd0a4a4ee018'), 'name': 'test1'}\n",
"{'_id': ObjectId('5441874830e3dd0a4a4ee019'), 'name': 'test0'}\n",
"{'_id': ObjectId('5441874830e3dd0a4a4ee01a'), 'name': 'test1'}\n",
"{'_id': ObjectId('5441874830e3dd0a4a4ee01b'), 'name': 'test2'}\n",
"{'_id': ObjectId('5441874830e3dd0a4a4ee01c'), 'name': 'test3'}\n",
"{'_id': ObjectId('5441874830e3dd0a4a4ee01d'), 'name': 'test4'}\n",
"{'_id': ObjectId('5441874830e3dd0a4a4ee01e'), 'name': 'test5'}\n",
"{'_id': ObjectId('5441874830e3dd0a4a4ee01f'), 'name': 'test6'}\n",
"{'_id': ObjectId('5441874830e3dd0a4a4ee020'), 'name': 'test7'}\n",
"{'_id': ObjectId('5441874830e3dd0a4a4ee021'), 'name': 'test8'}\n",
"{'_id': ObjectId('5441874830e3dd0a4a4ee022'), 'name': 'test9'}\n"
]
}
],
"prompt_number": 244
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can model network failures by setting up firewall rules to block messages being passed between particular containers."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"##https://github.com/dcm-oss/blockade/\n",
"\n",
"#\n",
"# Copyright (C) 2014 Dell, Inc.\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# http://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License.\n",
"#\n",
"\n",
"import random\n",
"import string\n",
"import subprocess\n",
"\n",
"import collections\n",
"\n",
"\n",
"#---errors.py\n",
"class BlockadeError(Exception):\n",
" \"\"\"Expected error within Blockade\n",
" \"\"\"\n",
"\n",
"\n",
"class BlockadeConfigError(BlockadeError):\n",
" \"\"\"Error in configuration\n",
" \"\"\"\n",
"\n",
"\n",
"class AlreadyInitializedError(BlockadeError):\n",
" \"\"\"Blockade already created in this context\n",
" \"\"\"\n",
"\n",
"\n",
"class NotInitializedError(BlockadeError):\n",
" \"\"\"Blockade not created in this context\n",
" \"\"\"\n",
"\n",
"\n",
"class InconsistentStateError(BlockadeError):\n",
" \"\"\"Blockade state is inconsistent (partially created or destroyed)\n",
" \"\"\"\n",
" \n",
"#---\n",
"\n",
"\n",
"\n",
"def parse_partition_index(blockade_id, chain):\n",
" prefix = \"%s-p\" % (blockade_id,)\n",
" if chain and chain.startswith(prefix):\n",
" try:\n",
" return int(chain[len(prefix):])\n",
" except ValueError:\n",
" pass\n",
" raise ValueError(\"chain %s is not a blockade partition\" % (chain,))\n",
"\n",
"\n",
"def partition_chain_name(blockade_id, partition_index):\n",
" return \"%s-p%s\" % (blockade_id, partition_index)\n",
"\n",
"\n",
"def iptables_call_output(*args):\n",
" cmd = [\"iptables\", \"-n\"] + list(args)\n",
" try:\n",
" output = subprocess.check_output(cmd)\n",
" return output.decode().split(\"\\n\")\n",
" except subprocess.CalledProcessError:\n",
" raise BlockadeError(\"Problem calling '%s'\" % \" \".join(cmd))\n",
"\n",
"\n",
"def iptables_call(*args):\n",
" cmd = [\"iptables\"] + list(args)\n",
" try:\n",
" subprocess.check_call(cmd)\n",
" except subprocess.CalledProcessError:\n",
" raise BlockadeError(\"Problem calling '%s'\" % \" \".join(cmd))\n",
"\n",
"\n",
"def iptables_get_chain_rules(chain):\n",
" if not chain:\n",
" raise ValueError(\"invalid chain\")\n",
" lines = iptables_call_output(\"-L\", chain)\n",
" if len(lines) < 2:\n",
" raise BlockadeError(\"Can't understand iptables output: \\n%s\" %\n",
" \"\\n\".join(lines))\n",
"\n",
" chain_line, header_line = lines[:2]\n",
" if not (chain_line.startswith(\"Chain \" + chain) and\n",
" header_line.startswith(\"target\")):\n",
" raise BlockadeError(\"Can't understand iptables output: \\n%s\" %\n",
" \"\\n\".join(lines))\n",
" return lines[2:]\n",
"\n",
"\n",
"def iptables_get_source_chains(blockade_id):\n",
" \"\"\"Get a map of blockade chains IDs -> list of IPs targeted at them\n",
"\n",
" For figuring out which container is in which partition\n",
" \"\"\"\n",
" result = {}\n",
" if not blockade_id:\n",
" raise ValueError(\"invalid blockade_id\")\n",
" lines = iptables_get_chain_rules(\"FORWARD\")\n",
"\n",
" for line in lines:\n",
" parts = line.split()\n",
" if len(parts) < 4:\n",
" continue\n",
" try:\n",
" partition_index = parse_partition_index(blockade_id, parts[0])\n",
" except ValueError:\n",
" continue # not a rule targeting a blockade chain\n",
"\n",
" source = parts[3]\n",
" if source:\n",
" result[source] = partition_index\n",
" return result\n",
"\n",
"\n",
"def iptables_delete_rules(chain, predicate):\n",
" if not chain:\n",
" raise ValueError(\"invalid chain\")\n",
" if not callable(predicate):\n",
" raise ValueError(\"invalid predicate\")\n",
"\n",
" lines = iptables_get_chain_rules(chain)\n",
"\n",
" # TODO this is susceptible to check-then-act races.\n",
" # better to ultimately switch to python-iptables if it becomes less buggy\n",
" for index, line in reversed(list(enumerate(lines, 1))):\n",
" line = line.strip()\n",
" if line and predicate(line):\n",
" iptables_call(\"-D\", chain, str(index))\n",
"\n",
"\n",
"def iptables_delete_blockade_rules(blockade_id):\n",
" def predicate(rule):\n",
" target = rule.split()[0]\n",
" try:\n",
" parse_partition_index(blockade_id, target)\n",
" except ValueError:\n",
" return False\n",
" return True\n",
" iptables_delete_rules(\"FORWARD\", predicate)\n",
"\n",
"\n",
"def iptables_delete_blockade_chains(blockade_id):\n",
" if not blockade_id:\n",
" raise ValueError(\"invalid blockade_id\")\n",
"\n",
" lines = iptables_call_output(\"-L\")\n",
" for line in lines:\n",
" parts = line.split()\n",
" if len(parts) >= 2 and parts[0] == \"Chain\":\n",
" chain = parts[1]\n",
" try:\n",
" parse_partition_index(blockade_id, chain)\n",
" except ValueError:\n",
" continue\n",
" # if we are a valid blockade chain, flush and delete\n",
" iptables_call(\"-F\", chain)\n",
" iptables_call(\"-X\", chain)\n",
"\n",
"\n",
"def iptables_insert_rule(chain, src=None, dest=None, target=None):\n",
" \"\"\"Insert a new rule in the chain\n",
" \"\"\"\n",
" if not chain:\n",
" raise ValueError(\"Invalid chain\")\n",
" if not target:\n",
" raise ValueError(\"Invalid target\")\n",
" if not (src or dest):\n",
" raise ValueError(\"Need src, dest, or both\")\n",
"\n",
" args = [\"-I\", chain]\n",
" if src:\n",
" args += [\"-s\", src]\n",
" if dest:\n",
" args += [\"-d\", dest]\n",
" args += [\"-j\", target]\n",
" iptables_call(*args)\n",
"\n",
"\n",
"def iptables_create_chain(chain):\n",
" \"\"\"Create a new chain\n",
" \"\"\"\n",
" if not chain:\n",
" raise ValueError(\"Invalid chain\")\n",
" iptables_call(\"-N\", chain)\n",
"\n",
"\n",
"def clear_iptables(blockade_id):\n",
" \"\"\"Remove all iptables rules and chains related to this blockade\n",
" \"\"\"\n",
" # first remove references to our custom chains\n",
" iptables_delete_blockade_rules(blockade_id)\n",
"\n",
" # then remove the chains themselves\n",
" iptables_delete_blockade_chains(blockade_id)\n",
"\n",
"\n",
"def partition_containers(blockade_id, partitions):\n",
" if not partitions or len(partitions) == 1:\n",
" return\n",
" for index, partition in enumerate(partitions, 1):\n",
" chain_name = partition_chain_name(blockade_id, index)\n",
"\n",
" # create chain for partition and block traffic TO any other partition\n",
" iptables_create_chain(chain_name)\n",
" for other in partitions:\n",
" if partition is other:\n",
" continue\n",
" for container in other:\n",
" if container.ip_address:\n",
" iptables_insert_rule(chain_name, dest=container.ip_address,\n",
" target=\"DROP\")\n",
"\n",
" # direct traffic FROM any container in the partition to the new chain\n",
" for container in partition:\n",
" iptables_insert_rule(\"FORWARD\", src=container.ip_address,\n",
" target=chain_name)\n",
"\n",
"\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 222
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"class netobj():\n",
" def __init__(self, ip_address):\n",
" self.ip_address = ip_address"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 223
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We assign the IP addresses of the containers into different partitions."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#In this case, let's be cruel and put the primary in a partition on its own\n",
"partition_containers('test1w2s2', [ [netobj('172.17.0.2')],[netobj('172.17.0.3'),netobj('172.17.0.4')]])\n",
"#Wait a bit before generating the log..."
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 267
},
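{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see which firewall rules a partition produces without touching iptables, the calls can be listed rather than executed. This is a dry-run sketch that mirrors the loop structure of `partition_containers` above; `plan_partition` is a hypothetical helper, it takes plain IP address strings rather than `netobj` instances, and it only returns argument lists.\n",
"\n",
"```python\n",
"def plan_partition(blockade_id, partitions):\n",
"    ''' List the iptables argument lists needed to isolate each partition '''\n",
"    if not partitions or len(partitions) == 1:\n",
"        return []\n",
"    plan = []\n",
"    for index, partition in enumerate(partitions, 1):\n",
"        chain = '{0}-p{1}'.format(blockade_id, index)\n",
"        # New chain that drops traffic TO every container in any other partition\n",
"        plan.append(['-N', chain])\n",
"        for other_index, other in enumerate(partitions, 1):\n",
"            if other_index == index:\n",
"                continue\n",
"            for ip in other:\n",
"                plan.append(['-I', chain, '-d', ip, '-j', 'DROP'])\n",
"        # Send traffic FROM each container in this partition through the chain\n",
"        for ip in partition:\n",
"            plan.append(['-I', 'FORWARD', '-s', ip, '-j', chain])\n",
"    return plan\n",
"\n",
"for args in plan_partition('test1w2s2', [['172.17.0.2'], ['172.17.0.3', '172.17.0.4']]):\n",
"    print('iptables ' + ' '.join(args))\n",
"```"
]
},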
{
"cell_type": "code",
"collapsed": false,
"input": [
"#In an ssh shell, we can use the following sort of command to look at a real time stream of stdio log messages from the container\n",
"#!docker.io logs --follow=true rs4_srv1\n",
"#testcollection.insert({'name':'test3'})\n",
"!docker.io logs rs4_srv0 > rs4_srv0_log.txt\n",
"!docker.io logs rs4_srv1 > rs4_srv1_log.txt\n",
"!docker.io logs rs4_srv2 > rs4_srv2_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 249
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail -n 30 rs4_srv0_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Fri Oct 17 21:16:54.782 [FileAllocator] done allocating datafile /data/db/testdb.ns, size: 16MB, took 0.029 secs\r\n",
"Fri Oct 17 21:16:54.801 [FileAllocator] allocating new datafile /data/db/testdb.0, filling with zeroes...\r\n",
"Fri Oct 17 21:16:54.836 [FileAllocator] done allocating datafile /data/db/testdb.0, size: 16MB, took 0.034 secs\r\n",
"Fri Oct 17 21:16:54.837 [conn16] build index testdb.testcollection { _id: 1 }\r\n",
"Fri Oct 17 21:16:54.837 [conn16] build index done. scanned 0 total records. 0 secs\r\n",
"Fri Oct 17 21:17:23.781 [conn17] end connection 172.17.0.3:44992 (8 connections now open)\r\n",
"Fri Oct 17 21:17:23.782 [initandlisten] connection accepted from 172.17.0.3:45399 #19 (9 connections now open)\r\n",
"Fri Oct 17 21:17:23.808 [conn18] end connection 172.17.0.4:56665 (8 connections now open)\r\n",
"Fri Oct 17 21:17:23.808 [initandlisten] connection accepted from 172.17.0.4:57073 #20 (9 connections now open)\r\n",
"Fri Oct 17 21:17:53.799 [conn19] end connection 172.17.0.3:45399 (8 connections now open)\r\n",
"Fri Oct 17 21:17:53.800 [initandlisten] connection accepted from 172.17.0.3:45802 #21 (9 connections now open)\r\n",
"Fri Oct 17 21:17:53.828 [conn20] end connection 172.17.0.4:57073 (8 connections now open)\r\n",
"Fri Oct 17 21:17:53.829 [initandlisten] connection accepted from 172.17.0.4:57476 #22 (9 connections now open)\r\n",
"Fri Oct 17 21:18:05.574 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Fri Oct 17 21:18:05.574 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Fri Oct 17 21:18:05.575 [rsHealthPoll] replSet info 172.17.0.3:27017 is down (or slow to respond): \r\n",
"Fri Oct 17 21:18:05.575 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state DOWN\r\n",
"Fri Oct 17 21:18:05.575 [rsHealthPoll] replSet info 172.17.0.4:27017 is down (or slow to respond): \r\n",
"Fri Oct 17 21:18:05.575 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state DOWN\r\n",
"Fri Oct 17 21:18:05.575 [rsMgr] can't see a majority of the set, relinquishing primary\r\n",
"Fri Oct 17 21:18:05.575 [rsMgr] replSet relinquishing primary state\r\n",
"Fri Oct 17 21:18:05.575 [rsMgr] replSet SECONDARY\r\n",
"Fri Oct 17 21:18:05.575 [rsMgr] replSet closing client sockets after relinquishing primary\r\n",
"Fri Oct 17 21:18:05.576 [conn15] end connection 172.17.0.4:56332 (8 connections now open)\r\n",
"Fri Oct 17 21:18:05.576 [conn13] end connection 172.17.0.4:56324 (7 connections now open)\r\n",
"Fri Oct 17 21:18:05.576 [conn12] end connection 172.17.0.3:44651 (6 connections now open)\r\n",
"Fri Oct 17 21:18:05.576 [conn1] end connection 172.17.42.1:55245 (5 connections now open)\r\n",
"Fri Oct 17 21:18:05.576 [conn16] end connection 172.17.42.1:56065 (4 connections now open)\r\n",
"Fri Oct 17 21:18:05.577 [conn14] end connection 172.17.0.3:44657 (3 connections now open)\r\n",
"Fri Oct 17 21:18:05.577 [conn2] end connection 172.17.42.1:55325 (2 connections now open)\r\n"
]
}
],
"prompt_number": 250
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail rs4_srv1_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Fri Oct 17 21:17:07.813 [conn8] end connection 172.17.0.4:60922 (3 connections now open)\r\n",
"Fri Oct 17 21:17:07.813 [initandlisten] connection accepted from 172.17.0.4:33102 #11 (5 connections now open)\r\n",
"Fri Oct 17 21:17:21.542 [conn9] end connection 172.17.0.2:54062 (3 connections now open)\r\n",
"Fri Oct 17 21:17:21.542 [initandlisten] connection accepted from 172.17.0.2:54471 #12 (5 connections now open)\r\n",
"Fri Oct 17 21:17:37.833 [conn11] end connection 172.17.0.4:33102 (3 connections now open)\r\n",
"Fri Oct 17 21:17:37.834 [initandlisten] connection accepted from 172.17.0.4:33504 #13 (5 connections now open)\r\n",
"Fri Oct 17 21:17:51.568 [conn12] end connection 172.17.0.2:54471 (3 connections now open)\r\n",
"Fri Oct 17 21:17:51.570 [initandlisten] connection accepted from 172.17.0.2:54876 #14 (4 connections now open)\r\n",
"Fri Oct 17 21:18:05.797 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Fri Oct 17 21:18:05.798 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n"
]
}
],
"prompt_number": 251
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail rs4_srv2_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Fri Oct 17 21:17:05.532 [conn7] end connection 172.17.0.2:49448 (3 connections now open)\r\n",
"Fri Oct 17 21:17:05.533 [initandlisten] connection accepted from 172.17.0.2:49858 #10 (4 connections now open)\r\n",
"Fri Oct 17 21:17:07.771 [conn8] end connection 172.17.0.3:51615 (3 connections now open)\r\n",
"Fri Oct 17 21:17:07.772 [initandlisten] connection accepted from 172.17.0.3:52028 #11 (4 connections now open)\r\n",
"Fri Oct 17 21:17:35.551 [conn10] end connection 172.17.0.2:49858 (3 connections now open)\r\n",
"Fri Oct 17 21:17:35.551 [initandlisten] connection accepted from 172.17.0.2:50261 #12 (5 connections now open)\r\n",
"Fri Oct 17 21:17:37.789 [conn11] end connection 172.17.0.3:52028 (3 connections now open)\r\n",
"Fri Oct 17 21:17:37.790 [initandlisten] connection accepted from 172.17.0.3:52430 #13 (4 connections now open)\r\n",
"Fri Oct 17 21:18:05.829 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Fri Oct 17 21:18:05.829 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n"
]
}
],
"prompt_number": 252
},
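{
"cell_type": "markdown",
"metadata": {},
"source": [
"The log messages follow from simple majority arithmetic: a member can only be, or remain, primary while it can see a strict majority of the voting members, itself included. A minimal sketch, assuming all three members have one vote each; `can_see_majority` is a hypothetical helper, not a MongoDB or pymongo function.\n",
"\n",
"```python\n",
"def can_see_majority(visible_members, total_members):\n",
"    ''' True if the visible members (including self) form a strict majority '''\n",
"    return visible_members > total_members // 2\n",
"\n",
"# The two sides of the partition created above\n",
"partitions = {'primary alone': ['172.17.0.2'],\n",
"              'two secondaries': ['172.17.0.3', '172.17.0.4']}\n",
"total = sum(len(members) for members in partitions.values())\n",
"for label, members in partitions.items():\n",
"    print(label, can_see_majority(len(members), total))\n",
"```\n",
"\n",
"The isolated primary sees only itself (1 of 3) and so relinquishes the primary role, while the other side of the partition (2 of 3) still holds a majority."
]
},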
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!sudo iptables -L\n",
"#Clear the network problems...\n",
"clear_iptables('test1w2s2')\n",
"#Wait a bit before generating the log..."
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 265
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#In an ssh shell, we can use the following sort of command to look at a real time stream of stdio log messages from the container\n",
"#!docker.io logs --follow=true rs4_srv1\n",
"#testcollection.insert({'name':'test3'})\n",
"!docker.io logs rs4_srv0 > rs4_srv0_log.txt\n",
"!docker.io logs rs4_srv1 > rs4_srv1_log.txt\n",
"!docker.io logs rs4_srv2 > rs4_srv2_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 254
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail -n 30 rs4_srv0_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Fri Oct 17 21:18:05.577 [conn14] end connection 172.17.0.3:44657 (3 connections now open)\r\n",
"Fri Oct 17 21:18:05.577 [conn2] end connection 172.17.42.1:55325 (2 connections now open)\r\n",
"Fri Oct 17 21:18:13.580 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:18:13.581 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:18:17.579 [rsMgr] replSet can't see a majority, will not try to elect self\r\n",
"Fri Oct 17 21:18:22.307 [initandlisten] connection accepted from 172.17.42.1:57290 #23 (3 connections now open)\r\n",
"Fri Oct 17 21:18:25.579 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:18:25.579 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:18:37.581 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:18:37.581 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:18:49.584 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:18:49.584 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:19:01.585 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:19:01.590 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Fri Oct 17 21:19:07.592 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Fri Oct 17 21:19:08.834 [initandlisten] connection accepted from 172.17.0.4:58525 #24 (4 connections now open)\r\n",
"Fri Oct 17 21:19:09.802 [initandlisten] connection accepted from 172.17.0.3:46852 #25 (5 connections now open)\r\n",
"Fri Oct 17 21:19:10.592 [rsHealthPoll] replSet member 172.17.0.3:27017 is up\r\n",
"Fri Oct 17 21:19:10.592 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state SECONDARY\r\n",
"Fri Oct 17 21:19:10.592 [rsMgr] not electing self, 172.17.0.3:27017 would veto with '172.17.0.2:27017 is trying to elect itself but 172.17.0.4:27017 is already primary and more up-to-date'\r\n",
"Fri Oct 17 21:19:10.594 [rsHealthPoll] replSet member 172.17.0.4:27017 is up\r\n",
"Fri Oct 17 21:19:10.594 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state PRIMARY\r\n",
"Fri Oct 17 21:19:11.592 [rsBackgroundSync] replSet syncing to: 172.17.0.4:27017\r\n",
"Fri Oct 17 21:19:12.542 [rsSyncNotifier] replset setting oplog notifier to 172.17.0.4:27017\r\n",
"Fri Oct 17 21:19:12.546 [rsSyncNotifier] build index local.me { _id: 1 }\r\n",
"Fri Oct 17 21:19:12.548 [rsSyncNotifier] build index done. scanned 0 total records. 0.001 secs\r\n",
"Fri Oct 17 21:19:24.850 [conn24] end connection 172.17.0.4:58525 (4 connections now open)\r\n",
"Fri Oct 17 21:19:24.850 [initandlisten] connection accepted from 172.17.0.4:58739 #26 (5 connections now open)\r\n",
"Fri Oct 17 21:19:25.812 [conn25] end connection 172.17.0.3:46852 (4 connections now open)\r\n",
"Fri Oct 17 21:19:25.814 [initandlisten] connection accepted from 172.17.0.3:47081 #27 (5 connections now open)\r\n"
]
}
],
"prompt_number": 256
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail rs4_srv1_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Fri Oct 17 21:18:50.800 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:19:02.801 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:19:07.897 [conn16] end connection 172.17.0.4:34328 (3 connections now open)\r\n",
"Fri Oct 17 21:19:07.898 [initandlisten] connection accepted from 172.17.0.4:34753 #17 (5 connections now open)\r\n",
"Fri Oct 17 21:19:09.803 [rsHealthPoll] replset info 172.17.0.2:27017 thinks that we are down\r\n",
"Fri Oct 17 21:19:09.803 [rsHealthPoll] replSet member 172.17.0.2:27017 is up\r\n",
"Fri Oct 17 21:19:09.804 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state SECONDARY\r\n",
"Fri Oct 17 21:19:10.590 [initandlisten] connection accepted from 172.17.0.2:55935 #18 (5 connections now open)\r\n",
"Fri Oct 17 21:19:24.600 [conn18] end connection 172.17.0.2:55935 (4 connections now open)\r\n",
"Fri Oct 17 21:19:24.600 [initandlisten] connection accepted from 172.17.0.2:56166 #19 (5 connections now open)\r\n"
]
}
],
"prompt_number": 257
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail rs4_srv2_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Fri Oct 17 21:19:08.835 [rsHealthPoll] replset info 172.17.0.2:27017 thinks that we are down\r\n",
"Fri Oct 17 21:19:08.835 [rsHealthPoll] replSet member 172.17.0.2:27017 is up\r\n",
"Fri Oct 17 21:19:08.836 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state SECONDARY\r\n",
"Fri Oct 17 21:19:10.593 [initandlisten] connection accepted from 172.17.0.2:51538 #18 (6 connections now open)\r\n",
"Fri Oct 17 21:19:11.592 [initandlisten] connection accepted from 172.17.0.2:51596 #19 (7 connections now open)\r\n",
"Fri Oct 17 21:19:12.543 [initandlisten] connection accepted from 172.17.0.2:51610 #20 (8 connections now open)\r\n",
"Fri Oct 17 21:19:12.594 [conn18] end connection 172.17.0.2:51538 (7 connections now open)\r\n",
"Fri Oct 17 21:19:12.595 [initandlisten] connection accepted from 172.17.0.2:51612 #21 (8 connections now open)\r\n",
"Fri Oct 17 21:19:13.554 [slaveTracking] build index local.slaves { _id: 1 }\r\n",
"Fri Oct 17 21:19:13.556 [slaveTracking] build index done. scanned 0 total records. 0.002 secs\r\n"
]
}
],
"prompt_number": 258
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Create a new problem - this time we partition off a single server (172.17.0.4, which the logs above report as the current primary)\n",
"partition_containers('test1w2s2', [ [netobj('172.17.0.4')],[netobj('172.17.0.3'),netobj('172.17.0.2')]])\n",
"#Wait a bit before generating the log..."
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#In an ssh shell, we can use the following sort of command to watch a real-time stream of stdout/stderr log messages from the container\n",
"#!docker.io logs --follow=true rs4_srv1\n",
"#testcollection.insert({'name':'test3'})\n",
"!docker.io logs rs4_srv0 > rs4_srv0_log.txt\n",
"!docker.io logs rs4_srv1 > rs4_srv1_log.txt\n",
"!docker.io logs rs4_srv2 > rs4_srv2_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 273
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail -n 30 rs4_srv0_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Fri Oct 17 21:20:55.879 [initandlisten] connection accepted from 172.17.0.3:48294 #33 (6 connections now open)\r\n",
"Fri Oct 17 21:21:24.927 [conn32] end connection 172.17.0.4:59953 (4 connections now open)\r\n",
"Fri Oct 17 21:21:24.927 [initandlisten] connection accepted from 172.17.0.4:60360 #34 (6 connections now open)\r\n",
"Fri Oct 17 21:21:25.900 [conn33] end connection 172.17.0.3:48294 (4 connections now open)\r\n",
"Fri Oct 17 21:21:25.900 [initandlisten] connection accepted from 172.17.0.3:48702 #35 (6 connections now open)\r\n",
"Fri Oct 17 21:21:54.950 [conn34] end connection 172.17.0.4:60360 (4 connections now open)\r\n",
"Fri Oct 17 21:21:54.951 [initandlisten] connection accepted from 172.17.0.4:60777 #36 (5 connections now open)\r\n",
"Fri Oct 17 21:21:55.920 [initandlisten] connection accepted from 172.17.0.3:49119 #37 (6 connections now open)\r\n",
"Fri Oct 17 21:21:55.920 [conn35] end connection 172.17.0.3:48702 (5 connections now open)\r\n",
"Fri Oct 17 21:22:24.973 [conn36] end connection 172.17.0.4:60777 (4 connections now open)\r\n",
"Fri Oct 17 21:22:24.974 [initandlisten] connection accepted from 172.17.0.4:32959 #38 (5 connections now open)\r\n",
"Fri Oct 17 21:22:25.944 [conn37] end connection 172.17.0.3:49119 (4 connections now open)\r\n",
"Fri Oct 17 21:22:25.945 [initandlisten] connection accepted from 172.17.0.3:49533 #39 (6 connections now open)\r\n",
"Fri Oct 17 21:22:32.339 [rsSync] build index local.replset.minvalid { _id: 1 }\r\n",
"Fri Oct 17 21:22:32.342 [rsSync] build index done. scanned 0 total records. 0 secs\r\n",
"Fri Oct 17 21:22:54.725 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Fri Oct 17 21:22:54.725 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:22:54.754 [conn21] end connection 172.17.0.3:45802 (4 connections now open)\r\n",
"Fri Oct 17 21:22:55.726 [rsHealthPoll] replSet info 172.17.0.4:27017 is down (or slow to respond): \r\n",
"Fri Oct 17 21:22:55.726 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state DOWN\r\n",
"Fri Oct 17 21:22:55.727 [rsMgr] replSet info electSelf 0\r\n",
"Fri Oct 17 21:22:55.964 [conn39] end connection 172.17.0.3:49533 (3 connections now open)\r\n",
"Fri Oct 17 21:22:55.964 [initandlisten] connection accepted from 172.17.0.3:49942 #40 (4 connections now open)\r\n",
"Fri Oct 17 21:23:03.728 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:23:12.373 [rsBackgroundSync] Socket recv() timeout 172.17.0.4:27017\r\n",
"Fri Oct 17 21:23:12.374 [rsBackgroundSync] SocketException: remote: 172.17.0.4:27017 error: 9001 socket exception [RECV_TIMEOUT] server [172.17.0.4:27017] \r\n",
"Fri Oct 17 21:23:12.375 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: 172.17.0.4:27017\r\n",
"Fri Oct 17 21:23:12.375 [rsMgr] replSet PRIMARY\r\n",
"Fri Oct 17 21:23:14.376 [initandlisten] connection accepted from 172.17.0.3:50202 #41 (5 connections now open)\r\n",
"Fri Oct 17 21:23:15.727 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n"
]
}
],
"prompt_number": 274
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail -n 20 rs4_srv1_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Fri Oct 17 21:22:32.335 [rsSyncNotifier] caught exception (socket exception [SEND_ERROR] for 172.17.0.2:27017) in destructor (~PiggyBackData)\r\n",
"Fri Oct 17 21:22:53.574 [conn14] end connection 172.17.0.2:54876 (5 connections now open)\r\n",
"Fri Oct 17 21:22:53.953 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Fri Oct 17 21:22:53.954 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:22:54.731 [conn30] end connection 172.17.0.2:58618 (4 connections now open)\r\n",
"Fri Oct 17 21:22:54.731 [initandlisten] connection accepted from 172.17.0.2:59027 #32 (5 connections now open)\r\n",
"Fri Oct 17 21:22:54.954 [rsHealthPoll] replSet info 172.17.0.4:27017 is down (or slow to respond): \r\n",
"Fri Oct 17 21:22:54.954 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state DOWN\r\n",
"Fri Oct 17 21:22:54.955 [rsMgr] not electing self, 172.17.0.2:27017 would veto with '172.17.0.3:27017 is trying to elect itself but 172.17.0.4:27017 is already primary and more up-to-date'\r\n",
"Fri Oct 17 21:22:55.727 [conn32] replSet info voting yea for 172.17.0.2:27017 (0)\r\n",
"Fri Oct 17 21:23:00.573 [rsMgr] replSet not trying to elect self as responded yea to someone else recently\r\n",
"Fri Oct 17 21:23:02.955 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:23:06.042 [rsMgr] replSet not trying to elect self as responded yea to someone else recently\r\n",
"Fri Oct 17 21:23:12.373 [rsBackgroundSync] Socket recv() timeout 172.17.0.4:27017\r\n",
"Fri Oct 17 21:23:12.374 [rsBackgroundSync] SocketException: remote: 172.17.0.4:27017 error: 9001 socket exception [RECV_TIMEOUT] server [172.17.0.4:27017] \r\n",
"Fri Oct 17 21:23:12.374 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: 172.17.0.4:27017\r\n",
"Fri Oct 17 21:23:12.612 [rsMgr] not electing self, 172.17.0.2:27017 would veto with 'I am already primary, 172.17.0.3:27017 can try again once I've stepped down'\r\n",
"Fri Oct 17 21:23:13.978 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state PRIMARY\r\n",
"Fri Oct 17 21:23:14.375 [rsBackgroundSync] replSet syncing to: 172.17.0.2:27017\r\n",
"Fri Oct 17 21:23:14.954 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n"
]
}
],
"prompt_number": 276
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail -n 20 rs4_srv2_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Fri Oct 17 21:22:54.987 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:22:55.026 [rsHealthPoll] replSet info 172.17.0.3:27017 is down (or slow to respond): \r\n",
"Fri Oct 17 21:22:55.026 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state DOWN\r\n",
"Fri Oct 17 21:22:55.986 [rsHealthPoll] replSet info 172.17.0.2:27017 is down (or slow to respond): \r\n",
"Fri Oct 17 21:22:55.986 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state DOWN\r\n",
"Fri Oct 17 21:22:55.986 [rsMgr] can't see a majority of the set, relinquishing primary\r\n",
"Fri Oct 17 21:22:55.986 [rsMgr] replSet relinquishing primary state\r\n",
"Fri Oct 17 21:22:55.986 [rsMgr] replSet SECONDARY\r\n",
"Fri Oct 17 21:22:55.986 [rsMgr] replSet closing client sockets after relinquishing primary\r\n",
"Fri Oct 17 21:22:55.986 [conn9] end connection 172.17.42.1:34964 (9 connections now open)\r\n",
"Fri Oct 17 21:22:55.987 [conn33] end connection 172.17.0.3:56297 (9 connections now open)\r\n",
"Fri Oct 17 21:22:55.987 [conn1] end connection 172.17.42.1:34221 (9 connections now open)\r\n",
"Fri Oct 17 21:22:55.987 [conn19] end connection 172.17.0.2:51596 (9 connections now open)\r\n",
"Fri Oct 17 21:22:55.987 [conn20] end connection 172.17.0.2:51610 (9 connections now open)\r\n",
"Fri Oct 17 21:22:55.987 [conn34] end connection 172.17.0.3:56464 (9 connections now open)\r\n",
"Fri Oct 17 21:23:03.027 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:23:03.988 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:23:07.026 [rsMgr] replSet can't see a majority, will not try to elect self\r\n",
"Fri Oct 17 21:23:15.030 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Fri Oct 17 21:23:15.990 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n"
]
}
],
"prompt_number": 277
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#---JUNK BELOW HERE"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!sudo iptables -L"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Chain INPUT (policy ACCEPT)\r\n",
"target prot opt source destination \r\n",
"\r\n",
"Chain FORWARD (policy ACCEPT)\r\n",
"target prot opt source destination \r\n",
"ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED\r\n",
"ACCEPT all -- anywhere anywhere \r\n",
"ACCEPT all -- anywhere anywhere \r\n",
"\r\n",
"Chain OUTPUT (policy ACCEPT)\r\n",
"target prot opt source destination \r\n",
"\r\n",
"Chain test1-p1 (0 references)\r\n",
"target prot opt source destination \r\n",
"\r\n",
"Chain test1-pu76 (0 references)\r\n",
"target prot opt source destination \r\n",
"\r\n",
"Chain test12-p1 (0 references)\r\n",
"target prot opt source destination \r\n",
"\r\n",
"Chain test122-p1 (0 references)\r\n",
"target prot opt source destination \r\n"
]
}
],
"prompt_number": 234
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# !netstat -lntu"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 103,
"text": [
"['local']"
]
}
],
"prompt_number": 103
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!more /etc/default/docker.io"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"# Docker Upstart and SysVinit configuration file\r\n",
"\r\n",
"# Customize location of Docker binary (especially for development testing).\r\n",
"#DOCKER=\"/usr/local/bin/docker\"\r\n",
"\r\n",
"# Use DOCKER_OPTS to modify the daemon startup options.\r\n",
"#DOCKER_OPTS=\"-dns 8.8.8.8 -dns 8.8.4.4\"\r\n",
"\r\n",
"# If you need Docker to use an HTTP proxy, it can also be specified here.\r\n",
"#export http_proxy=\"http://127.0.0.1:3128/\"\r\n",
"\r\n",
"# This is also a handy place to tweak where Docker's temporary files go.\r\n",
"#export TMPDIR=\"/mnt/bigdrive/docker-tmp\"\r\n"
]
}
],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!sudo docker.io run -P -name rs2_srv4 -d dev24/mongodb --replSet rs2 --noprealloc --smallfiles"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Warning: '-name' is deprecated, it will be replaced by '--name' soon. See usage.\r\n",
"c3e21243cf1ed59236f2b5be5f3d287a62d8b77b8b03c3c25afac946e69eaf08\r\n"
]
}
],
"prompt_number": 32
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"# Standard-library imports used below. BlockadeError and the iptables helpers\n",
"# (clear_iptables, partition_containers, iptables_get_source_chains) are not\n",
"# defined in this cell and are assumed to be available from elsewhere.\n",
"import random\n",
"import string\n",
"import subprocess\n",
"\n",
"\n",
"class NetworkState(object):\n",
"    NORMAL = \"NORMAL\"\n",
"    SLOW = \"SLOW\"\n",
"    FLAKY = \"FLAKY\"\n",
"    UNKNOWN = \"UNKNOWN\"\n",
"\n",
"\n",
"class BlockadeNetwork(object):\n",
"    def __init__(self, config):\n",
"        self.config = config\n",
"\n",
"    def new_veth_device_name(self):\n",
"        chars = string.ascii_letters + string.digits\n",
"        return \"veth\" + \"\".join(random.choice(chars) for _ in range(8))\n",
"\n",
"    def network_state(self, device):\n",
"        return network_state(device)\n",
"\n",
"    def flaky(self, device):\n",
"        # Add packet loss using the configured netem loss parameters\n",
"        flaky_config = self.config.network['flaky'].split()\n",
"        traffic_control_netem(device, [\"loss\"] + flaky_config)\n",
"\n",
"    def slow(self, device):\n",
"        # Add latency using the configured netem delay parameters\n",
"        slow_config = self.config.network['slow'].split()\n",
"        traffic_control_netem(device, [\"delay\"] + slow_config)\n",
"\n",
"    def fast(self, device):\n",
"        traffic_control_restore(device)\n",
"\n",
"    def restore(self, blockade_id):\n",
"        clear_iptables(blockade_id)\n",
"\n",
"    def partition_containers(self, blockade_id, partitions):\n",
"        clear_iptables(blockade_id)\n",
"        partition_containers(blockade_id, partitions)\n",
"\n",
"    def get_ip_partitions(self, blockade_id):\n",
"        return iptables_get_source_chains(blockade_id)\n",
"\n",
"\n",
"def traffic_control_restore(device):\n",
"    # Remove any root qdisc from the device, restoring normal traffic\n",
"    cmd = [\"tc\", \"qdisc\", \"del\", \"dev\", device, \"root\"]\n",
"\n",
"    p = subprocess.Popen(cmd, stdout=subprocess.PIPE,\n",
"                         stderr=subprocess.PIPE)\n",
"    _, stderr = p.communicate()\n",
"    stderr = stderr.decode()\n",
"\n",
"    if p.returncode != 0:\n",
"        # Exit code 2 with this message means there was no qdisc to delete\n",
"        if p.returncode == 2 and stderr:\n",
"            if \"No such file or directory\" in stderr:\n",
"                return\n",
"\n",
"        # TODO log error somewhere?\n",
"        raise BlockadeError(\"Problem calling traffic control: \" +\n",
"                            \" \".join(cmd))\n",
"\n",
"\n",
"def traffic_control_netem(device, params):\n",
"    # Install (or replace) a netem root qdisc with the given parameters\n",
"    try:\n",
"        cmd = [\"tc\", \"qdisc\", \"replace\", \"dev\", device,\n",
"               \"root\", \"netem\"] + params\n",
"        subprocess.check_call(cmd)\n",
"\n",
"    except subprocess.CalledProcessError:\n",
"        # TODO log error somewhere?\n",
"        raise BlockadeError(\"Problem calling traffic control: \" +\n",
"                            \" \".join(cmd))\n",
"\n",
"\n",
"def network_state(device):\n",
"    try:\n",
"        output = subprocess.check_output(\n",
"            [\"tc\", \"qdisc\", \"show\", \"dev\", device]).decode()\n",
"        # sloppy but good enough for now\n",
"        if \" delay \" in output:\n",
"            return NetworkState.SLOW\n",
"        if \" loss \" in output:\n",
"            return NetworkState.FLAKY\n",
"        return NetworkState.NORMAL\n",
"\n",
"    except subprocess.CalledProcessError:\n",
"        # TODO log error somewhere?\n",
"        return NetworkState.UNKNOWN"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#!netstat -lntu"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!sudo docker.io ps --all #--no-trunc"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\r\n",
"19daf983bcb5 dev24/mongodb:latest usr/bin/mongod --rep 19 hours ago Up 19 hours 27017/tcp rs3_srv2 \r\n",
"c3e21243cf1e dev24/mongodb:latest usr/bin/mongod --rep 20 hours ago Up 20 hours 0.0.0.0:49169->27017/tcp rs2_srv4 \r\n",
"03d0dee48b5a dev24/mongos:latest usr/bin/mongos --por 43 hours ago Up 43 hours 0.0.0.0:49168->27017/tcp mongos1 \r\n",
"e63e5ad534f3 dev24/mongodb:latest usr/bin/mongod --nop 45 hours ago Up 45 hours 0.0.0.0:49161->27017/tcp cfg3 \r\n",
"ac61bd95aa07 dev24/mongodb:latest usr/bin/mongod --nop 45 hours ago Up 45 hours 0.0.0.0:49160->27017/tcp cfg2 \r\n",
"25f80dc762fa dev24/mongodb:latest usr/bin/mongod --nop 45 hours ago Up 45 hours 0.0.0.0:49159->27017/tcp cfg1 \r\n",
"81385e4ce354 dev24/mongodb:latest usr/bin/mongod --rep 45 hours ago Up 45 hours 0.0.0.0:49158->27017/tcp rs2_srv3 \r\n",
"f21255ed5479 dev24/mongodb:latest usr/bin/mongod --rep 45 hours ago Up 45 hours 0.0.0.0:49157->27017/tcp rs2_srv2 \r\n",
"2ffb31f70bab dev24/mongodb:latest usr/bin/mongod --rep 45 hours ago Up 45 hours 0.0.0.0:49156->27017/tcp rs2_srv1 \r\n",
"a80c69e9e434 dev24/mongodb:latest usr/bin/mongod --rep 45 hours ago Up 45 hours 0.0.0.0:49155->27017/tcp rs1_srv3 \r\n",
"1edfb75fa453 dev24/mongodb:latest usr/bin/mongod --rep 45 hours ago Up 45 hours 0.0.0.0:49154->27017/tcp rs1_srv2 \r\n",
"84fa8b8bfc0c dev24/mongodb:latest usr/bin/mongod --rep 45 hours ago Up 45 hours 0.0.0.0:49153->27017/tcp rs1_srv1 \r\n"
]
}
],
"prompt_number": 187
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"traffic_control_netem('vethb03e', [\"loss\", '40%'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 204
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"device='vethb03e'\n",
"traffic_control_netem(device, [\"loss\", '40%'])\n",
"traffic_control_netem(device, [\"delay\",\"75ms\",\"100ms\",\"distribution\",\"normal\"])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 218
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"traffic_control_restore('vethafb1')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 228
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We can see the different veth names from netstat\n",
"!netstat -i\n",
"#but this doesn't give a way of associating them with a container (sudo docker.io ps)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Kernel Interface table\r\n",
"Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg\r\n",
"docker0 1500 0 36256 0 0 0 65281 0 0 0 BMRU\r\n",
"eth0 1500 0 211877 0 0 0 127081 0 0 0 BMRU\r\n",
"lo 65536 0 16401 0 0 0 16401 0 0 0 LRU\r\n",
"veth6782 1500 0 300349 0 0 0 305285 0 0 0 BRU\r\n",
"veth9541 1500 0 235540 0 0 0 258248 0 0 0 BRU\r\n",
"veth0cc7 1500 0 9 0 0 0 64 0 0 0 BRU\r\n",
"veth4c89 1500 0 9 0 0 0 91 0 0 0 BRU\r\n",
"veth5b89 1500 0 234335 0 0 0 256913 0 0 0 BRU\r\n",
"veth6bd2 1500 0 233697 0 0 0 255926 0 0 0 BRU\r\n",
"veth86dc 1500 0 149942 0 0 0 172384 0 0 0 BRU\r\n",
"vetha087 1500 0 199264 0 0 0 275534 0 0 0 BRU\r\n",
"vethafb1 1500 0 845641 0 0 0 605569 0 0 0 BRU\r\n",
"vethb03e 1500 0 301702 0 0 0 306421 0 0 0 BRU\r\n",
"vethe5d1 1500 0 132717 0 0 0 175587 0 0 0 BRU\r\n",
"vethf7c0 1500 0 235662 0 0 0 258371 0 0 0 BRU\r\n"
]
}
],
"prompt_number": 201
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
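
The notebook above reads the `rs4_srv*_log.txt` tails by eye to spot when a member goes DOWN, steps down, or becomes PRIMARY. As a rough sketch only, those transitions could be pulled out of a saved log with a couple of regular expressions. It assumes the log line format shown in the outputs above; `state_changes`, the two patterns, and the `"self"` label are hypothetical names introduced here, not part of the notebook or of MongoDB.

```python
import re

# Timestamp prefix as it appears in the logs above, e.g. "Fri Oct 17 21:22:55.986"
STAMP = r"(?P<stamp>\w{3} \w{3} +\d+ [\d:.]+)"

# A node reporting another member's state, e.g.
# "... [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state PRIMARY"
MEMBER_STATE = re.compile(
    r"^" + STAMP + r" \[rsHealthPoll\] replSet member (?P<member>\S+)"
    r" is now in state (?P<state>\w+)"
)

# A node reporting its own state, e.g. "... [rsMgr] replSet SECONDARY"
SELF_STATE = re.compile(
    r"^" + STAMP + r" \[rsMgr\] replSet (?P<state>PRIMARY|SECONDARY)\s*$"
)


def state_changes(lines):
    """Return (timestamp, member, state) tuples; member is "self" for rsMgr lines."""
    events = []
    for line in lines:
        line = line.rstrip("\r\n")
        match = MEMBER_STATE.match(line)
        if match:
            events.append((match.group("stamp"), match.group("member"), match.group("state")))
            continue
        match = SELF_STATE.match(line)
        if match:
            events.append((match.group("stamp"), "self", match.group("state")))
    return events


# Lines copied from the rs4_srv2 log tail shown above
sample = [
    "Fri Oct 17 21:22:55.986 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state DOWN\r\n",
    "Fri Oct 17 21:22:55.986 [rsMgr] can't see a majority of the set, relinquishing primary\r\n",
    "Fri Oct 17 21:22:55.986 [rsMgr] replSet relinquishing primary state\r\n",
    "Fri Oct 17 21:22:55.986 [rsMgr] replSet SECONDARY\r\n",
]

for event in state_changes(sample):
    print(event)
```

In the notebook this could be applied to a saved file with something like `state_changes(open('rs4_srv2_log.txt'))`, which makes it easier to line up the moment one node relinquishes primary with the moment another is elected.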