Skip to content

Instantly share code, notes, and snippets.

@psychemedia
Last active August 29, 2015 14:07
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save psychemedia/67a1c27ae1b0f0cee7ef to your computer and use it in GitHub Desktop.
Save psychemedia/67a1c27ae1b0f0cee7ef to your computer and use it in GitHub Desktop.
Example of running mongo replica set in containers via docker and then introducing network partitions
Display the source blob
Display the rendered blob
Raw
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "",
"signature": "sha256:c87375d51589ee9f0289afef5f0b63162adb93679b907b1f05de30c5a12f17c9"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Mongo - Network Failures"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"MongoDB is designed to support distributed operations and as such can tolerate network failures. In this notebook, we'll see how a MongoDB replica set fares in the face of simulated network failures.\n",
"\n",
"(More technical details about how the MongoDB replica set is configured can be found in the notebook [Dockering](./dockering.ipynb) )."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from dockeringMongo import *"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"c = docker.Client(base_url='unix://var/run/docker.sock',\n",
" version='1.10',\n",
" timeout=10)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Fire up three MongoDB containers that we'll use in a replica set\n",
"\n",
"#STUB is an identifier used to label the nodes\n",
"#Superstition - short stub required?\n",
"STUB='rs3'\n",
"rsc=rs_config(c,STUB,num=3)\n",
"rsc"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 90,
"text": [
"{'members': [{'host': '172.17.0.2:27017', '_id': 0},\n",
" {'host': '172.17.0.3:27017', '_id': 1},\n",
" {'host': '172.17.0.4:27017', '_id': 2}],\n",
" '_id': 'rs3'}"
]
}
],
"prompt_number": 90
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"showContainers(c)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"['/rs3_srv2'] Up Less than a second\n",
"['/rs3_srv1'] Up 1 seconds\n",
"['/rs3_srv0'] Up 1 seconds\n"
]
}
],
"prompt_number": 91
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#tidyAwayContainers(c,['rs3_srv0','rs3_srv1','rs3_srv2'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 92
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"docker_ps(c)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 93,
"text": [
"[{'Created': 1415618951,\n",
" 'Names': ['/rs3_srv2'],\n",
" 'Ports': [{'PublicPort': 49185,\n",
" 'IP': '0.0.0.0',\n",
" 'Type': 'tcp',\n",
" 'PrivatePort': 27017}],\n",
" 'Command': 'usr/bin/mongod --replSet rs3 --noprealloc --smallfiles',\n",
" 'Image': 'dev24/mongodb:latest',\n",
" 'Id': '7b694cdab9f8e52f730b20e92470a7e8aae4d11b5434dce98a2ada7659fac921',\n",
" 'Status': 'Up 3 seconds'},\n",
" {'Created': 1415618951,\n",
" 'Names': ['/rs3_srv1'],\n",
" 'Ports': [{'PublicPort': 49184,\n",
" 'IP': '0.0.0.0',\n",
" 'Type': 'tcp',\n",
" 'PrivatePort': 27017}],\n",
" 'Command': 'usr/bin/mongod --replSet rs3 --noprealloc --smallfiles',\n",
" 'Image': 'dev24/mongodb:latest',\n",
" 'Id': 'bae7907f61baa7fb16ccb481f06dcf1f9269ead7c4047f8241ebedb5e11a0b3b',\n",
" 'Status': 'Up 3 seconds'},\n",
" {'Created': 1415618951,\n",
" 'Names': ['/rs3_srv0'],\n",
" 'Ports': [{'PublicPort': 49183,\n",
" 'IP': '0.0.0.0',\n",
" 'Type': 'tcp',\n",
" 'PrivatePort': 27017}],\n",
" 'Command': 'usr/bin/mongod --replSet rs3 --noprealloc --smallfiles',\n",
" 'Image': 'dev24/mongodb:latest',\n",
" 'Id': '256dc64e421a825766deceacc55869679633dd2c244267dbd909ec6628a59727',\n",
" 'Status': 'Up 3 seconds'}]"
]
}
],
"prompt_number": 93
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Initialise the replica set\n",
"from pymongo import MongoClient\n",
"\n",
"#We'll use the 0th server in the set as a the node\n",
"mc = MongoClient('localhost', get27017tcp_port(c,STUB+'_srv0'),replicaset=STUB)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 94
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Set up connections to the other members of the replica set\n",
"client1=MongoClient('localhost', get27017tcp_port(c,STUB+'_srv1'),replicaset=STUB)\n",
"client2=MongoClient('localhost', get27017tcp_port(c,STUB+'_srv2'),replicaset=STUB)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 95
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#In the mongo console, we would typically use the command MONGOCLIENT.config() to initialise the replica set\n",
"#Here, we use the replSetInitiate admin command, applying it with the desired configuration\n",
"mc.admin.command( \"replSetInitiate\",rsc);"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 96
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#We may need to wait a minute or two for the configuration to come up\n",
"#If you get an error message that suggests the configuration isn't up yet, wait a few seconds then rerun the cell\n",
"mc.admin.command('replSetGetStatus')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 98,
"text": [
"{'date': datetime.datetime(2014, 11, 10, 11, 29, 35),\n",
" 'members': [{'self': True,\n",
" '_id': 0,\n",
" 'stateStr': 'STARTUP2',\n",
" 'optime': Timestamp(1415618964, 1),\n",
" 'name': '172.17.0.2:27017',\n",
" 'uptime': 24,\n",
" 'health': 1.0,\n",
" 'state': 5,\n",
" 'optimeDate': datetime.datetime(2014, 11, 10, 11, 29, 24)},\n",
" {'health': 1.0,\n",
" 'optime': Timestamp(0, 0),\n",
" 'optimeDate': datetime.datetime(1970, 1, 1, 0, 0),\n",
" 'lastHeartbeat': datetime.datetime(2014, 11, 10, 11, 29, 34),\n",
" 'pingMs': 1,\n",
" '_id': 1,\n",
" 'lastHeartbeatMessage': 'still initializing',\n",
" 'stateStr': 'UNKNOWN',\n",
" 'name': '172.17.0.3:27017',\n",
" 'uptime': 1,\n",
" 'lastHeartbeatRecv': datetime.datetime(1970, 1, 1, 0, 0),\n",
" 'state': 6},\n",
" {'health': 1.0,\n",
" 'optime': Timestamp(0, 0),\n",
" 'optimeDate': datetime.datetime(1970, 1, 1, 0, 0),\n",
" 'lastHeartbeat': datetime.datetime(2014, 11, 10, 11, 29, 34),\n",
" 'pingMs': 0,\n",
" '_id': 2,\n",
" 'lastHeartbeatMessage': 'still initializing',\n",
" 'stateStr': 'UNKNOWN',\n",
" 'name': '172.17.0.4:27017',\n",
" 'uptime': 1,\n",
" 'lastHeartbeatRecv': datetime.datetime(1970, 1, 1, 0, 0),\n",
" 'state': 6}],\n",
" 'ok': 1.0,\n",
" 'myState': 5,\n",
" 'set': 'rs3'}"
]
}
],
"prompt_number": 98
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#The statusReport() command lets you ask a mongoDB node what it thinks the state of the world is\n",
"statusReport(mc)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"I am machine id 0 as PRIMARY\n",
"Machine id 1 (SECONDARY) last heard from at 2014-11-10 10:31:56\n",
"Machine id 2 (SECONDARY) last heard from at 2014-11-10 10:31:56\n"
]
}
],
"prompt_number": 41
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#If we're quick, we can watch the machines come up into the fold\n",
"statusReport(client1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Machine id 0 (UNKNOWN) last heard from at 1970-01-01 00:00:00\n",
"I am machine id 1 as STARTUP2\n",
"Machine id 2 (UNKNOWN) last heard from at 1970-01-01 00:00:00\n"
]
}
],
"prompt_number": 99
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(client1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Machine id 0 (PRIMARY) last heard from at 2014-11-10 11:29:58\n",
"I am machine id 1 as STARTUP2\n",
"Machine id 2 (STARTUP2) last heard from at 2014-11-10 11:29:58\n"
]
}
],
"prompt_number": 100
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(client1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Machine id 0 (PRIMARY) last heard from at 2014-11-10 11:30:24\n",
"I am machine id 1 as SECONDARY\n",
"Machine id 2 (SECONDARY) last heard from at 2014-11-10 11:30:24\n"
]
}
],
"prompt_number": 101
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Modelling Network Errors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this section we'll use firewall rules between containers to model network partitions."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#The blockade library contains functions for setting up and tearing down iptable firewall rulesets\n",
"from blockade import *"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 103
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#As we define blockade rules by IP address, we need to know those addresses\n",
"rsc"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 104,
"text": [
"{'members': [{'host': '172.17.0.2:27017', '_id': 0},\n",
" {'host': '172.17.0.3:27017', '_id': 1},\n",
" {'host': '172.17.0.4:27017', '_id': 2}],\n",
" '_id': 'rs3'}"
]
}
],
"prompt_number": 104
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Maybe worth creating an object that lists stuff like this by instance name?\n",
"memberIPaddress(rsc,'rs3_srv2')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 105,
"text": [
"'172.17.0.4:27017'"
]
}
],
"prompt_number": 105
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `partition_containers()` function creates a named ruleset that puts listed IP addresses into separate partitions.\n",
"\n",
"Let's start by splitting off one of the secondary machines - rs3_srv2, on IP address 172.17.0.4:27017 - into a partition on its own:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#The machines in each partition are put into their own lists\n",
"#partition_containers(RULESET,[ PARTITION_1_LIST, PARTITION_2_LIST ]\n",
"partition_containers('test1w2s2', [ [netobj('172.17.0.4')],[netobj('172.17.0.2'),netobj('172.17.0.3')]])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 106
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wait a few seconds for the machines to realise all is not well before proceeding...\n",
"\n",
"...then check their statuses:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(mc)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"I am machine id 0 as PRIMARY\n",
"Machine id 1 (SECONDARY) last heard from at 2014-11-10 11:32:42\n",
"Machine id 2 ((not reachable/healthy)) last heard from at 2014-11-10 11:32:38\n"
]
}
],
"prompt_number": 109
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(client1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Machine id 0 (PRIMARY) last heard from at 2014-11-10 11:32:56\n",
"I am machine id 1 as SECONDARY\n",
"Machine id 2 ((not reachable/healthy)) last heard from at 2014-11-10 11:32:50\n"
]
}
],
"prompt_number": 110
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(client2)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Machine id 0 ((not reachable/healthy)) last heard from at 2014-11-10 11:32:46\n",
"Machine id 1 ((not reachable/healthy)) last heard from at 2014-11-10 11:32:51\n",
"I am machine id 2 as SECONDARY\n"
]
}
],
"prompt_number": 111
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's fix the network..."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"clear_iptables('test1w2s2')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 112
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Wait a second or too then see how things are...\n",
"statusReport(mc)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"I am machine id 0 as PRIMARY\n",
"Machine id 1 (SECONDARY) last heard from at 2014-11-10 11:34:16\n",
"Machine id 2 (SECONDARY) last heard from at 2014-11-10 11:34:17\n"
]
}
],
"prompt_number": 113
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What happens if we knock out the connection betweem the primary server and the secondary servers?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Put the primary server into a partition on it's own\n",
"partition_containers('test1w2s2', [ [netobj('172.17.0.2')],[netobj('172.17.0.3'),netobj('172.17.0.4')]])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 114
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, wait a few seconds for things to happen... then check the statuses:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(mc)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"I am machine id 0 as SECONDARY\n",
"Machine id 1 ((not reachable/healthy)) last heard from at 2014-11-10 11:34:48\n",
"Machine id 2 ((not reachable/healthy)) last heard from at 2014-11-10 11:34:49\n"
]
}
],
"prompt_number": 115
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(client1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Machine id 0 ((not reachable/healthy)) last heard from at 2014-11-10 11:34:52\n",
"I am machine id 1 as SECONDARY\n",
"Machine id 2 (PRIMARY) last heard from at 2014-11-10 11:35:02\n"
]
}
],
"prompt_number": 116
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(client2)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Machine id 0 ((not reachable/healthy)) last heard from at 2014-11-10 11:34:53\n",
"Machine id 1 (SECONDARY) last heard from at 2014-11-10 11:35:03\n",
"I am machine id 2 as PRIMARY\n"
]
}
],
"prompt_number": 117
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"clear_iptables('test1w2s2')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 118
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(mc)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"I am machine id 0 as SECONDARY\n",
"Machine id 1 (SECONDARY) last heard from at 2014-11-10 11:35:55\n",
"Machine id 2 (PRIMARY) last heard from at 2014-11-10 11:35:54\n"
]
}
],
"prompt_number": 119
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(client1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Machine id 0 (SECONDARY) last heard from at 2014-11-10 11:35:57\n",
"I am machine id 1 as SECONDARY\n",
"Machine id 2 (PRIMARY) last heard from at 2014-11-10 11:35:58\n"
]
}
],
"prompt_number": 120
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"statusReport(client2)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Machine id 0 (SECONDARY) last heard from at 2014-11-10 11:36:14\n",
"Machine id 1 (SECONDARY) last heard from at 2014-11-10 11:36:13\n",
"I am machine id 2 as PRIMARY\n"
]
}
],
"prompt_number": 121
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Grab a copy of the logs from each of the servers to see what happened..."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#In an ssh shell, the following sort of command displays a real time stream of stdio log messages from the container\n",
"#!docker.io logs --follow=true {STUB}_srv1\n",
"!docker.io logs {STUB}_srv0 > {STUB}_srv0_log.txt\n",
"!docker.io logs {STUB}_srv1 > {STUB}_srv1_log.txt\n",
"!docker.io logs {STUB}_srv2 > {STUB}_srv2_log.txt\n",
"#?we may also be able to use mongostat - http://docs.mongodb.org/manual/reference/program/mongostat/"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 122
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Maybe we need to use a log parser or highlighter to pull out interesting messages? Maybe we need to visualise the communication between the servers, eg with a combined timeline?*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See below for the log for rs3_srv0, the original primary. There are several things to note:\n",
"\n",
"* around 11:29:42, it mutters to itself about the state of the network and whether it's electable;\n",
"* between 11:29:48 and 11:29:49 it goes for itself as PRIMARY;\n",
"* through 11:30:03 multiple connections are opened with other nodes in the replica set;\n",
"* by 11:30:06, the replica set is up and running.\n",
"* at 11:32:38, sight of rs3_srv2 is lost and it is reported as DOWN as we partition it;\n",
"* through 11:33, more connections are opened to rs3_srv1 - is this a defensive measure?\n",
"* at 11:34:07, rs3_srv2 comes back as the partition is removed;\n",
"* at 11:34:37, we lose a connection to rs3_srv2, and through 11:34:4x note things are really down as a result of the partition we placed rs3_srv0 into;\n",
"* at 11:34:49.634, rs3_srv0 gives up PRIMARY;\n",
"* at 11:35:39, the partition is removed and the connections to the other machines in the replica set are restored; rs3_srv0 doesn't elect itself back as PRIMARY - rs3_srv2 has that job now."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail -500 {STUB}_srv0_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"note: noprealloc may hurt performance in many applications\r\n",
"Mon Nov 10 11:29:11.786 [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db/ 64-bit host=256dc64e421a\r\n",
"Mon Nov 10 11:29:11.787 [initandlisten] db version v2.4.11\r\n",
"Mon Nov 10 11:29:11.787 [initandlisten] git version: fa13d1ee8da0f112f588570b4070f73d7af2f7fd\r\n",
"Mon Nov 10 11:29:11.787 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49\r\n",
"Mon Nov 10 11:29:11.787 [initandlisten] allocator: tcmalloc\r\n",
"Mon Nov 10 11:29:11.787 [initandlisten] options: { noprealloc: true, replSet: \"rs3\", smallfiles: true }\r\n",
"Mon Nov 10 11:29:11.789 [initandlisten] journal dir=/data/db/journal\r\n",
"Mon Nov 10 11:29:11.790 [initandlisten] recover : no journal files present, no recovery needed\r\n",
"Mon Nov 10 11:29:11.953 [FileAllocator] allocating new datafile /data/db/local.ns, filling with zeroes...\r\n",
"Mon Nov 10 11:29:11.953 [FileAllocator] creating directory /data/db/_tmp\r\n",
"Mon Nov 10 11:29:12.073 [FileAllocator] done allocating datafile /data/db/local.ns, size: 16MB, took 0.117 secs\r\n",
"Mon Nov 10 11:29:12.136 [FileAllocator] allocating new datafile /data/db/local.0, filling with zeroes...\r\n",
"Mon Nov 10 11:29:12.213 [FileAllocator] done allocating datafile /data/db/local.0, size: 16MB, took 0.076 secs\r\n",
"Mon Nov 10 11:29:12.213 [initandlisten] command local.$cmd command: { create: \"startup_log\", size: 10485760, capped: true } ntoreturn:1 keyUpdates:0 reslen:37 260ms\r\n",
"Mon Nov 10 11:29:12.214 [initandlisten] waiting for connections on port 27017\r\n",
"Mon Nov 10 11:29:12.217 [websvr] admin web console waiting for connections on port 28017\r\n",
"Mon Nov 10 11:29:12.224 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)\r\n",
"Mon Nov 10 11:29:12.224 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done\r\n",
"Mon Nov 10 11:29:17.687 [initandlisten] connection accepted from 172.17.42.1:36164 #1 (1 connection now open)\r\n",
"Mon Nov 10 11:29:21.324 [conn1] replSet replSetInitiate admin command received from client\r\n",
"Mon Nov 10 11:29:21.325 [conn1] replSet replSetInitiate config object parses ok, 3 members specified\r\n",
"Mon Nov 10 11:29:21.334 [conn1] replSet replSetInitiate all members seem up\r\n",
"Mon Nov 10 11:29:21.334 [conn1] ******\r\n",
"Mon Nov 10 11:29:21.335 [conn1] creating replication oplog of size: 1333MB...\r\n",
"Mon Nov 10 11:29:21.336 [FileAllocator] allocating new datafile /data/db/local.1, filling with zeroes...\r\n",
"Mon Nov 10 11:29:22.385 [FileAllocator] done allocating datafile /data/db/local.1, size: 511MB, took 1.048 secs\r\n",
"Mon Nov 10 11:29:22.385 [FileAllocator] allocating new datafile /data/db/local.2, filling with zeroes...\r\n",
"Mon Nov 10 11:29:23.367 [FileAllocator] done allocating datafile /data/db/local.2, size: 511MB, took 0.98 secs\r\n",
"Mon Nov 10 11:29:23.368 [FileAllocator] allocating new datafile /data/db/local.3, filling with zeroes...\r\n",
"Mon Nov 10 11:29:24.429 [FileAllocator] done allocating datafile /data/db/local.3, size: 511MB, took 1.06 secs\r\n",
"Mon Nov 10 11:29:24.438 [conn1] ******\r\n",
"Mon Nov 10 11:29:24.440 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)\r\n",
"Mon Nov 10 11:29:24.440 [conn1] replSet info saving a newer config version to local.system.replset\r\n",
"Mon Nov 10 11:29:24.443 [conn1] replSet saveConfigLocally done\r\n",
"Mon Nov 10 11:29:24.443 [conn1] replSet replSetInitiate config now saved locally. Should come online in about a minute.\r\n",
"Mon Nov 10 11:29:24.443 [conn1] command admin.$cmd command: { replSetInitiate: { members: [ { host: \"172.17.0.2:27017\", _id: 0 }, { host: \"172.17.0.3:27017\", _id: 1 }, { host: \"172.17.0.4:27017\", _id: 2 } ], _id: \"rs3\" } } ntoreturn:1 keyUpdates:0 locks(micros) W:3108180 reslen:112 3119ms\r\n",
"Mon Nov 10 11:29:34.445 [rsStart] replSet I am 172.17.0.2:27017\r\n",
"Mon Nov 10 11:29:34.446 [rsStart] replSet STARTUP2\r\n",
"Mon Nov 10 11:29:34.448 [rsHealthPoll] replSet member 172.17.0.4:27017 is up\r\n",
"Mon Nov 10 11:29:34.448 [rsHealthPoll] replSet member 172.17.0.3:27017 is up\r\n",
"Mon Nov 10 11:29:35.450 [rsSync] replSet SECONDARY\r\n",
"Mon Nov 10 11:29:40.453 [rsMgr] replSet not electing self, not all members up and we have been up less than 5 minutes\r\n",
"Mon Nov 10 11:29:42.204 [initandlisten] connection accepted from 172.17.0.3:53312 #2 (2 connections now open)\r\n",
"Mon Nov 10 11:29:42.205 [conn2] end connection 172.17.0.3:53312 (1 connection now open)\r\n",
"Mon Nov 10 11:29:42.205 [initandlisten] connection accepted from 172.17.0.3:53313 #3 (2 connections now open)\r\n",
"Mon Nov 10 11:29:42.296 [initandlisten] connection accepted from 172.17.0.4:36754 #4 (3 connections now open)\r\n",
"Mon Nov 10 11:29:42.299 [conn4] end connection 172.17.0.4:36754 (2 connections now open)\r\n",
"Mon Nov 10 11:29:42.300 [initandlisten] connection accepted from 172.17.0.4:36755 #5 (3 connections now open)\r\n",
"Mon Nov 10 11:29:42.461 [rsHealthPoll] replset info 172.17.0.4:27017 thinks that we are down\r\n",
"Mon Nov 10 11:29:42.462 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state STARTUP2\r\n",
"Mon Nov 10 11:29:42.464 [rsHealthPoll] replset info 172.17.0.3:27017 thinks that we are down\r\n",
"Mon Nov 10 11:29:42.467 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state STARTUP2\r\n",
"Mon Nov 10 11:29:42.468 [rsMgr] not electing self, 172.17.0.4:27017 would veto with 'I don't think 172.17.0.2:27017 is electable'\r\n",
"Mon Nov 10 11:29:42.469 [rsMgr] not electing self, 172.17.0.4:27017 would veto with 'I don't think 172.17.0.2:27017 is electable'\r\n",
"Mon Nov 10 11:29:48.469 [rsMgr] replSet info electSelf 0\r\n",
"Mon Nov 10 11:29:49.456 [rsMgr] replSet PRIMARY\r\n",
"Mon Nov 10 11:29:58.228 [conn3] end connection 172.17.0.3:53313 (2 connections now open)\r\n",
"Mon Nov 10 11:29:58.229 [initandlisten] connection accepted from 172.17.0.3:53321 #6 (3 connections now open)\r\n",
"Mon Nov 10 11:29:58.328 [conn5] end connection 172.17.0.4:36755 (2 connections now open)\r\n",
"Mon Nov 10 11:29:58.329 [initandlisten] connection accepted from 172.17.0.4:36761 #7 (3 connections now open)\r\n",
"Mon Nov 10 11:30:02.871 [initandlisten] connection accepted from 172.17.0.4:36763 #8 (4 connections now open)\r\n",
"Mon Nov 10 11:30:02.892 [conn8] end connection 172.17.0.4:36763 (3 connections now open)\r\n",
"Mon Nov 10 11:30:03.073 [initandlisten] connection accepted from 172.17.0.3:53325 #9 (4 connections now open)\r\n",
"Mon Nov 10 11:30:03.090 [conn9] end connection 172.17.0.3:53325 (3 connections now open)\r\n",
"Mon Nov 10 11:30:03.234 [initandlisten] connection accepted from 172.17.0.3:53326 #10 (4 connections now open)\r\n",
"Mon Nov 10 11:30:03.320 [initandlisten] connection accepted from 172.17.0.4:36766 #11 (5 connections now open)\r\n",
"Mon Nov 10 11:30:03.894 [initandlisten] connection accepted from 172.17.0.4:36767 #12 (6 connections now open)\r\n",
"Mon Nov 10 11:30:04.094 [initandlisten] connection accepted from 172.17.0.3:53329 #13 (7 connections now open)\r\n",
"Mon Nov 10 11:30:04.482 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state RECOVERING\r\n",
"Mon Nov 10 11:30:04.490 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state RECOVERING\r\n",
"Mon Nov 10 11:30:04.902 [slaveTracking] build index local.slaves { _id: 1 }\r\n",
"Mon Nov 10 11:30:04.904 [slaveTracking] build index done. scanned 0 total records. 0.001 secs\r\n",
"Mon Nov 10 11:30:06.484 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:30:06.491 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:30:28.249 [conn6] end connection 172.17.0.3:53321 (6 connections now open)\r\n",
"Mon Nov 10 11:30:28.249 [initandlisten] connection accepted from 172.17.0.3:53333 #14 (7 connections now open)\r\n",
"Mon Nov 10 11:30:28.347 [conn7] end connection 172.17.0.4:36761 (6 connections now open)\r\n",
"Mon Nov 10 11:30:28.347 [initandlisten] connection accepted from 172.17.0.4:36773 #15 (8 connections now open)\r\n",
"Mon Nov 10 11:30:58.272 [conn14] end connection 172.17.0.3:53333 (6 connections now open)\r\n",
"Mon Nov 10 11:30:58.273 [initandlisten] connection accepted from 172.17.0.3:53339 #16 (7 connections now open)\r\n",
"Mon Nov 10 11:30:58.372 [conn15] end connection 172.17.0.4:36773 (6 connections now open)\r\n",
"Mon Nov 10 11:30:58.372 [initandlisten] connection accepted from 172.17.0.4:36779 #17 (7 connections now open)\r\n",
"Mon Nov 10 11:31:28.299 [conn16] end connection 172.17.0.3:53339 (6 connections now open)\r\n",
"Mon Nov 10 11:31:28.299 [initandlisten] connection accepted from 172.17.0.3:53345 #18 (7 connections now open)\r\n",
"Mon Nov 10 11:31:28.397 [conn17] end connection 172.17.0.4:36779 (6 connections now open)\r\n",
"Mon Nov 10 11:31:28.397 [initandlisten] connection accepted from 172.17.0.4:36785 #19 (8 connections now open)\r\n",
"Mon Nov 10 11:31:58.324 [conn18] end connection 172.17.0.3:53345 (6 connections now open)\r\n",
"Mon Nov 10 11:31:58.324 [initandlisten] connection accepted from 172.17.0.3:53351 #20 (7 connections now open)\r\n",
"Mon Nov 10 11:31:58.420 [conn19] end connection 172.17.0.4:36785 (6 connections now open)\r\n",
"Mon Nov 10 11:31:58.421 [initandlisten] connection accepted from 172.17.0.4:36791 #21 (7 connections now open)\r\n",
"Mon Nov 10 11:32:28.347 [conn20] end connection 172.17.0.3:53351 (6 connections now open)\r\n",
"Mon Nov 10 11:32:28.348 [initandlisten] connection accepted from 172.17.0.3:53357 #22 (7 connections now open)\r\n",
"Mon Nov 10 11:32:38.591 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Mon Nov 10 11:32:38.591 [rsHealthPoll] replSet info 172.17.0.4:27017 is down (or slow to respond): \r\n",
"Mon Nov 10 11:32:38.591 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state DOWN\r\n",
"Mon Nov 10 11:32:46.593 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:32:58.373 [conn22] end connection 172.17.0.3:53357 (6 connections now open)\r\n",
"Mon Nov 10 11:32:58.374 [initandlisten] connection accepted from 172.17.0.3:53375 #23 (7 connections now open)\r\n",
"Mon Nov 10 11:32:58.593 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:33:04.594 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:33:18.597 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:33:24.598 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:33:28.404 [conn23] end connection 172.17.0.3:53375 (6 connections now open)\r\n",
"Mon Nov 10 11:33:28.405 [initandlisten] connection accepted from 172.17.0.3:53396 #24 (7 connections now open)\r\n",
"Mon Nov 10 11:33:38.602 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:33:44.602 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:33:58.427 [conn24] end connection 172.17.0.3:53396 (6 connections now open)\r\n",
"Mon Nov 10 11:33:58.427 [initandlisten] connection accepted from 172.17.0.3:53417 #25 (8 connections now open)\r\n",
"Mon Nov 10 11:33:58.606 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:34:04.608 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:34:07.612 [rsHealthPoll] replset info 172.17.0.4:27017 thinks that we are down\r\n",
"Mon Nov 10 11:34:07.613 [rsHealthPoll] replSet member 172.17.0.4:27017 is up\r\n",
"Mon Nov 10 11:34:07.613 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:34:08.474 [initandlisten] connection accepted from 172.17.0.4:36866 #26 (8 connections now open)\r\n",
"Mon Nov 10 11:34:08.474 [conn26] end connection 172.17.0.4:36866 (7 connections now open)\r\n",
"Mon Nov 10 11:34:08.475 [initandlisten] connection accepted from 172.17.0.4:36867 #27 (9 connections now open)\r\n",
"Mon Nov 10 11:34:08.918 [initandlisten] connection accepted from 172.17.0.4:36868 #28 (9 connections now open)\r\n",
"Mon Nov 10 11:34:13.610 [conn11] end connection 172.17.0.4:36766 (8 connections now open)\r\n",
"Mon Nov 10 11:34:28.455 [conn25] end connection 172.17.0.3:53417 (7 connections now open)\r\n",
"Mon Nov 10 11:34:28.455 [initandlisten] connection accepted from 172.17.0.3:53432 #29 (9 connections now open)\r\n",
"Mon Nov 10 11:34:28.491 [conn27] end connection 172.17.0.4:36867 (7 connections now open)\r\n",
"Mon Nov 10 11:34:28.492 [initandlisten] connection accepted from 172.17.0.4:36872 #30 (8 connections now open)\r\n",
"Mon Nov 10 11:34:37.630 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:34:40.681 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Mon Nov 10 11:34:40.681 [rsHealthPoll] replSet info 172.17.0.3:27017 is down (or slow to respond): \r\n",
"Mon Nov 10 11:34:40.681 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state DOWN\r\n",
"Mon Nov 10 11:34:43.633 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:34:48.687 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:34:49.634 [rsHealthPoll] replSet info 172.17.0.4:27017 is down (or slow to respond): \r\n",
"Mon Nov 10 11:34:49.634 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state DOWN\r\n",
"Mon Nov 10 11:34:49.634 [rsMgr] can't see a majority of the set, relinquishing primary\r\n",
"Mon Nov 10 11:34:49.634 [rsMgr] replSet relinquishing primary state\r\n",
"Mon Nov 10 11:34:49.634 [rsMgr] replSet SECONDARY\r\n",
"Mon Nov 10 11:34:49.634 [rsMgr] replSet closing client sockets after relinquishing primary\r\n",
"Mon Nov 10 11:34:49.635 [conn13] end connection 172.17.0.3:53329 (7 connections now open)\r\n",
"Mon Nov 10 11:34:49.635 [conn28] end connection 172.17.0.4:36868 (7 connections now open)\r\n",
"Mon Nov 10 11:34:49.635 [conn10] end connection 172.17.0.3:53326 (6 connections now open)\r\n",
"Mon Nov 10 11:34:49.635 [conn1] end connection 172.17.42.1:36164 (7 connections now open)\r\n",
"Mon Nov 10 11:34:49.635 [conn12] end connection 172.17.0.4:36767 (4 connections now open)\r\n",
"Mon Nov 10 11:34:57.637 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:34:57.637 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:00.891 [initandlisten] connection accepted from 172.17.42.1:36315 #31 (4 connections now open)\r\n",
"Mon Nov 10 11:35:01.635 [rsMgr] replSet can't see a majority, will not try to elect self\r\n",
"Mon Nov 10 11:35:03.637 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:35:09.638 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:17.642 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:35:21.642 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:23.643 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:35:33.647 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:37.649 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:35:39.474 [initandlisten] connection accepted from 172.17.0.3:53482 #32 (5 connections now open)\r\n",
"Mon Nov 10 11:35:39.651 [rsHealthPoll] replSet member 172.17.0.3:27017 is up\r\n",
"Mon Nov 10 11:35:39.651 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:35:39.652 [rsMgr] not electing self, 172.17.0.3:27017 would veto with '172.17.0.2:27017 is trying to elect itself but 172.17.0.4:27017 is already primary and more up-to-date'\r\n",
"Mon Nov 10 11:35:40.498 [initandlisten] connection accepted from 172.17.0.4:36922 #33 (6 connections now open)\r\n",
"Mon Nov 10 11:35:40.657 [rsHealthPoll] replSet member 172.17.0.4:27017 is up\r\n",
"Mon Nov 10 11:35:40.657 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state PRIMARY\r\n",
"Mon Nov 10 11:35:40.693 [rsBackgroundSync] replSet syncing to: 172.17.0.4:27017\r\n",
"Mon Nov 10 11:35:41.537 [rsSyncNotifier] replset setting oplog notifier to 172.17.0.4:27017\r\n",
"Mon Nov 10 11:35:41.541 [rsSyncNotifier] build index local.me { _id: 1 }\r\n",
"Mon Nov 10 11:35:41.543 [rsSyncNotifier] build index done. scanned 0 total records. 0.002 secs\r\n",
"Mon Nov 10 11:35:57.487 [conn32] end connection 172.17.0.3:53482 (5 connections now open)\r\n",
"Mon Nov 10 11:35:57.487 [initandlisten] connection accepted from 172.17.0.3:53490 #34 (7 connections now open)\r\n",
"Mon Nov 10 11:35:58.513 [conn33] end connection 172.17.0.4:36922 (5 connections now open)\r\n",
"Mon Nov 10 11:35:58.513 [initandlisten] connection accepted from 172.17.0.4:36930 #35 (6 connections now open)\r\n"
]
}
],
"prompt_number": 125
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How about rs3_srv1's story?\n",
"\n",
"* through 11:29:4x, it starts to get into the replica set, agreeing to rs3_srv0 as PRIMARY at 11:29:48.472, and rebuilding itself through 11:30:03;\n",
"* at 11:32:38.346 it loses sight of rs3_srv2, resighting it at 11:34:06;\n",
"* at 11:34:40, lose sight of the current PRIMARY, rs3_srv0;\n",
"* at 11:34:40.460, rs3_srv1 chooses not to stand for PRIMARY becuase it thinks rs3_srv2 will veto it, even though the server rs3_srv2 thinks is PRIMARY is the one that rs3_srv2 can't see. *(??Do we need also need an example situation where secondaries can see each other but one of the secondaries can't see the PRIMARY?)*\n",
"* at 11:34:42 it votes for rs3_srv2 as PRIMARY; at 11:34:47, 11:34:47, and 11:34:58 it holds back from nominating itself becuase of its recent vote for someone else;\n",
"* at 11:35:00, rs3_srv2 becomes PRIMARY;\n",
"* at 11:35:39, rs3_srv0 rejoins as a SECONDARY."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail -500 {STUB}_srv1_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"note: noprealloc may hurt performance in many applications\r\n",
"Mon Nov 10 11:29:11.889 [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db/ 64-bit host=bae7907f61ba\r\n",
"Mon Nov 10 11:29:11.889 [initandlisten] db version v2.4.11\r\n",
"Mon Nov 10 11:29:11.889 [initandlisten] git version: fa13d1ee8da0f112f588570b4070f73d7af2f7fd\r\n",
"Mon Nov 10 11:29:11.889 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49\r\n",
"Mon Nov 10 11:29:11.889 [initandlisten] allocator: tcmalloc\r\n",
"Mon Nov 10 11:29:11.889 [initandlisten] options: { noprealloc: true, replSet: \"rs3\", smallfiles: true }\r\n",
"Mon Nov 10 11:29:11.897 [initandlisten] journal dir=/data/db/journal\r\n",
"Mon Nov 10 11:29:11.897 [initandlisten] recover : no journal files present, no recovery needed\r\n",
"Mon Nov 10 11:29:12.042 [FileAllocator] allocating new datafile /data/db/local.ns, filling with zeroes...\r\n",
"Mon Nov 10 11:29:12.042 [FileAllocator] creating directory /data/db/_tmp\r\n",
"Mon Nov 10 11:29:12.101 [FileAllocator] done allocating datafile /data/db/local.ns, size: 16MB, took 0.058 secs\r\n",
"Mon Nov 10 11:29:12.136 [FileAllocator] allocating new datafile /data/db/local.0, filling with zeroes...\r\n",
"Mon Nov 10 11:29:12.180 [FileAllocator] done allocating datafile /data/db/local.0, size: 16MB, took 0.041 secs\r\n",
"Mon Nov 10 11:29:12.181 [initandlisten] command local.$cmd command: { create: \"startup_log\", size: 10485760, capped: true } ntoreturn:1 keyUpdates:0 reslen:37 139ms\r\n",
"Mon Nov 10 11:29:12.182 [initandlisten] waiting for connections on port 27017\r\n",
"Mon Nov 10 11:29:12.183 [websvr] admin web console waiting for connections on port 28017\r\n",
"Mon Nov 10 11:29:12.199 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)\r\n",
"Mon Nov 10 11:29:12.199 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done\r\n",
"Mon Nov 10 11:29:19.446 [initandlisten] connection accepted from 172.17.42.1:38722 #1 (1 connection now open)\r\n",
"Mon Nov 10 11:29:21.327 [initandlisten] connection accepted from 172.17.0.2:34178 #2 (2 connections now open)\r\n",
"Mon Nov 10 11:29:21.328 [conn2] end connection 172.17.0.2:34178 (1 connection now open)\r\n",
"Mon Nov 10 11:29:21.330 [initandlisten] connection accepted from 172.17.0.2:34180 #3 (2 connections now open)\r\n",
"Mon Nov 10 11:29:22.201 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)\r\n",
"Mon Nov 10 11:29:32.202 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)\r\n",
"Mon Nov 10 11:29:42.205 [rsStart] trying to contact 172.17.0.2:27017\r\n",
"Mon Nov 10 11:29:42.208 [rsStart] replSet I am 172.17.0.3:27017\r\n",
"Mon Nov 10 11:29:42.208 [rsStart] replSet got config version 1 from a remote, saving locally\r\n",
"Mon Nov 10 11:29:42.208 [rsStart] replSet info saving a newer config version to local.system.replset\r\n",
"Mon Nov 10 11:29:42.211 [rsStart] replSet saveConfigLocally done\r\n",
"Mon Nov 10 11:29:42.211 [rsStart] replSet STARTUP2\r\n",
"Mon Nov 10 11:29:42.212 [rsSync] ******\r\n",
"Mon Nov 10 11:29:42.212 [rsSync] creating replication oplog of size: 1256MB...\r\n",
"Mon Nov 10 11:29:42.212 [FileAllocator] allocating new datafile /data/db/local.1, filling with zeroes...\r\n",
"Mon Nov 10 11:29:42.302 [initandlisten] connection accepted from 172.17.0.4:41232 #4 (3 connections now open)\r\n",
"Mon Nov 10 11:29:42.308 [initandlisten] connection accepted from 172.17.0.4:41233 #5 (4 connections now open)\r\n",
"Mon Nov 10 11:29:42.308 [conn4] end connection 172.17.0.4:41232 (3 connections now open)\r\n",
"Mon Nov 10 11:29:43.719 [FileAllocator] done allocating datafile /data/db/local.1, size: 511MB, took 1.506 secs\r\n",
"Mon Nov 10 11:29:43.720 [FileAllocator] allocating new datafile /data/db/local.2, filling with zeroes...\r\n",
"Mon Nov 10 11:29:44.214 [rsHealthPoll] replSet member 172.17.0.2:27017 is up\r\n",
"Mon Nov 10 11:29:44.215 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:29:44.221 [rsHealthPoll] replSet member 172.17.0.4:27017 is up\r\n",
"Mon Nov 10 11:29:44.222 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state STARTUP2\r\n",
"Mon Nov 10 11:29:45.377 [FileAllocator] done allocating datafile /data/db/local.2, size: 511MB, took 1.656 secs\r\n",
"Mon Nov 10 11:29:45.379 [FileAllocator] allocating new datafile /data/db/local.3, filling with zeroes...\r\n",
"Mon Nov 10 11:29:47.059 [FileAllocator] done allocating datafile /data/db/local.3, size: 511MB, took 1.679 secs\r\n",
"Mon Nov 10 11:29:47.070 [rsSync] ******\r\n",
"Mon Nov 10 11:29:47.071 [rsSync] replSet initial sync pending\r\n",
"Mon Nov 10 11:29:47.072 [rsSync] replSet initial sync need a member to be primary or secondary to do our initial sync\r\n",
"Mon Nov 10 11:29:48.469 [conn3] end connection 172.17.0.2:34180 (2 connections now open)\r\n",
"Mon Nov 10 11:29:48.471 [initandlisten] connection accepted from 172.17.0.2:34190 #6 (3 connections now open)\r\n",
"Mon Nov 10 11:29:48.472 [conn6] replSet info voting yea for 172.17.0.2:27017 (0)\r\n",
"Mon Nov 10 11:29:50.222 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state PRIMARY\r\n",
"Mon Nov 10 11:30:03.072 [rsSync] replSet initial sync pending\r\n",
"Mon Nov 10 11:30:03.073 [rsSync] replSet syncing to: 172.17.0.2:27017\r\n",
"Mon Nov 10 11:30:03.079 [rsSync] build index local.me { _id: 1 }\r\n",
"Mon Nov 10 11:30:03.080 [rsSync] build index done. scanned 0 total records. 0.001 secs\r\n",
"Mon Nov 10 11:30:03.083 [rsSync] build index local.replset.minvalid { _id: 1 }\r\n",
"Mon Nov 10 11:30:03.084 [rsSync] build index done. scanned 0 total records. 0 secs\r\n",
"Mon Nov 10 11:30:03.084 [rsSync] replSet initial sync drop all databases\r\n",
"Mon Nov 10 11:30:03.084 [rsSync] dropAllDatabasesExceptLocal 1\r\n",
"Mon Nov 10 11:30:03.085 [rsSync] replSet initial sync clone all databases\r\n",
"Mon Nov 10 11:30:03.085 [rsSync] replSet initial sync data copy, starting syncup\r\n",
"Mon Nov 10 11:30:03.085 [rsSync] oplog sync 1 of 3\r\n",
"Mon Nov 10 11:30:03.086 [rsSync] oplog sync 2 of 3\r\n",
"Mon Nov 10 11:30:03.086 [rsSync] replSet initial sync building indexes\r\n",
"Mon Nov 10 11:30:03.086 [rsSync] oplog sync 3 of 3\r\n",
"Mon Nov 10 11:30:03.087 [rsSync] replSet initial sync finishing up\r\n",
"Mon Nov 10 11:30:03.088 [rsSync] replSet set minValid=5460a194:1\r\n",
"Mon Nov 10 11:30:03.090 [rsSync] replSet RECOVERING\r\n",
"Mon Nov 10 11:30:03.090 [rsSync] replSet initial sync done\r\n",
"Mon Nov 10 11:30:03.233 [rsBackgroundSync] replSet syncing to: 172.17.0.2:27017\r\n",
"Mon Nov 10 11:30:04.092 [rsSyncNotifier] replset setting oplog notifier to 172.17.0.2:27017\r\n",
"Mon Nov 10 11:30:04.242 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state RECOVERING\r\n",
"Mon Nov 10 11:30:05.097 [rsSync] replSet SECONDARY\r\n",
"Mon Nov 10 11:30:06.244 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:30:10.348 [conn5] end connection 172.17.0.4:41233 (2 connections now open)\r\n",
"Mon Nov 10 11:30:10.349 [initandlisten] connection accepted from 172.17.0.4:41245 #7 (3 connections now open)\r\n",
"Mon Nov 10 11:30:18.500 [conn6] end connection 172.17.0.2:34190 (2 connections now open)\r\n",
"Mon Nov 10 11:30:18.500 [initandlisten] connection accepted from 172.17.0.2:34202 #8 (4 connections now open)\r\n",
"Mon Nov 10 11:30:40.370 [conn7] end connection 172.17.0.4:41245 (2 connections now open)\r\n",
"Mon Nov 10 11:30:40.370 [initandlisten] connection accepted from 172.17.0.4:41251 #9 (3 connections now open)\r\n",
"Mon Nov 10 11:30:48.520 [conn8] end connection 172.17.0.2:34202 (2 connections now open)\r\n",
"Mon Nov 10 11:30:48.521 [initandlisten] connection accepted from 172.17.0.2:34208 #10 (4 connections now open)\r\n",
"Mon Nov 10 11:31:10.391 [conn9] end connection 172.17.0.4:41251 (2 connections now open)\r\n",
"Mon Nov 10 11:31:10.392 [initandlisten] connection accepted from 172.17.0.4:41257 #11 (4 connections now open)\r\n",
"Mon Nov 10 11:31:18.545 [conn10] end connection 172.17.0.2:34208 (2 connections now open)\r\n",
"Mon Nov 10 11:31:18.546 [initandlisten] connection accepted from 172.17.0.2:34214 #12 (4 connections now open)\r\n",
"Mon Nov 10 11:31:40.411 [conn11] end connection 172.17.0.4:41257 (2 connections now open)\r\n",
"Mon Nov 10 11:31:40.412 [initandlisten] connection accepted from 172.17.0.4:41263 #13 (3 connections now open)\r\n",
"Mon Nov 10 11:31:48.566 [conn12] end connection 172.17.0.2:34214 (2 connections now open)\r\n",
"Mon Nov 10 11:31:48.567 [initandlisten] connection accepted from 172.17.0.2:34220 #14 (3 connections now open)\r\n",
"Mon Nov 10 11:32:10.430 [conn13] end connection 172.17.0.4:41263 (2 connections now open)\r\n",
"Mon Nov 10 11:32:10.431 [initandlisten] connection accepted from 172.17.0.4:41269 #15 (3 connections now open)\r\n",
"Mon Nov 10 11:32:18.583 [conn14] end connection 172.17.0.2:34220 (2 connections now open)\r\n",
"Mon Nov 10 11:32:18.584 [initandlisten] connection accepted from 172.17.0.2:34226 #16 (3 connections now open)\r\n",
"Mon Nov 10 11:32:38.346 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Mon Nov 10 11:32:38.346 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:32:39.346 [rsHealthPoll] replSet info 172.17.0.4:27017 is down (or slow to respond): \r\n",
"Mon Nov 10 11:32:39.346 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state DOWN\r\n",
"Mon Nov 10 11:32:47.349 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:32:48.603 [conn16] end connection 172.17.0.2:34226 (2 connections now open)\r\n",
"Mon Nov 10 11:32:48.603 [initandlisten] connection accepted from 172.17.0.2:34240 #17 (4 connections now open)\r\n",
"Mon Nov 10 11:32:59.349 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:33:11.347 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:33:18.621 [conn17] end connection 172.17.0.2:34240 (2 connections now open)\r\n",
"Mon Nov 10 11:33:18.622 [initandlisten] connection accepted from 172.17.0.2:34261 #18 (3 connections now open)\r\n",
"Mon Nov 10 11:33:23.354 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:33:35.357 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:33:47.355 [rsHealthPoll] replset info 172.17.0.4:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:33:48.648 [conn18] end connection 172.17.0.2:34261 (2 connections now open)\r\n",
"Mon Nov 10 11:33:48.648 [initandlisten] connection accepted from 172.17.0.2:34282 #19 (3 connections now open)\r\n",
"Mon Nov 10 11:33:59.357 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:34:05.357 [rsHealthPoll] couldn't connect to 172.17.0.4:27017: couldn't connect to server 172.17.0.4:27017\r\n",
"Mon Nov 10 11:34:06.360 [rsHealthPoll] replset info 172.17.0.4:27017 thinks that we are down\r\n",
"Mon Nov 10 11:34:06.360 [rsHealthPoll] replSet member 172.17.0.4:27017 is up\r\n",
"Mon Nov 10 11:34:06.360 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:34:07.484 [initandlisten] connection accepted from 172.17.0.4:41339 #20 (4 connections now open)\r\n",
"Mon Nov 10 11:34:07.485 [initandlisten] connection accepted from 172.17.0.4:41340 #21 (5 connections now open)\r\n",
"Mon Nov 10 11:34:07.485 [conn20] end connection 172.17.0.4:41339 (4 connections now open)\r\n",
"Mon Nov 10 11:34:18.669 [conn19] end connection 172.17.0.2:34282 (3 connections now open)\r\n",
"Mon Nov 10 11:34:18.669 [initandlisten] connection accepted from 172.17.0.2:34301 #22 (4 connections now open)\r\n",
"Mon Nov 10 11:34:35.508 [conn21] end connection 172.17.0.4:41340 (3 connections now open)\r\n",
"Mon Nov 10 11:34:35.509 [initandlisten] connection accepted from 172.17.0.4:41350 #23 (4 connections now open)\r\n",
"Mon Nov 10 11:34:40.458 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Mon Nov 10 11:34:40.458 [rsHealthPoll] replSet info 172.17.0.2:27017 is down (or slow to respond): \r\n",
"Mon Nov 10 11:34:40.458 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state DOWN\r\n",
"Mon Nov 10 11:34:40.460 [rsMgr] not electing self, 172.17.0.4:27017 would veto with '172.17.0.3:27017 is trying to elect itself but 172.17.0.2:27017 is already primary and more up-to-date'\r\n",
"Mon Nov 10 11:34:42.111 [conn23] replSet info voting yea for 172.17.0.4:27017 (2)\r\n",
"Mon Nov 10 11:34:47.189 [rsMgr] replSet not trying to elect self as responded yea to someone else recently\r\n",
"Mon Nov 10 11:34:48.461 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:34:52.952 [rsMgr] replSet not trying to elect self as responded yea to someone else recently\r\n",
"Mon Nov 10 11:34:58.978 [rsMgr] replSet not trying to elect self as responded yea to someone else recently\r\n",
"Mon Nov 10 11:34:59.283 [rsBackgroundSync] Socket recv() timeout 172.17.0.2:27017\r\n",
"Mon Nov 10 11:34:59.283 [rsBackgroundSync] SocketException: remote: 172.17.0.2:27017 error: 9001 socket exception [RECV_TIMEOUT] server [172.17.0.2:27017] \r\n",
"Mon Nov 10 11:34:59.284 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: 172.17.0.2:27017\r\n",
"Mon Nov 10 11:35:00.406 [rsHealthPoll] replSet member 172.17.0.4:27017 is now in state PRIMARY\r\n",
"Mon Nov 10 11:35:00.464 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:01.284 [rsBackgroundSync] replSet syncing to: 172.17.0.4:27017\r\n",
"Mon Nov 10 11:35:05.530 [conn23] end connection 172.17.0.4:41350 (3 connections now open)\r\n",
"Mon Nov 10 11:35:05.531 [initandlisten] connection accepted from 172.17.0.4:41374 #24 (4 connections now open)\r\n",
"Mon Nov 10 11:35:12.464 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:24.467 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:35.553 [conn24] end connection 172.17.0.4:41374 (3 connections now open)\r\n",
"Mon Nov 10 11:35:35.553 [initandlisten] connection accepted from 172.17.0.4:41395 #25 (5 connections now open)\r\n",
"Mon Nov 10 11:35:36.470 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:39.475 [rsHealthPoll] replset info 172.17.0.2:27017 thinks that we are down\r\n",
"Mon Nov 10 11:35:39.475 [rsHealthPoll] replSet member 172.17.0.2:27017 is up\r\n",
"Mon Nov 10 11:35:39.476 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:35:39.650 [initandlisten] connection accepted from 172.17.0.2:34355 #26 (5 connections now open)\r\n",
"Mon Nov 10 11:35:45.656 [conn26] end connection 172.17.0.2:34355 (4 connections now open)\r\n",
"Mon Nov 10 11:35:45.656 [initandlisten] connection accepted from 172.17.0.2:34359 #27 (5 connections now open)\r\n",
"Mon Nov 10 11:36:05.581 [initandlisten] connection accepted from 172.17.0.4:41408 #28 (6 connections now open)\r\n",
"Mon Nov 10 11:36:05.581 [conn25] end connection 172.17.0.4:41395 (5 connections now open)\r\n",
"Mon Nov 10 11:36:15.680 [conn27] end connection 172.17.0.2:34359 (4 connections now open)\r\n",
"Mon Nov 10 11:36:15.680 [initandlisten] connection accepted from 172.17.0.2:34365 #29 (5 connections now open)\r\n"
]
}
],
"prompt_number": 126
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How did things look from the perspective of rs3_2?\n",
"\n",
"* as with rs3_srv1, we have initialisation up to 11:30:04;\n",
"* at 11:32:34.449, spot the first connection error; this server is lost in the wilderness...\n",
"* at 11:34:07, come back up, not bothering to try to take on the mantle of PRIMARY;\n",
"* at 11:34:40, lose sight of rs3_srv0, the current PRIMARY;\n",
"* at 11:34:42.110, make a pitch to become PRIMARY;\n",
"* at 11:34:58.999, become PRIMARY;\n",
"* at 11:35:40.499, rs3_srv0 comes back, letting us know it thought we were down, and is accepted back in as SECONDARY."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail -500 {STUB}_srv2_log.txt"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"note: noprealloc may hurt performance in many applications\r\n",
"Mon Nov 10 11:29:12.064 [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db/ 64-bit host=7b694cdab9f8\r\n",
"Mon Nov 10 11:29:12.064 [initandlisten] db version v2.4.11\r\n",
"Mon Nov 10 11:29:12.064 [initandlisten] git version: fa13d1ee8da0f112f588570b4070f73d7af2f7fd\r\n",
"Mon Nov 10 11:29:12.064 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49\r\n",
"Mon Nov 10 11:29:12.064 [initandlisten] allocator: tcmalloc\r\n",
"Mon Nov 10 11:29:12.064 [initandlisten] options: { noprealloc: true, replSet: \"rs3\", smallfiles: true }\r\n",
"Mon Nov 10 11:29:12.070 [initandlisten] journal dir=/data/db/journal\r\n",
"Mon Nov 10 11:29:12.071 [initandlisten] recover : no journal files present, no recovery needed\r\n",
"Mon Nov 10 11:29:12.172 [FileAllocator] allocating new datafile /data/db/local.ns, filling with zeroes...\r\n",
"Mon Nov 10 11:29:12.172 [FileAllocator] creating directory /data/db/_tmp\r\n",
"Mon Nov 10 11:29:12.241 [FileAllocator] done allocating datafile /data/db/local.ns, size: 16MB, took 0.067 secs\r\n",
"Mon Nov 10 11:29:12.260 [FileAllocator] allocating new datafile /data/db/local.0, filling with zeroes...\r\n",
"Mon Nov 10 11:29:12.288 [FileAllocator] done allocating datafile /data/db/local.0, size: 16MB, took 0.027 secs\r\n",
"Mon Nov 10 11:29:12.288 [initandlisten] command local.$cmd command: { create: \"startup_log\", size: 10485760, capped: true } ntoreturn:1 keyUpdates:0 reslen:37 117ms\r\n",
"Mon Nov 10 11:29:12.289 [initandlisten] waiting for connections on port 27017\r\n",
"Mon Nov 10 11:29:12.290 [websvr] admin web console waiting for connections on port 28017\r\n",
"Mon Nov 10 11:29:12.293 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)\r\n",
"Mon Nov 10 11:29:12.293 [rsStart] replSet info you may need to run replSetInitiate -- rs.initiate() in the shell -- if that is not already done\r\n",
"Mon Nov 10 11:29:19.451 [initandlisten] connection accepted from 172.17.42.1:43299 #1 (1 connection now open)\r\n",
"Mon Nov 10 11:29:21.329 [initandlisten] connection accepted from 172.17.0.2:58014 #2 (2 connections now open)\r\n",
"Mon Nov 10 11:29:21.330 [conn2] end connection 172.17.0.2:58014 (1 connection now open)\r\n",
"Mon Nov 10 11:29:21.333 [initandlisten] connection accepted from 172.17.0.2:58016 #3 (2 connections now open)\r\n",
"Mon Nov 10 11:29:22.295 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)\r\n",
"Mon Nov 10 11:29:32.295 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)\r\n",
"Mon Nov 10 11:29:42.207 [initandlisten] connection accepted from 172.17.0.3:60156 #4 (3 connections now open)\r\n",
"Mon Nov 10 11:29:42.208 [conn4] end connection 172.17.0.3:60156 (2 connections now open)\r\n",
"Mon Nov 10 11:29:42.299 [rsStart] trying to contact 172.17.0.2:27017\r\n",
"Mon Nov 10 11:29:42.303 [rsStart] replSet I am 172.17.0.4:27017\r\n",
"Mon Nov 10 11:29:42.303 [rsStart] replSet got config version 1 from a remote, saving locally\r\n",
"Mon Nov 10 11:29:42.303 [rsStart] replSet info saving a newer config version to local.system.replset\r\n",
"Mon Nov 10 11:29:42.306 [rsStart] replSet saveConfigLocally done\r\n",
"Mon Nov 10 11:29:42.306 [rsStart] replSet STARTUP2\r\n",
"Mon Nov 10 11:29:42.307 [rsSync] ******\r\n",
"Mon Nov 10 11:29:42.307 [rsSync] creating replication oplog of size: 1253MB...\r\n",
"Mon Nov 10 11:29:42.308 [FileAllocator] allocating new datafile /data/db/local.1, filling with zeroes...\r\n",
"Mon Nov 10 11:29:42.309 [rsHealthPoll] replset info 172.17.0.3:27017 thinks that we are down\r\n",
"Mon Nov 10 11:29:42.309 [rsHealthPoll] replSet member 172.17.0.3:27017 is up\r\n",
"Mon Nov 10 11:29:42.309 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state STARTUP2\r\n",
"Mon Nov 10 11:29:43.945 [FileAllocator] done allocating datafile /data/db/local.1, size: 511MB, took 1.629 secs\r\n",
"Mon Nov 10 11:29:43.968 [FileAllocator] allocating new datafile /data/db/local.2, filling with zeroes...\r\n",
"Mon Nov 10 11:29:44.209 [initandlisten] connection accepted from 172.17.0.3:60161 #5 (3 connections now open)\r\n",
"Mon Nov 10 11:29:44.307 [rsHealthPoll] replSet member 172.17.0.2:27017 is up\r\n",
"Mon Nov 10 11:29:44.307 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:29:45.507 [FileAllocator] done allocating datafile /data/db/local.2, size: 511MB, took 1.537 secs\r\n",
"Mon Nov 10 11:29:45.528 [FileAllocator] allocating new datafile /data/db/local.3, filling with zeroes...\r\n",
"Mon Nov 10 11:29:46.854 [FileAllocator] done allocating datafile /data/db/local.3, size: 511MB, took 1.325 secs\r\n",
"Mon Nov 10 11:29:46.866 [rsSync] ******\r\n",
"Mon Nov 10 11:29:46.866 [rsSync] replSet initial sync pending\r\n",
"Mon Nov 10 11:29:46.867 [rsSync] replSet initial sync need a member to be primary or secondary to do our initial sync\r\n",
"Mon Nov 10 11:29:48.471 [conn3] replSet info voting yea for 172.17.0.2:27017 (0)\r\n",
"Mon Nov 10 11:29:50.324 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state PRIMARY\r\n",
"Mon Nov 10 11:30:02.478 [conn3] end connection 172.17.0.2:58016 (2 connections now open)\r\n",
"Mon Nov 10 11:30:02.479 [initandlisten] connection accepted from 172.17.0.2:58028 #6 (3 connections now open)\r\n",
"Mon Nov 10 11:30:02.870 [rsSync] replSet initial sync pending\r\n",
"Mon Nov 10 11:30:02.870 [rsSync] replSet syncing to: 172.17.0.2:27017\r\n",
"Mon Nov 10 11:30:02.875 [rsSync] build index local.me { _id: 1 }\r\n",
"Mon Nov 10 11:30:02.881 [rsSync] build index done. scanned 0 total records. 0.005 secs\r\n",
"Mon Nov 10 11:30:02.883 [rsSync] build index local.replset.minvalid { _id: 1 }\r\n",
"Mon Nov 10 11:30:02.884 [rsSync] build index done. scanned 0 total records. 0 secs\r\n",
"Mon Nov 10 11:30:02.884 [rsSync] replSet initial sync drop all databases\r\n",
"Mon Nov 10 11:30:02.884 [rsSync] dropAllDatabasesExceptLocal 1\r\n",
"Mon Nov 10 11:30:02.884 [rsSync] replSet initial sync clone all databases\r\n",
"Mon Nov 10 11:30:02.887 [rsSync] replSet initial sync data copy, starting syncup\r\n",
"Mon Nov 10 11:30:02.887 [rsSync] oplog sync 1 of 3\r\n",
"Mon Nov 10 11:30:02.888 [rsSync] oplog sync 2 of 3\r\n",
"Mon Nov 10 11:30:02.888 [rsSync] replSet initial sync building indexes\r\n",
"Mon Nov 10 11:30:02.888 [rsSync] oplog sync 3 of 3\r\n",
"Mon Nov 10 11:30:02.888 [rsSync] replSet initial sync finishing up\r\n",
"Mon Nov 10 11:30:02.891 [rsSync] replSet set minValid=5460a194:1\r\n",
"Mon Nov 10 11:30:02.892 [rsSync] replSet RECOVERING\r\n",
"Mon Nov 10 11:30:02.892 [rsSync] replSet initial sync done\r\n",
"Mon Nov 10 11:30:03.319 [rsBackgroundSync] replSet syncing to: 172.17.0.2:27017\r\n",
"Mon Nov 10 11:30:03.893 [rsSyncNotifier] replset setting oplog notifier to 172.17.0.2:27017\r\n",
"Mon Nov 10 11:30:04.342 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state RECOVERING\r\n",
"Mon Nov 10 11:30:04.894 [rsSync] replSet SECONDARY\r\n",
"Mon Nov 10 11:30:06.344 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:30:12.248 [conn5] end connection 172.17.0.3:60161 (2 connections now open)\r\n",
"Mon Nov 10 11:30:12.249 [initandlisten] connection accepted from 172.17.0.3:60173 #7 (3 connections now open)\r\n",
"Mon Nov 10 11:30:32.505 [conn6] end connection 172.17.0.2:58028 (2 connections now open)\r\n",
"Mon Nov 10 11:30:32.506 [initandlisten] connection accepted from 172.17.0.2:58040 #8 (3 connections now open)\r\n",
"Mon Nov 10 11:30:42.271 [conn7] end connection 172.17.0.3:60173 (2 connections now open)\r\n",
"Mon Nov 10 11:30:42.272 [initandlisten] connection accepted from 172.17.0.3:60179 #9 (3 connections now open)\r\n",
"Mon Nov 10 11:31:02.526 [conn8] end connection 172.17.0.2:58040 (2 connections now open)\r\n",
"Mon Nov 10 11:31:02.527 [initandlisten] connection accepted from 172.17.0.2:58046 #10 (4 connections now open)\r\n",
"Mon Nov 10 11:31:12.290 [conn9] end connection 172.17.0.3:60179 (2 connections now open)\r\n",
"Mon Nov 10 11:31:12.291 [initandlisten] connection accepted from 172.17.0.3:60185 #11 (3 connections now open)\r\n",
"Mon Nov 10 11:31:32.555 [conn10] end connection 172.17.0.2:58046 (2 connections now open)\r\n",
"Mon Nov 10 11:31:32.556 [initandlisten] connection accepted from 172.17.0.2:58052 #12 (3 connections now open)\r\n",
"Mon Nov 10 11:31:42.311 [conn11] end connection 172.17.0.3:60185 (2 connections now open)\r\n",
"Mon Nov 10 11:31:42.312 [initandlisten] connection accepted from 172.17.0.3:60191 #13 (3 connections now open)\r\n",
"Mon Nov 10 11:32:02.574 [conn12] end connection 172.17.0.2:58052 (2 connections now open)\r\n",
"Mon Nov 10 11:32:02.575 [initandlisten] connection accepted from 172.17.0.2:58058 #14 (3 connections now open)\r\n",
"Mon Nov 10 11:32:12.333 [conn13] end connection 172.17.0.3:60191 (2 connections now open)\r\n",
"Mon Nov 10 11:32:12.334 [initandlisten] connection accepted from 172.17.0.3:60197 #15 (3 connections now open)\r\n",
"Mon Nov 10 11:32:34.449 [rsHealthPoll] couldn't connect to 172.17.0.2:27017: couldn't connect to server 172.17.0.2:27017\r\n",
"Mon Nov 10 11:32:38.446 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Mon Nov 10 11:32:38.446 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:32:39.446 [rsHealthPoll] replSet info 172.17.0.3:27017 is down (or slow to respond): \r\n",
"Mon Nov 10 11:32:39.447 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state DOWN\r\n",
"Mon Nov 10 11:32:40.450 [rsHealthPoll] couldn't connect to 172.17.0.2:27017: couldn't connect to server 172.17.0.2:27017\r\n",
"Mon Nov 10 11:32:46.452 [rsHealthPoll] replSet info 172.17.0.2:27017 is down (or slow to respond): \r\n",
"Mon Nov 10 11:32:46.452 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state DOWN\r\n",
"Mon Nov 10 11:32:46.452 [rsMgr] replSet can't see a majority, will not try to elect self\r\n",
"Mon Nov 10 11:32:47.450 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:32:53.857 [rsBackgroundSync] Socket recv() timeout 172.17.0.2:27017\r\n",
"Mon Nov 10 11:32:53.857 [rsBackgroundSync] SocketException: remote: 172.17.0.2:27017 error: 9001 socket exception [RECV_TIMEOUT] server [172.17.0.2:27017] \r\n",
"Mon Nov 10 11:32:53.858 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: 172.17.0.2:27017\r\n",
"Mon Nov 10 11:32:54.453 [rsHealthPoll] couldn't connect to 172.17.0.2:27017: couldn't connect to server 172.17.0.2:27017\r\n",
"Mon Nov 10 11:32:59.459 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:33:00.454 [rsHealthPoll] couldn't connect to 172.17.0.2:27017: couldn't connect to server 172.17.0.2:27017\r\n",
"Mon Nov 10 11:33:11.466 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:33:14.459 [rsHealthPoll] couldn't connect to 172.17.0.2:27017: couldn't connect to server 172.17.0.2:27017\r\n",
"Mon Nov 10 11:33:20.459 [rsHealthPoll] couldn't connect to 172.17.0.2:27017: couldn't connect to server 172.17.0.2:27017\r\n",
"Mon Nov 10 11:33:23.468 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:33:34.463 [rsHealthPoll] couldn't connect to 172.17.0.2:27017: couldn't connect to server 172.17.0.2:27017\r\n",
"Mon Nov 10 11:33:35.470 [rsHealthPoll] replset info 172.17.0.3:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:33:40.464 [rsHealthPoll] couldn't connect to 172.17.0.2:27017: couldn't connect to server 172.17.0.2:27017\r\n",
"Mon Nov 10 11:33:47.478 [rsHealthPoll] couldn't connect to 172.17.0.3:27017: couldn't connect to server 172.17.0.3:27017\r\n",
"Mon Nov 10 11:33:53.479 [rsHealthPoll] couldn't connect to 172.17.0.3:27017: couldn't connect to server 172.17.0.3:27017\r\n",
"Mon Nov 10 11:33:54.467 [rsHealthPoll] couldn't connect to 172.17.0.2:27017: couldn't connect to server 172.17.0.2:27017\r\n",
"Mon Nov 10 11:34:00.470 [rsHealthPoll] couldn't connect to 172.17.0.2:27017: couldn't connect to server 172.17.0.2:27017\r\n",
"Mon Nov 10 11:34:06.359 [initandlisten] connection accepted from 172.17.0.3:60265 #16 (4 connections now open)\r\n",
"Mon Nov 10 11:34:07.483 [rsHealthPoll] couldn't connect to 172.17.0.3:27017: couldn't connect to server 172.17.0.3:27017\r\n",
"Mon Nov 10 11:34:07.486 [rsHealthPoll] replSet member 172.17.0.3:27017 is up\r\n",
"Mon Nov 10 11:34:07.486 [rsHealthPoll] replSet member 172.17.0.3:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:34:07.487 [rsMgr] not electing self, 172.17.0.3:27017 would veto with '172.17.0.4:27017 is trying to elect itself but 172.17.0.2:27017 is already primary and more up-to-date'\r\n",
"Mon Nov 10 11:34:07.612 [initandlisten] connection accepted from 172.17.0.2:58127 #17 (5 connections now open)\r\n",
"Mon Nov 10 11:34:08.361 [conn16] end connection 172.17.0.3:60265 (4 connections now open)\r\n",
"Mon Nov 10 11:34:08.361 [initandlisten] connection accepted from 172.17.0.3:60268 #18 (6 connections now open)\r\n",
"Mon Nov 10 11:34:08.475 [rsHealthPoll] replSet member 172.17.0.2:27017 is up\r\n",
"Mon Nov 10 11:34:08.475 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state PRIMARY\r\n",
"Mon Nov 10 11:34:08.917 [rsBackgroundSync] replSet syncing to: 172.17.0.2:27017\r\n",
"Mon Nov 10 11:34:09.613 [conn17] end connection 172.17.0.2:58127 (4 connections now open)\r\n",
"Mon Nov 10 11:34:09.614 [initandlisten] connection accepted from 172.17.0.2:58135 #19 (5 connections now open)\r\n",
"Mon Nov 10 11:34:36.385 [conn18] end connection 172.17.0.3:60268 (4 connections now open)\r\n",
"Mon Nov 10 11:34:36.386 [initandlisten] connection accepted from 172.17.0.3:60278 #20 (5 connections now open)\r\n",
"Mon Nov 10 11:34:40.490 [rsHealthPoll] DBClientCursor::init call() failed\r\n",
"Mon Nov 10 11:34:40.490 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:34:41.491 [rsHealthPoll] replSet info 172.17.0.2:27017 is down (or slow to respond): \r\n",
"Mon Nov 10 11:34:41.492 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state DOWN\r\n",
"Mon Nov 10 11:34:42.110 [rsMgr] replSet info electSelf 2\r\n",
"Mon Nov 10 11:34:49.494 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:34:58.998 [rsBackgroundSync] Socket recv() timeout 172.17.0.2:27017\r\n",
"Mon Nov 10 11:34:58.998 [rsBackgroundSync] SocketException: remote: 172.17.0.2:27017 error: 9001 socket exception [RECV_TIMEOUT] server [172.17.0.2:27017] \r\n",
"Mon Nov 10 11:34:58.998 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: 172.17.0.2:27017\r\n",
"Mon Nov 10 11:34:58.999 [rsMgr] replSet PRIMARY\r\n",
"Mon Nov 10 11:35:01.285 [initandlisten] connection accepted from 172.17.0.3:60297 #21 (6 connections now open)\r\n",
"Mon Nov 10 11:35:01.496 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:06.410 [conn20] end connection 172.17.0.3:60278 (5 connections now open)\r\n",
"Mon Nov 10 11:35:06.411 [initandlisten] connection accepted from 172.17.0.3:60302 #22 (6 connections now open)\r\n",
"Mon Nov 10 11:35:13.495 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:25.496 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:36.434 [conn22] end connection 172.17.0.3:60302 (5 connections now open)\r\n",
"Mon Nov 10 11:35:36.435 [initandlisten] connection accepted from 172.17.0.3:60323 #23 (6 connections now open)\r\n",
"Mon Nov 10 11:35:37.496 [rsHealthPoll] replset info 172.17.0.2:27017 heartbeat failed, retrying\r\n",
"Mon Nov 10 11:35:40.499 [rsHealthPoll] replset info 172.17.0.2:27017 thinks that we are down\r\n",
"Mon Nov 10 11:35:40.499 [rsHealthPoll] replSet member 172.17.0.2:27017 is up\r\n",
"Mon Nov 10 11:35:40.499 [rsHealthPoll] replSet member 172.17.0.2:27017 is now in state SECONDARY\r\n",
"Mon Nov 10 11:35:40.656 [initandlisten] connection accepted from 172.17.0.2:58189 #24 (7 connections now open)\r\n",
"Mon Nov 10 11:35:40.656 [initandlisten] connection accepted from 172.17.0.2:58191 #25 (8 connections now open)\r\n",
"Mon Nov 10 11:35:40.656 [conn24] end connection 172.17.0.2:58189 (6 connections now open)\r\n",
"Mon Nov 10 11:35:40.693 [initandlisten] connection accepted from 172.17.0.2:58192 #26 (8 connections now open)\r\n",
"Mon Nov 10 11:35:41.538 [initandlisten] connection accepted from 172.17.0.2:58193 #27 (9 connections now open)\r\n",
"Mon Nov 10 11:35:42.549 [slaveTracking] build index local.slaves { _id: 1 }\r\n",
"Mon Nov 10 11:35:42.552 [slaveTracking] build index done. scanned 0 total records. 0.002 secs\r\n",
"Mon Nov 10 11:36:04.674 [conn25] end connection 172.17.0.2:58191 (8 connections now open)\r\n",
"Mon Nov 10 11:36:04.675 [initandlisten] connection accepted from 172.17.0.2:58197 #28 (9 connections now open)\r\n",
"Mon Nov 10 11:36:06.462 [conn23] end connection 172.17.0.3:60323 (8 connections now open)\r\n",
"Mon Nov 10 11:36:06.463 [initandlisten] connection accepted from 172.17.0.3:60336 #29 (9 connections now open)\r\n"
]
}
],
"prompt_number": 127
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Tidy up\n",
"#Remove the blockade rules\n",
"clear_iptables('test1w2s2')\n",
"#Shut down the containers\n",
"tidyAwayContainers(c,['rs3_srv0','rs3_srv1','rs3_srv2'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 128
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Writing to Replica Sets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Need examples of writing to different servers in the replica set under different partion conditions to show what happens to the replication in each case? Eg in matters of eventual consistency, etc?"
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"PS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I think there's a lot to be said for giving students a transcript such as this showing a worked example. Having got their eye in, we can perhaps get them to run slightly different scenarios, eg a disconnect between rs3_srv0 and rs3_srv2, with connections still up between rs3_srv0 and rs3_srv1, and rs3_srv1 and rs3_srv2. I'm not sure if we can do one-way breaks? (eg so rs3_srv1 can see rs3_srv0 but rs3_srv1 can't see rs3_srv0?)\n",
"\n",
"Also, students could run scenarios of inserting data into different machines under different network conditions?\n",
"\n",
"```\n",
"testclient= MongoReplicaSetClient('{0}:{1}'.format(getContainIPaddress(c,STUB+'_srv0'),27017), replicaSet=STUB)\n",
"testdb=testclient.testdb\n",
"testcollection=testdb.testcollection\n",
"testcollection.insert({'name':'test1'})\n",
"testcollection.find()\n",
"```"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment