kshailen/Stopping one etcd machine in 3 node etcd cluster

## Stopping one etcd machine in 3 node etcd cluster
1.Test procedure
A- Take a cluster of 3 etcd nodes, as shown below
	core@core-02 ~ $ etcdctl cluster-health
	member 348dd9a63bc9c9d3 is healthy: got healthy result from http://172.17.8.102:2379
	member 7d26e3d2ee11a98e is healthy: got healthy result from http://172.17.8.103:2379
	member 95d2e7af71fc961d is healthy: got healthy result from http://172.17.8.101:2379
	cluster is healthy
	core@core-02 ~ $ etcdctl member list
	348dd9a63bc9c9d3: name=219d42232433483c8ad19163ba1c6020 peerURLs=http://172.17.8.102:2380 clientURLs=http://172.17.8.102:2379 isLeader=true
	7d26e3d2ee11a98e: name=edbfd19500b0496485e286801bdfa04b peerURLs=http://172.17.8.103:2380 clientURLs=http://172.17.8.103:2379 isLeader=false
	95d2e7af71fc961d: name=5b74677278ea4b6ca5dcc43262d2b0e5 peerURLs=http://172.17.8.101:2380 clientURLs=http://172.17.8.101:2379 isLeader=false
B- Put a key/value pair through a non-leader member of the etcd cluster, as shown below
	core@core-02 ~ $ curl -X PUT http://172.17.8.101:2379/v2/keys/message -d value="Hello"
	{"action":"set","node":{"key":"/message","value":"Hello","modifiedIndex":1879285,"createdIndex":1879285}}
	core@core-02 ~ $
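Step B relies on knowing which members are not the leader. A minimal sketch of picking them out of the `etcdctl member list` output from step A follows. `non_leader_urls` is a hypothetical helper name, and the parsing assumes the `clientURLs=... isLeader=...` line layout printed above.

```shell
# non_leader_urls: hypothetical helper, not part of etcdctl.
# Reads `etcdctl member list` output on stdin and prints the client URL
# of every member whose line contains isLeader=false.
non_leader_urls() {
  grep 'isLeader=false' | sed -n 's/.*clientURLs=\([^ ]*\).*/\1/p'
}

# Usage against a live cluster (assumes etcdctl is on PATH):
#   etcdctl member list | non_leader_urls
```

Any URL it prints can be used as the target of the `curl -X PUT` in this step.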

C- Shut down 172.17.8.101 and check the cluster health.

	core@core-02 ~ $ ping 172.17.8.101
	PING 172.17.8.101 (172.17.8.101) 56(84) bytes of data.
	^C
	--- 172.17.8.101 ping statistics ---
	3 packets transmitted, 0 received, 100% packet loss, time 2032ms

	core@core-02 ~ $ etcdctl member list
	348dd9a63bc9c9d3: name=219d42232433483c8ad19163ba1c6020 peerURLs=http://172.17.8.102:2380 clientURLs=http://172.17.8.102:2379 isLeader=true
	7d26e3d2ee11a98e: name=edbfd19500b0496485e286801bdfa04b peerURLs=http://172.17.8.103:2380 clientURLs=http://172.17.8.103:2379 isLeader=false
	95d2e7af71fc961d: name=5b74677278ea4b6ca5dcc43262d2b0e5 peerURLs=http://172.17.8.101:2380 clientURLs=http://172.17.8.101:2379 isLeader=false
	core@core-02 ~ $ etcdctl cluster-health
	member 348dd9a63bc9c9d3 is healthy: got healthy result from http://172.17.8.102:2379
	member 7d26e3d2ee11a98e is healthy: got healthy result from http://172.17.8.103:2379
	failed to check the health of member 95d2e7af71fc961d on http://172.17.8.101:2379: Get http://172.17.8.101:2379/health: dial tcp 172.17.8.101:2379: i/o timeout
	member 95d2e7af71fc961d is unreachable: [http://172.17.8.101:2379] are all unreachable
	cluster is healthy
	core@core-02 ~ $
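To compare runs without reading the whole transcript, the `etcdctl cluster-health` output can be condensed to three fields. This is only a sketch: `summarize_health` is a hypothetical helper, and it assumes the line wording shown above ("member ... is healthy", "member ... is unreachable", and a final verdict line).

```shell
# summarize_health: hypothetical helper, not part of etcdctl.
# Reads `etcdctl cluster-health` output on stdin and prints
# "<healthy count> <unreachable count> <verdict>", where the verdict
# is the last non-empty line of the output.
summarize_health() {
  awk '
    /^[ \t]*member .* is healthy/     { healthy++ }
    /^[ \t]*member .* is unreachable/ { unreachable++ }
    NF                                { last = $0 }
    END {
      sub(/^[ \t]+/, "", last)
      printf "%d %d %s\n", healthy, unreachable, last
    }'
}

# Usage against a live cluster (assumes etcdctl is on PATH):
#   etcdctl cluster-health | summarize_health
```

For the output in step C this would report two healthy members, one unreachable member, and the verdict line.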


2.Pass criteria:

We should be able to read the key/value pair message/Hello while one member of a 3-node cluster is down, and the health check should still report the cluster as healthy.

Fault Tolerance Table:
It is recommended to have an odd number of members in a cluster. Growing an even-sized cluster by one member to the next odd size doesn't change the number needed for a majority, but it raises the failure tolerance by one. Conversely, adding one member to an odd-sized cluster raises the majority without improving the tolerance. You can see this by comparing even- and odd-sized clusters:

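| CLUSTER SIZE | MAJORITY | FAILURE TOLERANCE |
|--------------|----------|-------------------|
| 1            | 1        | 0                 |
| 2            | 2        | 0                 |
| 3            | 2        | 1                 |
| 4            | 3        | 1                 |
| 5            | 3        | 2                 |
| 6            | 4        | 2                 |
| 7            | 4        | 3                 |
| 8            | 5        | 3                 |
| 9            | 5        | 4                 |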
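The table follows from simple arithmetic: a majority is more than half of the members, and the failure tolerance is whatever is left over. A small sketch that reproduces the rows is shown here; the function names `majority` and `failure_tolerance` are illustrative only.

```shell
# majority: smallest number of members that is more than half of N.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# failure_tolerance: members that can be lost while a majority remains.
failure_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print one row per cluster size: size, majority, failure tolerance.
for n in 1 2 3 4 5 6 7 8 9; do
  printf '%s\t%s\t%s\n' "$n" "$(majority "$n")" "$(failure_tolerance "$n")"
done
```

For the 3-node cluster tested here, the majority is 2 and the tolerance is 1, which is why stopping a single machine should leave the cluster healthy.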

3.Result

	core@core-02 ~ $ curl -X GET http://172.17.8.102:2379/v2/keys/message
	{"action":"get","node":{"key":"/message","value":"Hello","modifiedIndex":1879285,"createdIndex":1879285}}
	core@core-02 ~ $


	core@core-02 ~ $ curl -L http://127.0.0.1:2379/health
	{"health": "true"}core@core-02 ~ $
	core@core-02 ~ $
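The two result checks above can also be evaluated mechanically. The following is a sketch under stated assumptions: `key_value` and `is_health_true` are hypothetical helpers, and the `sed` pattern assumes a flat string value without escaped quotes, as in the response shown above (a real JSON parser would be more robust).

```shell
# key_value: hypothetical helper.
# Reads an etcd v2 keys API JSON response on stdin and prints the
# node's "value" field.
key_value() {
  sed -n 's/.*"value":"\([^"]*\)".*/\1/p'
}

# is_health_true: hypothetical helper.
# Succeeds when a /health response on stdin reports "health": "true".
is_health_true() {
  grep -q '"health"[[:space:]]*:[[:space:]]*"true"'
}

# Usage against a live cluster, with the endpoints used in this test:
#   [ "$(curl -s http://172.17.8.102:2379/v2/keys/message | key_value)" = "Hello" ] && echo "key OK"
#   curl -sL http://127.0.0.1:2379/health | is_health_true && echo "health OK"
```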


4.Recovery steps if the test fails
If the test fails, bring the stopped VM back up from the hypervisor and restart the etcd service on that VM.
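After the VM is back and etcd has been restarted, it can take a moment before the member answers again. A generic retry loop is one way to wait for that. This is a sketch only: `wait_until` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary assumptions.

```shell
# wait_until: hypothetical helper.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between
# tries. Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage after restarting etcd on the recovered VM:
#   wait_until 30 2 curl -fs http://172.17.8.101:2379/health
```

Once the loop succeeds, `etcdctl cluster-health` should again list all three members as healthy.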