Attempt to recreate the problem @justinsb is having with a 2s interval
Create cluster
kops create cluster --zones us-east-1c --name rolling-update.aws.k8spro.com --yes
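(This assumes a kops state store is already configured, for example via an environment variable with a hypothetical bucket name:)
export KOPS_STATE_STORE=s3://example-kops-state-store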
Validate the cluster
$ kops validate cluster
Using cluster from kubectl context: rolling-update.aws.k8spro.com
Validating cluster rolling-update.aws.k8spro.com
INSTANCE GROUPS
NAME               ROLE    MACHINETYPE  MIN  MAX  SUBNETS
master-us-east-1c  Master  m3.medium    1    1    us-east-1c
nodes              Node    t2.medium    2    2    us-east-1c
NODE STATUS
NAME                           ROLE    READY
ip-172-20-36-149.ec2.internal  node    True
ip-172-20-51-88.ec2.internal   node    True
ip-172-20-52-25.ec2.internal   master  True
Your cluster rolling-update.aws.k8spro.com is ready
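(Equivalently, node readiness can be checked directly with kubectl:)
$ kubectl get nodes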
Upgrade the cluster
$ kops upgrade cluster --channel alpha --yes
Using cluster from kubectl context: rolling-update.aws.k8spro.com
ITEM     PROPERTY           OLD     NEW
Cluster  Channel            stable  alpha
Cluster  KubernetesVersion  1.7.2   1.7.4
Updates applied to configuration.
You can now apply these changes, using `kops update cluster rolling-update.aws.k8spro.com`
Update the cluster
kops update cluster rolling-update.aws.k8spro.com --yes
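(To preview the pending changes first, the same command can be run without --yes, mirroring the rolling-update dry run below:)
kops update cluster rolling-update.aws.k8spro.com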
See how the rolling update will work
$ kops rolling-update cluster --node-interval 2s --master-interval 2s
Using cluster from kubectl context: rolling-update.aws.k8spro.com
NAME               STATUS       NEEDUPDATE  READY  MIN  MAX  NODES
master-us-east-1c  NeedsUpdate  1           0      1    1    1
nodes              NeedsUpdate  2           0      2    2    2
Must specify --yes to rolling-update.
Roll the cluster
$ kops rolling-update cluster --node-interval 2s --master-interval 2s --yes
Pods are drained and evicted from the master
$ kubectl -n kube-system get po
NAME                                                  READY  STATUS   RESTARTS  AGE
dns-controller-2912642664-4w8d4                       0/1    Pending  0         1m
etcd-server-events-ip-172-20-52-25.ec2.internal       1/1    Running  0         6m
etcd-server-ip-172-20-52-25.ec2.internal              1/1    Running  0         5m
kube-apiserver-ip-172-20-52-25.ec2.internal           1/1    Running  1         7m
kube-controller-manager-ip-172-20-52-25.ec2.internal  1/1    Running  1         5m
kube-dns-479524115-5mj9m                              3/3    Running  0         4m
kube-dns-479524115-ktc5t                              3/3    Running  0         6m
kube-dns-autoscaler-1818915203-rx2l6                  1/1    Running  0         6m
kube-proxy-ip-172-20-36-149.ec2.internal              1/1    Running  0         4m
kube-proxy-ip-172-20-51-88.ec2.internal               1/1    Running  0         5m
kube-proxy-ip-172-20-52-25.ec2.internal               1/1    Running  0         7m
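(To follow the drain and the master replacement in real time, the same listing can be watched:)
$ kubectl -n kube-system get po -w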
The new master starts, but the rolling update times out as it should:
$ kops rolling-update cluster --node-interval 2s --master-interval 2s --yes
Using cluster from kubectl context: rolling-update.aws.k8spro.com
NAME               STATUS       NEEDUPDATE  READY  MIN  MAX  NODES
master-us-east-1c  NeedsUpdate  1           0      1    1    1
nodes              NeedsUpdate  2           0      2    2    2
I0904 19:56:58.331619 56145 instancegroups.go:269] Draining the node: "ip-172-20-52-25.ec2.internal".
node "ip-172-20-52-25.ec2.internal" cordoned
node "ip-172-20-52-25.ec2.internal" already cordoned
WARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: etcd-server-events-ip-172-20-52-25.ec2.internal, etcd-server-ip-172-20-52-25.ec2.internal, kube-apiserver-ip-172-20-52-25.ec2.internal, kube-controller-manager-ip-172-20-52-25.ec2.internal, kube-proxy-ip-172-20-52-25.ec2.internal, kube-scheduler-ip-172-20-52-25.ec2.internal
pod "dns-controller-2912642664-vqs6z" evicted
node "ip-172-20-52-25.ec2.internal" drained
I0904 19:58:30.653936 56145 instancegroups.go:350] Stopping instance "i-0cdd9ee4181767dc9", node "ip-172-20-52-25.ec2.internal", in AWS ASG "master-us-east-1c.masters.rolling-update.aws.k8spro.com".
I0904 19:58:33.124892 56145 instancegroups.go:298] Validating the cluster.
I0904 19:58:33.586672 56145 instancegroups.go:325] Cluster validated.
I0904 19:58:34.422472 56145 instancegroups.go:269] Draining the node: "ip-172-20-36-149.ec2.internal".
node "ip-172-20-36-149.ec2.internal" cordoned
node "ip-172-20-36-149.ec2.internal" already cordoned
WARNING: Deleting pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet: kube-proxy-ip-172-20-36-149.ec2.internal
pod "kube-dns-autoscaler-1818915203-rx2l6" evicted
pod "kube-dns-479524115-5mj9m" evicted
node "ip-172-20-36-149.ec2.internal" drained
I0904 20:00:05.379275 56145 instancegroups.go:350] Stopping instance "i-0513281d4d9babedc", node "ip-172-20-36-149.ec2.internal", in AWS ASG "nodes.rolling-update.aws.k8spro.com".
I0904 20:00:07.848531 56145 instancegroups.go:298] Validating the cluster.
I0904 20:00:17.859751 56145 instancegroups.go:322] Cluster did not validate, and waiting longer: cannot get nodes for "rolling-update.aws.k8spro.com": Get https://api.rolling-update.aws.k8spro.com/api/v1/nodes: dial tcp 34.203.223.161:443: getsockopt: operation timed out.
I0904 20:00:28.460135 56145 instancegroups.go:322] Cluster did not validate, and waiting longer: cannot get nodes for "rolling-update.aws.k8spro.com": Get https://api.rolling-update.aws.k8spro.com/api/v1/nodes: dial tcp 34.203.223.161:443: getsockopt: operation timed out.
I0904 20:00:39.008876 56145 instancegroups.go:322] Cluster did not validate, and waiting longer: cannot get nodes for "rolling-update.aws.k8spro.com": Get https://api.rolling-update.aws.k8spro.com/api/v1/nodes: dial tcp 34.203.223.161:443: getsockopt: operation timed out.
I0904 20:00:49.580757 56145 instancegroups.go:322] Cluster did not validate, and waiting longer: cannot get nodes for "rolling-update.aws.k8spro.com": Get https://api.rolling-update.aws.k8spro.com/api/v1/nodes: dial tcp 34.203.223.161:443: getsockopt: operation timed out.
I0904 20:01:00.203131 56145 instancegroups.go:322] Cluster did not validate, and waiting longer: cannot get nodes for "rolling-update.aws.k8spro.com": Get https://api.rolling-update.aws.k8spro.com/api/v1/nodes: dial tcp 34.203.223.161:443: getsockopt: operation timed out.
I0904 20:01:10.806363 56145 instancegroups.go:322] Cluster did not validate, and waiting longer: cannot get nodes for "rolling-update.aws.k8spro.com": Get https://api.rolling-update.aws.k8spro.com/api/v1/nodes: dial tcp 34.203.223.161:443: getsockopt: operation timed out.
I0904 20:01:21.839388 56145 instancegroups.go:322] Cluster did not validate, and waiting longer: cannot get nodes for "rolling-update.aws.k8spro.com": Get https://api.rolling-update.aws.k8spro.com/api/v1/nodes: dial tcp 34.203.223.161:443: getsockopt: operation timed out.
I0904 20:01:32.442137 56145 instancegroups.go:322] Cluster did not validate, and waiting longer: cannot get nodes for "rolling-update.aws.k8spro.com": Get https://api.rolling-update.aws.k8spro.com/api/v1/nodes: dial tcp 34.203.223.161:443: getsockopt: operation timed out.
I0904 20:01:43.099803 56145 instancegroups.go:322] Cluster did not validate, and waiting longer: cannot get nodes for "rolling-update.aws.k8spro.com": Get https://api.rolling-update.aws.k8spro.com/api/v1/nodes: dial tcp 34.203.223.161:443: getsockopt: operation timed out.
error validating cluster after removing a node: cluster validation failed: cannot get nodes for "rolling-update.aws.k8spro.com": Get https://api.rolling-update.aws.k8spro.com/api/v1/nodes: dial tcp 34.203.223.161:443: getsockopt: operation timed out
As mentioned in the CLI help:
--validate-retries int The number of times that a node will be validated. Between validation kops sleeps the master-interval/2 or node-interval/2 duration. (default 8)
And we actually performed nine validations, not eight (presumably the initial validation plus the eight retries). This is the expected behavior.
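For reference, with a 2s interval kops should sleep 2s/2 = 1s between validations; the ~10-11s gaps between the "did not validate" lines above presumably come from the ~10s dial timeout on each failed API call plus that 1s sleep. If a wider validation window is wanted while keeping short intervals, the retry count quoted above can be raised; a sketch, not a tested fix:
$ kops rolling-update cluster --node-interval 2s --master-interval 2s --validate-retries 20 --yes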