- Upgrade the masters. This is done by incrementing the
  google_container_cluster.min_master_version field to the desired version.
  Masters will update one by one.
- Create new node pools using the new version.
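A minimal sketch of the master-version bump, assuming a google_container_cluster resource named "primary" (the resource name, cluster name, and version string are all placeholders):

```hcl
resource "google_container_cluster" "primary" {
  name               = "my-cluster"     # placeholder cluster name
  min_master_version = "1.14.10-gke.27" # bump this to the target version

  # ... rest of the existing cluster configuration unchanged ...
}
```

Applying this change triggers the rolling master upgrade; the node pools are handled separately in the following steps.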
- Taint the old nodes. Pods will get scheduled onto the new nodes.
$ kubectl get nodes --no-headers -l cloud.google.com/gke-nodepool=<node-pool-name> | awk '{print $1}' | xargs -I {} kubectl taint nodes {} legacy=true:NoExecute
- Leave the old node pools around for a couple of weeks but scale them to 0. They'll be available if we need to go back.
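Scaling the old pool down to 0 might look like the following with gcloud; the cluster, pool, and zone names are placeholders, and the commands are echoed as a dry run (drop the leading echo to execute):

```shell
# Placeholder names; adjust to your environment.
CLUSTER=my-cluster
POOL=old-pool
ZONE=us-central1-a

# Keep the pool defined but with zero nodes, so it can be scaled back up later.
echo gcloud container clusters resize "$CLUSTER" --node-pool "$POOL" --num-nodes 0 --zone "$ZONE" --quiet
```

Because the pool still exists in the cluster (and in Terraform state, if managed there), rolling back is just a resize, not a recreate.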
To roll back, increase the node count on the old, legacy-versioned node pool and then taint the new node pools. The pods will get scheduled back onto the old nodes.
Prod workloads should be on their own node pools anyway so they're isolated. If they're on their own node pool, you can roll them back independently of everything else.
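A sketch of that rollback, with hypothetical cluster and pool names; commands are echoed as a dry run (drop the leading echo to execute):

```shell
# Hypothetical names; adjust to your environment.
CLUSTER=my-cluster
ZONE=us-central1-a

# Bring the legacy pool back up to its previous size.
echo gcloud container clusters resize "$CLUSTER" --node-pool old-pool --num-nodes 3 --zone "$ZONE" --quiet

# Evict pods from the new pool so they land back on the old nodes.
echo kubectl taint nodes -l cloud.google.com/gke-nodepool=new-pool rollback=true:NoExecute
```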
Compared to replacing the whole cluster, this approach means:
- no redeploying applications into a new cluster
- no updating DNS records
- the cluster name does not change every time an upgrade is performed
For a gentler migration, use a NoSchedule
taint instead and then drain each node. Draining obeys pod disruption budgets, whereas a NoExecute taint evicts pods immediately.
$ kubectl get nodes --no-headers -l cloud.google.com/gke-nodepool=<node-pool-name> | awk '{print $1}' | xargs -I {} kubectl taint nodes {} legacy=true:NoSchedule
$ kubectl get nodes --no-headers -l cloud.google.com/gke-nodepool=<node-pool-name> | awk '{print $1}' | xargs -I {} kubectl drain {} --ignore-daemonsets --delete-local-data --force
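The pipelines above depend on xargs -I {} substituting each node name into the command. A quick local check of the pattern, with echo standing in for kubectl and a fake node listing in place of the API call:

```shell
# Simulated `kubectl get nodes --no-headers` output: node name in column 1.
printf 'gke-node-a  Ready\ngke-node-b  Ready\n' \
  | awk '{print $1}' \
  | xargs -I {} echo kubectl drain {} --ignore-daemonsets
# prints:
#   kubectl drain gke-node-a --ignore-daemonsets
#   kubectl drain gke-node-b --ignore-daemonsets
```

Note that -I takes the replacement token as its argument; without the {} immediately after -I, xargs would treat the next word of the command as the token and silently mangle the invocation.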