- Upgrade the masters. This is done by incrementing the
  google_container_cluster.min_master_version field to the desired version.
  Masters will update one by one.
- Create new node pools using the new version.
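A minimal sketch of the master-version bump, assuming a google_container_cluster resource named "primary" (the resource name, cluster name, and version string are all placeholders):

```hcl
resource "google_container_cluster" "primary" {
  name               = "my-cluster"     # placeholder cluster name
  min_master_version = "1.14.10-gke.27" # bump this to the target version

  # ... rest of the existing cluster configuration unchanged ...
}
```

Applying this change triggers the rolling master upgrade; the node pools are handled separately in the following steps.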
- Taint the old nodes. Pods will get scheduled onto the new nodes.
$ kubectl get nodes --no-headers -l cloud.google.com/gke-nodepool=<node-pool-name> | awk '{print $1}' | xargs -I {} kubectl taint nodes {} legacy=true:NoExecute
- Leave the old node pools around for a couple of weeks but scale them to 0. They'll be available if we need to go back.
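Scaling the old pool down to 0 might look like the following with gcloud; the cluster, pool, and zone names are placeholders, and the commands are echoed as a dry run (drop the leading echo to execute):

```shell
# Placeholder names; adjust to your environment.
CLUSTER=my-cluster
POOL=old-pool
ZONE=us-central1-a

# Keep the pool defined but with zero nodes, so it can be scaled back up later.
echo gcloud container clusters resize "$CLUSTER" --node-pool "$POOL" --num-nodes 0 --zone "$ZONE" --quiet
```

Because the pool still exists in the cluster (and in Terraform state, if managed there), rolling back is just a resize, not a recreate.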
To roll back, increase the node count on the old, legacy-versioned node pool and then taint the new node pools. The pods will get scheduled back onto the old nodes.
Prod workloads should be on their own node pools anyway so they're isolated. If they're on their own node pool, you can roll them back independently of everything else.
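A sketch of that rollback, with hypothetical cluster and pool names; commands are echoed as a dry run (drop the leading echo to execute):

```shell
# Hypothetical names; adjust to your environment.
CLUSTER=my-cluster
ZONE=us-central1-a

# Bring the legacy pool back up to its previous size.
echo gcloud container clusters resize "$CLUSTER" --node-pool old-pool --num-nodes 3 --zone "$ZONE" --quiet

# Evict pods from the new pool so they land back on the old nodes.
echo kubectl taint nodes -l cloud.google.com/gke-nodepool=new-pool rollback=true:NoExecute
```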
Compared to replacing the whole cluster, this approach means:
- no redeploying applications into a new cluster
- no updating DNS records
- the cluster name does not change every time an upgrade is performed
For a gentler migration, use a NoSchedule
taint instead and then drain each node. Draining obeys pod disruption budgets, whereas a NoExecute taint evicts pods immediately.
$ kubectl get nodes --no-headers -l cloud.google.com/gke-nodepool=<node-pool-name> | awk '{print $1}' | xargs -I {} kubectl taint nodes {} legacy=true:NoSchedule
$ kubectl get nodes --no-headers -l cloud.google.com/gke-nodepool=<node-pool-name> | awk '{print $1}' | xargs -I {} kubectl drain {} --ignore-daemonsets --delete-local-data --force
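The pipelines above depend on xargs -I {} substituting each node name into the command. A quick local check of the pattern, with echo standing in for kubectl and a fake node listing in place of the API call:

```shell
# Simulated `kubectl get nodes --no-headers` output: node name in column 1.
printf 'gke-node-a  Ready\ngke-node-b  Ready\n' \
  | awk '{print $1}' \
  | xargs -I {} echo kubectl drain {} --ignore-daemonsets
# prints:
#   kubectl drain gke-node-a --ignore-daemonsets
#   kubectl drain gke-node-b --ignore-daemonsets
```

Note that -I takes the replacement token as its argument; without the {} immediately after -I, xargs would treat the next word of the command as the token and silently mangle the invocation.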