bk201/restart-job-fail-upgrade.md Secret

## restart-job-fail-upgrade.md

      
When an upgrade fails while Rancher is reconciling each machine, starting the upgrade over will "resume" at the last failing node. Here is another way to deal with this:
### Move the cluster state from Provisioning to Provisioned first


First, check the cluster status. It should be "Provisioning". If it is not, just start a new upgrade.
$ kubectl get clusters.cluster.x-k8s.io local -n fleet-local

NAME    PHASE          AGE   VERSION
local   Provisioning   50d
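
The same check can be scripted. The following is a minimal sketch, not part of the original procedure: the function names `cluster_phase` and `next_step_for_phase` are hypothetical, and it assumes the PHASE column shown above is read from `.status.phase` of the Cluster API object.

```shell
#!/bin/sh
# Hypothetical helper: print the phase of the "local" cluster.
# Assumes kubectl is configured to talk to the Harvester cluster.
cluster_phase() {
  kubectl get clusters.cluster.x-k8s.io local -n fleet-local \
    -o jsonpath='{.status.phase}'
}

# Hypothetical helper: map a phase to the action described above.
# "Provisioning" means the recovery steps apply; anything else
# means a new upgrade can simply be started.
next_step_for_phase() {
  case "$1" in
    Provisioning) echo "recover" ;;
    *)            echo "start-new-upgrade" ;;
  esac
}
```

Calling `next_step_for_phase "$(cluster_phase)"` prints `recover` when the remaining steps are needed.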


(Not needed if the target version is v1.1.0.) Edit the cluster and remove the pre-drain and post-drain hooks:
$ kubectl edit clusters.provisioning.cattle.io local -n fleet-local

Change spec.rkeConfig to {}. Do not use any other value; otherwise, it might tear down the whole cluster.
  rkeConfig: {}


Depending on the failing stage, we need to acknowledge to Rancher that the hook is done. These helper scripts can be useful:


Check the drain status by running ./drain-status.sh. Example output for each stage:
pre-draining:
node1 (custom-25da412d7a59)
  rke-pre-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  harvester-pre-hook null
  rke-post-drain: null
  harvester-post-hook: null

draining:
node1 (custom-25da412d7a59)
  rke-pre-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  harvester-pre-hook {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  rke-post-drain: null
  harvester-post-hook: null

post-draining:
node1 (custom-25da412d7a59)
  rke-pre-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  harvester-pre-hook {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  rke-post-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  harvester-post-hook: null
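
The three sample outputs differ only in which of the four values are still `null`. As a rough sketch of how to read them, the hypothetical function `drain_stage` below takes the four values in the order printed above and names the stage. It only mirrors the sample states shown here; the `not-draining` and `hooks-acknowledged` labels are assumptions for the cases the samples do not cover.

```shell
#!/bin/sh
# Hypothetical helper: classify the drain stage of one node.
# Arguments, in the order drain-status.sh prints them:
#   $1 rke-pre-drain  $2 harvester-pre-hook
#   $3 rke-post-drain $4 harvester-post-hook
# An unset value is passed as the literal string "null".
drain_stage() {
  rke_pre=$1
  hv_pre=$2
  rke_post=$3
  hv_post=$4

  if [ "$rke_pre" = "null" ]; then
    echo "not-draining"        # assumption: no drain requested yet
  elif [ "$hv_pre" = "null" ]; then
    echo "pre-draining"        # pre-hook not acknowledged yet
  elif [ "$rke_post" = "null" ]; then
    echo "draining"            # pre-hook done, post-drain not requested
  elif [ "$hv_post" = "null" ]; then
    echo "post-draining"       # post-hook not acknowledged yet
  else
    echo "hooks-acknowledged"  # assumption: all four values are set
  fi
}
```

For example, `drain_stage '{"enabled":true}' null null null` prints `pre-draining`.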


If the pre-draining job fails, run ./pre_drain.sh <node_name>.


If the post-draining job fails, run ./post_drain.sh <node_name>.


Wait until the cluster phase changes to "Provisioned". If it does not, check the Rancher logs to see whether it is waiting for something.
$ kubectl get clusters.cluster.x-k8s.io local -n fleet-local
NAME    PHASE         AGE   VERSION
local   Provisioned   50d
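
Instead of re-running the command by hand, the wait can be expressed as a small polling loop. This is a sketch under assumptions: `wait_for_phase` is a hypothetical name, the retry count and delay are arbitrary, and the phase is supplied by whatever command is passed in.

```shell
#!/bin/sh
# Hypothetical helper: poll until a command prints the wanted phase.
# Usage: wait_for_phase <phase> <tries> <delay-seconds> <command...>
# Returns 0 once the command output equals <phase>,
# 1 if it never does within <tries> attempts.
wait_for_phase() {
  want=$1
  tries=$2
  delay=$3
  shift 3

  i=0
  while [ "$i" -lt "$tries" ]; do
    # Treat a failing command as "phase unknown" and keep polling.
    phase=$("$@" 2>/dev/null) || phase=""
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

One possible invocation, assuming the phase is exposed at `.status.phase`: `wait_for_phase Provisioned 60 10 kubectl get clusters.cluster.x-k8s.io local -n fleet-local -o jsonpath='{.status.phase}'`. If it returns non-zero, the Rancher logs are the next place to look.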


### Start the upgrade again

See https://docs.harvesterhci.io/v1.1/upgrade/troubleshooting#start-over-an-upgrade for the steps.