The upgrade fails while Rancher reconciles each machine; starting the upgrade over will "resume" at the last failing node. Here is another way to deal with this:
-
First, check the cluster status; it should be "Provisioning". If it is not, just start a new upgrade.
kubectl get clusters.cluster.x-k8s.io local -n fleet-local
NAME    PHASE          AGE   VERSION
local   Provisioning   50d
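The status check above can be turned into a small decision helper. This is only a sketch: `next_step` is a hypothetical function name, and reading the phase through `-o jsonpath='{.status.phase}'` assumes the PHASE column is backed by the `status.phase` field of the cluster object.

```shell
#!/bin/sh
# Hypothetical helper: map the reported cluster phase to the next action
# described in this procedure.
next_step() {
  case "$1" in
    Provisioning) echo "continue with the manual recovery steps" ;;
    *)            echo "start a new upgrade" ;;
  esac
}

# Usage against a live cluster (requires kubectl access; the jsonpath
# expression is an assumption about where the phase is stored):
#   phase=$(kubectl get clusters.cluster.x-k8s.io local -n fleet-local \
#             -o jsonpath='{.status.phase}')
#   next_step "$phase"
```

The function takes the phase as an argument so that it can be tried out without touching the cluster.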
-
(Not needed if the target version is v1.1.0.) Edit the cluster and remove the pre-drain and post-drain hooks:
kubectl edit clusters.provisioning.cattle.io local -n fleet-local
Change spec.rkeConfig to {}. Don't use other values; otherwise, it might tear down the whole cluster.
rkeConfig: {}
-
Depending on the failing stage, we need to acknowledge to Rancher that the hook is done. These helper scripts could be useful:
-
Check drain status: issue ./drain-status.sh.
pre-draining:
node1 (custom-25da412d7a59)
  rke-pre-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  harvester-pre-hook null
  rke-post-drain: null
  harvester-post-hook: null
draining:
node1 (custom-25da412d7a59)
  rke-pre-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  harvester-pre-hook {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  rke-post-drain: null
  harvester-post-hook: null
post-draining:
node1 (custom-25da412d7a59)
  rke-pre-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  harvester-pre-hook {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  rke-post-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
  harvester-post-hook: null
-
Pre-draining job fails: issue ./pre_drain.sh <node_name>.
-
Post-draining job fails: issue ./post_drain.sh <node_name>.
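To make the choice between the two scripts concrete, the sketch below classifies a node's stage from the four values that drain-status.sh prints. `drain_stage` is a hypothetical helper, and the rule it applies is inferred only from the sample outputs (a hook counts as pending while its value is `null`); the real acknowledgement logic in Rancher may compare the annotation values more strictly.

```shell
#!/bin/sh
# Hypothetical helper, inferred from the sample drain-status.sh output.
# Arguments: the values of rke-pre-drain, harvester-pre-hook,
# rke-post-drain and harvester-post-hook, each either a JSON string
# or the literal word "null".
drain_stage() {
  rke_pre=$1
  hv_pre=$2
  rke_post=$3
  hv_post=$4
  if [ "$rke_pre" != null ] && [ "$hv_pre" = null ]; then
    echo "pre-draining"    # pre-drain hook not yet acknowledged
  elif [ "$hv_pre" != null ] && [ "$rke_post" = null ]; then
    echo "draining"        # pre-drain acknowledged, drain in progress
  elif [ "$rke_post" != null ] && [ "$hv_post" = null ]; then
    echo "post-draining"   # post-drain hook not yet acknowledged
  else
    echo "unknown"         # none of the sampled patterns matched
  fi
}
```

A result of `pre-draining` corresponds to the case handled by ./pre_drain.sh, and `post-draining` to the case handled by ./post_drain.sh.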
-
Wait until the cluster phase becomes "Provisioned". If it does not, check the Rancher logs to see whether it is waiting for something.
kubectl get clusters.cluster.x-k8s.io local -n fleet-local
NAME    PHASE         AGE   VERSION
local   Provisioned   50d
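Waiting for the phase to change can be scripted as a simple polling loop. The following is a minimal sketch, not part of the Harvester tooling: `wait_for_phase` is a hypothetical function, the attempt count and delay are arbitrary, and the jsonpath expression in the usage comment assumes the phase is exposed as `status.phase`.

```shell
#!/bin/sh
# Hypothetical helper: poll a command until it prints the wanted phase.
# Usage: wait_for_phase WANTED ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_phase() {
  wanted=$1
  attempts=$2
  delay=$3
  shift 3
  phase=""
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Run the supplied command; treat a failure as "no phase yet".
    phase=$("$@") || phase=""
    if [ "$phase" = "$wanted" ]; then
      echo "cluster is $wanted"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out; last phase: $phase" >&2
  return 1
}

# Usage against a live cluster (requires kubectl access):
#   wait_for_phase Provisioned 60 10 \
#     kubectl get clusters.cluster.x-k8s.io local -n fleet-local \
#     -o jsonpath='{.status.phase}'
```

If the loop times out, the Rancher logs remain the place to look for what the cluster is waiting on.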
For how to start over an upgrade, see https://docs.harvesterhci.io/v1.1/upgrade/troubleshooting#start-over-an-upgrade