@bk201
Last active November 2, 2022 10:12
Restart a failing upgrade

When an upgrade fails while Rancher is reconciling the machines, starting the upgrade over will "resume" from the last failing node. Here is another way to deal with this:

Move the cluster state from Provisioning to Provisioned first

  1. First check the cluster status; it should be "Provisioning". If it is not, just start a new upgrade.

    kubectl get clusters.cluster.x-k8s.io local -n fleet-local
    
    NAME    PHASE          AGE   VERSION
    local   Provisioning   50d
    
  2. (Not needed if the target version is v1.1.0) Edit the cluster and remove the pre-drain and post-drain hooks.

    kubectl edit clusters.provisioning.cattle.io local -n fleet-local
    

    and change spec.rkeConfig to {}. Don't use any other value; otherwise, it might tear down the whole cluster.

      rkeConfig: {}
    
  3. Depending on the failing stage, we need to acknowledge to Rancher that the hook is done. These helper scripts could be useful:

    • Check the drain status: run ./drain-status.sh. Example output for each stage:

      pre-draining:

      node1 (custom-25da412d7a59)
        rke-pre-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
        harvester-pre-hook null
        rke-post-drain: null
        harvester-post-hook: null
      

      draining:

      node1 (custom-25da412d7a59)
        rke-pre-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
        harvester-pre-hook {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
        rke-post-drain: null
        harvester-post-hook: null
      

      post-draining:

      node1 (custom-25da412d7a59)
        rke-pre-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
        harvester-pre-hook {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
        rke-post-drain: {"IgnoreErrors":false,"deleteEmptyDirData":true,"disableEviction":false,"enabled":true,"force":true,"gracePeriod":0,"ignoreDaemonSets":true,"postDrainHooks":[{"annotation":"harvesterhci.io/post-hook"}],"preDrainHooks":[{"annotation":"harvesterhci.io/pre-hook"}],"skipWaitForDeleteTimeoutSeconds":0,"timeout":0}
        harvester-post-hook: null
      
    • If the pre-draining job fails: run ./pre_drain.sh <node_name>.

    • If the post-draining job fails: run ./post_drain.sh <node_name>.

  4. Wait until the cluster phase turns to "Provisioned". If it does not, check the Rancher logs to see if it's waiting for something.

    $ kubectl get clusters.cluster.x-k8s.io local -n fleet-local
    NAME    PHASE         AGE   VERSION
    local   Provisioned   50d
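
For step 3, the three sample outputs differ only in which entries are still null. A minimal shell sketch of that reading follows. The function names `flag` and `drain_stage` are hypothetical and not part of the scripts in this gist, and the mapping is inferred only from the sample outputs, so any other combination is reported as "unknown". The four arguments are the values printed by ./drain-status.sh for rke-pre-drain, harvester-pre-hook, rke-post-drain and harvester-post-hook, in that order.

```shell
#!/bin/sh
# Sketch only: classify the drain stage of one node from the four values
# printed by ./drain-status.sh. Names here are illustrative assumptions.

# Print 1 if the value is set (non-empty and not the literal "null"), else 0.
flag() {
    if [ -n "$1" ] && [ "$1" != "null" ]; then
        printf 1
    else
        printf 0
    fi
}

# drain_stage RKE_PRE_DRAIN HARVESTER_PRE_HOOK RKE_POST_DRAIN HARVESTER_POST_HOOK
drain_stage() {
    pattern="$(flag "$1")$(flag "$2")$(flag "$3")$(flag "$4")"
    case "$pattern" in
        1000) echo "pre-draining" ;;   # only rke-pre-drain is set
        1100) echo "draining" ;;       # the pre hook has been acknowledged
        1110) echo "post-draining" ;;  # waiting for the post hook
        *)    echo "unknown" ;;        # combination not shown in the samples
    esac
}
```

For example, `drain_stage '{"enabled":true}' null null null` prints `pre-draining`, which suggests ./pre_drain.sh <node_name> is the script to consider for that node.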
    
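The wait in step 4 could be scripted instead of re-running the command by hand. The following is a sketch under assumptions: it reads the phase from `.status.phase` of the same Cluster API object via kubectl's JSONPath output, and the function names, attempt count and sleep interval are illustrative choices, not something this gist prescribes.

```shell
#!/bin/sh
# Sketch only: poll the phase of the "local" cluster until it is "Provisioned".
# Assumes kubectl is already configured for the cluster.

# Print the current phase, e.g. Provisioning or Provisioned.
cluster_phase() {
    kubectl get clusters.cluster.x-k8s.io local -n fleet-local \
        -o jsonpath='{.status.phase}'
}

# wait_for_provisioned [MAX_ATTEMPTS] [SLEEP_SECONDS]
# Returns 0 once the phase is Provisioned, 1 if the attempts run out.
wait_for_provisioned() {
    max_attempts=${1:-60}
    interval=${2:-10}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        phase=$(cluster_phase)
        echo "attempt $attempt/$max_attempts: phase=${phase:-unknown}"
        if [ "$phase" = "Provisioned" ]; then
            return 0
        fi
        sleep "$interval"
        attempt=$((attempt + 1))
    done
    return 1
}
```

Calling `wait_for_provisioned 60 10` would poll for roughly ten minutes; if it returns non-zero, the Rancher logs are the next place to look.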

Start the upgrade again

See https://docs.harvesterhci.io/v1.1/upgrade/troubleshooting#start-over-an-upgrade for how to start over an upgrade.
