aaronc/proposal.md

## Motivation

Our current upgrade module uses the gov module to trigger a planned upgrade. This has a few potential downsides:

- If a hot-fix release is needed, the network has to wait for the full voting period.
- If validators need to postpone an upgrade after the governance vote because of issues found in testing, they have no way to do so.
- There is no built-in way to abort an upgrade if the upgrade handler fails.

## Proposal

A signalling mechanism has been discussed in the past but was never formally specified. Here is a proposed approach.

### Signals

Validators, rather than governance, "signal" upgrades, and the state machine responds based on signalling thresholds:

```go
import sdk "github.com/cosmos/cosmos-sdk/types"

type MsgSignalUpgrade struct {
	// the self-delegation address of the validator
	Validator sdk.AccAddress
	// set to 0 if the upgrade should happen as soon as the quorum is reached, or to a future height for a planned upgrade
	UpgradeHeight uint64
	// the name of the upgrade; the new binary must have a handler with this name to apply migrations
	UpgradeName string
	// set this to a value greater than 2/3 to indicate that this upgrade requires a higher quorum.
	// The rationale is that we usually want 80 or 90%+ of the network ready for a smooth upgrade;
	// for security hot-fixes we may need to keep the lower 2/3 threshold.
	QuorumNeeded sdk.Dec
}
```

An upgrade is triggered at `UpgradeHeight` only if validators holding at least a `QuorumNeeded` fraction of voting power have signalled that they will upgrade. Validators can withdraw their signal as late as a few blocks before the height to postpone or abort the upgrade.

### Aborting

Aborting would be handled by a proposed `--abort-upgrade <upgrade-name>` command-line flag on the daemon. If set, this flag would cause the current binary (which lacks the upgrade handler) to stop expecting an upgrade handler and continue processing blocks. It would likely only be used when the upgrade handler itself hits an error. Currently, the only fix in that case is to release a new binary with either bug fixes or a no-op handler. The abort flag would instead alter the state machine behavior of the current binary to allow a smooth abort in error cases, such as what happened with cosmos-hub-3.
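A minimal sketch of how the current binary might consult the proposed `--abort-upgrade` flag when a triggered plan comes due. The flag does not exist yet; `plan`, `decide` and the three `action` values are hypothetical, and the sketch only models the decision, not actually halting the node or clearing the stored plan.

```go
package main

// action is what the running binary does at the start of a block.
type action int

const (
	continueBlocks action = iota // keep processing blocks normally
	runHandler                   // apply migrations with the registered handler
	haltForUpgrade               // stop and wait for an operator to swap binaries
)

// plan is an upgrade that has been triggered by the signalling threshold.
type plan struct {
	Name   string
	Height int64
}

// decide picks the action for the block at `height`. `handlers` lists the
// upgrade names this binary can apply, and `abortUpgrade` is the value of
// the hypothetical --abort-upgrade flag ("" when unset).
func decide(p *plan, height int64, handlers map[string]bool, abortUpgrade string) action {
	// No plan, or the plan is not due yet: business as usual.
	if p == nil || height < p.Height {
		return continueBlocks
	}
	// The new binary knows this upgrade and runs its migrations.
	if handlers[p.Name] {
		return runHandler
	}
	// The old binary was restarted with --abort-upgrade <name>: skip the
	// upgrade and keep processing blocks instead of halting.
	if abortUpgrade != "" && abortUpgrade == p.Name {
		return continueBlocks
	}
	// Otherwise the old binary has no handler, so it halts and waits for
	// the upgraded binary.
	return haltForUpgrade
}
```

Matching the flag against the plan name means an operator cannot accidentally skip a different upgrade than the one whose handler failed.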