@andy108369
Last active November 17, 2023 11:34

Build akash 0.26.2 with the rollback function to recover from "AppHash" / "error on replay: wrong Block.Header.LastResultsHash" errors

This enables the rollback function with --delete-pending-block (Delete the pending block in tendermint block store if exists) in akash 0.26.2 (cosmos-sdk v0.45.16, tendermint v0.34.27)

Useful for recovering from "AppHash" / "error on replay: wrong Block.Header.LastResultsHash" errors without having to restore from a backup again. Especially useful for heavy archival nodes.

Previous version for akash 0.16.4 (cosmos-sdk v0.45.4, tendermint v0.34.19) https://gist.github.com/andy108369/44f5a676935286e0115431015ef66e1c

pre-built binary

If you don't want to compile it yourself, you can get the pre-built binary for akash v0.26.2 here: https://transfer.sh/lgfqoMrg4o/akash-0.26.2-rollback2.tar.gz

  • sha256sum:
d21afcff0bdfed1f333dae599bb52a2d37736263ab6a9041e74ab29fa43101be  akash-0.26.2-rollback2
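
Since the binary comes from a file-sharing link, it is worth checking it against the checksum above before running it. A minimal sketch, assuming the archive unpacks to a file named `akash-0.26.2-rollback2` in the current directory; the helper name `verify_sha256` is made up for illustration:

```shell
# verify_sha256 FILE EXPECTED_HEX  (hypothetical helper, not part of akash)
# Returns 0 if the SHA-256 of FILE matches EXPECTED_HEX, non-zero otherwise.
verify_sha256() {
  file="$1"
  expected="$2"
  # "sha256sum -c" expects "<hash>  <filename>" lines (two spaces); "-" reads them from stdin.
  printf '%s  %s\n' "$expected" "$file" | sha256sum -c - >/dev/null 2>&1
}

# Usage (the file name is an assumption about what the archive contains):
# verify_sha256 akash-0.26.2-rollback2 \
#   d21afcff0bdfed1f333dae599bb52a2d37736263ab6a9041e74ab29fa43101be \
#   && echo "checksum OK" || echo "checksum MISMATCH"
```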

Patching

1. cometbft (formerly tendermint)

Based on https://github.com/tendermint/tendermint/pull/8574/commits/81344ac9464db02421fa115041f162ab9ace9372

git clone https://github.com/akash-network/cometbft.git
cd cometbft
git checkout tags/v0.34.27-akash -b v0.34.27-akash-w-upstream-pr-8574
git remote add upstream https://github.com/tendermint/tendermint.git
git fetch upstream pull/8574/head:pr8574
git cherry-pick 81344ac9464db02421fa115041f162ab9ace9372
git rm CHANGELOG_PENDING.md
git cherry-pick --continue --no-edit
git tag v0.34.27-akash-w-upstream-pr-8574
COMETBFT_PATH="$(realpath .)"
cd ..

2. cosmos-sdk

Based on https://github.com/cosmos/cosmos-sdk/pull/11982/commits/614906d69177db9555ea16ebb3e895b2f4617ff3

git clone https://github.com/cosmos/cosmos-sdk.git
cd cosmos-sdk
git checkout tags/v0.45.16 -b v0.45.16-w-upstream-pr-11982
git fetch origin pull/11982/head:pr11982

git cherry-pick --strategy=recursive -X theirs 614906d69177db9555ea16ebb3e895b2f4617ff3
git show HEAD~1:go.sum > go.sum
git show HEAD~1:go.mod > go.mod

go mod edit -replace github.com/tendermint/tendermint="$COMETBFT_PATH"
git add go.mod go.sum

for i in baseapp/baseapp.go server/types/app.go store/iavl/tree.go store/rootmulti/rollback_test.go store/types/store.go; do
  git show HEAD~1:"$i" > "$i"
  git add "$i"
done


git apply << 'EOF'
diff --git a/server/rollback.go b/server/rollback.go
index db865661a..db9f2fb1f 100644
--- a/server/rollback.go
+++ b/server/rollback.go
@@ -5,11 +5,11 @@ import (
 	"fmt"
 
 	"github.com/spf13/cobra"
-	dbm "github.com/tendermint/tm-db"
+	dbm "github.com/cometbft/cometbft-db"
 
 	"github.com/cosmos/cosmos-sdk/client/flags"
 	"github.com/cosmos/cosmos-sdk/server/types"
-	tmcmd "github.com/tendermint/tendermint/cmd/tendermint/commands"
+	tmcmd "github.com/tendermint/tendermint/cmd/cometbft/commands"
 	cfg "github.com/tendermint/tendermint/config"
 	"github.com/tendermint/tendermint/state"
 	"github.com/tendermint/tendermint/store"
@@ -86,7 +86,9 @@ func loadStateAndBlockStore(config *cfg.Config) (*store.BlockStore, state.Store,
 	if err != nil {
 		return nil, nil, err
 	}
-	stateStore := state.NewStore(stateDB)
+	stateStore := state.NewStore(stateDB, state.StoreOptions{
+		DiscardABCIResponses: config.Storage.DiscardABCIResponses,
+	})
 
 	return blockStore, stateStore, nil
 }
EOF

git diff

git add -u .
git commit --amend --no-edit
git tag v0.45.16-w-upstream-pr-11982
COSMOS_SDK_PATH="$(realpath .)"
cd ..

3. node (akash node)

Build akash node v0.26.2 with the previously patched cometbft & cosmos-sdk.
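
The commands in this step rely on `COMETBFT_PATH` and `COSMOS_SDK_PATH` still being set from steps 1 and 2, which will not be the case in a fresh shell. A small guard, sketched here with a hypothetical helper `require_dir_var`, can catch that before `go mod edit` is run:

```shell
# require_dir_var NAME  (hypothetical helper)
# Succeeds only if the variable called NAME is set and points to an existing directory.
require_dir_var() {
  name="$1"
  # Indirect lookup of the variable whose name is stored in $name.
  eval "value=\${$name:-}"
  if [ -z "$value" ] || [ ! -d "$value" ]; then
    echo "error: $name is unset or not a directory" >&2
    return 1
  fi
}

# Usage, before the "go mod edit -replace" commands:
# require_dir_var COMETBFT_PATH && require_dir_var COSMOS_SDK_PATH || echo "re-run steps 1 and 2"
```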

git clone https://github.com/akash-network/node.git
cd node

git checkout -b tag-v0.26.2 v0.26.2
go mod edit -replace github.com/tendermint/tendermint="$COMETBFT_PATH"
go mod edit -replace github.com/cosmos/cosmos-sdk="$COSMOS_SDK_PATH"

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.5/install.sh | bash
curl -sfL https://direnv.net/install.sh | bash
source ~/.bashrc 
nvm install v20

nvm use v20
eval "$(direnv hook bash)"
direnv allow

go install 'golang.org/dl/go1.21.0@latest'
go1.21.0 download
export GOTOOLCHAIN=go1.21.0
export GOTOOLCHAIN_SEMVER=v1.21.0

$ go version
go version go1.21.0 linux/amd64

GO_LINKMODE="internal" CGO_ENABLED=0 make

.cache/bin/akash version
.cache/bin/akash version --long | grep -E 'go version|cosmos-sdk@|tendermint@'


# Example outputs

$ .cache/bin/akash version
0.26.2
$ .cache/bin/akash version --long | grep -E 'go version|cosmos-sdk@|tendermint@'
go: go version go1.21.0 linux/amd64
- github.com/cosmos/cosmos-sdk@v0.45.16 => /tmp/1/cosmos-sdk@(devel)
- github.com/tendermint/tendermint@v0.34.27 => /tmp/1/cometbft@(devel)

$ .cache/bin/akash rollback --help | grep pending
      --delete-pending-block   Delete the pending block in tendermint block store if exists

Usage

Rollback the chain

Note: this section was copied from the akash v0.16.4 version of this guide; it is valid for v0.26.2 as well.

Note: I hit the AppHash error again after rolling back only once. It worked after I rolled back twice, down to height 6955212.

IMPORTANT: If you run the akash tool in a different context (e.g. outside the container), make sure the pruning settings in your ~/.akash/config/app.toml file are the same as the ones the node normally runs with. A mismatch will corrupt the IAVL DB, since changing the pruning strategy is not supported.
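
One way to compare the pruning settings between two contexts is to print the pruning keys from each `app.toml` and compare them. This is only a sketch: the helper names are made up, and it assumes the standard cosmos-sdk v0.45 keys (`pruning`, `pruning-keep-recent`, `pruning-keep-every`, `pruning-interval`) are set at the top level of the file:

```shell
# show_pruning APP_TOML  (hypothetical helper)
# Prints the pruning-related keys of an app.toml, sorted for stable comparison.
show_pruning() {
  grep -E '^pruning(-keep-recent|-keep-every|-interval)?[[:space:]]*=' "$1" | sort
}

# same_pruning A_TOML B_TOML  (hypothetical helper)
# Succeeds if both files carry identical pruning lines.
same_pruning() {
  [ "$(show_pruning "$1")" = "$(show_pruning "$2")" ]
}

# Usage (the second path is a placeholder for the config used inside the container):
# show_pruning ~/.akash/config/app.toml
# same_pruning ~/.akash/config/app.toml /path/to/container/app.toml || echo "pruning settings differ"
```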

A single rollback takes about 30-40 minutes for the full chain (~670 GiB). No extra disk space is required.

# ./akash rollback --delete-pending-block
rollback pending block 6955215
Rolled back state to height 6955213 and hash D7BF7C12B7212B3E9FEBB538A839330711C81189CB72EF460216F6FF12985424

# ./akash rollback --delete-pending-block
rollback pending block 6955214
Rolled back state to height 6955212 and hash 6A815F05C7521E2ABB7DB6F85F11D185EF5DD1311C01D31D19B04596D40C089E
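
Because more than one rollback may be needed, the two invocations above can be wrapped in a small loop. A sketch only, assuming the node process is not running while it executes; `rollback_n` is a hypothetical helper and the count of 2 simply mirrors the example above:

```shell
# rollback_n AKASH_BIN COUNT  (hypothetical helper)
# Runs "AKASH_BIN rollback --delete-pending-block" COUNT times, stopping at the first failure.
rollback_n() {
  bin="$1"
  count="$2"
  i=0
  while [ "$i" -lt "$count" ]; do
    "$bin" rollback --delete-pending-block || return 1
    i=$((i + 1))
  done
}

# Usage:
# rollback_n ./akash 2
```

Stopping at the first failure is deliberate: continuing to roll back after an error would make it harder to tell which height the state was left at.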

Extra

upstream

I think these patches should eventually be made upstream: in cosmos-sdk (already done), cometbft, and the akash node.

Validators and nodes would then no longer need to rely on restoring from snapshots in the event of an AppHash error, which may occur during chain upgrades (mostly with non-governance software upgrades, i.e. binary swaps).

PR for cometbft => cometbft/cometbft#1610

@andy108369

I have tested this procedure against Polkachu's snapshot at height 13632948, made with akash 0.26.2.
