Following the instructions in kuutamod/run.md, "Running a localnet", the setup consists of:

- hivemind, which runs:
  - consul as the RAFT consensus layer
  - 3 separate near localnet nodes to start the network
- the validator kuutamod instance, with metrics available at `curl localhost:2233/metrics`:
```
screen -S validator ./target/debug/kuutamod --neard-home .data/near/localnet/kuutamod0/ \
  --voter-node-key .data/near/localnet/kuutamod0/voter_node_key.json \
  --validator-node-key .data/near/localnet/node3/node_key.json \
  --validator-key .data/near/localnet/node3/validator_key.json \
  --near-boot-nodes $(jq -r .public_key < .data/near/localnet/node0/node_key.json)@127.0.0.1:33301
```
- the failover kuutamod instance, with metrics available at `curl localhost:2234/metrics`:
```
screen -S failover ./target/debug/kuutamod \
  --exporter-address 127.0.0.1:2234 \
  --validator-network-addr 0.0.0.0:24569 \
  --voter-network-addr 0.0.0.0:24570 \
  --neard-home .data/near/localnet/kuutamod1/ \
  --voter-node-key .data/near/localnet/kuutamod1/voter_node_key.json \
  --validator-node-key .data/near/localnet/node3/node_key.json \
  --validator-key .data/near/localnet/node3/validator_key.json \
  --near-boot-nodes $(jq -r .public_key < .data/near/localnet/node0/node_key.json)@127.0.0.1:33301
```
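Both exporters can be queried directly to confirm the state of each instance; the greps below just filter the state gauge out of the full Prometheus output:

```
curl -s localhost:2233/metrics | grep kuutamod_state   # validator instance
curl -s localhost:2234/metrics | grep kuutamod_state   # failover instance
```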
Initial check of the validator and failover metrics:

- Validator: `kuutamod_state{type="Validating"} 1`
- Failover: `kuutamod_state{type="Voting"} 1`
Pressing ctrl+c in the validator's screen session sends a graceful shutdown command to the main validator.
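The takeover can be observed live by polling the failover's state gauge while the validator shuts down (a minimal sketch using watch):

```
watch -n 1 'curl -s localhost:2234/metrics | grep kuutamod_state'
```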
Check of the validator and failover metrics afterwards:

- Validator: (shut down)
- Failover: `kuutamod_state{type="Validating"} 1`
The failover has taken over the validating responsibilities of the initial validator. When the problems with the initial validator are fixed, it can be restarted; it will then start in a voting role until the failover dies, at which point it will take over validation again.
With everything running via screen, passing `screen -X -S <session_name> kill` will forcefully kill the process.
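For example, to forcefully kill the validator session started above:

```
screen -X -S validator kill
```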
Doing this to the validator will kill it, and the failover will properly take over validation (although there is a considerable ~1-2 min delay, especially compared to the quick failover after a graceful shutdown). The problem arises when trying to restart the validator process.
Restarting the validator process with the command above results in the following errors, which eventually kill the process:
```
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
level=warn pid=174131 message="Neard finished unexpectly with signal: 6 (core dumped)" target="kuutamod::supervisor" node_id=node
level=info pid=174131 message="state changed: Voting -> Startup" target="kuutamod::supervisor" node_id=node
level=info pid=174131 message="state changed: Startup -> Syncing" target="kuutamod::supervisor" node_id=node
level=info pid=174131 message="state changed: Syncing -> Registering" target="kuutamod::supervisor" node_id=node
level=info pid=174131 message="state changed: Registering -> Voting" target="kuutamod::supervisor" node_id=node
2022-07-18T18:58:04.693039Z INFO neard: version="1.27.0" build="nix:1.27.0" latest_protocol=54
2022-07-18T18:58:04.693659Z INFO near: Opening store database at ".data/near/localnet/kuutamod0/data"
2022-07-18T18:58:04.767130Z INFO db: Created a new RocksDB instance. num_instances=1
2022-07-18T18:58:04.768723Z INFO db: Dropped a RocksDB instance. num_instances=0
thread 'main' panicked at 'Failed to open the database: DBError("IO error: While lock file: .data/near/localnet/kuutamod0/data/LOCK: Resource temporarily unavailable")', core/store/src/lib.rs:340:49
```
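As the first log line suggests, the neard panic can be inspected in more detail by relaunching with RUST_BACKTRACE=1 set (assumption: the neard process spawned by kuutamod inherits its environment), e.g.:

```
RUST_BACKTRACE=1 screen -S validator ./target/debug/kuutamod --neard-home .data/near/localnet/kuutamod0/ \
  --voter-node-key .data/near/localnet/kuutamod0/voter_node_key.json \
  --validator-node-key .data/near/localnet/node3/node_key.json \
  --validator-key .data/near/localnet/node3/validator_key.json \
  --near-boot-nodes $(jq -r .public_key < .data/near/localnet/node0/node_key.json)@127.0.0.1:33301
```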
The errors point to an IO error regarding the LOCK file in the node's data directory. Presumably, when the neard service is gracefully shut down it removes this LOCK file, but when it is forcefully shut down it is not removed.
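One way to check that theory after a forceful kill would be to see whether an orphaned neard process is still holding the RocksDB lock (a sketch; assumes fuser or lsof is installed). If a leftover neard shows up, killing it, or removing a genuinely stale LOCK file, should let the restarted kuutamod open the store again:

```
# show any process still holding the store's LOCK file
fuser -v .data/near/localnet/kuutamod0/data/LOCK
# or, equivalently
lsof .data/near/localnet/kuutamod0/data/LOCK
```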