Lighthouse is affected by a bug (sigp/lighthouse#2159) which can cause the validator_definitions.yml
file to become corrupted if the disk where Lighthouse's datadir is stored becomes completely full.
The best defense against this bug is to ensure your disk doesn't fill to 100%. If you have been running Geth since December 2020 with a 1TB SSD, you may be close to reaching this limit and we recommend that you:
1. Stop Geth immediately (tweak the service name appropriately):
   sudo systemctl stop geth.service
2. [Optional] If you do not already have an Eth1 failover configured, add one to your Lighthouse beacon node config. You can sign up for a free service like Infura or Alchemy and add the URL to the ExecStart line for your Lighthouse beacon node. More info in the book. Do not restart the validator client, or this may trigger the bug.
3. If you have greater than 50GB of disk space remaining, you can attempt to prune Geth's database. Follow the instructions from Yorick Downe to perform the prune. Your Lighthouse validator(s) will keep attesting fine while it runs, and will keep proposing blocks if an Eth1 failover was configured (see step 2). Go to step 5.
4. If you have less than 50GB of disk space remaining you will have to delete Geth's database and start from scratch. Run the following command with the datadir that Geth is configured to use:
   sudo -u GETHUSER geth removedb --datadir /your/geth/datadir
   Replace GETHUSER with the user that runs your Geth service (e.g. geth). Answer yes ([y]) to the two prompts.
5. Restart the Geth service:
   sudo systemctl start geth.service
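The choice between pruning and deleting Geth's database described above comes down to a single comparison against 50GB. A minimal sketch of that decision follows. The function name is hypothetical, and the source does not say what to do at exactly 50GB, so treating that case as "removedb" is an assumption.

```shell
# Sketch only: map free disk space (whole GB) to the action described above.
# geth_action_for_free_space is a hypothetical helper, not part of Geth or Lighthouse.
# Assumption: exactly 50GB is treated like "less than 50GB".
geth_action_for_free_space() {
  if [ "$1" -gt 50 ]; then
    echo "prune"      # enough headroom to attempt pruning Geth's database
  else
    echo "removedb"   # delete Geth's database and sync from scratch
  fi
}

geth_action_for_free_space 66   # prints "prune"
```

Pass in the free space reported by df for the disk that holds your Geth datadir.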
If you are affected by the bug you will see a log message like this when the Lighthouse validator client starts:
CRIT Failed to start validator client reason: Unable to open or create validator definitions: UnableToParseFile(EndOfStream)
If you see this message then the validator definitions file has become corrupted and you need to restore it by following the steps below. If you have not seen the above CRIT log, you do not need to do anything.
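If you would rather search the logs than read them by eye, the check can be sketched as a small filter. The helper name is hypothetical. The unit name lighthousevalidator is the one used in this guide and may differ on your system.

```shell
# Sketch only: succeeds if log text on stdin contains the corruption error quoted above.
# has_corrupt_definitions_error is a hypothetical helper name.
has_corrupt_definitions_error() {
  grep -q 'Unable to open or create validator definitions: UnableToParseFile'
}

# Example against the log line quoted above:
sample='CRIT Failed to start validator client reason: Unable to open or create validator definitions: UnableToParseFile(EndOfStream)'
if printf '%s\n' "$sample" | has_corrupt_definitions_error; then
  echo "validator definitions file looks corrupted"
fi

# Against a live service (adjust the unit name to match yours):
#   sudo journalctl -u lighthousevalidator --no-pager | has_corrupt_definitions_error
```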
1. Stop the Lighthouse validator client:
   sudo systemctl stop lighthousevalidator
2. Find your validator definitions file and delete it. If your Lighthouse datadir is /var/lib/lighthouse, then it will be located at /var/lib/lighthouse/validators/validator_definitions.yml. You might need to use sudo to delete it:
   sudo rm /var/lib/lighthouse/validators/validator_definitions.yml
   Do not use rm -r or rm -f.
3. Re-import your validator keys into your Lighthouse datadir. Follow the instructions from Somer Esat's guide under the headings Copy the Validator Keystore Files and then Import Keystore Files into the Validator Wallet. The most important commands are:
   $ sudo chown -R $USER:$USER /var/lib/lighthouse/validators
   $ /usr/local/bin/lighthouse --network mainnet account validator import --directory $HOME/eth2deposit-cli/validator_keys --datadir /var/lib/lighthouse
   $ sudo chown -R lighthousevalidator:lighthousevalidator /var/lib/lighthouse/validators
4. Once the keys have been re-imported, start the Lighthouse validator once more:
   sudo systemctl start lighthousevalidator
   You can check that it's running successfully with:
   sudo journalctl -u lighthousevalidator -f
Yes. Instead of re-importing, you can get Lighthouse to rediscover the keystores by starting the validator client after the definitions file is deleted. It will error because it lacks the password. Stop it again (sudo systemctl stop lighthousevalidator), and edit validator_definitions.yml so that each validator entry contains a voting_keystore_password, as per the example here:
https://lighthouse-book.sigmaprime.io/validator-management.html
You need to input the password that you set when generating your keys with the deposit-cli.
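After editing, it can help to confirm that no validator entry is still missing its password. The sketch below is a rough line-count heuristic, not a YAML parser. It assumes each entry has exactly one voting_public_key line and that a password is supplied by either a voting_keystore_password or a voting_keystore_password_path field. The helper name, public keys, paths and password in the demo are placeholders.

```shell
# Sketch only: print how many validator entries have no password field.
# entries_missing_password is a hypothetical helper name.
entries_missing_password() {
  total=$(grep -c 'voting_public_key:' "$1")
  # Matches both voting_keystore_password and voting_keystore_password_path.
  with_pw=$(grep -c 'voting_keystore_password' "$1")
  echo $((total - with_pw))
}

# Demo on a throwaway file with placeholder values (not real keys or paths).
demo=$(mktemp)
cat > "$demo" <<'EOF'
---
- enabled: true
  voting_public_key: "0xaaaa"
  type: local_keystore
  voting_keystore_path: /var/lib/lighthouse/validators/0xaaaa/voting-keystore.json
  voting_keystore_password: example-password
- enabled: true
  voting_public_key: "0xbbbb"
  type: local_keystore
  voting_keystore_path: /var/lib/lighthouse/validators/0xbbbb/voting-keystore.json
EOF
entries_missing_password "$demo"   # prints 1: the second entry still needs a password
rm "$demo"
```

On a real system you would point it at your own definitions file, for example /var/lib/lighthouse/validators/validator_definitions.yml, and expect 0 before restarting the validator client.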
Run the command df -h / or df -h /lighthouse/datadir (substituting your actual datadir):
$ df -h /lighthouse/datadir
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/xubuntu--vg-root  232G  154G   66G  71% /
In this example, there is 66GB free on a disk of size 232GB.
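For scripting the same check, the available space can be read from df as a plain number. This is a minimal sketch: the helper name is hypothetical, and it relies on the POSIX df -Pk output format (1024-byte blocks, one line per filesystem), so a device name containing spaces would break the column parsing.

```shell
# Sketch only: print free space, in whole GiB, for the filesystem holding a path.
# free_gib is a hypothetical helper name.
free_gib() {
  # Column 4 of `df -Pk` is the available space in 1024-byte blocks.
  df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

free_gib /   # substitute your actual datadir
```

The value is rounded down, so it may read slightly lower than the figure shown by df -h.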
https://blog.ethereum.org/2021/03/03/geth-v1-10-0/
If your Eth1/Geth node is offline and your validator tries to propose a block, it will fail. Your node will keep attesting regardless of Eth1 node connectivity, so depending on how long the Geth fix takes and how many validators you have, it may be OK without a failover.
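If you do decide to add a failover, the sketch below shows one way the beacon node arguments might look. It assumes your Lighthouse version accepts the comma-separated --eth1-endpoints flag documented in the Lighthouse book at the time of writing, so check lighthouse bn --help before relying on it. Both URLs are placeholders, and YOUR-PROJECT-ID stands in for the credential issued by your provider.

```shell
# Sketch only: build a comma-separated Eth1 endpoint list, local node first.
# Assumption: endpoints are tried in order, so the hosted service is the fallback.
LOCAL_ETH1="http://localhost:8545"                             # placeholder local Geth RPC
FALLBACK_ETH1="https://mainnet.infura.io/v3/YOUR-PROJECT-ID"   # placeholder hosted endpoint
ETH1_ENDPOINTS="${LOCAL_ETH1},${FALLBACK_ETH1}"

# Print the arguments you would merge into the ExecStart line of the beacon node service.
echo "lighthouse --network mainnet bn --eth1-endpoints ${ETH1_ENDPOINTS}"
```

Only the beacon node service needs to change for this. The validator client should not be restarted while the disk is close to full.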
- Yorick's PSA on r/ethstaker: https://www.reddit.com/r/ethstaker/comments/n7mnx5/psa_if_youre_running_geth_prune/