michaelsproul/empty_validator_definitions.md

# Lighthouse: how to fix empty validator_definitions.yml

Lighthouse is affected by a bug (sigp/lighthouse#2159) that can corrupt the `validator_definitions.yml` file if the disk holding Lighthouse's datadir becomes completely full.

## Preventing the bug

The best defense against this bug is to make sure your disk never fills to 100%. If you have been running Geth since December 2020 on a 1TB SSD, you may be close to this limit, and we recommend that you:

1. Stop Geth immediately: `sudo systemctl stop geth.service` (adjust the service name as needed).
2. [Optional] If you do not already have an Eth1 failover configured, add one to your Lighthouse beacon node config. You can sign up for a free service such as Infura or Alchemy and add its URL to the `ExecStart` line of your Lighthouse beacon node service. More information is available in the Lighthouse book. Do not restart the validator client, as this may trigger the bug.
3. If you have more than 50GB of disk space remaining, you can attempt to prune Geth's database. Follow the instructions from Yorick Downe to perform the prune. Your Lighthouse validator(s) will keep attesting while it runs, and will keep proposing blocks if an Eth1 failover is configured (see step 2). Go to step 5.
4. If you have less than 50GB of disk space remaining, you will have to delete Geth's database and resync from scratch. Run the following command with the datadir that Geth is configured to use: `sudo -u GETHUSER geth removedb --datadir /your/geth/datadir`. Replace `GETHUSER` with the user that runs your Geth service (e.g. `geth`). Answer yes (`y`) to both prompts.
5. Restart the Geth service: `sudo systemctl start geth.service`.
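
You can script the 50GB cut-off between pruning Geth's database and deleting it. The shell sketch below is an illustration and is not part of the official procedure. Its function names are hypothetical. It treats "50GB" as 50 GiB, because `df -h` reports sizes in powers of 1024.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: not part of Geth or Lighthouse.

# Print the space available on the filesystem holding PATH, in KiB.
# -P forces single-line POSIX output; -k reports 1024-byte blocks.
avail_kib() {
  df -Pk -- "$1" | awk 'NR == 2 { print $4 }'
}

# Decide between pruning and deleting Geth's database.
# Assumption: "50GB" means 50 GiB (50 * 1024 * 1024 KiB), matching df -h.
choose_geth_cleanup() {
  local avail=$1
  local threshold=$((50 * 1024 * 1024))
  if [ "$avail" -gt "$threshold" ]; then
    echo "prune"
  else
    echo "removedb"
  fi
}

# Usage (substitute your Geth datadir):
#   choose_geth_cleanup "$(avail_kib /your/geth/datadir)"
```

If it prints `prune`, follow the pruning instructions. Otherwise, fall back to `geth removedb`.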

## Remedying the bug

If you are affected by the bug, you will see a log message like this when the Lighthouse validator client starts:

```
CRIT Failed to start validator client        reason: Unable to open or create validator definitions: UnableToParseFile(EndOfStream)
```

If you see this message then the validator definitions file has become corrupted and you need to restore it by following the steps below. If you have not seen the above CRIT log, you do not need to do anything.
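
Before changing anything, you can confirm both symptoms from a shell. This sketch assumes the `lighthousevalidator` service name used throughout this guide. It also assumes that the corruption shows up as an empty (zero-byte) file, as the title of this guide suggests. Both helper names are hypothetical.

```shell
# Hypothetical helpers for diagnosing the corrupted definitions file.

# Succeeds if the log text on stdin contains the parse error from the CRIT line.
log_shows_corruption() {
  grep -qF 'UnableToParseFile(EndOfStream)'
}

# Succeeds if FILE exists but is zero bytes long (-s is true only for non-empty files).
definitions_file_empty() {
  [ -f "$1" ] && [ ! -s "$1" ]
}

# Usage:
#   sudo journalctl -u lighthousevalidator --no-pager | log_shows_corruption && echo "CRIT error found"
#   definitions_file_empty /var/lib/lighthouse/validators/validator_definitions.yml && echo "file is empty"
```

If your user cannot read the `validators` directory, run the file check from a root shell.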

1. Stop the Lighthouse validator client: `sudo systemctl stop lighthousevalidator`.
2. Find your validator definitions file and delete it. If your Lighthouse datadir is `/var/lib/lighthouse`, the file is located at `/var/lib/lighthouse/validators/validator_definitions.yml`. You might need `sudo` to delete it: `sudo rm /var/lib/lighthouse/validators/validator_definitions.yml`. Do not use `rm -r` or `rm -f`.
3. Re-import your validator keys into your Lighthouse datadir. Follow the instructions in Somer Esat's guide under the headings *Copy the Validator Keystore Files* and then *Import Keystore Files into the Validator Wallet*. The most important commands are:

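```shell
# Temporarily give your own user ownership so the import can write to the directory
sudo chown -R $USER:$USER /var/lib/lighthouse/validators
# Import the keystores generated by the deposit CLI (prompts for the keystore password)
/usr/local/bin/lighthouse --network mainnet account validator import --directory $HOME/eth2deposit-cli/validator_keys --datadir /var/lib/lighthouse
# Hand ownership back to the user that runs the validator client service
sudo chown -R lighthousevalidator:lighthousevalidator /var/lib/lighthouse/validators
```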


Once the keys have been re-imported, start the Lighthouse validator client again: `sudo systemctl start lighthousevalidator`. You can check that it is running successfully with `sudo journalctl -u lighthousevalidator -f`.
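
A quick sanity check after re-importing is to compare two numbers: the validators in the regenerated definitions file and the keystore files you imported. This is a hedged sketch. It assumes that each definition contains exactly one `voting_public_key:` line and that the deposit CLI names its keystores `keystore-*.json`. The function names are hypothetical. If you have imported validators from more than one directory, the counts will legitimately differ.

```shell
# Hypothetical helpers for checking a re-import.

# Count validator entries: one voting_public_key line per definition.
count_definitions() {
  grep -c 'voting_public_key:' "$1"
}

# Count keystore files produced by the deposit CLI in DIR (ignores deposit_data-*.json).
count_keystores() {
  find "$1" -maxdepth 1 -type f -name 'keystore-*.json' | wc -l | tr -d ' '
}

# Print "ok" if the counts match; otherwise report both counts on stderr and fail.
check_reimport() {
  local defs keys
  defs=$(count_definitions "$1")
  keys=$(count_keystores "$2")
  if [ "$defs" -eq "$keys" ]; then
    echo "ok"
  else
    echo "mismatch: $defs definitions, $keys keystores" >&2
    return 1
  fi
}

# Usage:
#   check_reimport /var/lib/lighthouse/validators/validator_definitions.yml "$HOME/eth2deposit-cli/validator_keys"
```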

## FAQ

### Is there an alternative to re-importing my keys?

Yes. Instead of re-importing, you can let Lighthouse rediscover the keystores by starting the validator client (VC) after the definitions file is deleted. The VC will then fail with an error because it does not have the keystore password. Stop it again (`sudo systemctl stop lighthousevalidator`) and edit `validator_definitions.yml` so that each entry contains a `voting_keystore_password`, as in the example here:
https://lighthouse-book.sigmaprime.io/validator-management.html
Enter the password that you set when generating your keys with the deposit CLI.
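
Before restarting the VC after editing the file, you can check that every entry now has a password. The sketch below assumes the field names from the Lighthouse book example: `voting_public_key`, `voting_keystore_password`, and the alternative `voting_keystore_password_path`. Verify them against the page for your Lighthouse version. The function name is hypothetical. The line counting assumes one field of each kind per entry.

```shell
# Hypothetical helper: succeeds if every definition in FILE has a password field.
# Assumed entry shape, based on the Lighthouse book example:
#   - enabled: true
#     voting_public_key: "0x..."
#     type: local_keystore
#     voting_keystore_path: /var/lib/lighthouse/validators/0x.../voting-keystore.json
#     voting_keystore_password: yourKeystorePassword
all_entries_have_password() {
  local entries passwords
  entries=$(grep -c 'voting_public_key:' "$1")
  # Matches both voting_keystore_password: and voting_keystore_password_path:
  passwords=$(grep -cE 'voting_keystore_password(_path)?:' "$1")
  [ "$entries" -gt 0 ] && [ "$entries" -eq "$passwords" ]
}

# Usage (from a shell that can read the file):
#   all_entries_have_password /var/lib/lighthouse/validators/validator_definitions.yml && echo "ready to restart"
```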

### How do I check how much disk space I have remaining?

Run `df -h /` or `df -h /lighthouse/datadir` (substituting your actual datadir):

```
$ df -h /lighthouse/datadir
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/xubuntu--vg-root  232G  154G   66G  71% /
```

In this example, 66GB is free on a 232GB disk.
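
The bug is triggered by a completely full disk, so the same `df` output can serve as an early warning. This is a minimal sketch. It assumes a hypothetical 90% alert threshold and the example datadir path above, and its helper names are hypothetical. You could run it from cron or a systemd timer.

```shell
# Hypothetical helpers: warn before the disk holding the datadir fills up.

# Print the Use% figure for the filesystem holding PATH, without the % sign.
# -P gives stable single-line POSIX output.
use_percent() {
  df -P -- "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds (and prints a warning to stderr) if usage is at or above LIMIT percent.
disk_nearly_full() {
  local used limit=$2
  used=$(use_percent "$1")
  if [ "$used" -ge "$limit" ]; then
    echo "WARNING: $1 is ${used}% full" >&2
    return 0
  fi
  return 1
}

# Usage:
#   disk_nearly_full /lighthouse/datadir 90
```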

### Where can I read more about Geth's database pruning?

https://blog.ethereum.org/2021/03/03/geth-v1-10-0/

### What happens if I don't configure an Eth1 failover?

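If your Eth1 (Geth) node is offline when one of your validators is due to propose a block, the proposal will fail. Your node will keep attesting regardless of Eth1 node connectivity. Whether you need a failover therefore depends on how long the Geth fix takes and how many validators you run, since more validators means a higher chance that one of them is selected to propose during the downtime. You may be fine without one.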

## More info


Yorick's PSA on r/ethstaker: https://www.reddit.com/r/ethstaker/comments/n7mnx5/psa_if_youre_running_geth_prune/