2020-08-20 Eth2 Call 46
1. Losing peers
2. Syncing performance
3. (Very) high memory usage in 2 caches:
- the epoch cache, which in particular caches validator public keys,
was far too aggressive after a couple of speed optimizations (gigabytes!)
- the fork choice/votes cache, which is only pruned at finality every 256 epochs
and keeps growing during periods of non-finality
1. Losing peers and syncing: at start, we subscribe to gossip, but while syncing we can't verify, and therefore can't propagate, the latest attestations and blocks, so peers eject us.
2. Similarly, subscribing too early pollutes our quarantine system (which stores blocks from the network that can't be attached to the chain yet) with blocks thousands of epochs in the future.
PR: “don't subscribe to gossip too early”, under review.
3. Cache memory usage:
- Epoch cache: we had duplicate caches created at each epoch.
- Fork choice: we wait a long time before pruning the cache to amortize the pruning cost. A change in data structure (protoarray refinement) allows pruning every epoch.
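A minimal sketch of why a flat, insertion-ordered array makes fork choice pruning cheap enough to run every epoch. It assumes a heavily simplified node (only a root and a parent index; real proto-array nodes also carry weights, best-child/best-descendant links and checkpoints), and the class and method names are illustrative, not any client's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProtoNode:
    root: str
    parent: Optional[int]  # index of the parent in ProtoArray.nodes; None if anchor/pruned


class ProtoArray:
    """Simplified proto-array: blocks kept in a flat list in insertion order.

    A block is only inserted once its parent is known, so every descendant of
    the finalized block sits after it in the list; everything before it is an
    ancestor or a branch that finality has ruled out.
    """

    def __init__(self, anchor_root: str) -> None:
        self.nodes: List[ProtoNode] = [ProtoNode(anchor_root, None)]
        self.indices: Dict[str, int] = {anchor_root: 0}

    def on_block(self, root: str, parent_root: str) -> None:
        # Look up the parent first: an unknown parent raises KeyError
        # (such blocks would wait in quarantine) without touching our state.
        parent_index = self.indices[parent_root]
        self.indices[root] = len(self.nodes)
        self.nodes.append(ProtoNode(root, parent_index))

    def prune(self, finalized_root: str) -> None:
        """Drop every node stored before the finalized block and re-base indices.

        Linear in the number of nodes. Non-descendants stored after the finalized
        block are kept by this sketch and would need to be excluded from head selection.
        """
        cut = self.indices[finalized_root]
        if cut == 0:
            return
        for node in self.nodes[:cut]:
            del self.indices[node.root]
        self.nodes = self.nodes[cut:]
        for root in self.indices:
            self.indices[root] -= cut
        for node in self.nodes:
            if node.parent is not None:
                # Parents that were pruned become None; the rest shift down by `cut`.
                node.parent = node.parent - cut if node.parent >= cut else None


# g <- a <- b <- c, plus a sibling branch g <- x; finalize b.
pa = ProtoArray("g")
pa.on_block("a", "g")
pa.on_block("x", "g")
pa.on_block("b", "a")
pa.on_block("c", "b")
pa.prune("b")
```

Because a child is always appended after its parent, pruning is one slice plus a pass to shift indices, instead of walking a tree of linked nodes, which is what makes per-epoch pruning affordable.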
Multinet scripts maintenance: small Lighthouse-Nimbus testnet to debug specific issues
- In case we have a critical Nimbus bug, we are preparing to fall back to another client to keep our validators up
Testing and release updates:
- networking testing in progress
- fork choice tests have started to be updated in the spec tests repo
Fuzzing update
- Community fuzzing
- bug report
- Differential fuzzing
- consensus bug in Prysm: no check for an empty input when aggregating signatures for verification.
Client updates
Fixed syncing from a finalized slot far in the past: more stability.
With lots of blocks to process, tasks in a core executor were blocking or deadlocking Lighthouse -> switched to a queuing system.
Avoid switching chain head when syncing
Import Prysm keystores
Internal improvements
State regeneration, by replaying blocks on top of the cache:
-> no control over when and where to regenerate
-> lots of regeneration, and even duplicate regeneration on multiple cores in parallel when there are multiple requests
-> queueing system with deduplication support
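A queue with deduplication can be sketched as follows, assuming regeneration is a blocking function keyed by block root. `DedupRegenQueue` and the `regenerate` callback are hypothetical names, and a bounded thread pool stands in for whatever executor the client actually uses: the pool limits how many regenerations run at once, and the in-flight map makes concurrent requests for the same root share one job.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class DedupRegenQueue:
    """Queue for state regeneration requests.

    A bounded worker pool controls how many regenerations run at once, and
    concurrent requests for the same block root share one in-flight job
    instead of each replaying the same blocks on another core.
    """

    def __init__(self, regenerate: Callable[[str], object], workers: int = 2) -> None:
        self._regenerate = regenerate  # hypothetical: replays blocks onto a cached state
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._in_flight: Dict[str, Future] = {}

    def request(self, block_root: str) -> Future:
        with self._lock:
            future = self._in_flight.get(block_root)
            if future is None:
                # First request for this root: schedule exactly one job.
                future = self._pool.submit(self._run, block_root)
                self._in_flight[block_root] = future
            return future

    def _run(self, block_root: str) -> object:
        try:
            return self._regenerate(block_root)
        finally:
            # Done (or failed): later requests will schedule a fresh job.
            with self._lock:
                self._in_flight.pop(block_root, None)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


# Two requests for the same root while the first is still running.
calls = []
gate = threading.Event()


def slow_regen(root: str) -> str:
    calls.append(root)
    gate.wait()  # keep the job in flight until both requests are made
    return "state-" + root


queue = DedupRegenQueue(slow_regen)
f1 = queue.request("0xabc")
f2 = queue.request("0xabc")
gate.set()
result = f1.result()
queue.shutdown()
```

A request arriving after a job has finished schedules a new one; keeping finished states around is left to the existing state cache.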
Extra logic to pull the hot state regularly instead of doing this at startup, which delayed startup a lot.
Deadlock in block import logic after finalisation, as the finalised block had been dropped from the cache
More improvement in the last 3 days than in the last 6 months
Refactored cache updates; peer scoring improvements to avoid junk peers.
Race condition: the finalised root was saved in the cache at finality but never saved to disk due to a timeout -> required everyone to restart
The cleanup operation at finality timed out
-> 47% of validators dropped off the network
Using “roughtime” with 6 time servers, but one had a 24h offset, which led to chaos because they were using the mean (instead of the median)
-> now using system time
-> Dankrad: if there is a big difference from local time, don’t adjust the clock.
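The mean-versus-median problem, plus Dankrad's “don't adjust on a big difference” rule, fits in a small sketch. It assumes each sample is an already-computed offset in seconds (server time minus local time) and does not model the Roughtime protocol itself; the function name and the 1-second threshold are hypothetical.

```python
from statistics import mean, median
from typing import Sequence

# Hypothetical threshold: offsets larger than this are treated as suspicious
# and the local clock is left alone (Dankrad's suggestion).
MAX_ADJUSTMENT_SECONDS = 1.0


def clock_adjustment(offsets: Sequence[float],
                     max_adjust: float = MAX_ADJUSTMENT_SECONDS) -> float:
    """Return the correction (seconds) to apply to the local clock.

    offsets: per-server estimates of (server time - local time).
    """
    if not offsets:
        return 0.0
    # The median ignores a single wildly wrong server; the mean does not.
    estimate = median(offsets)
    if abs(estimate) > max_adjust:
        return 0.0
    return estimate


# Five sane servers and one that is 24 h (86,400 s) off.
samples = [0.10, 0.12, 0.09, 0.11, 0.10, 86_400.0]
print(mean(samples))              # about 14400: one bad server drags the mean ~4 hours off
print(clock_adjustment(samples))  # about 0.105: the median stays with the honest majority
```

With 6 servers, a single bad one moves the mean by a sixth of its error (4 hours here) but barely moves the median; the threshold then guards against the case where the median itself is far from local time.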
Can restore fork choice context from database
Network: getting Noise to work
Sync perf
Moved Eth2 Trinity into a separate repo
Cannot sync to head during finality incident
-> needs a deeper look, probably a large refactor
Gossipsub 1.1 is in devel, needs to be added to js-libp2p
BLST: looking for a way to switch between pure C and WASM code
1 more senior dev full-time on Eth2
Focus at the moment: Eth1 deposits
Hsiao-Wei: must-have and nice-to-have list for mainnet
has been shared privately with client teams
will likely be published on GitHub.
Afri: how to move forward from here, and the lessons learned from Medalla.
-> clients should work on better release tracks.
-> clients need a strategy to stabilize the codebase and avoid breaking things that used to work.
Thinking about a “mainnet candidate” (if something goes wrong, we reuse the same deposit contract with a different genesis time)
Dankrad: we don’t want this to lead to complacency. Raise the stake?
Launchpad decentralization?
Carl: tracked the bug that caused people to make double deposits
Danny: beware of scam launchpads and deposit contracts
Danny: standard proposal for slashing protection DB
Danny: how to go from client A to client B is a priority
Lighthouse subscribes to topics while syncing
Prysm subscribes to topics while syncing but ignores them
- but doesn’t subscribe to subnets until fully synced
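The two mitigations for the losing-peers and quarantine issues at the top of these notes can be sketched together: gate gossip on our head being close to the wall-clock slot, and refuse to quarantine blocks far beyond our head. `SLOTS_PER_EPOCH = 32` is the mainnet preset; the tolerance constants, `is_synced` and `Quarantine` are hypothetical and not taken from any client's code.

```python
from dataclasses import dataclass, field
from typing import Dict

SLOTS_PER_EPOCH = 32  # mainnet preset value

# Hypothetical: how far our head may lag the wall clock and still count as synced.
SYNC_TOLERANCE_SLOTS = 2 * SLOTS_PER_EPOCH

# Hypothetical: orphan blocks further than this beyond our head are dropped.
MAX_QUARANTINE_LOOKAHEAD_SLOTS = SLOTS_PER_EPOCH


def is_synced(head_slot: int, wall_slot: int) -> bool:
    """Gate for joining (or acting on) gossip: only when our head is close to the
    wall-clock slot can we validate, and therefore forward, what peers send us."""
    return wall_slot - head_slot <= SYNC_TOLERANCE_SLOTS


@dataclass
class Quarantine:
    """Blocks whose parent we do not know yet, waiting to be attached."""
    max_size: int = 1024
    blocks: Dict[bytes, int] = field(default_factory=dict)  # block root -> slot

    def add(self, block_root: bytes, block_slot: int, head_slot: int) -> bool:
        # While syncing, gossip delivers blocks thousands of epochs ahead of our
        # head; they cannot be attached any time soon, so refuse to store them.
        if block_slot > head_slot + MAX_QUARANTINE_LOOKAHEAD_SLOTS:
            return False
        if len(self.blocks) >= self.max_size:
            return False
        self.blocks[block_root] = block_slot
        return True


quarantine = Quarantine()
accepted_far = quarantine.add(b"far", block_slot=100_000, head_slot=100)
accepted_near = quarantine.add(b"near", block_slot=110, head_slot=100)
```

Whether to stay subscribed but ignore messages (Prysm's approach for topics) or to not subscribe at all (Prysm's approach for subnets) is a policy layered on top of a check like `is_synced`.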
Second release of the deposit contract
backported to the eth2 spec repo
only difference is metadata
for local testnets
probably the final version