2020-08-20 Eth2 Call 46
https://github.com/ethereum/eth2.0-pm/issues/173
Nimbus
Medalla:
1. Losing peers
2. Syncing performance
3. (Very) high memory usage for 2 caches:
- the epoch cache, which in particular caches validators' public keys,
was way too aggressive after a couple of speed optimizations (gigabytes!)
- the fork choice/votes cache, which is only pruned at finality every 256 epochs
and grows during periods of non-finality
Reasons:
1. Losing peers and syncing: at startup we subscribe to gossip, but while syncing we can't verify, and therefore can't propagate, the latest attestations and blocks, so we get ejected.
2. Similarly, subscribing too early pollutes our quarantine system (which stores blocks from the network that can't be attached to the chain yet) with blocks thousands of epochs in the future.
PR under review: don't subscribe to gossip too early.
3. Caches memory usage:
- Epoch cache: we had duplicated caches created at each epoch.
- Fork choice: we waited a long time before pruning the cache to amortize the pruning cost. A change in data structure (a protoarray refinement) allows pruning every epoch; see the sketch below.
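For intuition, a minimal Python sketch (hypothetical, not Nimbus's actual code) of why a flat, protoarray-style layout makes pruning cheap enough to run every epoch: nodes are stored in insertion order, so everything before the newly finalized node is dropped with one array compaction instead of a graph traversal.

```python
# Hypothetical protoarray-style fork choice store, pruned every epoch.
class ProtoNode:
    def __init__(self, root, parent_index):
        self.root = root                   # block root
        self.parent_index = parent_index   # index into `nodes`, or None

class ProtoArray:
    def __init__(self):
        self.nodes = []     # append-only, insertion order
        self.index_of = {}  # block root -> index into `nodes`

    def add(self, root, parent_root=None):
        self.index_of[root] = len(self.nodes)
        self.nodes.append(ProtoNode(root, self.index_of.get(parent_root)))

    def prune(self, finalized_root):
        """Drop everything before the finalized node in one O(n) pass."""
        cut = self.index_of[finalized_root]
        if cut == 0:
            return  # nothing to prune
        self.nodes = self.nodes[cut:]
        self.index_of = {}
        for i, node in enumerate(self.nodes):
            if node.parent_index is not None and node.parent_index >= cut:
                node.parent_index -= cut   # re-anchor surviving parents
            else:
                node.parent_index = None   # parent was pruned away
            self.index_of[node.root] = i
```

Because each prune is a single linear pass, there is no need to batch it every 256 epochs to amortize its cost.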
Multinet scripts maintenance: a small Lighthouse-Nimbus testnet to debug specific issues.
Fallback:
- In case we hit a critical Nimbus bug, we are preparing a fallback to another client to keep our validators up.
———
Medalla
Testing and release updates:
- networking testing in progress
- fork choice tests are starting to be updated in the spec test repo
Fuzzing update
- Community fuzzing
- bug report
- Differential fuzzing
- consensus bug in Prysm: not checking for empty input when aggregating signatures for verification.
- https://blog.sigmaprime.io/beacon-fuzz-07.html
———
Client updates
Lighthouse:
Fixed syncing from a long-finalized slot in the past: more stability.
When lots of blocks were being processed, things in a core executor were blocking or deadlocking Lighthouse -> switched to a queuing system (see the sketch below).
Avoid switching the chain head while syncing.
Import Prysm keystores.
Internal improvements.
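The pattern behind that queuing fix, as a hedged Python analogy (Lighthouse itself is Rust on an async runtime; all names here are illustrative): heavy block processing is handed to a worker pool through the event loop rather than run on the executor's own threads, so networking keeps making progress.

```python
# Hypothetical sketch: never run heavy block processing directly on the
# async executor's threads; hand it to a worker pool so the event loop
# keeps serving the network instead of blocking or deadlocking.
import asyncio
from concurrent.futures import ProcessPoolExecutor

def process_blocks(blocks):
    # CPU-heavy verification / state transition; placeholder here.
    return len(blocks)

async def handle_batch(loop, pool, blocks):
    # Offload instead of blocking the loop for seconds at a time.
    return await loop.run_in_executor(pool, process_blocks, blocks)

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        done = await handle_batch(loop, pool, ["block1", "block2"])
        print(f"processed {done} blocks")

asyncio.run(main())
```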
Teku
State regeneration by replaying blocks on top of the cache:
-> no control over when and where to regenerate
-> lots of regeneration, and even duplicate regeneration on multiple cores in parallel, when multiple requests arrive
-> queueing system with deduplication support (sketched below)
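A hedged sketch of that idea (names and API are illustrative, not Teku's actual code): concurrent requests for the same state root share one in-flight computation instead of replaying the same blocks on several cores.

```python
# Hypothetical regeneration queue with deduplication.
import threading
from concurrent.futures import Future, ThreadPoolExecutor

class RegenQueue:
    def __init__(self, regenerate, workers=2):
        self._regenerate = regenerate  # expensive block-replay function
        self._pool = ThreadPoolExecutor(workers)
        self._lock = threading.Lock()
        self._in_flight = {}           # state_root -> Future

    def request(self, state_root) -> Future:
        with self._lock:
            fut = self._in_flight.get(state_root)
            if fut is None:            # first request: schedule the work
                fut = self._pool.submit(self._run, state_root)
                self._in_flight[state_root] = fut
            return fut                 # duplicates share the same Future

    def _run(self, state_root):
        try:
            return self._regenerate(state_root)
        finally:
            with self._lock:           # allow later regenerations
                self._in_flight.pop(state_root, None)
```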
Extra logic to pull hot states regularly instead of doing it all at startup, which previously delayed startup a lot.
Deadlock in the block import logic after finalisation, as the finalised block was dropped out of the cache:
https://github.com/PegaSysEng/teku/issues/2596
Prysm
More improvements in the last 3 days than in the last 6 months.
Refactoring cache updates; peer scoring improvements to avoid junk peers.
Race condition: saving to the cache at finality, but the finalised root was never saved to disk due to a timeout -> required everyone to restart.
The cleanup operation at finality timed out
-> 47% of validators dropped off the network.
Was using “roughtime” with 6 time servers, but one had a 24h offset, which led to chaos as they were using the mean (instead of the median)
-> now using system time
-> Dankrad: if there's a big difference from local time, don't adjust the clock.
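A worked example of the failure mode (numbers illustrative): with six reported offsets and one server wrong by 24 hours, the mean is dragged roughly 4 hours off while the median barely moves. Dankrad's suggestion then becomes a simple sanity guard.

```python
from statistics import mean, median

# Offsets in seconds reported by 6 time servers; one is wrong by 24h.
offsets = [0.2, -0.4, 0.1, 0.3, -0.1, 86400.0]

print(mean(offsets))    # ~14400.02 s: about 4 hours of clock skew
print(median(offsets))  # 0.15 s: unaffected by the single outlier

# Sanity guard in the spirit of Dankrad's suggestion (threshold is
# illustrative): if the adjustment is far from local time, skip it.
MAX_SANE_OFFSET = 60.0  # seconds, hypothetical
adjustment = median(offsets)
if abs(adjustment) > MAX_SANE_OFFSET:
    adjustment = 0.0    # keep system time instead
```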
Trinity
Can restore fork choice context from database
Network: getting Noise to work
Sync perf
Moving eth2 Trinity into a separate repo
Lodestar
Cannot sync to head during a finality incident
-> needs a deeper look, probably a large refactor
Gossipsub 1.1 in development; needs to be added to js-libp2p
BLST: looking into a switch between pure C and WASM code
Nethermind
1 more senior dev full-time on Eth2
Focus at the moment: Eth1 deposits
———
Hsiao-Wei: a Must-Have and Nice-to-Have list for mainnet
has been shared privately with client teams
and will likely be published on GitHub.
Afri: how to move forward from here, and the learnings from Medalla.
-> clients should work on better release tracks.
-> clients need a strategy to stabilize the codebase and avoid breaking things that used to work.
Thinking about a “mainnet candidate” (if something goes wrong, we reuse the same deposit contract at a different genesis time).
Dankrad: we don't want this to lead to complacency. Raise the stakes?
Launchpad decentralization?
Carl: tracked a bug that caused people to make double deposits.
Danny: beware of scam launchpads and deposit contracts.
Danny: standard proposal for a slashing protection DB.
Danny: how to go from client A to client B is a priority (see the sketch below).
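To make the migration concern concrete, a hedged sketch of the checks a portable slashing protection DB would enable (the record format is illustrative, not the actual proposal): refuse a second block at an already-signed slot, and refuse double or surround votes.

```python
# Hypothetical signing history: a set of block slots and a list of
# (source_epoch, target_epoch) pairs for past attestations.
def may_sign_block(history, slot):
    """Never sign two distinct blocks at the same slot."""
    return slot not in history["signed_block_slots"]

def may_sign_attestation(history, source_epoch, target_epoch):
    """Reject double votes and surround votes (both are slashable)."""
    for s, t in history["signed_attestations"]:
        if t == target_epoch:                       # double vote
            return False
        if s < source_epoch and target_epoch < t:   # new vote surrounded
            return False
        if source_epoch < s and t < target_epoch:   # new vote surrounds
            return False
    return True

history = {"signed_block_slots": {100}, "signed_attestations": [(2, 5)]}
assert not may_sign_block(history, 100)         # same slot: refuse
assert not may_sign_attestation(history, 3, 4)  # surrounded by (2, 5)
assert may_sign_attestation(history, 5, 6)      # safe to sign
```

Exporting such a history in a standard format is what lets a validator move from client A to client B without risking a slashable message.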
———
Lighthouse subscribes to topics while syncing
Prysm subscribes to topics while syncing but ignores them
- but doesn't subscribe to subnets until fully synced
———
Second release of the deposit contract,
backported to the eth2 spec repo;
the only diff is metadata.
Use https://github.com/ethereum/eth2.0-specs/tree/dev/solidity_deposit_contract
for local testnets.
Probably the final version.