Aggregation is spec-defined. For a receiver:
- the signature is received already aggregated from the network
- the attesters' public keys must be aggregated (point addition on BLS G1)
- only attestations are aggregated
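Concretely, in the eth2 scheme (minimal-pubkey-size variant: public keys on G1, signatures on G2), for n attesters signing the same message m:

$$
\sigma_{agg} = \sum_{i=1}^{n} \sigma_i \in \mathbb{G}_2, \qquad
PK_{agg} = \sum_{i=1}^{n} PK_i \in \mathbb{G}_1, \qquad
e\left(PK_{agg},\, H(m)\right) = e\left(g_1,\, \sigma_{agg}\right)
$$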
Batching is a client-side optimization: pairing-based verification splits into
- MillerLoop(PublicKey, Message, Signature), computed once per signature set
- FinalExponentiation(partial Miller loops), shared across the whole batch
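This works because a pairing is a Miller loop followed by a final exponentiation, and the final exponentiation can be shared across a product of Miller loops:

$$
e(P, Q) = \mathrm{FinalExp}\left(\mathrm{MillerLoop}(P, Q)\right), \qquad
\prod_{i=1}^{n} e(P_i, Q_i) = \mathrm{FinalExp}\left(\prod_{i=1}^{n} \mathrm{MillerLoop}(P_i, Q_i)\right)
$$

so a batch of n signature sets costs n Miller loops but a single final exponentiation (with each set additionally blinded by a random scalar so that invalid signatures cannot cancel each other out).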
According to Justin Drake, an upcoming optimization will make adding public keys 50% faster in BLST, and this is supposedly the bottleneck with a large number of signatures. In my benchmarks, however, public key aggregation is only 5% of aggregate verification, and it is unused for batching. Is he talking about something else?
Validation follows the gossipsub rebroadcasting rules, and applies to:
- block
- attestations
- voluntary exits, attester slashing, block proposer slashing
Validation requires verifying 1 signature per object
Verification follows consensus rules
- for blocks:
  - state transition can be done using the block
  - crypto verification, including all nested crypto objects (attestations, exits, slashings, RANDAO)
  - crypto verification for blocks is batched today
- for attestations:
  - 1 signature to verify
  - small consistency checks which seem redundant with gossip checks
- for voluntary exits, attester slashings, block proposer slashings:
  - 1 signature to verify
  - small consistency checks which seem redundant with gossip checks
For attestations, exits, slashings (during steady state):
- Gossip -> eth2_processor -> validation + rebroadcast -> a "pool"
For blocks during sync:
- Eth2 RPC -> sync_manager -> SharedBlockQueue -> clearance -> verification -> Candidate ChainDAG
For blocks during steady state:
- Gossip -> eth2_processor -> validation + rebroadcast -> SharedBlockQueue -> clearance -> verification -> Candidate ChainDAG
During sync or steady state, for missing ancestors:
- Eth2 RPC -> request_manager -> SharedBlockQueue -> clearance -> verification -> Candidate ChainDAG
During sync:
- block processing speed is the priority
- connected peers have no mesh or latency expectations
During steady state:
- attestation processing speed is the priority (attestations may arrive unaggregated!)
- connected peers have latency expectations
- we can batch during steady state, for validation
- we can batch during steady state or sync, for verification
- we can batch missing ancestors during steady state or sync, for verification
In a well-functioning chain at steady state, we rarely receive more than 1 block per slot.
The only opportunity to receive many blocks to verify at once is during sync or when we are missing a fork.
So it only makes sense to batch the "SharedBlockQueue -> clearance -> verification" stage.
For example, by changing the SharedBlockQueue from AsyncQueue[BlockEntry]
to AsyncQueue[seq[BlockEntry]]
so that inputs from the same sync request or ancestors request are grouped together (see the sketch below).
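A minimal sketch of that type change, assuming a simplified BlockEntry (the real one carries the signed block and a resolution future):

```nim
import chronos

type BlockEntry = object   # stand-in for Nimbus' real BlockEntry
  slot: uint64

# Before: one block at a time, clearance sees no batching opportunity.
let perBlockQueue = newAsyncQueue[BlockEntry]()

# After: all blocks from one sync/ancestors request travel together.
let perRequestQueue = newAsyncQueue[seq[BlockEntry]]()

proc clearance() {.async.} =
  while true:
    let batch = await perRequestQueue.popFirst()
    # batch-verify all block signatures (one shared final exponentiation),
    # then apply state transitions in slot order
    echo "clearing ", batch.len, " blocks"
```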
One issue is that verifying blocks is expensive and we may not have enough time to verify 5 missing blocks at once during periods of non-finality without impacting validator duties or gossipsub connectivity requirements.
- we can batch 'eth2_processor -> validation + rebroadcast -> a "pool"' during steady state
- no attestations during sync
We kind of already process multiple at once (without returning control to the event loop),
because validation is somewhat cheap (1 signature only, no state transition),
so the only thing missing is using crypto batchVerify (see the sketch below).
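The shape of the missing piece, as a hypothetical sketch (SignatureSet, batchVerify and the flush threshold are placeholders, not the real eth2_processor API):

```nim
import chronos

type SignatureSet = object
  ## Stand-in for a (public key, signing root, signature) triplet.
  id: int

var pending: seq[SignatureSet]

proc batchVerify(sets: seq[SignatureSet]): bool =
  ## Stand-in for real BLS batch verification: blind each set with a
  ## random scalar, do n Miller loops, one shared final exponentiation.
  true

proc processAttestation(s: SignatureSet) {.async.} =
  pending.add s
  if pending.len >= 16:   # flush threshold is a tuning assumption
    let batch = move(pending)
    if not batchVerify(batch):
      # fall back to one-by-one verification to find the invalid set(s)
      discard
```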
If we want to batch blocks, we must avoid blocking networking and validator duties (and hence the DAG and fork choice as well).
- solution 1 requires a threadpool and a way to asyncify waiting on a Flowvar
- solution 2 would split NBC into a networking thread and a verification thread, with a way to asyncify waiting on a Channel or AsyncChannel
- solution 3 would split NBC into a networking process and a verification process, communicating via IPC, with 2 event loops
- solution 4 would split NBC into producer<->consumer services, at least one service being networking and one consensus.
Whatever solution we use, we need a way to tie the communication primitives (Channels, Flowvar) into Chronos. AsyncChannel seems stalled due to the problem of transporting GC-ed types allocated on a thread-local heap. ARC/ORC is out of the question for the moment.
I've outlined a way to make both Flowvar (including ones from custom threadpools) and Channels (including Nim channels) work with Chronos, by creating an "async condition variable" (AsyncCV): nim-lang/RFCs#304 (comment). An AsyncChannel or an AsyncFlowvar is then just "Channel + AsyncCV" or "Flowvar + AsyncCV".
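A rough sketch of that composition, assuming the AsyncCV primitive from the RFC (wait() returns a Chronos future; signal() may be called from a foreign thread). The AsyncEvent here is a single-threaded stand-in: a real cross-thread AsyncCV must additionally wake the dispatcher, e.g. via a self-pipe. An AsyncFlowvar would be the same composition with the Channel replaced by a Flowvar.

```nim
import chronos

type
  AsyncCV = ref object
    ## Stand-in for the RFC's async condition variable.
    event: AsyncEvent   # fire() from a foreign thread would also need
                        # to wake the Chronos dispatcher (e.g. self-pipe)

  AsyncChannel[T] = object
    chan: ptr Channel[T]   # plain Nim channel, shareable across threads
    cv: AsyncCV

proc signal(cv: AsyncCV) =
  ## Producer side: call after chan.send() to wake pending recv().
  cv.event.fire()

proc recv[T](ac: AsyncChannel[T]): Future[T] {.async.} =
  ## Consumer side: await data without blocking the event loop.
  while true:
    let (ok, msg) = ac.chan[].tryRecv()
    if ok:
      return msg
    await ac.cv.event.wait()
    ac.cv.event.clear()
```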