Aggregation is spec-defined. For a receiver:
- the signature is received already aggregated from the network
- the attesters' public keys must be aggregated (point addition on BLS G1)
- only attestations are aggregated
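Concretely, in the eth2 scheme (minimal-pubkey-size variant: public keys on G1, signatures on G2), for n attesters signing the same message m:

$$
\sigma_{agg} = \sum_{i=1}^{n} \sigma_i \in \mathbb{G}_2, \qquad
PK_{agg} = \sum_{i=1}^{n} PK_i \in \mathbb{G}_1, \qquad
e\left(PK_{agg},\, H(m)\right) = e\left(g_1,\, \sigma_{agg}\right)
$$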
Batching is a client-side optimization: pairing-based verification splits into
- MillerLoop(PublicKey, Message, Signature), computed once per signature set
- FinalExponentiation(partial Miller loops), shared across the whole batch
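This works because a pairing is a Miller loop followed by a final exponentiation, and the final exponentiation can be shared across a product of Miller loops:

$$
e(P, Q) = \mathrm{FinalExp}\left(\mathrm{MillerLoop}(P, Q)\right), \qquad
\prod_{i=1}^{n} e(P_i, Q_i) = \mathrm{FinalExp}\left(\prod_{i=1}^{n} \mathrm{MillerLoop}(P_i, Q_i)\right)
$$

so a batch of n signature sets costs n Miller loops but a single final exponentiation (with each set additionally blinded by a random scalar so that invalid signatures cannot cancel each other out).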
According to Justin Drake, an upcoming optimization will make adding public keys 50% faster in BLST, and this is supposedly the bottleneck with a large number of signatures. In my benchmarks, however, public key aggregation is only 5% of aggregate verification, and it is unused for batching. Is he talking about something else?
Validation follows the gossipsub rebroadcasting rules, and applies to:
- block
- attestations
- voluntary exits, attester slashing, block proposer slashing
Validation requires verifying 1 signature per object
Verification follows consensus rules
- for blocks:
  - state transition can be done using the block
  - crypto verification, including all nested crypto objects (attestations, exits, slashings, RANDAO)
  - crypto verification for blocks is batched today
- for attestations:
  - 1 signature to verify
  - small consistency checks which seem redundant with gossip checks
- for voluntary exits, attester slashings, block proposer slashings:
  - 1 signature to verify
  - small consistency checks which seem redundant with gossip checks
For attestations, exits, slashings (during steady state):
- Gossip -> eth2_processor -> validation + rebroadcast -> a "pool"
For blocks during sync:
- Eth2 RPC -> sync_manager -> SharedBlockQueue -> clearance -> verification -> Candidate ChainDAG
For blocks during steady state:
- Gossip -> eth2_processor -> validation + rebroadcast -> SharedBlockQueue -> clearance -> verification -> Candidate ChainDAG
During sync or steady state, for missing ancestors:
- Eth2 RPC -> request_manager -> SharedBlockQueue -> clearance -> verification -> Candidate ChainDAG
During sync:
- block processing speed is the priority
- connected peers have no mesh or latency expectations
During steady state:
- attestation processing speed is the priority (attestations may arrive unaggregated!)
- connected peers have latency expectations
- we can batch during steady state, for validation
- we can batch during steady state or sync, for verification
- we can batch missing ancestors during steady state or sync, for verification
In a well-functioning chain at steady state, we rarely receive more than 1 block per slot.
The only opportunity to receive many blocks to verify at once is during sync or when we are missing a fork.
So it only makes sense to batch the "SharedBlockQueue -> clearance -> verification" stage.
For example, by changing the SharedBlockQueue from AsyncQueue[BlockEntry]
to AsyncQueue[seq[BlockEntry]]
so that inputs from the same sync request or ancestors request are grouped together (see the sketch below).
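A minimal sketch of that type change, assuming a simplified BlockEntry (the real one carries the signed block and a resolution future):

```nim
import chronos

type BlockEntry = object   # stand-in for Nimbus' real BlockEntry
  slot: uint64

# Before: one block at a time, clearance sees no batching opportunity.
let perBlockQueue = newAsyncQueue[BlockEntry]()

# After: all blocks from one sync/ancestors request travel together.
let perRequestQueue = newAsyncQueue[seq[BlockEntry]]()

proc clearance() {.async.} =
  while true:
    let batch = await perRequestQueue.popFirst()
    # batch-verify all block signatures (one shared final exponentiation),
    # then apply state transitions in slot order
    echo "clearing ", batch.len, " blocks"
```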
One issue is that verifying blocks is expensive and we may not have enough time to verify 5 missing blocks at once during periods of non-finality without impacting validator duties or gossipsub connectivity requirements.
- we can batch 'eth2_processor -> validation + rebroadcast -> a "pool"' during steady state
- no attestations during sync
We kind of already process multiple at once (without returning control to the event loop),
because validation is somewhat cheap (1 signature only, no state transition),
so the only thing missing is using crypto batchVerify (see the sketch below).
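The shape of the missing piece, as a hypothetical sketch (SignatureSet, batchVerify and the flush threshold are placeholders, not the real eth2_processor API):

```nim
import chronos

type SignatureSet = object
  ## Stand-in for a (public key, signing root, signature) triplet.
  id: int

var pending: seq[SignatureSet]

proc batchVerify(sets: seq[SignatureSet]): bool =
  ## Stand-in for real BLS batch verification: blind each set with a
  ## random scalar, do n Miller loops, one shared final exponentiation.
  true

proc processAttestation(s: SignatureSet) {.async.} =
  pending.add s
  if pending.len >= 16:   # flush threshold is a tuning assumption
    let batch = move(pending)
    if not batchVerify(batch):
      # fall back to one-by-one verification to find the invalid set(s)
      discard
```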
If we want to batch blocks, we must avoid blocking networking and validator duties (and hence the DAG and fork choice as well).
- solution 1 requires a threadpool and a way to asyncify waiting on a Flowvar
- solution 2 would split NBC into a networking thread and a verification thread, with a way to asyncify waiting on a Channel or AsyncChannel
- solution 3 would split NBC into a networking process and a verification process, communicating via IPC, with 2 event loops
- solution 4 would split NBC into producer<->consumer services, at least one service being networking and one consensus.
Whatever solution we use, we need a way to tie the communication primitives (Channels, Flowvar) into Chronos. AsyncChannel seems stalled due to the problem of transporting GC-ed types allocated on a thread-local heap. ARC/ORC is out of the question for the moment.
I've outlined a way to make both Flowvar (including ones from custom threadpools) and Channels (including Nim channels) work with Chronos, by creating an "async condition variable" (AsyncCV): nim-lang/RFCs#304 (comment). An AsyncChannel or an AsyncFlowvar is then just "Channel + AsyncCV" or "Flowvar + AsyncCV".
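A rough sketch of that composition, assuming the AsyncCV primitive from the RFC (wait() returns a Chronos future; signal() may be called from a foreign thread). The AsyncEvent here is a single-threaded stand-in: a real cross-thread AsyncCV must additionally wake the dispatcher, e.g. via a self-pipe. An AsyncFlowvar would be the same composition with the Channel replaced by a Flowvar.

```nim
import chronos

type
  AsyncCV = ref object
    ## Stand-in for the RFC's async condition variable.
    event: AsyncEvent   # fire() from a foreign thread would also need
                        # to wake the Chronos dispatcher (e.g. self-pipe)

  AsyncChannel[T] = object
    chan: ptr Channel[T]   # plain Nim channel, shareable across threads
    cv: AsyncCV

proc signal(cv: AsyncCV) =
  ## Producer side: call after chan.send() to wake pending recv().
  cv.event.fire()

proc recv[T](ac: AsyncChannel[T]): Future[T] {.async.} =
  ## Consumer side: await data without blocking the event loop.
  while true:
    let (ok, msg) = ac.chan[].tryRecv()
    if ok:
      return msg
    await ac.cv.event.wait()
    ac.cv.event.clear()
```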