@zsfelfoldi
Last active May 9, 2023 08:15

Beacon light sync top level overview

This document gives a top level view of the blsync PR in order to help the review/merge process. It describes the most important mechanisms and data structures and how they are used in this PR (sometimes also how they are going to be used later).

The current PR implements an MVP feature called blsync that can light sync the beacon chain from a beacon node that supports the light_client REST API namespace (Lodestar or Nimbus) and drive an EL node through the engine API. Its components are sometimes more general purpose, though, as they are also intended to be part of the new full-featured, PoS-capable Geth light mode. Note that a significant part of this PR (more specifically light.LightChain, merkle.MultiProof, sync.HeaderSync and sync.StateSync) is only used here to get the finalized block hash out of the beacon state. One option is to strip down the PR even further by removing this feature from the first version, as it is probably not essential and takes a significant amount of code to implement. On the other hand, it is still a nice-to-have feature, and the exact same beacon state syncing mechanism is going to be an essential part of light servers, so this feature is also a good test that helps us move toward the final goal.

beacon/light

This package defines passive data structures that represent the actual state of the beacon chain light syncing.

  • CheckpointData is a starting point for light syncing (can be used to initialize a CommitteeChain). It can be obtained based on the beacon header's root hash which is either hardcoded in the client or specified as a command line flag. It contains the sync committee of the given period and its beacon state merkle proof.
  • CommitteeChain holds a validated series of SerializedCommittees and LightClientUpdates. It can validate SignedHeaders once it has been synced up to the required sync period. It is a key component of beacon light clients, but servers driven by full beacon nodes will also use it for storing and serving these structures. Though in the current PR a chain of sufficiently good LightClientUpdates is never updated, CommitteeChain is capable of replacing updates with better ones and even reorging if the better update proves a different next update (see comment at ForwardUpdateSync). Hopefully this feature will never be needed in practice (at least on mainnet). Still, propagating the best update chain is good practice, and being able to recover from a serious attack reduces the feasibility of such an attack. Note that currently the whole light syncing relies on an honest majority assumption, so it is less safe than general consensus, though AFAIK there are serious ongoing efforts to make sync committee signature fraud slashable, at least for the finalized chain.
  • LightChain is a beacon header chain with optionally associated partial beacon state proofs which can be added separately, after the header has been added. It keeps track of the canonical header chain which can be externally set by SetHead. It also automatically keeps track of the section of the canonical chain where state proofs are also available.
  • HeadValidator validates SignedHeaders with the current CommitteeChain and also implements a subscription mechanism that allows multiple subscriptions to new validated heads at different signer count levels. Note that currently we only have one subscription at the global signer threshold level (which is a command line parameter of blsync) but light servers will use separate subscriptions for propagating signed heads at signer count levels which are independent from the local threshold setting.
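
The subscription mechanism of HeadValidator described above can be illustrated with a minimal Go sketch. All names here (SignedHead, Validator, Subscribe, Deliver) are hypothetical and not taken from the PR, signature validation against CommitteeChain is omitted, and the rule that a subscriber only sees heads with strictly increasing slots is an assumption made for the example.

```go
package main

import "fmt"

// SignedHead is a hypothetical, simplified stand-in for a validated SignedHeader.
type SignedHead struct {
	Slot        uint64
	SignerCount int // number of committee members in the signature aggregate
}

// headSub is a hypothetical subscription that only wants heads signed by at
// least minSigners committee members.
type headSub struct {
	minSigners int
	callback   func(SignedHead)
	lastSlot   uint64
	seen       bool
}

// Validator fans validated heads out to subscribers by signer count level.
type Validator struct {
	subs []*headSub
}

// Subscribe registers a callback for heads at or above the given signer count.
func (v *Validator) Subscribe(minSigners int, cb func(SignedHead)) {
	v.subs = append(v.subs, &headSub{minSigners: minSigners, callback: cb})
}

// Deliver is assumed to be called after the signature has been checked.
// Each subscriber is notified only if the head meets its own threshold and
// has a higher slot than the last head it received (an assumption).
func (v *Validator) Deliver(h SignedHead) {
	for _, s := range v.subs {
		if h.SignerCount >= s.minSigners && (!s.seen || h.Slot > s.lastSlot) {
			s.seen, s.lastSlot = true, h.Slot
			s.callback(h)
		}
	}
}

func main() {
	v := &Validator{}
	v.Subscribe(342, func(h SignedHead) { fmt.Println("local threshold level: slot", h.Slot) })
	v.Subscribe(450, func(h SignedHead) { fmt.Println("higher signer level: slot", h.Slot) })
	v.Deliver(SignedHead{Slot: 100, SignerCount: 400})
	v.Deliver(SignedHead{Slot: 101, SignerCount: 500})
}
```

Keeping a separate threshold per subscription is what would let a light server propagate heads at signer count levels that differ from its own local setting.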

beacon/light/types

This package defines passive data structures used by the light syncing process.

  • SyncCommittee is a set of 512 BLS keys randomly selected by consensus for every 8192 slot sync period. It is required for validating the BLS signature aggregates of SignedHeaders and LightClientUpdates.
  • SerializedCommittee is a serialized version of SyncCommittee.
  • LightClientUpdate proves the root hash of the sync committee of the next period based on a header signed by a sufficient majority of the sync committee of the given period, plus a beacon state merkle proof of the next_sync_committee state field. A light client update is better (has a higher update score) if the header signature aggregate has more participants. The best update is a finalized update that has a supermajority signed header referencing a former header from the same sync period as finalized.
  • Forks is a list of known chain forks that can determine the SigningRoot of any header based on header hash and fork version at the given epoch.
  • Header is a beacon header.
  • SignedHeader is a header signed by a subset of the canonical SyncCommittee for the given period. Note that the structure does not reference the committee itself but the period is determined by the SignatureSlot field.
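
As a rough illustration of the sync period arithmetic and the update ordering described above, here is a small Go sketch. The UpdateInfo type and the Better function are hypothetical and follow only the simplified rule stated in this document, not the full comparison used by the consensus specification. The 2/3 supermajority threshold is an assumption taken from the usual light client rule.

```go
package main

import "fmt"

const (
	SyncCommitteeSize  = 512  // keys per committee, as stated above
	SlotsPerSyncPeriod = 8192 // slots per sync period, as stated above
)

// SyncPeriod returns the sync period that a slot belongs to.
func SyncPeriod(slot uint64) uint64 { return slot / SlotsPerSyncPeriod }

// UpdateInfo is a hypothetical summary of a LightClientUpdate used for scoring.
type UpdateInfo struct {
	SignerCount   int    // participants in the header signature aggregate
	AttestedSlot  uint64 // slot of the signed header
	FinalizedSlot uint64 // slot of the referenced finalized header, if any
	HasFinality   bool
}

// supermajority assumes a 2/3 threshold of the committee size.
func (u UpdateInfo) supermajority() bool {
	return 3*u.SignerCount >= 2*SyncCommitteeSize
}

// bestClass reports whether the update is finalized, signed by a supermajority
// and references a finalized header from the same sync period.
func (u UpdateInfo) bestClass() bool {
	return u.HasFinality && u.supermajority() &&
		SyncPeriod(u.FinalizedSlot) == SyncPeriod(u.AttestedSlot)
}

// Better reports whether a scores higher than b under the simplified rule:
// best-class updates win, otherwise more signers win.
func Better(a, b UpdateInfo) bool {
	if a.bestClass() != b.bestClass() {
		return a.bestClass()
	}
	return a.SignerCount > b.SignerCount
}

func main() {
	finalized := UpdateInfo{SignerCount: 350, AttestedSlot: 8200, FinalizedSlot: 8195, HasFinality: true}
	unfinalized := UpdateInfo{SignerCount: 500, AttestedSlot: 8200}
	fmt.Println("period of slot 8200:", SyncPeriod(8200))
	fmt.Println("finalized update preferred:", Better(finalized, unfinalized))
}
```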

beacon/light/request

This package is a framework for network requests and syncing mechanisms. In the final light client implementation it will replace some parts of the les package (the request distributor/retriever).

  • Scheduler is the main active component where sync modules and servers are registered. It implements a trigger mechanism that ensures that all sync modules get a chance of making network requests when necessary.
  • Module is an interface for a syncing module. These modules are called whenever triggered by module or server events. They typically have direct references to passive data structures (and sometimes other modules). In each processing round they determine whether they can add new data to the structures or start new network requests whose results can be added if successful. When changes have been made that might make other additions or requests possible, they emit module trigger signals, triggering themselves and/or other modules for the next processing round. Their Process function always receives an Environment which allows starting network requests and makes the current validated head and prefetch head available.
  • Server wraps the abstract RequestServer (which is currently implemented by SyncServer) and adds timing/triggering mechanisms for request timeout and delay. Delay is not used currently but will be used later by a greatly simplified version of the flow control. Whenever a server is found unavailable for requests, it guarantees to send a server event trigger signal when it becomes available again.
  • Environment is always passed to Module.Process and allows making network requests to the current set of servers (or a subset which has been recently triggered). It also makes the actual validated and prefetch heads available. The validated head is determined by HeadValidator while the prefetch head comes from HeadTracker.
  • HeadTracker subscribes to the latest and signed head event streams of registered servers. Based on the latest heads it determines the current prefetch head, which is the (possibly unvalidated) latest head advertised by the majority of servers. The signed head events are passed to HeadUpdater (which passes them further to HeadValidator when CommitteeChain can validate them).
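
The trigger-driven processing rounds described for Scheduler and Module can be sketched as follows. This is a single-threaded simplification with hypothetical names: the real Process function receives an Environment and can start network requests, while here a module simply returns true when it has changed something that may unblock other modules.

```go
package main

import "fmt"

// Module is a simplified, hypothetical version of the request.Module idea.
// Process returns true if it emits a module trigger for the next round.
type Module interface {
	Process() bool
}

// Scheduler runs processing rounds for as long as a trigger is pending.
type Scheduler struct {
	modules   []Module
	triggered bool
}

func (s *Scheduler) Register(m Module) { s.modules = append(s.modules, m) }

// Trigger marks that a new processing round is needed.
func (s *Scheduler) Trigger() { s.triggered = true }

// Run processes rounds until no module emits a trigger and returns the
// number of rounds that were executed.
func (s *Scheduler) Run() int {
	rounds := 0
	for s.triggered {
		s.triggered = false
		rounds++
		for _, m := range s.modules {
			if m.Process() {
				s.triggered = true
			}
		}
	}
	return rounds
}

// chain is a toy passive data structure shared by the two modules below.
type chain struct {
	initialized    bool
	period, target int
}

// initModule initializes the chain once, like a checkpoint initializer would.
type initModule struct{ c *chain }

func (m initModule) Process() bool {
	if m.c.initialized {
		return false
	}
	m.c.initialized = true
	return true
}

// forwardModule advances the chain by one period per round once initialized.
type forwardModule struct{ c *chain }

func (m forwardModule) Process() bool {
	if !m.c.initialized || m.c.period >= m.c.target {
		return false
	}
	m.c.period++
	return true
}

func main() {
	c := &chain{target: 3}
	s := &Scheduler{}
	s.Register(forwardModule{c})
	s.Register(initModule{c})
	s.Trigger()
	fmt.Println("rounds:", s.Run(), "period:", c.period)
}
```

The forward module is registered first on purpose: it can do nothing in the first round and only makes progress because the init module's trigger causes another round.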

beacon/light/sync

This package contains sync modules (all of them implement request.Module) that are not only used in the current PR but will also be used by the full-featured light client and/or its server.

  • CheckpointInit checks if the CommitteeChain is initialized. If not, it checks if the necessary CheckpointData is in the database. If not, it checks if it can start a request to retrieve it. Finally it initializes CommitteeChain and emits a module trigger that starts ForwardUpdateSync.
  • ForwardUpdateSync checks if any of the servers, based on their advertised head slots, are supposed to have LightClientUpdates that could be appended to the current CommitteeChain, then requests and adds them if successful. Note that once serving this data is implemented, servers will also be able to advertise the update scores of their committee chain, and there is going to be another sync method that compares the received scores to the local chain and fetches better updates if available.
  • HeadUpdater does not start any requests but is still a sync module so that it can be triggered whenever CommitteeChain is improved. All it does is receive SignedHeaders from the individual servers and pass them to HeadValidator once the CommitteeChain is synced.
  • HeaderSync tries to sync up the header chain of LightChain up to the current validated head (which is available through Environment). Once successful, it calls LightChain.SetHead. Once the head is synced, it can also reverse sync the canonical header chain up to an externally set "tail target" slot. Optionally it can also attempt to fetch the prefetch head, which is not made canonical yet but allows prefetching the state so that by the time the majority signature is available, all relevant data belonging to the head header is also available. Note that header prefetching is not used in the blsync setup because it prefetches entire beacon blocks and the header is derived from those. Note also that the current version always fetches headers one by one based on parent root, while reverse syncing older headers could be parallelized by fetching by number and checking parent roots later. This will be added later as it is not essential for the blsync setup, which reverse syncs a few hundred slots at most.
  • StateSync fetches partial beacon state proofs with the specified CompactProofFormat for all canonical headers of LightChain and also for the prefetch head.
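
The one-by-one reverse header sync based on parent roots, as described for HeaderSync, might look roughly like the Go sketch below. The types and the fetch callback are hypothetical stand-ins for the real structures and network requests. A real implementation would also have to check that each returned header actually hashes to the requested root, which the map lookup only simulates here.

```go
package main

import (
	"errors"
	"fmt"
)

// Root is a 32-byte header root.
type Root [32]byte

// Header is a hypothetical, reduced beacon header.
type Header struct {
	Slot       uint64
	ParentRoot Root
}

// fetchFunc is a hypothetical stand-in for a network request by header root.
type fetchFunc func(root Root) (Header, bool)

// reverseSync walks from the head root back toward tailTarget, requesting one
// header at a time by parent root. Headers are returned newest first. Slots
// may be skipped, so the walk stops at the first header at or below tailTarget.
func reverseSync(head Root, tailTarget uint64, fetch fetchFunc) ([]Header, error) {
	var chain []Header
	next := head
	for {
		h, ok := fetch(next)
		if !ok {
			return chain, errors.New("header not available")
		}
		chain = append(chain, h)
		if h.Slot <= tailTarget {
			return chain, nil
		}
		next = h.ParentRoot
	}
}

// rootOf builds a distinct dummy root for the example.
func rootOf(n byte) Root {
	var r Root
	r[0] = n
	return r
}

func main() {
	// Slots 10, 11, 13, 14, 15 (slot 12 is skipped), each pointing to its parent.
	db := map[Root]Header{
		rootOf(1): {Slot: 10},
		rootOf(2): {Slot: 11, ParentRoot: rootOf(1)},
		rootOf(3): {Slot: 13, ParentRoot: rootOf(2)},
		rootOf(4): {Slot: 14, ParentRoot: rootOf(3)},
		rootOf(5): {Slot: 15, ParentRoot: rootOf(4)},
	}
	fetch := func(r Root) (Header, bool) { h, ok := db[r]; return h, ok }
	chain, err := reverseSync(rootOf(5), 12, fetch)
	fmt.Println(len(chain), "headers fetched, error:", err)
}
```

Because each request depends on the parent root returned by the previous one, this loop is inherently sequential, which is why fetching by number would be needed to parallelize it.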

beacon/light/api

This package implements request functions for the beacon node REST API. Note that in the final light client implementation execution layer requests will also be implemented here (at which point it might be moved under another package and will replace some parts of the current les package).

  • BeaconLightApi implements REST API requests.
  • SyncServer wraps BeaconLightApi and implements request.RequestServer. Note that it is going to have more functionality once the request delays are used.

beacon/merkle

This package implements merkle proof related tools. Note that these tools do not care about actual data structure definitions but rather about handling, merging and trimming arbitrarily shaped multiproofs. Also note that while they also serve a purpose here, they will make even more sense in the final light client, where proof shapes are sometimes procedurally defined. One example is proving the execution block hash of a historical block through the current beacon state, the historical_roots tree, the old state_roots tree of the given period, and finally the old beacon header and the corresponding beacon state. Another is servers syncing up these historical structures from each other and retrieving range proofs for larger sections. On the other hand, they might currently be used unnecessarily in some cases, for example when hashing a Header. In these cases (when there is a fixed and known data structure) github.com/protolambda/zrnt can be used (I want to change this in the current PR).

  • ProofFormat, ProofReader, ProofWriter are abstractions for arbitrarily shaped beacon state proofs.
  • CompactProofFormat is a proof format descriptor defined here and is used both for requesting and storing multiproofs. It is very compact as it only requires two bits per tree node (1/128th the size of the actual nodes) and is very easy to process.
  • MultiProof is a partial beacon state proof with a CompactProofFormat and the corresponding list of tree nodes (Values).
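
On the size claim above: a tree node is a 32-byte (256-bit) hash, so two bits of format data per node amounts to 2/256 = 1/128 of the node size. To give a feel for what such proofs verify, the Go sketch below checks a single Merkle branch using SHA-256 and a generalized index (root is 1, the children of node i are 2i and 2i+1). This is a deliberately reduced case, not the multiproof handling of this package, and the names Value, hashPair and VerifyBranch are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Value is a 32-byte tree node.
type Value [32]byte

// hashPair returns the parent of two sibling nodes.
func hashPair(left, right Value) Value {
	var buf [64]byte
	copy(buf[:32], left[:])
	copy(buf[32:], right[:])
	return sha256.Sum256(buf[:])
}

// VerifyBranch checks that leaf sits at generalized index gindex under root.
// branch lists the sibling nodes from the leaf level upward.
func VerifyBranch(root, leaf Value, gindex uint64, branch []Value) bool {
	node := leaf
	for _, sibling := range branch {
		if gindex <= 1 {
			return false // branch is longer than the depth implied by gindex
		}
		if gindex&1 == 1 {
			node = hashPair(sibling, node) // current node is a right child
		} else {
			node = hashPair(node, sibling) // current node is a left child
		}
		gindex >>= 1
	}
	return gindex == 1 && node == root
}

func main() {
	// Build a four-leaf tree; the leaves have generalized indices 4 to 7.
	leaves := []Value{{1}, {2}, {3}, {4}}
	n2 := hashPair(leaves[0], leaves[1])
	n3 := hashPair(leaves[2], leaves[3])
	root := hashPair(n2, n3)

	// Prove the third leaf (generalized index 6) with siblings at indices 7 and 2.
	ok := VerifyBranch(root, leaves[2], 6, []Value{leaves[3], n2})
	fmt.Println("branch valid:", ok)
}
```

A multiproof generalizes this by sharing internal nodes between several proven leaves, which is where a compact description of the proof shape becomes useful.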

beacon/params

This package defines consensus constant parameters and beacon state field indices.

cmd/blsync

The main package of the blsync executable. The main function creates the chain structures, sets up the scheduler and the sync modules, registers a SyncServer for each beacon API URL specified in the command line and then starts the scheduler.

The two sync modules defined in this package (both implement request.Module) are only used by blsync.

  • beaconBlockSync retrieves full beacon blocks for the current validated and prefetch head (typically only the prefetch head if it gets validated later). When successful, it also extracts the Header from the block and adds it to LightChain so that the state proofs can also be prefetched in the ideal case.
  • engineApiUpdater does not make any requests to the REST API but it calls the engine API whenever a new execution block is retrieved and validated.
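
The document does not say which engine API methods engineApiUpdater calls. Purely as an illustration, the Go sketch below builds a JSON-RPC request body for engine_forkchoiceUpdatedV1, assuming the updater reports the new head together with the finalized block hash obtained from the beacon state. The helper name forkchoiceRequest, the reuse of the finalized hash as the safe hash, and the omission of JWT authentication and HTTP transport are all assumptions of this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// forkchoiceState follows the field names of the engine API forkchoice state.
type forkchoiceState struct {
	HeadBlockHash      string `json:"headBlockHash"`
	SafeBlockHash      string `json:"safeBlockHash"`
	FinalizedBlockHash string `json:"finalizedBlockHash"`
}

// rpcRequest is a plain JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	ID      int           `json:"id"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
}

// forkchoiceRequest is a hypothetical helper that encodes a forkchoice update.
// The finalized hash is reused as the safe hash, which is an assumption.
// The second parameter (payload attributes) is null because a light syncing
// client does not build blocks.
func forkchoiceRequest(id int, head, finalized string) ([]byte, error) {
	req := rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "engine_forkchoiceUpdatedV1",
		Params: []interface{}{
			forkchoiceState{
				HeadBlockHash:      head,
				SafeBlockHash:      finalized,
				FinalizedBlockHash: finalized,
			},
			nil,
		},
	}
	return json.Marshal(req)
}

func main() {
	body, err := forkchoiceRequest(1, "0xaa", "0xbb")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```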