This document is the design doc for harmony fastSync.
Note:
- This document inherits many ideas from Ethereum. Some knowledge of the Ethereum network implementation will help in understanding this document.
- Please use this document as a collection of potential ideas instead of the final design.
These are some basic assumptions that have been made for the stability of chain consensus.
- Shard committee is not corrupted in a single epoch.
- Header data can be verified with the combination of the header signature and the shard committee.
- The shard committee can be validated by syncing beacon chain header data.
- It could happen that all connected nodes are malicious. We need a way to verify all data based on the above three trust assumptions.
Following is a quick walkthrough of the overall fast sync logic. The network used for synchronization is libp2p streams (elaborated in section 2) and pubsub messages. The process of state synchronization is best implemented with the pipeline pattern to enable non-blocking processing of sync content and maximize performance.
- During handshake, the client picks a certain number of nodes from peer discovery in both the beacon chain and the side chain as host candidates.
- The client also listens to pubsub `node.sync` messages to get the latest blocks from both the beacon chain and the validating chain.
- In our current implementation, the states of both the beacon chain and the side chain are synchronized.
- But having a full state of the beacon chain is not necessary for consensus. The only purpose of the beacon chain is to provide a valid `shardState` with proof.
- As long as we hold the three assumptions in chapter 0, we can do validation on the side chain with the `shardState` for each epoch.
- This means we only need to synchronize the last beacon header of each epoch, from which the `shardState` data can be obtained with verification.
- We don't even need crosslinks and staking transactions to synchronize the side chain. Thus all we need from the beacon chain is one header per epoch.
- This will greatly alleviate the network stress of beacon chain synchronization.
Note: This is really a bold move. Need extensive review.
So the overall strategy is as follows:
- Sync the beacon header skeleton which is the last block header in each epoch. (See 1.3)
- Meanwhile, choose to run fastSync or fullSync for downloader synchronization
- If fastSync
- Set the pivot and fast sync the validating chain content. Pivot is a certain number of blocks before the latest block.
- For blocks before the pivot, do not process state, directly insert related data into db. (See 1.4.1)
- For block at the pivot, download state and other content and insert into db. (See 1.4.2)
- For blocks after the pivot, use
insertChain
to process blocks in EVM. (See 1.4.3)
- If fullSync
- Do the same logic as fastSync's post-pivot step (See 1.4.3) for all missing blocks
Note: Why need both fastSync and fullSync:
Because downloading a full state is costly. Sometimes processing all states is cheaper than downloading the whole state. The threshold needs to be decided based on experiment results.
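As a rough sketch of this decision (the threshold constant and the function shape are illustrative assumptions, not the actual implementation), the mode selection could look like:

```go
package main

import "fmt"

// SyncMode selection sketch. The threshold value is purely illustrative;
// as the note above says, the real cutoff should come from experiments.
type SyncMode int

const (
	FullSync SyncMode = iota
	FastSync
)

// chooseSyncMode picks fullSync when the node is only a few blocks behind,
// since processing a handful of blocks in the EVM can be cheaper than
// downloading the whole state at the pivot.
func chooseSyncMode(localHeight, remoteHeight uint64) SyncMode {
	const fastSyncThreshold = 2048 // hypothetical cutoff, tune experimentally
	if remoteHeight > localHeight && remoteHeight-localHeight > fastSyncThreshold {
		return FastSync
	}
	return FullSync
}

func main() {
	fmt.Println(chooseSyncMode(100, 110) == FullSync)    // small gap: process blocks
	fmt.Println(chooseSyncMode(100, 100000) == FastSync) // far behind: download state
}
```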
- Beacon Skeleton Header is defined as the last block header of an epoch, which contains the `shardState` data for header verification on both the side chain and the beacon chain.
- The client picks peers with a certain strategy to:
- Find the beacon epoch headers that it does not have itself (it cannot start from the epoch data it possesses since a hard fork might have happened)
- Sync the beacon skeleton headers
- Beacon skeleton headers are verified and written to rawdb. The `shardState` obtained from the beacon skeleton is also written to rawdb.
- Pivot is defined as the block which is a certain number of blocks before the latest block.
- For blocks before the pivot, do not process state. Instead,
- download related data from hosts directly
- verify the header against the obtained shard state, and verify the rest of the data against the verified header
- insert into rawdb directly.
- Data to be downloaded:
- Header
- Transactions and receipts
- Staking transaction and receipts
- Incoming CX Receipts
- Data to be written to rawdb:
- Header (Canonical chain, e.t.c)
- Crosslinks (from header)
- Slash records (from header)
- VRF, VDF (from header)
- Shard State (from header)
- Block commit sig (from header)
- Epoch Block Number (from header)
- Transactions and receipts (by request)
- Staking transaction and receipts (by request)
- Incoming CXReceipt (by request)
- All these data can be obtained concurrently from multiple peers.
- Pivot data is crucial. This is the first block which has state.
- It has more data to download compared to blocks before the pivot
- Data to be downloaded (4 of them are same as blocks before pivot):
- Header
- Transactions and receipts
- Staking transaction and receipts
- Incoming and outgoing CX receipts
- Full State
- ValidatorList (TBD)
- Accumulated distributed reward (TBD)
- ValidatorSnapshots with proof in last block at last epoch
- Data to be written to rawdb
- Header (Canonical chain, e.t.c)
- Crosslinks (from header)
- Slash records (from header)
- VRF, VDF (from header)
- Shard State (from header)
- Block commit sig (from header)
- Epoch Block Number (from header)
- Transactions and receipts (by request)
- Staking transaction and receipts (by request)
- Incoming and outgoing CX Receipts (by request)
- ValidatorList (by request)
- ValidatorStat (by computing the diff of the current validatorWrapper against validatorWrapper in last epoch)
- DelegationsByDelegator (by computing from current validatorWrapper)
- Full State must be committed to db.
- After the pivot, each block follows the full synchronization logic, which is to call `blockchain.InsertChain` for block processing.
- Data to be downloaded:
- Header
- Transactions
- Staking Transactions
- Incoming CX receipts
- Data to be written to rawdb:
- Execute the same logic as the current `InsertChain` logic.
Full sync is just the same process as fastSync's synchronization after the pivot (1.4.4).
- After the fastSync, the downloader will change the mode to backupSync.
- In this mode, block synchronization can be done in three cases:
- For a shard that is in the consensus committee, blocks are directly inserted by the consensus module.
- For beacon chain or non-validator nodes, the block message is received through pubsub. The pubsub message will be forwarded to the downloader and further inserted into the chain.
- In other cases where the node realizes some blocks are missing, the downloader starts the backupSync task to actively fetch the data and complete synchronization.
- On the other hand, the node always listens to the mainnet pubsub to get the latest committee from the beacon chain. If some header is missing from the beacon chain, it actively requests it from mainnet nodes.
Any better name for backupSync?
Here we have three options:
- Set the pivot as the block at a certain number before the latest block. E.g. 100 blocks.
- Set the pivot as the last block in the last epoch.
- Combine 1 and 2: set the pivot as the larger value of options 1 and 2.
For option 1, since we need the validator snapshot to compute APR, it's likely that we also need the validatorSnapshot data of the block in the last epoch. Since it's possible that we don't have the full state of that block, we need to request the code blob along with the Merkle proof from providers. It also means that we need to adopt option 2 in 1.6.2.
For option 2, the good thing is that it does not need to request additional validatorSnapshot data. But it has two drawbacks:
- The timing is crucial since it might happen that it does not have state data for enough blocks.
- It is also possible that we might need to compute states for the whole epoch. If each state processing takes 100ms, it will take hours to process the whole epoch's blocks. This is really costly.
Option 3 gets rid of the first drawback of option 2, but still has the second problem.
I would personally recommend option 1, which might leave us no choice for 1.6.2.
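The three pivot options can be sketched as follows (function names and parameters are hypothetical; `lastEpochEnd` stands for the number of the last block in the previous epoch):

```go
package main

import "fmt"

// pivotOption1: a fixed offset before the latest block (e.g. 100 blocks).
func pivotOption1(latest, offset uint64) uint64 {
	if latest < offset {
		return 0
	}
	return latest - offset
}

// pivotOption3 combines options 1 and 2: take the larger of the
// offset-based pivot and the last block of the previous epoch.
func pivotOption3(latest, offset, lastEpochEnd uint64) uint64 {
	p := pivotOption1(latest, offset)
	if lastEpochEnd > p {
		p = lastEpochEnd
	}
	return p
}

func main() {
	fmt.Println(pivotOption1(1000, 100))      // 900
	fmt.Println(pivotOption3(1000, 100, 950)) // 950: epoch boundary is later
}
```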
There are two options to compute the ValidatorList offchain data:
- Iterate over the state to get the validatorList.
- Add a hash of the validatorList to the header, so the client can download it by request.
The first solution doesn't need to change anything in code. But the iteration over the whole state will be costly as the state grows larger and larger.
The second solution needs a hard fork on the header. The good thing is that, once done, we don't need to iterate over the whole state and still have data that can be verified.
Accumulated (distributed) reward is used to determine the block reward given in beacon chain. It is recorded in offchain data. The data is not in header and can only be obtained from full state computation.
- Is it true that for newly produced blocks, we don't need the accumulated reward for reward computation?
- Is it true that this value is used in some historical blocks?
Based on the answer of the two questions, we have two scenarios:
- Block reward is not needed even for historical blocks.
This will be good. Just get rid of the accumulated reward from `Finalize`.
- We don't need it in the latest blocks, but it is still needed for historical blocks.
Thus some blocks cannot be fastSynced. If the pivot block does not support fastSync (though not likely to happen), we need to disable fastSync and start fullSync.
This is really important if we can only sync skeleton in beacon chain.
Is it true for all historical blocks that:
- In all historical header versions, committee data exists in the last header of the epoch.
- The last header of the epoch holds the new committee for the next epoch.
In this design, data in network are transfered in two channels:
- Libp2p pubsub
- Libp2p stream
In this part, we will discuss the details of the network data transfer.
Stream and pubsub serves different usage scenarios.
- Stream is used for peer-to-peer private messages (e.g. state sync for a newcomer - the message is needed only by the newcomer). No need to flood the whole network with data which is only meaningful for a small number of nodes.
- Pubsub is used for public messages which expects multiple receivers to handle this message. (E.g. Validator messages for the consensus layer)
Based on different roles, nodes need a different set of network data. Here is a full list of roles. Note that one node has exactly one role at a time.
- Leader
- Validator
- Synced node (non-validator)
- Non-synced node
Each shard node has the following role transitions:
The protocol is comprised of several types of sub-protocols
Pubsub protocols are just the current protocol running on pubsub. In this design, we will leave pubsub unchanged:
- Consensus.ViewChange / Prepare / Commit...
- Node.Transaction / CXReceipts / Slash...
Providers:
- Consensus messages are provided from validators and leaders
- Other messages are provided by nodes that receive the corresponding info in block messages.
Consumers:
- The Node.Sync message is used by all nodes to learn of the latest blocks of a shard.
- Beacon Node.Sync message is also used for other shard nodes to keep track of the latest block of beacon chain.
As described in chapter 1, the downloader module needs to fetch a number of blocks to get non-synced nodes synced, covering both fast sync and full sync.
Providers:
- All validators provide this service for the nodes of validating chain.
- The leader will not provide this service for best performance.
Consumers:
- Nodes that need to validate on a shard chain for which they do not have the state, or whose possessed state is out of date.
- This includes both freshly started nodes and nodes that are resharded to another shard.
Used for node sync when the node has arrived at the latest state. The sync serves as a backup in case some blocks are missing from pubsub.
Providers:
- All validators provide this service for the nodes of validating chain.
- The leader will not provide this service for best performance.
Consumers:
- Consumed by nodes that missed some pubsub messages and need to fetch the missed block messages.
Note that only the BackupSync protocol is bidirectional - each side can be both provider and consumer.
Used for beacon header sync. Only the last header of each epoch on the beacon chain is provided.
Providers:
- All validators on beacon chain
- Not beacon leader
Consumers:
- All nodes other than beacon leader.
- Pubsub * 4
- HeavySync Host * 4 shard
- HeavySync Client * 4 shard
- BackupSync * 4
- Ultra-light Host * 1
- Ultra-light Client * 1
This part defines the protocol strategy from the POV of each role so that the network reaches the expected topology. Note that the numbers of connections are an illustration. The final parameters need to be tuned according to performance in testing.
Also note that here we assume only validators will provide full sync & fast sync. This is optional for non-validators since other nodes may place more trust in validator nodes.
| | Pubsub (side chain) | Pubsub (beacon chain) | HeavySync | BackupSync | Ultra-light (on beacon) |
|---|---|---|---|---|---|
Provide Data | False | / | False | False | False |
Advertise | False | / | False | False | False |
Consume | True | True | True | False | True if validating side |
Num streams | N/A | N/A | 32-64 | 0 | 2~4 |
| | Pubsub (side chain) | Pubsub (beacon chain) | HeavySync | BackupSync | Ultra-light (on beacon) |
|---|---|---|---|---|---|
Provide Data | True | / | True | True | True if validating beacon |
Advertise | True | / | False | False | False |
Consume | True | True | False | True | True if validating side |
Num streams | N/A | N/A | 0~16 | 2~4 | 2-4 as consumer; 0-128 as provider |
Note: whether to advertise in side chain pubsub is an option here. Need to be determined.
| | Pubsub (side chain) | Pubsub (beacon chain) | HeavySync | BackupSync | Ultra-light (on beacon) |
|---|---|---|---|---|---|
Provide Data | True | / | True | True | True if validating beacon |
Advertise | True | / | True | True | True if validating beacon |
Consume | True | True | False | True | True if validating side |
Num streams | N/A | N/A | 0~16 | 2~4 | 2-4 as consumer; 0-128 as provider |
| | Pubsub (side chain) | Pubsub (beacon chain) | FastSync | FullSync | Ultra-light (on beacon) |
|---|---|---|---|---|---|
Provide Data | True | / | False | False | False |
Advertise | True | / | False | False | False |
Consume | True | True | False | False | True if validating side |
Num streams | N/A | N/A | 0 | 0 | 2~4 (only as consumer) |
Note that among these stream protocols, only the FullSync protocol is bi-directional - it has both incoming and outgoing requests. Other than that, there is always one provider and one consumer for each stream protocol.
- There cannot be multiple protocols running on a stream. There must be exactly one protocol per stream.
- The stream can switch type. E.g. a stream running fastSync can be transferred to the fullSync protocol when fastSync finishes.
- Why do we need to always keep 4~8 fullSync streams? Because it might happen that a node is out of sync and needs to get to the latest block. On the other hand, peer discovery and setting up stream connections is costly. So it is better to keep a minimum number of streams to rely on for fastSync content.
- Based on the expected network topology from the role POV, we need a `peerManager` to initiate and manage all peer connections with the complicated role-based stream policies. This will be discussed in later chapters.
The synchronization protocol sits on libp2p streams with asynchronous reads/writes to maximize bandwidth usage.
Following is a comparison between sync and async read/write.
We can see that if network latency is the major delay in the protocol, reading/writing asynchronously can fully utilize the bandwidth in both directions and achieve higher throughput. Thus async read/write through the stream is recommended.
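The gain from async read/write can be simulated with an in-memory duplex pipe: a writer goroutine keeps firing requests while a separate reader drains responses, instead of paying one round trip per request. This is an illustrative sketch only (the echo host and line-based framing are assumptions, not the actual wire protocol):

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// pipelined sends all requests without waiting for individual responses,
// reading replies concurrently on the same duplex connection.
func pipelined(reqs []string) []string {
	client, server := net.Pipe()

	// Host side: echo each request line back as a response.
	go func() {
		defer server.Close()
		sc := bufio.NewScanner(server)
		for sc.Scan() {
			fmt.Fprintf(server, "resp:%s\n", sc.Text())
		}
	}()

	// Async write: a dedicated goroutine keeps the outbound side busy.
	go func() {
		for _, r := range reqs {
			fmt.Fprintln(client, r)
		}
	}()

	// Async read: drain responses as they arrive.
	out := make([]string, 0, len(reqs))
	sc := bufio.NewScanner(client)
	for len(out) < len(reqs) && sc.Scan() {
		out = append(out, sc.Text())
	}
	return out
}

func main() {
	fmt.Println(pipelined([]string{"header-1", "header-2"}))
}
```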
Compared options: 1) use RLP. 2) Use Protobuf.
In this synchronization protocol, RLP is preferred since:
- RLP can deal with nested structures. Some nested data structures need to be serialized in the protocol, e.g. TrieNode.
- Some RLP encoding/decoding rules have already been implemented in the codebase, e.g. Header, trie.Node, Block, Transactions, etc.
- Implementing RLP encoding/decoding rules in golang is simpler than using protobuf.
On the other side, Protobuf outweighs RLP in:
- Protobuf has better performance.
So for the underlying serialization algorithm of stream data, RLP has more advantages and is recommended in this scenario.
More readings: Ethereum rlp design idea.
When a new stream connection is set up from the libp2p host, we can run the stream handler function on the established stream. The stream handler does the following:
- After the stream is setup, it is first filtered by middleware, which is basically a blacklist to screen out blacklisted peers.
- Then the stream runs a handshake process - both sides send out version metadata, running protocols, etc., and verify them.
- Then check whether the given metadata is accepted.
- Run the corresponding protocol on the stream, which is a forever loop to handle messages. In each loop, the request is also filtered by middleware for rate limiting.
- If an error is detected when running the protocol, close the stream connection and return.
Next we will discuss the components of the stream handler.
Here the middleware serves three purposes:
- Blacklister to filter blacklisted peers.
- Limit the request calls from a certain peer.
- Access log recorder: update peer metadata (e.g last request for LRU replacement)
Blacklister:
- Blacklister is a module to read and write black listed peers.
- Blacklister manages entries indexed by `libp2p.PeerID`, each associated with a timestamp indicating the time when access for the PeerID is no longer forbidden (time out of jail).
- Blacklister will periodically remove stale entries and persist the data entries to a local persistence file.
- Blacklister provides an interface to check whether a given peerID is currently blacklisted.
- Blacklister is called in two cases. Each case should have a predefined jail period.
- Timeout of request from downloader module.
- Content error from downloader module.
- User-added nodes should be removed from the blacklist and shall not be blacklisted.
- Blacklister shall be thread-safe to use.
Limiter:
- Limit the request with some weight by request type.
- The limiter policy is customized for each sub-protocol.
(TODO) It would be great if libp2p has some middleware options. Explore the libp2p repo to find an authentication middleware.
The handshake process checks that the node on the new stream is compatible with the local configuration:
- Exchange Harmony stream protocol version and check compatibility.
- Only the overall protocol version is checked.
- Exchange protocol metadata and check compatibility
- Network ID
- Genesis block hash
- Running sub-protocol specifier, version and metadata
- Pass the stream to the peer manager to decide whether to run the sub-protocol. If the number of peers of the sub-protocol reaches the hard upper limit based on the peer management policy, the peer manager will immediately reject and close the stream.
- Run the corresponding sub-protocol to handle requests.
Peer management is to ensure that the network topology described in 2.1.4 is achieved.
-
When a new stream is set up, the peer manager determines whether to run the sub-protocol according to different peer management policies. If the number of peers running a certain sub-protocol reaches the hard limit, do not accept the connection. This is handled in the handshake process.
-
When a stream is closed, if the remaining number of nodes running the sub-protocol reaches the hard lower limit, spin up a maintenance thread.
-
Periodically spin up a maintenance thread to keep number of peers as expected. Illustrated in 2.2.7.
-
Dynamically register peer management policy. Provide public methods to be called to handle role change events. If a role change happens, discard the maintenance in progress and spin up a new maintenance loop according to the new peer management policy.
-
The different management policies for different roles should be implemented as an interface according to 2.1.4.
The maintenance loop can be awakened in three cases:
- A peer stream is closed (from 2.3.3).
- Periodically triggered every, say, 60 seconds.
- A role change event is received.
For each maintenance process, the following logic will be executed:
- For each subprotocol,
- Check whether the peer count is smaller than the soft lower limit. If true, execute a batch of peer discovery tasks to discover more peers to reach the expected connected peers.
- Check whether the peer count is greater than the soft upper limit. If true, remove peers from the head of the LRU entries to reach the expected connected peers.
All validation steps are done in upper function calls.
In this section, three examples will be given to illustrate the process of peer management:
- New node joining for validating shard 1.
- Node role switching from validator of shard 1 to leader of shard 1.
- Node receives a resharding task from shard 1 to shard 2.
- When a new node starts, a role change event (non-synced) is sent to the peer manager.
- The peer manager spins up the maintenance process.
- There are currently 0 connected peers, and the node needs 32-64 heavy sync connections in shard 1 and 2~4 ultra-light connections.
- Thus the maintenance will trigger discovery for a number of heavy sync and ultra-light connections.
- When a node stream is set up, emit the new node event to other modules.
- When a view change is triggered and the node finds itself to be the leader node, it informs the network peer manager of the event.
- The peer manager spins up the maintenance process.
- The node currently has 2 heavy sync streams on shard 1, 2 backup sync streams on shard 1 and 2 ultra-light sync streams. (Just for example)
- Checking the peer management policy of the leader, we need to remove 2 heavy sync and 2 backup sync streams.
- Send a close signal to the StreamHandler function of the 4 nodes in total.
- After the stream handler is closed, remove the peer from the peer manager and emit a stream close event.
- Resharding informs the network of the resharding event.
- The peer manager spins up the maintenance process.
- The node currently has 2 heavy sync streams, 2 backup sync streams on shard 1 and 2 ultra-light sync streams.
- Checking the new peer management policy, it needs 0 heavy sync on shard 1, 0 backup sync on shard 1, 0~16 heavy sync on shard 2, 2~4 backup sync on shard 2 and 2 ultra-light sync streams.
- Thus it will
- terminate 2 heavy sync streams
- terminate 2 backup sync streams
- run peer discovery of size 4 for backup sync on shard 2.
- Emit the new peer or peer closed events.
For the peer management policy of each sub-protocol, it is encouraged to have both soft limits and hard limits.
- The hard limit is the limit that shall not be reached. If the peer count falls outside of the hard limit, an action is required immediately.
- The soft limit is the limit set for periodic maintenance. It falls within the range of the hard limit. It could happen that a certain value has reached the soft limit but not the hard limit. If this happens, no immediate action is enforced, but it will be handled in the periodic maintenance process.
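These semantics can be captured in a small helper (the type and method names, and the "delta" convention of positive-to-add / negative-to-drop, are assumptions for illustration):

```go
package main

import "fmt"

// Limits sketch, with HardLow <= SoftLow <= SoftHigh <= HardHigh.
type Limits struct {
	HardLow, SoftLow, SoftHigh, HardHigh int
}

// needImmediateAction reports whether the peer count violates a hard
// limit, which requires an immediate reaction.
func (l Limits) needImmediateAction(count int) bool {
	return count < l.HardLow || count > l.HardHigh
}

// maintenanceDelta returns how many peers to add (positive) or drop
// (negative) during periodic maintenance; 0 while within soft limits.
func (l Limits) maintenanceDelta(count, target int) int {
	if count < l.SoftLow || count > l.SoftHigh {
		return target - count
	}
	return 0
}

func main() {
	l := Limits{HardLow: 2, SoftLow: 4, SoftHigh: 8, HardHigh: 16}
	fmt.Println(l.needImmediateAction(1)) // true: below hard lower limit
	fmt.Println(l.maintenanceDelta(3, 6)) // 3: discover 3 more peers
}
```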
This section will illustrate how sync requests are sent and responses received.
Pubsub should work just as the current logic:
- Node.Start will register the handler for pubsub messages.
- After validation, the information is dispatched to different modules.
- In this design, the Node.Sync message will also be delivered to the downloader module for further handling.
First layout the flow of synchronize requests:
- The downloader needs certain data to process for chain syncing. So it will call some method of `peer` in 2.3, which wraps Stream, to write request data of the corresponding type to the stream. This method is called in the downloader module as the initiator of a sync request.
- The request is read by the host `StreamHandler`. It will then fetch the related data from the blockchain and write the response to the stream.
- The response is read by the client. It will dispatch the response to the downloader module in a non-blocking way.
The ultra-light sync protocol is the simplest one - `getLastHeaderByEpoch`.
- Client send out requests with a list of epoch ids.
- Host retrieve the last header in the epoch, and send back to client
- Client receive the host's response. Directly deliver to the downloader module.
Backup Sync is also simple. It is needed when a client misses a pubsub message, and it supports only one request:
- The client sends out a request with block numbers to obtain the target blocks.
- The host reads the blocks from the blockchain and sends them via the stream.
- The client receives the blocks and delivers them to the downloader module directly.
HeavySync could be either fast sync or full sync from the client's view. But the protocols they run are actually exactly the same - they handle the same types of requests. The only difference is that there are some request types the client will never call in full sync mode. So all request types are listed together here.
Here defines all the request and corresponding handling logics over p2p stream connection. All responses received by client will be delivered to downloader module directly.
Parameters: StartBlockNumber, size, interval
Return: A list of headers of the target size, starting from StartBlockNumber and spaced by the given interval.
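Assuming `interval` means the spacing between requested block numbers, the host-side expansion of such a request could look like this sketch:

```go
package main

import "fmt"

// headerNumbers expands a (StartBlockNumber, size, interval) request into
// the block numbers whose headers the host should return.
func headerNumbers(start, size, interval uint64) []uint64 {
	if interval == 0 {
		interval = 1 // treat 0 as "consecutive blocks"
	}
	nums := make([]uint64, 0, size)
	for i := uint64(0); i < size; i++ {
		nums = append(nums, start+i*interval)
	}
	return nums
}

func main() {
	fmt.Println(headerNumbers(100, 3, 10)) // [100 110 120]
}
```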
Parameter: header hash
Return:
- Transactions of the block
- Staking transactions of block
- Incoming CX receipts of block
Parameter: header hash
Return:
- Transaction receipts
- Staking transaction receipts
Parameter: a list of node hashes
Return: A list of node RLP values corresponding to the node hashes in the request.
After the client receives the response, it directly delivers it to the downloader module.
Parameter: blockHash
Return: A list of BLS public keys of the `shardState` from the block
Client deliver to downloader module.
This is used to get the validatorSnapshot when the state does not exist.
Parameter: block number, list(address)
Return: A list of the following structure
- State Object {nonce, codeHash, storageHash, balance}
- code blob
- Merkle proof of the state object to the state root.
When the node starts up, it does not have the latest shard state, thus it is not able to validate any pubsub message against the shard state (there could be malicious messages in pubsub). It also shall not ask a random node for the latest shard state since this would make the node prone to eclipse attacks.
Option 1: A node only starts pubsub message handling when it has finished syncing the beacon skeleton. This also has some correlation with consensus initialization. I will leave these topics to the experts.
Option 2: Still use a centralized service - provide a set of trusted nodes for the latest shard state to get bootstrapped.
Looking for better names:
- Chain that only has the last headers of epochs. Candidates:
- Skeleton Chain (Sounds creepy)
- Sub-protocol used for synchronization in beacon headers
- Skeleton Sync (creepy)
- Ultra-light Sync (too long)
- Sub-protocol used once missing blocks from pubsub
- Backup Sync
Protobuf / RLP
If we are using Protobuf, most of the data fields are still RLP encoded data.
A node would naturally prefer a validator node over a non-validator node for synchronization. Thus we can enforce that only validators can provide HeavySync and Ultra-light sync.
The gain is huge:
This will make the eclipse attack super expensive and we can almost get rid of this attack approach.
In order to do this, we need some updates:
- (Sanity) Non-validators do not provide the heavySync or ultra-light sync service and do not advertise.
- Some BackupSync protocols might become a host/client model. Extra logic is needed to deal with this sync.
- Enforce a BLS key signature check for the client during the handshake process.
But the drawback is also huge:
- Large amount of work.
- A node has no way to know the latest shard state without ultra-light sync. Thus we cannot enforce validator identity on the initial ultra-light sync connections. (This is a chicken-and-egg paradox.)
- A node cannot set up the HeavySync streams until it has finished beacon syncing. Otherwise, these two could be done concurrently.
- The network layer also needs to deal with an additional event - epoch changes. If the shard state changes, some connected peers may become unelected and their streams need to be closed.
- An additional BLS sign and verify step is needed in stream setup, which opens some opportunity for CPU-intensive DDoS attacks.
... TO BE CONTINUED
- Chain logic change
3.1 Skeleton Chain
3.2 Header Chain
3.3 Receipt Chain
3.4 Offchain data
- Syncer
4.1 Overview
4.2 Skeleton Syncer
4.3 Full Syncer
4.3.1 Detailed process
4.3.2 header fetcher
In an effort to decouple modules.