FastSync Design v1.0

Harmony FastSync Design (Overview and Network)

This document is the design doc for Harmony FastSync.

0 Overview

0.1 Blockchain state

A blockchain is a chain of blocks. Each block has an associated state that preserves all account data at that block: for example, account balances, nonces, code, and storage.

In order to validate a block, we must have the latest state so that we can validate and process its transactions.

0.2 Different ways to sync

Currently in our implementation, there are two ways to get to the latest state:

  • Starting from the genesis block, process all transactions of all blocks up to the latest.
  • Using the rclone solution, download a database snapshot from other nodes and process only the latest block.

These two solutions have limitations:

  1. The first solution is clearly time-consuming. It might take days to get to the latest block.
  2. The rclone solution performs no validation on the downloaded data, so it requires trust in a centralized service.

To address the limitations of these two approaches, we propose fastSync, which basically works as follows:

  • Pick a pivot point that is not too far before the latest block.
  • Download and store the headers, transactions, receipts, etc. Don't process the transactions, to save time.
  • Download the full state for the block at the pivot through the decentralized network, with verification.
  • Process transactions for the last-mile blocks.

Thus fastSync has the following properties:

  1. It's a decentralized service with validation. No trust is assumed.
  2. The pivot is configured automatically, so there is no need to take snapshots manually (although this is still needed for passive sync).

0.3 FastSync and Resharding

FastSync is also needed for resharding. Resharding is the mechanism that addresses corruption of a shard committee. When a node is resharded at the end of an epoch, its BLS keys are assigned to a different shard, so it needs to quickly catch up with the latest block in the newly assigned shard. This is where FastSync comes in: it serves to quickly sync to the chain of the new shard.

0.4 Assumptions

These are the three basic assumptions made for stable chain consensus:

  1. The shard committee is not corrupted within a single epoch.
  2. Header data can be verified with the combination of the header signature and the shard committee.
  3. The shard committee itself can be obtained and validated by syncing beacon chain header data.
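
To make assumption 2 concrete, here is a minimal Go sketch of header verification against a known shard committee. All types and the `BLSAggregateVerifier` hook are hypothetical stand-ins, not the actual Harmony APIs:

```go
// Hypothetical sketch of assumption 2: a header is valid iff its aggregated
// commit signature verifies against the known shard committee. None of these
// types mirror the real Harmony codebase.
package sketch

import "errors"

// ShardCommittee stands in for the committee recovered from beacon chain
// shard state (assumption 3).
type ShardCommittee struct {
	BLSPublicKeys [][]byte
}

// Header is a minimal stand-in for a block header plus its commit signature.
type Header struct {
	Hash      []byte
	Bitmap    []byte // which committee members signed
	CommitSig []byte // aggregated BLS signature over the header hash
}

// BLSAggregateVerifier is assumed to be backed by a real BLS library; it
// verifies sig against the subset of keys selected by bitmap.
type BLSAggregateVerifier func(keys [][]byte, bitmap, msg, sig []byte) bool

// VerifyHeader checks a downloaded header against the committee of its epoch.
// A full implementation would also check that the signers hold a quorum
// (e.g. more than 2/3 of the voting power).
func VerifyHeader(h *Header, committee *ShardCommittee, verify BLSAggregateVerifier) error {
	if !verify(committee.BLSPublicKeys, h.Bitmap, h.Hash, h.CommitSig) {
		return errors.New("commit signature does not verify against shard committee")
	}
	return nil
}
```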

1 Sync logic walk through

Following is a quick walkthrough of the overall fast sync logic. The network layer used for synchronization is libp2p streams (elaborated in section 2) and pubsub messages. The state synchronization process is best implemented with the pipeline pattern to enable non-blocking processing of sync content and maximize performance.

1.1 Sync process start up

  1. The client handshakes with a certain number of nodes, picked via peer discovery on both the beacon chain and the shard chain, as host candidates.
  2. The client also listens to pubsub node.sync messages from the beacon chain to get the latest shard state.

1.2 Syncing beacon chain

  1. In our current implementation, the states of both the beacon chain and the shard chain are synchronized.
  2. But having a full state of the beacon chain is not necessary for consensus. The only purpose of the beacon chain here is to provide a valid stateShard with proof.
  3. As long as we hold the three assumptions in chapter 0, we can do validation on the shard chain with the stateShard for each epoch.
  4. This means we only need to synchronize the last beacon header of each epoch, from which stateShard data can be obtained with verification.
  5. We don't even need crosslinks and staking transactions to synchronize the shard chain. Thus all we need from the beacon chain is one header per epoch.
  6. This will greatly alleviate the network stress of beacon chain synchronization.

Note: This is really a bold move. Need extensive review.

So the overall strategy is as follows:

Overall logic

  1. Sync the beacon header skeleton which is the last block header in each epoch. (See 1.3)
  2. Meanwhile, choose which ActiveSync protocol to run - fastSync or fullSync - for downloader synchronization
    1. If fastSync
      1. Set the pivot and fast sync the validating chain content. Pivot is the last block of the previous epoch.
      2. For blocks before the pivot, do not process state; directly insert related data into the db. (See 1.4.1.1)
      3. For the block at the pivot, download the state and other content, then insert into the db. (See 1.4.1.2)
      4. For blocks after the pivot, use insertChain to process block transactions in the EVM. (See 1.4.1.3)
    2. If fullSync
      1. Run the same logic as fastSync step 4 (insertChain) for all missing blocks

Note: why we need both fastSync and fullSync:

Downloading a full state is costly. Sometimes processing all the blocks' state transitions is cheaper than downloading the whole state. The threshold needs to be decided based on experiment results; a sketch of the decision follows.
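
A minimal sketch of what that threshold decision could look like; `fastSyncThreshold` is a placeholder value, to be replaced by the experimentally tuned one:

```go
// chooseSyncMode is a minimal sketch of the fastSync-vs-fullSync decision.
package sketch

type SyncMode int

const (
	FullSync SyncMode = iota
	FastSync
)

// fastSyncThreshold is a made-up placeholder; the real value should come from
// experiments comparing state-download cost vs. block-processing cost.
const fastSyncThreshold = 1024

func chooseSyncMode(currentBlock, latestBlock uint64) SyncMode {
	if latestBlock <= currentBlock {
		return FullSync // already (nearly) synced; nothing to fast-sync
	}
	if latestBlock-currentBlock > fastSyncThreshold {
		// Far behind: downloading the pivot state is cheaper than
		// re-processing every transaction since our current block.
		return FastSync
	}
	return FullSync
}
```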

1.3 Sync beacon header skeleton

  1. The beacon skeleton header is defined as the last block header of an epoch, which contains the shardState data needed for header verification on both the shard chain and the beacon chain.
  2. The client picks peers with a certain strategy to
    1. Find the beacon epoch headers that it does not have (it cannot simply start from the epoch data it possesses, since a hard fork might have happened)
    2. Sync the beacon skeleton headers
  3. Beacon skeleton headers are verified and written to rawdb.
  4. The shardState obtained from the beacon skeleton is also written to rawdb.
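
A hedged sketch of this skeleton sync loop; the `skeletonPeer` interface, `EpochHeader` type, and the `verify`/`writeRawDB` callbacks are hypothetical stand-ins for the real peer and rawdb modules:

```go
// Sketch of the beacon skeleton sync loop described above.
package sketch

import "fmt"

type skeletonPeer interface {
	// GetLastHeadersByEpoch mirrors the BeaconSync getLastHeadersByEpoch
	// request described in section 2.3.3.
	GetLastHeadersByEpoch(epochs []uint64) ([]*EpochHeader, error)
}

type EpochHeader struct {
	Epoch      uint64
	ShardState []byte // committee data carried by the last header of the epoch
}

// syncBeaconSkeleton fetches the last header of every missing epoch,
// verifies it, and persists header + shard state (steps 2-4 above).
func syncBeaconSkeleton(p skeletonPeer, firstMissing, latest uint64,
	verify func(*EpochHeader) error, writeRawDB func(*EpochHeader) error) error {

	var epochs []uint64
	for e := firstMissing; e <= latest; e++ {
		epochs = append(epochs, e)
	}
	headers, err := p.GetLastHeadersByEpoch(epochs)
	if err != nil {
		return fmt.Errorf("skeleton request failed: %w", err)
	}
	for _, h := range headers {
		// The committee of epoch N verifies the skeleton header of epoch N+1.
		if err := verify(h); err != nil {
			return fmt.Errorf("epoch %d skeleton header invalid: %w", h.Epoch, err)
		}
		// Persist both the header and its shard state to rawdb.
		if err := writeRawDB(h); err != nil {
			return err
		}
	}
	return nil
}
```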

1.4 Active Sync

Active Sync is started when a node's blocks are far behind the latest block.

There are two sync modes for active sync: fastSync and fullSync.

1.4.1 FastSync

1.4.1.1 Synchronization before pivot
  1. The pivot is defined as the block that is a certain number of blocks before the latest block.

  2. For blocks before the pivot, do not process state. Instead,

    1. download related data from host directly
    2. verify the header against the obtained shard state, and verify the rest of the data against the verified header.
    3. insert into rawdb directly.
  3. Data to be downloaded:

    1. Header
    2. Transactions and receipts
    3. Staking transaction and receipts
    4. Incoming CX Receipts
  4. Data to be written to rawdb

    1. Header (canonical chain, etc.)
    2. Crosslinks (from header)
    3. Slash records (from header)
    4. VRF, VDF (from header)
    5. Shard State (from header)
    6. Block commit sig (from header)
    7. Epoch Block Number (from header)
    8. Transactions and receipts (by request)
    9. Staking transaction and receipts (by request)
    10. Incoming CXReceipt (by request)
  5. All of this data can be obtained concurrently from multiple peers, as sketched below.
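
A sketch of such a concurrent download, assuming a hypothetical `peer` type and per-block `fetch` request; a production version would retry failed blocks on other peers instead of aborting:

```go
// Sketch of fetching per-block data concurrently from multiple peers.
package sketch

import "sync"

type peer interface{} // stand-in for the stream-backed peer type

type blockData struct {
	Number uint64
	// header, transactions, receipts, staking txns, incoming CX receipts ...
}

func downloadRange(peers []peer, from, to uint64,
	fetch func(p peer, num uint64) (*blockData, error)) ([]*blockData, error) {

	// Buffer every job up front so workers can simply drain the channel.
	jobs := make(chan uint64, to-from+1)
	for num := from; num <= to; num++ {
		jobs <- num
	}
	close(jobs)

	out := make([]*blockData, to-from+1)
	errCh := make(chan error, len(peers))
	var wg sync.WaitGroup

	// One worker per peer; workers write to disjoint slice slots, so no lock
	// is needed around out.
	for _, p := range peers {
		wg.Add(1)
		go func(p peer) {
			defer wg.Done()
			for num := range jobs {
				bd, err := fetch(p, num)
				if err != nil {
					errCh <- err // a real impl would retry on another peer
					return
				}
				out[num-from] = bd
			}
		}(p)
	}
	wg.Wait()

	select {
	case err := <-errCh:
		return nil, err
	default:
		return out, nil
	}
}
```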

1.4.1.2 Synchronization at pivot
  1. Pivot data is crucial. This is the first block which has a full state.
  2. It has more data to download compared to blocks before the pivot.
  3. Data to be downloaded (4 of them are the same as for blocks before the pivot):
    1. Header
    2. Transactions and receipts
    3. Staking transaction and receipts
    4. Incoming and outgoing CX receipts
    5. Full State
  4. Data to be written to rawdb
    1. Header (canonical chain, etc.)
    2. Crosslinks (from header)
    3. Slash records (from header)
    4. VRF, VDF (from header)
    5. Shard State (from header)
    6. Block commit sig (from header)
    7. Epoch Block Number (from header)
    8. Transactions and receipts (by request)
    9. Staking transaction and receipts (by request)
    10. Incoming and outgoing CX Receipts (by request)
    11. ValidatorList (by iterating over state)
    12. ValidatorStat (by computing the diff of the current validatorWrapper against validatorWrapper in last epoch)
    13. DelegationsByDelegator (by computing from current validatorWrapper)
    14. Full State must be committed to db.
1.4.1.3 Synchronization after pivot
  1. After the pivot, each block follows the full synchronization logic, which is to call blockchain.InsertChain for block processing.
  2. Data to be downloaded:
    1. Header
    2. Transactions
    3. Staking Transactions
    4. Incoming CX receipts
  3. Data to be written to rawdb
    1. Execute the same logic as the current InsertChain logic.

1.4.2 Full Sync

Full sync is just the same process as fastSync - Synchronization after pivot (1.4.1.3).

1.5 Post ActiveSync - PassiveSync

  1. After the ActiveSync, the downloader will change its mode to passiveSync.
  2. In this mode, blocks are synced passively:
    1. For nodes running a validator, blocks are inserted directly in the consensus module.
    2. If some blocks are missing, a download request is sent from the consensus module to the downloader to process the missing blocks.
  3. On the other hand, always listen to the mainnet beacon pubsub to get the latest committee from the beacon chain. If some header is missing from the beacon chain, actively request it from mainnet nodes.

1.6 Additional topics

1.6.1 How do we determine the pivot?

Decision

Set the pivot as the last block in last epoch.

Options

Here we have three options:

  1. Set the pivot as the block at a certain number before the latest block. E.g 100 blocks.
  2. Set the pivot as the last block in last epoch.
  3. Combine 1 and 2, Set the pivot as the larger value of option 1 and 2.

For option 1, since we need the validator snapshot to compute APR, it's likely that we also need the validatorSnapshot data of the block in the last epoch. Since it's possible that we don't have the full state of that block, we would need to request the code blob along with the Merkle proof from providers. It also means that we need to adopt option 2 in 1.6.2.

For option 2, the good thing is that it does not need to request additional validatorSnapshot data. But it has two drawbacks:

  1. The timing is crucial, since it might happen that hosts do not have state data for enough blocks.
  2. It is also possible that we might need to compute states for the whole epoch. If each block's state processing takes 100ms, it could take hours to process a whole epoch of blocks. This is really costly.

Option 3 gets rid of the first drawback of option 2, but still has the second problem.
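
For illustration, the three options boil down to the following sketch (`pivotDistance` and the `epochLastBlock` helper are hypothetical):

```go
// Sketch of the three pivot options discussed above; the decision is option 2.
package sketch

const pivotDistance = 100 // option 1: fixed distance behind the latest block

func pivotOption1(latest uint64) uint64 {
	if latest < pivotDistance {
		return 0
	}
	return latest - pivotDistance
}

// epochLastBlock is a hypothetical helper returning the last block number of
// the epoch before the given block.
func pivotOption2(latest uint64, epochLastBlock func(uint64) uint64) uint64 {
	return epochLastBlock(latest) // last block of the previous epoch (chosen)
}

// Option 3: the larger of options 1 and 2.
func pivotOption3(latest uint64, epochLastBlock func(uint64) uint64) uint64 {
	p1, p2 := pivotOption1(latest), pivotOption2(latest, epochLastBlock)
	if p1 > p2 {
		return p1
	}
	return p2
}
```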

1.6.2 Do we need ValidatorListHash in header?

Decision

We don't need to move the validator list from off-chain to on-chain. It's way too risky.

Options

There are two options to compute for the ValidatorList offchain data:

  1. Iterate over the state to get the validatorList.
  2. Add a hash of the validatorList to the header; the client can then download the list by request and verify it against the hash.

The first solution doesn't need any code change. But iterating over the whole state will become costly as the state grows larger and larger.

The second solution needs a hard fork on the header. The good thing is that once the state download is finished, we don't need to iterate over the whole state and still have data that can be verified.

1.6.3 Do we need accumulated reward in header?

Decision

No need to be concerned with accumulated rewards.

There is no fork change; it was a design change before launch.

Options

The accumulated (distributed) reward is used to determine the block reward given on the beacon chain. It is recorded in off-chain data. The data is not in the header and can only be obtained from full state computation.

  1. Is it true that for newly produced blocks, we don't need the accumulated reward for reward computation?
  2. Is it true that this value is used in some historical blocks?

Based on the answer of the two questions, we have two scenarios:

  1. Block reward is not needed even for historical block.

    This will be good. Just get rid of the accumulated reward from Finalize.

  2. We don't need it for the latest blocks, but it is still needed for historical blocks.

    Thus some blocks cannot be fastSynced. If the pivot block does not support fastSync (though this is not likely to happen), we need to disable fastSync and start fullSync.

1.6.4 Needs confirmation: the last beacon header in each epoch forms a valid chain with proof.

This is really important if we want to sync only the skeleton of the beacon chain.

Is it true for all historical blocks that:

  1. In all historical header versions, committee data exists in the last header of the epoch.
  2. The last header of the epoch holds the new committee for the next epoch.

Answer:

Yes, these are all true. Basically, you can take the last block of each epoch and get the committee for the next epoch to verify the next epoch's blocks.

2. Harmony full-node protocol

In this design, data in the network is transferred over two channels:

  1. Libp2p pubsub
  2. Libp2p stream

In this part, we will discuss the details of the network data transfer.

2.1 Overview

2.1.1 Two network data: stream and pubsub

Stream and pubsub serve different usage scenarios.

  • Stream is used for peer-to-peer private messages (e.g. state sync for a newcomer - the message is needed only by the newcomer). There is no need to flood the whole network with data that is only meaningful to a small number of nodes.
  • Pubsub is used for public messages which expect multiple receivers to handle them (e.g. validator messages for the consensus layer).

2.1.2 How many different roles in network?

Based on their different roles, nodes need different sets of network data. Here is the full list of roles. Note that one node must have exactly one role.

  1. Leader
  2. Validator
  3. Synced node (non-validator)
  4. Non-synced node

Each shard node goes through the following role transitions:

2.1.3 sub-protocols

The protocol is composed of several types of sub-protocols.

1. PubSub protocols

Pubsub protocols are just the current protocols running on pubsub. In this design, we leave pubsub unchanged:

  1. Consensus.ViewChange / Prepare / Commit...
  2. Node.Transaction / CXReceipts / Slash...

Providers:

  1. Consensus messages are provided from validators and leaders
  2. Other messages are provided by nodes that receive the corresponding information, e.g. as block messages.

Consumers:

  1. The Node.Sync message is used by all nodes to learn of the latest blocks of a shard.
  2. The beacon Node.Sync message is also used by nodes of other shards to keep track of the latest block of the beacon chain.

2. ActiveSync protocols (Stream)

As described in chapter 1, the downloader module needs to fetch a number of blocks for non-synced nodes to get synced, covering both fast sync and full sync.

Providers:

  1. All validators provide this service for the nodes of the chain they validate.
  2. The leader will not provide this service, for best performance.

Consumers:

  1. Nodes that need to validate on a shard chain for which they have no state, or whose state is out of date.
  2. This includes both freshly started nodes and nodes resharded to another shard.

3. PassiveSync protocol (Stream)

Used for node sync when a node has arrived at the latest state. This sync serves as a backup in case some blocks are missed from pubsub.

Providers:

  1. All validators provide this service for the nodes of the chain they validate.
  2. The leader will not provide this service, for best performance.

Consumers:

  1. Consumed by nodes that missed some pubsub messages and need to fetch the missed block bodies.

Note that only the PassiveSync protocol is bidirectional - each side can be both provider and consumer.

4. BeaconSync protocol (Stream)

Used for beacon header sync. Only the last header of each epoch on the beacon chain is requested and provided.

Providers:

  1. All validators on the beacon chain.
  2. The beacon leader does not provide this service.

Consumers:

  1. All nodes other than the beacon leader.

In total, we have the following sub-protocols:

  • Pubsub × 4 shards
  • ActiveSync Host × 4 shards
  • ActiveSync Client × 4 shards
  • PassiveSync × 4 shards
  • BeaconSync Host × 1
  • BeaconSync Client × 1
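
For illustration, these sub-protocols could be laid out as per-shard libp2p protocol IDs; the exact ID strings below are made up, not a finalized spec:

```go
// Sketch of laying out the sub-protocol identifiers as libp2p protocol IDs.
package sketch

import "fmt"

// protoID follows the usual libp2p convention of
// /<app>/<protocol>/<shard>/<version>.
func protoID(name string, shardID uint32, version string) string {
	return fmt.Sprintf("/harmony/%s/shard-%d/%s", name, shardID, version)
}

// Example: ActiveSync on shard 2 -> "/harmony/active-sync/shard-2/1.0.0".
var activeSyncShard2 = protoID("active-sync", 2, "1.0.0")
```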

2.1.4 Protocol role POV

This part defines the protocol strategy from the point of view of each role, so that the network reaches the expected topology. Note that the numbers of connections below are an illustration; the final parameters need to be tuned according to performance in experiments.

Also note that we assume here that only validators provide full sync & fast sync. Providing it is optional for non-validators, since other nodes may place more trust in validator nodes.

1. Non-synced

|              | Pubsub (side chain) | Pubsub (beacon chain) | ActiveSync | PassiveSync | BeaconSync |
|--------------|---------------------|-----------------------|------------|-------------|------------|
| Provide Data | False               | /                     | False      | False       | False      |
| Advertise    | False               | /                     | False      | False       | False      |
| Consume      | True                | True                  | True       | False       | True if validating side chain |
| Num streams  | N/A                 | N/A                   | 32-64      | 0           | 2-4        |
2. Non-Validator

|              | Pubsub (side chain) | Pubsub (beacon chain) | ActiveSync | PassiveSync | BeaconSync |
|--------------|---------------------|-----------------------|------------|-------------|------------|
| Provide Data | True                | /                     | True       | True        | True if validating beacon |
| Advertise    | True                | /                     | True       | True        | True if validating beacon |
| Consume      | True                | True                  | False      | True        | True if validating shard  |
| Num streams  | N/A                 | N/A                   | 0~16       | 2~4         | 2~4 as consumer; 0~128 as provider |

Note: whether to advertise in the side chain pubsub is an option here. This needs to be determined.

3. Validator

|              | Pubsub (side chain) | Pubsub (beacon chain) | ActiveSync | PassiveSync | BeaconSync |
|--------------|---------------------|-----------------------|------------|-------------|------------|
| Provide Data | True                | /                     | True       | True        | True if validating beacon |
| Advertise    | True                | /                     | True       | True        | True if validating beacon |
| Consume      | True                | True                  | False      | True        | True if validating shard chain |
| Num streams  | N/A                 | N/A                   | 0-16       | 2-4         | 2-4 as consumer; 0-128 as provider |
4. Leader

|              | Pubsub (side chain) | Pubsub (beacon chain) | ActiveSync | PassiveSync | BeaconSync |
|--------------|---------------------|-----------------------|------------|-------------|------------|
| Provide Data | True                | /                     | False      | False       | False      |
| Advertise    | True                | /                     | False      | False       | False      |
| Consume      | True                | True                  | False      | False       | True if validating shard chain |
| Num streams  | N/A                 | N/A                   | 0          | 0           | 2-4 (only as consumer) |

Note that among these stream protocols, only PassiveSync is bi-directional - having both incoming and outgoing requests. For every other stream protocol, there is always one provider and one consumer per stream.

2.1.5 More specifications on the stream

  1. There is only one protocol running on each stream.
  2. Why do we always keep 2-4 PassiveSync streams? Because it might happen that a node falls out of sync and needs to get to the latest block. On the other hand, peer discovery and setting up stream connections are costly. So it is better to keep a minimum number of streams to rely on for syncing content.
  3. Based on the expected network topology in the role POV, we need a peerManager to initiate and manage all peer connections according to the complicated role-based stream policies. This will be discussed in the later chapters.

2.2 Stream Handler

2.2.1 Read / Write Async

The synchronization protocol sits on libp2p streams with asynchronous writes / reads to maximize the usage of the bandwidth.

Following is a comparison between synchronous and asynchronous read / write.

(Figure: stream-async.png - synchronous vs. asynchronous stream read / write)

We can see that if network latency is the major delay in the protocol, asynchronous reading / writing can fully utilize the bandwidth in both directions and achieve higher throughput. Thus asynchronous read / write over the stream is recommended; a sketch follows.
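
A minimal sketch of such an asynchronous stream loop, assuming only `io.ReadWriter` semantics and eliding message framing and error reporting:

```go
// Sketch of asynchronous read / write over a single stream: one goroutine
// keeps writing pending requests while another keeps reading responses, so
// neither direction waits on the other.
package sketch

import "io"

func runStreamAsync(rw io.ReadWriter, outgoing <-chan []byte, incoming chan<- []byte) {
	// Writer loop: drains the outgoing request queue.
	go func() {
		for msg := range outgoing {
			if _, err := rw.Write(msg); err != nil {
				return // a real impl would surface the error and close the stream
			}
		}
	}()
	// Reader loop: keeps pulling responses and hands them off to the
	// consumer (e.g. the downloader module).
	go func() {
		buf := make([]byte, 64*1024)
		for {
			n, err := rw.Read(buf)
			if err != nil {
				return
			}
			msg := make([]byte, n)
			copy(msg, buf[:n])
			incoming <- msg
		}
	}()
}
```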

2.2.2 Serialization

Compared options: 1) use RLP; 2) use Protobuf.

For this synchronization protocol, RLP is preferred since:

  1. RLP can deal with nested structures. Some nested data structures need to be serialized in the protocol, e.g. TrieNode.
  2. Some RLP encoding / decoding rules have already been implemented in the codebase, e.g. Header, trie.Node, Block, Transactions, etc.
  3. Implementing an RLP encoding / decoding rule in Go is simpler than using protobuf.

On the other hand, Protobuf outweighs RLP in:

  1. Protobuf has better performance.

More readings: Ethereum rlp design idea.

Decision

For data structures that already have RLP encoding / decoding rules, use the existing RLP rules (e.g. Headers, trie.Node, etc.).

Other than that, use protobuf where possible. A round-trip sketch with the existing RLP package follows.
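
As a sketch, a round trip through the go-ethereum `rlp` package that the codebase already depends on; the request struct here is illustrative, not the final wire format:

```go
// Sketch of RLP-encoding a request struct using go-ethereum's rlp package.
package sketch

import (
	"fmt"

	"github.com/ethereum/go-ethereum/rlp"
)

// getHeadersRequest is an illustrative wire struct, not the final format.
type getHeadersRequest struct {
	StartBlockNumber uint64
	Size             uint32
	Interval         uint32
}

func encodeDecodeExample() error {
	req := getHeadersRequest{StartBlockNumber: 4200000, Size: 128, Interval: 1}
	raw, err := rlp.EncodeToBytes(&req)
	if err != nil {
		return err
	}
	var decoded getHeadersRequest
	if err := rlp.DecodeBytes(raw, &decoded); err != nil {
		return err
	}
	fmt.Printf("round-tripped: %+v\n", decoded)
	return nil
}
```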

2.2.3 Stream handler

When a new stream connection is set up by the libp2p host, we run a stream handler function on the established stream. The stream handler does the following (a sketch follows the list):

  1. After the stream is set up, it is first filtered by middleware, which is basically a blacklist to screen out blacklisted peers.
  2. Then the stream runs a handshake process - both sides send out version metadata, running protocols, etc. and verify them.
  3. Then check whether the given metadata is accepted.
  4. Run the corresponding protocol on the stream, which is an endless loop handling messages. In each iteration, the request is also filtered by middleware for rate limiting.
  5. If an error is detected while running the protocol, close the stream connection and return.
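
A sketch of this flow on top of the libp2p host API (import paths follow the go-libp2p layout current at the time of writing; the protocol ID string and the three callback hooks are placeholders):

```go
// Sketch of the stream handler flow: middleware -> handshake -> protocol loop.
package sketch

import (
	"github.com/libp2p/go-libp2p-core/host"
	"github.com/libp2p/go-libp2p-core/network"
)

func registerSyncHandler(h host.Host, blacklisted func(network.Stream) bool,
	handshake func(network.Stream) error, serveLoop func(network.Stream) error) {

	h.SetStreamHandler("/harmony/sync/1.0.0", func(st network.Stream) {
		// 1. Middleware: drop blacklisted peers immediately.
		if blacklisted(st) {
			_ = st.Reset()
			return
		}
		// 2-3. Handshake: exchange and validate version / metadata.
		if err := handshake(st); err != nil {
			_ = st.Reset()
			return
		}
		// 4-5. Run the protocol loop; close the stream on any error.
		if err := serveLoop(st); err != nil {
			_ = st.Reset()
		}
	})
}
```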

Next, we will discuss the components of the stream handler.

2.2.4 Middleware

Here the middleware serves the following purposes:

  1. Blacklister: filter out blacklisted peers.
  2. Limiter: limit the request calls from a given peer.
  3. Access log recorder: update peer metadata (e.g. last request time for LRU replacement).

Blacklister:

  1. The Blacklister is a module that reads and writes blacklisted peers.
  2. The Blacklister manages entries indexed by libp2p.PeerID, each associated with a timestamp indicating the time when access for that PeerID is no longer forbidden (time out of jail).
  3. The Blacklister periodically removes stale entries and persists the data entries to a local persistence file.
  4. The Blacklister provides an interface to check whether a given peerID is currently blacklisted.
  5. The Blacklister is invoked in the following cases; each error case should have a predefined jail period:
    1. Timeout of a request from the downloader module.
    2. Content error reported from the downloader module.
    3. User-added nodes should be removed from the blacklist and shall never be blacklisted.
  6. The Blacklister shall be thread-safe to use.
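
A minimal thread-safe Blacklister sketch matching the spec above; jail periods come from the caller, and persistence is left out:

```go
// Sketch of a thread-safe Blacklister keyed by libp2p peer ID.
package sketch

import (
	"sync"
	"time"

	"github.com/libp2p/go-libp2p-core/peer"
)

type Blacklister struct {
	mu      sync.Mutex
	jailed  map[peer.ID]time.Time // peer -> time the ban expires
	trusted map[peer.ID]bool      // user-added nodes are never blacklisted
}

func NewBlacklister() *Blacklister {
	return &Blacklister{
		jailed:  make(map[peer.ID]time.Time),
		trusted: make(map[peer.ID]bool),
	}
}

// Ban jails a peer for the given period (e.g. timeout vs. content error can
// use different predefined periods).
func (b *Blacklister) Ban(id peer.ID, jail time.Duration) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.trusted[id] {
		return
	}
	b.jailed[id] = time.Now().Add(jail)
}

// IsBlacklisted reports whether the peer is currently banned, lazily removing
// stale entries (a real impl would also persist entries periodically).
func (b *Blacklister) IsBlacklisted(id peer.ID) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	until, ok := b.jailed[id]
	if !ok {
		return false
	}
	if time.Now().After(until) {
		delete(b.jailed, id)
		return false
	}
	return true
}
```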

Limiter:

  1. Limit requests, with a weight per request type.
  2. The limiter policy is customized for each sub-protocol; a sketch follows.
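
A sketch of a weighted limiter built on `golang.org/x/time/rate`; the request-type weights are illustrative and would be tuned per sub-protocol:

```go
// Sketch of a per-request-type weighted rate limiter.
package sketch

import (
	"time"

	"golang.org/x/time/rate"
)

type RequestType int

const (
	GetHeaders RequestType = iota
	GetBlockBody
	GetStateNode
)

// weights: heavier requests consume more tokens from the same bucket.
var weights = map[RequestType]int{
	GetHeaders:   1,
	GetBlockBody: 2,
	GetStateNode: 4,
}

type Limiter struct{ bucket *rate.Limiter }

func NewLimiter(perSecond rate.Limit, burst int) *Limiter {
	return &Limiter{bucket: rate.NewLimiter(perSecond, burst)}
}

// Allow reports whether a request of the given type may proceed now.
func (l *Limiter) Allow(t RequestType) bool {
	return l.bucket.AllowN(time.Now(), weights[t])
}
```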

(TODO) It would be great if libp2p had some middleware options. Investigate the libp2p repo to find an authenticated middleware.

2.2.5 Handshake

The handshake process checks that the node behind the new stream is compatible with the local configuration:

  1. Exchange the Harmony stream protocol version and check compatibility.
    1. Only the overall protocol version is checked.
  2. Exchange protocol metadata and check compatibility:
    1. Network ID
    2. Genesis block hash
    3. Running sub-protocol specifier, version, and metadata
  3. Pass the stream to the peer manager to decide whether to run the sub-protocol. If the number of peers of the sub-protocol has reached the hard upper limit of the peer management policy, the peer manager will immediately reject and close the stream.
  4. Run the corresponding sub-protocol to handle requests.

2.2.6 Peer management

Peer management is to ensure that the network topology described in 2.1.4 is achieved.

  • When a new stream is set up, the peer manager determines whether to run the sub-protocol according to the peer management policies. If the number of peers running a certain sub-protocol has reached the hard limit, do not accept the connection. This is handled in the handshake process.
  • When a stream is closed, if the remaining number of peers running the sub-protocol falls to the hard lower limit, spin up a maintenance thread.
  • Periodically spin up a maintenance thread to keep the number of peers as expected. Illustrated in 2.2.7.
  • Dynamically register the peer management policy. Provide public methods to be called to handle role change events. If a role change happens, discard the maintenance in progress and spin up a new maintenance loop according to the new peer management policy.
  • The different management policies for the different roles should be implemented behind an interface according to 2.1.4.

2.2.7 Maintenance policies

The maintenance loop can be awakened in three cases:

  1. A peer stream is closed (see 2.2.6).
  2. Periodically triggered, say every 60 seconds.
  3. A role change event is received.

For each maintenance run, the following logic is executed (sketched below):

  1. For each sub-protocol,
    1. Check whether the peer count is smaller than the soft lower limit. If so, execute a batch of peer discovery tasks to discover more peers and reach the expected number of connected peers.
    2. Check whether the peer count is greater than the soft upper limit. If so, remove peers from the head of the LRU entries to reach the expected number of connected peers.
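
A sketch of this maintenance loop; all types are stand-ins for the real peer manager internals:

```go
// Sketch of the maintenance loop: it wakes on stream close, role change, or
// a periodic tick, then nudges every sub-protocol's peer count back inside
// its soft limits.
package sketch

import "time"

type subProtocol interface {
	NumPeers() int
	SoftLow() int
	SoftHigh() int
	Discover(n int) // spin up peer discovery for n more peers
	DropLRU(n int)  // drop n peers from the head of the LRU list
}

func maintenanceLoop(protos []subProtocol, wake <-chan struct{}, stop <-chan struct{}) {
	tick := time.NewTicker(60 * time.Second) // periodic trigger
	defer tick.Stop()
	for {
		select {
		case <-stop:
			return
		case <-wake: // a stream closed or the role changed
		case <-tick.C:
		}
		for _, p := range protos {
			switch n := p.NumPeers(); {
			case n < p.SoftLow():
				p.Discover(p.SoftLow() - n)
			case n > p.SoftHigh():
				p.DropLRU(n - p.SoftHigh())
			}
		}
	}
}
```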

2.2.8 Error handling

All validation steps are done in upper-level function calls. If an error is found, the error is passed to the peerManager, and the peerManager will stop the connection to the offending peer.

2.2.9 Examples of node type transfer

In this section, three examples are given to illustrate the process of peer management:

  1. A new node joining to validate shard 1.
  2. A node's role switching from validator of shard 1 to leader of shard 1.
  3. A node receiving a resharding task from shard 1 to shard 2.
2.2.9.1 New node joining shard 1
  1. When the new node starts, a role change event (non-synced) is sent to the peer manager.

  2. The peer manager spins up the maintenance process.

  3. The node currently has 0 connected peers, and it needs 32-64 ActiveSync connections in shard 1 and 2~4 BeaconSync connections.

  4. Thus the maintenance will trigger discovery for a number of ActiveSync and BeaconSync connections.

  5. When a node stream is set up, emit the new node event to other modules.

2.2.9.2 Node role switching from validator to leader (shard 1)
  1. When a view change is triggered and the node finds itself to be the leader, it informs the network peer manager of the event.
  2. The peer manager spins up the maintenance process.
  3. The node currently has 2 ActiveSync streams on shard 1, 2 PassiveSync streams on shard 1, and 2 BeaconSync streams. (Just as an example.)
  4. Checking the peer management policy of the leader, we need to remove the 2 ActiveSync and 2 PassiveSync streams.
  5. Send a close signal to the StreamHandler function of the 4 streams in total.
  6. After a stream handler is closed, remove the peer from the peer manager and emit a stream close event.
2.2.9.3 Node role switching from validator in shard 1 to validator in shard 2
  1. Resharding informs the network module of the resharding event.
  2. The peer manager spins up the maintenance process.
  3. The node currently has 2 ActiveSync streams, 2 PassiveSync streams on shard 1, and 2 BeaconSync streams.
  4. Checking the peer management policy for a validator of shard 2, we need to remove the 2 ActiveSync and 2 PassiveSync streams on shard 1.
  5. Thus it will
    1. terminate the 2 ActiveSync streams
    2. terminate the 2 PassiveSync streams
    3. run peer discovery of size 4 for PassiveSync on shard 2.
  6. Emit the new peer or peer closed events.
2.2.9.4 Soft limits and hard limits

For the peer management policy of each sub-protocol, it is encouraged to have both soft limits and hard limits.

  • The hard limit is the limit that shall never be crossed. If the peer count falls outside the hard limit, an action is required immediately.
  • The soft limit is the limit used for periodic maintenance; it falls within the range of the hard limit. It could happen that a value has crossed the soft limit but not the hard limit. In this case, no immediate action is enforced, but the violation will be fixed in the periodic maintenance process.

2.3 Sub-protocols

This section illustrates how synchronization requests are sent and handled.

2.3.1 Pubsub protocol

Pubsub should work just as in the current logic:

  1. Node.Start will register the handlers for pubsub messages.
  2. After validation, the information is dispatched to the different modules.
  3. In this design, the Node.Sync message will also be delivered to the downloader module for further handling.

2.3.2 Stream requests flow

First, let's lay out the flow of a synchronization request (a client-side sketch follows):

  1. The downloader needs certain data to proceed with chain syncing. So it calls a method of a peer object (section 2.3), which wraps the Stream, to write request data of the corresponding type to the stream. This method is called in the downloader module as the initiator of a sync request.
  2. The request is read by the host's StreamHandler. It will then fetch the related data from the blockchain and write the response to the stream.
  3. The response is read by the client, which dispatches the response to the downloader module in a non-blocking way.
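
A sketch of the client-side bookkeeping for step 3: each request gets an ID, and a pending map routes the response back to the waiting downloader call without blocking the reader loop (framing and serialization elided):

```go
// Sketch of async request/response dispatch on the client side.
package sketch

import (
	"sync"
	"sync/atomic"
)

type response struct {
	ID      uint64
	Payload []byte
}

type requestTracker struct {
	nextID  uint64
	mu      sync.Mutex
	pending map[uint64]chan *response
}

func newRequestTracker() *requestTracker {
	return &requestTracker{pending: make(map[uint64]chan *response)}
}

// register allocates a request ID and a channel the caller can wait on.
func (t *requestTracker) register() (uint64, chan *response) {
	id := atomic.AddUint64(&t.nextID, 1)
	ch := make(chan *response, 1) // buffered so the reader never blocks
	t.mu.Lock()
	t.pending[id] = ch
	t.mu.Unlock()
	return id, ch
}

// deliver is called from the stream reader loop for each decoded response.
func (t *requestTracker) deliver(resp *response) {
	t.mu.Lock()
	ch, ok := t.pending[resp.ID]
	delete(t.pending, resp.ID)
	t.mu.Unlock()
	if ok {
		ch <- resp // hand off to the waiting downloader call
	}
}
```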

2.3.3 Beacon sync

The beacon sync protocol is the simplest one - a single request, getLastHeadersByEpoch.

2.3.3.1 getLastHeadersByEpoch
  1. The client sends out a request with a list of epoch IDs.
  2. The host retrieves the last header of each epoch and sends them back to the client.
  3. The client receives the host's response and delivers it directly to the downloader module.

2.3.4 Passive Sync

Passive Sync is needed when a client misses a pubsub message. It is also simple and supports only one request type:

2.3.4.1 getBlockByNumber
  1. The client sends out a request with a block number to obtain the target block.
  2. The host reads the block from the blockchain and sends it via the stream.
  3. The client receives the block and delivers it directly to the downloader module.

2.3.5 Active Sync

ActiveSync can be either fast sync or full sync from the client's view. But the protocol they run is actually exactly the same - they handle the same types of requests. The only difference is that there are some request types the client will never call in full sync mode. So we list all the request types together below.

2.3.6 Protocols

Here we define all the requests and the corresponding handling logic over the p2p stream connection. All responses received by the client are delivered directly to the downloader module.

2.3.6.1 getHeadersByNumber

Parameters: StartBlockNumber, size, interval

Return: a list of headers of the target size, starting from StartBlockNumber and spaced by the given interval.

2.3.6.2 getBlockBody

Parameter: header hash

Return:

  1. Transactions of the block
  2. Staking transactions of the block
  3. Incoming CX receipts of the block

2.3.6.3 getReceipts

Parameter: header hash

Return:

  1. Transaction receipts
  2. Staking transaction receipts

2.3.6.4 getStateNode

Parameter: a list of node hashes

Return: a list of RLP-encoded node values corresponding to the node hashes in the request.

After the client receives the response, it is delivered directly to the downloader module. A sketch of the state download loop built on this request follows.
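
A sketch of the pivot state download built on getStateNode: keep a queue of missing trie-node hashes, fetch them in batches, and enqueue unseen children. The `parseChildren`, `have`, and `store` callbacks are hypothetical stand-ins (a real implementation would reuse trie sync machinery):

```go
// Sketch of downloading the pivot state via batched getStateNode requests.
package sketch

type hash [32]byte

type stateFetcher interface {
	GetStateNodes(hashes []hash) ([][]byte, error) // the getStateNode request
}

func syncState(root hash, f stateFetcher,
	have func(hash) bool, store func(hash, []byte) error,
	parseChildren func([]byte) []hash) error {

	queue := []hash{root}
	const batch = 256
	for len(queue) > 0 {
		n := batch
		if len(queue) < n {
			n = len(queue)
		}
		req := queue[:n]
		queue = queue[n:]

		// Assume len(nodes) == len(req) per the request contract.
		nodes, err := f.GetStateNodes(req)
		if err != nil {
			return err
		}
		for i, blob := range nodes {
			if err := store(req[i], blob); err != nil {
				return err
			}
			// Enqueue any referenced child nodes we don't have yet.
			for _, child := range parseChildren(blob) {
				if !have(child) {
					queue = append(queue, child)
				}
			}
		}
	}
	return nil
}
```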

2.5 Topics to be discussed

2.5.1 Defer pubsub message handling

When a node starts up, it does not have the latest shard state, so it cannot validate any pubsub message against the shard state (there could be malicious messages in pubsub). It also shall not ask a random node for the latest shard state, since this would make the node prone to eclipse attacks.

Option 1: A node only starts pubsub message handling once it has finished syncing the beacon skeleton. This also has some correlation with consensus initialization. I will leave these topics to the experts.

Option 2: Still use a centralized service - provide a set of trusted nodes from which the latest shard state can be bootstrapped.

2.5.2 Stream Serialization

  • If both protobuf and RLP work, we prefer protobuf.
  • But there do exist some well-defined RLP rules; in those cases, we still use RLP.

2.5.3 Migration plans

Backward compatibility has to be maintained for the new network structure. Thus, in order to enable backward compatibility, we need to:

  1. Still enable the gRPC sync service on the DNS hosts' end.
  2. Remove all gRPC client sync services (DoSyncing and DoBeaconSyncing).
  3. Keep the existing pubsub message content.
  4. Add the new sync logic to replace gRPC syncing.

After these steps, there should not be any compatibility issues.
