@JackyWYX
Last active June 2, 2020 23:52
Harmony fast sync research

HMY consensus walk through for fastSync

Description

In order to refactor the syncing module of the harmony code, several questions need to be answered:

  1. What is the current network structure?
  2. What is the role of crosslink in block header/body verification?
  3. What is the role of view change in block header/body verification?
  4. What is the role of slashing and how does it interact with offchain data?
  5. What is the mechanism of cross shard transaction?
  6. What is the role of offchain data in block header/body verification (EVM data dependency)?

Thus we need to walk through all the logic related to fastSync. Repo commit hash: 607c71a5f6081194d2ebf46fc4a512094bc56426

1. Harmony network walk through

1.1 Node start up

  1. Setup the initial consensus (cmd/harmony/main.go:464)

  2. Start beacon syncing (cmd/harmony/main.go:724)

  3. Start shard chain syncing (cmd/harmony/main.go:763)

  4. Register and start services for validators (node/service_setup.go:69)

    1. Discover service (Network Info) (node/service_setup.go:20)
    2. Consensus service (for validators) (node/service_setup.go:27)
    3. Block proposal service (for leader) (node/service_setup.go:32)
  5. Start p2p pubsub message handler (node/node.go:396)

1.2 Network message types

1.2.1 Peer discovery (libp2p)

This service handles node discovery and is started as the NetworkInfo service (node/service_setup.go:20). Both validator and explorer nodes start this service. Both incoming and outgoing messages are handled and sent under this service.

  1. Connect to bootstrap peers then advertise during initialization (api/service/networkinfo/service.go:146)

  2. Advertise as time ticks (api/service/networkinfo/service.go:199)

  3. Every 60 seconds, peer discovery of shard group id (api/service/networkinfo/service.go:207)

  4. Inform other modules about new peer event through s.peerChan (api/service/networkinfo/service.go:254) Note: This channel is never used.

1.2.2 PubSub messages

These messages are sent through libp2p pubsub and are used for messages common to a certain shard. Note that only messages about the node's own shard are discussed here. If a message is received from the beacon chain while the node is working on a shard chain, the message is not processed on the consensus layer. Instead it is processed through the syncing logic.

  1. Consensus related messages

    • Block Proposal

      • Once the consensus is ready, constantly check whether self is leader and propose new block (node/node_newblock.go:52)
      • Send the proposed block to node.BlockChannel (node/node_newblock.go:69)
      • Proposed block is handled in the consensus loop and then announced with consensus message type MessageType_ANNOUNCE (consensus/construct.go:38, consensus/leader.go:78)
      • Message is caught by client node.HandleMessage and passed to consensus for block preparation (node/node_handler.go:533)
    • Block prepare

      • After the validator receives the announce message, it verifies the sanity of the block proposal and signs it. (consensus/validator.go:63)
      • Broadcast the signed prepared message to the group (consensus/validator.go:73)
      • Message is caught by node.HandleMessage and passed to consensus (consensus/consensus_v2.go:84)
      • Only the leader deals with the message. After receiving enough signatures for the aggregated signature, it sends out a message of type MessageType_PREPARED (consensus/threshold.go:72)
    • Block prepared

      • After the validator received the prepared message, execute transactions and sign. (consensus/validator.go:145, consensus/validator.go:205)
      • Send out the message with type MessageType_COMMIT (consensus/validator.go:211)
    • Block commit

      • Leader receives and deals with the commit message (consensus/consensus_v2.go:88)
      • After receiving enough signatures, the leader adds them to the aggregated signature and then passes the block to consensus.commitFinishChan, either 2 seconds after enough signatures are collected or once all signatures are collected (consensus/leader.go:276)
      • Broadcast message with type MessageType_COMMITTED (consensus/consensus_v2.go:109)
      • After 8 seconds, the leader starts the next block announce session (consensus/consensus_v2.go:184)
    • Block Committed

      • Process the message and pass it to consensus.
      • Commit the new block (consensus/consensus_v2.go:300) and pass the block to consensus.VerifiedNewBlock (consensus/consensus_v2.go:304)
      • Push new block through downloader sync to connected peers (api/service/syncing/downloader/client.go:109)
    • New View and View change message

      • This will be further discussed in part 3. Skipped for now
  2. Other common messages

    • transactions & Staking transactions
      • When the APIBackend receives hmy_sendRawTransaction (internal/hmyapi/apiv1/transactionpool.go:222) or hmy_sendRawStakingTransaction from HTTP requests, the transaction is broadcast to the shard. (node/node.go:263)
      • After receiving transactions from other nodes, the transaction is added to the transaction pool. (node/node_handler.go:180)
    • Blocks
      • When consensus is processed, the new block will be broadcasted to the whole network by leader and 1% of validators, specified by {nodeB, blockB, syncB}.
      • Nodes will receive new sync blocks from both beacon chain and shard chain.
      • On beaconChain new sync blocks, block will be directly inserted into beacon chain (node/node_syncing.go:167)
      • If it's on shardChain, do not do anything since it is already synced in consensus.
    • Slashes
      • During the commit phase in validator, double sign will be checked (consensus/leader.go:207)
      • If a double sign is discovered, send it to SlashChan (consensus/double_sign.go:109)
      • Double sign evidence is then composed and broadcasted (node/node.go:550)
      • Upon receiving the doubleSign, will append to consensus
    • Receipts
      • Receipts are broadcast in two cases:
        1. When consensus is done, leader and 1% of validators will broadcast all receipts.
        2. During RPC call ResendCx (internal/hmyapi/apiv2/blockchain.go:418)
      • Other nodes that receive receipts validate them and add them to node.pendingCXReceipts
    • CrossLink
      • This will be further discussed in chapter 2.
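
The consensus message flow in item 1 can be summarized as a small state machine. The following is a minimal sketch, not Harmony's implementation: `MsgType`, `sender`, `next`, and `hasQuorum` are hypothetical names, the PREPARE type name is assumed by analogy with the other message types, and the quorum rule (strictly more than 2/3 of the committee by count) is an assumption for illustration only.

```go
package main

// MsgType lists the consensus message types of one round, in order.
type MsgType int

const (
	Announce MsgType = iota
	Prepare          // assumed name for the validators' prepare-phase message
	Prepared
	Commit
	Committed
)

// String returns the message type name without the MessageType_ prefix.
func (m MsgType) String() string {
	return [...]string{"ANNOUNCE", "PREPARE", "PREPARED", "COMMIT", "COMMITTED"}[m]
}

// sender reports which role broadcasts a given message type.
func sender(m MsgType) string {
	switch m {
	case Prepare, Commit:
		return "validator"
	default:
		return "leader"
	}
}

// next returns the message that follows m within one round.
// The second result is false once the round has ended with COMMITTED.
func next(m MsgType) (MsgType, bool) {
	if m >= Committed {
		return Committed, false
	}
	return m + 1, true
}

// hasQuorum reports whether signed out of total strictly exceeds two thirds.
// Counting by head is an assumption of this sketch.
func hasQuorum(signed, total int) bool {
	return total > 0 && 3*signed > 2*total
}
```

Walking `next` from `Announce` yields the five phases in the order described above, with the leader and the validators alternating as senders.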

1.2.3 GRPC messages

GRPC connection handles the message for blockchain synchronization.

  1. Beacon chain

    Beacon chain has two goroutines to deal with chain message:

    • First way is to process through pubsub (listen passively)

      1. Listen to pubsub block message (node/node_syncing.go:165)

      2. Add the block message to lastMileBlock (node/node_syncing.go:171)

      3. Insert into chain in next process one by one (api/service/syncing/syncing.go:680)

    • Another one is to actively fetch from dns nodes as time ticks (actively)

      1. api/service/syncing/syncing.go:826
  2. Self chain

    • Server started at initialize (node/node_syncing.go:274), and is handled as GRPC protocol (api/service/syncing/downloader/proto/downloader.pb.go:354)
    • Go download and process the sync content. (api/service/syncing/syncing.go:801)
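
The passive path (queue pubsub blocks, then insert them one by one) can be sketched as follows. This is an illustration under assumptions, not the code in api/service/syncing: `Block`, `Chain`, `OnPubsubBlock`, and `InsertLastMile` are hypothetical names, and the rule that a block is inserted only when it extends the head by exactly one is an assumption of the sketch.

```go
package main

import "sort"

// Block is a reduced stand-in for a chain block; only the number matters here.
type Block struct {
	Number uint64
}

// Chain tracks the local head and the queue of blocks received over pubsub.
type Chain struct {
	Head     uint64
	lastMile []Block
}

// OnPubsubBlock queues a block heard on pubsub instead of inserting it directly.
func (c *Chain) OnPubsubBlock(b Block) {
	c.lastMile = append(c.lastMile, b)
}

// InsertLastMile inserts queued blocks one by one while each extends the head
// by exactly one. Stale blocks are dropped; blocks beyond a gap stay queued.
// It returns the number of blocks inserted.
func (c *Chain) InsertLastMile() int {
	sort.Slice(c.lastMile, func(i, j int) bool {
		return c.lastMile[i].Number < c.lastMile[j].Number
	})
	inserted := 0
	rest := c.lastMile[:0] // reuse the backing array for the blocks we keep
	for _, b := range c.lastMile {
		switch {
		case b.Number == c.Head+1:
			c.Head = b.Number
			inserted++
		case b.Number > c.Head+1:
			rest = append(rest, b)
		}
	}
	c.lastMile = rest
	return inserted
}
```

Blocks left in the queue after a gap would be picked up once the active fetch has filled in the missing heights.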

1.2.4 Sync / consensus interactions

Both the sync and consensus logic check whether the node is synced to the latest block and inform each other. For syncing on the beacon chain, only syncing is processed.

  1. Beacon syncing - NO interactions

    Since we do not run any consensus on the beacon chain when validating a side chain, there are no interactions for the beacon chain. The node just does the following 2 things:

    • Receive blocks from pubsub, insert it to last mile queue
    • Do sync every 60 seconds.
  2. Sync informing consensus

    • When the client is syncing every 60 seconds, it checks the out-of-sync status (node/node_syncing.go:246).
    • If the local height does not match the remote height, inform consensus through the syncNotReadyChan channel. After sync finishes, inform consensus through syncReadyChan.
    • When consensus receives the not-ready signal, it sets Mode to Syncing. When it receives the ready signal, it recalculates and sets the new mode. (consensus/consensus_v2.go:394, consensus/consensus_v2.go:400)
  3. Consensus informing sync

    • During consensus.onCommitted, if the difference between the current block and the consensus block is greater than or equal to 2 blocks, set mode to syncing and trigger the doSync logic (consensus/validator.go:238).
    • Suggested improvement: should consensus also inform sync when committing a block?
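
A minimal sketch of the two directions of signalling follows. It is not the code in consensus_v2.go: the `Consensus` struct, `Step`, and `FallenBehind` are hypothetical, the ready signal is assumed to always result in `Normal` (the real code recalculates the mode), and the 2-block gap is read here as "the committed message is at least 2 blocks ahead of the local chain".

```go
package main

// Mode is a hypothetical stand-in for the consensus mode.
type Mode int

const (
	Normal Mode = iota
	Syncing
)

// Consensus holds only the field this sketch needs.
type Consensus struct {
	mode Mode
}

// Step blocks until one sync signal arrives and updates the mode.
// The channel parameters play the roles of syncNotReadyChan and syncReadyChan.
func (c *Consensus) Step(syncNotReady, syncReady <-chan struct{}) {
	select {
	case <-syncNotReady:
		c.mode = Syncing
	case <-syncReady:
		// Assumption: the recalculated mode is Normal.
		c.mode = Normal
	}
}

// FallenBehind reports whether a committed message is at least 2 blocks ahead
// of the local chain, in which case consensus would hand over to sync.
func FallenBehind(localBlockNum, committedBlockNum uint64) bool {
	return committedBlockNum >= localBlockNum+2
}
```

In a running node `Step` would sit inside the consensus loop next to the other channel cases. It is isolated here so the mode transitions are easy to follow.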

2. Crosslink logic

A crosslink is a special message type that links the beacon chain and a shard chain.

2.1 Logic walk through

2.1.1 Send out crosslink

  1. The crosslink message is constructed and broadcast by the non-beacon leader when it receives a pubsub new block message from the beacon chain. (node/node_syncing.go:174)
  2. The non-beacon leader sends out a certain number (3~6 for one shard) of crosslinks built from headers whose crosslink is not yet committed (node/node_handler.go:295).
  3. The message is broadcast to the beacon chain through pubsub (node/node_handler.go:310)

2.1.2 Beacon validator handling crosslink

  1. Beacon validator node will handle the crosslink in node.HandleMessage (node/node_handler.go:50)
  2. Verify the crosslink against beacon shard state. (node/node_cross_link.go:112)
  3. Add the pending crosslinks to blockchain (core/blockchain.go:2081), which are finally saved in rawdb as offchain data.

2.1.3 Propose using pending crosslink

  1. The pending crosslinks are only used by the beacon leader when proposing new blocks, where they are added to the block proposal. (node/node_newblock.go:182)
  2. In beacon chain finalize, the reward for the side chain is distributed according to the block crosslink data (internal/chain/reward.go:240)
  3. After the block is finally inserted into the chain, the pending crosslinks are deleted during CommitOffChainData (core/offchain.go:172) and written to the committed ones.

2.1.4 Committed crosslink

Note that all validators have committed crosslinks since they all execute beaconchain.InsertChain.

  • Crosslink data is never deleted and grows as the chain becomes longer (DeleteCrossLinks at core/blockchain.go:1905 has no usages)
  • In step 2.1.1, the side chain leader sends out crosslinks starting at the last committed block.
  • In step 2.1.2, the pending crosslink is checked against the committed ones.
  • During consensus validator committing, the pending crosslink is checked against the committed ones.

2.2 Crosslink against fastSync

Offchain crosslink data consists of two parts: pending and committed.

2.2.1 Pending crosslinks

  1. The pending crosslinks are only used for new beacon block proposals. So if a node has just finished syncing and is selected as leader, it will have no pending crosslinks and will thus make a block with empty crosslinks.
  2. The good thing is that the side chain leader constantly broadcasts uncommitted crosslinks upon receiving blocks from the beacon chain, so the beacon leader will get the pending crosslinks very soon.
  3. So there is nothing to do with pending crosslinks for the beacon validator.

As for other usages: in CommitOffChainData, deleting committed crosslinks from the pending crosslinks will return an error. But this error is already silently handled (core/offchain.go:176), so no further modification is needed.

2.2.2 Committed crosslinks

  • The chain leader relies on the committed crosslink data to tell whether a crosslink has been processed or not. (node/node_newblock.go:182)
  • Thus it is necessary to sync all committed crosslink data to prevent double spending on crosslinks.

If we do not commit crosslink data during fast sync, consider the following attack scenario:

  1. The leader is fastSynced to the latest block.
  2. The attacker sends an outdated crosslink to the beacon pubsub. The crosslink was verified by the blockchain a long time ago, so the leader does not have that crosslink data.
  3. The crosslink is added to the pending crosslinks by the leader and included in a new block proposal.
  4. If 2/3 of the validators have the data, the block proposal will be blocked. Otherwise the historical block is rewarded one more time.
  5. So in this attack scenario, the attacker does not risk anything but gets a potential benefit from the same block being rewarded twice.

TODO when fastSyncing:

  • The crosslink data lives in header.
  • When inserting header, also commit the crosslink data.
  • Function update: HeaderChain.WriteHeader (core/headerchain.go:136)
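
The effect of committing crosslinks while inserting headers can be sketched with an in-memory store. This is an illustration only: `Store`, `WriteHeader`, `AddPending`, and the `(ShardID, BlockNum)` key are hypothetical and do not mirror HeaderChain.WriteHeader or the rawdb layout.

```go
package main

// CrossLinkKey identifies a crosslink by shard and block number (assumed key).
type CrossLinkKey struct {
	ShardID  uint32
	BlockNum uint64
}

// Header is a reduced beacon header carrying only its crosslink keys.
type Header struct {
	Number     uint64
	CrossLinks []CrossLinkKey
}

// Store is an in-memory stand-in for the offchain crosslink data.
type Store struct {
	committed map[CrossLinkKey]bool
	pending   map[CrossLinkKey]bool
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{
		committed: map[CrossLinkKey]bool{},
		pending:   map[CrossLinkKey]bool{},
	}
}

// WriteHeader stands in for header insertion during fast sync. When
// commitCrossLinks is true, every crosslink in the header is recorded as
// committed and removed from the pending set.
func (s *Store) WriteHeader(h Header, commitCrossLinks bool) {
	if !commitCrossLinks {
		return
	}
	for _, k := range h.CrossLinks {
		s.committed[k] = true
		delete(s.pending, k)
	}
}

// AddPending accepts a crosslink heard on pubsub unless it is already
// committed or pending. It reports whether the crosslink was accepted.
func (s *Store) AddPending(k CrossLinkKey) bool {
	if s.committed[k] || s.pending[k] {
		return false
	}
	s.pending[k] = true
	return true
}
```

With `commitCrossLinks` set to false the store accepts a replayed crosslink that an old header already contained, which is the attack scenario above. With it set to true the replay is rejected.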

3. View change

A view change is the event in which validators have not received a committed block from the leader before a 60-second timeout.

3.1 Logic walk through

3.1.1 View change trigger

  1. Started by validators, which check every 3 seconds. When a consensus timeout is found, start the view change (consensus/consensus_v2.go:380)
  2. Construct the view change message with the new leader and send it to the pubsub of the specific group ID (consensus/view_change.go:139)

3.1.2 Handle view change message

  1. All nodes receive the message, but only the new leader deals with the view change message. (consensus/view_change.go:162)
  2. The new leader collects the signatures of the view change event. (consensus/view_change.go:370)
  3. Once enough validators are gathered, construct and send the new view message (consensus/view_change.go:444). The new view message can have two types: one with a finalized block and one without.
  4. If there is no finalized block, in step 3 the leader also informs the new block channel to enter the announce logic (consensus/view_change.go:386)
  5. View change messages received after the new view message has been sent out are ignored (consensus/view_change.go:167)

3.1.3 Handle new view message

  1. All validators receive the new view message and handle it in consensus (consensus/view_change.go:471)
  2. If the new view message does not have a prepared block, switch to normal announce phase and wait for the leader to send the announce message. (consensus/consensus_service.go:176) (Early prepare phase)
  3. If the message has a block, after verification, send out the COMMIT message with signature (consensus/view_change.go:618) and switch to the commit phase. Then wait for the leader's COMMITTED message. (Late commit phase)
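
The validator-side decision in 3.1.3 reduces to one branch on whether the new view message carries a prepared block. A minimal sketch, assuming hypothetical names (`NewView`, `HandleNewView`, `ErrInvalidBlock`) and assuming that a block failing verification leaves the validator in the announce phase with an error, which the walk through does not specify:

```go
package main

import "errors"

// Phase is the consensus phase a validator enters after a new view.
type Phase int

const (
	AnnouncePhase Phase = iota
	CommitPhase
)

// NewView is a reduced new view message.
type NewView struct {
	PreparedBlock []byte // empty when no block was prepared before the view change
}

// ErrInvalidBlock is returned when the carried block fails verification.
var ErrInvalidBlock = errors.New("prepared block failed verification")

// HandleNewView returns the phase to enter and whether a COMMIT message
// should be sent. verify is supplied by the caller and checks the block.
func HandleNewView(nv NewView, verify func([]byte) bool) (Phase, bool, error) {
	if len(nv.PreparedBlock) == 0 {
		// No block: wait for the new leader's announce message.
		return AnnouncePhase, false, nil
	}
	if !verify(nv.PreparedBlock) {
		// Assumption of this sketch: reject and stay in the announce phase.
		return AnnouncePhase, false, ErrInvalidBlock
	}
	// Block present and valid: sign, send COMMIT, wait for COMMITTED.
	return CommitPhase, true, nil
}
```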

3.2 View change against fastSync

  • All view change related messages are stored in memory. There is no offchain data.

  • Furthermore, a newly joined validator may not be aware of the view change event, but it will eventually receive the view change message from other validators and align with them about the view change event.

  • No further actions are needed for view change against fastSync.

4. Slashing

Slashing is caused by a malicious validator double signing on different blocks.

4.1 Logic walk through

  • When the leader handles commit message from validators, it will check double sign (consensus/leader.go:207)
  • The leader compares the commit signature against the signatures stored in the PBFT log. If a double sign is found, the leader sends out the slash candidate through pubsub (consensus/double_sign.go:109)
  • After the nodes receive the slash candidate through pubsub, they write it to both memory and rawdb (core/blockchain.go:2077)
  • When the leader proposes a block, the slashes will be collected from memory and be added to block content. (node/node_newblock.go:216)
  • During nodes insertChain, the committed slashes will be removed from pending slashes (core/offchain.go:301)
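
The double sign check in the first two bullets can be sketched as a lookup in a vote log. This is a simplified illustration, not the logic in consensus/double_sign.go: `VoteLog` and `Record` are hypothetical, signers are plain strings, and a double sign is assumed to mean "same signer, same height and view ID, different block hash".

```go
package main

// voteKey identifies one vote slot: a signer at a given height and view.
type voteKey struct {
	Signer string
	Height uint64
	ViewID uint64
}

// VoteLog is an in-memory stand-in for the signatures kept in the PBFT log.
type VoteLog struct {
	seen map[voteKey]string // vote slot -> block hash that was signed
}

// NewVoteLog returns an empty log.
func NewVoteLog() *VoteLog {
	return &VoteLog{seen: map[voteKey]string{}}
}

// Record stores a commit vote. It reports true when the same signer has
// already signed a different block hash at the same height and view ID.
func (l *VoteLog) Record(signer string, height, viewID uint64, blockHash string) bool {
	k := voteKey{Signer: signer, Height: height, ViewID: viewID}
	prev, ok := l.seen[k]
	if !ok {
		l.seen[k] = blockHash
		return false
	}
	return prev != blockHash
}
```

A `true` result is the point at which the leader would compose the slash candidate and hand it to the slash channel.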

4.2 Slashing against FastSync

4.2.1 Pending slashing

Pending slashes only matter for the leader. Given that consensus starts 30s after the consensus module is initialized, the newly fastSynced node should have enough time to catch up with the pending slashes. So nothing needs to be done for pending slashes.

4.2.2 Committed slashing

Unlike crosslinks, committed slashing doesn't need to be committed to rawdb since double spending is addressed in another way. If a validator double signs, it is banned in the validator wrapper in state, which can be done only once. So there is no need to save slashing records to rawdb, nor to take any action for fastSync.

5. Cross shard transactions and receipts

5.1 Overall logic walk through

5.1.1 Sending transaction

  • Transactions are sent through RPC and call backend AddPendingTransaction. (node/node.go:283)
  • Transaction is added to local tx pool, and then is broadcasted and handled by all validators in node handle message (node/node_handler.go:101)
  • Transaction is added to the transaction pool, waiting to be packed into a block (node/node.go:226)
  • Leader adds transactions into a block when proposing a new block (node/node_newblock.go:122)

5.1.2 Miner/Validator work with transaction

Both will trigger ApplyTransaction logic.

  • For cross shard transactions, the balance is subtracted on the from shard, marked as SubtractionOnly (core/state_processor.go:163)
  • In EVM execution, the balance is only subtracted. (core/evm.go:112)

A cross shard transaction has two phases:

  1. Subtract the balance in the from shard.
  2. Add the balance in the to shard.

5.1.1 Subtract balance in fromShard

  • Transactions are sent through RPC and call backend AddPendingTransaction. (node/node.go:283)
  • Transaction is added to local tx pool, and then is broadcasted and handled by all validators in node handle message (node/node_handler.go:101)
  • Both leader and validator will process the transaction on EVM with ApplyTransaction (core/state_processor.go:198)
  • If the transaction is cross shard, only from is subtracted. (core/evm.go:112)
  • The outgoing receipts hash root will be recorded in block header (core/types/block.go:364)
  • When writing state, the pending cross transaction receipt will be written to offchain data (core/blockchain.go:1187)

5.1.2 Add balance in toShard

  • Another API call is sent to a node in fromShard from hmy-cli with the transaction hash. (hmy/api_backend.go:341)
  • The node checks the tx against offchain data and adds it to the local CxPool (hmy/api_backend.go:364)
  • All nodes constantly take out the pending entries from CXPool, construct the proof (node/node_cross_shard.go:63) and then broadcast to the toShard group (node/node_handler.go:470)
  • Leader in toShard receives the receipt (node/node_handler.go:39) and verifies the signature (node/node.go:320).
  • Leader adds it to the in-memory structure node.pendingCXReceipts (node/node.go:362)
  • The leader proposes a new block with the pending CX receipts (node/node_newblock.go:242) and adds balance to the CX receiver (core/state_processor.go:306). The receipts root is recorded in the header (core/types/block.go:368)
  • The committed data is written to offchain rawdb (core/offchain.go:53)

5.2 Cross shard transaction against fastSync

During fast sync, both incoming and outgoing cross shard receipts should be written to the offchain db to prevent double spending.
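
The two phases and the reason the spent record must survive fast sync can be sketched with plain maps. Everything here is hypothetical (`Shard`, `Receipt`, `Send`, `Receive`); proofs, signatures, and the CX pool are left out, and balances are plain integers.

```go
package main

import "errors"

// Receipt is a reduced cross shard receipt.
type Receipt struct {
	TxHash  string
	To      string
	Amount  uint64
	ToShard uint32
}

// Shard holds balances plus the two kinds of receipt records.
type Shard struct {
	ID       uint32
	Balances map[string]uint64
	Outgoing map[string]Receipt // outgoing receipts by tx hash, served to queries
	Spent    map[string]bool    // incoming receipts that were already applied
}

// NewShard returns an empty shard with the given ID.
func NewShard(id uint32) *Shard {
	return &Shard{
		ID:       id,
		Balances: map[string]uint64{},
		Outgoing: map[string]Receipt{},
		Spent:    map[string]bool{},
	}
}

// Send is phase 1 on the from shard: only subtract, then record the receipt.
func (s *Shard) Send(txHash, from, to string, amount uint64, toShard uint32) (Receipt, error) {
	if s.Balances[from] < amount {
		return Receipt{}, errors.New("insufficient balance")
	}
	s.Balances[from] -= amount
	r := Receipt{TxHash: txHash, To: to, Amount: amount, ToShard: toShard}
	s.Outgoing[txHash] = r
	return r, nil
}

// Receive is phase 2 on the to shard: add the balance exactly once.
func (s *Shard) Receive(r Receipt) error {
	if r.ToShard != s.ID {
		return errors.New("receipt is for another shard")
	}
	if s.Spent[r.TxHash] {
		return errors.New("receipt already spent")
	}
	s.Spent[r.TxHash] = true
	s.Balances[r.To] += r.Amount
	return nil
}
```

If a fast-synced node started with an empty `Spent` map, a replayed receipt would pass the second check in `Receive`, which is why the incoming records need to be written during sync.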

6. Offchain data

Here is a list of different types of offchain data:

  1. Block and headers
  2. Block additional offchain data
    1. ShardState: Epoch committee record.
    2. Block commit sig
    3. Epoch block number
    4. VRF, VDF
    5. BlockRewardAccumulator
  3. Crosslinks: See part 2, including pending and committed.
  4. CXReceipts: See part 5, including incoming, outgoing receipts and txLookUp
  5. Transaction and receipts
  6. Staking related field
    1. Validator snapshot, validatorList
    2. ValidatorStats
    3. Delegator related fields

Whether these fields should be written and how to write them is discussed here.

6.1 Blocks and headers

Original Ethereum block and header fields. These fields should still be written when committing a block or header.

  • All contents in core/rawdb/accessor_chain.go should remain the same.

6.2 Additional offchain data

6.2.1 ShardState by epoch

  • Shard state holds all the signatures that are prerequisites for verifying a header. (internal/chain/engine.go:200)
  • So the beacon chain shardState of the relevant epochs must be fetched before verifying the shard block header.
  • Depending on the header version, shardState might be requested differently.

6.2.2 Block commit sig

  • Block commit sig data is the sig map of consensus
  • Depending on whether it can be derived from the aggregated signature, we should decide whether the client should ask for this data explicitly when syncing the block body. Needs further research.
  • Need to write to offchain db.

6.2.3 Epoch block number

  • Epoch can be fetched from block header
  • Commit epoch block number when insertHeaderChain

6.2.4 VRF, VDF

  • VRF and VDF can be obtained from the header
  • Commit VRF and VDF when insertHeaderChain

6.2.5 BlockRewardAccumulator

  • Block reward accumulator is the reward accumulated so far, which is used to calculate the reward for upcoming blocks
  • Currently, the reward can only be obtained from state processing. It cannot be obtained through any other approach that can be validated.
  • I would recommend adding a new block reward field to the block header, which requires a hard fork.

6.3 Crosslinks

As discussed in part 2:

  • Committed crosslinks need to be committed to db to prevent double spending.
  • Write crosslinks when insertHeaderChain

6.4 CXReceipts

6.4.1 incoming and outgoing CXReceipts

  • Committed incoming CXReceipts need to be written to db to prevent double spending.
  • Committed outgoing CXReceipts need to be written to db to provide CX query from other nodes.
  • Both need to be committed to rawdb for CX transactions finalized in block.
  • During fastSync, fetch both incoming and outgoing CXReceipts when fetching block body and commit to db. (core/rawdb/accessors_offchain.go:102, core/blockchain.go:2187, core/rawdb/accessors_indexes.go:180)
  • Question: It seems that incoming and outgoing CXReceipts are already stored in the block. So why store the content again in rawdb? Can we store the index instead? (core/rawdb/accessors_offchain.go:102)

6.5 Transactions, staking and receipts

  • Transactions, stakings and receipts should be written to offchain db when insertReceiptChain

6.6 Staking related field

6.6.1 ValidatorSnapshot and ValidatorList

  • Currently validatorSnapshot and ValidatorList are updated as the chain grows. (These two fields are updated when a new block is added. core/blockchain.go:2588)
  • In order to do this for fastSync, we have two options.
  • Option 1: After the state is synchronized, iterate over all addresses to get all validators and write validatorSnapshot. This can be potentially costly.
  • Option 2: Add a validatorListHash field to the header, which can be used to validate the validatorList. The client downloads the validatorList from a peer and validates it, then uses the downloaded validator list to update validatorSnapshot. For this solution, a hard fork might be needed.
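
Option 2 could look roughly like the sketch below. It is only an illustration of the idea: `ValidatorListHash` and `VerifyValidatorList` are hypothetical, SHA-256 is swapped in for whatever hash the header would actually use, the list is assumed to be order-independent (it is sorted before hashing), and the addresses in the usage are placeholders.

```go
package main

import (
	"crypto/sha256"
	"sort"
)

// ValidatorListHash hashes a validator address list independently of order.
// Assumption: the header would commit to the sorted list.
func ValidatorListHash(addrs []string) [32]byte {
	sorted := append([]string(nil), addrs...) // copy so the input is not reordered
	sort.Strings(sorted)
	h := sha256.New()
	for _, a := range sorted {
		h.Write([]byte(a))
		h.Write([]byte{0}) // separator so ("ab","c") differs from ("a","bc")
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// VerifyValidatorList checks a list downloaded from a peer against the hash
// taken from an already verified header.
func VerifyValidatorList(headerHash [32]byte, downloaded []string) bool {
	return ValidatorListHash(downloaded) == headerHash
}
```

A client would call `VerifyValidatorList` with the hash from a verified header and only then use the downloaded list to rebuild validatorSnapshot. This avoids the full state iteration of option 1 at the cost of a new header field.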

6.6.2 ValidatorStat

  • This field is for the APR calculation. In the current design, the calculation requires the validatorSnapshot of the last epoch.
  • So in the fastSync design
    • We should also sync the validatorSnapshot of the last epoch when we finish state syncing. This can be validated by the combination of validatorList and a merkle proof of a block in the last epoch.
    • After syncing the current state, the validator list, and the validatorSnapshot of the last epoch, start calculating the validatorStat.
    • After computing it, flush it to offchain db.

6.6.3 DelegationsByDelegator

  • When updating validatorSnapshot, we should also update the offchain data for delegators, which can be easily derived from validatorSnapshot
  • core/rawdb/accessors_offchain.go:260

7. Conclusion

  1. In this document, the major processes of the code base are walked through. This doc shall wrap up the research phase of the project.
  2. The changes that have to be made in the current code base to adapt to fastSync are investigated. Some may need extensive review.
    • A very important decision is whether we should engage a hard fork for additional data fields in headers.
    • One field is the total block reward given in a block.
    • The second field is the hash of the offchain validator list.

TODO in design phase

  1. Lay out the overall process design for fast sync, which includes
    1. Network design
    2. Chain logic change for fastSync
    3. Downloader server strategy
    4. Downloader client strategy
    5. Acceptance test scenarios
    6. Review this document
  2. Modularized feature list and interface definition
@rlan35

rlan35 commented Jun 2, 2020

  1. What is the current network structure?
    A: Nodes join the p2p network and connect to up to hundreds of peers across 4 shards. Leader and validators in the same shard broadcast messages to each other for consensus. For cross-shard txns, crosslinks, beacon sync, the nodes will send data across shards to communicate the corresponding messages.

  2. What is the role of crosslink in block header/body verification?
    Crosslinks are added in the beacon chain block header and the verification of crosslinks is part of the block header verification. The header is not valid if the signatures on the crosslinks are not valid. However, for fast sync, the verification of crosslinks can be skipped as long as the signatures on the block header are valid.

  3. What is the role of view change in block header/body verification?
    View change will only affect the view ID in the block header. The view ID should be increasing between blocks and that's the only criterion for verification of the view ID (the caveat is that in our current mainnet, due to initial view change bugs, some of the early blocks (< 1M in height) have a view ID that's smaller than the previous block's).

  4. What is the role of slashing and how it interact with offchain data?
    Slashing record is put in the header and similar to crosslink, it could be verified in full sync, and skipped in fast sync.

  5. What is the mechanism of cross shard transaction?
    Mostly correct as you've written down based on the code.

  6. What is the role of offchain data in block header/body verification (EVM data dependency)?
    The offchain data can be recalculated by traversing the whole state trie and processing all the validator wrapper information. So after a full state is sync'ed, the staking offchain data can be calculated.

@JackyWYX
Author

JackyWYX commented Jun 2, 2020

Some comments on the 6th question.

  1. The validator list is one piece of the offchain data and can be calculated from a full state. But in order to get it, we would have to iterate over all entries in the state and retrieve the validator snapshot. This might be extremely expensive. Do you think the following would be a reasonable update: add a hash of the validator list to the header, so that a client that has finished full sync can directly ask a host for the validator list, which can be validated against the validator list hash?
  2. The beacon chain total reward distributed (which is also in offchain data) cannot be simply obtained from a single full state. It is a result that can only be obtained by processing all historical beacon states. To address this, there are two options:
    1. Add the total reward distributed in header.
    2. Write this data under a specific address as a code (move this offchain data to onchain)
      Which idea do you prefer?
