@nategraf
Last active September 24, 2020 02:12
Investigation into Celo Network Instability on September 22, 2020

Investigation notes for September 22, 2020

Which blocks were delayed and why?

  • 2647788 incurred 4 round changes
    • Proposer 0x90684BC3ded2f69d8853D791bDC57eEa0a84C9d0 skipped
    • Proposer 0x2765162CC4ad257a956F0411675Bc45257E6CB30 skipped
    • Proposer 0x81c885A48Fa2e7D3609E98b44f5Ca1080dD43626 skipped
    • Proposer 0x89457B24524c94f97C0cC03F794c3e80B3DFD30B skipped
  • .....90 incurred 2 round changes
    • Proposer 0x4CB90Ebba92141eD3021F5dC4e6C8bb642095846 skipped
    • Proposer 0x53a5C63E4655dA36364614A1A0a1bC095f251Fa0 skipped
  • .....91 incurred 1 round change
    • 0xd3b544dD9DE0B5D935453418E581Fa6A2A8f33A7
  • ....811 incurred 1 round change
    • 0x31380a55b22A375dAC0183717e14aeA42F3D2419
  • .....15 ...
    • 0x6e751D71387cbA9e531934A24187e3dCa8bDF187
  • .....16 ...
    • 0xe922f3c6A729242B4067460aCBCD9AE0Be7e8Dcd
  • .....18 ...
    • 0x2Eb79345089cA6F703F3b3C4235315CbeAaD6D3C
  • .....20 ...
    • 0x9778c47192E8875EfB2da2f31081Faae574f96e2
  • .....25 ...
    • 0x8210b17Bf48A805D9a799eB71D3277E7a1f4C617

Interestingly, block 2647788 experienced 4 round changes and took 35 seconds to finalize. It was finally proposed by the 5th proposer, and it was signed by 2 of the 4 validators that missed their own opportunities to propose. It is difficult to tell whether the bottleneck was that the proposers went offline, or that a quorum was not available to issue prepare messages, so that none of the proposals that were made succeeded.

After the initial delayed block, the 9 validators listed above as having missed a chance to propose a block also appear not to have signed blocks for some time after the initial incident. Each was one of 16 validators that experienced more than 1 minute of non-signing, and one of 12 that experienced 2 minutes of non-signing, as shown in https://cauldron.pretoriaresearchlab.io/block-map
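
The non-signing durations above map directly onto block counts: at 5 seconds per block, 1 minute corresponds to 12 consecutive missed blocks and 2 minutes to 24. A minimal sketch of that measurement follows. It assumes a hypothetical per-validator list of booleans (True when the validator's signature is present in a block). The example data is made up for illustration and is not taken from the block map.

```python
BLOCK_TIME_SECONDS = 5  # block time assumed in these notes


def longest_missed_run(signed):
    """Length of the longest run of consecutive blocks that were not signed."""
    longest = current = 0
    for did_sign in signed:
        current = 0 if did_sign else current + 1
        longest = max(longest, current)
    return longest


def missed_seconds(signed, block_time=BLOCK_TIME_SECONDS):
    """Longest non-signing period expressed in seconds."""
    return longest_missed_run(signed) * block_time


# Hypothetical validator: signs 3 blocks, misses 13, then signs again.
example = [True] * 3 + [False] * 13 + [True] * 4
print(longest_missed_run(example))   # 13
print(missed_seconds(example) > 60)  # True
```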

Notes on the transactions

This incident was triggered by the submission of 4038 transactions by EOA 0x14E56CE52c7b72Fc9C906b62BC1335a026bbCB97 to contract 0x69ef304B255449855EcAe5466d721D45387Ad2D5 with input 0xa68a76cc.

Here is a representative sample: https://explorer.celo.org/tx/0xba4abce26598334ee9cbd7e0cc65766ceb5dcc3b47a79d5ff25802cba9d06d52/internal_transactions

Each of the transactions:

  • Called function selector 0xa68a76cc with no arguments.
  • Resulted in the creation of a contract.
  • Used 324301 gas at 10 Gwei with a limit of 2M gas.

The first of the transactions was included in block 2647789 (it was likely sent to the transaction pool during consensus on block 2647788) and the last was included in block 2648033. This is a period of 244 blocks, which is 20.3 minutes at 5 seconds per block. The last of these transactions took a total of 19.5 minutes to be mined. Maximally full blocks in the interim used 8107525 gas (25 of these transactions at 324301 gas each).
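
The figures in this paragraph can be re-derived from the block numbers and the per-transaction gas quoted above. This is a minimal sketch, assuming the 5-second block time used in these notes. The constant and function names are illustrative only.

```python
FIRST_TX_BLOCK = 2647789  # first block containing one of the transactions
LAST_TX_BLOCK = 2648033   # block containing the last of the transactions
BLOCK_TIME_SECONDS = 5    # block time assumed in these notes
GAS_PER_TX = 324301       # gas used by each transaction
MAX_TX_PER_BLOCK = 25     # most of these transactions seen in one block


def span_minutes(first_block, last_block, block_time=BLOCK_TIME_SECONDS):
    """Wall-clock minutes between two blocks at a fixed block time."""
    return (last_block - first_block) * block_time / 60


blocks = LAST_TX_BLOCK - FIRST_TX_BLOCK
print(blocks)                                                  # 244
print(round(span_minutes(FIRST_TX_BLOCK, LAST_TX_BLOCK), 1))   # 20.3
print(MAX_TX_PER_BLOCK * GAS_PER_TX)                           # 8107525
```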

Note: Although all of the transactions were submitted very quickly at the start of the incident period, and each block could have contained up to 30 of them, each block instead contained at most 25. One possible reason is that, because each transaction's gas limit was 2M even though it used only 324301 gas, the validator nodes did not attempt to include a transaction once its maximum usage would have exceeded the gas remaining in the block. As a result, although these transactions could have been cleared in 137 blocks (~12 minutes), the actual time was closer to 20 minutes. Additionally, some blocks in this period contained no transactions at all. It is unclear why this is the case.
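
The hypothesis in this note can be checked with a small simulation. This is only a sketch under two assumptions: a block gas limit of 10M (not stated in these notes, but consistent with the "up to 30" figure), and a block builder that skips any transaction whose gas limit exceeds the gas remaining in the block. It is not taken from the Celo client code, and the names are illustrative.

```python
BLOCK_GAS_LIMIT = 10_000_000  # assumption: not stated in the notes
TX_GAS_LIMIT = 2_000_000      # gas limit set on each transaction
TX_GAS_USED = 324_301         # gas actually consumed by each transaction


def txs_per_block(reserve_by_limit):
    """Count how many identical transactions fit in one block.

    If reserve_by_limit is True, a transaction is admitted only while the
    remaining block gas covers its full gas limit (the hypothesised rule).
    Otherwise it is admitted while the remaining gas covers its actual usage.
    """
    required = TX_GAS_LIMIT if reserve_by_limit else TX_GAS_USED
    remaining = BLOCK_GAS_LIMIT
    count = 0
    while remaining >= required:
        remaining -= TX_GAS_USED  # only the gas actually used is consumed
        count += 1
    return count


print(txs_per_block(reserve_by_limit=False))  # 30
print(txs_per_block(reserve_by_limit=True))   # 25
```

Under these assumptions the limit-based rule yields the 25 transactions per block that were observed, which is consistent with the explanation but does not confirm it.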

Note: In this period, 17 other transactions occurred, 6 of which were from the same address and 11 were from other addresses.

Based on the transaction trace, the result of each of these transactions was the creation of a contract (i.e. the called method is a factory). An example of a created contract is here: https://explorer.celo.org/address/0x4f7702d47521e17502da3895fa10daf640d29410/transactions

Timeline of validator recoveries

| Block | Event |
| --- | --- |
| 2647788 | Delayed block |
| 802 | All but 12 validators have returned to normal signing |
| 865 | 0xFFbCF262C1d5c4392ef469BA79f2CD195d2AffDa recovers |
| 870 | 0x4CB90Ebba92141eD3021F5dC4e6C8bb642095846 recovers |
| 972 | 0x3ed95D6D4Ce36Ea7B349cD401e324316D956331a recovers |
| 8010 | 0x2Eb79345089cA6F703F3b3C4235315CbeAaD6D3C recovers |
| 33 | Last of the 4k transactions is mined. |
| 259 | 0x9778c47192E8875EfB2da2f31081Faae574f96e2 recovers |
| 352 | 0x8210b17Bf48A805D9a799eB71D3277E7a1f4C617 recovers |
| 504 | 0x31380a55b22A375dAC0183717e14aeA42F3D2419 recovers |
| 616 | 0xd3b544dD9DE0B5D935453418E581Fa6A2A8f33A7 recovers |
| 638 | 0xA90640bF05711e5674daCF5A4368Aa6348031e01 recovers |
| 700 | 0xe922f3c6A729242B4067460aCBCD9AE0Be7e8Dcd recovers |
| 50278 | 0x53a5C63E4655dA36364614A1A0a1bC095f251Fa0 recovers |
| 4061 | 0x6e751D71387cbA9e531934A24187e3dCa8bDF187 recovers. Block signed by all validators. |
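
The block numbers after the first row appear to be abbreviated to their trailing digits. As a rough aid for reading the timeline, the sketch below expands each one to the smallest block number that is not below the previous row and ends in the given digits, then converts the distance from the delayed block into elapsed time at 5 seconds per block. The expansion rule and the helper names are assumptions made for illustration. Only 2648033 (the row ending in 33) is confirmed elsewhere in these notes.

```python
BLOCK_TIME_SECONDS = 5  # block time used elsewhere in these notes


def expand(previous, suffix):
    """Smallest block number >= previous whose decimal form ends in suffix."""
    modulus = 10 ** len(suffix)
    candidate = previous - previous % modulus + int(suffix)
    if candidate < previous:
        candidate += modulus
    return candidate


def expand_timeline(start, suffixes):
    """Expand a sequence of abbreviated, non-decreasing block numbers."""
    blocks = []
    previous = start
    for suffix in suffixes:
        previous = expand(previous, suffix)
        blocks.append(previous)
    return blocks


start = 2647788  # the delayed block
# Abbreviated block numbers copied from the timeline rows (assumed suffixes).
suffixes = ["802", "865", "870", "972", "8010", "33", "259", "352",
            "504", "616", "638", "700", "50278", "4061"]
for block in expand_timeline(start, suffixes):
    minutes = (block - start) * BLOCK_TIME_SECONDS / 60
    print(block, f"{minutes:.1f} min after the delayed block")
```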

After the initially delayed block, 22 validators experienced some initial instability in signing for at least 2 blocks. After this, some experienced instability for up to 1 minute, then recovered to normal signing rates, excluding blocks that experienced a round change due to an offline proposer.

Note: For 244 blocks, gas usage, and thus the block processing overhead for consensus, remained high, at around 8M gas for many blocks. Signature percentages returned to close to normal fairly quickly, within 12 blocks. It therefore seems unlikely that the network instability was related to block processing overhead, and instead it must be related to the initial shock of the transactions being added to the transaction pool.
