Prysm benchmarks

Benchmarks for Prysm 1.4.4 undergoing initial sync

This document shows Prysm benchmarks on disk & network I/O and CPU & memory usage during initial sync.
The total time taken to sync up to the current epoch and slot as of Sept 9th (slot 2028735) was approximately 24 hours.
We tried a few different settings but overall found the defaults to be the best.
The final state size of the beacon chain was ~25GB.
Overall, the block-batch-limit seemed to have the largest effect on sync speed and on the success & error rate.
The error rate seemed to increase as we got closer to the head of the chain.
We recommend reducing the block-batch-limit from the default of 64 to 32 near the chain head if the error rate is very high.
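
If the error rate near the head is high, and assuming the setting maps to Prysm's --block-batch-limit CLI flag of the same name, the batch size could be lowered with something like:

# Sketch: restart the beacon node near the chain head with a smaller batch size
# (all other flags left at their defaults)
beacon-chain --block-batch-limit=32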

Benchmarks from 2 settings are documented below.

Environment

Benchmarks were carried out on a heavily overprovisioned instance.
An AWS EC2 m5.2xlarge was used, with 8 vCPUs and 32GB of RAM.
Additionally, for storage we used an AWS GP3 EBS SSD with 8000 IOPS and 500MB/s throughput.
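
For reference, a comparable gp3 volume could be provisioned with the AWS CLI roughly as follows; the size and availability zone here are placeholders, not values from this benchmark:

# Sketch: gp3 volume with the IOPS/throughput figures used above
aws ec2 create-volume \
  --volume-type gp3 \
  --iops 8000 \
  --throughput 500 \
  --size 250 \
  --availability-zone us-east-1a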

Settings - Default

These are the default settings for Prysm.
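
The tables below follow sar's network/disk report layouts; metrics like these can be collected alongside the syncing beacon node with the same commands used in the normal-functioning section further down, e.g.:

# 3-second sampling intervals, as in the "Via sar ..." commands later in this gist
sar -n DEV 3    # network I/O per interface
sar -d 3        # disk I/O per device
top             # per-process CPU & memory snapshot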

Network

Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
Average:         ens5    174.07    136.55    143.83     18.41      0.00      0.00      0.00      0.00

Disk I/O

Average:          DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
Average:     dev259-0    406.37      0.00   4136.79      0.00     10.18      0.01      0.92      4.80

CPU & Memory snapshot via top

  17602 ubuntu    20   0   10.6g   2.6g   1.0g S 354.5   8.3   8:44.38 beacon-chain-v1

Blocks per second

Observation length: ~1hr

[2021-09-08 12:32:08]  INFO initial-sync: Processing block batch of size 31 starting from  0x860d9237... 893313/2023358 - estimated time remaining 202h31m1s blocksPerSecond=1.6 peers=9

[2021-09-08 13:27:10]  INFO initial-sync: Processing block batch of size 64 starting from  0x6680b10c... 981344/2023633 - estimated time remaining 10h10m35s blocksPerSecond=28.4 peers=47

Blocks per second varied from 12 to 32. The cumulative average was around 24 blocks per second.
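
As a rough sanity check, the ~24-hour total sync time to slot 2028735 quoted at the top implies a similar average rate:

# ~2,028,735 slots processed in ~24 hours
awk 'BEGIN { printf "%.1f blocks/s\n", 2028735 / (24 * 3600) }'
# => 23.5 blocks/s, in line with the ~24 blocks/s cumulative average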

Settings - Theoretically optimal

This setting showed the best consistent blocks per second in the streaming output (50), but the cumulative average was 27. The theory was:

  • High slots-per-archive-point to reduce disk I/O and storage space requirements, and prevent any halting for disk writes
  • Higher p2p-max-peers to improve chances of getting new blocks quickly
  • High block-batch-limit, which showed a consistent 50 blocks per second being processed, better than both a higher limit of 786 and a lower limit of 384
  • Increased max-goroutines to benefit from any possible concurrency boost

The flags used were:
block-batch-limit: 512
p2p-max-peers: 500
slots-per-archive-point: 8192
max-goroutines: 40000
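
Assuming these settings map to the Prysm CLI flags of the same names, the invocation for this run would have looked roughly like:

# Sketch of the "theoretically optimal" run; flag names assumed to match
# the settings listed above, all other flags left at their defaults
beacon-chain \
  --block-batch-limit=512 \
  --p2p-max-peers=500 \
  --slots-per-archive-point=8192 \
  --max-goroutines=40000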

Network I/O

Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
Average:         ens5    932.33   1055.31    107.40    168.37      0.00      0.00      0.00      0.00

Disk I/O

Average:          DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
Average:     dev259-0    451.45      0.00   4678.72      0.00     10.36      0.01      0.95      4.93

CPU & Memory snapshot via top

15329 ubuntu    20   0   10.6g   6.9g   4.7g S 223.3  22.4 810:29.98 beacon-chain-v1

We saw a CPU usage of ~223-400%, approximately 25-100% of the available capacity. This was more or less the average.
We saw a memory footprint of 6.9GB used on average, about 3x what was used with the default configuration.

Blocks per second

We observed this setting for around 4 hours. While most streaming outputs claimed 50 blocks per second were being processed, the real rate was 27 blocks per second.
We attribute this to more errors at the higher block-batch-limit, with block batches not being processed due to a "no good block in batch" error.

Errors

There were 2 types of errors that popped up.

  1. No good block in batch - This would make the syncing process discard the batch of blocks it had just fetched and retry with a new batch. This was more visible with a higher block-batch-limit.
  2. No parent found in DB for node ... - A slightly more cryptic error. Sometimes blocks would be fetched and then discarded if a parent node was not found in the database. This seemed independent of any settings but may be related to having a high peer count.

New beaconchain.db state size:

-rw------- 1 ubuntu ubuntu 24804524032 Sep  9 06:43 beaconchain.db

~23.1GB

Old DB state size:

-rw------- 1 ubuntu ubuntu 180844036096 Sep  9 06:44 beaconchain.db

~168.4GB
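
Converting the raw byte counts from ls above to GiB:

# 1 GiB = 1024^3 bytes
awk 'BEGIN { printf "new: %.1f GiB  old: %.1f GiB\n", 24804524032 / 1024^3, 180844036096 / 1024^3 }'
# => new: 23.1 GiB  old: 168.4 GiB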

Benchmarks for Prysm 1.4.4 during normal operation

The following are benchmarks via sar for a normal Prysm node.
A beacon-chain is running, connected remotely to Alchemy with an Infura backup.
A validator is running with 500 validating keys.
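
The exact invocation is not recorded here, but on Prysm v1.x a setup like this would look roughly as follows; the flag names and endpoint URLs are assumptions for illustration, not taken from this node:

# Sketch: beacon node using Alchemy as the eth1 endpoint with Infura as a fallback
beacon-chain \
  --http-web3provider=https://eth-mainnet.alchemyapi.io/v2/<key> \
  --fallback-web3provider=https://mainnet.infura.io/v3/<key>

# Validator client with the wallet holding the 500 validating keys
validator --wallet-dir=<wallet-dir>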

CPU

Via sar -u 3

Linux 5.11.0-1017-aws (ip) 	09/09/21 	_x86_64_	(4 CPU)

09:26:57        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all     63.60      0.00      2.28      0.64      0.00     33.48

Network

Via sar -n DEV 3

Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
Average:           lo     18.61     18.61      9.26      9.26      0.00      0.00      0.00      0.00
Average:         ens5   5289.61   3459.83   1927.83   1722.38      0.00      0.00      0.00      0.00
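
Summing the rx/tx rates on ens5 gives an idea of total network throughput during normal operation:

# rxkB/s + txkB/s on ens5 from the table above
awk 'BEGIN { kb = 1927.83 + 1722.38; printf "%.1f MB/s (~%.0f Mbps)\n", kb / 1024, kb * 1024 * 8 / 1e6 }'
# => 3.6 MB/s (~30 Mbps)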

Disk

Via sar -d 3

Average:          DEV       tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util
Average:     dev259-0     24.25      3.51   1947.41      0.00     80.46      0.05      2.16      1.52

Memory

Via sar -r 3

 09:34:39    kbmemfree   kbavail kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
 Average:      8260245  12685871   3022840     18.79     27217   4588160   4652009     28.91    276959   7255164       154
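
The node's total RAM can be recovered from kbmemused and %memused in the table above:

# 3022840 kB used is 18.79% of total memory
awk 'BEGIN { printf "%.1f GiB total\n", 3022840 / 0.1879 / 1024 / 1024 }'
# => ~15.3 GiB, i.e. a 16GB instance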

Summary

Based on these observations:

  • We will continue to use an m5.xlarge instance and will look into changing to an m5a.xlarge instance. A t4g instance was considered as it may reduce expenditure, but it will cost more if average CPU utilization is above 40-50%, which seems to be the case here
  • Due to the high CPU usage, we will continue to use 16GB of memory, since memory is tied to the instance type; otherwise 8GB of memory would suffice.
  • An AWS GP2 EBS SSD is sufficient with an IOPS of 500-768 (size dependent; gp2 provides 3 IOPS per GiB, so this corresponds to roughly a 170-256GB volume)
  • Network I/O never seems to exceed ~10MB/s, so the regular uplink on AWS of up to 10Gbps is more than sufficient