Geth Archive Node using Rocket Pool Docker Setup with lvmcache

Overview

A geth archive node fully synced from genesis requires more than 10 TB of storage. The initial sync also demands high IOPS to process each block, which is usually only achievable with flash storage such as SSDs or NVMe drives. Because an archive node stores the entire state history of the chain, its space requirements are drastically higher than those of a regularly-pruned full node.

For many users, having more than 10 TB of SSD or NVMe storage is not feasible. However, the vast majority of the state data sits at rest on the node, so the workload can benefit from a tiered cache approach: a fast cache layer provides enough IOPS for syncing, while a large HDD backend provides the bulk of the storage.

Method

This method uses lvmcache with two physical disks:

  • 1 TB NVMe - /dev/disk/by-id/nvme-1TB - used for the cache volume
  • 20 TB HDD - /dev/disk/by-id/ata-HDD - used for the origin (data) volume

N.B. To reap the benefits of the NVMe's IOPS during syncing, which involves a significant number of writes to the drive, the node will use the writeback cache mode. In this mode, writes are acknowledged once they reach the cache volume and are flushed to the origin later; the loss of the cache device may therefore result in the loss of data.
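
Should the cache device ever need to be removed cleanly (for example, to replace the NVMe), lvconvert can flush the dirty blocks back to the origin first. A sketch, using the VG/LV names created in the next section:

sudo lvconvert --splitcache ethvol/rpdata_lv   ### flush and detach, keeping the cache pool LV
sudo lvconvert --uncache ethvol/rpdata_lv      ### or: flush and delete the cache pool entirely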

Create the lvmcache volume and ext4 filesystem

Create physical volumes (PV) using each disk, create a volume group (VG) including both disks, and create logical volumes (LV) for the origin and cache volumes, assigning them to their respective disks.

Create PVs:

sudo pvcreate /dev/disk/by-id/nvme-1TB
sudo pvcreate /dev/disk/by-id/ata-HDD

Create VG called ethvol:

sudo vgcreate ethvol /dev/disk/by-id/ata-HDD /dev/disk/by-id/nvme-1TB
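
To sanity-check the layout so far, the standard LVM reporting commands can be used (exact output varies by system):

sudo pvs           ### both disks should appear as PVs in ethvol
sudo vgs ethvol    ### one VG spanning 2 PVs, roughly the combined capacity of both disks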

Create origin LV called rpdata_lv on the HDD using 100% of its capacity:

sudo lvcreate -n rpdata_lv -l 100%FREE ethvol /dev/disk/by-id/ata-HDD
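
Listing LVs with their backing devices verifies that the origin volume landed on the HDD:

sudo lvs -o +devices ethvol   ### rpdata_lv should show the ata-HDD device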

Create the cache volumes on the NVMe. This consists of a cache data volume and a cache metadata volume, which are then combined into a cache pool. The cache metadata volume will be 500 MB, and the cache data volume will use (almost all of) the remaining space. (The lvcreate for cache_data_lv uses -l 99%FREE because the subsequent conversion to a cache pool requires some additional free space in the VG.)

sudo lvcreate -n cache_metadata_lv -L 500M ethvol /dev/disk/by-id/nvme-1TB
sudo lvcreate -n cache_data_lv -l 99%FREE ethvol /dev/disk/by-id/nvme-1TB

Create the cache pool.

sudo lvconvert --type cache-pool --poolmetadata ethvol/cache_metadata_lv ethvol/cache_data_lv

Add the cache pool to the origin HDD volume.

sudo lvconvert --type cache --cachepool ethvol/cache_data_lv ethvol/rpdata_lv
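
At this point, rpdata_lv should be reported as a cached LV. Listing all LVs (including the hidden sub-volumes, shown in brackets) confirms the cache pool is attached:

sudo lvs -a -o name,lv_attr,origin,pool_lv ethvol   ### rpdata_lv should list cache_data_lv in the Pool column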

The default cache mode is writethrough. Convert to writeback.

sudo lvchange --cachemode writeback ethvol/rpdata_lv
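
Per the lvmcache(7) man page, the active cache mode can be checked with the cachemode report field:

sudo lvs -o+cachemode ethvol/rpdata_lv   ### should now report writeback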

Create ext4 filesystem on the rpdata_lv volume, using the Rocket Pool documentation as a guide.

sudo mkfs.ext4 -m 0 -L rocketarchive /dev/ethvol/rpdata_lv

Grab the UUID for the created filesystem.

sudo blkid | grep rpdata_lv ### grab the UUID

Edit /etc/fstab to include the new volume to be mounted at /mnt/rpdata:

### /etc/fstab
/dev/disk/by-uuid/<uuid from blkid>  /mnt/rpdata  ext4  defaults  0  0
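
Before mounting, the new fstab entry can be sanity-checked with findmnt from util-linux:

sudo findmnt --verify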

Mount the drive

sudo mount /mnt/rpdata
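
Confirm the mount and its capacity:

df -h /mnt/rpdata   ### should show the ~20 TB ext4 volume mounted at /mnt/rpdata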

Check IOPS. This test should hit only the cache drive, so the IOPS should be roughly equivalent to testing the NVMe disk directly.

cd /mnt/rpdata
sudo fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

Results:

test: (groupid=0, jobs=1): err= 0: pid=1957977: Sun Aug 21 11:21:10 2022
  read: IOPS=29.4k, BW=115MiB/s (120MB/s)(3070MiB/26731msec)
   bw (  KiB/s): min=   32, max=229088, per=100.00%, avg=117628.15, stdev=63351.55, samples=53
   iops        : min=    8, max=57272, avg=29406.96, stdev=15837.93, samples=53
  write: IOPS=9825, BW=38.4MiB/s (40.2MB/s)(1026MiB/26731msec); 0 zone resets
   bw (  KiB/s): min=    8, max=76816, per=100.00%, avg=39310.42, stdev=21235.16, samples=53
   iops        : min=    2, max=19204, avg=9827.45, stdev=5308.75, samples=53
  cpu          : usr=6.83%, sys=33.93%, ctx=98809, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64
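
The 4 GB test file left behind by fio can be removed afterwards. During the sync, cache occupancy and hit rates can also be watched via the LVM cache report fields (assuming a reasonably recent LVM; exact field availability may vary by version):

sudo rm /mnt/rpdata/test
sudo lvs -o+cache_total_blocks,cache_used_blocks,cache_dirty_blocks,cache_read_hits,cache_read_misses ethvol/rpdata_lv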

Change Docker Storage Location

At this point, the instructions simply follow the Rocket Pool documentation for a Docker install. The new mountpoint /mnt/rpdata will be used for Docker data and can be set up using the instructions at Configuring Docker's Storage Location.

Edit /etc/docker/daemon.json to include the following:

{
    "data-root": "/mnt/rpdata/docker"
}

Create the Docker dir and restart the Docker daemon:

sudo mkdir /mnt/rpdata/docker
sudo systemctl restart docker
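
Note that existing images and containers under /var/lib/docker do not move over automatically; if Docker was already in use, one option is to stop the daemon and copy the old data root across with rsync before restarting. In either case, the active data root can be verified once the daemon is back up:

docker info | grep "Docker Root Dir"   ### should report /mnt/rpdata/docker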

Configure Rocket Pool

Finally, configure the Rocket Pool Smartnode to run geth in archive mode by providing the --syncmode full --gcmode archive flags to geth in the Execution Client settings.
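
In recent Smartnode versions this is typically done through the interactive configuration; the exact menu naming may differ by version, but the flags themselves come straight from geth:

rocketpool service config   ### Execution Client settings -> additional geth flags
### flags to add: --syncmode full --gcmode archive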

[Screenshot: Rocket Pool Smartnode Config]

Conclusion

At the time of this writing, the archive node sync is in progress.

[Geth logs]
