Skip to content

Instantly share code, notes, and snippets.

@dansku
Last active July 18, 2023 16:18
Show Gist options
  • Star 10 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save dansku/62491247a07b6b9127b6650d8aa29751 to your computer and use it in GitHub Desktop.
Save dansku/62491247a07b6b9127b6650d8aa29751 to your computer and use it in GitHub Desktop.
Helium ETL Setup Tutorial

Setting up a Helium ETL

This documents explains a quick intro how to setup your own ETL server for the Helium Blockchain. Running an ETL is not an easy task, but hopefully this document will help you get started, and by the end you will have the blockchain up and running.

Server Requirements

  • 2TB SSD (NVME prefered) disk

Install and configure PostgreSQL + PostGis

Install required packages

sudo apt -y install postgresql-12 postgresql-client-12 postgis postgresql-12-postgis-3 postgresql-12-postgis-3-scripts

Create the ETL database and user

Here you can change the user, password, and database names.

sudo su - postgres psql
CREATE DATABASE etl;
CREATE USER etl WITH ENCRYPTED PASSWORD '{PASSWORD}';
GRANT ALL PRIVILEGES ON DATABASE etl TO etl;

Enable access to the database via md5 password with editing /etc/postgresql/12/main/pg_hba.conf the file, change:

#change 
local   all             all                                     peer

#to 
local   all             all                                     md5

Install and configure Blockchain-ETL

You must use Erlang version 22, newer versions won't work! (The data bellow is taken from the official repository.)

Installing Erlang 22 on Ubuntu

wget https://packages.erlang-solutions.com/erlang-solutions_2.0_all.deb
sudo dpkg -i erlang-solutions_2.0_all.deb
sudo apt-get update
sudo apt install esl-erlang=1:22.3.4.1-1 cmake libsodium-dev libssl-dev build-essential

Install rust

Go to rustup.rs and follow the steps from the website.

Close the blockchain-etl repository

  • Clone the blockchain-etl repository to your server.

  • Create .env file by copying .env.template and editing it to reflect your postgres and other keys and credentials

  • Run make release in the top level folder

  • Run make reset to initialize the database and reset the ledger. You will need to run a make reset every time the release notes indicate to do so. This should be very rare. .

    Running a make reset will keep the existing downloaded blocks but replay the ledger so the application can re-play the blocks into the database. Again, only do this when indicated in the release notes since a replay can take a long time.

  • Run make start to start the application. Logs will be at _build/dev/rel/blockchain_etl/log/*.


Start sync from scratch

If you configured the .env file from the blockchain-etl correctly, you can run the make start from within the blockchain-etl folder and the sync should start. You can follow the process by following the console.log with tail -F ~/blockchain-etl/_build/dev/rel/blockchain_etl/log/console.log

Keep in mind that syncing from the beginning can take quite some time.

Setting up from the DeWi Snapshots

If you want to have things up and running faster, you can download the snapshots provided by DeWi. There are two files to download, one is the postgres database dump and another one is blockchain-etl database dump.

Go to etl-snapshots.dewi.org and download the database_snapshot, which is the postgres db and etl_snapshot, which is the blockchain-etl database ready.

etl_snapshot

After downloading the file, just unzip it to the folder you want to run it from. From within the file, edit the .env file to point the postgres database.

database_snapshot

After dowloading the database_snapshot, unzip it somewhere on your server, it will be a big file. And import it with the command pg_restore -d etl -U postgres -W -Fd folder/, changing the folder address to the path of the unzipped files.

This will take a few hours to finish, and once it's done importing you can start the make start from the blockchain-etl folder.


PS: This is a work-in-progress file, but you can hit me up on the helium discord to @spillere with improvements and suggestions.

Updating the blockchain-etl

To update the blockchain-etl to the latest version, go to its folder and run

git pull
make stop
make release
make migrations
make start`

This will download the latest update, build the software, run any migrations if there are any and start it again.

Fixes

./blockchain_etl backfill gateway_payers
./blockchain_etl backfill location_geometry
./blockchain_etl backfill gateway_location_hex

Suggested Postgres Configuration

Thanks @mfalkvidd for the info.

shared_buffers = 4GB                    # min 128kB, default 128MB. Not recommended to set larger than 25% of the server's total ram .
work_mem = 4GB                          # min 64kB, default 4MB. Not recommended to set larger than 25% of the server's total ram .
maintenance_work_mem = 4GB              # min 1MB, default 64MB. Not recommended to set larger than 25% of the server's total ram .
checkpoint_timeout = 120min             # range 30s-1d, default 5min
max_wal_size = 2GB                      # default 1GB
fsync = off                             # flush data to disk for crash safety, default=on
@the-wildcard
Copy link

the-wildcard commented Nov 3, 2021

Some learnings:

With the current blockchain size (2021-11-03), you need a minimum of 2TB of SSD
If you get errors that mention a CMakeCache mismatch when trying to update to a new release::
CMake Error: The source "/home/ubuntu/root/blockchain-etl/_build/default/lib/h3/c_src/CMakeLists.txt" does not match the source "/root/blockchain-etl/_build/default/lib/h3/c_src/CMakeLists.txt" used to generate cache. Re-run cmake with a different source directory. ===> Hook for compile failed!
You need to run rm -rf _build/default/lib/h3 to fix

@dansku
Copy link
Author

dansku commented Nov 4, 2021

Indeed the network is growing quite fast!!

@mfalkvidd
Copy link

I got the following error:

pg_restore: connecting to database for restore
Password:
pg_restore: creating EXTENSION "pg_trgm"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 3; 3079 50360079 EXTENSION pg_trgm (no owner)
pg_restore: error: could not execute query: ERROR:  permission denied to create extension "pg_trgm"
HINT:  Must be superuser to create this extension.
Command was: CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA public;

when running

pg_restore -v -j6 -d etl -U etl -W -Fd export/database/

Seems to work better with the postgres user:

pg_restore -v -j6 -d etl -U postgres -W -Fd export/database/
               change here--^

@the-wildcard
Copy link

I got the following error:

pg_restore: connecting to database for restore
Password:
pg_restore: creating EXTENSION "pg_trgm"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 3; 3079 50360079 EXTENSION pg_trgm (no owner)
pg_restore: error: could not execute query: ERROR:  permission denied to create extension "pg_trgm"
HINT:  Must be superuser to create this extension.
Command was: CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA public;

when running

pg_restore -v -j6 -d etl -U etl -W -Fd export/database/

Seems to work better with the postgres user:

pg_restore -v -j6 -d etl -U postgres -W -Fd export/database/
               change here--^

Had the same issue here. I made etl a superuser during the initial restore to get around it.

@dansku
Copy link
Author

dansku commented Nov 20, 2021

Great, I will update to use postgres user

@mfalkvidd
Copy link

I ran pg_restore against a postgres database with default settings. After 3 days, it had created 9 of 25 indexes. I cancelled the restore, dropped the database and tweaked the postgres settings. Ran the restore again. It completed in 44 hours.

These are the settings I tweaked (in /var/lib/postgresql/data/pgdata/postgresql.conf ):

shared_buffers = 4GB                    # min 128kB, default 128MB. Not recommended to set larger than 25% of the server's total ram .
work_mem = 4GB                          # min 64kB, default 4MB. Not recommended to set larger than 25% of the server's total ram .
maintenance_work_mem = 4GB              # min 1MB, default 64MB. Not recommended to set larger than 25% of the server's total ram .
checkpoint_timeout = 120min             # range 30s-1d, default 5min
max_wal_size = 2GB                      # default 1GB
fsync = off                             # flush data to disk for crash safety, default=on

postgres needs to be restarted for the changes to take effect. Set fsync=on after the import has completed to ensure data integrity.

My hardware, for reference:

11th Gen Intel Core i5-11400 12 cores
64GB DDR4
SAMSUNG MZVLB512HAJQ-000L7 500GB NVME (40GB boot, 435GB lvmcache)
2x TOSHIBA HDWD260 6TB 5400rpm

The lvmcache had roughly 50% read hits and 50% write hits during the import.

@dansku
Copy link
Author

dansku commented Nov 30, 2021

Great addition @mfalkvidd, I will add it to the docs.

@disk91
Copy link

disk91 commented Dec 3, 2021

In the documentation, there are missing steps to add the postgress repository

wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" |sudo tee  /etc/apt/sources.list.d/pgdg.list
apt update

Version of ubuntu / debian used should also good to know as Ubuntu 21 does not allow to install erlang easiliy (https://elixirforum.com/t/install-fails-for-ubuntu-21-04/39596/7)

@disk91
Copy link

disk91 commented Dec 10, 2021

As the pg_restore is really long (over 24h for me) you should propose a restore command with more trace on terminal

pg_restore -l -d etl -W  -f list.toc -Fd .

I'm not sure of that line as the import is already in progress and I don't really want to restart a 24h process

@disk91
Copy link

disk91 commented Dec 10, 2021

2G on max_wal_size seem to be not enough

LOG:  checkpoints are occurring too frequently (9 seconds apart)
HINT:  Consider increasing the configuration parameter "max_wal_size".

@dansku
Copy link
Author

dansku commented Dec 10, 2021

@disk91 the import time is indeed takes a long time. 2GB wal size is fine.
Could improve, but for now I believe it's ok.

@disk91
Copy link

disk91 commented Dec 11, 2021

The dewi blockchain-etl dump is not compatible with Ubuntu LTS
The procedure to include the data from the dump into the previously compliled ETL would be great

@disk91
Copy link

disk91 commented Dec 11, 2021

You should also add a line to select the branch instead of head that could be unstable

git fetch
git checkout 1.1.129
make release 

currently

@disk91
Copy link

disk91 commented Dec 11, 2021

I also had to change the ulimit for opening files when running blockchain-etl

ulimit -Sn 65536

@pimboli
Copy link

pimboli commented Jun 30, 2022

Hi,

I'am trying to setup a full node, downloaded the 2 backups, imported db, started blockchain-etl (backup one)
but after some time I have crash with 2 error messages:

2022-06-30 23:15:54.066 [error] emulator Error in process <0.15453.2> on node 'blockchain_etl@127.0.0.1' with exit value:
{badarg,[{ets,lookup,[blockchain_swarm,shutdown],[]},{libp2p_swarm,is_stopping,1,[{file,"/root/blockchain-etl/_build/default/lib/libp2p/src/libp2p_swarm.erl"},{line,94}]},{l
ibp2p_transport,connect_to,4,[{file,"/root/blockchain-etl/_build/default/lib/libp2p/src/libp2p_transport.erl"},{line,100}]},{libp2p_swarm,dial_framed_stream,7,[{file,"/root/
blockchain-etl/_build/default/lib/libp2p/src/libp2p_swarm.erl"},{line,350}]},{libp2p_group_worker,'-dial/7-Dial/1-0-',7,[{file,"/root/blockchain-etl/_build/default/lib/libp2
p/src/group/libp2p_group_worker.erl"},{line,642}]},{libp2p_group_worker,'-connecting/3-fun-0-',7,[{file,"/root/blockchain-etl/_build/default/lib/libp2p/src/group/libp2p_grou
p_worker.erl"},{line,274}]}]}

and

2022-06-30 23:16:56.451 [error] <0.1341.0> Supervisor be_sup had child db_follower started with blockchain_follower:start_link([{follower_module,{be_db_follower,[{base_dir,"
data"}]}}]) at <0.1571.0> exit with reason reached_max_restart_intensity in context shutdown
2022-06-30 23:16:56.462 [warning] <0.1418.0>@blockchain_event:terminate:108 terminating remove_handler
2022-06-30 23:16:56.462 [warning] <0.1418.0>@blockchain_event:terminate:108 terminating remove_handler
2022-06-30 23:16:56.462 [error] <0.1389.0> gen_server <0.1389.0> terminated with reason: killed
2022-06-30 23:16:56.462 [error] <0.1875.0> gen_statem <0.1875.0> in state established terminated with reason: {killed,handle_event_function}
2022-06-30 23:16:56.467 [info] <0.1128.0> Application blockchain_etl exited with reason: shutdown

(if I start blockchain-eth with fresh install, problem does not appear)

Would be amazing if someone of you would have a hint for me :)

cheers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment