Create an expandable Beowulf cluster with SLURM, MPICH and MPI4PY

The goal of this gist is to explain how I built a Beowulf cluster based on Raspberry Pi models 3B / 3B+ and 4B.

The cluster is expandable with other device types: any working computer can be added to it.

Adding the following packages: Ansible, SSHFS, Slurm, MPICH and MPI4PY will give me the possibility to run parallel computing tasks using all cluster nodes.

The final goal is to create a low-cost computing platform that can be managed with a web interface that I will create soon. Adding remote clients and compute nodes is also part of this project.

Thanks to Garret Mills for his article, on which this gist is based, with some changes for my own needs.

In this gist I will start with Raspberry Pi nodes, then add several different types of nodes later.

Potential use

I'll just list the potential uses of this cluster here:

Prepare everything

I've used the following materials:

  • 2x Raspberry Pi 4B (compute nodes)
  • 2x Raspberry Pi 3B+ (compute nodes)
  • 1x Raspberry Pi 3B+ (master / control node)
  • 1x Gigabit Ethernet Switch
  • 1x Serious cooling solution (otherwise you will just kill your RPis)

Recommended materials:

  • Stacking cases
  • USB Powerbank
  • PoE Hat
  • A really serious dedicated cooling solution per cluster node

Setup

To run Rosetta@Home workunits, a 64-bit OS is required. In this case, I have used the Ubuntu Server 18.04.4 ARM64 image.

As I want to be able to run as many BOINC projects compatible with ARM platforms as possible, I'll later add some ARM 32-bit nodes with the Ubuntu Server 18.04.4 ARMHF image.

Flash SD Cards

I have used the gnome-disk-utility tool to flash the image onto the SD cards.

You can use any other tool of your choice to do the same.

Decrease GPU RAM

This has to be done only on Raspberry Pi 3B+. It is not necessary for Raspberry Pi 4B.

Edit the config.txt boot file to decrease the memory allocated to the GPU in order to increase the memory allocated to the CPU. 😅

# When not running:
nano /boot/config.txt

# When already running:
sudo nano /boot/firmware/config.txt

Then add or change the gpu_mem config value to:

  • 32 MB for Raspberry Pi 3B/3B+ (initially 16 MB, moved back to 32 MB for stability reasons)
  • 32 MB for Raspberry Pi 4B

I've seen people going down to 8 MB on Raspberry Pi 3B/3B+ but I won't do that; I prefer to stay at the lowest value recommended by the documentation here: https://www.raspberrypi.org/documentation/configuration/config-txt/memory.md

I had to move back to 32 MB as 16 MB was very unstable.

[all]
arm_64bit=1
device_tree_address=0x03000000
gpu_mem=32

Save the file and reboot the nodes to apply the changes.
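
If the vcgencmd utility is available on your image (on Ubuntu it may require the libraspberrypi-bin package, so treat this as an optional check), you can verify the new memory split after the reboot:

# Check the memory allocated to the GPU (optional, requires vcgencmd)
vcgencmd get_mem gpu

# Check the memory allocated to the CPU
vcgencmd get_mem arm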

Increase RAM

Even if the Raspberry Pi 4B has enough RAM to be a good cluster node, having more will help run more workunits per node.

On the Raspberry Pi 3B+, it is necessary to enable Zram memory compression to increase the available memory size.

Now, let's go technical! 😁

Create the loading script:

sudo nano /usr/bin/zram.sh

And place this content:

#!/bin/bash

echo -e "\nExpanding available memory with zRAM...\n"
cores=$(nproc --all)
modprobe zram num_devices=$cores
modprobe zstd
modprobe lz4hc_compress

swapoff -a

# Size each zram device so that the combined swap is 4/3 of the physical RAM, split across all cores
totalmem=$(free | grep -e "^Mem:" | awk '{print $2}')
#mem=$(( ($totalmem / $cores) * 1024 ))
mem=$(( ($totalmem * 4 / 3 / $cores) * 1024 ))

core=0
while [ $core -lt $cores ]; do
    echo zstd > /sys/block/zram$core/comp_algorithm 2>/dev/null ||
    echo lz4hc > /sys/block/zram$core/comp_algorithm 2>/dev/null ||
    echo lz4 > /sys/block/zram$core/comp_algorithm 2>/dev/null
    echo $mem > /sys/block/zram$core/disksize
    mkswap /dev/zram$core
    swapon -p 5 /dev/zram$core
    let core=core+1
done

The zstd compression algorithm is used for better performance results.

It might not be supported on all systems, which is why the script falls back to other compression algorithms.

Then save it with [Ctrl+O] and [Ctrl+X].

Make it executable:

sudo chmod -v +x /usr/bin/zram.sh

Then create the boot script:

sudo nano /etc/rc.local

And place this content:

#!/bin/bash

/usr/bin/zram.sh &

exit 0

Then save it with [Ctrl+O] and [Ctrl+X].

Make it executable:

sudo chmod -v +x /etc/rc.local

To finish, run the script to create the additional memory. To see the available memory and the compression stats, run the following commands:

# Manual start
sudo /usr/bin/zram.sh

# Show memory compression stats
zramctl

# Show available memory
free -mlht
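
You can also check which compression algorithms the running kernel supports and which one is active on a given zram device (the active one is shown in brackets):

# Show supported / active compression algorithms for the first zram device
cat /sys/block/zram0/comp_algorithm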

Memory compression results

Here you can see the advantage of using Zram to create additional memory.

Raspberry Pi 4B

$ free -mlht
              total        used        free      shared  buff/cache   available
Mem:           3.7G        2.6G         54M        2.1M        1.1G        1.1G
Low:           3.7G        3.6G         54M
High:            0B          0B          0B
Swap:          3.7G         22M        3.7G
Total:         7.4G        2.6G        3.7G

$ zramctl
NAME       ALGORITHM DISKSIZE  DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram3             947.9M  5.3M  1.3M  1.8M       4 [SWAP]
/dev/zram2             947.9M  5.2M  1.3M  1.8M       4 [SWAP]
/dev/zram1             947.9M  5.5M  1.3M  1.8M       4 [SWAP]
/dev/zram0             947.9M  5.5M  1.4M  1.9M       4 [SWAP]

This is with the standard LZO-RLE compression algorithm.

Raspberry Pi 3B+

$ free -mlht
              total        used        free      shared  buff/cache   available
Mem:           957M        720M         24M        2.1M        212M        219M
Low:           957M        933M         24M
High:            0B          0B          0B
Swap:          1.2G         47M        1.2G
Total:         2.2G        767M        1.2G

$ zramctl
NAME       ALGORITHM DISKSIZE  DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram3             319.3M 11.5M    2M  2.4M       4 [SWAP]
/dev/zram2             319.3M 11.5M  2.2M  2.7M       4 [SWAP]
/dev/zram1             319.3M 11.7M  2.1M  2.6M       4 [SWAP]
/dev/zram0             319.3M 11.8M  2.1M  2.5M       4 [SWAP]

This is with the standard ZSTD compression algorithm.

Install required packages

In order to control all cluster nodes easily, I have used Ansible from my work computer. It will also be installed on the master node.

Master node

# Add Ansible repository
sudo apt-add-repository --yes --update ppa:ansible/ansible

# Install required packages for master node
sudo apt install wireless-tools wavemon bmon nmon boinctui ntpdate ansible sshfs slurm-wlm mpich python3-pip r-base

# Install required python3 dependency
pip3 install cython

# Install required python3 packages
pip3 install numpy mpi4py

Compute nodes

# Install required packages for compute nodes
sudo apt install wireless-tools wavemon bmon nmon boinc-client slurmd slurm-client ntpdate sshfs mpich python3-pip r-base

# Install required python3 dependency
pip3 install cython

# Install required python3 packages
pip3 install numpy mpi4py

# Add the master node host entry
echo "[IP] rpi-master node01" | sudo tee -a /etc/hosts

Replace [IP] with the IP address of the master node.

Generate keys

To be able to access each cluster node without a password, we'll generate SSH keys for each node, including the master node.

# Generate ssh key (reply to all questions)
ssh-keygen

Once done, copy the node's key to the master node:

# Copy the node key to the master node
ssh-copy-id -i .ssh/id_rsa.pub ubuntu@node01

Repeat this step on each cluster node.

Setup master node

Now that all required packages are installed, we can start the cluster configuration.

Define hostnames

To be able to manage the cluster, all hostnames need to be added to the /etc/hosts file.

# Open the system hosts file
sudo nano /etc/hosts

# Add all nodes hostnames
[IP] rpi-master node01
[IP] rpi-4b-01 node02
[IP] rpi-4b-02 node03
[IP] rpi-3bp-01 node04
[IP] rpi-3bp-02 node05

Then save the file with [Ctrl + X].

Replace [IP] with the IP address of each node.

I've used two different hostnames:

  • The first one is related to the node itself
  • The second one is related to the cluster member id

Copy ssh key

To be able to access each cluster node from the master node without a password, we'll need to send the master node's key to all other cluster nodes.

# Generate ssh key (reply to all questions)
ssh-keygen

Once done, copy the master node key to each cluster nodes:

# Copy the master node key to each cluster nodes
ssh-copy-id -i .ssh/id_rsa.pub ubuntu@node0X

Replace node0X with the real node name.

Repeat this step for each cluster node.
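
If you prefer not to type the command for every node, a small loop can do it in one go, assuming the node02 to node05 hostnames defined in /etc/hosts above:

# Copy the master node key to every compute node
for node in node02 node03 node04 node05; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "ubuntu@${node}"
done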

Create cluster filesystem

On the master node, create the shared folder:

# Create the shared folder
sudo mkdir -v /clusterfs

# Change permissions (define the permissions you want)
sudo chown -Rv nobody:nogroup /clusterfs
sudo chmod -Rv 777 /clusterfs

The defined permissions are very relaxed and should not be used on production networks.

They are used only to avoid permission issues during the setup procedure.
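
If you want something slightly tighter once the setup works, a common approach is to restrict the share to a dedicated group instead of world-writable permissions; here is a minimal sketch (the clusterusers group name is just an example):

# Example of tighter permissions using a dedicated group (group name is hypothetical)
sudo groupadd clusterusers
sudo usermod -aG clusterusers ubuntu
sudo chown -Rv root:clusterusers /clusterfs
# 2775 = group-writable with the setgid bit, so new files inherit the group
sudo chmod -Rv 2775 /clusterfs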

Configure Ansible

To manage the cluster I'll use Ansible. You can configure it this way:

# Open the ansible hosts file
sudo nano /etc/ansible/hosts

# Add cluster nodes hostnames
[rpis]
rpi-master ansible_python_interpreter=python3 ansible_user=ubuntu
rpi-4b-01 ansible_python_interpreter=python3 ansible_user=ubuntu
rpi-4b-02 ansible_python_interpreter=python3 ansible_user=ubuntu
rpi-3bp-01 ansible_python_interpreter=python3 ansible_user=ubuntu
rpi-3bp-02 ansible_python_interpreter=python3 ansible_user=ubuntu

[ghettocluster]
rpi-master
rpi-4b-01
rpi-4b-02
rpi-3bp-01
rpi-3bp-02

Then save the file with [Ctrl + X].

Some explanations:

  • The first host group [rpis] is used to define settings related to the host itself.
  • The second host group [ghettocluster] is the internal name of the cluster; it contains all cluster nodes.

Now you can test the cluster this way:

# Ping all nodes
ansible ghettocluster -m ping

# Show memory info from all nodes
ansible ghettocluster -a "free -mlht"

# Show hostname of all nodes
ansible ghettocluster -a "hostname"

# Synchronize time of all nodes
ansible ghettocluster -a "sudo ntpdate ch.pool.ntp.org"

# Check timesync of all nodes
ansible ghettocluster -a "date"

Configure Slurm

Before configuring Slurm, we need to copy the sample config.

# Go to the Slurm config directory
cd /etc/slurm-llnl

# Copy the config sample archive
sudo cp /usr/share/doc/slurm-client/examples/slurm.conf.simple.gz .

# Uncompress the config sample archive
sudo gzip -d slurm.conf.simple.gz

# Rename the config sample file
sudo mv slurm.conf.simple slurm.conf

Then open the slurm.conf file with sudo nano slurm.conf and set the following settings:

# Define master node
ControlMachine=rpi-master
ControlAddr=<node-ip>

# Configure scheduler algorithm
SelectType=select/cons_res
SelectTypeParameters=CR_Core

# Define cluster name
ClusterName=ghettocluster

# Add nodes
# Remove the sample node entries at the end of the file and replace them with the following
NodeName=node01 NodeAddr=<ip addr node01> CPUs=4 State=UNKNOWN
NodeName=node02 NodeAddr=<ip addr node02> CPUs=4 State=UNKNOWN
NodeName=node03 NodeAddr=<ip addr node03> CPUs=4 State=UNKNOWN
NodeName=node04 NodeAddr=<ip addr node04> CPUs=4 State=UNKNOWN
NodeName=node05 NodeAddr=<ip addr node05> CPUs=4 State=UNKNOWN

# Create the cluster partition and add all compute nodes to it
PartitionName=ghettocluster Nodes=node[02-05] Default=YES MaxTime=INFINITE State=UP

Replace <node-ip> with your master node IP address.

Save the file with [Ctrl + X].
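
If you are unsure about the CPUs value for a node (or want to add RealMemory and other hardware details), you can run slurmd -C on that node; it prints the detected hardware as a ready-to-adapt NodeName line:

# Print the node hardware configuration as detected by Slurm
slurmd -C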

Now we have to configure the cgroups kernel isolation. Create the file /etc/slurm-llnl/cgroup.conf and add the following content:

CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm-llnl/cgroup"
AllowedDevicesFile="/etc/slurm-llnl/cgroup_allowed_devices_file.conf"
ConstrainCores=no
TaskAffinity=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
ConstrainDevices=no
AllowedRamSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30

Save the file with [Ctrl + X].

Now we have to whitelist some system devices by creating the file /etc/slurm-llnl/cgroup_allowed_devices_file.conf and adding the following content:

/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/clusterfs*

Save the file with [Ctrl + X].

To finish, copy the configuration files to the shared storage (the compute nodes will later read them from /clusterfs/config):

# Create the config folder on the shared storage
sudo mkdir -v /clusterfs/config

# Copy cluster config
sudo cp -v slurm.conf cgroup.conf cgroup_allowed_devices_file.conf /clusterfs/config

# Copy authentication key
sudo cp -v /etc/munge/munge.key /clusterfs/config

More details about Munge:

Munge is the access system that SLURM uses to run commands and processes on the other nodes. Similar to key-based SSH, it uses a private key on all the nodes, then requests are timestamp-encrypted and sent to the node, which decrypts them using the identical key. This is why it is so important that the system times be in sync, and that they all have the munge.key file.

Now we can enable the required services:

# Enable and start Munge
sudo systemctl enable --now munge ; systemctl status munge

# Enable and start Slurm daemon
sudo systemctl enable --now slurmd ; systemctl status slurmd

# Enable and start Slurm control daemon
sudo systemctl enable --now slurmctld ; systemctl status slurmctld
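
Before moving on, you can check that Munge works locally on the master node; encoding and decoding a credential on the same host should return Success:

# Local Munge self-test on the master node
munge -n | unmunge

# Verify ownership and permissions of the key that was copied to the shared storage
ls -l /etc/munge/munge.key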

If you are having problems with Munge authentication, or your nodes can't communicate with the SLURM controller, try rebooting the affected node.

Configure MPICH

Instructions will come soon.
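
In the meantime, here is a minimal sanity check (not the final instructions) that MPICH and mpi4py can run a job across the compute nodes, assuming passwordless SSH and the packages installed above on every node:

# Run a 4-process mpi4py "hello" across the compute nodes with MPICH's mpiexec
mpiexec -n 4 -hosts node02,node03,node04,node05 python3 -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print('rank', c.Get_rank(), 'of', c.Get_size(), 'on', MPI.Get_processor_name())"

Each process should report a different rank and hostname; if they all report rank 0 of 1, the MPI stack is not wired correctly.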

Setup compute nodes

This section details the steps required to prepare the compute nodes.

Configure hosts file

In order for the nodes to communicate with each other, you need to define their hostnames in the /etc/hosts file.

# Open the system hosts file
sudo nano /etc/hosts

# Add all cluster nodes
[IP] rpi-master node01
[IP] rpi-4b-01 node02
[IP] rpi-4b-02 node03
[IP] rpi-3bp-01 node04
[IP] rpi-3bp-02 node05

The master node has two different hostnames:

  • The first one is related to the node itself
  • The second one is related to the cluster member id

Add shared storage

The actual shared cluster storage is hosted on the master node. I'll use the /etc/fstab file to mount the shared storage at boot.

# Create the shared folder
sudo mkdir -v /clusterfs

# Change permissions (define the permissions you want)
sudo chown -Rv nobody:nogroup /clusterfs
sudo chmod -Rv 777 /clusterfs

# Patch fuse config
sudo sed -e 's/#user_allow_other/user_allow_other/' -i /etc/fuse.conf

# Manual connection test
sshfs ubuntu@rpi-master:/clusterfs /clusterfs -o allow_other,reconnect,cache=yes,kernel_cache,compression=no,IdentityFile=/home/ubuntu/.ssh/id_rsa

# Verify the mountpoint
findmnt | grep clusterfs

# Open the file
sudo nano /etc/fstab

# Add this line
ubuntu@rpi-master:/clusterfs  /clusterfs   fuse.sshfs    allow_other,_netdev,delay_connect,reconnect,cache=yes,kernel_cache,compression=no,IdentityFile=/home/ubuntu/.ssh/id_rsa    0  0

Change the user_id and group_id according to the permissions you have set on the master node.

Once done, save the file with [Ctrl + X] then reboot to apply the change. When restarted, run this command to verify that the shared storage is mounted correctly:

# Verify the mountpoint
$ findmnt | grep clusterfs
└─/clusterfs                          ubuntu@rpi-master:/clusterfs fuse.sshfs rw,relatime,user_id=0,group_id=0,allow_other

Now reboot to be able to use the shared storage correctly.

Copy config from master node

Now that we have everything to connect to the master node, we can copy the config files:

sudo cp -v /clusterfs/config/munge.key /etc/munge/munge.key
sudo cp -v /clusterfs/config/slurm.conf /etc/slurm-llnl/slurm.conf
sudo cp -v /clusterfs/config/cgroup* /etc/slurm-llnl/

Synchronize node time

In order for the Slurm part of the cluster to work correctly, each node needs to be synchronized with a time server.

# Check the actual date
date

# Synchronize with a time server (change it to any server you want)
sudo ntpdate ch.pool.ntp.org

# Check the defined time zone
timedatectl

# Get the time zones list
timedatectl list-timezones

# Change the defined time zone if not correct
sudo timedatectl set-timezone your_time_zone
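
A one-off ntpdate call does not survive a reboot. To keep the clocks in sync permanently you can either enable systemd-timesyncd or, as a simple alternative, schedule ntpdate from cron (the server and schedule below are only examples):

# Option 1: let systemd-timesyncd keep the clock in sync
sudo timedatectl set-ntp true

# Option 2: resynchronize every hour via a cron.d entry
echo "0 * * * * root /usr/sbin/ntpdate ch.pool.ntp.org" | sudo tee /etc/cron.d/ntpdate-sync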

Configure Munge

Now we will test if the Munge key has been copied correctly and if the SLURM controller can successfully authenticate with the client nodes.

We need to start the service first:

# Enable and start Munge
sudo systemctl enable --now munge ; systemctl status munge

Now we can test the communication with the master node:

ssh ubuntu@rpi-master munge -n | unmunge

If it works, you should see something like this:

STATUS:           Success (0)
ENCODE_HOST:      rpi-master (REDACTED)
ENCODE_TIME:      2020-04-27 04:27:54 +0200 (1587954474)
DECODE_TIME:      2020-04-27 04:27:54 +0200 (1587954474)
TTL:              300
CIPHER:           aes128 (4)
MAC:              sha256 (5)
ZIP:              none (0)
UID:              ubuntu (1000)
GID:              ubuntu (1000)
LENGTH:           0

If you get an error, copy the munge key from /clusterfs/config again, then restart the munge service. It should work after that.

You can also try to synchronize their time using ntpdate.

If everything has worked correctly so far, you can now start the Slurm daemon:

# Enable and start Slurm daemon
sudo systemctl enable --now slurmd ; systemctl status slurmd

Repeat these steps on each cluster node except the master node.

Test our Slurm cluster

Connect to the master node and run sinfo:

# Get cluster info
sinfo

You should get something similar to this:

PARTITION        AVAIL  TIMELIMIT  NODES  STATE NODELIST
ghettocluster*      up   infinite      4   idle node[02-05]

If the new node appears DOWN, run this command on the master node:

sudo scontrol update NodeName=<node_name> State=RESUME

Then run sinfo again. It should appear as idle.

Run a test command on all nodes:

# This will execute the `hostname` command on each cluster node
srun --nodes=4 hostname

If it works, you should get something similar to this:

node02
node03
node04
node05

Here are some useful commands:

# Get cluster nodes info
scontrol show nodes

# Get cluster partition info
scontrol show partition
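
Beyond srun, you can also submit a small batch job to check that the scheduler and the shared storage work together; here is a minimal sketch (file names are just examples):

# Create a test batch job on the shared storage
cat << 'EOF' > /clusterfs/test_job.sh
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=4
#SBATCH --output=/clusterfs/hello_%j.out
srun hostname
EOF

# Submit the job, then check the queue and the output file
sbatch /clusterfs/test_job.sh
squeue
cat /clusterfs/hello_*.out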

Fix node state after reboot

When a node restarts, it will sometimes appear as down in the sinfo output. I've found two solutions to fix this issue; you can try both of them from the master node.

  1. Synchronize all cluster nodes:
    • ansible ghettocluster -a "sudo ntpdate ch.pool.ntp.org"
  2. Update node status:
    • sudo scontrol update NodeName=<node_name> State=RESUME

Write a comment on this gist if neither command worked.

Add a project

I have used boinctui on my work computer and added all cluster nodes to the hosts list. Once done, I have added the Rosetta@Home project on each cluster node from boinctui.

To change the computing preferences, I have used BOINC Manager from my work computer. It was easier to fix the memory issues I got with the Raspberry Pi 3B+ nodes that way.

I made the mistake of blindly following some memory settings found on the BOINC forums, and it was wrong: combined with the reduced memory, this caused some catastrophic system hangs and forced resets.

Here is the working config for all nodes, including the Raspberry Pi 3B / 3B+:

  • Compute:
    • Used: 100%
    • Idle: 100%
    • Stop when usage is at:
      • 4B: 25%
      • 3B / 3B+: 15%
  • Network:
    • Always available
  • Memory:
    • Used: 90%
    • Idle: 90%
    • Swap: 90%

If you have issues with the Raspberry Pi 3B+ and Rosetta@Home workunits, try suspending them all and resuming them. You should be able to run at least two workunits at the same time.

Stopping Netdata for a moment with sudo systemctl stop netdata might free up some memory.

Avoid overheating

To avoid overheating the Raspberry Pi nodes, I've developed a script based on lm-sensors and cron that automatically controls the BOINC workload based on the host temperature.

When the host is overheating, the client service is stopped, which stops the computing process. Once the host is cool enough, the client service is restarted and the computing process is relaunched.

You can find the required instructions here: https://gist.github.com/Jiab77/1b9c32d550ebb93c471a8fa5b92cf2bf.

You will have to repeat these steps on each cluster node, not only the Raspberry Pis.

I seriously advise you to use this script, as the nodes can sometimes get very hot and this can cause some serious issues. The script will be kept updated.
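
To give an idea of what the script does, here is a minimal sketch of the same logic, assuming boinc-client as the service name and the standard Raspberry Pi thermal zone; the thresholds are only examples and the real script linked above is more complete:

#!/bin/bash
# Minimal sketch: stop BOINC when the SoC is too hot, restart it once it has cooled down.
# Run it from cron, e.g. every minute: * * * * * root /usr/local/bin/thermal-guard.sh (hypothetical path)

TEMP=$(( $(cat /sys/class/thermal/thermal_zone0/temp) / 1000 ))

if [ "$TEMP" -ge 75 ]; then
    systemctl stop boinc-client
elif [ "$TEMP" -le 60 ]; then
    systemctl start boinc-client
fi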

Cluster performance

On the Raspberry Pi 4B nodes:

image

On the Raspberry Pi 3B+ nodes:

image

Output from boinctui.

Total workload:

image

Output from nmon.

Network bandwidth:

image

Output from bmon.

Cluster preview

I'll post some previews as the cluster evolves.

Cluster v1

The cluster itself (without master node):

image image

Results

On the Master node:

# Show ansible cluster nodes
$ ansible ghettocluster -m ping

# Result
rpi-4b-01 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
rpi-4b-02 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
rpi-3bp-02 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
rpi-3bp-01 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
rpi-master | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}

# Show slurm cluster nodes
$ sinfo -Nl

# Result
NODELIST   NODES      PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
node02         1 ghettocluster*        idle    4    1:4:1      1        0      1   (null) none                
node03         1 ghettocluster*        idle    4    1:4:1      1        0      1   (null) none                
node04         1 ghettocluster*        idle    4    1:4:1      1        0      1   (null) none                
node05         1 ghettocluster*        idle    4    1:4:1      1        0      1   (null) none 

# Show cluster partition
$ scontrol show partition

# Result
PartitionName=ghettocluster
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=node[02-05]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=16 TotalNodes=4 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

Cluster v2

The cluster itself with master node + two laptops:

image image image image image

Results

On the Master node:

# Show ansible cluster nodes
$ ansible ghettocluster -m ping

# Result
rpi-4b-02 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
rpi-3bp-01 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
rpi-4b-01 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
rpi-master | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
rpi-3bp-02 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
lenovo-01 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}
dell-01 | SUCCESS => {
    "changed": false, 
    "ping": "pong"
}

# Show slurm cluster nodes
$ sinfo -Nl

# Result
NODELIST   NODES      PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
node02         1 ghettocluster*        idle    4    1:4:1      1        0      1   (null) none                
node03         1 ghettocluster*        idle    4    1:4:1      1        0      1   (null) none                
node04         1 ghettocluster*        idle    4    1:4:1      1        0      1   (null) none                
node05         1 ghettocluster*        idle    4    1:4:1      1        0      1   (null) none                
node06         1 ghettocluster*        idle    2    1:2:1      1        0      1   (null) none    

# Show cluster partition
$ scontrol show partition

# Result
PartitionName=ghettocluster
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=node[02-06]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=18 TotalNodes=5 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

Cluster v3

Pictures will be coming soon.

Bonus: Add netdata for monitoring the performance

I have used Netdata for monitoring the temperature and the global performance of each cluster node.

# Install everything
bash <(curl -Ss https://my-netdata.io/kickstart.sh) all --dont-wait

# Enable KSM (to reduce memory consumption)
sudo su -c 'echo 1 >/sys/kernel/mm/ksm/run'
sudo su -c 'echo 1000 >/sys/kernel/mm/ksm/sleep_millisecs'

# Patch the sensors config (only required for RPis)
sudo /etc/netdata/edit-config charts.d.conf

Add sensors=force at the end of the file then save it with [Ctrl+O] and [Ctrl+X].

Restart the service to apply the change:

sudo systemctl restart netdata

References

Contribute

Feel free to leave a comment to contribute to this gist.

Contact

You can reach me on Twitter by using @Jiab77.

@binary-diver

Hi,
I am just a newbie here and tried your instructions for Raspberry Pi clustering to use BOINC Rosetta@home. I have three Raspberry Pis running in total and was able to assemble them and run the 'scontrol show nodes' command following your instructions. But moving on to the 'Add a project' section (https://gist.github.com/Jiab77/4dc1f8bed339336e02b70c7b0b135a11#add-a-project) I was a little confused about whether this is executing Rosetta tasks as a cluster or performing as individual nodes. My results were just executing tasks as individual nodes, not as a cluster.

But according to your instructions from the beginning up to the 'Add a project' section, I think you were trying to achieve cluster computing. Do you have more detailed instructions before adding a project? For example, I cannot find the purpose of installing mpi4py to run Rosetta@home as a cluster. I believe there should be a certain command before going forward. I would appreciate it if you could help me.

Best regards,

@Jiab77
Author

Jiab77 commented Nov 22, 2023

@binary-diver, sorry for the misunderstanding. When you add all the cluster nodes into the boinctui config, you can see the whole workload of each individual cluster node, but you are right that projects must be added individually per cluster node.

You can also use boincmgr to do the same with a graphical interface. I think that for processing projects as a cluster, you might need to build something that uses the RPC interface in order to control all the cluster nodes, or rely on a middleware that sits between Slurm and BOINC (or any other solution you may use) to control and manage resources and projects across the cluster nodes.
