The goal of this gist is to explain how I built a Beowulf cluster based on Raspberry Pi models 3B / 3B+ and 4B.
The cluster is expandable with other device types: any working computer can be added to it.
The cluster will initially be used to compute workunits from the Rosetta@Home project.
In this gist I will stick with Raspberry Pi nodes.
I've used the following materials:
- 2x Raspberry Pi 4B (compute nodes)
- 2x Raspberry Pi 3B+ (compute nodes)
- 1x Raspberry Pi 3B+ (master / control node)
- 1x Gigabit Ethernet Switch
- 1x Serious cooling solution (otherwise you will just kill your RPis)
Recommended materials:
- USB Powerbank
- PoE Hat
- Dedicated cooling solution per cluster node
To run Rosetta@Home workunits, a 64-bit OS is required. In this case, I have used the Ubuntu Server 18.04.4 ARM64 image.
I have used the `gnome-disk-utility` tool to flash the image on the SD cards. You can use any other tool of your choice to do the same.
In order to control all cluster nodes easily, I have used Ansible from my work computer. It will also be installed on the master node.
# Add Ansible repository
sudo apt-add-repository --yes --update ppa:ansible/ansible
# Install required packages for master node
sudo apt install wireless-tools wavemon bmon nmon boinctui ansible
# Install required packages for compute nodes
sudo apt install wireless-tools wavemon bmon nmon boinc-client
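As a configuration sketch of how the nodes can then be driven from one place, here is a hypothetical Ansible inventory with a `cluster` group (the hostnames below are examples, not my actual ones):

```shell
# Hypothetical Ansible inventory (e.g. /etc/ansible/hosts):
#
#   [cluster]
#   rpi4-node1
#   rpi4-node2
#   rpi3-node1
#   rpi3-node2
#
# Check that every node answers:
ansible cluster -m ping

# Example ad-hoc command: install the BOINC client on all compute nodes at once
ansible cluster -b -m apt -a "name=boinc-client state=present"
```

With the inventory in place, any of the steps below can be run on every node in a single command instead of logging in to each Pi.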
This section will be added later: I've used my work computer to control the cluster nodes because I was initially not sure that it would work.
This section will detail required steps to prepare the compute nodes.
This has to be done only on Raspberry Pi 3B+. It is not necessary for Raspberry Pi 4B.
Edit the `config.txt` boot file to decrease the memory allocated to the GPU in order to increase the memory available to the CPU. 😅
# When not running:
nano /boot/config.txt
# When already running:
sudo nano /boot/firmware/config.txt
Then add or change the `gpu_mem` config value to:
- 16 MB for Raspberry Pi 3B/3B+
- 32 MB for Raspberry Pi 4B
I've seen people going down to 8 MB on the Raspberry Pi 3B/3B+, but I won't do that; I prefer to stay at the lowest value recommended by the documentation here: https://www.raspberrypi.org/documentation/configuration/config-txt/memory.md
[all]
arm_64bit=1
device_tree_address=0x03000000
gpu_mem=16
Save the file and reboot the nodes to apply the changes.
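A quick way to double-check the value is to read it back from the file; the snippet below runs against a sample copy for illustration only (on the node itself, point it at the real file and you can also confirm with `vcgencmd get_mem gpu`):

```shell
# Sample config.txt content, for illustration -- on the node the real file
# lives at /boot/firmware/config.txt (Ubuntu) or /boot/config.txt.
cfg=$(mktemp)
printf '[all]\narm_64bit=1\ngpu_mem=16\n' > "$cfg"

# Extract the gpu_mem value:
grep -E '^gpu_mem=' "$cfg" | cut -d= -f2   # → 16
rm -f "$cfg"
```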
Even though the Raspberry Pi 4B has enough RAM to be a good cluster node, this will still help it run more workunits per node.
On the Raspberry Pi 3B+, it is necessary to enable zram memory compression to increase the available memory size.
Now, let's go technical! 😁
Create the loading script:
sudo nano /usr/bin/zram.sh
And place this content:
#!/bin/bash
# Create one zram swap device per CPU core.
cores=$(nproc --all)
modprobe zram num_devices=$cores
modprobe zstd
modprobe lz4hc_compress

# Disable any existing swap before setting up zram.
swapoff -a

# Size each device so that the combined pool is 4/3 of the physical RAM.
totalmem=$(free | grep -e "^Mem:" | awk '{print $2}')
#mem=$(( ($totalmem / $cores) * 1024 ))
mem=$(( ($totalmem * 4 / 3 / $cores) * 1024 ))

core=0
while [ $core -lt $cores ]; do
  # Prefer zstd, then fall back to lz4hc and lz4 if unavailable.
  echo zstd > /sys/block/zram$core/comp_algorithm ||
  echo lz4hc > /sys/block/zram$core/comp_algorithm ||
  echo lz4 > /sys/block/zram$core/comp_algorithm
  echo $mem > /sys/block/zram$core/disksize
  mkswap /dev/zram$core
  swapon -p 5 /dev/zram$core
  let core=core+1
done
The zstd compression algorithm is tried first because it gives better performance results.
Then save it with [Ctrl+O] and [Ctrl+X].
Make it executable:
sudo chmod -v +x /usr/bin/zram.sh
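The sizing line deserves a word: each core gets an equal share of a swap pool totalling 4/3 of physical RAM (the commented-out line was a plain 1:1 pool). Since `free` reports KiB and `disksize` expects bytes, the arithmetic for a 4-core, ~1 GB Pi 3B+ looks like this (the `totalmem` figure is an example value):

```shell
# Reproduce the sizing arithmetic from zram.sh with sample numbers:
totalmem=949424   # KiB reported by `free` on a ~1 GB Pi 3B+ (example value)
cores=4

# Per-core zram disksize in bytes: (RAM * 4/3) split across the cores.
mem=$(( (totalmem * 4 / 3 / cores) * 1024 ))
echo $mem   # → 324069376
```

That is roughly 309 MiB per device, in line with the per-device sizes visible in the `zramctl` output below.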
Then create the boot script:
sudo nano /etc/rc.local
And place this content:
#!/bin/bash
/usr/bin/zram.sh &
exit 0
Then save it with [Ctrl+O] and [Ctrl+X].
Make it executable:
sudo chmod -v +x /etc/rc.local
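On images where systemd does not execute `/etc/rc.local` by default, a small unit file is an alternative way to run the script at boot. This is a sketch; the unit name is my own choice:

```
# /etc/systemd/system/zram.service -- hypothetical unit file
[Unit]
Description=Set up zram swap devices

[Service]
Type=oneshot
ExecStart=/usr/bin/zram.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl daemon-reload && sudo systemctl enable --now zram.service`.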
To finish, run the script to create the additional memory. To see the available memory and the compression stats, run the following commands:
# Show available memory
free -mlht
# Show memory compression stats
zramctl
Here you can see the advantage of using Zram to create additional memory.
$ free -mlht
total used free shared buff/cache available
Mem: 3.7G 2.6G 54M 2.1M 1.1G 1.1G
Low: 3.7G 3.6G 54M
High: 0B 0B 0B
Swap: 3.7G 22M 3.7G
Total: 7.4G 2.6G 3.7G
$ zramctl
NAME ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram3 947.9M 5.3M 1.3M 1.8M 4 [SWAP]
/dev/zram2 947.9M 5.2M 1.3M 1.8M 4 [SWAP]
/dev/zram1 947.9M 5.5M 1.3M 1.8M 4 [SWAP]
/dev/zram0 947.9M 5.5M 1.4M 1.9M 4 [SWAP]
This is with the standard LZO-RLE compression algorithm.
$ free -mlht
total used free shared buff/cache available
Mem: 957M 720M 24M 2.1M 212M 219M
Low: 957M 933M 24M
High: 0B 0B 0B
Swap: 1.2G 47M 1.2G
Total: 2.2G 767M 1.2G
$ zramctl
NAME ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram3 319.3M 11.5M 2M 2.4M 4 [SWAP]
/dev/zram2 319.3M 11.5M 2.2M 2.7M 4 [SWAP]
/dev/zram1 319.3M 11.7M 2.1M 2.6M 4 [SWAP]
/dev/zram0 319.3M 11.8M 2.1M 2.5M 4 [SWAP]
This is with the ZSTD compression algorithm.
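From the sample figures above, the effective compression ratio can be estimated; taking the `zram0` line of the zstd run (11.8 M of data stored in 2.1 M of compressed memory):

```shell
# Estimate the zstd compression ratio from the zramctl sample above (MiB values).
awk 'BEGIN { data = 11.8; compr = 2.1; printf "%.1fx\n", data / compr }'   # → 5.6x
```

Roughly a 5–6x reduction, which is why four ~320 M swap devices can be backed comfortably by 1 GB of physical RAM.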
I have used `boinctui` on my work computer and added all cluster nodes to the hosts list. Once done, I added the Rosetta@Home project on each cluster node from `boinctui`.
To change the computing preferences, I have used BOINC Manager from my work computer. It made it easier to fix the memory issues I got with the Raspberry Pi 3B+ nodes.
If you have issues with the Raspberry Pi 3B+ and Rosetta@Home workunits, try to suspend them all and then resume them. You should be able to run at least two workunits at the same time.
On the Raspberry Pi 4B nodes:
On the Raspberry Pi 3B+ nodes:
Output from `boinctui`.
Total workload:
Output from `nmon`.
Network bandwidth:
Output from `bmon`.
The cluster itself (without master / control node):
Closer look:
In order to avoid overheating of the Raspberry Pi nodes, I've developed a script based on `lm-sensors` and `cron` to automatically control the BOINC workload based on the host temperature.
When the host is overheating, the client service is stopped, which halts the computing process. Once the host is cool enough, the client service is restarted and the computing process resumes.
You can find the required instructions here: https://gist.github.com/Jiab77/1b9c32d550ebb93c471a8fa5b92cf2bf.
You will have to repeat these steps on each cluster node, not only the Raspberry Pis.
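The linked gist has the full script; the core decision can be sketched as a small function (the 70 °C threshold and the `boinc-client` service name are my assumptions — adjust them to your setup):

```shell
# Decide whether the BOINC client should run, given the SoC temperature in
# millidegrees Celsius (the unit used by /sys/class/thermal/thermal_zone0/temp).
boinc_action() {
  local temp_mc=$1
  local limit_mc=70000   # 70 degrees C -- assumed threshold
  if [ "$temp_mc" -ge "$limit_mc" ]; then
    echo "systemctl stop boinc-client"
  else
    echo "systemctl start boinc-client"
  fi
}

boinc_action 75000   # → systemctl stop boinc-client
boinc_action 52000   # → systemctl start boinc-client
```

A cron entry running every minute could then feed it the live reading, e.g. `boinc_action "$(cat /sys/class/thermal/thermal_zone0/temp)"`.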
I have used Netdata for monitoring the temperature and the global performance of each cluster node.
# Install everything
bash <(curl -Ss https://my-netdata.io/kickstart.sh) all --dont-wait
# Enable KSM (to reduce memory consumption)
sudo su -c 'echo 1 >/sys/kernel/mm/ksm/run'
sudo su -c 'echo 1000 >/sys/kernel/mm/ksm/sleep_millisecs'
# Patch the sensors config (only required for RPis)
sudo /etc/netdata/edit-config charts.d.conf
Add `sensors=force` at the end of the file, then save it with [Ctrl+O] and [Ctrl+X].
Restart the service to apply the change:
sudo systemctl restart netdata
I will edit this gist later to add details about the master / control node. I will also add the necessary packages for Slurm, MPICH and MPI4PY.
This will give me the possibility to run parallel computing tasks over all cluster nodes.
The final goal will be to create a low cost computing platform that can be managed with a web interface that I will create soon.
You can read the progress here: https://gist.github.com/Jiab77/4dc1f8bed339336e02b70c7b0b135a11
- https://www.element14.com/community/thread/75254/l/set-your-processors-to-analyse-for-the-covid-19-virus-with-foldinghome-or-rosettahome?displayFullThread=true
- https://github.com/novaspirit/rpi_zram
- https://www.raspberrypi.org/documentation/configuration/config-txt/memory.md
- https://en.wikipedia.org/wiki/Zram
- https://www.reddit.com/r/BOINC/comments/g0r0wa/running_rosetta_covid19_workunits_on_raspberry_pi/
- https://www.phoronix.com/forums/forum/software/mobile-linux/1086709-zram-will-see-greater-performance-on-linux-5-1-it-changed-its-default-compressor?p=1172884#post1172884
- https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13795
- https://www.kernel.org/doc/html/latest/admin-guide/blockdev/zram.html
- https://github.com/facebook/zstd
- https://gist.github.com/Jiab77/1b9c32d550ebb93c471a8fa5b92cf2bf
Feel free to create a comment to contribute on this gist.
You can reach me on Twitter by using @Jiab77.