The goal of this gist is to explain how I built a Beowulf cluster based on Raspberry Pi models 3B / 3B+ and 4B.
The cluster will be expandable with other device types: any working computer can be added to the cluster.
Adding the following packages: Ansible, SSHFS, Slurm, MPICH and MPI4PY will give me the possibility to run parallel computing tasks using all cluster nodes.
The final goal is to create a low-cost computing platform that can be managed with a web interface that I will create soon. Adding remote clients and compute nodes is also part of this project.
Thanks to Garret Mills for his article, on which this gist is based, with some changes for my own needs.
In this gist I will start with Raspberry Pi nodes and then add several different nodes later.
Here is a list of the potential uses of this cluster:
- BOINC computing (done)
- Parallel tasks execution with SLURM / MPICH / MPI4PY (done)
- Password cracking
- Distributed compilation
- Bitcoin mining
- Create a Kubernetes cluster
- Scientific data processing
- Create an open computing platform
I've used the following materials:
- 2x Raspberry Pi 4B (compute nodes)
- 2x Raspberry Pi 3B+ (compute nodes)
- 1x Raspberry Pi 3B+ (master / control node)
- 1x Gigabit Ethernet Switch
- 1x Serious cooling solution (otherwise you will just kill your RPis)
Recommended materials:
- Stacking cases
- USB Powerbank
- PoE Hat
- A really serious dedicated cooling solution per cluster node
To run Rosetta@Home workunits, a 64-bit OS is required. In this case, I have used the Ubuntu Server 18.04.4 ARM64 image.
As I want to be able to run as many ARM-compatible BOINC projects as possible, I'll later add some ARM 32-bit nodes with the Ubuntu Server 18.04.4 ARMHF image.
I have used the gnome-disk-utility tool to flash the image on the SD cards. You can use any other tool of your choice to do the same.
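If you prefer the command line, something like the following works too. This is only a sketch: the image file name is an assumption based on the Ubuntu 18.04.4 release, and you must replace /dev/sdX with your actual SD card device.
# List block devices to find the SD card first
lsblk
# Flash the image (this wipes the target device!)
xzcat ubuntu-18.04.4-preinstalled-server-arm64+raspi3.img.xz | sudo dd of=/dev/sdX bs=4M status=progress conv=fsync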
This has to be done only on the Raspberry Pi 3B+; it is not necessary for the Raspberry Pi 4B.
Edit the config.txt boot file to decrease the memory allocated to the GPU in order to increase the memory available to the CPU. 😅
# When not running:
nano /boot/config.txt
# When already running:
sudo nano /boot/firmware/config.txt
Then add or change the gpu_mem config value to:
- 32 MB for Raspberry Pi 3B/3B+ (initially set to 16 MB, then moved back to 32 MB for stability reasons)
- 32 MB for Raspberry Pi 4B
I've seen people going down to 8 MB on the Raspberry Pi 3B/3B+, but I won't do that; I prefer to stay at the lowest value recommended by the documentation here: https://www.raspberrypi.org/documentation/configuration/config-txt/memory.md
I had to move back to 32 MB as 16 MB was very unstable.
[all]
arm_64bit=1
device_tree_address=0x03000000
gpu_mem=32
Save the file and reboot the nodes to apply the changes.
Even though the Raspberry Pi 4B has enough RAM to be a good cluster node, enabling Zram will help to run more workunits per node.
On the Raspberry Pi 3B+, it is necessary to enable Zram memory compression to increase the available memory.
Now, let's go technical! 😁
Create the loading script:
sudo nano /usr/bin/zram.sh
And place this content:
#!/bin/bash
echo -e "\nExpanding available memory with zRAM...\n"
# One zram device per CPU core
cores=$(nproc --all)
modprobe zram num_devices=$cores
# Load the preferred compression modules (fallbacks are handled below)
modprobe zstd
modprobe lz4hc_compress
# Disable any existing swap before creating the zram swap devices
swapoff -a
totalmem=`free | grep -e "^Mem:" | awk '{print $2}'`
# Size each device so the combined zram swap is ~133% of the physical RAM
#mem=$(( ($totalmem / $cores)* 1024 ))
mem=$(( ($totalmem * 4 / 3 / $cores)* 1024 ))
core=0
while [ $core -lt $cores ]; do
  # Pick the best available compression algorithm: zstd, then lz4hc, then lz4
  echo zstd > /sys/block/zram$core/comp_algorithm 2>/dev/null ||
  echo lz4hc > /sys/block/zram$core/comp_algorithm 2>/dev/null ||
  echo lz4 > /sys/block/zram$core/comp_algorithm 2>/dev/null
  echo $mem > /sys/block/zram$core/disksize
  # Create and enable the swap area with a higher priority than disk swap
  mkswap /dev/zram$core
  swapon -p 5 /dev/zram$core
  let core=core+1
done
The zstd compression algorithm is used for better performance. It might not be supported on all systems, which is why the script falls back to other compression algorithms.
Then save it with [Ctrl+O] and exit with [Ctrl+X].
Make it executable:
sudo chmod -v +x /usr/bin/zram.sh
Then create the boot script:
sudo nano /etc/rc.local
And place this content:
#!/bin/bash
/usr/bin/zram.sh &
exit 0
Then save it with [Ctrl+O] and exit with [Ctrl+X].
Make it executable:
sudo chmod -v +x /etc/rc.local
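If your system does not use /etc/rc.local, a small systemd unit can do the same job. This is only a sketch assuming a systemd-based distribution; the unit name zram.service is my own choice.
# Create a one-shot service that runs the script at boot
sudo tee /etc/systemd/system/zram.service > /dev/null <<'EOF'
[Unit]
Description=Enable zRAM swap devices

[Service]
Type=oneshot
ExecStart=/usr/bin/zram.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
# Enable and start the service
sudo systemctl enable --now zram.service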
To finish, run the script to create the additional memory. To see the available memory and the compression stats, run the following commands:
# Manual start
sudo /usr/bin/zram.sh
# Show memory compression stats
zramctl
# Show available memory
free -mlht
Here you can see the advantage of using Zram to create additional memory.
$ free -mlht
total used free shared buff/cache available
Mem: 3.7G 2.6G 54M 2.1M 1.1G 1.1G
Low: 3.7G 3.6G 54M
High: 0B 0B 0B
Swap: 3.7G 22M 3.7G
Total: 7.4G 2.6G 3.7G
$ zramctl
NAME ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram3 947.9M 5.3M 1.3M 1.8M 4 [SWAP]
/dev/zram2 947.9M 5.2M 1.3M 1.8M 4 [SWAP]
/dev/zram1 947.9M 5.5M 1.3M 1.8M 4 [SWAP]
/dev/zram0 947.9M 5.5M 1.4M 1.9M 4 [SWAP]
This is with the standard LZO-RLE compression algorithm.
$ free -mlht
total used free shared buff/cache available
Mem: 957M 720M 24M 2.1M 212M 219M
Low: 957M 933M 24M
High: 0B 0B 0B
Swap: 1.2G 47M 1.2G
Total: 2.2G 767M 1.2G
$ zramctl
NAME ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram3 319.3M 11.5M 2M 2.4M 4 [SWAP]
/dev/zram2 319.3M 11.5M 2.2M 2.7M 4 [SWAP]
/dev/zram1 319.3M 11.7M 2.1M 2.6M 4 [SWAP]
/dev/zram0 319.3M 11.8M 2.1M 2.5M 4 [SWAP]
This is with the ZSTD compression algorithm.
In order to control all cluster nodes easily, I have used Ansible from my work computer. It will also be installed on the master node.
# Add Ansible repository
sudo apt-add-repository --yes --update ppa:ansible/ansible
# Install required packages for master node
sudo apt install wireless-tools wavemon bmon nmon boinctui ntpdate ansible sshfs slurm-wlm mpich python3-pip r-base
# Install required python3 dependency
pip3 install cython
# Install required python3 packages
pip3 install numpy mpi4py
# Install required packages for compute nodes
sudo apt install wireless-tools wavemon bmon nmon boinc-client slurmd slurm-client ntpdate sshfs mpich python3-pip r-base
# Install required python3 dependency
pip3 install cython
# Install required python3 packages
pip3 install numpy mpi4py
# Add host entry to the master node
echo "[IP] rpi-master node01" | sudo tee -a /etc/hosts
Replace [IP] with the IP address of the master node.
To be able to access each cluster node without a password, we'll generate ssh keys for each node, including the master node.
# Generate ssh key (reply to all questions)
ssh-keygen
Once done, copy the cluster node key to the master node:
# Copy the node key to the master node
ssh-copy-id -i .ssh/id_rsa.pub ubuntu@node01
Repeat this step on each cluster node.
Now that all required packages are installed, we can start the cluster configuration.
To be able to manage the cluster, all hostnames need to be added to the /etc/hosts file.
# Open the system hosts file
sudo nano /etc/hosts
# Add all nodes hostnames
[IP] rpi-master node01
[IP] rpi-4b-01 node02
[IP] rpi-4b-02 node03
[IP] rpi-3bp-01 node04
[IP] rpi-3bp-02 node05
Then save the file with [Ctrl + X].
Replace [IP] with the IP address of each node. I've used two different hostnames:
- The first one is related to the node itself
- The second one is related to the cluster member id
To be able to access each cluster node without a password from the master node, we'll need to send the master node key to all other cluster nodes.
# Generate ssh key (reply to all questions)
ssh-keygen
Once done, copy the master node key to each cluster nodes:
# Copy the master node key to each cluster nodes
ssh-copy-id -i .ssh/id_rsa.pub ubuntu@node0X
Replace node0X with the real node name.
Repeat this step for each cluster node.
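To avoid typing this for every node, a small loop can help. A minimal sketch, assuming the node names defined above and the default ubuntu user:
# Copy the master node key to every compute node in one go
for n in node02 node03 node04 node05; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub ubuntu@$n
done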
On the master node, create the shared folder:
# Create the shared folder
sudo mkdir -v /clusterfs
# Change permissions (define the permissions you want)
sudo chown -Rv nobody:nogroup /clusterfs
sudo chmod -Rv 777 /clusterfs
The defined permissions are very relaxed and should not be used on production networks.
They are used here only to avoid permission issues during the setup procedure.
To manage the cluster I'll use Ansible. You can configure it that way:
# Open the ansible hosts file
sudo nano /etc/ansible/hosts
# Add cluster nodes hostnames
[rpis]
rpi-master ansible_python_interpreter=python3 ansible_user=ubuntu
rpi-4b-01 ansible_python_interpreter=python3 ansible_user=ubuntu
rpi-4b-02 ansible_python_interpreter=python3 ansible_user=ubuntu
rpi-3bp-01 ansible_python_interpreter=python3 ansible_user=ubuntu
rpi-3bp-02 ansible_python_interpreter=python3 ansible_user=ubuntu
[ghettocluster]
rpi-master
rpi-4b-01
rpi-4b-02
rpi-3bp-01
rpi-3bp-02
Then save the file with [Ctrl + X].
Some explanations:
- The first host group [rpis] is used to define settings related to the hosts themselves.
- The second host group [ghettocluster] is the internal name of the cluster; it contains all cluster nodes.
Now you can test the cluster that way:
# Ping all nodes
ansible ghettocluster -m ping
# Show memory info from all nodes
ansible ghettocluster -a "free -mlht"
# Show hostname of all nodes
ansible ghettocluster -a "hostname"
# Synchronize time of all nodes
ansible ghettocluster -a "sudo ntpdate ch.pool.ntp.org"
# Check timesync of all nodes
ansible ghettocluster -a "date"
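Ad-hoc commands like these cover most needs, but the same inventory also works with playbooks. Here is a minimal sketch to upgrade all cluster nodes at once; the playbook name and its single task are my own example, not part of the original setup.
# Create a simple playbook that upgrades packages on all cluster nodes
tee ~/update-cluster.yml > /dev/null <<'EOF'
---
- hosts: ghettocluster
  become: yes
  tasks:
    - name: Update and upgrade apt packages
      apt:
        update_cache: yes
        upgrade: dist
EOF
# Run it against the whole cluster
ansible-playbook ~/update-cluster.yml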
Before configuring Slurm, we need to copy the sample config first.
# Go to the Slurm config directory
cd /etc/slurm-llnl
# Copy the config sample archive
sudo cp /usr/share/doc/slurm-client/examples/slurm.conf.simple.gz .
# Uncompress the config sample archive
sudo gzip -d slurm.conf.simple.gz
# Rename the config sample file
sudo mv slurm.conf.simple slurm.conf
Then open the slurm.conf file with sudo nano slurm.conf and set the following settings:
# Define master node
ControlMachine=rpi-master
ControlAddr=<node-ip>
# Configure scheduler algorithm
SelectType=select/cons_res
SelectTypeParameters=CR_Core
# Define cluster name
ClusterName=ghettocluster
# Add nodes
# Remove the sample node entries and add the following at the end
NodeName=node01 NodeAddr=<ip addr node01> CPUs=4 State=UNKNOWN
NodeName=node02 NodeAddr=<ip addr node02> CPUs=4 State=UNKNOWN
NodeName=node03 NodeAddr=<ip addr node03> CPUs=4 State=UNKNOWN
NodeName=node04 NodeAddr=<ip addr node04> CPUs=4 State=UNKNOWN
NodeName=node05 NodeAddr=<ip addr node05> CPUs=4 State=UNKNOWN
# Create the cluster partition and add all compute nodes to it
PartitionName=ghettocluster Nodes=node[02-05] Default=YES MaxTime=INFINITE State=UP
Replace <node-ip> with your master node IP address.
Save the file with [Ctrl + X].
Now we have to configure the cgroups kernel isolation. Create the file /etc/slurm-llnl/cgroup.conf
and add the following content:
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm-llnl/cgroup"
AllowedDevicesFile="/etc/slurm-llnl/cgroup_allowed_devices_file.conf"
ConstrainCores=no
TaskAffinity=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
ConstrainDevices=no
AllowedRamSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30
Save the file with [Ctrl + X].
Now we have to whitelist some system devices by creating the file /etc/slurm-llnl/cgroup_allowed_devices_file.conf
and adding the following content:
/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/clusterfs*
Save the file with [Ctrl + X].
To finish, copy the configuration files to the shared storage:
# Create a config folder on the shared storage
sudo mkdir -v /clusterfs/config
# Copy cluster config
sudo cp -v slurm.conf cgroup.conf cgroup_allowed_devices_file.conf /clusterfs/config
# Copy authentication key
sudo cp -v /etc/munge/munge.key /clusterfs/config
More details about Munge:
Munge is the access system that SLURM uses to run commands and processes on the other nodes. Similar to key-based SSH, it uses a private key on all the nodes, then requests are timestamp-encrypted and sent to the node, which decrypts them using the identical key. This is why it is so important that the system times be in sync, and that they all have the munge.key file.
Now we can enable the required services:
# Enable and start Munge
sudo systemctl enable --now munge ; systemctl status munge
# Enable and start Slurm daemon
sudo systemctl enable --now slurmd ; systemctl status slurmd
# Enable and start Slurm control daemon
sudo systemctl enable --now slurmctld ; systemctl status slurmctld
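Once Munge is running, you can quickly check it locally on the master node before testing the other nodes. This just encodes a credential and decodes it on the same host; STATUS should report Success:
# Local Munge sanity check
munge -n | unmunge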
If you are having problems with Munge authentication, or your nodes can't communicate with the SLURM controller, try rebooting them.
Instructions will come soon.
This section details the required steps to prepare the compute nodes.
In order for the nodes to be able to communicate with each other, you need to define the node hostnames in the /etc/hosts file.
# Open the system hosts file
sudo nano /etc/hosts
# Add all cluster nodes
[IP] rpi-master node01
[IP] rpi-4b-01 node02
[IP] rpi-4b-02 node03
[IP] rpi-3bp-01 node04
[IP] rpi-3bp-02 node05
Each node has two different hostnames:
- The first one is related to the node itself
- The second one is related to the cluster member id
The actual shared cluster storage is hosted on the master node. I'll use the /etc/fstab
file to mount the shared storage at boot.
# Create the shared folder
sudo mkdir -v /clusterfs
# Change permissions (define the permissions you want)
sudo chown -Rv nobody:nogroup /clusterfs
sudo chmod -Rv 777 /clusterfs
# Patch fuse config
sudo sed -e 's/#user_allow_other/user_allow_other/' -i /etc/fuse.conf
# Manual connection test
sshfs ubuntu@rpi-master:/clusterfs /clusterfs -o allow_other,reconnect,cache=yes,kernel_cache,compression=no,IdentityFile=/home/ubuntu/.ssh/id_rsa
# Verify the mountpoint
findmnt | grep clusterfs
# Open the file
sudo nano /etc/fstab
# Add this line
ubuntu@rpi-master:/clusterfs /clusterfs fuse.sshfs allow_other,_netdev,delay_connect,reconnect,cache=yes,kernel_cache,compression=no,IdentityFile=/home/ubuntu/.ssh/id_rsa 0 0
Change the user_id and group_id according to the permissions you have set on the master node.
Once done, save the file with [Ctrl + X] then reboot to apply the change. When restarted, run this command to verify that the shared storage is mounted correctly:
# Verify the mountpoint
$ findmnt | grep clusterfs
└─/clusterfs ubuntu@rpi-master:/clusterfs fuse.sshfs rw,relatime,user_id=0,group_id=0,allow_other
The compute node can now use the shared storage.
Now that we have everything to connect to the master node, we can copy the config files:
sudo cp -v /clusterfs/config/munge.key /etc/munge/munge.key
sudo cp -v /clusterfs/config/slurm.conf /etc/slurm-llnl/slurm.conf
sudo cp -v /clusterfs/config/cgroup* /etc/slurm-llnl/
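If you prefer to push these files to all compute nodes at once from the master node, the same copies can be done with Ansible. This is only a sketch: it assumes that /clusterfs is already mounted on every node and that the ubuntu user can use sudo without a password.
# Distribute the Slurm and Munge configuration to every node in one go
ansible ghettocluster -b -a "cp /clusterfs/config/munge.key /etc/munge/munge.key"
ansible ghettocluster -b -a "cp /clusterfs/config/slurm.conf /etc/slurm-llnl/slurm.conf"
ansible ghettocluster -b -a "cp /clusterfs/config/cgroup.conf /etc/slurm-llnl/cgroup.conf"
ansible ghettocluster -b -a "cp /clusterfs/config/cgroup_allowed_devices_file.conf /etc/slurm-llnl/cgroup_allowed_devices_file.conf"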
In order to make the slurm part of the cluster work correctly, each node needs to be synchronized with a time server.
# Check the actual date
date
# Change the time server to anyone you want
sudo ntpdate ch.pool.ntp.org
# Check the defined time zone
timedatectl
# Get the time zones list
timedatectl list-timezones
# Change the defined time zone if not correct
sudo timedatectl set-timezone your_time_zone
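To keep the clocks from drifting again, you can also schedule the synchronization. A minimal sketch using cron; the hourly schedule, the file name and the ch.pool.ntp.org server are just my choices:
# Re-sync the time every hour as root
echo '0 * * * * root /usr/sbin/ntpdate ch.pool.ntp.org >/dev/null 2>&1' | sudo tee /etc/cron.d/ntpdate-sync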
Now we will test if the Munge key has been copied correctly and if the SLURM controller can successfully authenticate with the client nodes.
We need to start the service first:
# Enable and start Munge
sudo systemctl enable --now munge ; systemctl status munge
Now we can test the communication with the master node:
ssh ubuntu@rpi-master munge -n | unmunge
If it works, you should see something like this:
STATUS: Success (0)
ENCODE_HOST: rpi-master (REDACTED)
ENCODE_TIME: 2020-04-27 04:27:54 +0200 (1587954474)
DECODE_TIME: 2020-04-27 04:27:54 +0200 (1587954474)
TTL: 300
CIPHER: aes128 (4)
MAC: sha256 (5)
ZIP: none (0)
UID: ubuntu (1000)
GID: ubuntu (1000)
LENGTH: 0
If you get an error, copy the munge key again from /clusterfs/config then restart the munge service. It should work after that.
You can also try to synchronize the node time using ntpdate.
If everything has worked correctly so far, you can now start the Slurm daemon:
# Enable and start Slurm daemon
sudo systemctl enable --now slurmd ; systemctl status slurmd
Repeat these steps on each cluster node except the master node.
Connect to the master node and run sinfo:
# Get cluster info
sinfo
You should get something similar:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
ghettocluster* up infinite 4 idle node[02-05]
If the new node appears as DOWN, run this command on the master node:
sudo scontrol update NodeName=<node_name> State=RESUME
Then run sinfo again. It should appear as idle.
Run a test command on all nodes:
# This will execute the `hostname` command on each compute node
srun --nodes=4 hostname
If it works, you should get something similar:
node02
node03
node04
node05
Here are some useful commands:
# Get cluster nodes info
scontrol show nodes
# Get cluster partition info
scontrol show partition
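Since MPICH and MPI4PY are installed on all nodes, you can also check that MPI jobs run across the cluster through Slurm. The following is only a sketch under my own assumptions: the job file name, the task counts and the use of the shared /clusterfs folder are my choices.
# Create a small MPI test job on the shared storage
tee /clusterfs/mpi-hello.sh > /dev/null <<'EOF'
#!/bin/bash
#SBATCH --job-name=mpi-hello
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=2
cd $SLURM_SUBMIT_DIR
mpiexec -n $SLURM_NTASKS python3 -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print('Hello from rank', c.Get_rank(), 'of', c.Get_size())"
EOF
# Submit it from the shared folder and check the result
cd /clusterfs
sbatch mpi-hello.sh
squeue
cat slurm-*.out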
When a node restarts, it will sometimes appear as down in the sinfo output. In this case I've found two solutions to fix this issue; you can try both of them from the master node.
- Synchronize all cluster nodes:
ansible ghettocluster -a "sudo ntpdate ch.pool.ntp.org"
- Update node status:
sudo scontrol update NodeName=<node_name> State=RESUME
Write a comment on this gist if both commands did not work.
I have used boinctui on my work computer and added all cluster nodes to the hosts list. Once done, I have added the Rosetta@Home project on each cluster node from boinctui.
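If you prefer the command line, the project can also be attached with boinccmd on each node, for example through Ansible. This is only a sketch; replace <account_key> with your real Rosetta@Home account key:
# Attach the Rosetta@Home project on every cluster node at once
ansible ghettocluster -a "boinccmd --project_attach https://boinc.bakerlab.org/rosetta/ <account_key>"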
To change the computing preferences, I have used BOINC Manager from my work computer. It made it easier to fix the memory issues I got with the Raspberry Pi 3B+ nodes.
I made the mistake of blindly following some memory settings found on the BOINC forums, which was wrong; combined with the already reduced memory, this caused some catastrophic system hangs and forced resets.
Here is the working config for all nodes, including the Raspberry Pi 3B / 3B+:
- Compute:
- Used: 100%
- Idle: 100%
- Stop when usage is at:
- 4B: 25%
- 3B / 3B+: 15%
- Network:
- Always available
- Memory:
- Used: 90%
- Idle: 90%
- Swap: 90%
If you have some issues with the Raspberry Pi 3B+ and Rosetta@Home workunits, try to suspend them all and then resume them. You should be able to run at least two workunits at the same time.
Stopping Netdata for a moment with sudo systemctl stop netdata might free up some memory.
In order to avoid overheating the Raspberry Pi nodes, I've developed a script based on lm-sensors and cron to automatically control the BOINC workload based on the host temperature.
When the host is overheating, the client service is stopped, which stops the computing process. Once the host is cool enough, the client service is restarted and the computing process is relaunched.
You can find the required instructions here: https://gist.github.com/Jiab77/1b9c32d550ebb93c471a8fa5b92cf2bf.
You will have to repeat the steps on each cluster node, not only the Raspberry Pis.
I seriously advise you to use this script, as the nodes can sometimes get very hot and that can cause serious issues. The script will be kept updated.
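The idea behind the script boils down to something like this. It is only a simplified sketch, not the actual script from the gist linked above; the temperature thresholds and the boinc-client service name are assumptions:
#!/bin/bash
# Read the SoC temperature in degrees Celsius
TEMP=$(awk '{printf "%d", $1/1000}' /sys/class/thermal/thermal_zone0/temp)
# Stop BOINC when too hot, restart it once the node has cooled down
if [ "$TEMP" -ge 75 ]; then
  systemctl stop boinc-client
elif [ "$TEMP" -le 60 ]; then
  systemctl start boinc-client
fi
Run it from cron every minute as root, for example, to get the same kind of automatic throttling.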
On the Raspberry Pi 4B nodes:
On the Raspberry Pi 3B+ nodes:
Output from boinctui.
Total workload:
Output from nmon.
Network bandwidth:
Output from bmon.
I'll post some previews during the cluster evolution.
The cluster itself (without master node):
On the Master node:
# Show ansible cluster nodes
$ ansible ghettocluster -m ping
# Result
rpi-4b-01 | SUCCESS => {
"changed": false,
"ping": "pong"
}
rpi-4b-02 | SUCCESS => {
"changed": false,
"ping": "pong"
}
rpi-3bp-02 | SUCCESS => {
"changed": false,
"ping": "pong"
}
rpi-3bp-01 | SUCCESS => {
"changed": false,
"ping": "pong"
}
rpi-master | SUCCESS => {
"changed": false,
"ping": "pong"
}
# Show slurm cluster nodes
$ sinfo -Nl
# Result
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
node02 1 ghettocluster* idle 4 1:4:1 1 0 1 (null) none
node03 1 ghettocluster* idle 4 1:4:1 1 0 1 (null) none
node04 1 ghettocluster* idle 4 1:4:1 1 0 1 (null) none
node05 1 ghettocluster* idle 4 1:4:1 1 0 1 (null) none
# Show cluster partition
$ scontrol show partition
# Result
PartitionName=ghettocluster
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=node[02-05]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=16 TotalNodes=4 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
The cluster itself with master node + two laptops:
On the Master node:
# Show ansible cluster nodes
$ ansible ghettocluster -m ping
# Result
rpi-4b-02 | SUCCESS => {
"changed": false,
"ping": "pong"
}
rpi-3bp-01 | SUCCESS => {
"changed": false,
"ping": "pong"
}
rpi-4b-01 | SUCCESS => {
"changed": false,
"ping": "pong"
}
rpi-master | SUCCESS => {
"changed": false,
"ping": "pong"
}
rpi-3bp-02 | SUCCESS => {
"changed": false,
"ping": "pong"
}
lenovo-01 | SUCCESS => {
"changed": false,
"ping": "pong"
}
dell-01 | SUCCESS => {
"changed": false,
"ping": "pong"
}
# Show slurm cluster nodes
$ sinfo -Nl
# Result
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
node02 1 ghettocluster* idle 4 1:4:1 1 0 1 (null) none
node03 1 ghettocluster* idle 4 1:4:1 1 0 1 (null) none
node04 1 ghettocluster* idle 4 1:4:1 1 0 1 (null) none
node05 1 ghettocluster* idle 4 1:4:1 1 0 1 (null) none
node06 1 ghettocluster* idle 2 1:2:1 1 0 1 (null) none
# Show cluster partition
$ scontrol show partition
# Result
PartitionName=ghettocluster
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=node[02-06]
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=18 TotalNodes=5 SelectTypeParameters=NONE
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
Pictures will be coming soon.
I have used Netdata to monitor the temperature and the global performance of each cluster node.
# Install everything
bash <(curl -Ss https://my-netdata.io/kickstart.sh) all --dont-wait
# Enable KSM (to reduce memory consumption)
sudo su -c 'echo 1 >/sys/kernel/mm/ksm/run'
sudo su -c 'echo 1000 >/sys/kernel/mm/ksm/sleep_millisecs'
# Patch the sensors config (only required for RPis)
sudo /etc/netdata/edit-config charts.d.conf
Add sensors=force
at the end of the file then save it with [Ctrl+O] and [Ctrl+X].
Restart the service to apply the change:
sudo systemctl restart netdata
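As a quick check, the local dashboard should answer on the default Netdata port:
# The local dashboard should return an HTTP 200 response
curl -sI http://localhost:19999/ | head -n 1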
- https://www.element14.com/community/thread/75254/l/set-your-processors-to-analyse-for-the-covid-19-virus-with-foldinghome-or-rosettahome?displayFullThread=true
- https://github.com/novaspirit/rpi_zram
- https://www.raspberrypi.org/documentation/configuration/config-txt/memory.md
- https://en.wikipedia.org/wiki/Zram
- https://www.reddit.com/r/BOINC/comments/g0r0wa/running_rosetta_covid19_workunits_on_raspberry_pi/
- https://www.phoronix.com/forums/forum/software/mobile-linux/1086709-zram-will-see-greater-performance-on-linux-5-1-it-changed-its-default-compressor?p=1172884#post1172884
- https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13795
- https://www.kernel.org/doc/html/latest/admin-guide/blockdev/zram.html
- https://github.com/facebook/zstd
- https://medium.com/@glmdev/building-a-raspberry-pi-cluster-784f0df9afbd
- https://medium.com/@glmdev/building-a-raspberry-pi-cluster-aaa8d1f3d2ca
- https://medium.com/@glmdev/building-a-raspberry-pi-cluster-f5f2446702e8
- https://gist.github.com/Jiab77/97451d3d1fd94040b2e8d88263529794
- https://gist.github.com/Jiab77/1b9c32d550ebb93c471a8fa5b92cf2bf
- https://askubuntu.com/questions/326977/sshfs-is-not-mounting-automatically-at-boot-despite-etc-fstab-configuration/695302#695302
Feel free to leave a comment to contribute to this gist.
You can reach me on Twitter by using @Jiab77.