@arifsisman
Last active April 29, 2024 11:51

Beowulf Cluster Setup

1. Introduction

A Beowulf cluster is a cluster of similar computers networked into a small local area network, with libraries and programs installed that allow processing to be shared among them. The result is a high-performance parallel computing cluster built from inexpensive personal computer hardware.

This project aims to create a Beowulf cluster that computes a bigram analysis on a news data set. Two virtual worker nodes and one virtual master node are used for the cluster, all running Ubuntu Server 16.04.

  • Each computer in the cluster is configured for
    • Message Passing Interface (MPI)
    • Network File System (NFS)
    • Hydra Process Manager
  • A Raspberry Pi is used to create an Access Point that connects the different computers to the same subnet.

2. Cluster Configuration Steps

2.1. Edit /etc/hosts file for all nodes

The following configuration needs to be done on all nodes.

127.0.0.1       localhost
192.168.42.50   master
192.168.42.51   node1
192.168.42.52   node2
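
A quick way to confirm that these names resolve on each node is to ping each host once:

$ ping -c 1 master
$ ping -c 1 node1
$ ping -c 1 node2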

2.2. Create a user for running MPI jobs

The following configuration needs to be done on all nodes.

$ sudo adduser mpiuser --uid 999

Every computer needs an MPI user with the same UID, because:

  • MPICH uses SSH for communication between nodes. Passwordless login is easier with identical usernames, and there is no need to track authorized keys per node, since all keys live in the NFS-shared directory.
  • The NFS directory is accessible to MPI users only, and NFS maps file ownership by numeric ID, so all MPI users need to have the same UID.
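
You can confirm that the UID is consistent by running id on every node; the uid field should read 999 everywhere:

$ id mpiuser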

2.3. Install and setup the Network File System

The Network File System (NFS) enables you to mount part of a remote file system so MPI users can access it. The files and programs used for the bigram analysis need to be available on every node.

To install NFS, run the following command on the master node:

master:~$ sudo apt-get install nfs-kernel-server

All compute nodes need the following package installed:

$ sudo apt-get install nfs-common

I have used NFS to share the /home/mpiuser directory with the compute nodes. This directory must be owned by mpiuser so that all MPI users can access it. Since I created the home directory with the adduser command earlier, it is already owned by mpiuser.

To check whether the directory is owned by mpiuser:

master:~$ ls -l /home/ | grep mpiuser
drwxr-xr-x 7 mpiuser mpiuser 4096 May 30 05:55 mpiuser

If you want to share a directory other than mpiuser’s home directory, you must change its ownership with the following command:

master:~$ sudo chown mpiuser:mpiuser /path/to/shared/dir

I share the /home/mpiuser directory of the master node with all other nodes. For this, the file /etc/exports on the master node needs to be edited. Add the following line to /etc/exports:

/home/mpiuser *(rw,sync,no_subtree_check)

After the first install, you may need to restart the NFS daemon:

master:~$ sudo service nfs-kernel-server restart
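
If you later change only /etc/exports, re-exporting the shares should be enough instead of a full restart; this uses the exportfs tool that ships with nfs-kernel-server:

master:~$ sudo exportfs -ra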

To test NFS, the following command may be useful:

$ showmount -e master

In this case, this should print the path /home/mpiuser
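
The output should look roughly like the following (exact formatting may vary between versions):

Export list for master:
/home/mpiuser *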

I have the firewall disabled, but if you want to access the share from another subnet with the firewall enabled, you need to allow connections with the following command:

master:~$ sudo ufw allow from 192.168.42.0/24

You should then be able to mount master:/home/mpiuser on the compute nodes. Run the following commands to test this:

node1:~$ sudo mount master:/home/mpiuser /home/mpiuser
node2:~$ sudo mount master:/home/mpiuser /home/mpiuser

If this command hangs or fails, you need to check your configuration steps. The configuration can be tested by creating a new file under /home/mpiuser.
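
For example, touching a file on the master and listing it from a compute node is a simple check (the file name nfs_test is just an example):

mpiuser@master:~$ touch /home/mpiuser/nfs_test
mpiuser@node1:~$ ls /home/mpiuser/nfs_test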

If mounting the NFS shared directory works, the master:/home/mpiuser directory can be mounted automatically when the compute nodes boot. For this, the file /etc/fstab needs to be edited. Add the following line to the fstab file of all compute nodes:

master:/home/mpiuser /home/mpiuser nfs
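
To check the fstab entry without rebooting, you can ask mount to process /etc/fstab directly on each compute node; if your system complains about missing fields, the fuller form master:/home/mpiuser /home/mpiuser nfs defaults 0 0 should also work:

node1:~$ sudo mount -a
node1:~$ mount | grep mpiuser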

2.4. Install SSH

SSH needs to be installed if it is not already present; since I have used Ubuntu Server, the sshd service must be reachable from the other nodes. First, install the SSH server on all nodes:

$ sudo apt-get install ssh

On any node, generate SSH keys as mpiuser; the generated keys are automatically shared with the other nodes through NFS.

$ su mpiuser
$ ssh-keygen

When asked for a passphrase, leave it empty (hence passwordless SSH). Then run the following command on the master node as user mpiuser:

mpiuser@master:~$ ssh-copy-id localhost

If this configuration is done correctly, you should be able to access the other nodes with SSH:

mpiuser@master:~$ ssh node1
mpiuser@node1:~$ echo $HOSTNAME
node1
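
Passwordless SSH to the other compute node should behave the same way; a quick non-interactive check is to run a single command remotely:

mpiuser@master:~$ ssh node2 hostname
node2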

2.5. Setup Hydra Process Manager

Hydra is needed to manage processes between compute nodes. The process manager is included with the MPICH package, so start by installing MPICH on all nodes with

$ sudo apt-get install mpich
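
To confirm that MPICH and its Hydra launcher are installed, you can query the version; on MPICH this should print the Hydra build details:

$ mpiexec --version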

To set up Hydra, I need to create one file on the master node. This file contains all the hostnames of the compute nodes

mpiuser@master:~$ cd ~
mpiuser@master:~$ touch hosts

To be able to send out jobs to the other nodes in the network, add the hostnames of all compute nodes to the hosts file, for example

node1
node2

You may choose to include master in this file, which would mean that the master node would also act as a compute node.
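
Before running the actual bigram analysis program, a simple smoke test is to launch a trivial command across the nodes with mpiexec; the process count of 4 and the binary name ./bigram_analysis below are only placeholders:

mpiuser@master:~$ mpiexec -f hosts -n 4 hostname
mpiuser@master:~$ mpiexec -f hosts -n 4 ./bigram_analysis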

3. Access Point Configuration Steps

I have used a Raspberry Pi as an Access Point with an Ethernet interface because I need to connect other computers to the virtual machines in the cluster. Computers connected to the Raspberry Pi (with an IP address of the form 192.168.42.x) can join the Beowulf cluster, since they are all on the same subnetwork.

  • Edit /etc/network/interfaces
auto eth0
iface eth0 inet static
address 192.168.42.1
netmask 255.255.255.0
network 192.168.42.0
broadcast 192.168.42.255
gateway 192.168.42.1
dns-nameservers 192.168.42.1 8.8.8.8
  • Edit /etc/resolv.conf
nameserver 192.168.42.1
nameserver 8.8.8.8
  • Restart Networking Service
$ sudo /etc/init.d/networking restart
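
To verify that the static address came up on the Raspberry Pi, you can inspect the interface (eth0 matches the configuration above):

$ ip addr show eth0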