@coin8086
Last active November 3, 2025 08:49
Install Slurm on Ubuntu 24.04

A mini cluster with 1 head node and 2 compute nodes.

Prerequisites

  • Make sure the clocks, users and groups (UIDs and GIDs) are synchronized across the cluster.

    There must be a uniform user and group name space (including UIDs and GIDs) across the cluster. It is not necessary to permit user logins to the control hosts (SlurmctldHost), but the users and groups must be resolvable on those hosts.

  • Each node in a cluster must be able to resolve other nodes in the cluster by their host names.

    On Ubuntu 24.04, set the host name with sudo hostnamectl set-hostname your-host-name. On Azure, the vnet's DNS picks up the host name of each VM (a VM reboot may be required) and makes it resolvable across the vnet, so there is no need to sync /etc/hosts between nodes in the same vnet.
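Name resolution can be checked with a short script run on every node. This is a minimal sketch: the node names are assumptions taken from the example slurm.conf in this gist, and the check relies on getent, which follows the system's normal resolver order.

```shell
#!/bin/sh
# Assumed node names (from the example slurm.conf in this gist); replace with your own.
NODES="headnode computenode-01 computenode-02"

# Print "ok <name>" if the name resolves on this machine, "FAIL <name>" otherwise.
check_resolves() {
  if getent hosts "$1" >/dev/null 2>&1; then
    echo "ok $1"
  else
    echo "FAIL $1"
    return 1
  fi
}

# Check every node name; returns non-zero if any name fails to resolve.
check_all_nodes() {
  rc=0
  for n in $NODES; do
    check_resolves "$n" || rc=1
  done
  return $rc
}
```

Source the file and call check_all_nodes on each node; any FAIL line points at a name that node cannot resolve.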

Steps

  1. Install munge on each node
  2. Install slurmctld on each head node
  3. Install slurmd on each compute node
  4. Configure slurm on each node
  5. Optionally install slurmdbd for accounting on the head node (or a dedicated node)
  6. Verification

NOTE

Run sudo apt update before any apt install.

Install munge

Install munge on each node by

sudo apt install munge

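The /etc/munge/munge.key file must be identical on every node. Generate a key with mungekey, or take the key file from one node, and copy it to all nodes in the cluster. The key should be owned by the munge user and readable only by that user. Restart munge on a node after replacing its key.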
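Copying the key from the current node to another node could look like the sketch below. It assumes SSH access and passwordless sudo on both machines, which this gist does not state, and the run helper and DRY_RUN variable are illustrative names that let you print the commands before executing them.

```shell
#!/bin/sh
# Run a command line, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    sh -c "$*"
  fi
}

# Copy this node's munge key to a remote host, fix ownership and mode,
# then restart munge there so the new key is loaded.
# Assumes SSH access and passwordless sudo on both sides (not stated in the gist).
sync_munge_key() {
  host=$1
  key=/etc/munge/munge.key
  run "sudo cat $key | ssh $host 'sudo tee $key >/dev/null'"
  run "ssh $host 'sudo chown munge:munge $key && sudo chmod 600 $key'"
  run "ssh $host 'sudo systemctl restart munge'"
}
```

For example, DRY_RUN=1 followed by sync_munge_key computenode-01 prints the three commands without running them.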

Verify the installation by

munge -n -t 10 | ssh somehost unmunge

and

ssh somehost munge -n -t 10 | unmunge

Install slurmctld and slurmd

Install slurmctld on each head node by

sudo apt install slurmctld

Install slurmd on each compute node by

sudo apt install slurmd

Either package creates a system user named slurm. This is the user to set as SlurmUser in Slurm's configuration.

Configure Slurm

The Slurm configuration file is /etc/slurm/slurm.conf. The file is created manually and must be the same on each node.

The slurmctld package ships two HTML forms that help generate the file:

  • /usr/share/doc/slurmctld/slurm-wlm-configurator.easy.html
  • /usr/share/doc/slurmctld/slurm-wlm-configurator.html

Copy one of them to your local computer and open it in a web browser.

Example configuration files are provided under /usr/share/doc/slurmctld/examples.
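Since slurm.conf must be the same on each node, it can help to compare checksums after distributing it. The sketch below assumes SSH access to the other nodes; the function names are illustrative.

```shell
#!/bin/sh
# Print the SHA-256 digest of a file (digest only, without the file name).
conf_hash() {
  sha256sum "$1" | awk '{print $1}'
}

# Compare the local slurm.conf with the copy on a remote host.
# Assumes SSH access to the host; the default path is the one used in this guide.
check_node_conf() {
  host=$1
  conf=${2:-/etc/slurm/slurm.conf}
  local_hash=$(conf_hash "$conf")
  remote_hash=$(ssh "$host" sha256sum "$conf" | awk '{print $1}')
  if [ "$local_hash" = "$remote_hash" ]; then
    echo "ok $host"
  else
    echo "MISMATCH $host"
    return 1
  fi
}
```

Running check_node_conf computenode-01 on the head node reports whether that node has the same file.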

Configure Job Completion Recording

By default, job completion is not recorded. Recording to a plain text file can be enabled with the following settings in slurm.conf:

JobAcctGatherType=jobacct_gather/cgroup
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/job_completions

Job completion records are then written to /var/log/slurm/job_completions, one line per job, for example:

JobId=3 UserId=robert(1000) GroupId=robert(1000) Name=test JobState=COMPLETED Partition=dev TimeLimit=525600 StartTime=2025-08-21T08:49:46 EndTime=2025-08-21T08:49:56 NodeList=computenode-01 NodeCnt=1 ProcCnt=1 WorkDir=/home/robert ReservationName= Tres=cpu=1,mem=1M,node=1,billing=1 Account= QOS= WcKey= Cluster=unknown SubmitTime=2025-08-21T08:49:46 EligibleTime=2025-08-21T08:49:46 DerivedExitCode=0:0 ExitCode=0:0
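Because each record is a line of KEY=VALUE pairs, individual fields can be pulled out with standard tools. The helper below is a sketch with an illustrative name. It splits records on spaces, so a value that itself contains spaces (such as a job name with spaces) is cut at the first space.

```shell
#!/bin/sh
# Print the value of one KEY=VALUE field from job completion records read on stdin.
# Caveat: records are split on spaces, so a value containing spaces is truncated.
jobcomp_field() {
  tr ' ' '\n' | awk -v k="$1" 'index($0, k "=") == 1 { print substr($0, length(k) + 2) }'
}
```

For example, jobcomp_field JobState < /var/log/slurm/job_completions | sort | uniq -c counts jobs by final state.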

Install slurmdbd

Accounting commands such as sacct and sreport depend on slurmdbd, which in turn needs a MySQL or MariaDB database. This guide installs both slurmdbd and MariaDB on the head node.

Install and Configure MariaDB

On the head node, install MariaDB and secure it.

sudo apt install mariadb-server mariadb-client
sudo mariadb-secure-installation

NOTE

Set a password for database root user.

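Then, from a MariaDB shell opened as the database root user, create the slurm database user and grant it access to the accounting database (slurm_acct_db is the default StorageLoc used by slurmdbd).

create user 'slurm'@'localhost' identified by 'YourPassword';
grant all on slurm_acct_db.* TO 'slurm'@'localhost';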

Install and Configure slurmdbd

On the head node, install slurmdbd.

sudo apt install slurmdbd

Then configure slurmdbd in /etc/slurm/slurmdbd.conf. The file is created manually. Because it contains the database credentials, it should be owned by the slurm user and readable only by that user (mode 600).
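One way to put the file in place with restrictive permissions is sketched below. The source path ./slurmdbd.conf is a hypothetical working copy, the function names are illustrative, and the permission check uses GNU stat as shipped on Ubuntu.

```shell
#!/bin/sh
# Succeed only if the file's permission bits are exactly 600 (owner read/write).
# Uses GNU stat, as shipped on Ubuntu.
is_private() {
  [ "$(stat -c '%a' "$1")" = "600" ]
}

# Install a prepared slurmdbd.conf owned by the slurm user with mode 600.
# The source path is a hypothetical working copy; the destination is from this guide.
install_slurmdbd_conf() {
  src=${1:-./slurmdbd.conf}
  sudo install -o slurm -g slurm -m 0600 "$src" /etc/slurm/slurmdbd.conf
}
```

After installing, sudo stat -c '%U %a' /etc/slurm/slurmdbd.conf should report slurm 600.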

There are example configuration files under /usr/share/doc/slurmdbd/examples.

Accounting requires configuring slurmctld as well as slurmdbd. A minimal accounting configuration in slurm.conf looks like:

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=headnode

More accounting settings are available, with names like AccountingStorage*.

Restart slurmctld on the head node and slurmd on each compute node whenever slurm.conf changes.
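The restarts can be scripted from the head node. This sketch assumes the compute node names from the example slurm.conf, plus SSH access and passwordless sudo on those nodes. The run helper and DRY_RUN variable are illustrative and only print the commands when DRY_RUN=1.

```shell
#!/bin/sh
# Assumed compute node names (from the example slurm.conf); replace with your own.
COMPUTE_NODES="computenode-01 computenode-02"

# Run a command line, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    sh -c "$*"
  fi
}

# Restart slurmctld on this (head) node, then slurmd on each compute node over SSH.
restart_slurm() {
  run "sudo systemctl restart slurmctld"
  for n in $COMPUTE_NODES; do
    run "ssh $n 'sudo systemctl restart slurmd'"
  done
}
```

Copy the updated slurm.conf to every node before calling restart_slurm.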

Verification

Verify Slurm installation on the head node.

Show Slurm nodes and partitions by

sinfo

Run a test job by

sbatch ./test.sh

Show job accounting information by

sacct
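The submit and accounting steps can be combined into one check. The sketch below uses sbatch --parsable, which prints the job ID (optionally followed by ";" and the cluster name), and sbatch --wait, which blocks until the job terminates. The function names are illustrative, and the sacct step only works once slurmdbd is configured.

```shell
#!/bin/sh
# Extract the job ID from `sbatch --parsable` output,
# which is "<jobid>" or "<jobid>;<cluster>".
job_id_from_parsable() {
  echo "${1%%;*}"
}

# Submit a script, wait for it to finish, then print its accounting record.
# With --wait, sbatch exits with the job's exit code, so the status is
# reported rather than treated as a submission failure.
submit_and_report() {
  out=$(sbatch --parsable --wait "$1")
  status=$?
  jobid=$(job_id_from_parsable "$out")
  [ -n "$jobid" ] || return 1
  echo "Job $jobid finished, sbatch exit status $status"
  sacct -j "$jobid" --format=JobID,JobName,State,ExitCode
}
```

Calling submit_and_report ./test.sh on the head node should end with a sacct table for that job.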

References

# An example slurm.conf with job completion recording and accounting.
#
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=slurm-dev
SlurmctldHost=headnode
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/lib/slurm/slurmctld
SwitchType=switch/none
TaskPlugin=task/affinity,task/cgroup
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_tres
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=headnode
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/job_completions
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
#SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
#
#
# COMPUTE NODES
NodeName=computenode-0[1-2] CPUs=2 State=UNKNOWN
PartitionName=dev Nodes=ALL Default=YES MaxTime=INFINITE State=UP

###############################################################################
# Sample configuration file for slurmdbd
###############################################################################
#
# slurmdbd.conf is an ASCII file which describes Slurm
# Database Daemon (SlurmDBD) configuration information.
# The contents of the file are case insensitive except for the names of
# nodes and files. Any text following a "#" in the configuration file is
# treated as a comment through the end of that line. The size of each
# line in the file is limited to 1024 characters. Changes to the
# configuration file take effect upon restart of SlurmDbd or daemon
# receipt of the SIGHUP signal unless otherwise noted.
#
# This file should be only on the computer where SlurmDBD executes and
# should only be readable by the user which executes SlurmDBD (e.g.
# "slurm"). This file should be protected from unauthorized access since
# it contains a database password.
###############################################################################
# AuthType
# Define the authentication method for communications between Slurm
# components. Acceptable values at present include "auth/none" and
# "auth/munge". The default value is "auth/munge". Do not use
# "auth/none" if you desire any security. "auth/munge" indicates that
# LLNL's MUNGE system is to be used (this is the supported
# authentication mechanism for Slurm; see "https://dun.github.io/munge/"
# for more information). SlurmDBD must be terminated prior to changing
# the value of AuthType and later restarted.
AuthType=auth/munge
# DbdHost
# The short, or long, name of the machine where the Slurm Database Daemon is
# executed (i.e. the name returned by the command "hostname -s"). This value
# must be specified.
DbdHost=localhost
# DbdPort
# The port number that the Slurm Database Daemon (slurmdbd) listens to for
# work. The default value is SLURMDBD_PORT as established at system build time.
# If none is explicitly specified, it will be set to 6819. This value must be
# equal to the AccountingStoragePort parameter in the slurm.conf file.
# DebugLevel
# The level of detail to provide the Slurm Database Daemon's logs. The default
# value is info.
#
#   quiet    Log nothing
#   fatal    Log only fatal errors
#   error    Log only errors
#   info     Log errors and general informational messages
#   verbose  Log errors and verbose informational messages
#   debug    Log errors and verbose informational messages and debugging messages
#   debug2   Log errors and verbose informational messages and more debugging messages
#   debug3   Log errors and verbose informational messages and even more debugging messages
#   debug4   Log errors and verbose informational messages and even more debugging messages
#   debug5   Log errors and verbose informational messages and even more debugging messages
DebugLevel=info
# MessageTimeout
# Time permitted for a round-trip communication to complete in seconds. Default
# value is 10 seconds.
# StorageHost
# Define the name of the host the database is running where we are going to
# store the data. Ideally this should be the host on which slurmdbd executes.
StorageHost=localhost
# StorageLoc
# Specify the name of the database as the location where accounting records are
# written. Defaults to "slurm_acct_db".
StorageLoc=slurm_acct_db
# StoragePass
# Define the password used to gain access to the database to store the job
# accounting data. The '#' character is not permitted in a password.
# This must match the password set for the StorageUser database account.
StoragePass=shazaam
# StoragePort
# The port number that the Slurm Database Daemon (slurmdbd) communicates with
# the database.
# StorageType
# Define the accounting storage mechanism type. Acceptable values at present
# include "accounting_storage/mysql". The value "accounting_storage/mysql"
# indicates that accounting records should be written to a MySQL or MariaDB
# database specified by the StorageLoc parameter. This value must be specified.
StorageType=accounting_storage/mysql
# StorageUser
# Define the name of the user we are going to connect to the database with to
# store the job accounting data.
StorageUser=slurm
################################################################################
# WARNING!
# If you are running slurmdbd on Debian or derived system please leave
# the above values untouched
################################################################################
# LogFile
# Fully qualified pathname of a file into which the Slurm Database
# Daemon’s logs are written. The default value is none (performs
# logging via syslog).
LogFile=/var/log/slurm/slurmdbd.log
# PidFile
# Fully qualified pathname of a file into which the Slurm Database Daemon
# may write its process ID.
PidFile=/run/slurmdbd.pid
# SlurmUser
# The name of the user that the slurmdbd daemon executes as. This user must
# exist on the machine executing the Slurm Database Daemon and have the same
# user ID as on the hosts on which slurmctld executes. For security purposes, a
# user other than "root" is recommended. The default value is "root".
SlurmUser=slurm

#!/bin/bash
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=test_%j.out
#SBATCH --error=test_%j.err
date -Is && hostname
echo "Working..."
sleep 30
echo "Done"