@ctokheim
Last active July 6, 2024 18:08
A cheat sheet for the SLURM job scheduler

SLURM

Slurm is a job scheduler for computer clusters. This document is based on this tutorial. A useful guide showing the relationships between SGE and SLURM is available here.

Gathering information

Two commands may be useful:

  • sinfo provides information about the resources of the cluster.
  • squeue shows the jobs to which the cluster's resources are currently allocated.

sinfo will show "partitions", which are sets of compute nodes grouped logically. To show all nodes separately:

$ sinfo -N

To further show the available CPUs and memory, add the -l flag:

$ sinfo -N -l

squeue shows the currently running and pending jobs. A job's state is typically running ("R") or pending ("PD"). squeue also displays each job's ID. A specific user's jobs can be shown with the --user flag.

$ squeue --user=myusername

Jobs can be canceled by specifying the job ID with scancel <Job ID>. scancel also has a --user flag so all jobs by a user can be canceled.
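
For example (the job ID 1234 below is hypothetical; use an ID reported by squeue):

```shell
# Cancel a single job by its ID
$ scancel 1234

# Cancel all jobs belonging to a user
$ scancel --user=myusername
```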

The scontrol command gives more detailed information about the current load on individual nodes:

$ scontrol show nodes

Creating a job

Job submissions distinguish between resource requests and job steps. Resource requests describe what the job needs (number of tasks, memory, run time, etc.). Job steps are the commands that are actually executed.

Job resources can be specified in the header of a submission script. An example script (submit.sh) is below:

#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

srun hostname
srun sleep 60

The job can then be submitted through sbatch:

$ sbatch submit.sh

When the job runs, its working directory is the directory from which it was submitted, and it inherits the submission environment, including PATH.
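
One way to see this, assuming access to a cluster: submit a one-line job with sbatch's --wrap option and inspect the output file (by default slurm-<jobid>.out in the submission directory):

```shell
# Submit a throwaway job that prints its working directory and PATH
$ sbatch --wrap="pwd; echo \$PATH"

# After it finishes, read the output file
$ cat slurm-*.out
```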

The sstat command can be used to track the resource usage of your running jobs.

$ sstat -j <Job ID>

Parallelization

In Slurm, a "task" is a process: a multi-process program is composed of multiple tasks, whereas a multithreaded program is a single task that uses several CPUs.

Tasks are requested with the --ntasks option, while CPUs for multithreaded programs are requested with the --cpus-per-task flag.
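
A hypothetical header for a multithreaded (e.g. OpenMP) job might therefore request one task with several CPUs, and tie the thread count to the allocation via the SLURM_CPUS_PER_TASK variable that Slurm sets:

```shell
#!/bin/bash
#
#SBATCH --job-name=threaded_test
#
# One task (one process) using 8 CPUs
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Many threaded programs read OMP_NUM_THREADS; match it to the allocation
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_threaded_program
```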

Array jobs

An example array job is below:

#!/bin/bash
#
#SBATCH --job-name=test_emb_arr
#SBATCH --output=res_emb_arr.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
#
# Bash arrays are zero-indexed, so 0-6 matches the 7 values in ARGS below
#SBATCH --array=0-6

ARGS=(0.05 0.25 0.5 1 2 5 100)

srun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]}
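
The indexing can be checked locally in plain bash, since SLURM_ARRAY_TASK_ID is just an environment variable that Slurm sets for each array element (the value assigned below is for illustration only):

```shell
# Simulate one array element locally; on the cluster, Slurm sets this variable
SLURM_ARRAY_TASK_ID=3

# Bash arrays are zero-indexed: valid indices here are 0 through 6
ARGS=(0.05 0.25 0.5 1 2 5 100)

echo "${ARGS[$SLURM_ARRAY_TASK_ID]}"   # → 1
```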

Interactive jobs

The srun command can be used to obtain an interactive shell on a compute node.

$ srun --pty bash