Slurm is a job scheduler for computer clusters. This document is based on this tutorial. A useful guide showing the relationships between SGE and SLURM is available here.
Two commands may be useful: sinfo provides information about the resources of the cluster, and squeue shows which jobs the cluster's resources are allocated to.
sinfo shows "partitions", which are sets of compute nodes grouped logically. To show all nodes separately:
$ sinfo -N
To further show the available CPUs and memory, the -l flag can be added:
$ sinfo -N -l
squeue shows the currently running jobs. The state of a job can be running ("R") or pending ("PD"). The squeue output also shows each job's ID.
User-specific jobs can be displayed with the --user flag:
$ squeue --user=myusername
Jobs can be canceled by specifying the job ID with scancel <Job ID>. scancel also has a --user flag, so all of a user's jobs can be canceled at once.
The scontrol command gives more detailed information about the current load on each node:
$ scontrol show nodes
Job submissions distinguish between resource requests and job steps. Resource requests are the meta-information about the job (memory, run time, number of tasks, etc.); job steps are what is actually executed.
Job resources can be specified in the header of a submission script. An example script (submit.sh) is below:
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
srun hostname
srun sleep 60
The job can then be submitted through sbatch:
$ sbatch submit.sh
When the job runs, the commands execute from the directory where the job was submitted and inherit the submission environment, including PATH.
The sstat command can be used to track the resource usage of running jobs:
$ sstat -j <Job ID>
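Specific fields can be selected with the --format option; for example (with <Job ID> left as a placeholder):

```shell
$ sstat --format=JobID,AveCPU,MaxRSS -j <Job ID>
```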
In Slurm, a "task" is a process: a multi-process program is composed of multiple tasks, while a multithreaded program is a single task that uses several CPUs. Tasks are requested with the --ntasks option; CPUs for a multithreaded program are requested with the --cpus-per-task flag.
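As a sketch (the job name, program name, and thread count are made up), a single-task job whose program runs four threads might request:

```shell
#!/bin/bash
#
#SBATCH --job-name=threaded_test
#SBATCH --ntasks=1           # one task (one process)
#SBATCH --cpus-per-task=4    # four CPUs for that task's threads
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

# Slurm sets SLURM_CPUS_PER_TASK to match the request; many
# threaded programs read OMP_NUM_THREADS to size their thread pool.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_threaded_program.exe
```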
An example array job is below:
#!/bin/bash
#
#SBATCH --job-name=test_emb_arr
#SBATCH --output=res_emb_arr.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
#
#SBATCH --array=0-6
ARGS=(0.05 0.25 0.5 1 2 5 100)
srun ./my_program.exe ${ARGS[$SLURM_ARRAY_TASK_ID]}
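Bash arrays are zero-indexed, so with the seven values above the valid task IDs are 0 through 6. The indexing can be checked locally by setting SLURM_ARRAY_TASK_ID by hand (the value here is illustrative; Slurm sets it for each array element):

```shell
#!/bin/bash
# Same argument list as the submission script above.
ARGS=(0.05 0.25 0.5 1 2 5 100)
# Pretend we are array element 3; Slurm would set this automatically.
SLURM_ARRAY_TASK_ID=3
echo "${ARGS[$SLURM_ARRAY_TASK_ID]}"   # prints 1
```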
The srun command can be used to obtain an interactive shell on a compute node:
$ srun --pty bash
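The same resource flags used in submission scripts also apply to interactive jobs; for example (the values are illustrative), a one-task shell with a 10-minute limit might be requested as:

```shell
$ srun --ntasks=1 --time=10:00 --mem-per-cpu=100 --pty bash
```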