Tips to compile, install and run jobs using SLURM
1- The goal is to make SLURM (https://slurm.schedmd.com/) work properly
2- SLURM is very intricate and difficult to set up (these notes use SLURM 20.02)
3- It may be important to have the NVIDIA library path added, e.g.:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-410
4- SLURM depends on MUNGE that can be installed using apt as:
sudo apt-get update
sudo apt-get install libmunge-dev libmunge2 munge
sudo apt-get clean
5- The same is not true for SLURM itself, as its apt package is old
6- So it seems important to compile it from source code, fetched with wget as:
wget https://download.schedmd.com/slurm/slurm-20.02.3.tar.bz2
tar jxvf slurm-20.02.3.tar.bz2
cd slurm-20.02.3
./configure --help &> ../config.help
sudo -E ./configure --with-hdf5=no --with-munge=/usr/lib/libmunge.so &> ../config.output
7- HDF5 may not be readily available for SLURM to compile against
8- Configure defaults to installing binaries in /usr/local/bin and configuration files in /usr/local/etc
sudo -E make -j 39
sudo -E make install
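Depending on the distribution, the freshly installed SLURM libraries in /usr/local/lib may not be picked up right away; refreshing the dynamic linker cache usually suffices (an extra step, not strictly required on every system):
sudo ldconfig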
9- After compiling and installing, create the proper folders and copy the .conf files as below
10- The SLURM_CONF environment variable and the -f argument can set where slurm.conf is,
11- but they do not work properly as they are not shared among all SLURM binaries:
sudo mkdir /var/spool/slurm /var/spool/slurm/d /var/spool/slurm/ctld
sudo mkdir /var/run/slurm /var/log/slurm
sudo cp slurm.conf /usr/local/etc/
sudo cp gres.conf /usr/local/etc/
12- Before starting SLURM itself, it is important to enable and start MUNGE:
sudo systemctl daemon-reload
sudo systemctl enable munge
sudo systemctl start munge
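A quick sanity check that MUNGE is working is to encode and decode a credential, locally and (for a multi-node setup) against another node:
munge -n | unmunge
munge -n | ssh <other_node> unmunge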
13- MUNGE will be used in slurm.conf as:
AuthType=auth/munge
CryptoType=crypto/munge
14- All console output of the SLURM daemons can be redirected to /dev/null (see step 40),
15- since SLURM logs will be written under /var/log as defined in slurm.conf:
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdSpoolDir=/var/spool/slurm/d
StateSaveLocation=/var/spool/slurm/ctld
16- The SLURM DB daemon (slurmdbd) can be disregarded (MySQL can also be tricky to set up)
17- Without the SLURM DB (MySQL) it is not possible to run sreport,
18- which may be important for listing available GPUs and their usage via:
sreport -tminper cluster utilization --tres="gres/gpu"
19- SLURM accounting storage and job completion records can be written to text files,
20- also under /var/log, as defined in slurm.conf:
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/var/log/slurm/accounting.txt
AccountingStoreJobComment=YES
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/job_completion.txt
21- A dedicated SLURM user is tricky to use and configure, so it is easier to
22- use root instead, defined in slurm.conf as:
SlurmUser=root
SlurmdUser=root
23- SLURM runtime PID files and ports are defined in slurm.conf as:
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
ProctrackType=proctrack/linuxproc
24- SLURM can track GPUs as resources by defining in slurm.conf as:
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory
GresTypes=gpu,mps,gpu_mem
25- The end of the SLURM configuration is a list of nodes and partitions
26- A partition is simply a set of nodes
27- SLURM conflates the concepts of threads and processes, and also of cores and CPUs
28- So when setting ThreadsPerCore, CoresPerSocket, Sockets, etc. via lscpu as:
cat nodes.txt | while read node; do echo -e "NodeName=$node RealMemory=$(expr $(grep MemTotal /proc/meminfo | awk '{print $2}') / 1024) Sockets=$(lscpu | grep Socket\(s\) | awk '{print $2}') CoresPerSocket=$(lscpu | grep Core\(s\) | awk '{print $4}') ThreadsPerCore=$(lscpu | grep Thread\(s\) | awk '{print $4}') Gres=gpu:tesla_k80:no_consume:1,gpu_mem:11441 State=UNKNOWN\n"; done >> slurm.conf
29- This may end up with the right number of CPUs, but allocation will be messed up
30- One job consuming 1 CPU or Core will end up taking N CPUs, where N=ThreadsPerCore
31- To avoid these SLURM confusions, it may be better to just set Procs and avoid lscpu as:
cat nodes.txt | while read node; do echo -e "NodeName=$node RealMemory=$(expr $(grep MemTotal /proc/meminfo | awk '{print $2}') / 1024) Procs=$(nproc) Gres=gpu:tesla_k80:no_consume:1,gpu_mem:11441 State=UNKNOWN\n"; done >> slurm.conf
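As a sanity check for the generated NodeName lines, slurmd itself can print the hardware it detects on the local node in slurm.conf format:
slurmd -C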
32- Managing GPU resources in SLURM is also a bit tricky:
33- multiple jobs can use the same GPU by using the "no_consume" option in Gres within NodeName,
34- together with gpu_mem set to the total GPU memory in MB in the same option
35- GresTypes (above) must also specify mps (for multiple processes) and gpu_mem to control GPU memory
36- The SLURM gres.conf also needs to include gpu_mem with its count set to the total GPU memory (below)
37- After defining each node in the NodeName lines as above,
38- simply define one partition with all nodes via:
echo "PartitionName=all Nodes=$(cat nodes.txt | tr '\n' ',' | sed s/.$// -) Default=YES MaxTime=INFINITE State=UP" >> slurm.conf
39- SLURM gres.conf may be as simple as:
AutoDetect=nvml
Name=gpu_mem Count=11441
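Note that AutoDetect=nvml only takes effect if SLURM was built against the NVML library; if it was not, the GPU can be listed explicitly instead, e.g. for a single Tesla K80 at /dev/nvidia0:
Name=gpu Type=tesla_k80 File=/dev/nvidia0
Name=gpu_mem Count=11441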
40- After the above setup, the two SLURM daemons can be started:
nohup sudo slurmctld -D -vvvvvv &> /dev/null &
nohup sudo slurmd -D -vvvvvv &> /dev/null &
41- Check the log files under /var/log/slurm to make sure SLURM is working
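For example, the daemon log files defined above can be followed with:
sudo tail -f /var/log/slurm/slurmctld.log /var/log/slurm/slurmd.log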
42- Then to run SLURM jobs do:
srun --gres=<gpu_to_use> --mem-per-gpu=<gpu_mem> --output=<output_file> <exec_and_params>
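A concrete example, assuming the tesla_k80 and gpu_mem Gres names defined above and a hypothetical ./my_app binary:
srun --gres=gpu:tesla_k80:1,gpu_mem:2000 --mem-per-gpu=2000 --output=my_app.out ./my_app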
43- After running a SLURM job it can be checked using its job id via:
scontrol show job <job_id>
44- Stopping the SLURM daemons is not enough to finish all the jobs they have started,
45- so a job can be cancelled while SLURM is running via:
scancel <job_id>
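scancel also accepts filters, e.g. to cancel all jobs of the current user at once:
scancel -u $USER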
46- To see the queue of jobs do:
squeue
squeue -o %b
squeue -h -t R -O Gres
squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %23R %8C %23b"
47- To check information on all nodes do:
sinfo -o "%23N %10c %10m %20C %23G %10A"
scontrol -o show nodes
48- To check information on all jobs, sudo must be used because of /var/log permissions:
sudo sacct -X
sudo sacct -a -X --format=JobID,AllocCPUS,Reqgres
49- The following links may be useful in addition to the SLURM manual:
https://ulhpc-tutorials.readthedocs.io/en/latest/scheduling/advanced/
https://slurm.schedmd.com/SLUG19/GPU_Scheduling_and_Cons_Tres.pdf
http://www.hpcadvisorycouncil.com/events/2014/swiss-workshop/presos/Day_1/3_GPUs_SLURM.pdf