Tips to compile, install and run jobs using SLURM
1- The problem is to make SLURM (https://slurm.schedmd.com/) work properly
2- SLURM is very intricate and difficult to set up (these notes use SLURM 20.02)
3- It may be important to have the NVIDIA library path added, e.g.:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-410
4- SLURM depends on MUNGE, which can be installed via apt as:
sudo apt-get update
sudo apt-get install libmunge-dev libmunge2 munge
sudo apt-get clean
5- The same is not true for SLURM itself, as its apt package is old
6- So it is better to compile it from source code, downloading it with wget as:
wget https://download.schedmd.com/slurm/slurm-20.02.3.tar.bz2
tar jxvf slurm-20.02.3.tar.bz2
cd slurm-20.02.3
./configure --help &> ../config.help
sudo -E ./configure --with-hdf5=no --with-munge=/usr/lib/libmunge.so &> ../config.output
7- HDF5 may not be readily available for SLURM to compile against
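If HDF5 job profiling support is wanted instead, installing the development package before running configure may be enough (assuming a Debian/Ubuntu system):
sudo apt-get install libhdf5-dev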
8- Configure defaults to installing binaries in /usr/local/bin and configuration files in /usr/local/etc
sudo -E make -j 39
sudo -E make install
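A quick sanity check after make install is to print the installed versions:
slurmctld -V
slurmd -V
srun --version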
9- After compiling and installing, create the proper folders and copy the .conf files as below
10- The SLURM_CONF environment variable or the -f argument can set where slurm.conf lives, but they
11- do not work reliably as they are not shared among all SLURM binaries:
sudo mkdir /var/spool/slurm /var/spool/slurm/d /var/spool/slurm/ctld
sudo mkdir /var/run/slurm /var/log/slurm
sudo cp slurm.conf /usr/local/etc/
sudo cp gres.conf /usr/local/etc/
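The slurm.conf (and gres.conf) should be the same on every node of the cluster; assuming root ssh access and the nodes.txt list used further below, one way to distribute them is:
cat nodes.txt | while read node; do scp /usr/local/etc/slurm.conf /usr/local/etc/gres.conf root@$node:/usr/local/etc/; done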
12- Before starting SLURM itself, it is important to enable and start MUNGE:
sudo systemctl daemon-reload
sudo systemctl enable munge
sudo systemctl start munge
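MUNGE needs the same key (/etc/munge/munge.key) on all nodes; a quick way to test that it is working locally is:
munge -n | unmunge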
13- MUNGE will be used in slurm.conf as:
AuthType=auth/munge
CryptoType=crypto/munge
14- All terminal output of the SLURM daemons can be redirected to /dev/null, as
15- the SLURM logs will be written under /var/log as defined in slurm.conf:
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmdSpoolDir=/var/spool/slurm/d
StateSaveLocation=/var/spool/slurm/ctld
16- The SLURM DB daemon can be disregarded (MySQL can also be tricky to set up)
17- Without the SLURM DB (MySQL) it is not possible to run sreport,
18- which may be important for listing available GPUs and their usage via:
sreport -tminper cluster utilization --tres="gres/gpu"
19- SLURM accounting storage and job completion records can be written to text files,
20- also in /var/log, defined in slurm.conf as:
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/var/log/slurm/accounting.txt
AccountingStoreJobComment=YES
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/job_completion.txt
21- A dedicated SLURM user is tricky to set up and configure, so it is better to
22- use root instead, defined in slurm.conf as:
SlurmUser=root
SlurmdUser=root
23- SLURM runtime PID files and ports are defined in slurm.conf as:
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm/slurmd.pid
SlurmdPort=6818
ProctrackType=proctrack/linuxproc
24- SLURM can track GPUs as resources by defining in slurm.conf:
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU_Memory
GresTypes=gpu,mps,gpu_mem
25- The end of the SLURM configuration is a list of nodes and partitions
26- A partition is simply a set of nodes
27- SLURM conflates the concepts of threads and processes, and of cores and CPUs
28- So when setting ThreadsPerCore, CoresPerSocket, Sockets, etc. via lscpu as:
cat nodes.txt | while read node; do echo -e "NodeName=$node RealMemory=$(expr $(grep MemTotal /proc/meminfo | awk '{print $2}') / 1024) Sockets=$(lscpu | grep Socket\(s\) | awk '{print $2}') CoresPerSocket=$(lscpu | grep Core\(s\) | awk '{print $4}') ThreadsPerCore=$(lscpu | grep Thread\(s\) | awk '{print $4}') Gres=gpu:tesla_k80:no_consume:1,gpu_mem:11441 State=UNKNOWN\n"; done >> slurm.conf
29- It may end up with the right number of CPUs, but allocation will be messed up
30- One job consuming 1 CPU or core will end up taking N CPUs, where N=ThreadsPerCore
31- To avoid these SLURM confusions, it may be better to just set Procs and avoid lscpu as:
cat nodes.txt | while read node; do echo -e "NodeName=$node RealMemory=$(expr $(grep MemTotal /proc/meminfo | awk '{print $2}') / 1024) Procs=$(nproc) Gres=gpu:tesla_k80:no_consume:1,gpu_mem:11441 State=UNKNOWN\n"; done >> slurm.conf
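As an illustration, the generated line looks like this (hostname, memory, and CPU count here are hypothetical and will differ per machine):
NodeName=node01 RealMemory=64215 Procs=40 Gres=gpu:tesla_k80:no_consume:1,gpu_mem:11441 State=UNKNOWN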
32- Managing GPU resources in SLURM is also a bit tricky:
33- multiple jobs can share the same GPU by using the "no_consume" option in Gres within NodeName,
34- together with gpu_mem set to the total GPU memory in MB in the same option
35- Also, GresTypes (above) must specify mps (for multiple processes) and gpu_mem to control GPU memory
36- SLURM's gres.conf also needs to include gpu_mem and its count as the total GPU memory (below)
37- After defining each node in the NodeName lines as above,
38- simply define one partition with all nodes via:
echo "PartitionName=all Nodes=$(cat nodes.txt | tr '\n' ',' | sed s/.$// -) Default=YES MaxTime=INFINITE State=UP" >> slurm.conf
39- SLURM gres.conf may be as simple as:
AutoDetect=nvml
Name=gpu_mem Count=11441
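AutoDetect=nvml only works if SLURM was built against the NVML library; if it was not, the GPU can be declared explicitly instead, for example (device path, type, and memory count are illustrative):
Name=gpu Type=tesla_k80 File=/dev/nvidia0
Name=gpu_mem Count=11441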
40- After the above setup, the two SLURM daemons can be started:
nohup sudo slurmctld -D -vvvvvv &> /dev/null &
nohup sudo slurmd -D -vvvvvv &> /dev/null &
41- Check the files in /var/log/slurm to make sure SLURM is working
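A quick check that the controller and the node daemon are up and responding:
scontrol ping
sinfo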
42- Then to run SLURM jobs do:
srun --gres=<gpu_to_use> --mem-per-gpu=<gpu_mem> --output=<output_file> <exec_and_params>
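For example, with the Gres names defined above (the executable and the memory value are just placeholders):
srun --gres=gpu:tesla_k80:1 --mem-per-gpu=2048 --output=job.out ./my_program --arg1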
43- After running a SLURM job, it can be checked using its job id via:
scontrol show job <job_id>
44- Stopping the SLURM daemons is not enough to finish all the jobs they have started,
45- so a job should be cancelled while SLURM is running via:
scancel <job_id>
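To cancel all jobs of a given user at once:
scancel -u <user>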
46- To see the queue of jobs do:
squeue
squeue -o %b
squeue -h -t R -O Gres
squeue -o "%.18i %.9P %.8j %.8u %.2t %.10M %.6D %23R %8C %23b"
47- To check information on all nodes do:
sinfo -o "%23N %10c %10m %20C %23G %10A"
scontrol -o show nodes
48- To check information on all jobs, sudo must be used because of /var/log permissions:
sudo sacct -X
sudo sacct -a -X --format=JobID,AllocCPUS,Reqgres
49- The following links may be useful in addition to the SLURM manual:
https://ulhpc-tutorials.readthedocs.io/en/latest/scheduling/advanced/
https://slurm.schedmd.com/SLUG19/GPU_Scheduling_and_Cons_Tres.pdf
http://www.hpcadvisorycouncil.com/events/2014/swiss-workshop/presos/Day_1/3_GPUs_SLURM.pdf