Use apt to install the necessary packages:
sudo apt install -y slurm-wlm slurm-wlm-doc
Load file:///usr/share/doc/slurm-wlm/html/configurator.html in a browser (or file://wsl%24/Ubuntu/usr/share/doc/slurm-wlm/html/configurator.html on WSL2), and:
- Set your machine's hostname in
SlurmctldHost
andNodeName
. - Set
CPUs
as appropriate, and optionallySockets
,CoresPerSocket
, andThreadsPerCore
. Use commandlscpu
to find what you have. - Set
RealMemory
to the number of megabytes you want to allocate to Slurm jobs, - Set
StateSaveLocation
to/var/spool/slurm-llnl
. - Set
ProctrackType
tolinuxproc
because processes are less likely to escape Slurm control on a single machine config. - Make sure
SelectType
is set toCons_res
, and setSelectTypeParameters
toCR_Core_Memory
. - Set
JobAcctGatherType
toLinux
to gather resource use per job, and setAccountingStorageType
toFileTxt
.
Hit Submit
, and save the resulting text into /etc/slurm-llnl/slurm.conf
i.e. the configuration file referred to in /lib/systemd/system/slurmctld.service
and /lib/systemd/system/slurmd.service
.
Load /etc/slurm-llnl/slurm.conf
in a text editor, uncomment DefMemPerCPU
, and set it to 8192
or whatever number of megabytes you want each job to request if not explicitly requested using --mem
during job submission. Read the docs and edit other defaults as you see fit.
Create /var/spool/slurm-llnl
and /var/log/slurm_jobacct.log
, then set ownership appropriately:
sudo mkdir -p /var/spool/slurm-llnl
sudo touch /var/log/slurm_jobacct.log
sudo chown slurm:slurm /var/spool/slurm-llnl /var/log/slurm_jobacct.log
Install mailutils
so that Slurm won't complain about /bin/mail
missing:
sudo apt install -y mailutils
Make sure munge is installed and running, and a munge.key
was created with user-only read-only permissions, owned by munge:munge
:
sudo service munge start
sudo ls -l /etc/munge/munge.key
Start services slurmctld
and slurmd
:
sudo service slurmd start
sudo service slurmctld start
Hi,
Thanks this has worked successfully but when i check status like,
sandeep@sandeep-VirtualBox:~$ systemctl status slurmd.service
● slurmd.service - Slurm node daemon
Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2022-04-30 22:50:47 IST; 9min ago
Docs: man:slurmd(8)
Process: 12931 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 12933 (slurmd)
Tasks: 1
Memory: 4.1M
CGroup: /system.slice/slurmd.service
└─12933 /usr/sbin/slurmd
Apr 30 22:50:47 sandeep-VirtualBox systemd[1]: Starting Slurm node daemon...
Apr 30 22:50:47 sandeep-VirtualBox slurmd-sandeep-VirtualBox[12931]: Node reconfigured socket/core boundaries SocketsPerBoard=4:1(hw) CoresPe>
Apr 30 22:50:47 sandeep-VirtualBox slurmd-sandeep-VirtualBox[12931]: Message aggregation disabled
Apr 30 22:50:47 sandeep-VirtualBox slurmd-sandeep-VirtualBox[12931]: CPU frequency setting not configured for this node
Apr 30 22:50:47 sandeep-VirtualBox systemd[1]: slurmd.service: Can't open PID file /run/slurmd.pid (yet?) after start: Operation not permitted
Apr 30 22:50:47 sandeep-VirtualBox slurmd-sandeep-VirtualBox[12933]: slurmd version 19.05.5 started
Apr 30 22:50:47 sandeep-VirtualBox slurmd-sandeep-VirtualBox[12933]: slurmd started on Sat, 30 Apr 2022 22:50:47 +0530
Apr 30 22:50:47 sandeep-VirtualBox systemd[1]: Started Slurm node daemon.
Apr 30 22:50:47 sandeep-VirtualBox slurmd-sandeep-VirtualBox[12933]: CPUs=4 Boards=1 Sockets=1 Cores=4 Threads=1 Memory=7951 TmpDisk=82909 Up>
Apr 30 22:50:56 sandeep-VirtualBox slurmd-sandeep-VirtualBox[12933]: error: Unable to register: Unable to contact slurm controller (connect f>
lines 1-21/21 (END)
last i got cant open pid file error and unable to contact slurm controller. Can you help what to do for this?