Use apt to install the necessary packages:
sudo apt install -y slurm-wlm slurm-wlm-doc
Load file:///usr/share/doc/slurm-wlm/html/configurator.html in a browser (or file://wsl%24/Ubuntu/usr/share/doc/slurm-wlm/html/configurator.html on WSL2), and:
- Set your machine's hostname in SlurmctldHost and NodeName.
- Set CPUs as appropriate, and optionally Sockets, CoresPerSocket, and ThreadsPerCore. Use the command lscpu to find what you have.
- Set RealMemory to the number of megabytes you want to allocate to Slurm jobs.
- Set StateSaveLocation to /var/spool/slurm-llnl.
- Set ProctrackType to linuxproc, because processes are less likely to escape Slurm control in a single-machine configuration.
- Make sure SelectType is set to Cons_res, and set SelectTypeParameters to CR_Core_Memory.
- Set JobAcctGatherType to Linux to gather resource use per job, and set AccountingStorageType to FileTxt.
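The configurator choices above correspond roughly to the slurm.conf lines printed by the sketch below, assuming a single machine. The helper name print_single_node_conf, the partition name debug, the 8000 MB example value, and the AccountingStorageLoc path are illustrative assumptions, not verbatim configurator output. Running slurmd -C prints the NodeName line Slurm detects for the current machine, which is another way to get the CPU and memory values besides lscpu.

```shell
# Sketch: print slurm.conf lines matching the configurator choices above.
# Arguments: short hostname, CPU count, RealMemory in MB.
# Sockets/CoresPerSocket/ThreadsPerCore are optional and omitted here.
print_single_node_conf() {
  local host="$1" cpus="$2" mem_mb="$3"
  cat <<EOF
SlurmctldHost=${host}
StateSaveLocation=/var/spool/slurm-llnl
ProctrackType=proctrack/linuxproc
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
JobAcctGatherType=jobacct_gather/linux
AccountingStorageType=accounting_storage/filetxt
# Assumption: store accounting records in the log file created later on
AccountingStorageLoc=/var/log/slurm_jobacct.log
NodeName=${host} CPUs=${cpus} RealMemory=${mem_mb} State=UNKNOWN
# Hypothetical partition name; pick your own in the configurator
PartitionName=debug Nodes=${host} Default=YES MaxTime=INFINITE State=UP
EOF
}

# Example: this machine's short hostname and CPU count, 8000 MB for jobs.
print_single_node_conf "$(hostname -s)" "$(nproc)" 8000
```

Compare the printed lines with what the configurator produced rather than pasting them blindly; the configurator output contains many more settings.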
Hit Submit, and save the resulting text into /etc/slurm-llnl/slurm.conf, i.e. the configuration file referred to in /lib/systemd/system/slurmctld.service and /lib/systemd/system/slurmd.service.
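Because both daemons read this file, it can help to confirm that SlurmctldHost and NodeName really match the machine's short hostname. If SlurmctldHost names a host that cannot be resolved, slurmd logs errors such as "Unable to resolve ...: Host name lookup failure". The check below is a sketch; check_conf_hostname is a hypothetical helper that compares only the first value after "=" and does not expand node ranges such as node[1-4].

```shell
# Sketch: check that SlurmctldHost and NodeName in a slurm.conf both match
# a given short hostname. Only the first SlurmctldHost/NodeName line is read,
# an optional "(address)" after SlurmctldHost is ignored, and node ranges
# such as node[1-4] are not expanded (simplifying assumptions).
check_conf_hostname() {
  local conf="$1" host="$2" ctl node
  ctl="$(sed -n 's/^SlurmctldHost=\([^[:space:](]*\).*/\1/p' "$conf" | head -n 1)"
  node="$(sed -n 's/^NodeName=\([^[:space:]]*\).*/\1/p' "$conf" | head -n 1)"
  if [ "$ctl" = "$host" ] && [ "$node" = "$host" ]; then
    echo "ok"
  else
    echo "mismatch: SlurmctldHost=${ctl:-unset} NodeName=${node:-unset} hostname=${host}"
    return 1
  fi
}

# Run against the installed config if it exists.
if [ -r /etc/slurm-llnl/slurm.conf ]; then
  check_conf_hostname /etc/slurm-llnl/slurm.conf "$(hostname -s)"
fi
```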
Load /etc/slurm-llnl/slurm.conf in a text editor, uncomment DefMemPerCPU, and set it to 8192, or to whatever number of megabytes per allocated CPU you want a job to get when it does not request memory explicitly (with --mem or --mem-per-cpu) at submission time. Read the docs and edit other defaults as you see fit.
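DefMemPerCPU is counted per allocated CPU, so with DefMemPerCPU=8192 a job allocated 4 CPUs that does not request memory would get 4 × 8192 = 32768 MB. If you prefer a non-interactive edit, here is a minimal sketch, assuming GNU sed; set_def_mem_per_cpu is a hypothetical helper that rewrites a commented or uncommented DefMemPerCPU line and appends one if none exists.

```shell
# Sketch: set DefMemPerCPU (MB per allocated CPU) in a slurm.conf file.
# An existing line, commented out or not, is rewritten in place;
# if there is none, a new line is appended. Assumes GNU sed (-i).
set_def_mem_per_cpu() {
  local conf="$1" mb="$2"
  if grep -Eq '^[#[:space:]]*DefMemPerCPU=' "$conf"; then
    sed -i -E "s/^[#[:space:]]*DefMemPerCPU=.*/DefMemPerCPU=${mb}/" "$conf"
  else
    printf 'DefMemPerCPU=%s\n' "$mb" >> "$conf"
  fi
}

# Usage, from a root shell:
#   set_def_mem_per_cpu /etc/slurm-llnl/slurm.conf 8192
```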
Create /var/spool/slurm-llnl and /var/log/slurm_jobacct.log, then set ownership appropriately:
sudo mkdir -p /var/spool/slurm-llnl
sudo touch /var/log/slurm_jobacct.log
sudo chown slurm:slurm /var/spool/slurm-llnl /var/log/slurm_jobacct.log
Install mailutils so that Slurm won't complain about /bin/mail missing:
sudo apt install -y mailutils
Make sure munge is installed and running, and that a munge.key was created with user-only read-only permissions, owned by munge:munge:
sudo service munge start
sudo ls -l /etc/munge/munge.key
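To check the key's mode and owner without reading the ls output by eye, the sketch below compares stat output against the expected values. check_key_perms is a hypothetical helper, and stat -c assumes GNU coreutils. The ls command above uses sudo because /etc/munge is usually not readable by other users, so run this from a root shell as well. The pipeline munge -n | unmunge is a quick end-to-end test that munge can encode and decode a credential.

```shell
# Sketch: verify that a key file has mode 400 (owner read-only) and the
# expected owner:group. Uses GNU stat's -c format option.
check_key_perms() {
  local key="$1" want="$2" actual
  actual="$(stat -c '%a %U:%G' "$key")" || return 1
  if [ "$actual" = "400 ${want}" ]; then
    echo "ok"
  else
    echo "unexpected: ${actual} (wanted 400 ${want})"
    return 1
  fi
}

# Usage, from a root shell:
#   check_key_perms /etc/munge/munge.key munge:munge
#   munge -n | unmunge    # end-to-end check that munge works
```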
Start the slurmctld and slurmd services:
sudo service slurmd start
sudo service slurmctld start
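Once both services are up, sinfo should eventually list the node as idle. The sketch below polls for that state; wait_for_idle is a hypothetical helper, the default of 30 one-second attempts is an arbitrary choice, and %T is sinfo's extended node-state format field. A short test job such as srun hostname then confirms that jobs actually run.

```shell
# Sketch: poll sinfo until the given node reports the "idle" state.
# Arguments: node name, and optionally the number of attempts (default 30),
# one second apart. Prints "idle" and returns 0 on success, else returns 1.
wait_for_idle() {
  local node="$1" tries="${2:-30}" i=0 state=""
  while [ "$i" -lt "$tries" ]; do
    # -h: no header, -n: restrict to this node, -o '%T': extended state name
    state="$(sinfo -h -n "$node" -o '%T' 2>/dev/null | head -n 1)"
    if [ "$state" = "idle" ]; then
      echo "idle"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "node ${node} is not idle (last state: ${state:-none})"
  return 1
}

# Usage: wait_for_idle "$(hostname -s)" && srun hostname
```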
I get this error when running "apt install munge":
"Errors were encountered while processing:
postfix
E: Sub-process /usr/bin/dpkg returned an error code (1)"
and slurmd.service reports errors when I check its status:
root@:/etc/slurm-llnl# sudo apt install munge
Reading package lists... Done
Building dependency tree
Reading state information... Done
munge is already the newest version (0.5.13-2build1).
munge set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 52 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] Y
Setting up postfix (3.4.13-0ubuntu1.2) ...
Postfix (main.cf) configuration was not changed. If you need to make changes,
edit /etc/postfix/main.cf (and others) as needed. To view Postfix
configuration values, see postconf(1).
After modifying main.cf, be sure to run 'systemctl reload postfix'.
Running newaliases
newaliases: warning: valid_hostname: misplaced hyphen: gpunode1-wlp0s20f3.--
newaliases: fatal: file /etc/postfix/main.cf: parameter myhostname: bad parameter value: gpunode1-wlp0s20f3.--
dpkg: error processing package postfix (--configure):
installed postfix package post-installation script subprocess returned error exit status 75
Processing triggers for libc-bin (2.31-0ubuntu9.2) ...
Errors were encountered while processing:
postfix
E: Sub-process /usr/bin/dpkg returned an error code (1)
root@gpunode1:/etc/slurm-llnl# systemctl status slurmd.service
● slurmd.service - Slurm node daemon
Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-10-01 11:12:34 IST; 3min 49s ago
Docs: man:slurmd(8)
Main PID: 26727 (slurmd)
Tasks: 2
Memory: 2.9M
CGroup: /system.slice/slurmd.service
└─26727 /usr/sbin/slurmd
Oct 01 11:16:09 gpunode1 slurmd-gpunode1[26727]: error: Unable to register: Resource temporarily unavailable
Oct 01 11:16:10 gpunode1 slurmd-gpunode1[26727]: error: Unable to resolve "linuxK": Host name lookup failure
Oct 01 11:16:10 gpunode1 slurmd-gpunode1[26727]: error: Unable to establish control machine address
Oct 01 11:16:10 gpunode1 slurmd-gpunode1[26727]: error: Unable to register: Resource temporarily unavailable
Oct 01 11:16:12 gpunode1 slurmd-gpunode1[26727]: error: Unable to resolve "linuxK": Host name lookup failure
Oct 01 11:16:12 gpunode1 slurmd-gpunode1[26727]: error: Unable to establish control machine address
Oct 01 11:16:12 gpunode1 slurmd-gpunode1[26727]: error: Unable to register: Resource temporarily unavailable
Oct 01 11:16:13 gpunode1 slurmd-gpunode1[26727]: error: Unable to resolve "linuxK": Host name lookup failure
Oct 01 11:16:13 gpunode1 slurmd-gpunode1[26727]: error: Unable to establish control machine address
Oct 01 11:16:13 gpunode1 slurmd-gpunode1[26727]: error: Unable to register: Resource temporarily unavailable
Any suggestions on how to resolve it?