@georgegu1997
Last active February 2, 2024 01:17
Useful slurm commands and tricks

Cancel all my jobs

scancel -u <USERNAME>
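Cancelling everything is sometimes more than intended. As a minimal sketch, the hypothetical wrapper below (the name `cancel_jobs` and the `DRY_RUN` variable are assumptions, not part of Slurm) builds the `scancel` command with an optional `--state` filter and, by default, only prints it so it can be checked before anything is cancelled.

```shell
#!/bin/bash
# Hypothetical helper: build (and optionally run) a scancel command for one user.
# Usage: cancel_jobs <username> [state]    e.g. cancel_jobs alice PENDING
# With DRY_RUN=1 (the default in this sketch) the command is only printed.
cancel_jobs() {
    local user="$1" state="$2"
    if [ -n "$state" ]; then
        # --state restricts cancellation to jobs in that state (e.g. PENDING, RUNNING).
        set -- scancel -u "$user" --state="$state"
    else
        set -- scancel -u "$user"
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"    # show what would be run
    else
        "$@"         # actually run scancel
    fi
}

cancel_jobs alice            # prints: scancel -u alice
cancel_jobs alice PENDING    # prints: scancel -u alice --state=PENDING
```

Setting `DRY_RUN=0` before the call runs the command for real, for example `DRY_RUN=0 cancel_jobs "$USER" PENDING` to drop only the jobs still waiting in the queue.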

Set time limit for one job through env variables

Precedence: command line options > environment variables > options set in a batch script

Reference

export SBATCH_TIMELIMIT=05:00:00
sbatch <bash filename>
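The precedence rule above can be illustrated with a toy model. This is only a sketch of the rule as stated, not how Slurm resolves options internally, and the function name `effective_timelimit` is hypothetical.

```shell
#!/bin/bash
# Toy model of the precedence rule:
#   command line > environment variable > #SBATCH directive in the script.
# Usage: effective_timelimit <cli_value> <env_value> <script_value>
# Pass an empty string for any level that is not set.
effective_timelimit() {
    local cli_val="$1" env_val="$2" script_val="$3"
    if [ -n "$cli_val" ]; then
        echo "$cli_val"
    elif [ -n "$env_val" ]; then
        echo "$env_val"
    else
        echo "$script_val"
    fi
}

# Script asks for 10 hours, SBATCH_TIMELIMIT asks for 5: the environment wins.
effective_timelimit "" "05:00:00" "10:00:00"          # prints 05:00:00
# A --time value on the command line overrides both.
effective_timelimit "01:00:00" "05:00:00" "10:00:00"  # prints 01:00:00
```

In practice this means that `sbatch --time=01:00:00 <bash filename>` should still get a one-hour limit even while `SBATCH_TIMELIMIT` is exported in the shell.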

Attach to the compute node of one running job

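Useful for inspecting the machine (CPU, memory, GPU usage) while the job is running. The --overlap flag lets the new shell share the resources already allocated to the job.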

srun --pty --overlap --jobid $JOBID /bin/bash
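If the job ID is not at hand, it can be looked up with `squeue` first. The sketch below is one possible way to do that; the function names `first_job_id` and `attach_first_running` are hypothetical, and it assumes plain (non-array) job IDs and that `$USER` is set.

```shell
#!/bin/bash
# Hypothetical helpers; assumes plain (non-array) job IDs.

# Read squeue-style output (one job ID per line) on stdin and print the first ID.
first_job_id() {
    head -n 1 | tr -d '[:space:]'
}

# Open a shell on the node of the current user's first running job.
attach_first_running() {
    local jobid
    # -h: no header, -t RUNNING: only running jobs, -o %i: print just the job ID
    jobid="$(squeue -u "$USER" -h -t RUNNING -o %i | first_job_id)"
    if [ -z "$jobid" ]; then
        echo "no running jobs found" >&2
        return 1
    fi
    srun --pty --overlap --jobid "$jobid" /bin/bash
}
```

Calling `attach_first_running` on a login node then drops into a shell on the compute node, or prints a message if nothing is running.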

Pass all command-line arguments through to train.py

In run.bash, write

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.out
#SBATCH --mem=32G
#SBATCH --cpus-per-task=8

# Run the command given after the script name; "$@" keeps each argument intact.
"$@"

Then we can run Python with any arguments we want. For example:

sbatch ./run.bash python train.py --dataset cifar10 --model resnet
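When a wrapper script forwards its arguments, unquoted `$*` and quoted `"$@"` behave differently as soon as one argument contains a space. The sketch below shows the difference without needing Slurm; the function names are hypothetical and the default `IFS` is assumed.

```shell
#!/bin/bash
# Hypothetical helpers, for illustration only.

# Print how many arguments were received.
count_args() { echo "$#"; }

# Forward arguments with an unquoted $*: words are re-split on whitespace.
forward_star() { count_args $*; }

# Forward arguments with "$@": each argument is passed on unchanged.
forward_at() { count_args "$@"; }

forward_star --name "my run" --lr 0.1   # prints 5 ("my run" was split in two)
forward_at   --name "my run" --lr 0.1   # prints 4
```

For a training script this matters whenever an argument such as an experiment name or a path contains spaces.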