@juanmc2005
Last active April 26, 2022 13:06
Template scripts to run sbatch jobs
#!/bin/bash
#SBATCH --job-name=<some_name>
#SBATCH --account=<some_account> # optional, count job hours from this account
#SBATCH --ntasks=1 # number of tasks (a single process here)
#SBATCH --gres=gpu:1 # number of GPUs (a single GPU here)
#SBATCH --cpus-per-task=4 # number of cores, mostly for data loader workers
#SBATCH --hint=nomultithread # optional, restrict to physical cores and not logical ones
#SBATCH --time=20:00:00 # maximum execution time (HH:MM:SS)
#SBATCH --output=logs_%A_%a.out # output file, %A is the job ID of the array and %a is the index of the task within the array
#SBATCH --error=logs_%A_%a.out # error file (using the same file as the output is fine)
#SBATCH --array=0-99 # run as a job array with task indices 0 to 99 (both ends inclusive); should match the number of hyper-parameter combinations
# Read the hyper-parameter values, one per line, into the bash arrays lrs and bs
IFS=$'\n' read -d '' -r -a lrs < learning_rates.lst
IFS=$'\n' read -d '' -r -a bs < batch_sizes.lst
# Build one argument string per (learning rate, batch size) combination
args=()
for lr in "${lrs[@]}"
do
    for b in "${bs[@]}"
    do
        args+=("--lr ${lr} --batch-size ${b}")
    done
done
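# Load conda environment (the hook makes `conda activate` usable if the batch shell was not initialized for conda)
eval "$(conda shell.bash hook)"
conda activate myenv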
# Echo launched commands from here
set -x
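# Launch the task selected by the array index (left unquoted on purpose so the string splits into separate arguments)
python train.py ${args[${SLURM_ARRAY_TASK_ID}]}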
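
The array script above picks one (learning rate, batch size) pair per task by indexing a flat list built with nested loops. As a minimal sketch of that mapping, the snippet below computes the same pair arithmetically and prints the `--array` range that covers every combination. The values in `lrs` and `bs` are made-up stand-ins for the contents of `learning_rates.lst` and `batch_sizes.lst`, and `combo_for_index` is a hypothetical helper, not part of the original template.

```shell
#!/bin/bash
# Example values only; the real script reads these from the .lst files.
lrs=(0.1 0.01 0.001)
bs=(16 32)

# Hypothetical helper: map a flat task index to one argument string.
# Ordering matches the nested loops: learning rate outer, batch size inner.
combo_for_index() {
    local idx=$1
    local n_bs=${#bs[@]}
    local i=$(( idx / n_bs ))   # position in lrs
    local j=$(( idx % n_bs ))   # position in bs
    echo "--lr ${lrs[$i]} --batch-size ${bs[$j]}"
}

# Total number of combinations, and the matching inclusive array range
n_combos=$(( ${#lrs[@]} * ${#bs[@]} ))
echo "Use --array=0-$(( n_combos - 1 )) for ${n_combos} combinations"

# Show what the task with index 3 would receive
combo_for_index 3
```

With these example values there are 6 combinations, so the range would be `0-5`, and index 3 maps to `--lr 0.01 --batch-size 32`. If the range given to `--array` is larger than the number of combinations, the extra tasks get an empty argument string, so it may be worth checking the count before submitting.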

#!/bin/bash
#SBATCH --job-name=<some_name>
#SBATCH --account=<some_account> # optional, count job hours from this account
#SBATCH --ntasks=1 # number of tasks (a single process here)
#SBATCH --gres=gpu:1 # number of GPUs (a single GPU here)
#SBATCH --cpus-per-task=4 # number of cores, mostly for data loader workers
#SBATCH --hint=nomultithread # optional, restrict to physical cores and not logical ones
#SBATCH --time=20:00:00 # maximum execution time (HH:MM:SS)
#SBATCH --output=logs.out # output file
#SBATCH --error=logs.out # error file (same as output is fine)
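# Load conda environment (the hook makes `conda activate` usable if the batch shell was not initialized for conda)
eval "$(conda shell.bash hook)"
conda activate myenv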
# Echo launched commands from here
set -x
# Launch the job
python train.py --lr 0.01 --batch-size 32
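
The single-job script above hard-codes its hyper-parameters. One possible variant, sketched here under assumptions, takes them from positional arguments instead, since `sbatch` forwards any arguments placed after the script name. `build_train_cmd` is a hypothetical helper, and it only prints the command it would run so the sketch can be tried without a cluster.

```shell
#!/bin/bash
# Hypothetical helper: build the training command from positional arguments,
# falling back to the values hard-coded in the template (0.01 and 32).
build_train_cmd() {
    local lr=${1:-0.01}
    local batch_size=${2:-32}
    echo "python train.py --lr ${lr} --batch-size ${batch_size}"
}

# Dry run: print the command for the arguments this script received
build_train_cmd "$@"
```

Assuming the script were saved as `job.sh` (a made-up name), `sbatch job.sh 0.1 64` would then correspond to `python train.py --lr 0.1 --batch-size 64`, and `sbatch job.sh` alone would keep the template defaults.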