Main Reference: https://docs.nersc.gov/jobs/examples/#job-arrays
"Job arrays provide a mechanism for submitting and managing collections of similar jobs quickly and easily." You will create a single script (e.g., `job.sbatch`) that uses the variable SLURM_ARRAY_TASK_ID to point at the correct files:
```bash
#!/bin/bash
#SBATCH -q debug
#SBATCH -o job_array_test_%A_%a.out   # %A = array job ID, %a = array task ID
#SBATCH -n 1
#SBATCH --time 00:02:00
#SBATCH -C haswell
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=mr095415@ohio.edu

# run the array for array id = 1 to 100
# $> sbatch --array=1-100 job.sbatch

source /global/common/software/m3035/conda-activate.sh 3.7  # load packages

# zero-pad 'SLURM_ARRAY_TASK_ID' into 'mockid', e.g., mockid=0001 or 0100
printf -v mockid "%04d" $SLURM_ARRAY_TASK_ID

export pks=/global/project/projectdirs/eboss/czhao/EZmock/QSO_v5/clustering/PK/
export input=${pks}PK_EZmock_eBOSS_QSO_NGC_v5_z0.8z2.2_${mockid}.dat
export output=${SCRATCH}/baofits/ezmocks/baofit_${mockid}.dat

srun -n 1 python jobarray_test.py --input $input --output $output
```
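The `printf -v` line zero-pads the task ID so it matches the four-digit suffix of the mock file names. Here is a quick local sketch of that padding, runnable without Slurm (the task IDs below are just examples):

```shell
# emulate the zero-padding of the batch script for a few example task IDs
for SLURM_ARRAY_TASK_ID in 1 42 100; do
    printf -v mockid "%04d" "$SLURM_ARRAY_TASK_ID"
    echo "mockid=$mockid"
done
# prints mockid=0001, mockid=0042, mockid=0100
```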
Our example Python script (`jobarray_test.py`) takes an input file and writes two lines into the output file:
"""
A simple python script that reads the input file, and writes its shape
into the output file.
"""
import numpy as np
from time import time
from argparse import ArgumentParser
def main(inputFile, outputFile):
inputData = np.loadtxt(inputFile)
myfile = open(outputFile, 'w')
myfile.write("This is a Jobs array test function\n")
myfile.write(f"dimensions of the input file : {inputData.shape}\n")
myfile.close()
ap = ArgumentParser(description='Jobs Array Test')
ap.add_argument('--input')
ap.add_argument('--output')
ns = ap.parse_args()
t0 = time()
main(ns.input, ns.output)
print(f"Took {time()-t0} secs")
Finally, to submit the job array, execute `$> sbatch --array=1-100 job.sbatch`. This is equivalent to creating 100 copies of the script, each with a distinct value of SLURM_ARRAY_TASK_ID, and running them one by one, which saves an enormous amount of time and effort. Moreover, with this feature you don't have to run multiple cases serially inside a single batch job, which reduces both the time allocation you request and your waiting time in the queue.