Jobs Array

Main Reference: https://docs.nersc.gov/jobs/examples/#job-arrays
"Job arrays provides a mechanism for submitting and managing collections of similar jobs quickly and easily." You will create a single script (e.g., job.sbatch) which uses the variable SLURM_ARRAY_TASK_ID to point at the correct files:

#!/bin/bash
#SBATCH -q debug
#SBATCH -o job_array_test%j.out
#SBATCH -n 1
#SBATCH --time 00:02:00
#SBATCH -C haswell
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=mr095415@ohio.edu

# submit the array with task ids 1 to 100:
# $> sbatch --array=1-100 job.sbatch

source /global/common/software/m3035/conda-activate.sh 3.7 # load packages

# zero-pad 'SLURM_ARRAY_TASK_ID' into 'mockid', e.g., mockid=0001 or 0100
printf -v mockid "%04d" $SLURM_ARRAY_TASK_ID

export pks=/global/project/projectdirs/eboss/czhao/EZmock/QSO_v5/clustering/PK/
export input=${pks}PK_EZmock_eBOSS_QSO_NGC_v5_z0.8z2.2_${mockid}.dat
export output=${SCRATCH}/baofits/ezmocks/baofit_${mockid}.dat

srun -n 1 python jobarray_test.py --input $input --output $output
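
As a sanity check, you can mimic the array index outside of SLURM and verify the zero-padding locally (SLURM_ARRAY_TASK_ID=7 below is just a hypothetical test value):

$> SLURM_ARRAY_TASK_ID=7
$> printf -v mockid "%04d" $SLURM_ARRAY_TASK_ID
$> echo $mockid
0007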

Our example Python script (jobarray_test.py) takes an input file and writes two lines into the output:

"""
    A simple python script that reads the input file, and writes its shape
    into the output file.
    
    
"""
import numpy as np
from time import time
from argparse import ArgumentParser

def main(inputFile, outputFile):
    # load the input catalog and record its shape in the output file
    inputData = np.loadtxt(inputFile)
    with open(outputFile, 'w') as myfile:
        myfile.write("This is a Jobs array test function\n")
        myfile.write(f"dimensions of the input file : {inputData.shape}\n")

ap = ArgumentParser(description='Jobs Array Test')
ap.add_argument('--input', required=True)
ap.add_argument('--output', required=True)
ns = ap.parse_args()

t0 = time()
main(ns.input, ns.output)
print(f"Took {time()-t0:.2f} secs")

Finally, to submit the job array, execute $> sbatch --array=1-100 job.sbatch. This is equivalent to creating 100 copies of the script, each with a distinct value of SLURM_ARRAY_TASK_ID, and submitting them one by one, which saves an enormous amount of time and effort. Moreover, with this feature you don't have to run multiple tasks serially inside a single batch job, which reduces the requested time allocation and hence the waiting time in the queue.
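
If you are worried about flooding the queue, SLURM also supports throttling: appending %N to the array range limits how many tasks run simultaneously. You can also act on an individual task via its <jobid>_<taskid> label (1234 below is a hypothetical job ID):

$> sbatch --array=1-100%10 job.sbatch   # at most 10 array tasks run at once
$> scancel 1234_5                       # cancel only task 5 of job 1234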
