@hmoral
Last active November 23, 2016 13:46
Small tutorial for submitting array jobs to albiroix

Problem: launch multiple jobs at the same time

Sometimes you have to submit multiple iterations of the same task with only small differences in the parameters of each iteration, e.g. MCMC chains. This is a simple kind of parallelization: the runs are independent and their results are not combined once they have finished.

Non-ideal solution

  1. You could write a shell script that launches each job sequentially, but you would have to write an SGE script for each job, and each iteration would have to wait for the previous one to finish.

  2. You could qsub each iteration one by one, but that requires a lot of typing and you would still have to write an SGE script for each job.

Good alternative: make an array

With an array you need a single SGE script. The cluster launches as many jobs as fit at the same time, queues the rest, and starts them automatically as soon as slots become available.
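As a minimal sketch of the idea (not part of the STRUCTURE example), the script below only reports which task of the array it is. SGE_TASK_ID is the variable SGE sets for each task; the function name describe_task and the fallback value of 1, which lets the script run outside the cluster, are assumptions made for illustration.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd

# Print which array task this is. SGE exports SGE_TASK_ID for each task;
# the fallback value 1 is an assumption so the script also runs locally.
describe_task() {
    echo "running task ${SGE_TASK_ID:-1}"
}

describe_task
```

Saved under a hypothetical name such as hello_array.sh, it could be submitted with qsub -t 1-5 hello_array.sh, and each of the five tasks would print its own number.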

Working example: population structure with STRUCTURE

Structure: http://pritchardlab.stanford.edu/structure.html

SGE script

The first block of code contains your regular SGE directives

#!/bin/sh
#$ -S /bin/sh
#$ -l h_rt=48:00:00
#$ -cwd
#$ -M hernan.morales@monash.edu
#$ -m beas

Be careful with the mail flag -m: if you have many iterations it will send you an email for each one! It may be better to send emails only when a job is aborted or suspended: #$ -m as

Setting up the array and the command

Second block of code

task_id=$((SGE_TASK_ID - 1)) # zero-based index that keeps track of which iteration is running

K=$((1 + task_id / maxRep)) # STRUCTURE-specific: automatically assigns the K value for each iteration, we will set maxRep later

rep=$((K * maxRep - task_id)) # keeps track of which repetition of a given K is running (counts down from maxRep to 1), see below

out="output_K${K}_r${rep}" # name of the output file

seed=$(od -vAn -N4 -tu4 < /dev/urandom | tr -d ' ') # random seed for each iteration

total_tasks=$SGE_TASK_LAST # last task number of the array, set by SGE

# print useful info about each iteration
echo "$task_id"
echo "$total_tasks"
echo "$K"
echo "$rep"
echo "$out"
echo "$seed"
# load the module
module load structure/v2.3.4
# launch the command for each iteration
structure -K "$K" -o "$out" -D "$seed"

With the above commands, STRUCTURE is launched once per array task: each value of K is run maxRep times, so the total number of tasks is the number of K values multiplied by maxRep (see below). Remember that the two blocks of code go together in the same script!
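To check the task numbering before submitting anything, the same arithmetic can be run locally. The following is only a sketch: map_task is a hypothetical helper, and the task numbers and maxRep value passed to it are example inputs.

```shell
#!/bin/sh
# Map a 1-based array task number to its K and repetition, using the
# same arithmetic as the job script. map_task is a hypothetical helper.
map_task() {
    sge_task_id=$1
    maxRep=$2
    task_id=$((sge_task_id - 1))        # zero-based task index
    K=$((1 + task_id / maxRep))         # integer division groups tasks by K
    rep=$((K * maxRep - task_id))       # repetition within the current K
    echo "$sge_task_id $K $rep"
}

# Preview a few tasks of a 50-task array with maxRep=10
for t in 1 10 11 50; do
    map_task "$t" 10
done
```

Each line shows the task number, K, and repetition. Within each K the repetition counts down from maxRep to 1, so task 1 is K=1, repetition 10, and task 50 is K=5, repetition 1.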

Submitting the array

qsub -v maxRep=10 -V -t 1-50 struc_job.sh

This would run K = 1 to 5 with 10 repetitions each.
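The upper end of the -t range follows from the largest K and the number of repetitions. A small sketch of that calculation is given here; the variable names maxK and total are assumptions, and the command is only printed so it can be inspected before it is run.

```shell
#!/bin/sh
# Work out the array range from the largest K and the repetitions per K.
# maxK and total are illustrative names; adjust the values to your analysis.
maxK=5
maxRep=10
total=$((maxK * maxRep))   # one task per (K, repetition) pair

# Print the submission command instead of running it
echo "qsub -v maxRep=$maxRep -V -t 1-$total struc_job.sh"
```

With maxK=5 and maxRep=10 this gives 50 tasks, matching the range 1-50 used above.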

Check out this other tutorial, which explains array jobs in more detail: http://wiki.gridengine.info/wiki/index.php/Simple-Job-Array-Howto
