Skip to content

Instantly share code, notes, and snippets.

@smacarthur
Created December 16, 2010 12:02
Show Gist options
  • Save smacarthur/743324 to your computer and use it in GitHub Desktop.
Save smacarthur/743324 to your computer and use it in GitHub Desktop.
Example code for using LSF job Arrays
### Generate a random big file that we want to sort, 10 Million lines
perl -e 'for (1..1E7){printf("%.0f\n",rand()*1E7)};' > bigFile
### Split the file up into chunks with 10,000 lines in each chunk
split -a 3 -d -l 10000 bigFile split
### rename the files on a 1-1000 scheme not 0-999
for f in split*;do mv ${f} $(echo ${f} |perl -ne 'm/split(0*)(\d+)/g;print "Split",$2+1,"\n";');done
### submit a job array, allowing 50 jobs to be run at anyone time
ID=$(bsub -J "sort[1-1000]%50" "sort -n Split\$LSB_JOBINDEX >Split\$LSB_JOBINDEX.sorted" |perl -ne 'm/<(\d+)>/;print "$1"')
### merge the sorted files together once all the jobs are finished using the –w dependency
ID2=$(bsub -w "done($ID)" "sort -n -m *.sorted >bigFile.sorted" |perl -ne 'm/<(\d+)>/;print "$1"')
### Delete the temp files, waits for the merge to finish first
bsub -w "done($ID2)" "rm -f Split*"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment