
@sachsmc
Last active August 29, 2015 14:09
Parallel Jobs on Biowulf
First, write a swarm file containing one Rscript call per job id, with each job's console output redirected to its own .Rout file:

## Write one "Rscript <options> <scriptfile> <id>" command per job id,
## one command per line, to the swarm file `outfile`.
make_swarmfile <- function(id_vector, scriptfile = "run.R", options = "--vanilla", outfile = "Rjobs"){
  runlist <- paste0("Rscript ", options, " ", scriptfile, " ", id_vector, " > run-", id_vector, ".Rout")
  cat(runlist, file = outfile, sep = "\n")
}
make_swarmfile(1:1000)

This writes the file Rjobs; its first lines look like this:
Rscript --vanilla run.R 1 > run-1.Rout
Rscript --vanilla run.R 2 > run-2.Rout
Rscript --vanilla run.R 3 > run-3.Rout
Rscript --vanilla run.R 4 > run-4.Rout
Rscript --vanilla run.R 5 > run-5.Rout
Rscript --vanilla run.R 6 > run-6.Rout
Rscript --vanilla run.R 7 > run-7.Rout
Rscript --vanilla run.R 8 > run-8.Rout
Rscript --vanilla run.R 9 > run-9.Rout
Rscript --vanilla run.R 10 > run-10.Rout
Each line of Rjobs runs run.R, which reads the job id from the command line and simulates one dataset:

## run.R: read the job id passed by the swarm file, simulate a
## sample with that id as the mean, and save it to disk.
args <- commandArgs(trailingOnly = TRUE)
mean <- as.numeric(args[1])
sample <- rnorm(4444, mean)
saveRDS(sample, file = paste0("sample_", mean, ".RDS"))
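For simulation work, one refinement worth noting: seed the RNG from the job id so each run is reproducible and the jobs draw distinct streams. A minimal sketch of how run.R could look with seeding; the set.seed line is my addition, not part of the original gist:

## Hypothetical variant of run.R: seed from the job id so that
## re-running job i reproduces the same draws.
args <- commandArgs(trailingOnly = TRUE)
id <- as.numeric(args[1])
set.seed(id)   # added for reproducibility; not in the original run.R
sample <- rnorm(4444, id)
saveRDS(sample, file = paste0("sample_", id, ".RDS"))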
Then submit the swarm file on the cluster:

swarm -f Rjobs --module R
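Once the swarm completes, the per-job results can be read back into a single R session. A minimal sketch; collect_samples is a hypothetical helper that assumes the sample_<id>.RDS files written by run.R sit in the working directory:

## Hypothetical helper: read every job's saved sample back in,
## returning a list with one element per job id.
collect_samples <- function(id_vector) {
  files <- paste0("sample_", id_vector, ".RDS")
  lapply(setNames(files, id_vector), readRDS)
}

results <- collect_samples(1:1000)
length(results)   # 1000, one element per job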
@swihart commented Jan 14, 2015

swarm --autobundle -f Rjobs --module R

has really sped up my total simulation time.

Thanks again for this gist!

@swihart commented May 16, 2015

I ran into problems on a simulation and perused the user guide (http://biowulf.nih.gov/user_guide.html), and found:

**Note1: swarm will automatically allocate the correct number of processes to 2-, 4-, 16- or 24-core nodes.**

**Note2: if the programs you are running via swarm either (1) require more than 1 GB memory, or (2) are multi-threaded, you must specify additional information to swarm.**

See the swarm documentation (http://biowulf.nih.gov/apps/swarm.html)  for more details.

Following that link, I found this:

-g #
--gb-per-process    Gigabytes per process. By default swarm assumes each process will require 1 GB of memory. Some applications require more than 1 GB, and so setting -g will restrict both the nodes allocated to the swarm and the number of processes per node to accommodate the memory requirement.

So I changed

swarm --autobundle -f Rjobs --module R 

to

swarm -g 4 --autobundle -f Rjobs --module R 

And it worked!

Later, when -g 4 was no longer enough, I ran

swarm -g 24 --autobundle -f Rjobs --module R

and that worked too. I hadn't realized -g could be set higher than 4.
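A way to pick -g with less trial and error: gauge one job's memory footprint in an interactive R session before submitting. A rough sketch (the 4444-draw sample from run.R is tiny; a real simulation may hold much larger objects):

x <- rnorm(4444, 0)                     # one job's simulated sample
print(object.size(x), units = "auto")   # footprint of the largest object
gc()                                    # memory currently used by the session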
