@gjuggler
Created January 16, 2012 04:42
A simple script for submitting
# farm_scripts.R - a simple script to submit and run LSF jobs and job
# arrays from R. The "main" function reads in arguments from the
# command-line and passes them, possibly along with the job array
# index, to a user-defined function. This can serve as a basic tool
# for creating and running batched processes using R in an LSF
# environment.
#
# contact: greg@ebi.ac.uk
# Extract the command-line arguments (everything after --args).
args <- commandArgs(trailingOnly=TRUE)

main <- function() {
  if (length(args) > 0 && !is.na(args[1])) {
    # The first arg is the function name
    fn.name <- args[1]
    print(paste("Calling", fn.name))

    # Get the LSB_JOBINDEX environment variable.
    # If it is set to a positive integer, the script is being run
    # as part of a job array, and the value is the array index.
    # Usually this is used to determine which slice of data
    # the function should process.
    jobindex <- Sys.getenv('LSB_JOBINDEX')
    if (jobindex != '' && as.integer(jobindex) > 0) {
      args <- c(args, as.integer(jobindex))
    }
    print(args)

    # Print out some info if an error occurs.
    error.f <- function(e) {
      print("###ERROR###")
      print(as.character(e))
    }

    # Run the indicated function, using the remaining command-line arguments
    # as parameters. args[-1] is empty when only the function name was given.
    tryCatch(
      do.call(fn.name, as.list(args[-1])),
      error = error.f
    )
  }
}
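# A minimal sketch (not part of the original script) of how a job-array index
# might be mapped to a slice of data, as mentioned in the comments in main().
# The name slice.for.job and its parameters are hypothetical; the sketch assumes
# the items are numbered 1..n.items and the job array has n.jobs elements.
slice.for.job <- function(n.items, n.jobs, jobindex) {
  jobindex <- as.integer(jobindex)  # command-line values arrive as strings
  # Assign each item to a chunk in 1..n.jobs, keeping chunk sizes roughly even.
  chunk <- ceiling(seq_len(n.items) * n.jobs / n.items)
  which(chunk == jobindex)
}
# Possible usage inside an analysis function:
#   rows <- slice.for.job(nrow(data), 10, jobindex)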
# A simple test function to verify that parameters are being passed
# correctly. In a real-world scenario, your main analysis code would go here.
test.function <- function(argA='', argB='', jobindex=0) {
  print(paste("Running job", jobindex, "with argA=", argA, "and argB=", argB))
}
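# Arguments passed through main() arrive as character strings (including the
# job index), so a function that needs numbers has to convert them itself.
# A hedged sketch with a hypothetical function; scaled.sum is not part of the
# original script and only illustrates the conversion step.
scaled.sum <- function(n='1', scale='1', jobindex=0) {
  n <- as.integer(n)
  scale <- as.numeric(scale)
  jobindex <- as.integer(jobindex)
  total <- sum(seq_len(n)) * scale
  print(paste("Job", jobindex, "computed", total))
  invisible(total)
}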
# Submits a job to the LSF environment. The job opens an R process and uses
# this same script as input. The name of the desired function (e.g., "test.function")
# is passed to the submitted job, along with any additional parameters,
# as command-line arguments. Job arrays are also supported.
#
#   fn_name         name of the function to run in the submitted job
#   queue           LSF queue to submit to
#   mem             memory request; multiplied by 1000 for -M and rusage[mem=...]
#   args            additional arguments, as a single space-separated string
#   jobarray.count  number of array elements, or NULL for a single job
#   jobarray.id     job array name (defaults to the function name)
bsub.function <- function(fn_name, queue='research', mem=4, args='', jobarray.count=NULL, jobarray.id=fn_name) {
  array_s <- ''
  if (!is.null(jobarray.count)) {
    # Optionally create a job array. The name is quoted so that the shell
    # does not treat the square brackets as a glob pattern.
    array_s <- paste('-J "', jobarray.id, '[1-', jobarray.count, ']"', sep='')
  }
  args_s <- paste(fn_name, ' ', args, sep='')
  uname <- Sys.getenv("USER")

  # Note that the job output logs are written to the home directory. A large
  # job array will quickly clutter the directory with files, and several
  # thousand jobs can overload the filesystem. You could send output to
  # /dev/null to avoid this.
  # The submitted job reads farm_scripts.R via a relative path, so submit
  # from the directory that contains this script.
  cmd <- sprintf('bsub -q %s -M%d000 -R "rusage[mem=%d000]" %s -o "/homes/%s/lsf_log_%s_%%J_%%I.txt" "R-2.12.0 --vanilla --args %s < farm_scripts.R"',
    queue, mem, mem, array_s, uname, fn_name, args_s
  )
  print(cmd)
  system(cmd)
}
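# The comment in bsub.function suggests sending output to /dev/null for large
# job arrays. A minimal sketch of choosing the log target; lsf.log.target and
# its arguments are hypothetical, and bsub.function does not call it as written.
lsf.log.target <- function(uname, fn_name, discard=FALSE) {
  if (discard) {
    return('/dev/null')
  }
  # %%J and %%I come out of sprintf as %J (job ID) and %I (array index),
  # which bsub expands in the output file name.
  sprintf('/homes/%s/lsf_log_%s_%%J_%%I.txt', uname, fn_name)
}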
# Run the "main" function only after all other functions have been defined.
main()