@mllg
Created April 19, 2013 13:29
Fit a random forest with 1000 trees on the iris data set as an example of sequential and parallel execution in R.
library(randomForest)
data(iris)
mydata = iris
###############################################################################
### sequential
###############################################################################
randomForest(Species ~ ., data = mydata, ntree = 1000)
###############################################################################
### parallel with snowfall
###############################################################################
library(snowfall)
# initialize the cluster with 2 CPUs, load libraries on the workers, export
# the data, and set up the RNG
sfInit(parallel = TRUE, cpus = 2)
sfLibrary(randomForest)
sfExport("mydata")
sfClusterSetupRNG(seed = 1)
# fit a forest on exported "mydata" with n trees
fit = function(n) {
  randomForest(Species ~ ., data = mydata, ntree = n)
}
# fit 10 forests with 100 trees each
res = sfLapply(rep(100, 10), fit)
# stop the cluster
sfStop()
# reduce results with randomForest::combine to a forest with 1000 trees
Reduce(combine, res)
###############################################################################
### parallel with mclapply
###############################################################################
library(parallel)
fit = function(mydata, n) {
  randomForest(Species ~ ., data = mydata, ntree = n)
}
# fit 10 forests with 100 trees each on forked worker processes
# (forking is not available on Windows, where this runs sequentially)
res = mclapply(rep(100, 10), fit, mydata = mydata)
Reduce(combine, res)
###############################################################################
### parallel with BatchJobs
###############################################################################
# note that this is intended to be used with the default "interactive" backend
library(BatchJobs)
fit = function(mydata, n) {
  randomForest(Species ~ ., data = mydata, ntree = n)
}
reg = batchMapQuick(fit, n = rep(100, 10), more.args = list(mydata = mydata))
reduceResults(reg, fun = function(aggr, job, res) combine(aggr, res))
# cleanup temporary directory
unlink(reg$file.dir, recursive = TRUE)
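###############################################################################
### parallel with parLapply (base R)
###############################################################################
# a minimal sketch using only the base "parallel" package; unlike mclapply it
# also works on Windows, since it uses socket workers instead of forking.
# Assumes the same `mydata` as above; worker count of 2 is arbitrary.

```r
library(parallel)
library(randomForest)
cl = makeCluster(2)
# load the package and export the data on each worker
clusterEvalQ(cl, library(randomForest))
clusterExport(cl, "mydata")
# reproducible parallel RNG streams
clusterSetRNGStream(cl, iseed = 1)
# fit 10 forests with 100 trees each
res = parLapply(cl, rep(100, 10), function(n)
  randomForest(Species ~ ., data = mydata, ntree = n))
stopCluster(cl)
# combine into a single forest with 1000 trees
Reduce(combine, res)
```

As with the snowfall and mclapply variants, the combined object behaves like a single randomForest fit with ntree = 1000.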