@jankuc
Created September 12, 2018 08:25
Parallel computation in R that also works on Windows machines, using socket (PSOCK) clusters from the parallel package.
library(parallel)
# data.table is also needed on the master for data.table() and rbindlist()
library(data.table)
# detect number of compute cores
num_cores <- detectCores()
# start a socket cluster with one worker process per core
cl <- makeCluster(num_cores)
# load libraries on all nodes in cluster
clusterEvalQ(cl, library(data.table))
# register pre-calculated (global) variable on all nodes in cluster
shared_dt <- data.table(a=1:10000, b=1:10000)
clusterExport(cl, "shared_dt")
# split the row indices into one chunk per worker
indices <- splitIndices(nrow(shared_dt), num_cores)
# parallel computation
result_list <- parLapply(cl, indices, function(inds) {
  # subset this chunk's rows first (a copy), then add column c to that copy,
  # so the exported shared_dt on the worker is left unmodified
  shared_dt[inds][, c := a + b]
})
# row bind the results
result <- rbindlist(result_list)
result
stopCluster(cl)
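
# A possible variation, offered as a sketch rather than as part of the
# original gist: the same steps wrapped in a function, so that on.exit()
# stops the cluster even if a worker throws an error.
# The names par_row_sums, sum_chunk, input_dt and n_workers are hypothetical,
# and the default of 2 workers is an assumption made to keep the example light.
library(parallel)
library(data.table)

# worker function, defined at top level; it reads input_dt from the worker's
# global environment, where clusterExport() below places it
sum_chunk <- function(inds) {
  input_dt[inds][, c := a + b]
}

par_row_sums <- function(input_dt, n_workers = 2L) {
  cl <- makeCluster(n_workers)
  # stop the workers when this function exits, normally or on error
  on.exit(stopCluster(cl), add = TRUE)
  clusterEvalQ(cl, library(data.table))
  # export the argument from this function's environment, not from .GlobalEnv
  clusterExport(cl, "input_dt", envir = environment())
  chunks <- splitIndices(nrow(input_dt), n_workers)
  rbindlist(parLapply(cl, chunks, sum_chunk))
}

# usage: the result should match the sequential computation a + b
small_dt <- data.table(a = 1:100, b = 1:100)
small_result <- par_row_sums(small_dt)
identical(small_result$c, small_dt$a + small_dt$b)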