```r
bigcorPar <- function(x, nblocks = 10, verbose = TRUE, ncore = "all", ...) {
  library(ff, quietly = TRUE)
  library(foreach)    ## foreach() and %dopar%
  library(doMC)       ## registerDoMC()
  ## 'multicore' is deprecated; detectCores() now lives in 'parallel'
  if (identical(ncore, "all")) ncore <- parallel::detectCores()
  registerDoMC(cores = ncore)
  NCOL <- ncol(x)
  ## test if ncol(x) %% nblocks gives remainder 0
  if (NCOL %% nblocks != 0) {
    stop("Choose different 'nblocks' so that ncol(x) %% nblocks = 0!")
  }
  ## preallocate square matrix of dimension
  ## ncol(x) in 'ff' single format (4 bytes per element)
  corMAT <- ff(vmode = "single", dim = c(NCOL, NCOL))
  ## split column numbers into 'nblocks' groups
  SPLIT <- split(1:NCOL, rep(1:nblocks, each = NCOL / nblocks))
  ## create all unique combinations of blocks
  COMBS <- expand.grid(1:length(SPLIT), 1:length(SPLIT))
  COMBS <- t(apply(COMBS, 1, sort))
  COMBS <- unique(COMBS)
  ## iterate through each block combination, calculate correlation matrix
  ## between blocks and store them in the preallocated matrix on both
  ## symmetric sides of the diagonal
  results <- foreach(i = 1:nrow(COMBS)) %dopar% {
    COMB <- COMBS[i, ]
    G1 <- SPLIT[[COMB[1]]]
    G2 <- SPLIT[[COMB[2]]]
    if (verbose) cat("Block", COMB[1], "with Block", COMB[2], "\n")
    flush.console()
    ## drop = FALSE keeps single-column blocks as matrices
    COR <- cor(x[, G1, drop = FALSE], x[, G2, drop = FALSE], ...)
    ## workers write into the same file-backed ff matrix
    corMAT[G1, G2] <- COR
    corMAT[G2, G1] <- t(COR)
    ## return NULL so 'results' does not hold copies of every block
    COR <- NULL
  }
  gc()
  return(corMAT)
}
```
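The block scheme itself can be checked without ff or doMC. The following is only a minimal sketch, not part of the gist: `bigcor_seq` is a hypothetical sequential, in-memory variant that swaps the ff matrix for an ordinary matrix and `foreach` for plain loops. It also swaps in a different splitting rule (contiguous blocks of nearly equal size), so `ncol(x)` need not be a multiple of `nblocks`.

```r
# Hypothetical in-memory, sequential variant of bigcorPar() for checking the
# block logic: a plain matrix replaces ff, and loops replace foreach/%dopar%.
bigcor_seq <- function(x, nblocks = 10, ...) {
  NCOL <- ncol(x)
  # contiguous, nearly equal blocks; no divisibility requirement
  SPLIT <- split(seq_len(NCOL), ceiling(seq_len(NCOL) * nblocks / NCOL))
  corMAT <- matrix(NA_real_, NCOL, NCOL)
  for (a in seq_along(SPLIT)) {
    for (b in a:length(SPLIT)) {   # unique block pairs, including a == b
      G1 <- SPLIT[[a]]
      G2 <- SPLIT[[b]]
      COR <- cor(x[, G1, drop = FALSE], x[, G2, drop = FALSE], ...)
      corMAT[G1, G2] <- COR        # fill both sides of the diagonal
      corMAT[G2, G1] <- t(COR)
    }
  }
  corMAT
}

set.seed(1)
x <- matrix(rnorm(50 * 13), nrow = 50)   # 13 columns, not divisible by 4
all.equal(bigcor_seq(x, nblocks = 4), cor(x), check.attributes = FALSE)
# [1] TRUE
```

Extra arguments are passed through to `cor()` (for example `method = "spearman"`), as in the original. Because the sketch holds the full matrix in RAM, it is only meant for checking the logic on small inputs.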

zhilongjia Sep 10, 2014

Error in { : task 1 failed - "object 'MAT' not found" (in line 35).

bkutlu May 9, 2015

Replace MAT with x in line 35

slukowski Sep 3, 2015

The 'multicore' package has been deprecated; it was replaced by 'parallel'.
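A minimal sketch of the replacement for `multicore:::detectCores()`: the `parallel` package, which has shipped with R since version 2.14.0, provides `detectCores()`. Because `detectCores()` returns `NA` when it cannot determine the count, the sketch falls back to one core; that fallback is our own assumption, not something from the gist.

```r
# 'parallel' ships with R; detectCores() replaces multicore:::detectCores()
library(parallel)
ncore <- detectCores()
# detectCores() returns NA when the core count is unknown (assumed fallback: 1)
if (is.na(ncore)) ncore <- 1L
ncore
```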

KhanIrfanEusysbio Oct 20, 2016

My data has the following dimensions. How many cores should I assign for the calculation? It repeatedly gives me the error: Choose different 'nblocks' so that ncol(x) %% nblocks = 0!
dim(data)
[1] 514 26346
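The error is about `nblocks`, not the number of cores (`ncore` only sets how many workers `registerDoMC()` registers): `bigcorPar()` stops unless `ncol(x)` is an exact multiple of `nblocks`. Since 26346 = 2 × 3 × 4391 and 4391 is prime, the only small values that work are 2, 3 and 6. A small helper lists every divisor; `valid_nblocks` is hypothetical and not part of the gist.

```r
# Hypothetical helper: all nblocks values that divide ncols evenly,
# i.e. the values bigcorPar() accepts without stopping.
valid_nblocks <- function(ncols) {
  d <- seq_len(ncols)
  d[ncols %% d == 0]
}

valid_nblocks(26346)
# [1]     1     2     3     6  4391  8782 13173 26346
```

With `nblocks = 6`, each block holds 4391 columns, so each worker computes a 4391 × 4391 block of correlations at a time (about 154 MB in double precision).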

melissamlwong Jul 19, 2018

Thanks Bob for the code. The only problem is that an ff matrix is limited to roughly 45,000 × 45,000. In addition, converting the ff matrix to an ffdf and then writing it to a file takes a long time. I made a fork that modifies the code to handle ~120,000 columns (in theory, an unlimited number) and writes the flattened correlations to a file. I ran the script for each of the 24 chromosomes separately; it took about 15 hours to complete using 128 CPUs and 4 GB of memory.
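Assuming the ~45,000 figure comes from ff's maximum length of `.Machine$integer.max` (2^31 - 1) elements, the largest square ff matrix has floor(sqrt(2^31 - 1)) = 46340 columns. The fork is not shown in this thread; the following is only a sketch of the general idea under our own assumptions. `write_flat_cor` is hypothetical: it streams the upper triangle of the correlation matrix to a tab-separated file (row, col, r), one block pair at a time, so no NCOL × NCOL matrix is ever allocated. It runs sequentially and uses the same uneven contiguous split rule, not the gist's.

```r
# Hypothetical sketch: write the upper triangle of cor(x) to a tab-separated
# file (columns: row, col, r), computing one block pair at a time.
write_flat_cor <- function(x, nblocks, file, ...) {
  NCOL <- ncol(x)
  # contiguous, nearly equal column blocks
  SPLIT <- split(seq_len(NCOL), ceiling(seq_len(NCOL) * nblocks / NCOL))
  if (file.exists(file)) file.remove(file)
  for (a in seq_along(SPLIT)) {
    for (b in a:length(SPLIT)) {
      G1 <- SPLIT[[a]]
      G2 <- SPLIT[[b]]
      COR <- cor(x[, G1, drop = FALSE], x[, G2, drop = FALSE], ...)
      # expand.grid varies 'row' fastest, matching column-major as.vector(COR)
      pairs <- expand.grid(row = G1, col = G2)
      pairs$r <- as.vector(COR)
      # keep each column pair once (drops the diagonal and lower triangle)
      pairs <- pairs[pairs$row < pairs$col, ]
      write.table(pairs, file, append = TRUE, sep = "\t",
                  row.names = FALSE, col.names = FALSE)
    }
  }
  invisible(file)
}

set.seed(1)
x <- matrix(rnorm(40 * 7), nrow = 40)
out <- tempfile(fileext = ".tsv")
write_flat_cor(x, nblocks = 3, file = out)
flat <- read.table(out, col.names = c("row", "col", "r"))
nrow(flat)  # 21, i.e. choose(7, 2) column pairs
```

Each block pair could be handed to a separate worker instead of the inner loop, but then every worker would need its own output file, since concurrent appends to one file can interleave.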
