#!/usr/bin/env Rscript
#Correlation calculation for large datasets (tested on ~120k columns)
#Modified by Melissa M.L. Wong on 19 July 2018
#Modification 1: Removed the ff matrix because of its size limitation of ~45k. Converting the ff matrix to ffdf and writing it to a file is prohibitively slow.
#Modification 2: Pearson correlations are printed to the console; redirect the output to a file using bash.
#Modification 3: The user can select a range of columns, x to y, to compare against the other columns.
#Modification 4: Results are not accumulated in memory. Memory usage is about 4 GB.
#Comment: This is faster than an all-vs-all comparison. The task can be split into multiple chunks and saved to multiple files.
#Usage: Rscript -e 't<-read.table("matrix.dat",sep=" ",header=TRUE, stringsAsFactors=FALSE);a<-as.matrix(sapply(t, as.numeric));source("bigcorPar.r");bigcorPar(a, ncore=64,x=1,y=1000)' >> matrix_cor.txt
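#Sketch (not part of the original script): one possible way to derive the x/y
#column ranges for the chunked runs mentioned in the comment above.
#`chunk_ranges`, `ncol_total` and `chunk_size` are hypothetical names used only
#for illustration; this assumes x and y are inclusive column indices.
chunk_ranges <- function(ncol_total, chunk_size) {
  #First column of each chunk: 1, 1 + chunk_size, ...
  x <- seq(1, ncol_total, by = chunk_size)
  #Last column of each chunk, capped at the total number of columns
  y <- pmin(x + chunk_size - 1, ncol_total)
  data.frame(x = x, y = y)
}
#Example: chunk_ranges(2500, 1000) yields the ranges 1-1000, 1001-2000 and 2001-2500.
#Each row could then be passed as x and y to a separate run, with each run redirected to its own output file.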