library(phangorn)
# 264 REFERENCE trees in phylip format, PAUP numbering hence 2
ref2 <- read.tree("jackr2.tre")
# 264 trees in phylip format to compare pairwise against the reference trees, TNT numbering hence 1
tr2 <- read.tree("jack1.tre")
x <- NULL
# compare each comp tree against all reference trees
for (i in seq_along(tr2)) {
  for (j in seq_along(ref2)) {
    x <- rbind(x, treedist(tr2[[i]], ref2[[j]]))
  }
}
object.size(x) # 8.5Mb table at the end :)
Looks like you don't even need to parallelize if this works fine.
There are many ways to parallelize in R, but one quick and dirty way would be to split tr2 into a list, then mclapply it (a multicore lapply). You could spawn an instance on Amazon EC2 (there are many public AMIs with R installed; you'll just need to install phangorn and its dependencies). Then just rewrite the code above into a function and pass it the tr2 list and ref2. mclapply will automatically split the work across the available cores and return the results as a list. You can then rbind it easily (even locally).
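A rough sketch of what that could look like, in case it is useful. The function names dist_to_refs and all_treedists and the cores argument are made up for illustration and are not part of phangorn; treedist, mclapply and detectCores are the real calls. It indexes the trees with [[ ]] exactly as the loop above does, and it is untested on the actual jackknife files.

```r
library(phangorn)  # treedist; also attaches ape (read.tree)
library(parallel)  # mclapply, detectCores

# Hypothetical helper: distances from one comparison tree to every reference tree,
# one row per reference tree.
dist_to_refs <- function(tree, refs) {
  rows <- lapply(seq_along(refs), function(j) treedist(tree, refs[[j]]))
  do.call(rbind, rows)
}

# Hypothetical wrapper: hand each comparison tree to a worker, then stack the results.
# mclapply relies on forking, so on Windows leave cores at 1.
all_treedists <- function(trees, refs, cores = 1L) {
  per_tree <- mclapply(
    seq_along(trees),
    function(i) dist_to_refs(trees[[i]], refs),
    mc.cores = cores
  )
  do.call(rbind, per_tree)
}

# Usage with the files from the script above:
# ref2 <- read.tree("jackr2.tre")
# tr2  <- read.tree("jack1.tre")
# x <- all_treedists(tr2, ref2, cores = detectCores())
```

The rows come back in the same order as the nested loop (all reference trees for the first comparison tree, then the second, and so on), and binding once per tree avoids growing x one row at a time.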
Hope this helps.
Thanks @karthikram. I think I'll wait until after I hand in my thesis before I try that. But since I picked up a free $100 Amazon EC2 gift card at the last #hack4ac, I'll definitely be trying this out at some point. I badly need to scale up my meta-analyses!
@ethanwhite @rossmounce Typo now fixed. Thanks.