Created August 12, 2013 16:59
A _really_ basic script for doing many-to-many tree2tree distance (RF) comparisons in R, using the phangorn package and the function treedist. I should probably use one of the 'apply' functions here, right?
library(phangorn)

# 264 REFERENCE trees in phylip format, PAUP numbering hence 2
ref2 <- read.tree("jackr2.tre")
# 264 trees in phylip format to compare pairwise against the reference trees, TNT numbering hence 1
tr2 <- read.tree("jack1.tre")

x <- NULL
# compare each tree in tr2 against every reference tree
for (i in seq_along(tr2)) {
  for (j in seq_along(ref2)) {
    x <- rbind(x, treedist(tr2[[i]], ref2[[j]]))
  }
}
object.size(x) # 8.5Mb table at the end :)
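On the question of using one of the 'apply' functions: a minimal sketch of one way to do it, assuming the same `treedist` call as the script. The helper name `all_pairs_treedist` is hypothetical (it is not part of phangorn). It collects one result per tree pair with `lapply` and binds them once at the end, which avoids re-copying the growing table on every iteration.

```r
library(phangorn)

# Hypothetical helper (not part of phangorn): compare every tree in `trees`
# against every tree in `refs`, returning one row of distances per pair.
all_pairs_treedist <- function(trees, refs) {
  # expand.grid varies its first column fastest, so the rows come out in the
  # same order as the nested loop (all refs for tree 1, then tree 2, ...).
  pairs <- expand.grid(j = seq_along(refs), i = seq_along(trees))
  rows <- lapply(seq_len(nrow(pairs)), function(k) {
    # Index with [[ so that multiPhylo objects are unpacked properly.
    treedist(trees[[pairs$i[k]]], refs[[pairs$j[k]]])
  })
  # Bind once at the end instead of growing the table inside the loop.
  do.call(rbind, rows)
}

# Usage with the file names from the script:
# x <- all_pairs_treedist(read.tree("jack1.tre"), read.tree("jackr2.tre"))
```

The result should have the same rows and columns as `x` in the loop version; only the way the table is assembled differs.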
thx @karthikram ... think I might wait until after I hand in my thesis before I try that. But since I picked up a free $100 Amazon EC2 gift card at the last #hack4ac, I think I'll definitely be trying this out at some point. I thoroughly need to scale up my meta-analyses!
Looks like you don't even need to parallelize if this works fine.
There are many ways to parallelize in R, but one quick and dirty way would be to split `tr2` into a list, then `mclapply` it (a multicore `lapply`). You could spawn an instance on Amazon EC2 (there are many public AMIs with R installed; you'll just need to install `phangorn` and its dependencies), then rewrite the code above into a function and pass it the `tr2` list and `ref2`. It will automatically split the work across the available cores and throw the results back into a list, which you can then `rbind` easily (even locally). Hope this helps.
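A rough sketch of what that suggestion might look like, assuming the `parallel` package that ships with R. The function name `compare_parallel` and its `cores` argument are made up for illustration. Note that `mclapply` relies on forking, so `mc.cores` greater than 1 is not supported on Windows.

```r
library(phangorn)
library(parallel)

# Hypothetical helper: for each comparison tree, compute distances to every
# reference tree, spreading the comparison trees across `cores` processes.
compare_parallel <- function(trees, refs, cores = 2L) {
  per_tree <- mclapply(seq_along(trees), function(i) {
    # One small table per comparison tree: one row per reference tree.
    do.call(rbind, lapply(seq_along(refs), function(j) {
      treedist(trees[[i]], refs[[j]])
    }))
  }, mc.cores = cores)
  # mclapply returns a list in input order; stack it into a single table.
  do.call(rbind, per_tree)
}

# Usage with the file names from the script:
# x <- compare_parallel(read.tree("jack1.tre"), read.tree("jackr2.tre"),
#                       cores = detectCores())
```

Splitting over `tr2` rather than over individual pairs keeps the number of tasks small, which is usually enough when each task already does a full pass over the reference trees.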