-
-
Save rossmounce/6212835 to your computer and use it in GitHub Desktop.
library(phangorn) | |
#264 REFERENCE trees in phylip format, PAUP numbering hence 2 | |
ref2 <- read.tree("jackr2.tre") | |
#264 trees in phylip format to pair-wise compare to the reference trees, TNT numbering hence 1 | |
tr2 <- read.tree("jack1.tre") | |
x <- {} | |
#all reference trees to one comp tree | |
for (i in 1:length(tr2)) { | |
for (j in 1:length(ref2)) { | |
x <- rbind(x,treedist(tr2[[i]],ref2[[j]])) | |
} | |
} | |
object.size(x) #8.5Mb table at the end :) |
ah... that sounds logical. Loop counter, great idea! I just going to say I may not always know how many trees are in each treeset file, but this would appear to solve that problem nicely. Thanks!
Line 4 should be indexing using ii
right?
x[ii, ] <- c(i, j, treedist(tr2[[i]],ref2[[j]]))
I actually like the separate counter since it's going to be clearer and have a lower likelihood of bugs than a calculation based on i i
and j
.
To be clear, I meant Line 4 in @gavinsimpson's very nice pre-allocation example.
@ethanwhite +1 on both points!. I was just hedging my bets in case someone commented here how easy it was to use i
and j
for the loop counter :-)
@ethanwhite @rossmounce Typo now fixed. Thanks.
Looks like you don't even need to parallelize if this works fine.
There are many ways to parallelize in R but one quick and dirty way would be to split tr2
into a list, then mclapply
it (a multicore lapply
). If you spawn an instance on Amazon EC2 (there are many public AMIs with R installed; you'll just need to install phangorn
and dependencies). Then just rewrite the code above into a function and pass the tr2
list and ref2
. It will automatically split across available cores and throw the results back into a list. You can then rbind
it easily (even locally).
hope this helps.
thx @karthikram ... think I might wait til after I hand in my thesis til I try that. But since I picked up a free $100 Amazon EC2 gift card at the last #hack4ac I think I'll definitely be trying this out at some point. Thoroughly needed to scale-up my meta-analyses!
Never, NEVER, grow an object in an R
for()
loop. R has to allocate new storage forx
and copy across the originalx
plus the new row. You waste so much time just growing and copying objects this way.I'm not familiar with what
treedist()
returns, but you know how many rowsx
should have. Lets assumetreedist()
returns a numeric of length 1 (the distance between treestri
andtry
. First allocate storage forx
Now loop
I'm being lazy and using a separate loop counter; you could probably work out how to fill
x
just fromi
andj
alone.Hopefully you get the gist of this? Allocate storage and then fill in.