title | author | date |
---|---|---|
Generating a bootstrap dendrogram with hierfstat's genet.dist |
Zhian N. Kamvar |
2018-08-21 |
This short tutorial is an answer to the question posed on the poppr forum: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/poppr/vEQ8vb2oObQ/RjTsndaCEQAJ
My ultimate goal when asking the previous question was to employ aboot in order to create a bootstraped dendrogram based on distances offered by the genet.dist function in hierfstat. That's why I was trying to create a tree based on Nei's distance this way - to verify that it worked! Unfortunately even after reading the aboot documentation, I still fail to understand how to do this. I would be grateful if someone could provide an example of using aboot in combination with genet.dist.
Unfortunately, aboot is not the best tool for this job. Below, I show how to create a dendrogram with boot.phylo and some weird R magic.
library("poppr")
#> Loading required package: adegenet
#> Loading required package: ade4
#>
#> /// adegenet 2.1.1 is loaded ////////////
#>
#> > overview: '?adegenet'
#> > tutorials/doc/questions: 'adegenetWeb()'
#> > bug reports/feature requests: adegenetIssues()
#> This is poppr version 2.8.0. To get started, type package?poppr
#> OMP parallel support: available
library("hierfstat")
#>
#> Attaching package: 'hierfstat'
#> The following object is masked from 'package:adegenet':
#>
#> read.fstat
library("ape")
#>
#> Attaching package: 'ape'
#> The following objects are masked from 'package:hierfstat':
#>
#> pcoa, varcomp
data("nancycats", package = "adegenet")
n <- genind2hierfstat(nancycats)
head(n)
#> pop fca8 fca23 fca43 fca45 fca77 fca78 fca90 fca96 fca37
#> 1 P01 NA 136146 139139 116120 156156 142148 199199 113113 208208
#> 2 P01 NA 146146 139145 120126 156156 142148 185199 113113 208208
#> 3 P01 135143 136146 141141 116116 152156 142142 197197 113113 210210
#> 4 P01 133135 138138 139141 116126 150150 142148 199199 91105 208208
#> 5 P01 133135 140146 141145 126126 152152 142148 193199 113113 208208
#> 6 P01 135143 136146 145149 120126 150156 148148 193195 91113 208208
We can create an initial tree by passing the distance to the Neighbor-Joining function:
plot(nj(genet.dist(n)))
We can also create a function that does this for us and adds tip labels from the population factor
genet.tree <- function(x) {
res <- nj(genet.dist(x))
res$tip.label <- levels(x[[1]])
res
}
plot(genet.tree(n))
boot.phylo will randomly sample the columns of our data frame with replacement and use a tree generating function to generate a new tree from the resampled data. The problem we encounter is that we need to fix the first column in place because genet.dist() needs this in order to calculate the population distance.
To address this, we can create a function-generating function that will hold our population factor and combine it with the incoming data (for details on function-generating functions, see https://adv-r.hadley.nz/function-factories.html)
fungen <- function(pop_column) {
force(pop_column)
function(x) {
dat <- cbind(pop_column, x)
res <- nj(genet.dist(dat))
res$tip.label <- levels(dat[[1]])
res
}
}
If we compare these functions, they are identical
genet.tree2 <- fungen(n[1])
identical(genet.tree(n), genet.tree2(n[-1]))
#> [1] TRUE
From here, we can pass this function to the boot.phylo function to generate our node labels
catTree <- genet.tree2(n[-1])
catTree$node.label <- boot.phylo(phy = catTree, x = n[-1], FUN = genet.tree2)
#>
Running bootstraps: 100 / 100
#> Calculating bootstrap values... done.
plot(catTree, show.node.lab = TRUE)
add.scale.bar()
Created on 2018-08-21 by the reprex package (v0.2.0).