@Rekyt
Created February 6, 2019 14:00
Plot a phylogeny with clade labels and colored edges programmatically
---
title: "Phylogeny with traits in branches and clade labels"
author: "Matthias Grenié"
date: \today
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
We are going to use the package `ggtree` to show trait values and clade labels on phylogenetic trees.
```{r needed_packages}
library("ggtree")
data("geospiza_raw", package = "phylobase")
```
We use a phylogenetic tree of Darwin's finches and the matching trait data from the `phylobase` package; `geospiza_raw` is a list holding the tree (`tree`) and the trait table (`data`):
```{r overview_geospiza}
geospiza_raw$tree
head(geospiza_raw$data)
```
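Later steps match the trait table to the tree by species name, so it can be worth checking how the two line up before plotting. A minimal base-R sketch (the `tree` and `traits` names are only local helpers, not part of `phylobase`):

```r
data("geospiza_raw", package = "phylobase")

tree   = geospiza_raw$tree
traits = as.data.frame(geospiza_raw$data)

# Trait names, one column per trait
colnames(traits)

# Number of tips in the tree vs. number of species with trait data
length(tree$tip.label)
nrow(traits)

# Tips without a row in the trait table; once traits are attached,
# these get NA values and are drawn with the color scale's NA color
setdiff(tree$tip.label, rownames(traits))
```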
We have five traits per species and the goal is to produce a fan phylogenetic tree with clade labels around the tree.
We can first visualize a simple tree:
```{r simple_vizualization}
ggtree(geospiza_raw$tree, layout = "fan") +
  geom_tiplab()
```
Now we can color the edges of the tree by a trait, here wing length (`wingL`):
```{r colored_by_traits}
base_tree = ggtree(geospiza_raw$tree, layout = "fan") +
  geom_tiplab()
annotation_df = as.data.frame(geospiza_raw$data)
annotation_df$taxa = rownames(geospiza_raw$data)
# %<+% matches the first column against the tip labels, so move `taxa`
# (added last) to the front by reversing the column order
annotation_df = annotation_df[, ncol(annotation_df):1]
# Attach the trait table to the tree data, then map edge color to wing length
annotated_tree = base_tree %<+% annotation_df
colored_tree = annotated_tree + aes(color = wingL)
colored_tree
```
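Reversing the column order works because `taxa` was added last, but it also reverses the trait columns. A sketch of an alternative that builds the table with `taxa` explicitly in first position and keeps the traits in their original order, assuming (as the chunk above does) that the species names are the row names of `geospiza_raw$data`:

```r
data("geospiza_raw", package = "phylobase")

traits = as.data.frame(geospiza_raw$data)

# Tip-label column first, trait columns afterwards in their original order
annotation_df = data.frame(taxa = rownames(traits), traits,
                           stringsAsFactors = FALSE)
head(annotation_df)
```

This `annotation_df` could then be attached with `%<+%` in the same way as the reversed one.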
We can label a clade, that is all the species descending from a given internal node (here node 20), with `geom_cladelabel()`:
```{r single_clade_label}
colored_tree +
  geom_cladelabel(20, "First clade", offset = 0.1, barsize = 1.5, angle = "auto")
```
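Node 20 is an internal node id. With ape's numbering, tips are numbered from 1 to the number of tips and internal nodes come after them. To look up the ids of your own clades, one option is to print every node id on the plot, using the `node` column of the data that `ggtree()` builds from the tree (the offset and text size below are arbitrary choices):

```r
library("ggtree")
data("geospiza_raw", package = "phylobase")

# Draw each node's id next to it so clade nodes can be located by eye
node_id_plot = ggtree(geospiza_raw$tree, layout = "fan") +
  geom_tiplab() +
  geom_text(aes(label = node), hjust = -0.3, size = 3)
node_id_plot
```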
If we have information on several clades, we can generate as many clade labels as needed and add them all to the plot:
```{r several_clade_labels}
# Describing the clades of species
clade_names = data.frame(
  node = c(20, 25), # Node id in the phylogenetic tree
  clade = c("First clade", "Second clade"))
# All clade labels generated from each row of the data frame of names;
# apply() turns the data frame into a character matrix, so convert the node id back
all_clade_labs = apply(clade_names, 1, function(row) {
  geom_cladelabel(as.numeric(row[["node"]]), row[["clade"]], offset = 0.1,
                  barsize = 1.5, angle = "auto")
})
# Plot all clade labels
colored_tree +
  all_clade_labs
```
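Because `apply()` first converts the data frame into a character matrix, every value reaches the function as a string. A sketch of an alternative that walks the columns in parallel with base R's `Map()`, so node ids stay numeric and labels stay character (it rebuilds the plot from scratch to stay self-contained):

```r
library("ggtree")
data("geospiza_raw", package = "phylobase")

clade_names = data.frame(
  node  = c(20, 25),                       # Node id in the phylogenetic tree
  clade = c("First clade", "Second clade"),
  stringsAsFactors = FALSE)

# One geom_cladelabel() layer per (node, clade) pair, returned as a list
all_clade_labs = Map(function(node, clade) {
  geom_cladelabel(node, clade, offset = 0.1, barsize = 1.5, angle = "auto")
}, clade_names$node, clade_names$clade)

# A list of layers can be added to a ggplot in one go
labelled_tree = ggtree(geospiza_raw$tree, layout = "fan") +
  geom_tiplab() +
  all_clade_labs
labelled_tree
```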
To alternate the colors of the clade bars, add a color column to the data frame and pass it to each label:
```{r color_df}
# Describing the clades of species
clade_names_color = data.frame(
  node = c(20, 25), # Node id in the phylogenetic tree
  clade = c("First clade", "Second clade"),
  color = c("#000000", "#AAAAAA"))
# All clade labels generated from each row of the data frame of names;
# apply() passes strings, so convert the node id back to a number
all_clade_labs_color = apply(clade_names_color, 1, function(row) {
  geom_cladelabel(as.numeric(row[["node"]]), row[["clade"]],
                  color = row[["color"]], offset = 0.1, barsize = 1.5,
                  angle = "auto")
})
# Plot all clade labels
colored_tree +
  all_clade_labs_color
```
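With more than two clades, the two colors can be recycled rather than typed out one by one. A minimal sketch in base R; `alternate_colors()` is a hypothetical helper, not part of `ggtree`:

```r
# Hypothetical helper: recycle a palette over n clades (black, grey, black, ...)
alternate_colors = function(n, palette = c("#000000", "#AAAAAA")) {
  rep_len(palette, n)
}

clade_names_color = data.frame(
  node  = c(20, 25),
  clade = c("First clade", "Second clade"),
  stringsAsFactors = FALSE)
# One color per row, alternating down the data frame
clade_names_color$color = alternate_colors(nrow(clade_names_color))
clade_names_color
```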
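For a large tree, finding node numbers by hand becomes tedious; instead, you can compute the node for each label programmatically as the most recent common ancestor (MRCA) of the species in that clade: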
```{r label_mrca}
library("dplyr")
# Data frame with each clade labeled
species_clade = data.frame(
  species = c("fuliginosa", "fortis", "magnirostris", "conirostris",
              "scandens", "difficilis", "psittacula", "parvulus", "pauper",
              "pallida"),
  clade_name = c(rep("Clade 1", 6), rep("Clade 2", 4)),
  stringsAsFactors = FALSE
)
# Retrieve Most Recent Common Ancestor for each clade to get node number
species_mrca = species_clade %>%
  group_by(clade_name) %>%
  summarise(mrca = ggtree::MRCA(colored_tree, species))
# Add alternating color column, recycled to one color per clade
species_mrca$color = rep(c("#000000", "#AAAAAA"), length.out = nrow(species_mrca))
# Then annotate the tree (apply() passes strings, so convert the node id back)
colored_tree +
  apply(species_mrca, 1, function(row) {
    geom_cladelabel(as.numeric(row[["mrca"]]), row[["clade_name"]],
                    color = row[["color"]], offset = 0.1, barsize = 1.5,
                    angle = "auto")
  })
```
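The arguments of `MRCA()` may differ between ggtree and tidytree versions, so if the call above fails in your installation, the node ids can also be computed from the `phylo` object with `ape::getMRCA()` (ape is a ggtree dependency). A sketch without dplyr, assuming `geospiza_raw$tree` is a `phylo` object as loaded above:

```r
data("geospiza_raw", package = "phylobase")

species_clade = data.frame(
  species = c("fuliginosa", "fortis", "magnirostris", "conirostris",
              "scandens", "difficilis", "psittacula", "parvulus", "pauper",
              "pallida"),
  clade_name = c(rep("Clade 1", 6), rep("Clade 2", 4)),
  stringsAsFactors = FALSE)

# Split species by clade, then get one MRCA node id per clade
clade_mrca = sapply(split(species_clade$species, species_clade$clade_name),
                    function(sp) ape::getMRCA(geospiza_raw$tree, sp))
clade_mrca
```

The result is a vector named by clade whose values can be passed as the `node` argument of `geom_cladelabel()`.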