Skip to content

Instantly share code, notes, and snippets.

@Rekyt
Last active March 10, 2024 00:18
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Rekyt/ee15330639f8719d87aebdb8a5b095d4 to your computer and use it in GitHub Desktop.
Save Rekyt/ee15330639f8719d87aebdb8a5b095d4 to your computer and use it in GitHub Desktop.
Plotting arrows on a PCoA
---
title: "Adding arrows to PCoA"
author: "Matthias Grenié"
date: "26 septembre 2018"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
The goal of this document is to show how to add arrows with variables on a PCoA.
## Load Data
We'll use `ade4` data, `ggplot2` for plotting and `ape::pcoa()` to compute the actual pcoa. We'll use `woangers` dataset included in `ade4` because it mixes variable types.
First let's load packages and data
```{r load_pkg_data}
library("ade4")
library("ggplot2")
library("ape")
data("woangers")
```
Then compute functional dissimilarity using `ade4::dist.ktab()` (this chunk is copy-pasted from the example provided in the help of `dist.ktab()`). Beware this dataset contains `NA` to avoid further problems we select only species with no NA:
```{r compute_dist}
traits = woangers$traits[complete.cases(woangers$traits),]
# Nominal variables 'li', 'pr', 'lp' and 'le'
# (see table 1 in the main text for the codes of the variables)
tabN = traits[, c(1:2, 7, 8)]
# Circular variable 'fo'
tabC = traits[3]
tabCp = prep.circular(tabC, 1, 12)
# The levels of the variable lie between 1 (January) and 12 (December).
# Ordinal variables 'he', 'ae' and 'un'
tabO = traits[, 4:6]
# Fuzzy variables 'mp', 'pe' and 'di'
# 'mp' has 3 levels, 'pe' has 3 levels and 'di' has 5 levels.
tabF = traits[, 9:19]
tabFp = prep.fuzzy(tabF, c(3, 3, 5), labels = c("mp", "pe", "di"))
# Quantitative variables 'lo' and 'lf'
tabQ = traits[, 20:21]
# Compute dissimilarity
ktab1 = ktab.list.df(list(tabN, tabCp, tabO, tabFp, tabQ))
distrait = dist.ktab(ktab1, c("N", "C", "O", "F", "Q"))
str(distrait)
```
## Compute PCoA
Now that we have a distance we can compute the PCoA using `pcoa()` in the `ape` package, other functions exist (`ade4::dudi.pco()`, `vegan::cmdscale()`, etc.) but I use this one for the sake of simplicity:
```{r compute_pcoa}
trait_pcoa = pcoa(distrait)
str(trait_pcoa)
```
## Add arrows
To add the arrows we use a function (considering only numeric traits and converting ordinal traits as numeric) inspired from `ape::biplot.pcoa()`, which formula is taken from Legendre & Legendre (1998):
```{r compute_arrows}
compute_arrows = function(given_pcoa, trait_df) {
# Keep only quantitative or ordinal variables
# /!\ Change this line for different dataset
# or select only quantitative/ordinal var. /!\
trait_df = trait_df[, c(4:6, 20, 21)]
n <- nrow(trait_df)
points.stand <- scale(given_pcoa$vectors)
# Compute covariance of variables with all axes
S <- cov(trait_df, points.stand)
# Select only positive eigenvalues
pos_eigen = given_pcoa$values$Eigenvalues[seq(ncol(S))]
# Standardize value of covariance (see Legendre & Legendre 1998)
U <- S %*% diag((pos_eigen/(n - 1))^(-0.5))
colnames(U) <- colnames(given_pcoa$vectors)
# Add values of covariances inside object
given_pcoa$U <- U
return(given_pcoa)
}
trait_pcoa_arrows = compute_arrows(trait_pcoa, traits)
trait_pcoa_arrows$U[, 1:3]
```
Now for each quantitative or ordinal variable we have a covariance value with the PCoA axis.
## Plotting the PCoA
We can now plot the PCoA with arrows using `ggplot2`. For `ape::pcoa()` objects the values for the individuals are stored in `pcoa$vectors`, combined with our newly computed `pcoa$U` values we should be able to get a full graph:
```{r plotting_pcoa}
# /!\ Be sure to change $vectors to a data.frame before putting it in ggplot2 /!\
trait_pcoa_arrows$vectors = as.data.frame(trait_pcoa_arrows$vectors)
# Basic plot with individuals
plot_pcoa = ggplot(trait_pcoa_arrows$vectors, aes(Axis.1, Axis.2)) +
geom_point()
plot_pcoa
# Now let's add the arrows
# Each arrow begins at the origin of the plot (x = 0, y = 0) and ends at the
# values of covariances of each variable
plot_pcoa +
geom_segment(data = as.data.frame(trait_pcoa_arrows$U),
x = 0, y = 0, alpha = 0.7,
mapping = aes(xend = Axis.1, yend = Axis.2),
# Add arrow head
arrow = arrow(length = unit(3, "mm")))
```
Sometimes the scaling for arrows does not show well with the points. We can scale arbitrarily arrows just to show plot if we precise it in the legend:
```{r plot_pcoa_scaling}
plot_pcoa +
geom_segment(data = as.data.frame(trait_pcoa_arrows$U/20),
x = 0, y = 0, alpha = 0.7,
mapping = aes(xend = Axis.1, yend = Axis.2),
# Add arrow head
arrow = arrow(length = unit(3, "mm"))) +
labs(subtitle = "Arrows scale arbitrarily")
```
If we want to know the variable names we can slightly change the arrows data.frame:
```{r arrows_df}
arrows_df = as.data.frame(trait_pcoa_arrows$U/20)
arrows_df$variable = rownames(arrows_df)
plot_pcoa +
geom_segment(data = arrows_df,
x = 0, y = 0, alpha = 0.7,
mapping = aes(xend = Axis.1, yend = Axis.2),
# Add arrow head
arrow = arrow(length = unit(3, "mm"))) +
geom_label(data = arrows_df, aes(label = variable)) +
labs(subtitle = "Arrows scale arbitrarily")
```
We have the name of the variables but we can even improve using the `ggrepel` package to avoid overlapping labels:
```{r plot_pcoa_label_repel}
plot_pcoa +
geom_segment(data = arrows_df,
x = 0, y = 0, alpha = 0.7,
mapping = aes(xend = Axis.1, yend = Axis.2),
# Add arrow head
arrow = arrow(length = unit(3, "mm"))) +
ggrepel::geom_label_repel(data = arrows_df, aes(label = variable)) +
labs(subtitle = "Arrows scale arbitrarily")
```
From this we can improve the aspect of the PCoA:
```{r plot_pcoa_aspect}
# Add better axes names
arrows_df$var_name = c("Max. Height",
"Aerial Veg. Mult.",
"Underground Veg. Mult.",
"Seed Bank Long.",
"Flowering Period")
plot_pcoa +
geom_segment(data = as.data.frame(trait_pcoa_arrows$U/20),
x = 0, y = 0, alpha = 0.7,
mapping = aes(xend = Axis.1, yend = Axis.2),
# Add arrow head
arrow = arrow(length = unit(3, "mm"))) +
# Add Axes Labels
ggrepel::geom_label_repel(data = arrows_df, aes(label = var_name)) +
# Add axes
geom_hline(yintercept = 0, linetype = 2) +
geom_vline(xintercept = 0, linetype = 2) +
# Keep coord equal for each axis
coord_equal() +
# Labs
labs(subtitle = "Arrows scale arbitrarily",
# Add Explained Variance per axis
x = paste0("Axis 1 (", round(trait_pcoa_arrows$values$Relative_eig[1] * 100, 2), "%)"),
y = paste0("Axis 2 (", round(trait_pcoa_arrows$values$Relative_eig[2] * 100, 2), "%)")) +
# Theme change
theme_bw()
```
## Wrapping up in a single function
Now we can try to automate as many task as possible using the a single function:
```{r final_function}
# axis variables given the axes to show
# the scalar helps for the visualization of the different axes
show_final_pcoa = function(pcoa_object_with_arrows, axis1 = 1, axis2 = 2,
variable_scalar = 20) {
given_pcoa = pcoa_object_with_arrows
# Store axes names
kept_axes = paste0("Axis.", c(axis1, axis2))
# data.frame with values from species
pcoa_df = as.data.frame(given_pcoa$vectors)
# data.frame with variable covariances
arrow_df = as.data.frame(given_pcoa$U/variable_scalar)
arrow_df$variable = rownames(arrow_df) # Add variable names
# Eigen values per axes
eigen_values = given_pcoa$values$Relative_eig[c(axis1, axis2)]
names(eigen_values) = kept_axes
# Axes labels in plot
axes_labs = lapply(kept_axes, function(x) {
paste0(gsub(".", " ", x, fixed = TRUE),
" (", round(eigen_values[x] * 100, 2), "%)")
})
# Compute plot limits to get symetrical limits on each axis (= square plot)
limits = apply(given_pcoa$vectors[, kept_axes], 2, range)
x_lim = c(limits[1, 1] - diff(limits[,1])/10,
limits[2, 1] + diff(limits[,1])/10)
y_lim = c(limits[1, 2] - diff(limits[,2])/10,
limits[2, 2] + diff(limits[,2])/10)
# Final plot
ggplot(pcoa_df, aes_string(x = kept_axes[1],
y = kept_axes[2])) +
# Add dashed axes
geom_hline(yintercept = 0, linetype = 2) +
geom_vline(xintercept = 0, linetype = 2) +
# Add species points
geom_point() +
# Add arrows
geom_segment(data = arrow_df, x = 0, y = 0, alpha = 0.7,
arrow = arrow(length = unit(3, "mm")),
aes_string(xend = kept_axes[1], yend = kept_axes[2])) +
# Add arrow labels
ggrepel::geom_label_repel(data = arrow_df, aes(label = variable),
size = 5, fontface = "bold",
lineheight = 0.7) +
# Specified limits to get square plot
# to see the effect of this you can replace this line by
# coord_equal() +
coord_fixed(xlim = x_lim, ylim = y_lim) +
# Labels & Theme
labs(x = axes_labs[[1]],
y = axes_labs[[2]]) +
theme_bw() +
theme(panel.grid = element_blank())
}
```
We can now test our function:
```{r test_function}
basic_arrow_pcoa = compute_arrows(trait_pcoa, traits)
show_final_pcoa(basic_arrow_pcoa, 1, 2)
show_final_pcoa(basic_arrow_pcoa, 2, 3)
show_final_pcoa(basic_arrow_pcoa, 1, 3)
```
It works!
Feel free to reuse and change the process/function ;)
## References
Legendre, P. and Legendre, L. (1998) Numerical Ecology, 2nd English edition. Amsterdam: Elsevier Science BV.
@Rekyt
Copy link
Author

Rekyt commented Oct 11, 2018

Depending on the PCoA it's important to select positive eigenvalues to allow plotting variables
See
https://gist.github.com/Rekyt/ee15330639f8719d87aebdb8a5b095d4/revisions#diff-3256f99931de7ce30e63cb4d01165853R84

@sunxi822
Copy link

Hi Grenié,
Thanks for your coding!

I just have one question, my data has some nominal traits and dummy/ binary traits.

I really want to add the arrows about nominal and binary traits, but I noted you mentioned that To add the arrows we use a function (considering only numeric traits and converting ordinal traits as numeric) .

I also looked up a lot of information on the Internet, yet I didn't find a good solution. I was wondering if there is any way to add nominal/binary traits arrows to a PCoA?

Hope for your reply! Thank you very much!

@Rekyt
Copy link
Author

Rekyt commented Jul 19, 2021

Hi @sunxi822,

Thank you for your interest :)

One thing you could do with binary and categorical traits is to compute a correlation metric between the categorical variables and each axis, then to represent this correlation on the graph and scale it to make it comparable to other arrows

Another more straightforward way would be to represent the centroid of each modality of the categorical variables taken by the individuals. Like what is done on Multiple Correspondence Analysis

@sunxi822
Copy link

Thank you for your reply!

I will try both of them ♪(^_^;)

I've seen Multiple correspondence analysis in the paper, but I had not tried it yet. I need some time to learn it.
This analysis seems to apply to categorical data and the continuous data converted into categorical data.

I didn't make it clear last time. There are continuous, categorical, binary, and ordinal traits in my data. In this case, I would like to ask you in advance whether the second method is still suitable.

I am very sorry that I am new to these analyses. So I probably have a lot of stupid questions ಥ_ಥ

@Rekyt
Copy link
Author

Rekyt commented Jul 23, 2021

I didn't say that you would have to do Multiple Correspondence Analysis but rather look at how they represent categorical variables. You can represent them in the same way.

For all the traits you have, only Principal Coordinates Analysis can work through a computing a dissimilarity matrix using for example the generalized Gower's dissimilarity by Pavoine et al..

@sunxi822
Copy link

I understand now. I just need to use the same method as Multiple Correspondence Analysis.

And I have calculated the dissimilarity matrix by using ade4.
Many thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment