A quick exploratory analysis of the dataset collected by BEtti and Manica (Betti L, Manica A (2018) Human variation in the shape of the birth canal is significant and geographically structured. Proceedings of the Royal Society B 285(1889): 20181807.)
We perform classic multidimensional scaling and contrast it with the aggregate means by Region and Population.
Let's load the tidyverse framework to wrangle data and plot it
library(tidyverse)
library(magrittr)
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.0.0 ✔ purrr 0.2.5
✔ tibble 1.4.2 ✔ dplyr 0.7.6
✔ tidyr 0.8.1 ✔ stringr 1.3.1
✔ readr 1.1.1 ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
Attaching package: ‘magrittr’
The following object is masked from ‘package:purrr’:
set_names
The following object is masked from ‘package:tidyr’:
extract
We get the data directly from the datadryad repository.
birth_canal_df <- read_csv("https://datadryad.org/bitstream/handle/10255/dryad.189617/Dataset.csv")
Parsed with column specification:
cols(
Population = col_character(),
Site = col_character(),
Institution = col_character(),
Individual.ID = col_character(),
inlet.ML = col_double(),
inlet.AP = col_double(),
midplane.ML = col_double(),
midplane.AP = col_double(),
outlet.ML = col_double(),
outlet.AP = col_double(),
Region = col_character(),
Body.mass = col_double()
)
We perform classic metric dimensionality reduction to plot the data on a flat image. We pick a euclidean distance as we don't have much more insight to pick something better.
dists_nmds <- birth_canal_df %>%
select(contains("."), -Body.mass, -Individual.ID) %>%
dist() %>% # produce euclidean distance by default
cmdscale(eig = TRUE, k=2) %$%
points
colnames(dists_nmds) <- c("PC1", "PC2")
Let's put original and dimensionality reduced data together.
birth_canal_df %<>% bind_cols(dists_nmds %>% as_data_frame())
We compute the convex hulls for the Region (that is, ~ continent) and population level as specified in the dataset.
Region_hulls_12 <- birth_canal_df %>%
select(PC1,PC2,Region) %>%
group_by(Region) %>%
slice(chull(PC1,PC2))
Pop_hulls_12 <- birth_canal_df %>%
select(PC1,PC2,Population) %>%
group_by(Population) %>%
slice(chull(PC1,PC2))
And we overlap a scatter plot and a convex hull plot
p_birth_canal <- birth_canal_df %>%
ggplot(aes(x = PC1, y = PC2, fill = Region, color = Region)) +
geom_point() +
theme_void()
region_plot <- p_birth_canal + aes(fill = Region) + geom_polygon(data = Region_hulls_12, alpha = 0.5)
pop_birth_canal <- birth_canal_df %>%
ggplot(aes(x = PC1, y = PC2, fill = Population, color = Population)) +
geom_point() +
theme_void()
population <- pop_birth_canal + aes(fill = Population) + geom_polygon(data = Pop_hulls_12, alpha = 0.1)
We compute aggregate (mean) metrics and prduce the scatter plots
agg_Region <- birth_canal_df %>%
group_by(Region) %>%
summarise(PC1 = mean(PC1),
PC2 = mean(PC2)) %>%
ggplot(aes(x = PC1, y = PC2, fill = Region, color = Region)) +
geom_point() +
theme_void()
agg_Population <- birth_canal_df %>%
group_by(Population) %>%
summarise(PC1 = mean(PC1),
PC2 = mean(PC2)) %>%
ggplot(aes(x = PC1, y = PC2, fill = Population, color = Population)) +
geom_point() +
theme_void()
Finally we put everything together and save in a couple of formats
library(patchwork)
full <- (agg_Region + agg_Population) / (region_plot + population)
ggsave(filename = "diversity_in_birth_canal.pdf", plot = full, width = 10, height = 7)
ggsave(filename = "diversity_in_birth_canal.png", plot = full, width = 12, height = 8)