gvdr/Birth_Canal_Diversity.md

## Birth_Canal_Diversity.md

      
    Raw
  

              Birth_Canal_Diversity.md
            
          
    Diversity in Birth Canal across the world

A quick exploratory analysis of the dataset collected by BEtti and Manica (Betti L, Manica A (2018) Human variation in the shape of the birth canal is significant and geographically structured. Proceedings of the Royal Society B 285(1889): 20181807.)
We perform classic multidimensional scaling and contrast it with the aggregate means by Region and Population.
Let's load the tidyverse framework to wrangle data and plot it
library(tidyverse)
library(magrittr)
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.0.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘magrittr’

The following object is masked from ‘package:purrr’:

    set_names

The following object is masked from ‘package:tidyr’:

    extract

We get the data directly from the datadryad repository.
birth_canal_df <- read_csv("https://datadryad.org/bitstream/handle/10255/dryad.189617/Dataset.csv")
Parsed with column specification:
cols(
  Population = col_character(),
  Site = col_character(),
  Institution = col_character(),
  Individual.ID = col_character(),
  inlet.ML = col_double(),
  inlet.AP = col_double(),
  midplane.ML = col_double(),
  midplane.AP = col_double(),
  outlet.ML = col_double(),
  outlet.AP = col_double(),
  Region = col_character(),
  Body.mass = col_double()
)

We perform classic metric dimensionality reduction to plot the data on a flat image.
We pick a euclidean distance as we don't have much more insight to pick something better.
dists_nmds <- birth_canal_df %>%
select(contains("."), -Body.mass, -Individual.ID) %>%
dist() %>% # produce euclidean distance by default
cmdscale(eig = TRUE, k=2) %$%
points

colnames(dists_nmds) <- c("PC1", "PC2")
Let's put original and dimensionality reduced data together.
birth_canal_df %<>% bind_cols(dists_nmds %>% as_data_frame())
We compute the convex hulls for the Region (that is, ~ continent) and population level as specified in the dataset.
Region_hulls_12 <- birth_canal_df %>%
  select(PC1,PC2,Region) %>%
  group_by(Region) %>%
  slice(chull(PC1,PC2))

Pop_hulls_12 <- birth_canal_df %>%
  select(PC1,PC2,Population) %>%
  group_by(Population) %>%
  slice(chull(PC1,PC2))
And we overlap a scatter plot and a convex hull plot
p_birth_canal <- birth_canal_df %>%
  ggplot(aes(x = PC1, y = PC2, fill = Region, color = Region)) +
   geom_point() +
  theme_void()

region_plot <- p_birth_canal + aes(fill = Region) + geom_polygon(data = Region_hulls_12, alpha = 0.5)
pop_birth_canal <- birth_canal_df %>%
  ggplot(aes(x = PC1, y = PC2, fill = Population, color = Population)) +
   geom_point() +
  theme_void()

population <- pop_birth_canal + aes(fill = Population) + geom_polygon(data = Pop_hulls_12, alpha = 0.1)
We compute aggregate (mean) metrics and prduce the scatter plots
agg_Region <- birth_canal_df %>%
  group_by(Region) %>%
  summarise(PC1 = mean(PC1),
            PC2 = mean(PC2)) %>%
  ggplot(aes(x = PC1, y = PC2, fill = Region, color = Region)) +
  geom_point() +
  theme_void()

agg_Population <- birth_canal_df %>%
  group_by(Population) %>%
  summarise(PC1 = mean(PC1),
            PC2 = mean(PC2)) %>%
  ggplot(aes(x = PC1, y = PC2, fill = Population, color = Population)) +
  geom_point() +
  theme_void()
Finally we put everything together and save in a couple of formats
library(patchwork)
full <- (agg_Region + agg_Population) / (region_plot + population)

ggsave(filename = "diversity_in_birth_canal.pdf", plot = full, width = 10, height = 7)
ggsave(filename = "diversity_in_birth_canal.png", plot = full, width = 12, height = 8)