Skip to content

Instantly share code, notes, and snippets.

@ryandward
Last active December 4, 2023 18:00
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ryandward/e70aed094cb99c597dc48dfff63e929c to your computer and use it in GitHub Desktop.
Save ryandward/e70aed094cb99c597dc48dfff63e929c to your computer and use it in GitHub Desktop.

Estimation of Bottleneck Size

The calculation of bottleneck size ($N_b$) follows the principle of effective population size estimation, adapted to quantify the 'founder' population size, i.e., the number of cells contributing descendants to the sample. The formula used in this study is taken from Krimbas and Tsakas and provides the best fit for estimating $N_b$ as per our data. It calculates $N_b$ using allele frequencies at initial and final time points ($f_{i_0}$ and $f_{i_s}$), the total number of sequence reads at both times ($s_0$ and $s_s$), and the number of generations.

This method is an attempt to formalize the intial development from this GitHub repository, which is based on the methodologies from this Nature Methods paper.


T0 Explanation

In the context of bottleneck estimation, "T0" refers to the initial time point where the frequency of each unique allele (or tag) in the population is recorded. The idea is to compare allele frequencies at this initial time point ("T0") with those at a later time point ("Tn") to estimate the effective population size (Ne) or the bottleneck size (Nb).

Code Comments

This code aims to calculate Nb based on temporal allele frequency data:

# Calculate frequencies of alleles at T0
botneck.t0 <- bottleneck_data %>%
  filter(doublings == 0) %>%
  group_by(type) %>%
  mutate(
    fi0 = count / sum(count),
    count0 = count
  ) %>%
  select(type, spacer, fi0, count0) %>%
  nest() %>%
  rename(data0 = data) %>%
  mutate(s0 = map_dbl(data0, ~ sum(.$count0)))

# Calculate frequencies of alleles at other times and compute Nb using the formula
botneck <- bottleneck_data %>%
  filter(doublings != 0) %>%
  group_by(type, condition, sample, doublings) %>%
  mutate(
    fis = count / sum(count)
  ) %>%
  nest() %>%
  mutate(
    ss = map_dbl(data, ~ sum(.$count))
  ) %>%
  full_join(botneck.t0) %>%
  mutate(
    data = map2(data, data0, inner_join)
  )

botneck <- botneck %>%
  mutate(
    data = map(data, ~ .x %>% mutate(ratio = ((fis - fi0)^2) / (fi0 * (1 - fi0)^2)))
  )

botneck <- botneck %>%
  mutate(
    f_hat = map_dbl(data, ~ sum(.$ratio)) * (1 / map_dbl(data, ~ n_distinct(.$spacer))), 
    Nb = doublings / (f_hat - 1 / s0 - 1 / ss)
  )

# Summary Statistics
botneck.stats <- botneck %>%
  group_by(condition, type) %>%
  summarise(
    Nb.med = median(Nb),
    Nb.range = max(Nb) - min(Nb),
    Nb.mean = mean(Nb),
    Nb.sd = sd(Nb)
  )

Statistics Explanation

The statistics calculated are median, range, mean, and standard deviation of $N_b$ across different conditions and types. These statistics provide an overview of the bottleneck size distribution:

  • Nb.med: Median provides a robust measure of the central tendency of $N_b$.
  • Nb.range: Range gives the span of $N_b$, indicating the variability.
  • Nb.mean: Mean offers an average measure but can be affected by outliers.
  • Nb.sd: Standard deviation quantifies how much $N_b$ values deviate from the mean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment