Sensitivity and specificity are not properties of the test, they depend on the population
library(tidyverse)

# Inverse logit
expit <- function(t) exp(t)/(1 + exp(t))

n <- 1000000
prev_vec <- c(0.01, 0.05, 0.1, 0.25, 0.5)

results <- purrr::map_df(prev_vec, \(prev) {
  # Generate data
  dvec <- rbinom(n, prob = prev, size = 1)   # True disease status
  xvec <- rnorm(n, 2*dvec - 1)               # Continuous marker: N(-1, 1) if D = 0, N(1, 1) if D = 1
  yvec <- ifelse(xvec > 0, 1, 0)             # Test result, with cutoff at X = 0
  sel_vec <- rbinom(n, prob = expit(xvec),   # Selection indicator: larger X,
                    size = 1)                # more likely to be selected
  # Sensitivity and specificity without selection
  sens <- sum(dvec & yvec)/sum(dvec)
  spec <- sum(!dvec & !yvec)/sum(!dvec)
  # Sensitivity and specificity with selection
  sens2 <- sum(dvec & yvec & sel_vec)/sum(dvec & sel_vec)
  spec2 <- sum(!dvec & !yvec & sel_vec)/sum(!dvec & sel_vec)

  tribble(
    ~metric, ~no_select, ~select,
    "Sens",  sens,       sens2,
    "Spec",  spec,       spec2
  ) |> mutate(prev = prev)
})

results |>
  pivot_longer(cols = c("no_select", "select"),
               names_to = "Selection",
               values_to = "Value") |>
  ggplot(aes(prev, Value, colour = Selection)) +
  geom_point() +
  geom_line() +
  facet_grid(~metric) +
  cowplot::theme_minimal_hgrid() +
  scale_y_continuous(limits = c(0, 1)) +
  xlab("Prevalence") + ylab("")
turgeonmaxime commented Jun 17, 2022

People typically assume that sensitivity and specificity are intrinsic properties of a diagnostic test, but they actually depend on the population being tested. This small simulation study illustrates the point.

We use a simple data-generating mechanism:

  • D is the true disease status (1 = diseased, 0 = healthy).
  • X is a continuous marker of the disease. We assume that X | D is normally distributed with unit variance and mean -1 or 1, depending on whether D = 0 or D = 1, respectively.
  • Y is the result of the test. We assume that the cutoff value of the test is X = 0, so that positive values of X yield Y = 1, and non-positive values of X yield Y = 0. (A quick closed-form check of the resulting sensitivity and specificity appears right after this list.)
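
As a sanity check (my addition here, not part of the original gist): without selection, sensitivity is P(X > 0 | D = 1) and specificity is P(X <= 0 | D = 0), and under this mechanism both equal pnorm(1), roughly 0.84, for every prevalence.

# Closed-form values without selection (not in the original gist):
# X | D = 1 ~ N(1, 1), so sensitivity = P(X > 0 | D = 1) = pnorm(1)
# X | D = 0 ~ N(-1, 1), so specificity = P(X <= 0 | D = 0) = pnorm(1)
pnorm(1)
#> [1] 0.8413447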

Finally, we compare two scenarios: one where there is no selection into the tested sample, and one where selection depends on X. For the latter, we assume a simple model where the selection probability is the inverse logit of X, so that individuals with larger values of X are more likely to be selected.
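
As a complementary check (a sketch I am adding, not part of the original gist), the with-selection values can also be obtained without simulation: conditional on D and on being selected, the density of X is proportional to expit(x) times the N(2D - 1, 1) density, so each metric is a ratio of two integrals. A minimal version using base R's integrate(), where plogis() is the inverse logit:

# Density of X within disease group mu = 2D - 1, weighted by selection probability
sel_dens <- function(x, mu) plogis(x) * dnorm(x, mean = mu)
# Sensitivity under selection: P(X > 0 | D = 1, selected)
sens_sel <- integrate(sel_dens, lower = 0, upper = Inf, mu = 1)$value /
  integrate(sel_dens, lower = -Inf, upper = Inf, mu = 1)$value
# Specificity under selection: P(X <= 0 | D = 0, selected)
spec_sel <- integrate(sel_dens, lower = -Inf, upper = 0, mu = -1)$value /
  integrate(sel_dens, lower = -Inf, upper = Inf, mu = -1)$value
c(sens_sel, spec_sel)

Neither quantity involves the prevalence, so these values should closely match the Monte Carlo estimates from the "select" scenario at every prevalence.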

This is the output of the simulation code above:
[Figure: sensitivity ("Sens") and specificity ("Spec") plotted against prevalence, with and without selection]
