
Gist by @turgeonmaxime, last active March 5, 2019 14:58
In noisy studies, selecting results on statistical significance leads us to overestimate the effect size.
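The same effect can be computed analytically in a simplified setting. This is a minimal sketch that assumes sigma is known, so the estimate is exactly normal with standard error sigma / sqrt(n), and "significant" means |mu_hat| exceeds the two-sided cutoff. The function name `conditional_mean` is hypothetical. It returns the probability of reaching significance (the power) and the mean of the estimate conditional on reaching it, using the partial expectations of a normal variable beyond each threshold.

```r
# Mean of mu_hat conditional on |mu_hat| / se > z_{1 - alpha/2}.
# Assumption: sigma is known, so mu_hat ~ N(mu, se^2) exactly.
conditional_mean <- function(mu, sigma, n, alpha = 0.05) {
  se <- sigma / sqrt(n)
  cutoff <- qnorm(1 - alpha / 2) * se      # significance threshold on the mu_hat scale
  a <- (cutoff - mu) / se                  # standardized upper threshold
  b <- (-cutoff - mu) / se                 # standardized lower threshold
  p_upper <- pnorm(a, lower.tail = FALSE)  # P(mu_hat > cutoff)
  p_lower <- pnorm(b)                      # P(mu_hat < -cutoff)
  # Partial expectations E[mu_hat; mu_hat > cutoff] and E[mu_hat; mu_hat < -cutoff]
  upper <- mu * p_upper + se * dnorm(a)
  lower <- mu * p_lower - se * dnorm(b)
  c(power = p_upper + p_lower,
    mean_given_significant = (upper + lower) / (p_upper + p_lower))
}

# Same settings as the simulation: the conditional mean lands above mu = 0.5
conditional_mean(mu = 0.5, sigma = 1, n = 25)
```

With a very large `n` the threshold is almost always cleared, so the conditional mean collapses back to `mu`.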
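library(tidyverse)  # provides tibble, bind_rows, ggplot2 and the %>% pipe
set.seed(12345)

# Simulation settings: true mean, noise level, sample size, number of replicates
mu <- 0.5
sigma <- 1
n <- 25
B <- 1000

# Each replicate: draw a sample, estimate the mean and its standard error,
# and compute a two-sided p-value for H0: mu = 0 (normal approximation;
# 2 * pt(..., df = n - 1, lower.tail = FALSE) would give the exact t-test)
results <- replicate(B, {
  data <- rnorm(n, mu, sigma)
  mu_hat <- mean(data)
  sigma_hat <- sd(data) / sqrt(n)  # estimated standard error of mu_hat
  pval <- 2 * pnorm(abs(mu_hat) / sigma_hat, lower.tail = FALSE)
  c(mu_hat = mu_hat, sigma_hat = sigma_hat, pval = pval)
})

# Compare the density of all estimates with the density of the
# estimates that reached significance at the 5% level
tibble(mu_hat = results["mu_hat", ], selection = FALSE) %>%
  bind_rows(
    tibble(mu_hat = results["mu_hat", results["pval", ] < 0.05],
           selection = TRUE)
  ) %>%
  ggplot(aes(mu_hat, fill = selection)) +
  geom_density(alpha = 0.5) +
  geom_vline(xintercept = mu, linetype = "dashed") +  # true effect size
  theme_minimal()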
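To put a number on the shift visible in the plot, the sketch below reruns the same simulation in base R (same seed and settings) and compares the average estimate over all replicates with the average over the significant replicates only. The ratio of the latter to the true `mu` is sometimes called the exaggeration ratio (Type M error). The names `significant` and `summary_stats` are illustrative.

```r
# Same simulation as above, without the tidyverse dependency
set.seed(12345)
mu <- 0.5
sigma <- 1
n <- 25
B <- 1000
results <- replicate(B, {
  data <- rnorm(n, mu, sigma)
  mu_hat <- mean(data)
  sigma_hat <- sd(data) / sqrt(n)
  pval <- 2 * pnorm(abs(mu_hat) / sigma_hat, lower.tail = FALSE)
  c(mu_hat = mu_hat, pval = pval)
})

# Keep only the replicates that were "significant" at the 5% level
significant <- results["pval", ] < 0.05
summary_stats <- c(
  power = mean(significant),                               # share of significant replicates
  mean_all = mean(results["mu_hat", ]),                    # close to mu: no selection bias
  mean_significant = mean(results["mu_hat", significant]), # biased upward by selection
  exaggeration_ratio = mean(results["mu_hat", significant]) / mu
)
round(summary_stats, 3)
```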
Comment by @turgeonmaxime (gist author):

Note that as the signal-to-noise ratio increases (a larger true effect `mu`, a smaller `sigma`, or a larger sample size `n`), the bias decreases.
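One way to check this is to hold `mu` and `sigma` fixed and vary `n`, so that the signal-to-noise ratio `mu / (sigma / sqrt(n))` grows. This sketch assumes the same normal-approximation test as the gist's simulation; the helper `exaggeration` and the grid of sample sizes are hypothetical choices. The exaggeration ratio should move toward 1 as `n` grows.

```r
# Exaggeration ratio E[mu_hat | significant] / mu, estimated by simulation
exaggeration <- function(n, mu = 0.5, sigma = 1, B = 2000, alpha = 0.05) {
  mu_hats <- numeric(B)
  pvals <- numeric(B)
  for (b in seq_len(B)) {
    data <- rnorm(n, mu, sigma)
    mu_hats[b] <- mean(data)
    # Two-sided normal-approximation p-value, as in the main simulation
    pvals[b] <- 2 * pnorm(abs(mu_hats[b]) / (sd(data) / sqrt(n)),
                          lower.tail = FALSE)
  }
  significant <- pvals < alpha
  mean(mu_hats[significant]) / mu
}

mu <- 0.5
sigma <- 1
sample_sizes <- c(5, 10, 25, 50, 100)
set.seed(12345)
ratios <- sapply(sample_sizes, exaggeration, mu = mu, sigma = sigma)
data.frame(n = sample_sizes,
           snr = mu / (sigma / sqrt(sample_sizes)),
           exaggeration_ratio = round(ratios, 3))
```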
