Skip to content

Instantly share code, notes, and snippets.

@turgeonmaxime
Last active October 28, 2021 19:26
Show Gist options
  • Save turgeonmaxime/a5e7e40885d6c5f6ed131812ae9a69bc to your computer and use it in GitHub Desktop.
Save turgeonmaxime/a5e7e40885d6c5f6ed131812ae9a69bc to your computer and use it in GitHub Desktop.
Simulation study looking at the effect of outliers on the Type I error rate of the t-test
B <- 1000
n <- 20
sigma <- 10
p <- 0.9
alpha <- 0.05

results <- replicate(B, {
    norm_vars1 <- rnorm(n)
    # Contaminated normal
    sigmas <- sample(c(1, sigma), n, replace = TRUE,
                     prob = c(p, 1 - p))
    norm_vars2 <- rnorm(n, sd = sigmas)
    
    output <- t.test(norm_vars1, norm_vars2)
    
    return(c(output$statistic,
             output$parameter, 
             "p.value" = output$p.value))
})

# Type I error rate
mean(results["p.value",] < alpha)
#> [1] 0.019

# Sampling distribution under H0
hist(results["t",], breaks = 30, freq = FALSE)

# Theoretical density
# Note: the DFs depend on variance,
#       so we take the mean as an approximation
xvars <- seq(-3, 3, length.out = 100)
lines(xvars, dt(xvars, df = mean(results["df",])), 
      col = "blue")

# Or as a QQ-plot
qqplot(results["t",], 
       qt(ppoints(B), df = mean(results["df",])),
       xlab = "Samples", ylab = "Theory")
abline(a = 0, b = 1)

Created on 2021-10-28 by the reprex package (v2.0.0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment