Skip to content

Instantly share code, notes, and snippets.

@jtleek
Created December 7, 2016 16:16
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jtleek/e968aef6937248db02003e12004dafde to your computer and use it in GitHub Desktop.
Save jtleek/e968aef6937248db02003e12004dafde to your computer and use it in GitHub Desktop.
bootstrapping residuals - the dumb way
f1 = formula(log(cost) ~ date)
lm1 = lm(f1,data=nuclear)
stat = lm1$coeff[2]
resid = lm1$residuals
fit = lm1$fitted.values
n = dim(nuclear)[1]
B = 500
stat0 = rep(0,B)
set.seed(1262016)
for(i in 1:B){
resid0 = resid[sample(1:n,replace=T)]
nuclear$y0 = fit + resid0
lm0 = lm(y0 ~ date,data=nuclear)
stat0[i] = lm0$coeff[2]
}
hist(stat0)
abline(v=stat,col="blue")
@jennybc
Copy link

jennybc commented Dec 7, 2016

This works (I think?) and is very hacky, but feels like you should just use replicate(). I'm not sure it makes sense to use purrr or base equivalents unless you want to hold on to the bootstrap data and/or fits for downstream work.

library(purrr)
iris_fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
jfun <- function(iris_fit) {
  n <- length(fitted(iris_fit))
  boot <- fitted(iris_fit) + resid(iris_fit)[sample(n, replace = TRUE)]
  boot_fit <- lm(boot ~ iris$Sepal.Width)
  coef(boot_fit)[2]
}
boot_coefs <- map_dbl(1:500, ~ jfun(iris_fit))
hist(boot_coefs)
abline(v = coef(iris_fit)[2],col = "blue")


I think if you want to hold on to all that intermediate stuff, a pipe-y purrr workflow is handy.

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
B <- 500
iris_fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
n <- length(fitted(iris_fit))
x <- tibble(i = seq_len(B))
x <- x %>% 
  mutate(dat = map(i, ~ fitted(iris_fit) + resid(iris_fit)[sample(n, replace = TRUE)]),
         fit = map(dat, ~ lm(.x ~ iris$Sepal.Width)),
         coef = map(fit, coef),
         slope = map_dbl(coef, 2))
x
#> # A tibble: 500 × 5
#>        i         dat      fit      coef       slope
#>    <int>      <list>   <list>    <list>       <dbl>
#> 1      1 <dbl [150]> <S3: lm> <dbl [2]> -0.39698658
#> 2      2 <dbl [150]> <S3: lm> <dbl [2]> -0.17580813
#> 3      3 <dbl [150]> <S3: lm> <dbl [2]> -0.14242184
#> 4      4 <dbl [150]> <S3: lm> <dbl [2]> -0.08907165
#> 5      5 <dbl [150]> <S3: lm> <dbl [2]> -0.45252847
#> 6      6 <dbl [150]> <S3: lm> <dbl [2]> -0.22366493
#> 7      7 <dbl [150]> <S3: lm> <dbl [2]> -0.49122751
#> 8      8 <dbl [150]> <S3: lm> <dbl [2]> -0.28050178
#> 9      9 <dbl [150]> <S3: lm> <dbl [2]> -0.27693677
#> 10    10 <dbl [150]> <S3: lm> <dbl [2]> -0.18510398
#> # ... with 490 more rows
hist(x$slope)
abline(v = coef(iris_fit)[2],col = "blue")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment