Skip to content

Instantly share code, notes, and snippets.

@jtleek
Created Dec 7, 2016
Embed
What would you like to do?
bootstrapping residuals - the dumb way
f1 = formula(log(cost) ~ date)
lm1 = lm(f1,data=nuclear)
stat = lm1$coeff[2]
resid = lm1$residuals
fit = lm1$fitted.values
n = dim(nuclear)[1]
B = 500
stat0 = rep(0,B)
set.seed(1262016)
for(i in 1:B){
resid0 = resid[sample(1:n,replace=T)]
nuclear$y0 = fit + resid0
lm0 = lm(y0 ~ date,data=nuclear)
stat0[i] = lm0$coeff[2]
}
hist(stat0)
abline(v=stat,col="blue")
@jennybc
Copy link

jennybc commented Dec 7, 2016

This works (I think?) and is very hacky, but feels like you should just use replicate(). I'm not sure it makes sense to use purrr or base equivalents unless you want to hold on to the bootstrap data and/or fits for downstream work.

library(purrr)
iris_fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
jfun <- function(iris_fit) {
  n <- length(fitted(iris_fit))
  boot <- fitted(iris_fit) + resid(iris_fit)[sample(n, replace = TRUE)]
  boot_fit <- lm(boot ~ iris$Sepal.Width)
  coef(boot_fit)[2]
}
boot_coefs <- map_dbl(1:500, ~ jfun(iris_fit))
hist(boot_coefs)
abline(v = coef(iris_fit)[2],col = "blue")


I think if you want to hold on to all that intermediate stuff, a pipe-y purrr workflow is handy.

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats
B <- 500
iris_fit <- lm(Sepal.Length ~ Sepal.Width, data = iris)
n <- length(fitted(iris_fit))
x <- tibble(i = seq_len(B))
x <- x %>% 
  mutate(dat = map(i, ~ fitted(iris_fit) + resid(iris_fit)[sample(n, replace = TRUE)]),
         fit = map(dat, ~ lm(.x ~ iris$Sepal.Width)),
         coef = map(fit, coef),
         slope = map_dbl(coef, 2))
x
#> # A tibble: 500 × 5
#>        i         dat      fit      coef       slope
#>    <int>      <list>   <list>    <list>       <dbl>
#> 1      1 <dbl [150]> <S3: lm> <dbl [2]> -0.39698658
#> 2      2 <dbl [150]> <S3: lm> <dbl [2]> -0.17580813
#> 3      3 <dbl [150]> <S3: lm> <dbl [2]> -0.14242184
#> 4      4 <dbl [150]> <S3: lm> <dbl [2]> -0.08907165
#> 5      5 <dbl [150]> <S3: lm> <dbl [2]> -0.45252847
#> 6      6 <dbl [150]> <S3: lm> <dbl [2]> -0.22366493
#> 7      7 <dbl [150]> <S3: lm> <dbl [2]> -0.49122751
#> 8      8 <dbl [150]> <S3: lm> <dbl [2]> -0.28050178
#> 9      9 <dbl [150]> <S3: lm> <dbl [2]> -0.27693677
#> 10    10 <dbl [150]> <S3: lm> <dbl [2]> -0.18510398
#> # ... with 490 more rows
hist(x$slope)
abline(v = coef(iris_fit)[2],col = "blue")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment