Skip to content

Instantly share code, notes, and snippets.

@mittenchops
Last active August 29, 2015 13:57
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save mittenchops/9866880 to your computer and use it in GitHub Desktop.
Save mittenchops/9866880 to your computer and use it in GitHub Desktop.
Testing Amelia for multiple imputation
# got some help here: http://stackoverflow.com/questions/22739070/function-of-a-list-of-data-frames-to-return-also-a-data-frame-in-r/
library(Amelia)
set.seed(1234)
myN <- 50
var1 <- runif(myN)
var2 <- 2*var1 + rnorm(myN, 0, 1)
var3 <- var1^2 + rnorm(myN, 0, 1)
df <- data.frame(cbind(var1,var2, var3))
df$Y <- rowSums(df, na.rm=T) + rnorm(myN, 0, 0.1)
df[seq(10,25),1] <- NA
df[seq(20,30), 2] <- NA
df[seq(26,40),3] <- NA
print(df)
#plot(df)
fit <- lm(Y ~ ., data=df)
summary(fit)
IMP <- amelia(df, m=100)
IMP$imputations$imp1
IMP$imputations$imp1
#lapply(IMP$imputations, mean)
#bestimp <- rowMeans(as.data.frame(IMP$imputations))
imputer <- function(mylist){
df <- Reduce(rbind, lapply(mylist, function(df) {
df$id <- seq_len(nrow(df))
df
}))
df <- aggregate(. ~ id, df, mean)[, -1]
df$id <- NULL
return(df)
}
x1 <- imputer(IMP$imputations)
#trueN[IMP$missMatrix]
#IMP$imputations$imp1[IMP$missMatrix]
imputer2 <- function(mylist) {
return(Reduce("+", mylist)/length(mylist))
}
x <- imputer(IMP$imputations)
y <- imputer2(IMP$imputations)
x[!(x == y)]
y[!(x == y)]
head(x[!(x == y)]) == head(y[!(x == y)])
head(x[!(x == y)]);head(y[!(x == y)])
##################################################
Observations:
1. Interesting that there are differences between imputer1 and imputer2. I think it's a discrete math roundoff thing, not sure which is better, if either is. To the level of precision R shows me, they look the same.
2. Obviously, not the right way to do the mean on the monte carlo variations---you should apply your function then take the mean. f(g(x)) != g(f(x)). I started this way because I wanted to do some testing of how much variation we're getting by Rubin's Rules; what I learned is that doing apply statements across lists of lists of data.frames requires some syntax backflips in R, and not something easy for a quick check. I'll look into it later.
3. imputer2 clearly wins for speed. When m=1000, the results are not even close. (Because it doesn't copy.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment