mittenchops/amelia-test.R

## amelia-test.R
# got some help here: http://stackoverflow.com/questions/22739070/function-of-a-list-of-data-frames-to-return-also-a-data-frame-in-r/

library(Amelia)

set.seed(1234)
myN <- 50
var1 <- runif(myN)
var2 <-  2*var1 + rnorm(myN, 0, 1)
var3 <-  var1^2 + rnorm(myN, 0, 1)

df <- data.frame(cbind(var1,var2, var3))
df$Y <- rowSums(df, na.rm=T) + rnorm(myN, 0, 0.1)
df[seq(10,25),1] <- NA
df[seq(20,30), 2] <- NA
df[seq(26,40),3] <- NA
print(df)
#plot(df)

fit <- lm(Y ~ ., data=df)
summary(fit)

IMP <- amelia(df, m=100)

IMP$imputations$imp1
IMP$imputations$imp1

#lapply(IMP$imputations, mean)

#bestimp <- rowMeans(as.data.frame(IMP$imputations))

imputer <- function(mylist){
    df <- Reduce(rbind, lapply(mylist, function(df) {
        df$id <- seq_len(nrow(df))
        df
    }))
    df <- aggregate(. ~ id, df, mean)[, -1]
    df$id <- NULL
    return(df)
}

x1 <- imputer(IMP$imputations)
#trueN[IMP$missMatrix]
#IMP$imputations$imp1[IMP$missMatrix]

imputer2 <- function(mylist) {
    return(Reduce("+", mylist)/length(mylist))
}

x <- imputer(IMP$imputations)
y <- imputer2(IMP$imputations)
x[!(x == y)]
y[!(x == y)]
head(x[!(x == y)]) == head(y[!(x == y)])
head(x[!(x == y)]);head(y[!(x == y)])

##################################################

Observations:
1. Interesting that there are differences between imputer1 and imputer2.  I think it's a discrete math roundoff thing, not sure which is better, if either is.  To the level of precision R shows me, they look the same.
2. Obviously, not the right way to do the mean on the monte carlo variations---you should apply your function then take the mean.  f(g(x)) != g(f(x)).  I started this way because I wanted to do some testing of how much variation we're getting by Rubin's Rules; what I learned is that doing apply statements across lists of lists of data.frames requires some syntax backflips in R, and not something easy for a quick check.  I'll look into it later.
3. imputer2 clearly wins for speed.  When m=1000, the results are not even close.  (Because it doesn't copy.)
	# got some help here: http://stackoverflow.com/questions/22739070/function-of-a-list-of-data-frames-to-return-also-a-data-frame-in-r/

	library(Amelia)

	set.seed(1234)
	myN <- 50
	var1 <- runif(myN)
	var2 <- 2*var1 + rnorm(myN, 0, 1)
	var3 <- var1^2 + rnorm(myN, 0, 1)

	df <- data.frame(cbind(var1,var2, var3))
	df$Y <- rowSums(df, na.rm=T) + rnorm(myN, 0, 0.1)
	df[seq(10,25),1] <- NA
	df[seq(20,30), 2] <- NA
	df[seq(26,40),3] <- NA
	print(df)
	#plot(df)

	fit <- lm(Y ~ ., data=df)
	summary(fit)

	IMP <- amelia(df, m=100)

	IMP$imputations$imp1
	IMP$imputations$imp1

	#lapply(IMP$imputations, mean)

	#bestimp <- rowMeans(as.data.frame(IMP$imputations))

	imputer <- function(mylist){
	df <- Reduce(rbind, lapply(mylist, function(df) {
	df$id <- seq_len(nrow(df))
	df
	}))
	df <- aggregate(. ~ id, df, mean)[, -1]
	df$id <- NULL
	return(df)
	}

	x1 <- imputer(IMP$imputations)
	#trueN[IMP$missMatrix]
	#IMP$imputations$imp1[IMP$missMatrix]

	imputer2 <- function(mylist) {
	return(Reduce("+", mylist)/length(mylist))
	}

	x <- imputer(IMP$imputations)
	y <- imputer2(IMP$imputations)
	x[!(x == y)]
	y[!(x == y)]
	head(x[!(x == y)]) == head(y[!(x == y)])
	head(x[!(x == y)]);head(y[!(x == y)])

	##################################################

	Observations:
	1. Interesting that there are differences between imputer1 and imputer2. I think it's a discrete math roundoff thing, not sure which is better, if either is. To the level of precision R shows me, they look the same.
	2. Obviously, not the right way to do the mean on the monte carlo variations---you should apply your function then take the mean. f(g(x)) != g(f(x)). I started this way because I wanted to do some testing of how much variation we're getting by Rubin's Rules; what I learned is that doing apply statements across lists of lists of data.frames requires some syntax backflips in R, and not something easy for a quick check. I'll look into it later.
	3. imputer2 clearly wins for speed. When m=1000, the results are not even close. (Because it doesn't copy.)