pedmiston/visualizing-lmer.Rmd

## visualizing-lmer.Rmd
---
title: "Visualizing lmer model random effects"
author: "Pierce Edmiston"
date: "November 1, 2014"
output: html_document
---

I'll be exploring the differences between these three models:

```{r, eval = FALSE}
subj_intercepts_mod <- lmer(rt ~ A + (1|Subject))
subjA_intercepts_mod <- lmer(rt ~ 1 + (1|Subject:A))
subj_slopes_mod <- lmer(rt ~ A + (A|Subject))
```

Granted, the second model is rarely encountered in practice. More common would
be a nested structure (see this [crossvalidated post](http://stats.stackexchange.com/questions/121504/how-many-random-effects-to-specify-in-lmer)).

```{r, eval = FALSE}
lmer(rt ~ 1 + (1|Subject) + (1|Subject:A))
## the above random effects structure is often written as `(1|Subject/A)` the
## same way `y ~ A + B + A:B` is usually written as `y ~ A * B`.
```

Even though the non-nested `(1|Subject:A)` situation is rare, I still find it
informative as an example when exploring these models.

I'll work with a modified version of the sleepstudy dataset from `lme4`.

```{r warning = FALSE, message = FALSE}
library(lme4)
library(ggplot2)

library(dplyr)
library(reshape2)
library(broom)    # install_github("dgrtwo/broom")

library(stringr)

example <- sleepstudy %>%
  mutate(A = ifelse(Days < 5, -0.5, 0.5)) %>%
  select(Subject, A, Reaction)
head(example, n = 11)
```

I'll start by plotting the means.

```{r}
base_plot <- ggplot(example, aes(x = A, y = Reaction)) +
  stat_summary(aes(fill = factor(A)), fun.y = mean, geom = "bar", color = "black") +
  scale_fill_manual(values = c("#66c2a5", "#8da0cb")) +  # colorbrewer2.org
  theme(legend.position = "none")
base_plot
```

# Part 1: `subj_intercepts_mod`

First I'll fit an `lmer` model that allows the intercept to vary across
subjects.

```{r}
subj_intercepts_mod <- lmer(Reaction ~ A + (1|Subject), data = example)
# broom::tidy turns `fixef(subj_intercepts_mod)` into a data.frame
fixed_params <- tidy(subj_intercepts_mod, effects = "fixed")[,c("term", "estimate")]
fixed_params
```

Although it's sometimes helpful to think about model parameters (and you can
draw them easily with `ggplot2::geom_abline`), I find it more beneficial
in a simple design like this to deal in estimates. I'll write a little function
to speed up this conversion that seems like overkill now but it will come in
handy later.

```{r}
# converts parameters of a `Reaction ~ (Intercept) + A` model into estimates,
# assumes A is a 0-centered, unit-weighted, dichotomous variable
convert_parameters_to_estimates <- function(tidy_frame, id_var = ".") {
  tidy_frame %>%
    dcast(as.formula(paste(id_var, "term", sep="~")), value.var = "estimate") %>%
    mutate(`-0.5` = `(Intercept)` - A/2, `0.5` = `(Intercept)` + A/2) %>%
    select(-`(Intercept)`, -A) %>%
    melt(idvars = id_var, measure.vars = c("-0.5", "0.5"),
         variable.name = "A", value.name = "Reaction") %>%
    mutate(A = as.numeric(as.character(A)))
}

fixed_estimates <- convert_parameters_to_estimates(fixed_params)[,c("A", "Reaction")]
fixed_estimates

# sanity check
example %>% group_by(A) %>% summarize(Reaction = mean(Reaction)) %>%
  merge(., fixed_estimates, by = "A", suffixes = c("_mean", "_model"))
```

It's possible to turn parameters from the model into estimates that make sense;
now let's do the same thing with random effects. How will the model's
random effect parameters, when converted to estimates, compare to the averages
for each subject that we can calculate by hand?

```{r}
random_params <- tidy(subj_intercepts_mod, effect = "random")
random_estimates <- convert_parameters_to_estimates(random_params, id_var = "level")

fixed_slopes_plot <- base_plot +
  geom_point(data = random_estimates, shape = 17, size = 3) +
  geom_line(aes(group = level), data = random_estimates)
fixed_slopes_plot

fixed_slopes_plot +
  stat_summary(aes(group = Subject), fun.y = mean,  # means from raw data
               geom = "point", shape = 19, size = 4, color = "#fc8d62", alpha = 0.6)
```

Of course the reason the two sets of points don't line up is because we are only
allowing the subject's overall Reaction to vary, not the subject's overall
Reaction in each condition. Applying the same slope to each subject, this is
the best we can do to account for variance:

```{r}
base_plot +
  geom_line(aes(group = level), data = random_estimates) +
  ## calculate mean Reaction by subject using `stat_summary`
  stat_summary(aes(x=0.0, y=Reaction, group=level), data = random_estimates, fun.y = mean,
               geom = "point", shape = 17, size = 3) +
  stat_summary(aes(x=0.0, group=Subject), fun.y = mean,
               geom = "point", shape = 19, size = 4, color = "#fc8d62", alpha = 0.6)

## Well, **almost** accurate estimates of each subject's overall mean Reaction.
## Note that the random estimate means (triangles) are closer to the overall mean,
## reflecting that the model assumes each subject's mean is closer to the overall
## average than it actually is---a fundamental "assumption" of a multilevel model.
```

## Conclusion

```{r, eval = FALSE}
subj_intercepts_mod <- lmer(Reaction ~ A + (1|Subject))
```

A model that allows intercepts to vary across subjects does just that: it
does a great job of estimating overall Reaction for each subject, but it is
limited in estimating the effect of `A` on Reaction.

# Part 2: `subjA_intercepts_mod`

We're looking for a way to capture the fact that all of the following
by-subject lines don't have the same slope.

```{r}
subj_means_plot <- base_plot +
  stat_summary(aes(group = Subject), fun.y = mean,  # means from raw data
               geom = "point", shape = 19, size = 4, color = "#fc8d62") +
  stat_summary(aes(group = Subject), fun.y = mean,
               geom = "line", size = 1.2, color = "#fc8d62")
subj_means_plot
```

One way to give the model some flexibility would be to "sever the connection"
between the measurements on the left bar from those in the
right bar.

```{r}
example$SubAject <- with(example, paste(Subject, A, sep = ":"))
subjA_intercepts_mod <- lmer(Reaction ~ 1 + (1|SubAject), data = example)
```

Why make a new, hideously-named variable `SubAject`? Because if the model can't
understand the relationship between `Subject` and `A`, I shouldn't be able to
either! We've severed the connection between scores on the left and scores on
the right, and given the model more flexibility to estimate the effects.

Of course, nothing is preventing me from prying apart in the model what I glued
together in the data (it's R, after all), so let's take a look.

```{r}
subjA_estimates <- tidy(subjA_intercepts_mod, effect = "random")

# split Subject:A into Subject and A
subjA_estimates[,c("Subject", "A")] <- as.data.frame(str_split_fixed(subjA_estimates$level, ":", 2))
subjA_estimates <- subjA_estimates %>%
  select(Subject, A, Reaction = estimate) %>%
  mutate(A = as.numeric(as.character(A))) # thanks R!

## It's interesting to note that prying apart the mashed variabl, leads to two
## values for each subject even though I didn't specificy `A` as a "fixed
## effect" in the model.
head(subjA_estimates, n = 3)

model_estimates_plot <- subj_means_plot +
  geom_point(aes(x = A-0.05), data = random_estimates, size = 3, shape = 17) +
  geom_point(aes(x = A+0.05), data = subjA_estimates, size = 3, shape = 15)

model_estimates_plot + annotate(geom = "text", x = -0.48, y = 55, size = 4,
  label = "I Am Legend:\ntriangles=subj_intercepts_mod\ncircles=**subj_means**\nsquares=subjA_intercepts_mod")
```

`(1|Subject:A)` does indeed lead to better estimates of the subject means than
`(1|Subject)`, but in the process we "undid" the thing we really cared about---
the average effect of `A`---and we only have a single fixed effect for intercept.

```{r}
tidy(subjA_intercepts_mod, effects = "fixed")
```

Adding `A` in as a fixed factor would indeed get us a parameter, but what would
does it tell you? In this context it would be as if you took each of
your participants, cut them in two, and assigned each half to a different
condition, and forgot that the two halves know each other!

```{r}
subjA_fixed_mod <- lmer(Reaction ~ A + (1|SubAject), data = example)
tidy(subjA_fixed_mod, effects = "fixed")
```

## Conclusion

Although non-nested random effects (e.g., `(1|Subject:A)`) are not particularly
useful in this case, creating new grouping variables is demonstrably effective
in giving the model more flexibility.

# Part 3: `subj_slopes_mod`

The better solution is to allow intercepts **and slopes** to vary across subjects.

```{r}
subj_slopes_mod <- lmer(Reaction ~ A + (A|Subject), data = example)
# the random intercept is implied, so `(A|Subject) == (1+A|Subject)`

subj_slopes_parameters <- tidy(subj_slopes_mod, effect = "random") %>% select(-group)
subj_slopes_estimates <- convert_parameters_to_estimates(subj_slopes_parameters, id_var = "level")

model_estimates_plot +
  geom_point(aes(x = A+0.1), data = subj_slopes_estimates, shape = 18, size = 4) +
  annotate(geom = "text", x = -0.48, y = 73, size = 4,
    label = "I Am Legend:\ncircles=subj_intercepts_mod\ntriangles=**subj_means**\nsquares=suAbj_intercepts_mod\ndiamonds=subj_slopes_mod")
```

Note that we've added two parameters to our original model: one for random
slopes, and another for the relationship (i.e., correlation) between random
intercepts and random slopes.

```{r}
anova(subj_intercepts_mod, subj_slopes_mod)

## You can remove the random correlation parameter by only specifying one
## random effect at a time: `(1|Subject) + (0+A|Subject)`
```

However, the more complicated model is still better than `subjA_intercept_mod`.

```{r}
anova(subj_slopes_mod, subjA_intercepts_mod)
```

## Conclusion

Fully specifying random intercepts and slopes capitalizes on what hierarchical
mixed effects models are supposed to do: explain variance at multiple levels.
	---
	title: "Visualizing lmer model random effects"
	author: "Pierce Edmiston"
	date: "November 1, 2014"
	output: html_document
	---

	I'll be exploring the differences between these three models:

	```{r, eval = FALSE}
	subj_intercepts_mod <- lmer(rt ~ A + (1\|Subject))
	subjA_intercepts_mod <- lmer(rt ~ 1 + (1\|Subject:A))
	subj_slopes_mod <- lmer(rt ~ A + (A\|Subject))
	```

	Granted, the second model is rarely encountered in practice. More common would
	be a nested structure (see this [crossvalidated post](http://stats.stackexchange.com/questions/121504/how-many-random-effects-to-specify-in-lmer)).

	```{r, eval = FALSE}
	lmer(rt ~ 1 + (1\|Subject) + (1\|Subject:A))
	## the above random effects structure is often written as `(1\|Subject/A)` the
	## same way `y ~ A + B + A:B` is usually written as `y ~ A * B`.
	```

	Even though the non-nested `(1\|Subject:A)` situation is rare, I still find it
	informative as an example when exploring these models.

	I'll work with a modified version of the sleepstudy dataset from `lme4`.

	```{r warning = FALSE, message = FALSE}
	library(lme4)
	library(ggplot2)

	library(dplyr)
	library(reshape2)
	library(broom) # install_github("dgrtwo/broom")

	library(stringr)

	example <- sleepstudy %>%
	mutate(A = ifelse(Days < 5, -0.5, 0.5)) %>%
	select(Subject, A, Reaction)
	head(example, n = 11)
	```

	I'll start by plotting the means.

	```{r}
	base_plot <- ggplot(example, aes(x = A, y = Reaction)) +
	stat_summary(aes(fill = factor(A)), fun.y = mean, geom = "bar", color = "black") +
	scale_fill_manual(values = c("#66c2a5", "#8da0cb")) + # colorbrewer2.org
	theme(legend.position = "none")
	base_plot
	```

	# Part 1: `subj_intercepts_mod`

	First I'll fit an `lmer` model that allows the intercept to vary across
	subjects.

	```{r}
	subj_intercepts_mod <- lmer(Reaction ~ A + (1\|Subject), data = example)
	# broom::tidy turns `fixef(subj_intercepts_mod)` into a data.frame
	fixed_params <- tidy(subj_intercepts_mod, effects = "fixed")[,c("term", "estimate")]
	fixed_params
	```

	Although it's sometimes helpful to think about model parameters (and you can
	draw them easily with `ggplot2::geom_abline`), I find it more beneficial
	in a simple design like this to deal in estimates. I'll write a little function
	to speed up this conversion that seems like overkill now but it will come in
	handy later.

	```{r}
	# converts parameters of a `Reaction ~ (Intercept) + A` model into estimates,
	# assumes A is a 0-centered, unit-weighted, dichotomous variable
	convert_parameters_to_estimates <- function(tidy_frame, id_var = ".") {
	tidy_frame %>%
	dcast(as.formula(paste(id_var, "term", sep="~")), value.var = "estimate") %>%
	mutate(`-0.5` = `(Intercept)` - A/2, `0.5` = `(Intercept)` + A/2) %>%
	select(-`(Intercept)`, -A) %>%
	melt(idvars = id_var, measure.vars = c("-0.5", "0.5"),
	variable.name = "A", value.name = "Reaction") %>%
	mutate(A = as.numeric(as.character(A)))
	}

	fixed_estimates <- convert_parameters_to_estimates(fixed_params)[,c("A", "Reaction")]
	fixed_estimates

	# sanity check
	example %>% group_by(A) %>% summarize(Reaction = mean(Reaction)) %>%
	merge(., fixed_estimates, by = "A", suffixes = c("_mean", "_model"))
	```

	It's possible to turn parameters from the model into estimates that make sense;
	now let's do the same thing with random effects. How will the model's
	random effect parameters, when converted to estimates, compare to the averages
	for each subject that we can calculate by hand?

	```{r}
	random_params <- tidy(subj_intercepts_mod, effect = "random")
	random_estimates <- convert_parameters_to_estimates(random_params, id_var = "level")

	fixed_slopes_plot <- base_plot +
	geom_point(data = random_estimates, shape = 17, size = 3) +
	geom_line(aes(group = level), data = random_estimates)
	fixed_slopes_plot

	fixed_slopes_plot +
	stat_summary(aes(group = Subject), fun.y = mean, # means from raw data
	geom = "point", shape = 19, size = 4, color = "#fc8d62", alpha = 0.6)
	```

	Of course the reason the two sets of points don't line up is because we are only
	allowing the subject's overall Reaction to vary, not the subject's overall
	Reaction in each condition. Applying the same slope to each subject, this is
	the best we can do to account for variance:

	```{r}
	base_plot +
	geom_line(aes(group = level), data = random_estimates) +
	## calculate mean Reaction by subject using `stat_summary`
	stat_summary(aes(x=0.0, y=Reaction, group=level), data = random_estimates, fun.y = mean,
	geom = "point", shape = 17, size = 3) +
	stat_summary(aes(x=0.0, group=Subject), fun.y = mean,
	geom = "point", shape = 19, size = 4, color = "#fc8d62", alpha = 0.6)

	## Well, almost accurate estimates of each subject's overall mean Reaction.
	## Note that the random estimate means (triangles) are closer to the overall mean,
	## reflecting that the model assumes each subject's mean is closer to the overall
	## average than it actually is---a fundamental "assumption" of a multilevel model.
	```

	## Conclusion

	```{r, eval = FALSE}
	subj_intercepts_mod <- lmer(Reaction ~ A + (1\|Subject))
	```

	A model that allows intercepts to vary across subjects does just that: it
	does a great job of estimating overall Reaction for each subject, but it is
	limited in estimating the effect of `A` on Reaction.

	# Part 2: `subjA_intercepts_mod`

	We're looking for a way to capture the fact that all of the following
	by-subject lines don't have the same slope.

	```{r}
	subj_means_plot <- base_plot +
	stat_summary(aes(group = Subject), fun.y = mean, # means from raw data
	geom = "point", shape = 19, size = 4, color = "#fc8d62") +
	stat_summary(aes(group = Subject), fun.y = mean,
	geom = "line", size = 1.2, color = "#fc8d62")
	subj_means_plot
	```

	One way to give the model some flexibility would be to "sever the connection"
	between the measurements on the left bar from those in the
	right bar.

	```{r}
	example$SubAject <- with(example, paste(Subject, A, sep = ":"))
	subjA_intercepts_mod <- lmer(Reaction ~ 1 + (1\|SubAject), data = example)
	```

	Why make a new, hideously-named variable `SubAject`? Because if the model can't
	understand the relationship between `Subject` and `A`, I shouldn't be able to
	either! We've severed the connection between scores on the left and scores on
	the right, and given the model more flexibility to estimate the effects.

	Of course, nothing is preventing me from prying apart in the model what I glued
	together in the data (it's R, after all), so let's take a look.

	```{r}
	subjA_estimates <- tidy(subjA_intercepts_mod, effect = "random")

	# split Subject:A into Subject and A
	subjA_estimates[,c("Subject", "A")] <- as.data.frame(str_split_fixed(subjA_estimates$level, ":", 2))
	subjA_estimates <- subjA_estimates %>%
	select(Subject, A, Reaction = estimate) %>%
	mutate(A = as.numeric(as.character(A))) # thanks R!

	## It's interesting to note that prying apart the mashed variabl, leads to two
	## values for each subject even though I didn't specificy `A` as a "fixed
	## effect" in the model.
	head(subjA_estimates, n = 3)

	model_estimates_plot <- subj_means_plot +
	geom_point(aes(x = A-0.05), data = random_estimates, size = 3, shape = 17) +
	geom_point(aes(x = A+0.05), data = subjA_estimates, size = 3, shape = 15)

	model_estimates_plot + annotate(geom = "text", x = -0.48, y = 55, size = 4,
	label = "I Am Legend:\ntriangles=subj_intercepts_mod\ncircles=subj_means\nsquares=subjA_intercepts_mod")
	```

	`(1\|Subject:A)` does indeed lead to better estimates of the subject means than
	`(1\|Subject)`, but in the process we "undid" the thing we really cared about---
	the average effect of `A`---and we only have a single fixed effect for intercept.

	```{r}
	tidy(subjA_intercepts_mod, effects = "fixed")
	```

	Adding `A` in as a fixed factor would indeed get us a parameter, but what would
	does it tell you? In this context it would be as if you took each of
	your participants, cut them in two, and assigned each half to a different
	condition, and forgot that the two halves know each other!

	```{r}
	subjA_fixed_mod <- lmer(Reaction ~ A + (1\|SubAject), data = example)
	tidy(subjA_fixed_mod, effects = "fixed")
	```

	## Conclusion

	Although non-nested random effects (e.g., `(1\|Subject:A)`) are not particularly
	useful in this case, creating new grouping variables is demonstrably effective
	in giving the model more flexibility.

	# Part 3: `subj_slopes_mod`

	The better solution is to allow intercepts and slopes to vary across subjects.

	```{r}
	subj_slopes_mod <- lmer(Reaction ~ A + (A\|Subject), data = example)
	# the random intercept is implied, so `(A\|Subject) == (1+A\|Subject)`

	subj_slopes_parameters <- tidy(subj_slopes_mod, effect = "random") %>% select(-group)
	subj_slopes_estimates <- convert_parameters_to_estimates(subj_slopes_parameters, id_var = "level")

	model_estimates_plot +
	geom_point(aes(x = A+0.1), data = subj_slopes_estimates, shape = 18, size = 4) +
	annotate(geom = "text", x = -0.48, y = 73, size = 4,
	label = "I Am Legend:\ncircles=subj_intercepts_mod\ntriangles=subj_means\nsquares=suAbj_intercepts_mod\ndiamonds=subj_slopes_mod")
	```

	Note that we've added two parameters to our original model: one for random
	slopes, and another for the relationship (i.e., correlation) between random
	intercepts and random slopes.

	```{r}
	anova(subj_intercepts_mod, subj_slopes_mod)

	## You can remove the random correlation parameter by only specifying one
	## random effect at a time: `(1\|Subject) + (0+A\|Subject)`
	```

	However, the more complicated model is still better than `subjA_intercept_mod`.

	```{r}
	anova(subj_slopes_mod, subjA_intercepts_mod)
	```

	## Conclusion

	Fully specifying random intercepts and slopes capitalizes on what hierarchical
	mixed effects models are supposed to do: explain variance at multiple levels.