Skip to content

Instantly share code, notes, and snippets.

@pedmiston
Created November 2, 2014 03:15
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save pedmiston/acd8eac94e786ba7ea87 to your computer and use it in GitHub Desktop.
Save pedmiston/acd8eac94e786ba7ea87 to your computer and use it in GitHub Desktop.
Visualizing lmer model random effects
---
title: "Visualizing lmer model random effects"
author: "Pierce Edmiston"
date: "November 1, 2014"
output: html_document
---
I'll be exploring the differences between these three models:
```{r, eval = FALSE}
subj_intercepts_mod <- lmer(rt ~ A + (1|Subject))
subjA_intercepts_mod <- lmer(rt ~ 1 + (1|Subject:A))
subj_slopes_mod <- lmer(rt ~ A + (A|Subject))
```
Granted, the second model is rarely encountered in practice. More common would
be a nested structure (see this [crossvalidated post](http://stats.stackexchange.com/questions/121504/how-many-random-effects-to-specify-in-lmer)).
```{r, eval = FALSE}
lmer(rt ~ 1 + (1|Subject) + (1|Subject:A))
## the above random effects structure is often written as `(1|Subject/A)` the
## same way `y ~ A + B + A:B` is usually written as `y ~ A * B`.
```
Even though the non-nested `(1|Subject:A)` situation is rare, I still find it
informative as an example when exploring these models.
I'll work with a modified version of the sleepstudy dataset from `lme4`.
```{r warning = FALSE, message = FALSE}
library(lme4)
library(ggplot2)
library(dplyr)
library(reshape2)
library(broom) # install_github("dgrtwo/broom")
library(stringr)
example <- sleepstudy %>%
mutate(A = ifelse(Days < 5, -0.5, 0.5)) %>%
select(Subject, A, Reaction)
head(example, n = 11)
```
I'll start by plotting the means.
```{r}
base_plot <- ggplot(example, aes(x = A, y = Reaction)) +
stat_summary(aes(fill = factor(A)), fun.y = mean, geom = "bar", color = "black") +
scale_fill_manual(values = c("#66c2a5", "#8da0cb")) + # colorbrewer2.org
theme(legend.position = "none")
base_plot
```
# Part 1: `subj_intercepts_mod`
First I'll fit an `lmer` model that allows the intercept to vary across
subjects.
```{r}
subj_intercepts_mod <- lmer(Reaction ~ A + (1|Subject), data = example)
# broom::tidy turns `fixef(subj_intercepts_mod)` into a data.frame
fixed_params <- tidy(subj_intercepts_mod, effects = "fixed")[,c("term", "estimate")]
fixed_params
```
Although it's sometimes helpful to think about model parameters (and you can
draw them easily with `ggplot2::geom_abline`), I find it more beneficial
in a simple design like this to deal in estimates. I'll write a little function
to speed up this conversion that seems like overkill now but it will come in
handy later.
```{r}
# converts parameters of a `Reaction ~ (Intercept) + A` model into estimates,
# assumes A is a 0-centered, unit-weighted, dichotomous variable
convert_parameters_to_estimates <- function(tidy_frame, id_var = ".") {
tidy_frame %>%
dcast(as.formula(paste(id_var, "term", sep="~")), value.var = "estimate") %>%
mutate(`-0.5` = `(Intercept)` - A/2, `0.5` = `(Intercept)` + A/2) %>%
select(-`(Intercept)`, -A) %>%
melt(idvars = id_var, measure.vars = c("-0.5", "0.5"),
variable.name = "A", value.name = "Reaction") %>%
mutate(A = as.numeric(as.character(A)))
}
fixed_estimates <- convert_parameters_to_estimates(fixed_params)[,c("A", "Reaction")]
fixed_estimates
# sanity check
example %>% group_by(A) %>% summarize(Reaction = mean(Reaction)) %>%
merge(., fixed_estimates, by = "A", suffixes = c("_mean", "_model"))
```
It's possible to turn parameters from the model into estimates that make sense;
now let's do the same thing with random effects. How will the model's
random effect parameters, when converted to estimates, compare to the averages
for each subject that we can calculate by hand?
```{r}
random_params <- tidy(subj_intercepts_mod, effect = "random")
random_estimates <- convert_parameters_to_estimates(random_params, id_var = "level")
fixed_slopes_plot <- base_plot +
geom_point(data = random_estimates, shape = 17, size = 3) +
geom_line(aes(group = level), data = random_estimates)
fixed_slopes_plot
fixed_slopes_plot +
stat_summary(aes(group = Subject), fun.y = mean, # means from raw data
geom = "point", shape = 19, size = 4, color = "#fc8d62", alpha = 0.6)
```
Of course the reason the two sets of points don't line up is because we are only
allowing the subject's overall Reaction to vary, not the subject's overall
Reaction in each condition. Applying the same slope to each subject, this is
the best we can do to account for variance:
```{r}
base_plot +
geom_line(aes(group = level), data = random_estimates) +
## calculate mean Reaction by subject using `stat_summary`
stat_summary(aes(x=0.0, y=Reaction, group=level), data = random_estimates, fun.y = mean,
geom = "point", shape = 17, size = 3) +
stat_summary(aes(x=0.0, group=Subject), fun.y = mean,
geom = "point", shape = 19, size = 4, color = "#fc8d62", alpha = 0.6)
## Well, **almost** accurate estimates of each subject's overall mean Reaction.
## Note that the random estimate means (triangles) are closer to the overall mean,
## reflecting that the model assumes each subject's mean is closer to the overall
## average than it actually is---a fundamental "assumption" of a multilevel model.
```
## Conclusion
```{r, eval = FALSE}
subj_intercepts_mod <- lmer(Reaction ~ A + (1|Subject))
```
A model that allows intercepts to vary across subjects does just that: it
does a great job of estimating overall Reaction for each subject, but it is
limited in estimating the effect of `A` on Reaction.
# Part 2: `subjA_intercepts_mod`
We're looking for a way to capture the fact that all of the following
by-subject lines don't have the same slope.
```{r}
subj_means_plot <- base_plot +
stat_summary(aes(group = Subject), fun.y = mean, # means from raw data
geom = "point", shape = 19, size = 4, color = "#fc8d62") +
stat_summary(aes(group = Subject), fun.y = mean,
geom = "line", size = 1.2, color = "#fc8d62")
subj_means_plot
```
One way to give the model some flexibility would be to "sever the connection"
between the measurements on the left bar from those in the
right bar.
```{r}
example$SubAject <- with(example, paste(Subject, A, sep = ":"))
subjA_intercepts_mod <- lmer(Reaction ~ 1 + (1|SubAject), data = example)
```
Why make a new, hideously-named variable `SubAject`? Because if the model can't
understand the relationship between `Subject` and `A`, I shouldn't be able to
either! We've severed the connection between scores on the left and scores on
the right, and given the model more flexibility to estimate the effects.
Of course, nothing is preventing me from prying apart in the model what I glued
together in the data (it's R, after all), so let's take a look.
```{r}
subjA_estimates <- tidy(subjA_intercepts_mod, effect = "random")
# split Subject:A into Subject and A
subjA_estimates[,c("Subject", "A")] <- as.data.frame(str_split_fixed(subjA_estimates$level, ":", 2))
subjA_estimates <- subjA_estimates %>%
select(Subject, A, Reaction = estimate) %>%
mutate(A = as.numeric(as.character(A))) # thanks R!
## It's interesting to note that prying apart the mashed variabl, leads to two
## values for each subject even though I didn't specificy `A` as a "fixed
## effect" in the model.
head(subjA_estimates, n = 3)
model_estimates_plot <- subj_means_plot +
geom_point(aes(x = A-0.05), data = random_estimates, size = 3, shape = 17) +
geom_point(aes(x = A+0.05), data = subjA_estimates, size = 3, shape = 15)
model_estimates_plot + annotate(geom = "text", x = -0.48, y = 55, size = 4,
label = "I Am Legend:\ntriangles=subj_intercepts_mod\ncircles=**subj_means**\nsquares=subjA_intercepts_mod")
```
`(1|Subject:A)` does indeed lead to better estimates of the subject means than
`(1|Subject)`, but in the process we "undid" the thing we really cared about---
the average effect of `A`---and we only have a single fixed effect for intercept.
```{r}
tidy(subjA_intercepts_mod, effects = "fixed")
```
Adding `A` in as a fixed factor would indeed get us a parameter, but what would
does it tell you? In this context it would be as if you took each of
your participants, cut them in two, and assigned each half to a different
condition, and forgot that the two halves know each other!
```{r}
subjA_fixed_mod <- lmer(Reaction ~ A + (1|SubAject), data = example)
tidy(subjA_fixed_mod, effects = "fixed")
```
## Conclusion
Although non-nested random effects (e.g., `(1|Subject:A)`) are not particularly
useful in this case, creating new grouping variables is demonstrably effective
in giving the model more flexibility.
# Part 3: `subj_slopes_mod`
The better solution is to allow intercepts **and slopes** to vary across subjects.
```{r}
subj_slopes_mod <- lmer(Reaction ~ A + (A|Subject), data = example)
# the random intercept is implied, so `(A|Subject) == (1+A|Subject)`
subj_slopes_parameters <- tidy(subj_slopes_mod, effect = "random") %>% select(-group)
subj_slopes_estimates <- convert_parameters_to_estimates(subj_slopes_parameters, id_var = "level")
model_estimates_plot +
geom_point(aes(x = A+0.1), data = subj_slopes_estimates, shape = 18, size = 4) +
annotate(geom = "text", x = -0.48, y = 73, size = 4,
label = "I Am Legend:\ncircles=subj_intercepts_mod\ntriangles=**subj_means**\nsquares=suAbj_intercepts_mod\ndiamonds=subj_slopes_mod")
```
Note that we've added two parameters to our original model: one for random
slopes, and another for the relationship (i.e., correlation) between random
intercepts and random slopes.
```{r}
anova(subj_intercepts_mod, subj_slopes_mod)
## You can remove the random correlation parameter by only specifying one
## random effect at a time: `(1|Subject) + (0+A|Subject)`
```
However, the more complicated model is still better than `subjA_intercept_mod`.
```{r}
anova(subj_slopes_mod, subjA_intercepts_mod)
```
## Conclusion
Fully specifying random intercepts and slopes capitalizes on what hierarchical
mixed effects models are supposed to do: explain variance at multiple levels.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment