tomsing1/anova_in_R.Rmd

## anova_in_R.Rmd
---
title: "Anova in R"
author: "Thomas Sandmann"
date: "10/15/2019"
output: html_document
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r echo=FALSE}
packages_used <- c("emmeans")
for (package in packages_used) {
  if (!requireNamespace(package, quietly = TRUE)) {
    install.packages(package)
  }
}
```

```{r}
suppressPackageStartupMessages(library(emmeans))
```

# Introduction

The R programming language offers many different ways to fit common
models and perform statisticalt testing.

This document uses updated examples from the
[R Companion for the Handbook of Biological Statistics](https://rcompanion.org/rcompanion/h_01.html).

Because the
[lsmeans](https://cran.r-project.org/web/packages/lsmeans/index.html)
R package is no longer maintained and has been replaced with the
[emmeans](https://cran.r-project.org/web/packages/emmeans/index.html)
R package, the latter will be used throughout.

## Example dataset

In the following example, the effects of drug treatments `D1` and `D2` were
assayed under two different conditions `C1` and `C2`. All four combinations
were tested and were recorded in the `treatment` column of a data table:

```{r}
dataset <- data.frame(
  drug = rep(c("D1", "D1", "D2", "D2"), each = 3),
  condition = rep(c("C1", "C2", "C1", "C2"), each = 3),
  treatment = rep(c("D1:C1", "D1:C2", "D2:C1", "D2:C2"), each = 3),
  response = c(1, 1.2, 1.3, 2.1, 2.2, 2.3, 1.4, 1.6, 1.7,
               2.5, 2.6, 2.8),
  batch = factor(rep(c(1, 2, 3 ), 4))
  )
head(dataset)
```

In this dataset, we have separate `drug` and `treatment` columns. But in
addition, we have also combined these two factors into a grouping variable
called `treatment`. This will simplify how we can specify comparisons between
the groups (see below).

We can explicitly specify the order of the `treatment` covariate. (Otherwise R
will use alphabetical order.)

```{r}
dataset$treatment <- factor(
  x = dataset$treatment,
  levels = c("D1:C1", "D1:C2", "D2:C1", "D2:C2"))
```

```{r}
boxplot(response ~ drug + condition, data = dataset,
        ylab = "Response", xlab = "Treatment")
```

A visual inspection of the dataset shows that there seem to be differences
between all four treatments (= drug x condition combinations.)

## ANOVA

In addition to `drug`, `condition` and the combined `treatment` column,
the dataset also contains the `batch` column. It is not of biological interest,
but corresponds to a nuisance variable. It is important to *adjust for* its
influence,, but it will not be part of the comparisons we want to examine
(see below).

R's `lm()` function can fit many different types of linear models, including
those with categorical predictor variables (like an ANOVA).

The following command fits a linear model taking into account

- the effect of the `drug` treatment (`D2` vs `D1`)
- the effect of the `condition` (`C2` vs `C1`)
- the statistical interaction between `drug` and `condition`, (denoted with the
colon)
- the effect of the batches

Unless otherwise specified, R will consider the level of each categorical
variable in alphabetical order, e.g. it will consider `D1` the baseline for the
`drug` covariate, `C1` the baseline for the `condition` covariate and `1`
the baseline for the `batch`.

```{r}
fit1 = lm(response ~ drug + condition + drug:condition + batch, data = dataset)
```

The `fit1` object contains the mean and standard errors for each of the
treatment groups, but it does not perform statistical testing, e.g. it does
*not* return p-values.

```{r}
fit1
```

To perform a statistical test, we can pass the `fit1` object to the `anova()`
function:

```{r}
anova(fit1)
```

The output of the `anova()` function shows that there are indeed differences
between the experimental groups:

- there is a significant `drug` effect
- there is a significant effect of the `condition`
- there is a significant `batch` effect
- but there is no evidence for a drug:condition interaction, e.g. the effect of
the drug treatment (`D1` vs `D2`) is not significantly differenct between
conditions (`C1` and `C2`).

This gives us a high level overview, but which of the groups are actually
significantly different *from each other*?

### Contrasts

#### Recoding the groups of interest

In addition to the `drug` and `condition` columns in our dataset, we also have
a combined `treatment` variable, which defines the four drug x condition
combinations.

Because our biological questions are about differences between these groups,
we fit the linear model again, this time using the `treatment` column.

```{r}
fit2 = lm(response ~ treatment + batch, data = dataset)
fit2
```

As before, we can use the `anova()` function to ask whether there is evidence
that one or more of the four `treatment` groups are different from each other:

```{r}
anova(fit2)
```

To extract statistics about the *individual treatment groups* (and the
differences between them) from the `fit2` object, we make use functions from
the `emmeans` R package.

```{r}
library(emmeans)
```

For example, the `emmeans` function can return the estimated `mean`, the
standard error (`SE`) and the confidence intervals over all `treatment`
conditions from the linear model:

```{r}
group_means = emmeans(fit2, "treatment")
```

**Note:** The model we fit also included the `batch` covariate, but because
we only want to *adjust* for it, e.g. it is *not* part of the scientific
question we want to ask, it is not specified in the call to `emmeans()`. (The
output of the `emmeans()` call reminds us that the results are averaged over the
levels of the batch covariate).

But what about the *differences* between the groups? In statistics, they are
usually referred to as `contrasts`. Depending on your questions of interest,
you might be interested in

- pairwise differences between all conditions
- the differences between each condition and a specific control group
- a custom set of differences

All of the above can be extracted from the `group_means` object we created
above using the `contrast` function.

#### Pairs

To get statistics on *all* possible pairs, choose the `pairwise` method:

```{r}
contrast(group_means, method = "pairwise", adjust = "tukey")
```

The `estimate` column shows the difference between the estimated means of
each pair, and the `SE` provides the standard error of this estimate. As
expected, the statistical confirms that all pairwise comparisons are
statistically significant.

When we conduct multiple tests, we need to adjust for the number of
analyses. By default, the `contrast()` function uses Tukey's method for p-value
adjustment.[^1]

**Note:** The `batch` covariate does not feature in the comparisons, because
we did not specify it when we created the `group_means` object with the
`emmeans()` function earlier.

####  Treatment vs control

In the example dataset, we have an explicit `Control` treatment. If you are
interested in testing for differences between each of the other groups and this
reference, you can use the `trt.vs.ctrl` contrast.

```{r}
contrast(group_means, method = "trt.vs.ctrl")
```

The default multiplicity adjustment for `treatment vs control` analysis
method is `dunnettx`, a close approximation to the Dunnett adjustment.

**But how does R which of the conditions is the reference?**

When we defined the dataset, we carefully specified the `treatment` column as
a factor and defined its `levels`.

```{r}
levels(dataset$treatment)
```

By default, *first* level is used as the reference. If you want to compare to
a different reference group, let's say `D2:C1`, you can specify it in the
`contrast()` call:

```{r}
contrast(group_means, method = "trt.vs.ctrl", ref = "D2:C1")
```

#### Custom contrasts

Usually, experiments are designed to answer specific questions. To define your
own comparisons, you can specify the comparisons manually.

Is there a significant difference conditions `D1:C2` and `D2:C2`?
Let's plot the estimated means and their confidence intervals.

```{r}
plot(group_means, CIs = TRUE)
```

This plot shows each treatment group on the y-axis and its estimated mean on the
x-axis. Groups `D1:C2` and `D2:C2` are spaced relatively far apart, but is that
difference significant?

**Note:** You can see more information on plotting the output of the `emmeans()`
function by consulting the help page:

```{r eval=FALSE}
?emmeans:::plot.emmGrid
```

Another way of asking whether the `D1:C2` and `D2:C2` conditions are different
from each other is to examine whether the absolute *difference* between them
is larger than zero.

Let's take another look at the estimated means we stored in the
`group_means` object:

```{r}
group_means
```

There are five rows, one for each condition. To compare `D1:C2` and `D2:C2`,
we need to calculate the difference between the second (`D1:C2`) and the fourth
row (`D2:C2`).

We can specify the comparison of interest using a *vector* of `0`, `1` and `-1`.
For example, the following vector

```{r}
setNames(
  c(0, -1, 0, 1),
  as.data.frame(group_means)$treatment
)
```

- ignores the first and third rows, because they are set to `0`.
- pulls out the second and fourth rows, which we want to compare, because they
are set to `1` or `-1`.
- the sign (`1` or `-1`) defines how difference should be calculated; here, we
subtract the second  row (`D1:C2`) from the fourth row (`D2:C2`).

We can use this vectors to communicate the comparison of interest to the
`contrast()` function which comparison:

```{r}
contrast(group_means, method = list(
  "D2:C2 - D1:C2" = c(0, -1, 0, 1)
  ))
```

**Note:** Because we only specified a single comparison, no multiple testing
correction is applied.

To test multiple custom comparisons, we can include them all in the same list:

```{r}
contrast(group_means, method = list(
  "D2:C2 - D1:C2" = c(0, -1, 0, 1),
  "D2:C1 - D1:C1" = c(-1, 0, 1, 0)
  ), adjust = "none")
```

#### Interactions

The same vector notation can also be used to test for interactions, e.g.
*differences between comparisons*.

For example, we might want to ask whether the `drug` treatment (`D1` vs `D2`)
has different effects under the two different `conditions` (`C1` and `C2`).

First, we specify two comparisons to calculate the drug effects, separately
for each condition. Then, we compare these differences by subtracting them
from each other:

```{r}
comparisons <-  list(
  "D2:C1 - D1:C1" = c(-1, 0, 1, 0),  # drug 2 vs drug 1 in condition 1
  "D2:C2 - D1:C2" = c(0, -1, 0, 1),  # drug 2 vs drug 1 in condition 2
  "(D2:C2 - D1:C2)-(D2:C2 - D1:C2)" = c(-1, 0, 1, 0) - c(0, -1, 0, 1)
)
```

Each of these comparisons corresponds to a linear combination of the estimated
effects. Let's add the names of the treatment groups to clarify:

```{r}
lapply(comparisons, function(x) {
  setNames(x, as.data.frame(group_means)$treatment)
  })
```

We can now pass the list of comparisons to the `contrast()` function.

**Note:** Here, we choose not to adjust for multiple testing to compare the
results to those obtained with the `anova()` function earlier. Especially when
you define *many* different contrasts, you should specify `adjust = "sidak"`
instead.

```{r}
contrast(group_means, method = comparisons, adjust = "none")
```

As expected, the drug has significant effects under both conditions, with an
estimated shift in means of `0.4` in condition 1 and `0.4333` in condition 2,
e.g. the drug has slightly stronger effect in condition 2 (+ `0.0333`) - but
that difference is not statistically significant. In other words, there is
no significant evidence that the drug has *different* effects under the two
conditions. (The interaction p-value matches that obatined by the
`anova(fit1)` call we started off with, see above.)

#### Additional information

For more information on available types of pre-defined contrasts and their
associated methods of multiple testing adjustment, please consult the manual:

```{r eval=FALSE}
?emmeans::`contrast-methods`
```

***

[^1]: A number of multiple testing methods have been defined for different
use cases. As a rule of thumb, use the `tukey` method when comparing *every*
mean with every other mean, use `dunnettx` to compare *every* mean with a
control mean. Finally, use `bonferroni` or `sidak` when you manually defined a
subset of means to compare.
	---
	title: "Anova in R"
	author: "Thomas Sandmann"
	date: "10/15/2019"
	output: html_document
	editor_options:
	chunk_output_type: console
	---

	```{r setup, include=FALSE}
	knitr::opts_chunk$set(echo = TRUE)
	```

	```{r echo=FALSE}
	packages_used <- c("emmeans")
	for (package in packages_used) {
	if (!requireNamespace(package, quietly = TRUE)) {
	install.packages(package)
	}
	}
	```

	```{r}
	suppressPackageStartupMessages(library(emmeans))
	```

	# Introduction

	The R programming language offers many different ways to fit common
	models and perform statisticalt testing.

	This document uses updated examples from the
	[R Companion for the Handbook of Biological Statistics](https://rcompanion.org/rcompanion/h_01.html).

	Because the
	[lsmeans](https://cran.r-project.org/web/packages/lsmeans/index.html)
	R package is no longer maintained and has been replaced with the
	[emmeans](https://cran.r-project.org/web/packages/emmeans/index.html)
	R package, the latter will be used throughout.

	## Example dataset

	In the following example, the effects of drug treatments `D1` and `D2` were
	assayed under two different conditions `C1` and `C2`. All four combinations
	were tested and were recorded in the `treatment` column of a data table:

	```{r}
	dataset <- data.frame(
	drug = rep(c("D1", "D1", "D2", "D2"), each = 3),
	condition = rep(c("C1", "C2", "C1", "C2"), each = 3),
	treatment = rep(c("D1:C1", "D1:C2", "D2:C1", "D2:C2"), each = 3),
	response = c(1, 1.2, 1.3, 2.1, 2.2, 2.3, 1.4, 1.6, 1.7,
	2.5, 2.6, 2.8),
	batch = factor(rep(c(1, 2, 3 ), 4))
	)
	head(dataset)
	```

	In this dataset, we have separate `drug` and `treatment` columns. But in
	addition, we have also combined these two factors into a grouping variable
	called `treatment`. This will simplify how we can specify comparisons between
	the groups (see below).

	We can explicitly specify the order of the `treatment` covariate. (Otherwise R
	will use alphabetical order.)

	```{r}
	dataset$treatment <- factor(
	x = dataset$treatment,
	levels = c("D1:C1", "D1:C2", "D2:C1", "D2:C2"))
	```

	```{r}
	boxplot(response ~ drug + condition, data = dataset,
	ylab = "Response", xlab = "Treatment")
	```

	A visual inspection of the dataset shows that there seem to be differences
	between all four treatments (= drug x condition combinations.)

	## ANOVA

	In addition to `drug`, `condition` and the combined `treatment` column,
	the dataset also contains the `batch` column. It is not of biological interest,
	but corresponds to a nuisance variable. It is important to adjust for its
	influence,, but it will not be part of the comparisons we want to examine
	(see below).

	R's `lm()` function can fit many different types of linear models, including
	those with categorical predictor variables (like an ANOVA).

	The following command fits a linear model taking into account

	- the effect of the `drug` treatment (`D2` vs `D1`)
	- the effect of the `condition` (`C2` vs `C1`)
	- the statistical interaction between `drug` and `condition`, (denoted with the
	colon)
	- the effect of the batches

	Unless otherwise specified, R will consider the level of each categorical
	variable in alphabetical order, e.g. it will consider `D1` the baseline for the
	`drug` covariate, `C1` the baseline for the `condition` covariate and `1`
	the baseline for the `batch`.

	```{r}
	fit1 = lm(response ~ drug + condition + drug:condition + batch, data = dataset)
	```

	The `fit1` object contains the mean and standard errors for each of the
	treatment groups, but it does not perform statistical testing, e.g. it does
	not return p-values.

	```{r}
	fit1
	```

	To perform a statistical test, we can pass the `fit1` object to the `anova()`
	function:

	```{r}
	anova(fit1)
	```

	The output of the `anova()` function shows that there are indeed differences
	between the experimental groups:

	- there is a significant `drug` effect
	- there is a significant effect of the `condition`
	- there is a significant `batch` effect
	- but there is no evidence for a drug:condition interaction, e.g. the effect of
	the drug treatment (`D1` vs `D2`) is not significantly differenct between
	conditions (`C1` and `C2`).

	This gives us a high level overview, but which of the groups are actually
	significantly different from each other?

	### Contrasts

	#### Recoding the groups of interest

	In addition to the `drug` and `condition` columns in our dataset, we also have
	a combined `treatment` variable, which defines the four drug x condition
	combinations.

	Because our biological questions are about differences between these groups,
	we fit the linear model again, this time using the `treatment` column.

	```{r}
	fit2 = lm(response ~ treatment + batch, data = dataset)
	fit2
	```

	As before, we can use the `anova()` function to ask whether there is evidence
	that one or more of the four `treatment` groups are different from each other:

	```{r}
	anova(fit2)
	```

	To extract statistics about the individual treatment groups (and the
	differences between them) from the `fit2` object, we make use functions from
	the `emmeans` R package.

	```{r}
	library(emmeans)
	```

	For example, the `emmeans` function can return the estimated `mean`, the
	standard error (`SE`) and the confidence intervals over all `treatment`
	conditions from the linear model:

	```{r}
	group_means = emmeans(fit2, "treatment")
	```

	Note: The model we fit also included the `batch` covariate, but because
	we only want to adjust for it, e.g. it is not part of the scientific
	question we want to ask, it is not specified in the call to `emmeans()`. (The
	output of the `emmeans()` call reminds us that the results are averaged over the
	levels of the batch covariate).

	But what about the differences between the groups? In statistics, they are
	usually referred to as `contrasts`. Depending on your questions of interest,
	you might be interested in

	- pairwise differences between all conditions
	- the differences between each condition and a specific control group
	- a custom set of differences

	All of the above can be extracted from the `group_means` object we created
	above using the `contrast` function.

	#### Pairs

	To get statistics on all possible pairs, choose the `pairwise` method:

	```{r}
	contrast(group_means, method = "pairwise", adjust = "tukey")
	```

	The `estimate` column shows the difference between the estimated means of
	each pair, and the `SE` provides the standard error of this estimate. As
	expected, the statistical confirms that all pairwise comparisons are
	statistically significant.

	When we conduct multiple tests, we need to adjust for the number of
	analyses. By default, the `contrast()` function uses Tukey's method for p-value
	adjustment.[^1]

	Note: The `batch` covariate does not feature in the comparisons, because
	we did not specify it when we created the `group_means` object with the
	`emmeans()` function earlier.

	#### Treatment vs control

	In the example dataset, we have an explicit `Control` treatment. If you are
	interested in testing for differences between each of the other groups and this
	reference, you can use the `trt.vs.ctrl` contrast.

	```{r}
	contrast(group_means, method = "trt.vs.ctrl")
	```

	The default multiplicity adjustment for `treatment vs control` analysis
	method is `dunnettx`, a close approximation to the Dunnett adjustment.

	But how does R which of the conditions is the reference?

	When we defined the dataset, we carefully specified the `treatment` column as
	a factor and defined its `levels`.

	```{r}
	levels(dataset$treatment)
	```

	By default, first level is used as the reference. If you want to compare to
	a different reference group, let's say `D2:C1`, you can specify it in the
	`contrast()` call:

	```{r}
	contrast(group_means, method = "trt.vs.ctrl", ref = "D2:C1")
	```

	#### Custom contrasts

	Usually, experiments are designed to answer specific questions. To define your
	own comparisons, you can specify the comparisons manually.

	Is there a significant difference conditions `D1:C2` and `D2:C2`?
	Let's plot the estimated means and their confidence intervals.

	```{r}
	plot(group_means, CIs = TRUE)
	```

	This plot shows each treatment group on the y-axis and its estimated mean on the
	x-axis. Groups `D1:C2` and `D2:C2` are spaced relatively far apart, but is that
	difference significant?

	Note: You can see more information on plotting the output of the `emmeans()`
	function by consulting the help page:

	```{r eval=FALSE}
	?emmeans:::plot.emmGrid
	```

	Another way of asking whether the `D1:C2` and `D2:C2` conditions are different
	from each other is to examine whether the absolute difference between them
	is larger than zero.

	Let's take another look at the estimated means we stored in the
	`group_means` object:

	```{r}
	group_means
	```

	There are five rows, one for each condition. To compare `D1:C2` and `D2:C2`,
	we need to calculate the difference between the second (`D1:C2`) and the fourth
	row (`D2:C2`).

	We can specify the comparison of interest using a vector of `0`, `1` and `-1`.
	For example, the following vector

	```{r}
	setNames(
	c(0, -1, 0, 1),
	as.data.frame(group_means)$treatment
	)
	```

	- ignores the first and third rows, because they are set to `0`.
	- pulls out the second and fourth rows, which we want to compare, because they
	are set to `1` or `-1`.
	- the sign (`1` or `-1`) defines how difference should be calculated; here, we
	subtract the second row (`D1:C2`) from the fourth row (`D2:C2`).

	We can use this vectors to communicate the comparison of interest to the
	`contrast()` function which comparison:

	```{r}
	contrast(group_means, method = list(
	"D2:C2 - D1:C2" = c(0, -1, 0, 1)
	))
	```

	Note: Because we only specified a single comparison, no multiple testing
	correction is applied.

	To test multiple custom comparisons, we can include them all in the same list:

	```{r}
	contrast(group_means, method = list(
	"D2:C2 - D1:C2" = c(0, -1, 0, 1),
	"D2:C1 - D1:C1" = c(-1, 0, 1, 0)
	), adjust = "none")
	```

	#### Interactions

	The same vector notation can also be used to test for interactions, e.g.
	differences between comparisons.

	For example, we might want to ask whether the `drug` treatment (`D1` vs `D2`)
	has different effects under the two different `conditions` (`C1` and `C2`).

	First, we specify two comparisons to calculate the drug effects, separately
	for each condition. Then, we compare these differences by subtracting them
	from each other:

	```{r}
	comparisons <- list(
	"D2:C1 - D1:C1" = c(-1, 0, 1, 0), # drug 2 vs drug 1 in condition 1
	"D2:C2 - D1:C2" = c(0, -1, 0, 1), # drug 2 vs drug 1 in condition 2
	"(D2:C2 - D1:C2)-(D2:C2 - D1:C2)" = c(-1, 0, 1, 0) - c(0, -1, 0, 1)
	)
	```

	Each of these comparisons corresponds to a linear combination of the estimated
	effects. Let's add the names of the treatment groups to clarify:

	```{r}
	lapply(comparisons, function(x) {
	setNames(x, as.data.frame(group_means)$treatment)
	})
	```

	We can now pass the list of comparisons to the `contrast()` function.

	Note: Here, we choose not to adjust for multiple testing to compare the
	results to those obtained with the `anova()` function earlier. Especially when
	you define many different contrasts, you should specify `adjust = "sidak"`
	instead.

	```{r}
	contrast(group_means, method = comparisons, adjust = "none")
	```

	As expected, the drug has significant effects under both conditions, with an
	estimated shift in means of `0.4` in condition 1 and `0.4333` in condition 2,
	e.g. the drug has slightly stronger effect in condition 2 (+ `0.0333`) - but
	that difference is not statistically significant. In other words, there is
	no significant evidence that the drug has different effects under the two
	conditions. (The interaction p-value matches that obatined by the
	`anova(fit1)` call we started off with, see above.)

	#### Additional information

	For more information on available types of pre-defined contrasts and their
	associated methods of multiple testing adjustment, please consult the manual:

	```{r eval=FALSE}
	?emmeans::`contrast-methods`
	```

	***

	[^1]: A number of multiple testing methods have been defined for different
	use cases. As a rule of thumb, use the `tukey` method when comparing every
	mean with every other mean, use `dunnettx` to compare every mean with a
	control mean. Finally, use `bonferroni` or `sidak` when you manually defined a
	subset of means to compare.