---
title: "Effect and Sample Size"
author: "Leighton Pritchard"
date: "10 May 2016"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Create dataset
Sampling from a Normal distribution with mean zero and unit standard deviation, for n in {3, 5, 7, 10, 50, 100, 250, 500, 1000, 5000, 10000}, one thousand times each, and applying a one-sample t-test to each sample to estimate the probability of observing a mean at least that extreme under the null hypothesis of zero mean.
```{r sim}
sample_sizes = c(3, 5, 7, 10, 50, 100, 250, 500, 1000, 5000, 10000)
df = data.frame(samples = integer(), mean = double(), sd = double(),
                p = double(), lci = double(), uci = double())
for (n in sample_sizes) {
  for (i in 1:1000) {
    data = rnorm(n)
    tt = t.test(data)  # one-sample t-test against mu = 0
    df = rbind(df, setNames(as.list(c(n, mean(data), sd(data), tt$p.value,
                                      tt$conf.int[[1]], tt$conf.int[[2]])),
                            names(df)))
  }
}
```
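Growing a data frame with `rbind` inside a double loop re-copies the frame on every iteration; an equivalent construction that builds each block of rows with `lapply` and binds once is usually much faster. This is a sketch, not part of the original analysis; `df_fast` is a name introduced here, and holds the same columns as `df`:

```{r sim_fast}
# Build one data.frame per sample size, then bind everything once;
# uses the sample_sizes vector defined in the previous chunk
df_fast = do.call(rbind, lapply(sample_sizes, function(n) {
  do.call(rbind, lapply(1:1000, function(i) {
    data = rnorm(n)
    tt = t.test(data)  # one-sample t-test against mu = 0
    data.frame(samples = n, mean = mean(data), sd = sd(data), p = tt$p.value,
               lci = tt$conf.int[[1]], uci = tt$conf.int[[2]])
  }))
}))
```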
## Significant effects
How likely are we to see 'significant' effects, or other signs of bias due to sample size?
```{r sig}
library("dplyr")
# Count of times 95% CI doesn't include zero (equivalent to two-tailed P<0.05)
sig_summary = df %>%
  group_by(samples) %>%
  summarize(sig_effect = sum(lci > 0 | uci < 0))
```
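As a sanity check on that count: under a true null, a two-sided test at alpha = 0.05 should reject about 5% of the time regardless of sample size. A standalone sketch (the seed is an arbitrary choice, not from the original analysis):

```{r fp_check}
set.seed(42)  # arbitrary seed, for reproducibility only
# Fraction of 1000 null samples (n = 5) rejected at alpha = 0.05
fp_rate = mean(replicate(1000, t.test(rnorm(5))$p.value < 0.05))
fp_rate  # should sit close to the nominal 0.05 even at n = 5
```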
Plotting mean distributions against sample size, we see that the range of means is greater at smaller sample sizes. However, if statistical tests are being run correctly, this should not translate into unduly optimistic estimates of statistical significance.
```{r plot_means}
library(ggplot2)
p1 = ggplot(df, aes(x = samples, y = mean))
p1 + geom_point(alpha = 0.3) + scale_x_log10()
```
Plotting the count of replicates in which a statistically significant difference is seen, there is no strong relationship between sample size and the frequency of statistically significant effects: under a true null, the false-positive rate should stay near the nominal 5% at every sample size.
```{r plot_sigs}
p2 = ggplot(sig_summary, aes(x=samples, y=sig_effect))
p2 + geom_point() + scale_x_log10()
```
Where there is scope for misinterpretation, it likely stems from the apparent absolute size of an effect, which must be larger at small sample sizes to produce the same P-value. That may lead to a presumption of biological (or domain-specific) significance because the effect is 'large'. The statistical test does not itself demonstrate that kind of importance; it shows only that the observed difference is sufficiently improbable under the null hypothesis to perhaps be worth further investigation.
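That size dependence can be made concrete: for a two-sided one-sample t-test at alpha = 0.05, significance requires |mean| * sqrt(n) / sd >= qt(0.975, n - 1), so the smallest detectable standardized effect shrinks roughly as 1/sqrt(n). A short sketch illustrating the point above:

```{r min_effect}
# Smallest |mean| / sd that can reach P < 0.05 (two-sided) at each n
n = c(3, 10, 100, 1000)
min_effect = qt(0.975, df = n - 1) / sqrt(n)
round(min_effect, 2)  # roughly 2.48, 0.72, 0.20, 0.06
```

So a 'just significant' effect at n = 3 is about forty times larger, in standardized units, than one at n = 1000.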