jirilukavsky/correlation_ci.Rmd

## correlation_ci.Rmd
---
title: "Sample estimation for correlation coefficients"
author: "Jiri Lukavsky"
date: "12/20/2019"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

library(tidyverse)
library(kableExtra)
library(psychometric)
```

## Goal

You want to compare two versions of a test. You have a prior idea of how strongly they correlate, and you wonder how the sample size will affect the precision of the estimated correlation.

Let us consider several scenarios that differ in the expected correlation coefficient and the sample size.

```{r data}
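# Expected correlations and candidate sample sizes
r <- c(0.6, 0.7, 0.8, 0.9)
n <- c(100, 150, 200, 250, 300, 350, 400, 450, 500)

# One row per (r, n) combination, with placeholders for the CI bounds
tab <- crossing(r, n) %>%
  mutate(lo95 = NA_real_, hi95 = NA_real_)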
```

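We use the `CIr` function from the `psychometric` package, which returns the lower and upper bounds of the confidence interval for a given correlation and sample size: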

```{r calculation}
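# Fill in the confidence bounds for each (r, n) scenario
for (i in seq_len(nrow(tab))) {
  ci <- CIr(tab$r[i], tab$n[i], level = 0.95)  # c(lower, upper)
  tab$lo95[i] <- ci[1]
  tab$hi95[i] <- ci[2]
}
# Width of each confidence interval
tab <- tab %>% mutate(ciw95 = hi95 - lo95)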
```
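
For readers who want to see what such an interval involves, here is a minimal base-R sketch of the standard Fisher z-transformation interval, which to my understanding is the approach `CIr` takes. The helper name `fisher_ci` is hypothetical and not part of `psychometric`; treat it as an illustration under that assumption rather than a replacement for the package function.

```{r fisher-sketch}
# Hypothetical helper (not from psychometric): Fisher z confidence interval
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                      # Fisher z-transform of r
  se   <- 1 / sqrt(n - 3)               # approximate standard error of z
  crit <- qnorm(1 - (1 - level) / 2)    # two-sided normal critical value
  tanh(c(z - crit * se, z + crit * se)) # back-transform bounds to the r scale
}

# Example: expected r = 0.8 with n = 200
fisher_ci(0.8, 200)
```

Because the interval is built on the z scale and then back-transformed, it is asymmetric around `r`, and it narrows both as `n` grows and as `r` approaches 1.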

## Results

The following table shows the 95% confidence interval for each scenario.
`lo95` and `hi95` are the lower and upper bounds, and `ciw95` is the width of the confidence interval (`hi95 - lo95`).

```{r}
tab %>% kable() %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
```
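
The table can also be read in the other direction: given an expected correlation, how large must the sample be for the interval to reach a chosen width? The sketch below searches a grid of sample sizes using the Fisher z interval. The helper names `ci_width` and `min_n_for_width`, the target width of 0.1, and the search grid are all illustrative assumptions, not part of the original analysis.

```{r target-width}
# Hypothetical helper: width of the Fisher z confidence interval for r at size n
ci_width <- function(r, n, level = 0.95) {
  crit <- qnorm(1 - (1 - level) / 2)
  half <- crit / sqrt(n - 3)            # half-width on the z scale
  tanh(atanh(r) + half) - tanh(atanh(r) - half)
}

# Hypothetical helper: smallest n in the grid whose interval is at most `target` wide
min_n_for_width <- function(r, target, n_grid = 4:5000, level = 0.95) {
  ok <- ci_width(r, n_grid, level) <= target
  if (any(ok)) n_grid[which(ok)[1]] else NA_integer_
}

# Required n for each expected correlation, assuming a target width of 0.1
sapply(c(0.6, 0.7, 0.8, 0.9), min_n_for_width, target = 0.1)
```

The width shrinks monotonically as `n` increases, so the first grid value that meets the target is the minimum; `NA` signals that the grid was too short.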

The same results as a plot:

```{r pressure, echo=FALSE}
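# Point = expected r, vertical range = 95% CI; points dodged so intervals for different r do not overlap
ggplot(tab %>% mutate(rf = factor(r)),
       aes(x = n, y = r, ymin = lo95, ymax = hi95,
           colour = rf, group = rf)) +
  geom_pointrange(position = position_dodge(width = 20)) +
  labs(x = "Sample size (n)", y = "Correlation with 95% CI", colour = "Expected r") +
  theme_minimal()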
```