@jirilukavsky
Created December 20, 2019 10:03
---
title: "Sample size estimation for correlation coefficients"
author: "Jiri Lukavsky"
date: "12/20/2019"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(kableExtra)
library(psychometric)
```
## Goal
You want to compare two versions of a test. You have a rough prior idea of their correlation, and you want to know how the sample size will affect the precision of the estimated correlation.
Let us consider several scenarios that differ in the expected correlation coefficient and the sample size.
```{r data}
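# Expected correlations and candidate sample sizes
r <- c(0.6, 0.7, 0.8, 0.9)
n <- c(100, 150, 200, 250, 300, 350, 400, 450, 500)
# One row per (r, n) scenario, with placeholders for the CI bounds
tab <- crossing(r, n) %>%
  mutate(lo95 = NA_real_, hi95 = NA_real_)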
```
We use the `CIr` function from the `psychometric` package to compute the confidence interval for each scenario:
```{r calculation}
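# Compute the 95% confidence interval for each scenario, row by row
for (i in seq_len(nrow(tab))) {
  ci <- CIr(tab$r[i], tab$n[i], level = 0.95)
  tab$lo95[i] <- ci[1]
  tab$hi95[i] <- ci[2]
}
# Width of each interval
tab <- tab %>% mutate(ciw95 = hi95 - lo95)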
```
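To make the calculation less of a black box, the interval can be reproduced by hand. The sketch below assumes the usual Fisher z approximation (transform `r` with `atanh`, use a standard error of `1 / sqrt(n - 3)`, and back-transform the bounds with `tanh`), which is, to my understanding, what `CIr` does internally. `fisher_ci` is a hypothetical helper name, not part of `psychometric`.

```{r fisher-sketch}
# Hypothetical helper (not from psychometric): CI for a correlation
# based on the Fisher z approximation. Uses base R only.
fisher_ci <- function(r, n, level = 0.95) {
  z  <- atanh(r)                    # Fisher z-transformation of r
  se <- 1 / sqrt(n - 3)             # approximate standard error of z
  q  <- qnorm(1 - (1 - level) / 2)  # two-sided normal quantile
  data.frame(lo = tanh(z - q * se), # back-transform bounds to the r scale
             hi = tanh(z + q * se))
}

# Vectorised over r and n, so no loop is needed
fisher_ci(r = c(0.6, 0.8), n = c(100, 100))
```

The bounds should closely match the `lo95` and `hi95` columns for the same `r` and `n`; note that the interval is not symmetric around `r`, because the back-transformation compresses values near 1.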
## Results
The following table shows the 95% confidence interval for each scenario.
`lo95` and `hi95` are the lower and upper bounds, and `ciw95` is the width of the confidence interval.
```{r}
tab %>%
  kable() %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
```
The same results as a plot:
```{r plot, echo=FALSE}
ggplot(tab %>% mutate(rf = factor(r)),
       aes(x = n, y = r, ymin = lo95, ymax = hi95,
           colour = rf, group = rf)) +
  geom_pointrange(position = position_dodge(width = 20)) +
  theme_minimal()
```
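If you have a target precision in mind, you can also turn the question around and look for the smallest sample size that reaches it. The following is a rough sketch assuming the Fisher z approximation for the confidence interval; `n_for_width`, `target_width` and `n_max` are illustrative names, not functions or arguments from any package.

```{r target-width}
# Hypothetical helper: smallest n for which the CI of r is no wider
# than target_width, under the Fisher z approximation. Base R only.
n_for_width <- function(r, target_width, level = 0.95, n_max = 10000) {
  q <- qnorm(1 - (1 - level) / 2)   # two-sided normal quantile
  z <- atanh(r)                     # Fisher z-transformation of r
  for (n in 4:n_max) {              # n must exceed 3 for the standard error
    half  <- q / sqrt(n - 3)        # half-width on the z scale
    width <- tanh(z + half) - tanh(z - half)
    if (width <= target_width) return(n)
  }
  NA_integer_                       # target not reached within n_max
}

n_for_width(r = 0.8, target_width = 0.1)
```

Because the interval width shrinks monotonically as `n` grows, the first `n` that meets the target is also the smallest one.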