@dhicks
Created January 5, 2024 02:49
---
title: "Samples can be smaller than you think"
output:
  flexdashboard::flex_dashboard:
    vertical_layout: scroll
runtime: shiny
---
```{r setup, include = FALSE}
library(tidyverse)
library(flexdashboard)
library(glue)
```
# Inputs {.sidebar}
```{r}
numericInput('theta', 'True (population) rate', 0.5, min = 0, max = 1, step = 0.05)
numericInput('n', 'Sample size (each survey)', 200, min = 1, step = 1)
numericInput('N', 'How many simulations?', 15, min = 1, step = 1)
```
```{r}
actionButton('go', 'Go!')
```
# Simulation
```{r}
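# Simulate N surveys of n respondents each; every response is an independent
# "yes" with probability theta.
sim_data = function(theta, n, N) {
  tibble(
    response = sample(c('yes', 'no'),
                      size = n * N,
                      replace = TRUE,
                      prob = c(theta, 1 - theta)),
    # Survey labels cycle 1..N, so each survey receives exactly n responses
    survey = rep.int(1:N, n)
  ) |>
    # Within each survey, put "yes" responses first
    arrange(survey, desc(response)) |>
    # idx = position within the survey, rescaled to (0, 1]
    mutate(idx = row_number() / n(), .by = survey, .before = everything())
}
# Re-run the simulation only when the "Go!" button is clicked
dataf = eventReactive(input$go, sim_data(input$theta, input$n, input$N))
# renderTable(dataf())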
```
```{r}
# Draw one row of tick marks per simulated survey, with reference lines at
# theta and at theta ± 0.05.
plot_sims = function(dataf, n, theta) {
  ggplot(dataf, aes(idx, survey)) +
    # geom_tile(aes(height = .75)) +
    geom_point(aes(fill = response, color = response),
               shape = '|', size = 8, alpha = .8) +
    # True population rate (linewidth replaces the deprecated size argument for lines)
    geom_vline(xintercept = theta,
               linetype = 'solid',
               linewidth = 1) +
    # theta ± 5 percentage points
    geom_vline(xintercept = c(theta - .05, theta + .05),
               linetype = 'solid',
               linewidth = 1,
               alpha = .25) +
    scale_x_continuous(name = '',
                       minor_breaks = 0.1*(1:9),
                       labels = scales::percent_format()) +
    scale_y_continuous(breaks = scales::pretty_breaks(),
                       trans = scales::reverse_trans()) +
    scale_fill_brewer(palette = 'Set1',
                      aesthetics = c('color', 'fill')) +
    labs(caption = glue::glue('θ = {theta}, n = {n}')) +
    theme_minimal(base_size = 20)
}
renderPlot(plot_sims(dataf(), input$n, input$theta))
```
# What's going on here?
This app simulates a public opinion survey, replicated multiple times, to show the effect of different sample sizes.
People who aren't familiar with statistics often think that sample sizes must be very large to give accurate results. But sample sizes can be much smaller than you might think and still be fairly accurate.
Set the parameters for the simulation using the sidebar on the left. In the simulation, the survey has a single question, and everyone answers either "yes" or "no." The **true (population) rate** θ ("theta") is the fraction of the population that would answer "yes." (Default value is 0.5, or 50%.) But we can't talk to everyone, so each survey has a set **sample size**. (Default value is 200.) To examine how reliable this sample size is, we run the survey multiple times, each as an independent simulation. (Default value is 15.)
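
As a rough illustration of this setup, here is a minimal base-R sketch that is separate from the app's own code. It assumes each response is an independent draw with probability θ of "yes"; the names `simulate_rates` and `rates` are made up for this example.

```r
# Hypothetical helper (not part of the app): simulate N surveys of n
# respondents each and return the estimated "yes" rate from each survey.
simulate_rates = function(theta, n, N) {
  # rbinom() draws the number of "yes" answers in each of the N surveys
  yes_counts = rbinom(N, size = n, prob = theta)
  yes_counts / n
}

set.seed(2024)  # arbitrary seed, only so the sketch is reproducible
rates = simulate_rates(theta = 0.5, n = 200, N = 15)
round(rates, 2)

# Share of simulated surveys within 5 percentage points of theta
# (the tiny tolerance avoids floating-point trouble exactly at the boundary)
mean(abs(rates - 0.5) <= 0.05 + 1e-9)
```

Each element of `rates` corresponds to one horizontal bar in the plot: it is the point along the bar where blue switches to red.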
After setting the parameter values, hit "Go!" Here's how to read the plot:
- Each run of the simulation is represented by a horizontal bar.
- The bar is made up of individual lines, one for each participant in the survey.
    - (Depending on the sample size and the size of the screen you're using, you might not be able to see the individual lines.)
- The individual lines are ordered and colored by the response:
    - "Yes" responses on the left in blue;
    - "No" responses on the right in red.
- The heavy black line running down the whole plot is the true population value, what the survey is trying to estimate.
- There are also two fainter lines, marking 5 percentage points below and above the true population value.
    - (So 45% and 55% if you stick with the default population value of 50%.)
**Even with a small sample size of 200 participants, the survey results are often within ±5 percentage points of the true population value.**
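
To put a number on "often," one can compute the probability directly under the same assumption the simulation makes: responses are independent, so the count of "yes" answers follows a binomial distribution. This is a minimal sketch, and `prob_within` is a hypothetical helper name, not part of the app.

```r
# Hypothetical helper: probability that the sample rate lands within
# `margin` of theta, assuming the "yes" count is Binomial(n, theta).
prob_within = function(theta, n, margin = 0.05) {
  # Convert the margin on the rate into bounds on the count of "yes" answers;
  # the small tolerance guards against floating-point error at the boundary.
  lower = ceiling(n * (theta - margin) - 1e-9)
  upper = floor(n * (theta + margin) + 1e-9)
  pbinom(upper, n, theta) - pbinom(lower - 1, n, theta)
}

prob_within(theta = 0.5, n = 200)   # default settings of the app
prob_within(theta = 0.5, n = 1000)  # a larger sample, for comparison
```

Under these assumptions, the default settings give a probability of roughly 0.86, so around 13 of the 15 simulated surveys should typically fall between the two faint lines. Raising the sample size to 1,000 pushes the probability close to 1.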