dhicks/survey_sim.Rmd

## survey_sim.Rmd
---
title: "Samples can be smaller than you think"
output:
    flexdashboard::flex_dashboard:
        vertical_layout: scroll
runtime: shiny
---

```{r setup, include = FALSE}
library(tidyverse)
library(flexdashboard)
library(glue)
```

# Inputs {.sidebar}

```{r}
numericInput('theta', 'True (population) rate', 0.5)
numericInput('n', 'Sample size (each survey)', 200)
numericInput('N', 'How many simulations?', 15)
```
```{r}
actionButton('go', 'Go!')
```


# Simulation

```{r}
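# Simulate N surveys of n respondents each; returns one row per response.
sim_data = function(theta, n, N) {
    tibble(
        # Draw all n * N responses at once; each is 'yes' with probability theta
        response = sample(c('yes', 'no'),
                          size = n * N,
                          replace = TRUE,
                          prob = c(theta, 1-theta)),
        # Assign responses to surveys 1..N in rotation, n per survey
        survey = rep.int(1:N, n)
    ) |>
        # Within each survey, put 'yes' responses first
        arrange(survey, desc(response)) |>
        # idx is the respondent's position in the survey, scaled to (0, 1]
        mutate(idx = row_number() / n(), .by = survey, .before = everything())
}
# Re-run the simulation only when the button is clicked
dataf = eventReactive(input$go, sim_data(input$theta, input$n, input$N))
# renderTable(dataf())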
```

```{r}
plot_sims = function(dataf, n, theta) {
    ggplot(dataf, aes(idx, survey)) +
    # geom_tile(aes(height = .75)) +
    geom_point(aes(fill = response, color = response),
               shape = '|', size = 8, alpha = .8) +
    geom_vline(xintercept = theta,
               linetype = 'solid',
               linewidth = 1) +
    geom_vline(xintercept = c(theta - .05, theta + .05),
               linetype = 'solid',
               linewidth = 1,
               alpha = .25) +
    scale_x_continuous(name = '',
                       minor_breaks = 0.1*(1:9),
                       labels = scales::percent_format()) +
    scale_y_continuous(breaks = scales::pretty_breaks(),
                       trans = scales::reverse_trans()) +
    scale_fill_brewer(palette = 'Set1',
                      aesthetics = c('color', 'fill')) +
    labs(caption = glue::glue('θ = {theta}, n = {n}')) +
    theme_minimal(base_size = 20)
}
renderPlot(plot_sims(dataf(), input$n, input$theta))
```

# What's going on here?

This app simulates a public opinion survey, replicated multiple times, to show the effect of different sample sizes.

People who aren't familiar with statistics often think that a sample must be very large to give accurate results.  But a sample can be much smaller than you might think and still give fairly accurate results.

Set the parameters for the simulation using the box on the left. In the simulation, the survey has a single question, and everyone answers either "yes" or "no."  The **true (population) rate** θ ("theta") is the fraction of the population that would answer "yes."  (Default value is 50%.) But we can't talk to everyone, so each survey has a set **sample size**.  (Default value is 200.)  To examine how reliable this sample size is, we conduct multiple independent simulations of the survey.  (Default value is 15.)
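
The setup described above can be tried outside the app. The following is a minimal base-R sketch, not the app's own code: the variable names mirror the app's inputs, and the seed and the matrix layout are assumptions made only for illustration. Each survey's estimate of θ is simply its share of "yes" answers.

```r
# Stand-alone sketch of the simulation (base R only, illustrative).
set.seed(2024)   # arbitrary seed, only so the run is reproducible

theta <- 0.5   # true (population) rate
n     <- 200   # sample size of each survey
N     <- 15    # number of simulated surveys

# One row per survey, one column per participant.
responses <- matrix(
    sample(c('yes', 'no'), size = n * N, replace = TRUE,
           prob = c(theta, 1 - theta)),
    nrow = N, ncol = n
)

# Each survey's estimate of theta is its share of "yes" answers.
estimates <- rowMeans(responses == 'yes')
round(estimates, 3)
```

Changing `n` and re-running shows how the spread of `estimates` around `theta` shrinks as the sample size grows.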

After setting the parameter values, hit "Go!" Here's how to read the plot:

- Each run of the simulation is represented by a horizontal bar.
- The bar is made up of individual lines, one for each participant in the survey.
    - (Depending on the sample size and the size of the screen you're using, you might not be able to see the individual lines.)
- The individual lines are ordered and colored by the response:
    - "Yes" responses on the left in blue;
    - "No" responses on the right in red.
- The heavy black line running down the whole plot is the true population value, what the survey is trying to estimate.
- There are also two fainter lines, 5 percentage points on either side of the true population value.
    - (So 45% and 55% if you stick with the default population value of 50%.)

**Even with a small sample size of 200 participants, the survey results are often within ±5 percentage points of the true population value.**
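
To put a rough number on "often," the sketch below works out how likely a single survey is to land inside the band, assuming the default values and treating the number of "yes" answers as a binomial count. This calculation is not part of the app; the names `se`, `inside`, and `p_inside` are illustrative.

```r
theta <- 0.5   # true (population) rate
n     <- 200   # sample size

# Standard error of a sample proportion: about 0.035 with these defaults.
se <- sqrt(theta * (1 - theta) / n)

# Every possible number of "yes" answers, and whether the resulting
# estimate k / n falls within 0.05 of theta. The small tolerance guards
# against floating-point error at the edges of the band.
k <- 0:n
inside <- abs(k / n - theta) <= 0.05 + 1e-9

# Exact binomial probability that one survey lands inside the band.
p_inside <- sum(dbinom(k[inside], size = n, prob = theta))
p_inside
```

With the defaults, the band covers 90 to 110 "yes" answers and `p_inside` comes out at roughly 0.86, so on average about 13 of the 15 simulated surveys should fall between the two faint lines.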