Skip to content

Instantly share code, notes, and snippets.

@mcfrank
Last active February 16, 2016 00:04
Show Gist options
  • Save mcfrank/fdfcd869e7938f3bfebd to your computer and use it in GitHub Desktop.
Save mcfrank/fdfcd869e7938f3bfebd to your computer and use it in GitHub Desktop.
---
title: 'Class 6a: Blake et al. (2015) exercise'
author: "Mike Frank"
date: "February 9, 2016"
output:
html_document:
toc: true
---
# Intro
This is an exploration of Blake et al. (2015), [Ontogeny of fairness in seven societies](http://www.nature.com/nature/journal/v528/n7581/full/nature15703.html).
```{r}
rm(list=ls())
library(ggplot2)
library(tidyr)
library(dplyr)
library(bootstrap)
theme_set(theme_bw())
sem <- function(x) {sd(x, na.rm=TRUE) / sqrt(length(x))}
ci95 <- function(x) {sem(x) * 1.96}
theta <- function(x,xdata,na.rm=T) {mean(xdata[x],na.rm=na.rm)}
ci.low <- function(x,na.rm=T) {
mean(x,na.rm=na.rm) - quantile(bootstrap(1:length(x),1000,theta,x,na.rm=na.rm)$thetastar,.025,na.rm=na.rm)}
ci.high <- function(x,na.rm=T) {
quantile(bootstrap(1:length(x),1000,theta,x,na.rm=na.rm)$thetastar,.975,na.rm=na.rm) - mean(x,na.rm=na.rm)}
```
# Data Prep
First read in the data, as distributed by the journal.
```{r}
d <- read.csv("data/Ontogeny_fairness_seven_societies_data.csv",
header = TRUE, na.strings = c("NA", "."))
```
Do some preprocessing, taken directly from the supplemental material.
```{r}
facVars <- c("eq.uneq", "value", "decision")
d[, facVars] <- lapply(d[, facVars], factor)
d$trial.number <- as.numeric(gsub(".(\\d+)", "\\1", d$trial))
```
Rename things so that they are easy to deal with.
```{r}
d$trial_type <- factor(d$eq.uneq,
levels = c("E","U"),
labels = c("Equal","Unequal"))
d$condition <- factor(d$condition,
levels = c("AI","DI"),
labels = c("Advantageous","Disadvantageous"))
```
# Variable exploration
Describe the dataset graphically in ways that are useful for you to get a handle on the data collection effort.
```{r}
demo <- d %>%
mutate(age = floor(actor.age.years)) %>%
group_by(country, age, actor.id) %>%
summarise(n = n()) %>%
summarise(n = n())
ggplot(demo, aes(x = age, y = n)) +
geom_bar(stat = "identity") +
facet_grid(. ~ country)
```
# Hypothesis-related exploration
Try just plotting the un-adjusted curves.
```{r}
ms <- d %>%
filter(!is.na(eq.uneq)) %>%
mutate(age = floor(actor.age.years),
decision = decision == "reject") %>%
group_by(country, trial_type, condition, age, actor.id) %>%
summarise(decision = mean(decision, na.rm=TRUE)) %>%
summarise(mean = mean(decision, na.rm=TRUE),
ci_lower = mean(decision) - ci95(decision),
ci_upper = mean(decision) + ci95(decision),
n = n())
ggplot(ms, aes(x = age, y = mean, col = country)) +
geom_linerange(aes(ymin = ci_lower, ymax = ci_upper),
position = position_dodge(width = .1)) +
geom_point(aes(size = n)) +
facet_grid(condition ~ trial_type) +
geom_smooth(method="lm", se = FALSE) +
ylab("Proportion offers rejected") +
xlab("Age (years)") +
ylim(c(0,1))
```
Now rebin into 3-year bins.
```{r}
ms <- d %>%
filter(!is.na(eq.uneq)) %>%
mutate(age = floor(actor.age.years/2)*2,
decision = decision == "reject") %>%
group_by(country, trial_type, condition, age, actor.id) %>%
summarise(decision = mean(decision, na.rm=TRUE)) %>%
summarise(mean = mean(decision, na.rm=TRUE),
ci_lower = mean(decision) - ci95(decision),
ci_upper = mean(decision) + ci95(decision),
n = n())
ggplot(ms, aes(x = age, y = mean, col = country)) +
geom_linerange(aes(ymin = ci_lower, ymax = ci_upper),
position = position_dodge(width = .1)) +
geom_point(aes(size = n)) +
facet_grid(condition ~ trial_type) +
geom_smooth(method="lm", se = FALSE) +
ylab("Proportion offers rejected") +
xlab("Age (years)") +
ylim(c(0,1))
```
Break this down by country. I like this plot best so far.
```{r}
ggplot(ms, aes(x = age, y = mean, col = condition)) +
geom_linerange(aes(ymin = ci_lower, ymax = ci_upper),
position = position_dodge(width = .1)) +
geom_point(aes(size = n)) +
facet_grid(trial_type ~ country) +
geom_smooth(method="lm", se = FALSE, span = 2) +
ylab("Proportion offers rejected") +
xlab("Age (years)") +
ylim(c(0,1))
```
But maybe we can do better. Let's:
+ bootstrap CIs
+ sort the plots
+ add better smoother (quadratic)
+ use lines instead of points
+ better palette
```{r}
ms <- d %>%
filter(!is.na(eq.uneq)) %>%
mutate(age = floor(actor.age.years/2)*2,
decision = decision == "reject") %>%
group_by(country, trial_type, condition, age, actor.id) %>%
summarise(decision = mean(decision, na.rm=TRUE)) %>%
summarise(mean = mean(decision, na.rm=TRUE),
ci_lower = ci.low(decision),
ci_upper = ci.high(decision),
n = n())
ms$country <- factor(ms$country,
levels = c("Uganda","US","Canada",
"Senegal","India","Peru","Mexico"))
ggplot(ms, aes(x = age, y = mean, col = condition)) +
geom_linerange(aes(ymin = mean - ci_lower,
ymax = mean + ci_upper),
position = position_dodge(width = .5)) +
geom_point(aes(size=n)) +
facet_grid(trial_type ~ country) +
geom_smooth(method="lm", se = FALSE, formula = y ~ I(x^2), aes(weight = 1/n)) +
ylab("Proportion offers rejected") +
xlab("Age (years)") +
ylim(c(0,1)) +
langcog::scale_colour_solarized()
```
I'm still not super happy with the curves. Can we add some model-derived fits?
I'd like to be fitting an lmer model. Unfortunately it doesn't converge on country subsets.
So, let's ignore the p values and SEs, which will be wrong, and fit regular GLM. (The coefficients should be very similar because the design is balanced).
```{r}
mods <- d %>%
group_by(country) %>%
do(predict(glm(decision == "reject" ~ condition * trial_type * actor.age.years,
data = .,
family = "binomial"),
newdata = data.frame(expand.grid(actor.age.years = 4:14,
condition = c("Advantageous","Disadvantageous"),
trial_type = c("Equal","Unequal")))))
```
Now plot these. This is not the *ideal* model, but it's way better than nothing.
```{r}
nd <- expand.grid(actor.age.years = 4:14,
condition = c("Advantageous","Disadvantageous"),
trial_type = c("Equal","Unequal"))
preds <- d %>%
group_by(country) %>%
do(data.frame(nd,
preds = predict(glm(decision == "reject" ~
condition * trial_type * actor.age.years,
data = .,
family = "binomial"),
newdata = nd, type = "response")))
```
Plot these and then put legends on the bottom.
```{r}
ggplot(ms, aes(x = age, y = mean, col = condition)) +
geom_linerange(aes(ymin = mean - ci_lower,
ymax = mean + ci_upper),
position = position_dodge(width = .5)) +
geom_point(aes(size=n)) +
facet_grid(trial_type ~ country) +
geom_line(data = preds, aes(x = actor.age.years, y = preds)) +
ylab("Proportion offers rejected") +
xlab("Age (years)") +
ylim(c(0,1)) +
langcog::scale_colour_solarized() +
theme(legend.position = "bottom")
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment