Last active
February 16, 2016 00:04
-
-
Save mcfrank/fdfcd869e7938f3bfebd to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: 'Class 6a: Blake et al. (2015) exercise' | |
author: "Mike Frank" | |
date: "February 9, 2016" | |
output: | |
html_document: | |
toc: true | |
--- | |
# Intro | |
This is an exploration of Blake et al. (2015), [Ontogeny of fairness in seven societies](http://www.nature.com/nature/journal/v528/n7581/full/nature15703.html). | |
```{r} | |
rm(list=ls()) | |
library(ggplot2) | |
library(tidyr) | |
library(dplyr) | |
library(bootstrap) | |
theme_set(theme_bw()) | |
sem <- function(x) {sd(x, na.rm=TRUE) / sqrt(length(x))} | |
ci95 <- function(x) {sem(x) * 1.96} | |
theta <- function(x,xdata,na.rm=T) {mean(xdata[x],na.rm=na.rm)} | |
ci.low <- function(x,na.rm=T) { | |
mean(x,na.rm=na.rm) - quantile(bootstrap(1:length(x),1000,theta,x,na.rm=na.rm)$thetastar,.025,na.rm=na.rm)} | |
ci.high <- function(x,na.rm=T) { | |
quantile(bootstrap(1:length(x),1000,theta,x,na.rm=na.rm)$thetastar,.975,na.rm=na.rm) - mean(x,na.rm=na.rm)} | |
``` | |
# Data Prep | |
First read in the data, as distributed by the journal. | |
```{r} | |
d <- read.csv("data/Ontogeny_fairness_seven_societies_data.csv", | |
header = TRUE, na.strings = c("NA", ".")) | |
``` | |
Do some preprocessing, taken directly from the supplemental material. | |
```{r} | |
facVars <- c("eq.uneq", "value", "decision") | |
d[, facVars] <- lapply(d[, facVars], factor) | |
d$trial.number <- as.numeric(gsub(".(\\d+)", "\\1", d$trial)) | |
``` | |
Rename things so that they are easy to deal with. | |
```{r} | |
d$trial_type <- factor(d$eq.uneq, | |
levels = c("E","U"), | |
labels = c("Equal","Unequal")) | |
d$condition <- factor(d$condition, | |
levels = c("AI","DI"), | |
labels = c("Advantageous","Disadvantageous")) | |
``` | |
# Variable exploration | |
Describe the dataset graphically in ways that are useful for you to get a handle on the data collection effort. | |
```{r} | |
demo <- d %>% | |
mutate(age = floor(actor.age.years)) %>% | |
group_by(country, age, actor.id) %>% | |
summarise(n = n()) %>% | |
summarise(n = n()) | |
ggplot(demo, aes(x = age, y = n)) + | |
geom_bar(stat = "identity") + | |
facet_grid(. ~ country) | |
``` | |
# Hypothesis-related exploration | |
Try just plotting the un-adjusted curves. | |
```{r} | |
ms <- d %>% | |
filter(!is.na(eq.uneq)) %>% | |
mutate(age = floor(actor.age.years), | |
decision = decision == "reject") %>% | |
group_by(country, trial_type, condition, age, actor.id) %>% | |
summarise(decision = mean(decision, na.rm=TRUE)) %>% | |
summarise(mean = mean(decision, na.rm=TRUE), | |
ci_lower = mean(decision) - ci95(decision), | |
ci_upper = mean(decision) + ci95(decision), | |
n = n()) | |
ggplot(ms, aes(x = age, y = mean, col = country)) + | |
geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), | |
position = position_dodge(width = .1)) + | |
geom_point(aes(size = n)) + | |
facet_grid(condition ~ trial_type) + | |
geom_smooth(method="lm", se = FALSE) + | |
ylab("Proportion offers rejected") + | |
xlab("Age (years)") + | |
ylim(c(0,1)) | |
``` | |
Now rebin into 3-year bins. | |
```{r} | |
ms <- d %>% | |
filter(!is.na(eq.uneq)) %>% | |
mutate(age = floor(actor.age.years/2)*2, | |
decision = decision == "reject") %>% | |
group_by(country, trial_type, condition, age, actor.id) %>% | |
summarise(decision = mean(decision, na.rm=TRUE)) %>% | |
summarise(mean = mean(decision, na.rm=TRUE), | |
ci_lower = mean(decision) - ci95(decision), | |
ci_upper = mean(decision) + ci95(decision), | |
n = n()) | |
ggplot(ms, aes(x = age, y = mean, col = country)) + | |
geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), | |
position = position_dodge(width = .1)) + | |
geom_point(aes(size = n)) + | |
facet_grid(condition ~ trial_type) + | |
geom_smooth(method="lm", se = FALSE) + | |
ylab("Proportion offers rejected") + | |
xlab("Age (years)") + | |
ylim(c(0,1)) | |
``` | |
Break this down by country. I like this plot best so far. | |
```{r} | |
ggplot(ms, aes(x = age, y = mean, col = condition)) + | |
geom_linerange(aes(ymin = ci_lower, ymax = ci_upper), | |
position = position_dodge(width = .1)) + | |
geom_point(aes(size = n)) + | |
facet_grid(trial_type ~ country) + | |
geom_smooth(method="lm", se = FALSE, span = 2) + | |
ylab("Proportion offers rejected") + | |
xlab("Age (years)") + | |
ylim(c(0,1)) | |
``` | |
But maybe we can do better. Let's: | |
+ bootstrap CIs | |
+ sort the plots | |
+ add better smoother (quadratic) | |
+ use lines instead of points | |
+ better palette | |
```{r} | |
ms <- d %>% | |
filter(!is.na(eq.uneq)) %>% | |
mutate(age = floor(actor.age.years/2)*2, | |
decision = decision == "reject") %>% | |
group_by(country, trial_type, condition, age, actor.id) %>% | |
summarise(decision = mean(decision, na.rm=TRUE)) %>% | |
summarise(mean = mean(decision, na.rm=TRUE), | |
ci_lower = ci.low(decision), | |
ci_upper = ci.high(decision), | |
n = n()) | |
ms$country <- factor(ms$country, | |
levels = c("Uganda","US","Canada", | |
"Senegal","India","Peru","Mexico")) | |
ggplot(ms, aes(x = age, y = mean, col = condition)) + | |
geom_linerange(aes(ymin = mean - ci_lower, | |
ymax = mean + ci_upper), | |
position = position_dodge(width = .5)) + | |
geom_point(aes(size=n)) + | |
facet_grid(trial_type ~ country) + | |
geom_smooth(method="lm", se = FALSE, formula = y ~ I(x^2), aes(weight = 1/n)) + | |
ylab("Proportion offers rejected") + | |
xlab("Age (years)") + | |
ylim(c(0,1)) + | |
langcog::scale_colour_solarized() | |
``` | |
I'm still not super happy with the curves. Can we add some model-derived fits? | |
I'd like to be fitting an lmer model. Unfortunately it doesn't converge on country subsets. | |
So, let's ignore the p values and SEs, which will be wrong, and fit regular GLM. (The coefficients should be very similar because the design is balanced). | |
```{r} | |
mods <- d %>% | |
group_by(country) %>% | |
do(predict(glm(decision == "reject" ~ condition * trial_type * actor.age.years, | |
data = ., | |
family = "binomial"), | |
newdata = data.frame(expand.grid(actor.age.years = 4:14, | |
condition = c("Advantageous","Disadvantageous"), | |
trial_type = c("Equal","Unequal"))))) | |
``` | |
Now plot these. This is not the *ideal* model, but it's way better than nothing. | |
```{r} | |
nd <- expand.grid(actor.age.years = 4:14, | |
condition = c("Advantageous","Disadvantageous"), | |
trial_type = c("Equal","Unequal")) | |
preds <- d %>% | |
group_by(country) %>% | |
do(data.frame(nd, | |
preds = predict(glm(decision == "reject" ~ | |
condition * trial_type * actor.age.years, | |
data = ., | |
family = "binomial"), | |
newdata = nd, type = "response"))) | |
``` | |
Plot these and then put legends on the bottom. | |
```{r} | |
ggplot(ms, aes(x = age, y = mean, col = condition)) + | |
geom_linerange(aes(ymin = mean - ci_lower, | |
ymax = mean + ci_upper), | |
position = position_dodge(width = .5)) + | |
geom_point(aes(size=n)) + | |
facet_grid(trial_type ~ country) + | |
geom_line(data = preds, aes(x = actor.age.years, y = preds)) + | |
ylab("Proportion offers rejected") + | |
xlab("Age (years)") + | |
ylim(c(0,1)) + | |
langcog::scale_colour_solarized() + | |
theme(legend.position = "bottom") | |
``` |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment