
@jennybc
Created April 9, 2016 16:46
Get Polygraph's film data into R
## http://polygraph.cool/films/
## https://github.com/matthewfdaniels/scripts
x <- "https://raw.githubusercontent.com/matthewfdaniels/scripts/master/data/character_list5.csv"
characters <- read.csv(x, na.strings = c("NULL", "?"),
                       fileEncoding = "ISO-8859-1", stringsAsFactors = FALSE)
## some ages are clearly (negative) birth years ... oops
## so treat negative ages and ages over 105 as missing
characters$age[!is.na(characters$age) & characters$age < 0] <- NA
characters$age[!is.na(characters$age) & characters$age > 105] <- NA
y <- "https://raw.githubusercontent.com/matthewfdaniels/scripts/master/data/meta_data7.csv"
## colClasses "NULL" skips the bulky lines_data column entirely
films <- read.csv(y, fileEncoding = "ISO-8859-1", stringsAsFactors = FALSE,
                  colClasses = c(lines_data = "NULL"))
## setequal(characters$script_id, films$script_id)
## wow, a pleasant surprise
df <- merge(characters, films)
write.table(df, "characters_with_film.csv", sep = ",", row.names = FALSE)
library(ggplot2)
ggplot(subset(df, !is.na(gender)), aes(x = age, colour = gender)) +
  geom_density()
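To check the age cleaning and the merge without downloading the CSVs, here is a minimal sketch on small made-up data frames. The helper `clean_age()` and all toy values are hypothetical, not part of the gist; the cutoffs (below 0 or above 105 becomes `NA`) are the ones used above, and `by = "script_id"` assumes that is the only key the two files share.

```r
## hypothetical helper (not in the gist): set ages outside [lo, hi] to NA
clean_age <- function(age, lo = 0, hi = 105) {
  age[!is.na(age) & (age < lo | age > hi)] <- NA
  age
}

## made-up stand-ins for character_list5.csv and meta_data7.csv
toy_chars <- data.frame(
  script_id = c(1, 1, 2, 2),
  gender    = c("man", "woman", "woman", NA),
  age       = c(34, -1965, 1972, 28),
  stringsAsFactors = FALSE
)
toy_films <- data.frame(
  script_id = c(1, 2),
  title     = c("Film A", "Film B"),
  stringsAsFactors = FALSE
)

## the negative age and the birth-year-like 1972 both become NA
toy_chars$age <- clean_age(toy_chars$age)

## merge() joins on the shared key; confirm both tables cover the same scripts
stopifnot(setequal(toy_chars$script_id, toy_films$script_id))
toy_df <- merge(toy_chars, toy_films, by = "script_id")
print(toy_df)
```

Naming `by` explicitly guards against `merge()` silently joining on an extra column that happens to share a name across the two files.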
@matthewfdaniels

Love that this fixes the same errors we handle in our front-end code. We're missing birth years (or have wrong ones) for many actors and actresses. Just as you've done, we remove those rows from the age calculations.
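A minimal sketch of what "removing those rows from the age calculations" might look like in R, assuming a data frame with `gender` and `age` columns like `df` in the gist. The rows below are made up; missing or implausible ages are already `NA`, and those rows are simply dropped before summarizing rather than imputed.

```r
## made-up rows; NA ages stand in for missing or wrong birth years
toy <- data.frame(
  gender = c("man", "man", "woman", "woman", "woman"),
  age    = c(40, NA, 30, 34, NA),
  stringsAsFactors = FALSE
)

## keep only rows where both age and gender are known
known <- toy[!is.na(toy$age) & !is.na(toy$gender), ]

## mean age and number of rows actually used, per gender
mean_age <- tapply(known$age, known$gender, mean)
n_used   <- table(known$gender)
print(mean_age)
print(n_used)
```

Reporting `n_used` alongside the means makes it visible how many characters each age statistic actually rests on once the bad rows are gone.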
