@wrathematics
Created November 26, 2014 19:52
"genderize" test with the babynames dataset
library(babynames)
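# Guess a gender from the ending of a single first name.
genderize <- function(name)
{
  # Endings treated as male even though they would match the female pattern below
  regex <- "(ua|pher|andy|elijah)$"
  if (grepl(x=name, pattern=regex, ignore.case=TRUE))
    return("male")

  # Endings treated as female
  regex <- "(a|i|y|ah|ee|et|ette|elle|fer|ine|lyn|ie|anne|een|en|er|yn|ynn|kim|rachel|lind|pam|sue)$"
  if (grepl(x=name, pattern=regex, ignore.case=TRUE))
    return("female")

  # Anything else defaults to male
  return("male")
}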
year <- 2013
female <- babynames[which(babynames$sex == "F" & babynames$year == year), ]
male <- babynames[which(babynames$sex == "M" & babynames$year == year), ]
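# Pull the names out as plain character vectors, most popular first.
# (Indexing with [, "name"] keeps a one-column table when babynames is a tibble,
# and sapply() would then pass the whole column to genderize() at once.)
female <- female$name[order(-female$prop)]
male <- male$name[order(-male$prop)]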
female_genderized <- sapply(female, genderize)
male_genderized <- sapply(male, genderize)
table(female_genderized)
# female_genderized
# female   male
#  14974   4140
table(male_genderized)
# male_genderized
# female   male
#   4072   9886
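
The two tables above can be folded into a single confusion matrix to see how often the rule matches the recorded sex. The following is only a rough sketch: the counts are copied by hand from the output above, and the names `confusion`, `per_class`, and `overall` are introduced here for illustration and are not part of the original gist.

```r
# Counts copied from the table() output above.
# Rows: sex recorded in babynames; columns: label returned by genderize().
confusion <- matrix(
  c(14974, 4140,
     4072, 9886),
  nrow = 2, byrow = TRUE,
  dimnames = list(actual = c("female", "male"),
                  predicted = c("female", "male"))
)

# Share of names in each recorded class that received the matching label.
per_class <- diag(confusion) / rowSums(confusion)

# Share of all names, both classes pooled, that received the matching label.
overall <- sum(diag(confusion)) / sum(confusion)

round(per_class, 3)
round(overall, 3)
```

Each row of the 2013 data is one distinct name, so these rates weight a rare name the same as a common one. Weighting by `prop` or `n` would give a per-baby figure instead.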