@chris-prener
Last active January 11, 2018 19:22
Generate a Random Data Frame of Names and Genders
# dependencies
library(dplyr) # data wrangling
library(gender) # assess gender
library(randomNames) # generate names
library(stringr) # work with strings
# create data frame of random names
names <- randomNames(30, which.names = "both", name.order = "first.last", name.sep = " ")
names <- as.data.frame(names, stringsAsFactors = FALSE)
# parse full name into first and last name variables
names %>%
  rename(fullName = names) %>%
  mutate(firstName = word(fullName, 1)) %>%
  mutate(lastName = word(fullName, -1)) -> names
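# Illustrative check (not part of the original gist): a minimal sketch of how
# stringr's word() splits a full name, using two hypothetical example names.
# For a name with more than two words, only the first and last words are kept,
# so any middle part is dropped from both variables.
exampleNames <- c("Mary Smith", "Jose Luis Garcia")
exampleFirst <- stringr::word(exampleNames, 1)   # "Mary"  "Jose"
exampleLast <- stringr::word(exampleNames, -1)   # "Smith" "Garcia"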
# evaluate names for typical gender
gender <- gender(names$firstName)
# trim gender data frame
gender %>%
  rename(firstName = name) %>%
  select(firstName, gender) -> gender
# combine name and gender data
sampleData <- left_join(names, gender, by = "firstName")
# make sure all names have a gender and there are no duplicates
sampleData <- filter(sampleData, !is.na(gender))
sampleData <- distinct(sampleData, fullName, .keep_all = TRUE)
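# take random sample of up to 20 names
# (fewer than 20 rows may remain after filtering, and sample() errors if asked for more rows than exist)
index <- sample(seq_len(nrow(sampleData)), min(20, nrow(sampleData)))
# filter based on random sample
sampleData %>%
  mutate(row = row_number()) %>%
  filter(row %in% index) %>%
  select(-row) -> sampleData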
# save as tibble
sampleData <- as_tibble(sampleData)
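# Illustrative note (not part of the original gist): the sample() call above
# draws different rows on every run. A minimal sketch, assuming a fixed seed is
# acceptable, of how set.seed() makes such a draw repeatable. The seed value
# 2018 is arbitrary. To make the whole script repeatable, the seed would
# presumably need to be set before randomNames() is called, assuming that
# function draws from R's standard random number generator.
set.seed(2018)
firstDraw <- sample(1:30, 20)
set.seed(2018)
secondDraw <- sample(1:30, 20)
identical(firstDraw, secondDraw)  # TRUE: same seed, same draw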