@MaiaPelletier
Last active May 5, 2020 23:46
Cleaning Pudding music challenge data
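library(tidyverse) # purrr, dplyr, tidyr, stringr, forcats
library(readxl)    # read_excel()

# Read every .xlsx file in the working directory into one data frame.
# The pattern is a regex, so the dot is escaped and the match is anchored.
pudding_data <-
  map_dfr(list.files(pattern = '\\.xlsx$'), read_excel) %>%
  na.omit()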
knowledge_labels <- c(
  "don't know it",
  "sounds familiar",
  "know it",
  "singing the lyrics"
)
knowledge_labels_regex <- "(don't know it)|(sounds familiar)|(know it)|(singing the lyrics)"
pudding_data %>%
  # 'Maia' becomes 'M'; every other name becomes 'J'
  mutate(name = ifelse(name == 'Maia', 'M', 'J')) %>%
  mutate(
    # I did this since there were emojis in the original labels.
    # "don't know it" has to be tested before "know it", which it contains.
    knowledge = case_when(
      str_detect(song, knowledge_labels[1]) ~ knowledge_labels[1],
      str_detect(song, knowledge_labels[2]) ~ knowledge_labels[2],
      str_detect(song, knowledge_labels[3]) ~ knowledge_labels[3],
      str_detect(song, knowledge_labels[4]) ~ knowledge_labels[4]
    )
  ) %>%
  # Carry each label down to the song rows below it, then drop the label rows
  fill(knowledge) %>%
  filter(!str_detect(song, knowledge_labels_regex)) %>%
  # Split on " by"; a title that itself contains " by" would be split early
  separate(song, c('song', 'artist'), sep = '\\sby', remove = TRUE) %>%
  mutate(
    year = str_extract(artist, '[0-9]{4}$'),
    year = as.integer(year),
    artist = str_remove(artist, ', [0-9]{4}$'),
    artist = str_trim(artist),
    knowledge = fct_inorder(knowledge)
  ) %>%
  # Bucket years into decades; anything before 1970 is grouped with the 1960s
  mutate(
    year_dec = case_when(
      year < 1970 ~ 1960,
      year >= 1970 & year < 1980 ~ 1970,
      year >= 1980 & year < 1990 ~ 1980,
      year >= 1990 & year < 2000 ~ 1990,
      year >= 2000 & year < 2010 ~ 2000,
      year >= 2010 ~ 2010
    )
  )
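
# A minimal sketch of the label-propagation step above, run on made-up data.
# Assumptions: the `toy` tibble, its song/artist strings, and the reduced
# two-label set `toy_labels` are hypothetical; the real spreadsheets are not
# shown here. It illustrates how a label row is matched, filled down onto the
# song rows beneath it, and then filtered out.
library(dplyr)
library(tidyr)
library(stringr)

toy_labels <- c("don't know it", "know it")

toy <- tibble(
  song = c(
    "don't know it",             # label row
    "Song A by Artist A, 1999",  # song row under the first label
    "know it",                   # label row
    "Song B by Artist B, 2004"   # song row under the second label
  )
)

toy_clean <- toy %>%
  mutate(
    # Rows matching neither label get NA, which fill() then overwrites
    knowledge = case_when(
      str_detect(song, toy_labels[1]) ~ toy_labels[1],
      str_detect(song, toy_labels[2]) ~ toy_labels[2]
    )
  ) %>%
  fill(knowledge) %>%
  filter(!str_detect(song, "(don't know it)|(know it)"))

# toy_clean keeps only the two song rows, each tagged with the label above it.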