Skip to content

Instantly share code, notes, and snippets.

@debruine
Last active February 12, 2022 14:10
Show Gist options
  • Save debruine/6b71e30ff4e0393bf55b892429dfc1ea to your computer and use it in GitHub Desktop.
Save debruine/6b71e30ff4e0393bf55b892429dfc1ea to your computer and use it in GitHub Desktop.
Deal with the terrible doodle format
library(dplyr)
library(tidyr)
library(readxl)
library(lubridate)
# deal with multi-line headers
data_head <- readxl::read_excel("Doodle.xls",
skip = 3, n_max = 3,
col_names = FALSE)
for (i in 2:ncol(data_head)) {
prev <- data_head[, i-1]
this <- data_head[, i]
missing <- is.na(this)
this[missing, ] <- prev[missing, ]
data_head[, i] <- this
}
new_names <- data_head %>%
summarise(across(.fns = paste, collapse = "_")) %>%
unlist() %>% unname() %>%
c("Name", .)
# import data
doodle <- readxl::read_excel("Doodle.xls",
skip = 6,
col_names = new_names) %>%
filter(Name != "Count") %>% # remove the Count row
pivot_longer(cols = -Name,
names_to = c("month", "day", "time"),
names_sep = "_",
values_to = "decision") %>%
separate(time, into = c("starttime", "endtime"), sep = " – ", remove = FALSE) %>%
mutate(datetime = lubridate::dmy_hm(paste(day, month, starttime)),
# replace blanks with "No"
decision = replace_na(decision, "No"))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment