@thibautjombart
Last active June 18, 2020 08:43
Basic idea for a linelist class
library(outbreaks)
library(tidyverse)
make_linelist <- function(x, date, interval = 1L, date_start = NULL, date_stop = NULL) {
  ## TODO: add tests on inputs
  x <- tibble::as_tibble(x)

  ## `date` holds a column name as a string, so select it with all_of() to
  ## avoid ambiguity with a column literally called "date"; move it first
  out <- dplyr::select(x, dplyr::all_of(date), dplyr::everything())
  dates <- out[[1]]

  ## default to the observed date range
  if (is.null(date_start)) {
    date_start <- min(dates, na.rm = TRUE)
  }
  if (is.null(date_stop)) {
    date_stop <- max(dates, na.rm = TRUE)
  }

  x_info <- list(
    date = names(out)[1],
    interval = interval,
    date_start = date_start,
    date_stop = date_stop
  )

  ## append class and add attributes
  class(out) <- c("linelist", class(x))
  attr(out, "linelist_info") <- x_info
  out
}
x <- make_linelist(ebola_sim_clean$linelist, "date_of_onset")
x
## some operations preserve the attributes
x %>%
  select(1:10) %>%
  attr("linelist_info")

x %>%
  select(1:10) %>%
  group_by(gender) %>%
  attr("linelist_info")

x %>%
  select(1:10) %>%
  filter(date_of_onset < as.Date("2015-01-01")) %>%
  attr("linelist_info")

## some do not
x %>%
  select(1:10) %>%
  group_by(gender) %>%
  filter(date_of_onset < as.Date("2015-01-01")) %>%
  attr("linelist_info")
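
One possible workaround for the last case is to re-attach the metadata after the operation. The sketch below is only an illustration: `restore_linelist()` is a hypothetical helper, not part of the gist or of any package, and the toy tibble stands in for a real linelist object.

```r
library(dplyr)

## hypothetical helper (an assumption, not from the gist): copy the "linelist"
## class and the "linelist_info" attribute from a template onto a new object
restore_linelist <- function(new, template) {
  info <- attr(template, "linelist_info")
  if (is.null(info)) {
    stop("`template` has no 'linelist_info' attribute")
  }
  attr(new, "linelist_info") <- info
  if (!inherits(new, "linelist")) {
    class(new) <- c("linelist", class(new))
  }
  new
}

## toy data standing in for a linelist object
toy <- tibble::tibble(
  date_of_onset = as.Date("2014-12-30") + 0:3,
  gender = c("f", "m", "f", "m")
)
attr(toy, "linelist_info") <- list(date = "date_of_onset", interval = 1L)
class(toy) <- c("linelist", class(toy))

## grouped filter, then restore the metadata from the original object
res <- toy %>%
  group_by(gender) %>%
  filter(date_of_onset < as.Date("2015-01-01")) %>%
  restore_linelist(toy)

attr(res, "linelist_info")
```

This keeps the fix outside of dplyr's internals, at the cost of having to call the helper explicitly. Note that it copies the metadata as-is, so `date_start` and `date_stop` would not reflect the filtered data.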
@TimTaylor

Yes and yes.

I think we're on the same page. I'll expand on my thinking on the interval function and how it can be used:

  • In principle the interval function is any function applied to date_var that maintains the monotonic ordering of the date_var.
  • By keeping it abstract it makes it easier to apply different functions in future should we choose.
  • Initially I'll just do a "cut, paste and tweak" to the current functionality in the incidence package to make it fit.
  • The variable .interval is probably better named date_group. Storing it as a variable in the tibble rather than as an attribute makes it easier to work with.
  • In theory we would like the interval function to dispatch on both its arguments, although in practice I will probably dispatch on one and switch on the other (I'm undecided on the order of arguments at the moment). This way it will match current functionality and deal with character vectors and integer/numeric.
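
To make the "dispatch on one, switch on the other" idea concrete, here is a rough sketch under stated assumptions. The name `group_dates()` and its methods are hypothetical, the code dispatches on `interval` only (dates are assumed to be of class Date), and the supported keywords are illustrative rather than the incidence package's actual behaviour.

```r
## hypothetical generic: S3 dispatch on the class of `interval`
group_dates <- function(date_var, interval, ...) {
  UseMethod("group_dates", interval)
}

## integer/numeric interval: bins of `interval` days counted from date_start
group_dates.numeric <- function(date_var, interval,
                                date_start = min(date_var, na.rm = TRUE), ...) {
  offset <- as.integer(date_var) - as.integer(date_start)
  date_start + (offset %/% interval) * interval
}

## character interval: switch on a few illustrative keywords
group_dates.character <- function(date_var, interval, ...) {
  switch(
    interval,
    day = date_var,
    week = group_dates(date_var, 7L, ...),
    month = as.Date(format(date_var, "%Y-%m-01")),
    stop("unsupported interval: ", interval)
  )
}

dates <- as.Date("2020-01-01") + c(0, 3, 7, 13, 14)
date_group <- group_dates(dates, 7L)

## the grouping keeps the ordering of the input dates
!is.unsorted(date_group)
```

Each method maps a date to the start of its bin, so the result is non-decreasing whenever the input is, which is the monotonicity property described above.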

Does this make sense / answer your questions?

@thibautjombart
Author

Yes it makes total sense, and I think it is a nice way to do things. We may need to add:

interval_function <- function(date_var, interval, date_start, date_stop) {
 ...
}

+1 to naming the output of that function date_group inside incidence::incidence, probably clearer like this.
