@thibautjombart
Last active June 18, 2020 08:43
Basic idea for a linelist class
library(outbreaks)
library(tidyverse)
make_linelist <- function(x, date, interval = 1L, date_start = NULL, date_stop = NULL) {
  ## TODO: add tests on inputs
  x <- tibble::as_tibble(x)

  ## `date` is a column name passed as a string: move that column to the front.
  ## all_of() makes the lookup unambiguous even if `x` has a column called "date".
  out <- dplyr::select(x, dplyr::all_of(date), dplyr::everything())
  dates <- out[[date]]

  ## default to the observed date range
  if (is.null(date_start)) {
    date_start <- min(dates, na.rm = TRUE)
  }
  if (is.null(date_stop)) {
    date_stop <- max(dates, na.rm = TRUE)
  }

  x_info <- list(
    date = names(out)[1],
    interval = interval,
    date_start = date_start,
    date_stop = date_stop
  )

  ## append class and add attributes
  class(out) <- c("linelist", class(x))
  attr(out, "linelist_info") <- x_info
  out
}
x <- make_linelist(ebola_sim_clean$linelist, "date_of_onset")
x
## some operations preserve the attributes
x %>%
  select(1:10) %>%
  attr("linelist_info")

x %>%
  select(1:10) %>%
  group_by(gender) %>%
  attr("linelist_info")

x %>%
  select(1:10) %>%
  filter(date_of_onset < as.Date("2015-01-01")) %>%
  attr("linelist_info")

## some do not
x %>%
  select(1:10) %>%
  group_by(gender) %>%
  filter(date_of_onset < as.Date("2015-01-01")) %>%
  attr("linelist_info")
@thibautjombart
Author

I really like it! It stays close to the previous implementation in terms of interface, and adds some of the key features we need, most importantly stratification by more than one factor. A small note on the interval (though I understand this is a proof of concept): here .interval seems wrong (interval = 1L should be taken as one day). Just to make sure we're on the same page:

  • question / confirmation: here .interval is the left-hand side of the bins to count cases by (right?)
  • in practice, the argument interval in incidence() should keep its current ability to handle named time units (and thus non-constant intervals), e.g. "1 month" or "2 weeks" or "quarter"
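
To make the "left-hand side of the bins" idea concrete, here is a minimal sketch using base R's `cut()` method for dates, which accepts named units such as "1 month", "2 weeks" or "quarter". The helper name `date_group()` and the example dates are assumptions for illustration only; this is not how the incidence package implements it.

```r
## Hypothetical helper: return the left-hand edge of the bin each date falls in.
## Relies on base::cut.Date, whose factor levels are labelled by the bin start.
date_group <- function(date_var, interval) {
  bins <- cut(date_var, breaks = interval)
  as.Date(as.character(bins))
}

## made-up onset dates, for illustration only
onset <- as.Date(c("2020-01-03", "2020-01-20", "2020-02-11", "2020-04-02"))

date_group(onset, "1 month")  # non-constant bin widths (31, 29, 31 days, ...)
date_group(onset, "quarter")
date_group(onset, "2 weeks")  # weeks start on Monday by default in cut.Date
```

Counting cases per bin is then a matter of tabulating the returned dates, for example with `table()` or `dplyr::count()`.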

@TimTaylor

Yes and yes.

I think we're on the same page. I'll expand on my thinking on the interval function and how it can be used:

  • In principle the interval function is any function applied to date_var that maintains the monotonic ordering of date_var.
  • Keeping it abstract makes it easier to apply different functions in future, should we choose to.
  • Initially I'll just do a "cut, paste and tweak" of the current functionality in the incidence package to make it fit.
  • The variable .interval is probably better named date_group. By putting it as a variable in the tibble rather than an attribute it's easier to work with.
  • In theory we would like the interval function to dispatch on both of its arguments, although in practice I will probably dispatch on one and switch on the other (I'm undecided on the order of arguments at the moment). This way it will match current functionality and deal with character vectors and integer/numeric.
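
A rough sketch of the "dispatch on one, switch on the other" idea, assuming S3 dispatch on the dates and a plain type check on the interval. The generic name `group_dates()`, its argument order, and the choice to count fixed-width bins from the earliest value are all assumptions for illustration, not a settled design.

```r
## Hypothetical generic: S3 dispatch on `date_var`, then branch on `interval`.
group_dates <- function(date_var, interval, ...) UseMethod("group_dates")

group_dates.Date <- function(date_var, interval, ...) {
  if (is.character(interval)) {
    ## named time unit, e.g. "1 month"; delegated to base::cut.Date
    as.Date(as.character(cut(date_var, breaks = interval)))
  } else if (is.numeric(interval)) {
    ## fixed number of days, counted from the earliest date
    first <- min(date_var, na.rm = TRUE)
    offset <- as.integer(date_var) - as.integer(first)
    first + (offset %/% interval) * interval
  } else {
    stop("`interval` must be character or numeric")
  }
}

## integer vectors also land here via R's implicit class ("integer", "numeric")
group_dates.numeric <- function(date_var, interval, ...) {
  stopifnot(is.numeric(interval))
  first <- min(date_var, na.rm = TRUE)
  first + ((date_var - first) %/% interval) * interval
}

d <- as.Date("2020-01-01") + c(0, 3, 7, 15)
group_dates(d, 7)                     # 7-day bins starting at the first date
group_dates(d, "1 month")             # calendar-month bins
group_dates(c(0L, 3L, 7L, 15L), 7L)   # integer "dates", e.g. day numbers
```

Each method maps a date to the start of its bin, so the output never reorders the input, which is the monotonicity property described in the first bullet.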

Does this make sense / answer your questions?

@thibautjombart
Author

Yes it makes total sense, and I think it is a nice way to do things. We may need to add:

interval_function <- function(date_var, interval, date_start, date_stop) {
 ...
}

+1 to naming the output of that function date_group inside incidence::incidence; it is probably clearer that way.
