Last active
June 18, 2020 08:43
Basic idea for a linelist class
library(outbreaks)
library(tidyverse)

make_linelist <- function(x, date, interval = 1L, date_start = NULL, date_stop = NULL) {

  ## TODO: add tests on inputs

  x <- tibble::as_tibble(x)

  ## `date` is a column name given as a string: move that column first
  out <- dplyr::select(x, dplyr::all_of(date), dplyr::everything())
  dates <- out[[date]]

  ## default to the observed date range
  if (is.null(date_start)) {
    date_start <- min(dates, na.rm = TRUE)
  }
  if (is.null(date_stop)) {
    date_stop <- max(dates, na.rm = TRUE)
  }

  x_info <- list(
    date = names(out)[1],
    interval = interval,
    date_start = date_start,
    date_stop = date_stop
  )

  ## append class and add attributes
  class(out) <- c("linelist", class(x))
  attr(out, "linelist_info") <- x_info
  out
}

x <- make_linelist(ebola_sim_clean$linelist, "date_of_onset")
x

## some operations are okay preserving attributes
x %>%
  select(1:10) %>%
  attr("linelist_info")

x %>%
  select(1:10) %>%
  group_by(gender) %>%
  attr("linelist_info")

x %>%
  select(1:10) %>%
  filter(date_of_onset < as.Date("2015-01-01")) %>%
  attr("linelist_info")

## some are not
x %>%
  select(1:10) %>%
  group_by(gender) %>%
  filter(date_of_onset < as.Date("2015-01-01")) %>%
  attr("linelist_info")
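
One possible workaround for operations that drop the `linelist_info` attribute is to copy it back from the original object afterwards. The sketch below is only an illustration: `restore_linelist_info()` is a hypothetical helper that is not part of the gist, and it uses a small made-up data frame in base R so that it runs without `outbreaks` or `dplyr`. The attribute is stripped by hand here to stand in for whatever operation lost it.

```r
## Hypothetical helper (not part of the gist): copy the "linelist_info"
## attribute and the "linelist" class from `original` onto `result`.
restore_linelist_info <- function(result, original) {
  info <- attr(original, "linelist_info")
  if (is.null(info)) {
    stop("`original` has no 'linelist_info' attribute")
  }
  attr(result, "linelist_info") <- info
  if (!inherits(result, "linelist")) {
    class(result) <- c("linelist", class(result))
  }
  result
}

## Toy data standing in for a linelist (values are made up)
original <- data.frame(
  date_of_onset = as.Date("2014-04-07") + 0:4,
  gender = c("f", "m", "f", "m", "f")
)
attr(original, "linelist_info") <- list(date = "date_of_onset", interval = 1L)
class(original) <- c("linelist", class(original))

## Simulate an operation that loses the attribute and the class
stripped <- original
attr(stripped, "linelist_info") <- NULL
class(stripped) <- setdiff(class(stripped), "linelist")

restored <- restore_linelist_info(stripped, original)
attr(restored, "linelist_info")
```

Note that the metadata is copied verbatim, so after a row filter the stored `date_start` and `date_stop` may no longer match the data; whether they should be recomputed is a design decision this sketch leaves open.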
Yes and yes.
I think we're on the same page. I'll expand on my thinking on the interval function and how it can be used:
- In principle the interval function is any function applied to `date_var` that maintains the monotonic ordering of the `date_var`.
- Keeping it abstract makes it easier to apply different functions in future, should we choose to.
- Initially I'll just do a "cut, paste and tweak" of the current functionality in the incidence package to make it fit.
- The variable `.interval` is probably better named `date_group`. Putting it in the tibble as a variable rather than an attribute makes it easier to work with.
- In theory we would like the interval function to dispatch on both its arguments, although in practice I will probably dispatch on one and switch on the other (I'm undecided on the order of arguments at the moment). This way it will match current functionality and deal with character vectors and integer/numeric.
Does this make sense / answer your questions?
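
The "dispatch on one, switch on the other" idea from the list above could look roughly like the following. This is a minimal sketch under assumptions, not the planned implementation: the name `group_dates()` is hypothetical, S3 dispatch happens on the class of `date_var` (only a `Date` method is shown), and the switch is on the type of `interval`. Named units are delegated to base R's `cut()` for dates, so `date_start` is only used in the numeric branch.

```r
## Hypothetical generic: dispatches on the class of `date_var`
group_dates <- function(date_var, interval, ...) {
  UseMethod("group_dates")
}

## Method for Date input; switches on the type of `interval`
group_dates.Date <- function(date_var, interval,
                             date_start = min(date_var, na.rm = TRUE), ...) {
  if (is.character(interval)) {
    ## named units such as "1 month" or "2 weeks": cut() labels each bin
    ## with its starting date
    as.Date(as.character(cut(date_var, breaks = interval)))
  } else if (is.numeric(interval)) {
    ## fixed-width bins of `interval` days, counted from `date_start`
    offset <- as.numeric(date_var) - as.numeric(date_start)
    date_start + (offset %/% interval) * interval
  } else {
    stop("`interval` must be character or numeric")
  }
}

dates <- as.Date("2020-01-01") + c(0, 3, 7, 13, 14)
weekly <- group_dates(dates, 7L)
monthly <- group_dates(as.Date(c("2020-01-15", "2020-02-03")), "1 month")
```

Both branches map each date to the left edge of its bin, so the result never decreases when the input is sorted, which is the monotonicity property described above.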
Yes, it makes total sense, and I think it is a nice way to do things. We may need to add:
interval_function <- function(date_var, interval, date_start, date_stop) {
...
}
+1 to naming the output of that function `date_group` inside `incidence::incidence`, probably clearer like this.
I really like it! It is both close to the previous implementation in terms of interface, and adds some of the key features we need - most importantly stratification by more than one factor. A small note on the interval (though I understand this is a proof of concept): here `.interval` seems wrong (`interval = 1L` should be taken as one day). Just to make sure we're on the same page:
- `.interval` is the left-hand side of the bins to count cases by (right?)
- `interval` in `incidence()` should keep its current ability to handle named time units (and thus non-constant intervals), e.g. `"1 month"` or `"2 weeks"` or `"quarter"`
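
To illustrate why named time units imply non-constant intervals, here is a small base R sketch. The dates and variable names are made up for the example and it does not use the incidence package: it builds monthly left-hand bin edges with `seq()`, shows that the bins differ in width, and assigns each onset date to the left edge of its bin.

```r
date_start <- as.Date("2020-01-01")
date_stop <- as.Date("2020-04-30")

## Left-hand side of each monthly bin: Jan 1, Feb 1, Mar 1, Apr 1
left_edges <- seq(date_start, date_stop, by = "1 month")

## Width in days of the first three bins; months differ in length
bin_widths <- as.integer(diff(left_edges))

## Made-up onset dates, each assigned to the left edge of its bin
onsets <- as.Date(c("2020-01-15", "2020-02-29", "2020-03-01", "2020-04-30"))
bin_index <- findInterval(as.numeric(onsets), as.numeric(left_edges))
date_group <- left_edges[bin_index]

## Case counts per bin
counts <- table(date_group)
```

Here `bin_widths` comes out as 31, 29 and 31 days (2020 is a leap year), so a single integer `interval` cannot represent `"1 month"`.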