Skip to content

Instantly share code, notes, and snippets.

@pratikunterwegs
Last active February 7, 2024 10:14
Show Gist options
  • Save pratikunterwegs/0feb204f31740f1050f127bde6c649ba to your computer and use it in GitHub Desktop.
Save pratikunterwegs/0feb204f31740f1050f127bde6c649ba to your computer and use it in GitHub Desktop.
API options for {epidemics}
#### Dummy internal fn ####
f_internal <- function(x) x
#### Dummy internal stochastic fn ####
# some function of the input (fixed) and draws from a distr (stochastic)
# assumes a param beta at position 3
f_internal_stochastic <- function(x) x[[3]] + runif(length(x))
#### Single parameter set and single intervention ####
# function operates on some parameters with one run per parameter
f_current <- function(contacts, npi, ...) {
# cross-check interventions and population characteristics, here, demo grps
stopifnot(nrow(contacts) == nrow(npi))
# some internal operation on a list
f_internal(
list(contacts, npi, ...)
)
}
f_current(matrix(1, 2, 2), matrix(0.2, 2, 2), beta = 1, sigma = 2, gamma = 3)
#### Single parameter set and multiple interventions ####
f_multi_npi <- function(contacts, npi, ...) {
# if npi is a list, cross-check each element else cross-check npi
if (is.list(npi)) {
invisible(
lapply(npi, function(x) {
# cross-checking
stopifnot(nrow(contacts) == nrow(x))
})
)
} else {
stopifnot(nrow(contacts) == nrow(npi))
}
# collect intervention combinations for internal fn
if (is.list(npi)) {
scenarios <- lapply(npi, function(x) {
list(contacts, x, ...)
})
# return internal fn operation on parameter-intervention combos
# aka a 'scenario'
lapply(scenarios, function(sce) {
f_internal(sce)
})
} else {
f_internal(list(contacts, npi, ...))
}
}
f_multi_npi(matrix(1, 2, 2), matrix(0.2, 2, 2), beta = 1, sigma = 2, gamma = 3)
# function multi npi is identical to single npi case when a single npi is passed
identical(
f_current(matrix(1, 2, 2), matrix(0.2, 2, 2), beta = 1, sigma = 2, gamma = 3),
f_multi_npi(matrix(1, 2, 2), matrix(0.2, 2, 2), beta = 1, sigma = 2, gamma = 3)
)
# function multi npi returns a list of outputs when multiple interventions
# are passed
f_multi_npi(
contacts = matrix(1, 2, 2),
npi = list(
matrix(0.2, 2, 2),
matrix(0.3, 2, 2),
matrix(0.15, 2, 2)
),
beta = 1, sigma = 2, gamma = 3
)
# cross checking required for each npi
f_multi_npi(
contacts = matrix(1, 2, 2),
npi = list(
matrix(0.2, 2, 2),
matrix(0.3, 1, 2), # this npi is not compatible with contacts
matrix(0.15, 2, 2)
),
beta = 1, sigma = 2, gamma = 3
)
# compare outputs in some appropriate way
#### Multiple parameter sets, single intervention ####
# user passes parameters as vectors of equal lengths - no recycling
f_multi_param <- function(contacts, npi, ...) {
# cross-checking single intervention
stopifnot(nrow(contacts) == nrow(npi))
# collect params and check
params <- list(...)
stopifnot(
length(unique(vapply(params, length, FUN.VALUE = 1L))) == 1
)
# prepare param combos for discrete runs of internal fn
params <- purrr::transpose(params)
params <- lapply(
params, function(x) {
c(
list(contacts, npi),
x
)
}
)
# run internal fn and return list of outputs
lapply(params, f_internal)
}
f_multi_param(
matrix(1, 2, 2), matrix(0.2, 2, 2),
beta = runif(3),
sigma = runif(3),
gamma = runif(3)
)
#### Stochastic internal function, multiple parameter sets ####
# specify replicates as times to run each combination of parameters
f_stochastic <- function(contacts, npi, replicates, ...) {
# cross-checking single intervention
# NOTE ebola model does not accept contact matrix
# but other stochastic models might
stopifnot(nrow(contacts) == nrow(npi))
## SIMILAR TO fn_multi_param
# collect params and check
params <- list(...)
stopifnot(
length(unique(vapply(params, length, FUN.VALUE = 1L))) == 1
)
# prepare param combos for discrete runs of internal fn
params <- purrr::transpose(params)
params <- lapply(
params, function(x) {
c(
list(contacts, npi),
x
)
}
)
# similar logic applies for combinations of parameters and interventions
# N replicates per parameter combination
# running with preserved seed --- NOTE: NOT ABLE TO HAVE THE SEED
# VARY BETWEEN REPLICATES using `replicate()` +
# `withr::with_preserve_seed()`
# hence using hacky implementation
data <- lapply(
seq(replicates), function(t) {
lapply(params, function(p_list) {
withr::with_seed(
t,
f_internal_stochastic(p_list)
)
})
})
# set names for clarity
names(data) <- sprintf("replicate_%i", seq_along(data))
data <- lapply(data, function(x) {
names(x) <- sprintf("param_%i", seq_along(x))
x
})
data
}
# same random number stream across parameters, differs over replicates
output <- f_stochastic(
matrix(1, 2, 2), matrix(0.2, 2, 2),
replicates = 3,
beta = c(10, 20, 30),
sigma = c(10, 20, 30),
gamma = c(10, 20, 30)
)
@TimTaylor
Copy link

Cool - cheers @pratikunterwegs. I'll add another stripped back implementation below to feed in to the discussion tomorrow:

# dummy function
.seir <- function(alpha, beta, gamma, contact_matrix, ... , intervention = NULL) {
    list(t = 1, x = 1)
}

# inputs
repetitions <- 100L
alpha <- beta <- gamma <- rnorm(50L)
contact_matrix <- matrix(1)
inter_set_1 <- inter_set_2 <- inter_set_3 <- inter_set_4 <- list()
interventions <- list(inter_set_1, inter_set_2, inter_set_3, inter_set_4)

# Create a nested list for presenting the results
result <- replicate(
    repetitions,
    vector("list", length = length(interventions)),
    simplify = FALSE
)
result <- replicate(length(alpha), result, simplify = FALSE)
    
# loop over parameters
for (i in seq_along(alpha)) {
    # loop over repetitions
    for (j in seq_len(repetitions)) {
        current_seed <- .Random.seed
        # loop over interventions
        for (k in seq_along(interventions)) {
            .Random.seed <- current_seed
            result[[i]][[j]][[k]] <- .seir(
                alpha = alpha[i],
                beta = beta[i],
                gamma = gamma[i],
                contact_matrix = contact_matrix,
                intervention = interventions[[k]]
            )
        }
    }
}

# for nested output tibbles are good
(out <- tibble::tibble(
    alpha,
    beta,
    gamma,
    repetition = replicate(length(alpha), seq_len(repetitions), simplify = FALSE),
    intervention = replicate(
        length(alpha),
        replicate(repetitions, seq_along(interventions), simplify = FALSE),
        simplify = FALSE
    ),
    result = result
))
#> # A tibble: 50 × 6
#>     alpha   beta  gamma repetition  intervention result      
#>     <dbl>  <dbl>  <dbl> <list>      <list>       <list>      
#>  1 -0.292 -0.292 -0.292 <int [100]> <list [100]> <list [100]>
#>  2 -0.474 -0.474 -0.474 <int [100]> <list [100]> <list [100]>
#>  3 -1.39  -1.39  -1.39  <int [100]> <list [100]> <list [100]>
#>  4  0.750  0.750  0.750 <int [100]> <list [100]> <list [100]>
#>  5 -0.429 -0.429 -0.429 <int [100]> <list [100]> <list [100]>
#>  6  1.73   1.73   1.73  <int [100]> <list [100]> <list [100]>
#>  7 -1.57  -1.57  -1.57  <int [100]> <list [100]> <list [100]>
#>  8 -1.56  -1.56  -1.56  <int [100]> <list [100]> <list [100]>
#>  9 -1.73  -1.73  -1.73  <int [100]> <list [100]> <list [100]>
#> 10 -1.34  -1.34  -1.34  <int [100]> <list [100]> <list [100]>
#> # ℹ 40 more rows

Created on 2024-02-06 with reprex v2.0.2

@TimTaylor
Copy link

some additional points to consider re above:

  • A user taking this approach would experience any input checking on each call of the underlying function.
  • depending on how the underlying models are implemented, there may be a desire to utilise withCallingHandlers to capture warnings and errors and return a more complicated output accordingly.

@TimTaylor
Copy link

Opportunities as {epidemics} maintainers:

  • Allow vector input to reduce the number of input checks required for repeated iterations and/or scenario comparisons.
  • Opportunity to move the looping in to C++ (only if ever deemed necessary)

@pratikunterwegs
Copy link
Author

pratikunterwegs commented Feb 7, 2024

Thanks @TimTaylor - looks pretty similar to the options I've got, which tackle the different cases separately, but could be combined.

I have my doubts about passing multiple intervention sets as a list (of lists of interventions) to a single function call. Agreed this is a nice and convenient functionality for users, but I'm not sure whether users would then be able to conveniently filter a nested data.frame on the intervention column, as that would involve writing a conditional that inspects list elements two levels down (the column, then the intervention list). Two alternatives would be:

  1. Returning a list of nested data.frames where each list element corresponds to an intervention set - this allows easier filtering by name or index, but is less convenient for users with a single intervention set who would get a list of length one when they could get a data.frame;
  2. Allowing only a single intervention set per function call; this would create issues for comparing across interventions in stochastic models as I'd need to store and return the seeds. It's a bit radical at this stage, but it might be worth thinking about whether stochastic models should be split off into their own package.

@pratikunterwegs
Copy link
Author

A user taking this approach would experience any input checking on each call of the underlying function.
depending on how the underlying models are implemented, there may be a desire to utilise withCallingHandlers to capture warnings and errors and return a more complicated output accordingly.

If .seir() is intended to be the 'unsafe' function without input checks, then Input checking would happen within exposed seir() though? Parameter sets can be checked as vectors, and only unique intervention sets could be cross-checked against the population.

@pratikunterwegs
Copy link
Author

Allow vector input to reduce the number of input checks required for repeated iterations and/or scenario comparisons.

I think we've already decided on this following the previous meeting? Either parameters are directly passed as vectors, or eventually they could be passed as distribution objects which would be sampled (see epiverse-trace/epidemics#20).

Opportunity to move the looping in to C++ (only if ever deemed necessary)

I have nothing against moving the looping to C++, except that then we would definitely have to get rid of the R implementations of deterministic models. Good to note that we have a bus factor of 1 (at LSHTM), with no plans to increase it that I can see. The level of C++ we have now is the minimum (imo) required to get a decent performance boost for the deterministic models. We do not use it for stochastic models, so that does again raise the issue of whether they should be handled differently (in a different package even?).

@TimTaylor
Copy link

TimTaylor commented Feb 7, 2024

If .seir() is intended to be the 'unsafe' function without input checks, then Input checking would happen within exposed seir() though?

Ignore the dot. Copy paste. Just treat it as an unvectorised function a user has been given of which they know nothing about underlying checks.

I think we've already decided on this following the previous meeting?

@Bisaloo expressed concerns/doubts on the API so hoping to have similar discussion today to explain what was decided and why.

@pratikunterwegs
Copy link
Author

Ignore the dot. Copy paste. Just treat it as an unvectorised function a user has been given of which they know nothing about underlying checks.

Ah got it.

@Bisaloo expressed concerns/doubts on the API so hoping to have similar discussion today to explain what was decided and why.

Sure, and I broadly agree but I'm not sure there's much room for manoeuvre given growing interest in this sort of functionality (as in https://github.com/orgs/epiverse-trace/discussions/173).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment