@seabbs
Created May 11, 2020 18:38
Test memory leak
# Packages -----------------------------------------------------------------
require(data.table, quietly = TRUE)
require(future, quietly = TRUE)
## Required for data and nowcasting
# require(EpiNow, quietly = TRUE)
# require(NCoVUtils, quietly = TRUE)
## Required for forecasting
# require(future.apply, quietly = TRUE)
# require(fable, quietly = TRUE)
# require(fabletools, quietly = TRUE)
# require(feasts, quietly = TRUE)
# require(urca, quietly = TRUE)
# Get cases ---------------------------------------------------------------
NCoVUtils::reset_cache()
cases <- NCoVUtils::get_ecdc_cases()
cases <- NCoVUtils::format_ecdc_data(cases)
cases <- data.table::setDT(cases)[!is.na(region)][,
  `:=`(local = cases, imported = 0)][, cases := NULL]
cases <- data.table::melt(cases, measure.vars = c("local", "imported"),
                          variable.name = "import_status",
                          value.name = "confirm")
## Remove regions with data issues
cases <- cases[!region %in% c("Faroe Islands", "Sao Tome and Principe", "Tajikistan")]
cases <- cases[1:2000]
# Get linelist ------------------------------------------------------------
linelist <-
  data.table::fread("https://raw.githubusercontent.com/epiforecasts/NCoVUtils/master/data-raw/linelist.csv")
delays <- linelist[!is.na(date_onset_symptoms)][,
  .(report_delay = as.numeric(lubridate::dmy(date_confirmation) -
      as.Date(lubridate::dmy(date_onset_symptoms))))]
delays <- delays$report_delay
# Set up cores -----------------------------------------------------
if (!interactive()){
options(future.fork.enable = TRUE)
}
future::plan("multiprocess",
gc = TRUE, earlySignal = TRUE)
# Fit the reporting delay -------------------------------------------------
delay_defs <- EpiNow::get_dist_def(delays,
                                   bootstraps = 10,
                                   samples = 50)
# Fit the incubation period -----------------------------------------------
## Mean delay
exp(EpiNow::covid_incubation_period[1, ]$mean)
## Get incubation defs
incubation_defs <- EpiNow::lognorm_dist_def(mean = EpiNow::covid_incubation_period[1, ]$mean,
                                            mean_sd = EpiNow::covid_incubation_period[1, ]$mean_sd,
                                            sd = EpiNow::covid_incubation_period[1, ]$sd,
                                            sd_sd = EpiNow::covid_incubation_period[1, ]$sd_sd,
                                            max_value = 30, samples = 50)
# future::plan("multiprocess", workers = future::availableCores(),
# gc = TRUE, earlySignal = TRUE)
future::plan("sequential", earlySignal = TRUE, gc = TRUE)
# Run pipeline ----------------------------------------------------
EpiNow::regional_rt_pipeline(
  cases = cases,
  delay_defs = delay_defs,
  incubation_defs = incubation_defs,
  target_folder = "national",
  case_limit = 60,
  horizon = 14,
  nowcast_lag = 8,
  approx_delay = TRUE,
  report_forecast = TRUE,
  forecast_model = function(...) {
    EpiSoon::fable_model(model = fabletools::combination_model(fable::RW(y ~ drift()), fable::ETS(y),
                                                               fable::NAIVE(y),
                                                               cmbn_args = list(weights = "inv_var")), ...)
  }
)
future::plan("sequential")
# Summarise results -------------------------------------------------------
EpiNow::regional_summary(results_dir = "national",
                         summary_dir = "national-summary",
                         target_date = "latest",
                         region_scale = "Country",
                         csv_region_label = "country",
                         log_cases = TRUE)
seabbs commented May 11, 2020

This runs the global nowcast/forecast on the first 2000 rows of data sequentially, using 50 samples.

If you install epiforecasts/EpiNow@review you will get a version of the code that prints the memory used at each step, which helps to identify where the leak occurs (this requires pryr to be installed).
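
For context, a minimal sketch of that kind of step-by-step memory logging using pryr; log_mem() is an illustrative helper, not the actual @review implementation:

## Sketch: log memory use around a pipeline step with pryr
log_mem <- function(step) {
  message(step, ": ", round(as.numeric(pryr::mem_used()) / 1024^2), " Mb")
  invisible(gc(verbose = FALSE))
}

log_mem("before fitting the reporting delay")
## delay_defs <- EpiNow::get_dist_def(delays, bootstraps = 10, samples = 50)
log_mem("after fitting the reporting delay")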

My read on the situation is that the leak is caused by the saving of plots in plot_pipeline, but this could be incorrect. To try to contain it I have moved all reporting functions into a clean environment that depends only on the empty environment. This does not appear to help.

valgrind suggests there is a memory leak, but I have not been able to use it to identify a line of R code, as its output refers to the underlying C code causing the issue.

seabbs commented May 11, 2020

There are two obvious solutions to this problem, both of which make the code less usable:

  1. Split each region into a separate Rscript call that is completely isolated. This would be the traditional HPC approach, but it is not ideal as it reduces the usability of the tooling for others and makes the infrastructure harder to reproduce. Potentially something like callr could be used internally to reproduce this (see the sketch after this list), but that also seems clunky.

  2. Isolate the area of the code causing problems (likely, but not definitely, the graphing - not definitely because it makes no sense for plotting to scale with the number of samples) and remove it or run it in a separate process. This is also clunky and requires the problem to be fully understood and identified.
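
As a rough sketch of option 1 using callr, where run_region() is a hypothetical per-region wrapper rather than an EpiNow function and regions is derived from the cases data:

## Sketch: run each region in a fully isolated R process with callr
regions <- unique(cases$region)

for (region in regions) {
  callr::r(
    function(region) {
      ## a fresh R session per region, so any leaked memory is
      ## returned to the OS when the process exits
      run_region(region)
    },
    args = list(region = region)
  )
}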

seabbs commented May 11, 2020

There are also a few other options:

  1. Nesting parallelization for efficiency. This is again pretty clunky but easy to implement. In theory it is a good solution (if a bit inefficient) and one that I have previously explored with no luck. As the code base has now changed quite a bit, this may now be more feasible (testing overnight; see the sketch after this list).

  2. Run with fewer cores so that RAM >> cores, processing regions in batches. This is extremely clunky, and it is hard to predict when failures will occur.
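
A sketch of what the nested plan in option 1 might look like using future topologies; the worker counts here are arbitrary examples:

## Sketch: nested futures - regions in the outer layer, samples in the inner layer
future::plan(list(
  future::tweak(future::multisession, workers = 4),  # outer layer: across regions
  future::tweak(future::multisession, workers = 2)   # inner layer: within a region
))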

adhusch commented May 13, 2020

@seabbs Any news on this? I could try to find some R wizards here at our Bioinformatics Core group to have a look. Unfortunately I can't help much with R myself, as I have never really used it. (Anyway, the problem seems to be in C in the end, right? Could you provide the valgrind output?)

seabbs commented May 13, 2020

Hi @adhusch - still a work in progress unfortunately, though I think it is getting there. Any help with the code would be much appreciated at any time. Being new to valgrind, I didn't find the output particularly helpful in identifying the areas of R code that were causing problems, and I have not been able to find anything on how to do this - any pointers would be great.

seabbs commented May 13, 2020

New issue detailing progress: epiforecasts/covid#56
