Last active
April 30, 2020 22:30
-
-
Save nmatzke/898b99e6b43b1bc81f6cb27cf184181d to your computer and use it in GitHub Desktop.
week_06_Discussion_example_option1_v2.R
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
####################################################### | |
# DISCUSSION ASSIGNMENT: FIT R0s FOR A COUNTRY OF INTEREST | |
# | |
# For the final part of your assignment, you will | |
# pretend you are an epidemiologist writing a report | |
# on the spread of Covid-19 in a particular country, | |
# and commenting on what you can tell about the effectiveness | |
# of interventions at this point. | |
# | |
# | |
# You will use ML to fit three (3) or more SIR models | |
# to some country's Covid-19 active cases. You will | |
# then compare these model fits with AIC. | |
# | |
# You will have to choose what "regimes" to use -- | |
# i.e., the points at which R0 might change due to | |
# the citizenry doing physical distancing, lockdown, | |
# and/or release of these. | |
# | |
# This means you will have to decide how many days | |
# after the first case to start each regime. You | |
# will do this by changing "time" as before. These | |
# decisions are made by you, the researcher, based on | |
# looking at the chart of case counts, and/or based | |
# on using google etc. to research when a particular | |
# country went into lockdown. | |
# | |
# You might also manually change "delay_val," which | |
# represents "how long after a change in R0, do you | |
# start to see it show up in the active cases data?" | |
# | |
# I found that a "delay_val" of 14 worked well, but | |
# you are welcome to change it if you want. | |
# | |
# There are 2 ways to do this assignment: | |
# | |
# Option #1. IF YOU HAVE R ON YOUR COMPUTER, AND HAVE | |
# RELIABLE INTERNET. Use country case count | |
# data provided by the European Union. | |
# | |
# Option #2: IF YOU ARE USING AN ONLINE VERSION OF R, | |
# OR HAVE WEAK INTERNET, you might not | |
# be able to access the online EU data | |
# from your R session. Instead, update | |
# the New Zealand data | |
# by hand so that it extends to the present | |
# day, and then fit your models to that. | |
# You can get the updated data from the | |
# "Total and active cases" chart at: | |
# | |
# Hancock, Faith (2020). Covid-19 in NZ. NZ Herald. | |
# https://www.newsroom.co.nz/2020/03/27/1101216/covid-19-in-nz-the-numbers | |
# | |
# We just want the "active cases" count (hover your | |
# mouse over the blue line to see the numbers). | |
# | |
# I am providing example scripts for both options. These | |
# scripts will include all of the required functions. | |
# | |
# You will just have to: | |
# | |
# 1. Copy the whole example script to a text editor where | |
# you can make/save changes. | |
# | |
# 2. Copy-paste the whole script into R | |
# and run it once, | |
# just to make sure it works for you. If you hit some | |
# problem with Option #1, go to the Option #2 script and | |
# try it. | |
# | |
# 3. Modify the script as follows: | |
# | |
# For option #1, change the country you want to study | |
# from something other than New Zealand. Change the | |
# "time" options for each model as desired. | |
# | |
# For option #2, edit the New Zealand active cases data | |
# in the script to make it up-to-date. Change the "time" | |
# options for each model as desired. | |
# | |
# You are free to experiment with other changes if you | |
# like, of course. (E.g., changing plot titles, line | |
# colors, other features of the model, whatever.) | |
# | |
# 4. Get the log-likelihoods under each model. | |
# | |
# 5. Calculate AIC, relative likelihoods, and AIC weights | |
# for each model. You can do it in R, or modify the | |
# Excel or Google-Sheet examples I have provided. | |
# | |
# 6. In a Discussion post: | |
# | |
# * include a screenshot of your AIC table, including | |
# at least these columns: | |
# - Model name | |
# - log-likelihood | |
# - number of free parameters | |
# - AIC | |
# - AIC model weights | |
# | |
# * include a screenshot that plots your AIC-best | |
# model against the data for your country | |
# | |
# * Summarize your results and critically evaluate: | |
# | |
# E.g., "I fit 4 SIR models | |
# to the active-case data from country X. The models | |
# had 1, 2, 3, or 4 regimes with different R0 values. | |
# The AIC-best model was Model 3, gaining 55% of the | |
# AIC weight. Under this best model, R0 was 6.1 for | |
# the first phase of the epidemic, indicating extremely | |
# rapid spread. However, after the country imposed a | |
# lockdown on day 14, R0 dropped to 0.1. However, | |
# political disputes in the country led to the lockdown | |
# being lifted after 7 days, causing a second wave of | |
# cases with R0 estimated as 4.5. This wave appears to | |
# be ongoing. | |
# | |
# Based on these model results, country X clearly needs | |
# to return to lockdown. However, visual inspection of | |
# the plot of the best-fit SIR model against the data | |
# suggests that the available models were not a | |
# perfect fit, perhaps because of a change in case | |
# definitions on Day 23 that made it appear as if | |
# 1000 cases occurred suddenly on that day. A more | |
# sophisticated model would account for this change | |
# in case definition, as well as XYZ. | |
# | |
# References (if any) | |
# | |
# 7. Comment on at least one other student's summary, | |
# pointing out what you found interesting, making | |
# suggestions, or asking questions. As always, stay | |
# professional. | |
# | |
####################################################### | |
####################################################### | |
# Example script for Option #1, Week 6 Discussion assignment | |
# (doing the assignment on a computer with R installed) | |
####################################################### | |
# Download EU worldwide data | |
data = read.csv("https://opendata.ecdc.europa.eu/covid19/casedistribution/csv", na.strings = "", fileEncoding = "UTF-8-BOM") | |
# See the list of countries named in the database: | |
unique(data$countriesAndTerritories) | |
# Subset by country | |
country_name = "New_Zealand" # NOTE how the underscore ("_") is required in the country name! | |
TF = data$countriesAndTerritories == country_name # TF = TRUE/FALSE for rows matching the country | |
d1 = data[TF,] | |
# Convert the dates to R's "Date" format, then order d by date | |
dates = as.Date(d1$dateRep, format="%d/%m/%Y") | |
d2 = cbind(dates, d1) | |
d2 = d2[order(d2$dates),] | |
# Find the first day with 1 case | |
TF = d2$cases > 0 | |
first_row_with_case = (1:nrow(d2))[TF][1] | |
# Subset the country's table to start with the first case in that country | |
d3 = d2[first_row_with_case:nrow(d2),] | |
d3 | |
# Messy code to "add-in" in any missing days | |
d4 = NULL | |
numdays = d3$dates[nrow(d3)] - d3$dates[1] + 1 | |
d3_rownum = 0 | |
for (i in 1:numdays) | |
{ | |
if (i == 1) | |
{ | |
d3_rownum = d3_rownum + 1 | |
d4 = rbind(d4, d3[d3_rownum,]) | |
} else { | |
if ( (d3$dates[d3_rownum+1] - d4[(i-1),]$dates[1]) == 1 ) | |
{ | |
d3_rownum = d3_rownum + 1 | |
d4 = rbind(d4, d3[d3_rownum,]) | |
} else { | |
newdate = d4[(i-1),]$dates[1] + 1 | |
day = format(newdate, "%d") | |
month = format(newdate, "%m") | |
year = format(newdate, "%Y") | |
dateRep = paste(day, month, year, sep="/") | |
newrow = as.data.frame(matrix(data=c(newdate, dateRep, day, month, year, 0, 0, country_name, d4[(i-1),]$geoId[1], d4[(i-1),]$countryterritoryCode[1], d4[(i-1),]$popData2018[1], d4[(i-1),]$continentExp[1]), nrow=1), stringsAsFactors=FALSE) | |
names(newrow) = c("dates", "dateRep", "day", "month", "year", "cases", "deaths", "countriesAndTerritories", "geoId", "countryterritoryCode", "popData2018", "continentExp") | |
newrow$dates[1] = as.character(as.Date(paste(year, month, day, sep="-"), format="%Y-%m-%d")) | |
d4 = rbind(d4, newrow) | |
class(d4[,1]) = "Date" | |
} | |
} | |
} | |
d4 | |
# The number of cumulative cases on a particular day equals the new cases that day, | |
# plus the number of cases on the 13 previous days (Note: we are ASSUMING the Duration of | |
# Infection, D_inf, equals 14 here - this database doesn't collect the number of | |
# Recovered). | |
D_inf = 14 | |
active_cases = NULL | |
recovered = NULL | |
for (i in 1:nrow(d4)) | |
{ | |
startrow = i - (D_inf-1) | |
if (startrow <= 1) | |
{ | |
startrow = 1 | |
tmp_recovered = 0 | |
} else { | |
tmp_recovered = sum(as.numeric(d4$cases[1:(startrow-1)])) | |
} | |
tmp_active = sum(as.numeric(d4$cases[startrow:i])) | |
# Store these results | |
active_cases = c(active_cases, tmp_active) | |
recovered = c(recovered, tmp_recovered) | |
} # END for (i in 1:nrow(d4)) | |
# Add in the new columns: | |
d5 = cbind(d4, active_cases, recovered) | |
# Look at your data table: | |
head(d5) # look at first 6 rows | |
tail(d5) # look at last 6 rows | |
dim(d5) # look at the number of rows & columns | |
# Plot the data: | |
plot(x=d5$dates, y=d5$active_cases, xlab="Date", ylab="Active cases", col="firebrick2") | |
titletxt = paste0("Active COVID-19 cases in ", country_name, "\n(EUDC data, assuming D_inf=14)") | |
title(titletxt) | |
points(x=d5$dates, y=d5$recovered, col="green3") | |
####################################################### | |
# LOAD OUR FUNCTIONS FOR SIR SIMULATION AND INFERENCE | |
####################################################### | |
library(deSolve) | |
likelihood_single_point <- function(data.point, model.point, log=TRUE) | |
{ | |
# Get the probability density of the count, assuming | |
# the observed count is a poisson process with an | |
# average value matching the true process. | |
# | |
# This allows for variation, e.g. if a case is reported | |
# days after it occurs. | |
likelihood = dpois(x=data.point, lambda=model.point, log=log) | |
# Return the likelihood value | |
return(likelihood) | |
} # END of likelihood_single_point | |
likelihood_time_series <- function(data.points, model.points, log=TRUE) | |
{ | |
# Error check - return a stop() message if check fails | |
if (length(data.points) != length(model.points)) | |
{ stop("ERROR in likelihood_time_series(): the lengths of data.points and model.points must match.") } | |
# Calculate the likelihoods of the data.points | |
likelihoods = mapply(FUN=likelihood_single_point, data.point=data.points, model.point=model.points, MoreArgs=list(log=log)) | |
# If you calculated log-likelihoods, then sum them | |
# If you calculated raw likelihoods, then multiply them | |
if (log == TRUE) | |
{ | |
total_likelihood = sum(likelihoods) | |
} else { | |
total_likelihood = prod(likelihoods) # prod=take the product | |
} # END if/else statement: if (log == TRUE) | |
# Return that value | |
return (total_likelihood) | |
} # END of likelihood_time_series | |
####################################################### | |
# Functions from Week 5 | |
####################################################### | |
library(deSolve) | |
SIR_ode <- function(time, state, parameters) | |
{ | |
# The above inputs are: | |
# time = a time point | |
# state = an R list of the 3 state values, i.e. the | |
# population sizes of S, I, and R | |
# parameters = an R list of the parameters R0 and D_inf | |
# Convert the input parameters "R0" and "D_inf" | |
# into the rates "beta" and "nu" | |
beta <- parameters[["R0"]] / parameters[["D_inf"]] | |
nu <- 1 / parameters[["D_inf"]] | |
# Extract the current values of the states S, I, and R | |
S <- state[["S"]] | |
I <- state[["I"]] | |
R <- state[["R"]] | |
# Calculate N, the total of S+I+R, | |
# i.e. the population size | |
N <- S + I + R | |
# Write out the system of | |
# Ordinary Differential Equations (ODEs) | |
dS <- -beta * I/N * S | |
dI <- beta * I/N * S - nu * I | |
dR <- nu * I | |
# Return the current rates of change of S, I, and R: | |
return(list(c(dS, dI, dR))) | |
} # End of function SIR_ode | |
simulate_SIR <- function(theta, init.state, times) | |
{ | |
trajectory <- data.frame(ode(y = init.state, | |
times = times, | |
func = SIR_ode, | |
parms = theta, | |
method = "ode45")) | |
return(trajectory) | |
} # End of simulate_SIR() | |
simulate_SIR_changes <- function(thetas_states_df, times) | |
{ | |
# Loop through the rows | |
trajectory = NULL | |
for (rn in 1:nrow(thetas_states_df)) | |
{ | |
if (rn == 1) | |
{ | |
init.state = c(S=thetas_states_df$S[rn], I=thetas_states_df$I[rn], R=thetas_states_df$R[rn]) | |
theta = list(R0=thetas_states_df$R0[rn], D_inf=thetas_states_df$D_inf[rn]) | |
} else { | |
last_row = trajectory[nrow(trajectory),] | |
init.state = c(S=last_row$S+thetas_states_df$S[rn], I=last_row$I+thetas_states_df$I[rn], R=last_row$R+thetas_states_df$R[rn]) | |
theta = list(R0=thetas_states_df$R0[rn], D_inf=thetas_states_df$D_inf[rn]) | |
} | |
# Read in the parameters for the initial, versus later, time-bins | |
if (rn >= nrow(thetas_states_df)) | |
{ | |
start_time = thetas_states_df$time[rn] | |
end_time = max(times) | |
} else { | |
start_time = thetas_states_df$time[rn] | |
end_time = thetas_states_df$time[rn+1] | |
} | |
TF1 = times >= start_time | |
TF2 = times <= end_time | |
TF = (TF1 + TF2) == 2 | |
tmp_times = times[TF] | |
tmp_trajectory = simulate_SIR(theta=theta, init.state=init.state, times=tmp_times) | |
if (rn == 1) | |
{ | |
trajectory = rbind(trajectory, tmp_trajectory) | |
} else { | |
trajectory_minus_last_state = trajectory[-nrow(trajectory),] | |
trajectory = rbind(trajectory_minus_last_state, tmp_trajectory) | |
} # if (rn == 1) | |
} # END for (rn in 1:nrow(thetas_states_df)) | |
return(trajectory) | |
} # End simulate_SIR_changes() | |
params_to_likelihood_v1 <- function(params, data.points, thetas_states_df, delay_val=14, printvals=TRUE, makePlots=FALSE) | |
{ | |
# The times are just the length of the data.points | |
times = 1:length(data.points) | |
# (1) input the free parameters | |
thetas_states_temp = thetas_states_df | |
TF = thetas_states_temp == "free" | |
thetas_states_temp[TF] = params | |
# Error check | |
if ((max(thetas_states_temp$time)+delay_val) > max(times)) | |
{ | |
stop("ERROR in params_to_likelihood_v1: there is a 'thetas_states_df$time'+delay_val greater than found in 'times'") | |
} | |
# Make sure everything in thetas_states_df is numeric, instead of character | |
for (i in 1:ncol(thetas_states_temp)) | |
{ | |
thetas_states_temp[,i] = as.numeric(thetas_states_temp[,i]) | |
} | |
# Ad2 the delay (allows time for interventions to show up | |
# in the detections) | |
thetas_states_temp$time[2:length(thetas_states_temp$time)] = thetas_states_temp$time[2:length(thetas_states_temp$time)] + delay_val | |
# Calculate the trajectory given the parameters, | |
# input into model.points | |
trajectory = simulate_SIR_changes(thetas_states_df=thetas_states_temp, times) | |
model.points = trajectory$I | |
lnL = likelihood_time_series(data.points, model.points, log=TRUE) | |
if (is.finite(lnL)== FALSE) | |
{ | |
lnL=-10e100 | |
} | |
# Print, if desired | |
if (printvals == TRUE) | |
{ | |
print(thetas_states_temp) | |
cat("log-likelihood = ", lnL, "\n", sep="") | |
} | |
# Make plots, if desired | |
if (makePlots == TRUE) | |
{ | |
plot_trajectory_lines(trajectory, theta=NULL, thetas_states_df=thetas_states_df, title_prefix="Plot #5:") | |
} | |
return(lnL) | |
} # END params_to_likelihood_v1 | |
####################################################### | |
# FINISHED LOADING OUR FUNCTIONS FOR SIR SIMULATION AND INFERENCE | |
####################################################### | |
####################################################### | |
# Model 1: A 1-regime model | |
####################################################### | |
times = 1:nrow(d5) # Number of days since first case | |
time = c(1, nrow(d5)-1) # The 2nd entry is the 2nd-to-last day, | |
# this forces most of the data to | |
# use the first R0 | |
R0 = c(3.0, 2.5) | |
D_inf = c(14.0, 14.0) | |
S = c(as.numeric(d5$popData2018[1]), 0) # putting in the country's population size | |
I = c(1, 0) | |
R = c(0, 0) | |
thetas_states_table = cbind(time, R0, D_inf, S, I, R) | |
thetas_states_df = as.data.frame(thetas_states_table, stringsAsFactors=FALSE) | |
# NOTE: ONLY 2 FREE PARAMETERS | |
thetas_states_df$R0[1] = "free" | |
thetas_states_df$I[1] = "free" | |
thetas_states_df | |
# Run the params_to_likelihood_v1() function once, to see your starting likelihood | |
data.points = d5$active_cases | |
params = c(2.5, 0.03) # starting parameter values | |
params_to_likelihood_v1(params, data.points, thetas_states_df, delay_val=0, printvals=TRUE, makePlots=FALSE) | |
# Running the Maximum Likelihood search with the optim() function. | |
# LOOK AT THE OUTPUT THAT PRINTS TO SCREEN!! | |
delay_val = 0 | |
ML_results = optim(par=params, fn=params_to_likelihood_v1, data.points=data.points, thetas_states_df=thetas_states_df, delay_val=delay_val, printvals=TRUE, method="L-BFGS-B", lower=0.0, control=list(fnscale=-1)) | |
# Graphing the ML model | |
# Take the learned parameters from "ML_results", put them | |
# into a theta_states data.frame for simulation and plotting | |
thetas_states_ML = thetas_states_df | |
TF = thetas_states_ML == "free" | |
thetas_states_ML[TF] = ML_results$par | |
for (i in 1:ncol(thetas_states_ML)) | |
{ thetas_states_ML[,i] = as.numeric(thetas_states_ML[,i]) } | |
thetas_states_ML$time[2:length(thetas_states_ML$time)] = thetas_states_ML$time[2:length(thetas_states_ML$time)] + delay_val | |
# Plot the results | |
trajectory = simulate_SIR_changes(thetas_states_df=thetas_states_ML, times=times) | |
maxy = max(max(trajectory$I), max(data.points)) | |
xvals = c(trajectory$time, times) | |
yvals = c(trajectory$I, data.points) | |
plot(x=xvals, y=yvals, pch=".", col="white", xlim=c(0, max(trajectory$time)), ylim=c(0, maxy), xlab="Day", ylab="Number of individuals") | |
lines(x=trajectory$time, y=trajectory$I, lwd=3, col="firebrick2") | |
points(times, data.points) | |
legend(x="topleft", legend=c("Active COVID-19 case count", 'ML-fitted projection of "I" (Infected)'), lty=c("blank", "solid"), lwd=c(1,3), pch=c(1, "."), col=c("black","firebrick2"), cex=0.8) | |
titletxt = paste0("ML fit, EUDC active cases from: ", country_name, "\nModel 1 (a 1-regime model) max lnL = ", round(ML_results$value, 2)) | |
title(titletxt) | |
# Save this model's parameters and log-likelihood | |
thetas_states_ML_model1 = thetas_states_ML | |
total_lnL_Model1 = ML_results$value | |
####################################################### | |
# Model 2: A 2-regime model | |
####################################################### | |
times = 1:nrow(d5) # Number of days since first case | |
time = c(1, 24) # NZ Level 4 lockdown went in on March 26, day 24 since 1st case | |
R0 = c(3.0, 0.3) | |
D_inf = c(14.0, 14.0) | |
S = c(as.numeric(d5$popData2018[1]), 0) # putting in the country's population size | |
I = c(1, 0) | |
R = c(0, 0) | |
thetas_states_table = cbind(time, R0, D_inf, S, I, R) | |
thetas_states_df = as.data.frame(thetas_states_table, stringsAsFactors=FALSE) | |
# Make some parameters into "free" parameters, to be inferred | |
thetas_states_df$R0 = c("free", "free") | |
thetas_states_df$I[1] = "free" | |
thetas_states_df | |
# Run the params_to_likelihood_v1() function once, to see your starting likelihood | |
data.points = d5$active_cases | |
params = c(3.0, 0.3, 0.03) # starting parameter values | |
params_to_likelihood_v1(params, data.points, thetas_states_df, delay_val=14, printvals=TRUE, makePlots=FALSE) | |
# Running the Maximum Likelihood search with the optim() function. | |
# LOOK AT THE OUTPUT THAT PRINTS TO SCREEN!! | |
delay_val = 14 | |
ML_results = optim(par=params, fn=params_to_likelihood_v1, data.points=data.points, thetas_states_df=thetas_states_df, delay_val=delay_val, printvals=TRUE, method="L-BFGS-B", lower=0.0, control=list(fnscale=-1)) | |
# Graphing the ML model | |
# Take the learned parameters from "ML_results", put them | |
# into a theta_states data.frame for simulation and plotting | |
thetas_states_ML = thetas_states_df | |
TF = thetas_states_ML == "free" | |
thetas_states_ML[TF] = ML_results$par | |
for (i in 1:ncol(thetas_states_ML)) | |
{ thetas_states_ML[,i] = as.numeric(thetas_states_ML[,i]) } | |
thetas_states_ML$time[2:length(thetas_states_ML$time)] = thetas_states_ML$time[2:length(thetas_states_ML$time)] + delay_val | |
# Plot the results | |
trajectory = simulate_SIR_changes(thetas_states_df=thetas_states_ML, times=times) | |
maxy = max(max(trajectory$I), max(data.points)) | |
xvals = c(trajectory$time, times) | |
yvals = c(trajectory$I, data.points) | |
plot(x=xvals, y=yvals, pch=".", col="white", xlim=c(0, max(trajectory$time)), ylim=c(0, maxy), xlab="Day", ylab="Number of individuals") | |
lines(x=trajectory$time, y=trajectory$I, lwd=3, col="firebrick2") | |
points(times, data.points) | |
legend(x="topleft", legend=c("Active COVID-19 case count", 'ML-fitted projection of "I" (Infected)'), lty=c("blank", "solid"), lwd=c(1,3), pch=c(1, "."), col=c("black","firebrick2"), cex=0.8) | |
titletxt = paste0("ML fit, EUDC active cases from: ", country_name, "\nModel 2 (a 2-regime model) max lnL = ", round(ML_results$value, 2)) | |
# Save this model's parameters and log-likelihood | |
thetas_states_ML_model2 = thetas_states_ML | |
total_lnL_Model2 = ML_results$value | |
####################################################### | |
# Model 3: A 3-regime model | |
####################################################### | |
times = 1:nrow(d5) # Number of days since first case | |
time = c(1, 16, 24) # NZ Level 4 lockdown went in on March 26, day 24 since 1st case | |
R0 = c(3.0, 1.0, 0.3) | |
D_inf = c(14.0, 14.0, 14.0) | |
S = c(as.numeric(d5$popData2018[1]), 0, 0) # putting in the country's population size | |
I = c(1, 0, 0) | |
R = c(0, 0, 0) | |
thetas_states_table = cbind(time, R0, D_inf, S, I, R) | |
thetas_states_df = as.data.frame(thetas_states_table, stringsAsFactors=FALSE) | |
# Make some parameters into "free" parameters, to be inferred | |
thetas_states_df$R0 = c("free", "free", "free") | |
thetas_states_df$I[1] = "free" | |
thetas_states_df | |
# Run the params_to_likelihood_v1() function once, to see your starting likelihood | |
data.points = d5$active_cases | |
params = c(3.0, 2.0, 0.3, 0.03) # starting parameter values | |
params_to_likelihood_v1(params, data.points, thetas_states_df, delay_val=14, printvals=TRUE, makePlots=FALSE) | |
# Running the Maximum Likelihood search with the optim() function. | |
# LOOK AT THE OUTPUT THAT PRINTS TO SCREEN!! | |
delay_val = 14 | |
ML_results = optim(par=params, fn=params_to_likelihood_v1, data.points=data.points, thetas_states_df=thetas_states_df, delay_val=delay_val, printvals=TRUE, method="L-BFGS-B", lower=0.0, control=list(fnscale=-1)) | |
# Graphing the ML model | |
# Take the learned parameters from "ML_results", put them | |
# into a theta_states data.frame for simulation and plotting | |
thetas_states_ML = thetas_states_df | |
TF = thetas_states_ML == "free" | |
thetas_states_ML[TF] = ML_results$par | |
for (i in 1:ncol(thetas_states_ML)) | |
{ thetas_states_ML[,i] = as.numeric(thetas_states_ML[,i]) } | |
thetas_states_ML$time[2:length(thetas_states_ML$time)] = thetas_states_ML$time[2:length(thetas_states_ML$time)] + delay_val | |
# Plot the results | |
trajectory = simulate_SIR_changes(thetas_states_df=thetas_states_ML, times=times) | |
maxy = max(max(trajectory$I), max(data.points)) | |
xvals = c(trajectory$time, times) | |
yvals = c(trajectory$I, data.points) | |
plot(x=xvals, y=yvals, pch=".", col="white", xlim=c(0, max(trajectory$time)), ylim=c(0, maxy), xlab="Day", ylab="Number of individuals") | |
lines(x=trajectory$time, y=trajectory$I, lwd=3, col="firebrick2") | |
points(times, data.points) | |
legend(x="topleft", legend=c("Active COVID-19 case count", 'ML-fitted projection of "I" (Infected)'), lty=c("blank", "solid"), lwd=c(1,3), pch=c(1, "."), col=c("black","firebrick2"), cex=0.8) | |
titletxt = paste0("ML fit, EUDC active cases from: ", country_name, "\nModel 3 (a 3-regime model) max lnL = ", round(ML_results$value, 2)) | |
title(titletxt) | |
# Save this model's parameters and log-likelihood | |
thetas_states_ML_model3 = thetas_states_ML | |
total_lnL_Model3 = ML_results$value | |
####################################################### | |
# Model 4: A 4-regime model | |
####################################################### | |
times = 1:nrow(d5) # Number of days since first case | |
time = c(1, 16, 20, 24) # NZ Level 4 lockdown went in on March 26, day 24 since 1st case | |
R0 = c(3.0, 1.0, 1.0, 0.3) | |
D_inf = c(14.0, 14.0, 14.0, 14.0) | |
S = c(as.numeric(d5$popData2018[1]), 0, 0, 0) # putting in the country's population size | |
I = c(1, 0, 0, 0) | |
R = c(0, 0, 0, 0) | |
thetas_states_table = cbind(time, R0, D_inf, S, I, R) | |
thetas_states_df = as.data.frame(thetas_states_table, stringsAsFactors=FALSE) | |
# Make some parameters into "free" parameters, to be inferred | |
thetas_states_df$R0 = c("free", "free", "free", "free") | |
thetas_states_df$I[1] = "free" | |
thetas_states_df | |
# Run the params_to_likelihood_v1() function once, to see your starting likelihood | |
data.points = d5$active_cases | |
params = c(3.0, 2.0, 1.0, 0.3, 0.03) # starting parameter values | |
params_to_likelihood_v1(params, data.points, thetas_states_df, delay_val=14, printvals=TRUE, makePlots=FALSE) | |
# Running the Maximum Likelihood search with the optim() function. | |
# LOOK AT THE OUTPUT THAT PRINTS TO SCREEN!! | |
delay_val = 14 | |
ML_results = optim(par=params, fn=params_to_likelihood_v1, data.points=data.points, thetas_states_df=thetas_states_df, delay_val=delay_val, printvals=TRUE, method="L-BFGS-B", lower=0.0, control=list(fnscale=-1)) | |
# Graphing the ML model | |
# Take the learned parameters from "ML_results", put them | |
# into a theta_states data.frame for simulation and plotting | |
thetas_states_ML = thetas_states_df | |
TF = thetas_states_ML == "free" | |
thetas_states_ML[TF] = ML_results$par | |
for (i in 1:ncol(thetas_states_ML)) | |
{ thetas_states_ML[,i] = as.numeric(thetas_states_ML[,i]) } | |
thetas_states_ML$time[2:length(thetas_states_ML$time)] = thetas_states_ML$time[2:length(thetas_states_ML$time)] + delay_val | |
# Plot the results | |
trajectory = simulate_SIR_changes(thetas_states_df=thetas_states_ML, times=times) | |
maxy = max(max(trajectory$I), max(data.points)) | |
xvals = c(trajectory$time, times) | |
yvals = c(trajectory$I, data.points) | |
plot(x=xvals, y=yvals, pch=".", col="white", xlim=c(0, max(trajectory$time)), ylim=c(0, maxy), xlab="Day", ylab="Number of individuals") | |
lines(x=trajectory$time, y=trajectory$I, lwd=3, col="firebrick2") | |
points(times, data.points) | |
legend(x="topleft", legend=c("Active COVID-19 case count", 'ML-fitted projection of "I" (Infected)'), lty=c("blank", "solid"), lwd=c(1,3), pch=c(1, "."), col=c("black","firebrick2"), cex=0.8) | |
titletxt = paste0("ML fit, EUDC active cases from: ", country_name, "\nModel 4 (a 4-regime model) max lnL = ", round(ML_results$value, 2)) | |
title(titletxt) | |
# Save this model's parameters and log-likelihood | |
thetas_states_ML_model4 = thetas_states_ML | |
total_lnL_Model4 = ML_results$value | |
# Calculate AIC and AIC model weights by | |
# | |
# (Option A) Re-purposing the R code from | |
# the Week 6 exercise | |
# | |
# (Option B) Editing the example Excel or Google Sheet files: | |
# | |
# | |
# Example spreadsheet calculating AIC and AIC model weights: | |
# | |
# Excel file on Canvas: | |
# BIOSCI 220 -> Files -> Module 2 -> Week 6 -> Week6_AIC_with_SIRs_fit_to_NZ_EUDC_data.xlsx | |
# | |
# On public web: | |
# http://phylo.wikidot.com/local--files/biosci220:quantitative-biology/Week6_AIC_with_SIRs_fit_to_NZ_EUDC_data.xlsx | |
# | |
# On Google Sheets: | |
# (You will have to copy-paste to your own Google Sheet, this one is edit-locked.) | |
# https://docs.google.com/spreadsheets/d/1g6rTd0O2Ec48AdR6-_B7N5-BjGtMEI1n2bvNCnmY8co/edit?usp=sharing | |
# |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment