nmatzke/week_06_Discussion_example_option1_v2.R

## week_06_Discussion_example_option1_v2.R

#######################################################
# DISCUSSION ASSIGNMENT: FIT R0s FOR A COUNTRY OF INTEREST
#
# For the final part of your assignment, you will
# pretend you are an epidemiologist writing a report
# on the spread of Covid-19 in a particular country,
# and commenting on what you can tell about the effectiveness
# of interventions at this point.
#
#
# You will use ML to fit three (3) or more SIR models
# to some country's  Covid-19 active cases. You will
# then compare these model fits with AIC.
#
# You will have to choose what "regimes" to use --
# i.e., the points at which R0 might change due to
# the citizenry doing physical distancing, lockdown,
# and/or release of these.
#
# This means you will have to decide how many days
# after the first case to start each regime.  You
# will do this by changing "time" as before. These
# decisions are made by you, the researcher, based on
# looking at the chart of case counts, and/or based
# on using google etc. to research when a particular
# country went into lockdown.
#
# You might also manually change "delay_val," which
# represents "how long after a change in R0, do you
# start to see it show up in the active cases data?"
#
# I found that a "delay_val" of 14 worked well, but
# you are welcome to change it if you want.
#
# There are 2 ways to do this assignment:
#
# Option #1. IF YOU HAVE R ON YOUR COMPUTER, AND HAVE
#            RELIABLE INTERNET.  Use country case count
#            data provided by the European Union.
#
# Option #2: IF YOU ARE USING AN ONLINE VERSION OF R,
#            OR HAVE WEAK INTERNET, you might not
#            be able to access the online EU data
#            from your R session. Instead, update
#            the New Zealand data
#            by hand so that it extends to the present
#            day, and then fit your models to that.
#            You can get the updated data from the
#            "Total and active cases" chart at:
#
# Hancock, Faith (2020). Covid-19 in NZ. NZ Herald.
# https://www.newsroom.co.nz/2020/03/27/1101216/covid-19-in-nz-the-numbers
#
#            We just want the "active cases" count (hover your
#            mouse over the blue line to see the numbers).
#
# I am providing example scripts for both options. These
# scripts will include all of the required functions.
#
# You will just have to:
#
# 1. Copy the whole example script to a text editor where
#    you can make/save changes.
#
# 2. Copy-paste the whole script into R
#    and run it once,
#    just to make sure it works for you. If you hit some
#    problem with Option #1, go to the Option #2 script and
#    try it.
#
# 3. Modify the script as follows:
#
#    For option #1, change the country you want to study
#    from something other than New Zealand. Change the
#    "time" options for each model as desired.
#
#    For option #2, edit the New Zealand active cases data
#    in the script to make it up-to-date. Change the "time"
#    options for each model as desired.
#
#    You are free to experiment with other changes if you
#    like, of course. (E.g., changing plot titles, line
#    colors, other features of the model, whatever.)
#
# 4. Get the log-likelihoods under each model.
#
# 5. Calculate AIC, relative likelihoods, and AIC weights
#    for each model. You can do it in R, or modify the
#    Excel or Google-Sheet examples I have provided.
#
# 6. In a Discussion post:
#
#    * include a screenshot of your AIC table, including
#      at least these columns:
#      - Model name
#      - log-likelihood
#      - number of free parameters
#      - AIC
#      - AIC model weights
#
#    * include a screenshot that plots your AIC-best
#      model against the data for your country
#
#    * Summarize your results and critically evaluate:
#
#      E.g., "I fit 4 SIR models
#      to the active-case data from country X. The models
#      had 1, 2, 3, or 4 regimes with different R0 values.
#      The AIC-best model was Model 3, gaining 55% of the
#      AIC weight. Under this best model, R0 was 6.1 for
#      the first phase of the epidemic, indicating extremely
#      rapid spread. However, after the country imposed a
#      lockdown on day 14, R0 dropped to 0.1.  However,
#      political disputes in the country led to the lockdown
#      being lifted after 7 days, causing a second wave of
#      cases with R0 estimated as 4.5. This wave appears to
#      be ongoing.
#
#      Based on these model results, country X clearly needs
#      to return to lockdown. However, visual inspection of
#      the plot of the best-fit SIR model against the data
#      suggests that the available models were not a
#      perfect fit, perhaps because of a change in case
#      definitions on Day 23 that made it appear as if
#      1000 cases occurred suddenly on that day. A more
#      sophisticated model would account for this change
#      in case definition, as well as XYZ.
#
#      References (if any)
#
# 7. Comment on at least one other student's summary,
#    pointing out what you found interesting, making
#    suggestions, or asking questions. As always, stay
#    professional.
#
#######################################################


#######################################################
# Example script for Option #1, Week 6 Discussion assignment
# (doing the assignment on a computer with R installed)
#######################################################

# Download EU worldwide data
data = read.csv("https://opendata.ecdc.europa.eu/covid19/casedistribution/csv", na.strings = "", fileEncoding = "UTF-8-BOM")

# See the list of countries named in the database:
unique(data$countriesAndTerritories)

# Subset by country
country_name = "New_Zealand" # NOTE how the underscore ("_") is required in the country name!
TF = data$countriesAndTerritories == country_name  # TF = TRUE/FALSE for rows matching the country
d1 = data[TF,]

# Convert the dates to R's "Date" format, then order d by date
dates = as.Date(d1$dateRep, format="%d/%m/%Y")
d2 = cbind(dates, d1)
d2 = d2[order(d2$dates),]

# Find the first day with 1 case
TF = d2$cases > 0
first_row_with_case = (1:nrow(d2))[TF][1]

# Subset the country's table to start with the first case in that country
d3 = d2[first_row_with_case:nrow(d2),]
d3

# Messy code to "add-in" in any missing days
d4 = NULL
numdays = d3$dates[nrow(d3)] - d3$dates[1] + 1
d3_rownum = 0
for (i in 1:numdays)
	{
	if (i == 1)
		{
		d3_rownum = d3_rownum + 1
		d4 = rbind(d4, d3[d3_rownum,])
		} else {
		if ( (d3$dates[d3_rownum+1] - d4[(i-1),]$dates[1]) == 1 )
			{
			d3_rownum = d3_rownum + 1
			d4 = rbind(d4, d3[d3_rownum,])
			} else {
			newdate = d4[(i-1),]$dates[1] + 1
			day = format(newdate, "%d")
			month = format(newdate, "%m")
			year = format(newdate, "%Y")
			dateRep = paste(day, month, year, sep="/")
			newrow = as.data.frame(matrix(data=c(newdate, dateRep, day, month, year, 0, 0, country_name, d4[(i-1),]$geoId[1], d4[(i-1),]$countryterritoryCode[1], d4[(i-1),]$popData2018[1], d4[(i-1),]$continentExp[1]), nrow=1), stringsAsFactors=FALSE)
			names(newrow) = c("dates", "dateRep", "day", "month", "year", "cases", "deaths", "countriesAndTerritories", "geoId", "countryterritoryCode", "popData2018", "continentExp")
			newrow$dates[1] = as.character(as.Date(paste(year, month, day, sep="-"), format="%Y-%m-%d"))
			d4 = rbind(d4, newrow)
			class(d4[,1]) = "Date"
			}
		}
	}
d4

# The number of cumulative cases on a particular day equals the new cases that day,
# plus the number of cases on the 13 previous days (Note: we are ASSUMING the Duration of
# Infection, D_inf, equals 14 here - this database doesn't collect the number of
# Recovered).

D_inf = 14
active_cases = NULL
recovered = NULL
for (i in 1:nrow(d4))
	{
	startrow = i - (D_inf-1)
	if (startrow <= 1)
		{
		startrow = 1
		tmp_recovered = 0
		} else {
		tmp_recovered = sum(as.numeric(d4$cases[1:(startrow-1)]))
		}

	tmp_active = sum(as.numeric(d4$cases[startrow:i]))

	# Store these results
	active_cases = c(active_cases, tmp_active)
	recovered = c(recovered, tmp_recovered)
	} # END for (i in 1:nrow(d4))

# Add in the new columns:
d5 = cbind(d4, active_cases, recovered)

# Look at your data table:
head(d5)  # look at first 6 rows
tail(d5)  # look at last 6 rows
dim(d5)   # look at the number of rows & columns

# Plot the data:
plot(x=d5$dates, y=d5$active_cases, xlab="Date", ylab="Active cases", col="firebrick2")
titletxt = paste0("Active COVID-19 cases in ", country_name, "\n(EUDC data, assuming D_inf=14)")
title(titletxt)
points(x=d5$dates, y=d5$recovered, col="green3")


#######################################################
# LOAD OUR FUNCTIONS FOR SIR SIMULATION AND INFERENCE
#######################################################

library(deSolve)

likelihood_single_point <- function(data.point, model.point, log=TRUE)
	{
	# Get the probability density of the count, assuming
	# the observed count is a poisson process with an
	# average value matching the true process.
	#
	# This allows for variation, e.g. if a case is reported
	# days after it occurs.
	likelihood = dpois(x=data.point, lambda=model.point, log=log)

	# Return the likelihood value
	return(likelihood)
	} # END of likelihood_single_point

likelihood_time_series <- function(data.points, model.points, log=TRUE)
	{
	# Error check - return a stop() message if check fails
	if (length(data.points) != length(model.points))
		{ stop("ERROR in likelihood_time_series(): the lengths of data.points and model.points must match.") }

	# Calculate the likelihoods of the data.points
	likelihoods = mapply(FUN=likelihood_single_point, data.point=data.points, model.point=model.points, MoreArgs=list(log=log))

	# If you calculated log-likelihoods, then sum them
	# If you calculated raw likelihoods, then multiply them
	if (log == TRUE)
		{
		total_likelihood = sum(likelihoods)
		} else {
		total_likelihood = prod(likelihoods)  # prod=take the product
		} # END if/else statement: if (log == TRUE)

	# Return that value
	return (total_likelihood)
	} # END of likelihood_time_series

#######################################################
# Functions from Week 5
#######################################################
library(deSolve)
SIR_ode <- function(time, state, parameters)
	{
	# The above inputs are:
	# time = a time point
	# state = an R list of the 3 state values, i.e. the
	#         population sizes of S, I, and R
	# parameters = an R list of the parameters R0 and D_inf

	# Convert the input parameters "R0" and "D_inf"
	# into the rates "beta" and "nu"
	beta <- parameters[["R0"]] / parameters[["D_inf"]]
	nu <- 1 / parameters[["D_inf"]]

	# Extract the current values of the states S, I, and R
	S <- state[["S"]]
	I <- state[["I"]]
	R <- state[["R"]]

	# Calculate N, the total of S+I+R,
	# i.e. the population size
	N <- S + I + R

	# Write out the system of
	# Ordinary Differential Equations (ODEs)
	dS <- -beta * I/N * S
	dI <- beta * I/N * S - nu * I
	dR <- nu * I

	# Return the current rates of change of S, I, and R:
	return(list(c(dS, dI, dR)))
	} # End of function SIR_ode

simulate_SIR <- function(theta, init.state, times)
	{
	trajectory <- data.frame(ode(y = init.state,
                times = times,
                func = SIR_ode,
                parms = theta,
                method = "ode45"))
	return(trajectory)
	} # End of simulate_SIR()

simulate_SIR_changes <- function(thetas_states_df, times)
	{
	# Loop through the rows
	trajectory = NULL
	for (rn in 1:nrow(thetas_states_df))
		{
		if (rn == 1)
			{
			init.state = c(S=thetas_states_df$S[rn], I=thetas_states_df$I[rn], R=thetas_states_df$R[rn])
			theta = list(R0=thetas_states_df$R0[rn], D_inf=thetas_states_df$D_inf[rn])
			} else {
			last_row = trajectory[nrow(trajectory),]
			init.state = c(S=last_row$S+thetas_states_df$S[rn], I=last_row$I+thetas_states_df$I[rn], R=last_row$R+thetas_states_df$R[rn])
			theta = list(R0=thetas_states_df$R0[rn], D_inf=thetas_states_df$D_inf[rn])
			}

		# Read in the parameters for the initial, versus later, time-bins
		if (rn >= nrow(thetas_states_df))
			{
			start_time = thetas_states_df$time[rn]
			end_time = max(times)
			} else {
			start_time = thetas_states_df$time[rn]
			end_time = thetas_states_df$time[rn+1]
			}

		TF1 = times >= start_time
		TF2 = times <= end_time
		TF = (TF1 + TF2) == 2
		tmp_times = times[TF]

		tmp_trajectory = simulate_SIR(theta=theta, init.state=init.state, times=tmp_times)

		if (rn == 1)
			{
			trajectory = rbind(trajectory, tmp_trajectory)
			} else {
			trajectory_minus_last_state = trajectory[-nrow(trajectory),]
			trajectory = rbind(trajectory_minus_last_state, tmp_trajectory)
			} # if (rn == 1)
		} # END for (rn in 1:nrow(thetas_states_df))
	return(trajectory)
	} # End simulate_SIR_changes()


params_to_likelihood_v1 <- function(params, data.points, thetas_states_df, delay_val=14, printvals=TRUE, makePlots=FALSE)
	{
	# The times are just the length of the data.points
	times = 1:length(data.points)

	# (1) input the free parameters
	thetas_states_temp = thetas_states_df
	TF = thetas_states_temp == "free"
	thetas_states_temp[TF] = params

	# Error check
	if ((max(thetas_states_temp$time)+delay_val) > max(times))
		{
		stop("ERROR in params_to_likelihood_v1: there is a 'thetas_states_df$time'+delay_val greater than found in 'times'")
		}

	# Make sure everything in thetas_states_df is numeric, instead of character
	for (i in 1:ncol(thetas_states_temp))
		{
		thetas_states_temp[,i] = as.numeric(thetas_states_temp[,i])
		}

	# Ad2 the delay (allows time for interventions to show up
	# in the detections)
	thetas_states_temp$time[2:length(thetas_states_temp$time)] = thetas_states_temp$time[2:length(thetas_states_temp$time)] + delay_val

	# Calculate the trajectory given the parameters,
	# input into model.points
	trajectory = simulate_SIR_changes(thetas_states_df=thetas_states_temp, times)
	model.points = trajectory$I


	lnL = likelihood_time_series(data.points, model.points, log=TRUE)

	if (is.finite(lnL)== FALSE)
		{
		lnL=-10e100
		}

	# Print, if desired
	if (printvals == TRUE)
		{
		print(thetas_states_temp)
		cat("log-likelihood = ", lnL, "\n", sep="")
		}
	# Make plots, if desired
	if (makePlots == TRUE)
		{
		plot_trajectory_lines(trajectory, theta=NULL, thetas_states_df=thetas_states_df, title_prefix="Plot #5:")
		}
	return(lnL)
	} # END params_to_likelihood_v1

#######################################################
# FINISHED LOADING OUR FUNCTIONS FOR SIR SIMULATION AND INFERENCE
#######################################################


#######################################################
# Model 1: A 1-regime model
#######################################################
times = 1:nrow(d5)      # Number of days since first case
time = c(1, nrow(d5)-1) # The 2nd entry is the 2nd-to-last day,
                        # this forces most of the data to
                        # use the first R0
R0 = c(3.0, 2.5)
D_inf = c(14.0, 14.0)
S = c(as.numeric(d5$popData2018[1]), 0) # putting in the country's population size
I = c(1, 0)
R = c(0, 0)
thetas_states_table = cbind(time, R0, D_inf, S, I, R)
thetas_states_df = as.data.frame(thetas_states_table, stringsAsFactors=FALSE)

# NOTE: ONLY 2 FREE PARAMETERS
thetas_states_df$R0[1] = "free"
thetas_states_df$I[1] = "free"
thetas_states_df

# Run the params_to_likelihood_v1() function once, to see your starting likelihood
data.points = d5$active_cases
params = c(2.5, 0.03) # starting parameter values
params_to_likelihood_v1(params, data.points, thetas_states_df, delay_val=0, printvals=TRUE, makePlots=FALSE)

# Running the Maximum Likelihood search with the optim() function.
# LOOK AT THE OUTPUT THAT PRINTS TO SCREEN!!
delay_val = 0
ML_results = optim(par=params, fn=params_to_likelihood_v1, data.points=data.points, thetas_states_df=thetas_states_df, delay_val=delay_val, printvals=TRUE, method="L-BFGS-B", lower=0.0, control=list(fnscale=-1))

# Graphing the ML model
# Take the learned parameters from "ML_results", put them
# into a theta_states data.frame for simulation and plotting
thetas_states_ML = thetas_states_df
TF = thetas_states_ML == "free"
thetas_states_ML[TF] = ML_results$par
for (i in 1:ncol(thetas_states_ML))
	{ thetas_states_ML[,i] = as.numeric(thetas_states_ML[,i]) }
thetas_states_ML$time[2:length(thetas_states_ML$time)] = thetas_states_ML$time[2:length(thetas_states_ML$time)] + delay_val


# Plot the results
trajectory = simulate_SIR_changes(thetas_states_df=thetas_states_ML, times=times)
maxy = max(max(trajectory$I), max(data.points))
xvals = c(trajectory$time, times)
yvals = c(trajectory$I, data.points)
plot(x=xvals, y=yvals, pch=".", col="white", xlim=c(0, max(trajectory$time)), ylim=c(0, maxy), xlab="Day", ylab="Number of individuals")
lines(x=trajectory$time, y=trajectory$I, lwd=3, col="firebrick2")
points(times, data.points)
legend(x="topleft", legend=c("Active COVID-19 case count", 'ML-fitted projection of "I" (Infected)'), lty=c("blank", "solid"), lwd=c(1,3), pch=c(1, "."), col=c("black","firebrick2"), cex=0.8)

titletxt = paste0("ML fit, EUDC active cases from: ", country_name, "\nModel 1 (a 1-regime model) max lnL = ", round(ML_results$value, 2))
title(titletxt)

# Save this model's parameters and log-likelihood
thetas_states_ML_model1 = thetas_states_ML
total_lnL_Model1 = ML_results$value


#######################################################
# Model 2: A 2-regime model
#######################################################
times = 1:nrow(d5)      # Number of days since first case
time = c(1, 24) # NZ Level 4 lockdown went in on March 26, day 24 since 1st case
R0 = c(3.0, 0.3)
D_inf = c(14.0, 14.0)
S = c(as.numeric(d5$popData2018[1]), 0) # putting in the country's population size
I = c(1, 0)
R = c(0, 0)
thetas_states_table = cbind(time, R0, D_inf, S, I, R)
thetas_states_df = as.data.frame(thetas_states_table, stringsAsFactors=FALSE)

# Make some parameters into "free" parameters, to be inferred
thetas_states_df$R0 = c("free", "free")
thetas_states_df$I[1] = "free"
thetas_states_df

# Run the params_to_likelihood_v1() function once, to see your starting likelihood
data.points = d5$active_cases
params = c(3.0, 0.3, 0.03) # starting parameter values
params_to_likelihood_v1(params, data.points, thetas_states_df, delay_val=14, printvals=TRUE, makePlots=FALSE)

# Running the Maximum Likelihood search with the optim() function.
# LOOK AT THE OUTPUT THAT PRINTS TO SCREEN!!
delay_val = 14
ML_results = optim(par=params, fn=params_to_likelihood_v1, data.points=data.points, thetas_states_df=thetas_states_df, delay_val=delay_val, printvals=TRUE, method="L-BFGS-B", lower=0.0, control=list(fnscale=-1))

# Graphing the ML model
# Take the learned parameters from "ML_results", put them
# into a theta_states data.frame for simulation and plotting
thetas_states_ML = thetas_states_df
TF = thetas_states_ML == "free"
thetas_states_ML[TF] = ML_results$par
for (i in 1:ncol(thetas_states_ML))
	{ thetas_states_ML[,i] = as.numeric(thetas_states_ML[,i]) }
thetas_states_ML$time[2:length(thetas_states_ML$time)] = thetas_states_ML$time[2:length(thetas_states_ML$time)] + delay_val


# Plot the results
trajectory = simulate_SIR_changes(thetas_states_df=thetas_states_ML, times=times)
maxy = max(max(trajectory$I), max(data.points))
xvals = c(trajectory$time, times)
yvals = c(trajectory$I, data.points)
plot(x=xvals, y=yvals, pch=".", col="white", xlim=c(0, max(trajectory$time)), ylim=c(0, maxy), xlab="Day", ylab="Number of individuals")
lines(x=trajectory$time, y=trajectory$I, lwd=3, col="firebrick2")
points(times, data.points)
legend(x="topleft", legend=c("Active COVID-19 case count", 'ML-fitted projection of "I" (Infected)'), lty=c("blank", "solid"), lwd=c(1,3), pch=c(1, "."), col=c("black","firebrick2"), cex=0.8)

titletxt = paste0("ML fit, EUDC active cases from: ", country_name, "\nModel 2 (a 2-regime model) max lnL = ", round(ML_results$value, 2))

# Save this model's parameters and log-likelihood
thetas_states_ML_model2 = thetas_states_ML
total_lnL_Model2 = ML_results$value


#######################################################
# Model 3: A 3-regime model
#######################################################
times = 1:nrow(d5)      # Number of days since first case
time = c(1, 16, 24) # NZ Level 4 lockdown went in on March 26, day 24 since 1st case
R0 = c(3.0, 1.0, 0.3)
D_inf = c(14.0, 14.0, 14.0)
S = c(as.numeric(d5$popData2018[1]), 0, 0) # putting in the country's population size
I = c(1, 0, 0)
R = c(0, 0, 0)
thetas_states_table = cbind(time, R0, D_inf, S, I, R)
thetas_states_df = as.data.frame(thetas_states_table, stringsAsFactors=FALSE)

# Make some parameters into "free" parameters, to be inferred
thetas_states_df$R0 = c("free", "free", "free")
thetas_states_df$I[1] = "free"
thetas_states_df

# Run the params_to_likelihood_v1() function once, to see your starting likelihood
data.points = d5$active_cases
params = c(3.0, 2.0, 0.3, 0.03) # starting parameter values
params_to_likelihood_v1(params, data.points, thetas_states_df, delay_val=14, printvals=TRUE, makePlots=FALSE)

# Running the Maximum Likelihood search with the optim() function.
# LOOK AT THE OUTPUT THAT PRINTS TO SCREEN!!
delay_val = 14
ML_results = optim(par=params, fn=params_to_likelihood_v1, data.points=data.points, thetas_states_df=thetas_states_df, delay_val=delay_val, printvals=TRUE, method="L-BFGS-B", lower=0.0, control=list(fnscale=-1))

# Graphing the ML model
# Take the learned parameters from "ML_results", put them
# into a theta_states data.frame for simulation and plotting
thetas_states_ML = thetas_states_df
TF = thetas_states_ML == "free"
thetas_states_ML[TF] = ML_results$par
for (i in 1:ncol(thetas_states_ML))
	{ thetas_states_ML[,i] = as.numeric(thetas_states_ML[,i]) }
thetas_states_ML$time[2:length(thetas_states_ML$time)] = thetas_states_ML$time[2:length(thetas_states_ML$time)] + delay_val


# Plot the results
trajectory = simulate_SIR_changes(thetas_states_df=thetas_states_ML, times=times)
maxy = max(max(trajectory$I), max(data.points))
xvals = c(trajectory$time, times)
yvals = c(trajectory$I, data.points)
plot(x=xvals, y=yvals, pch=".", col="white", xlim=c(0, max(trajectory$time)), ylim=c(0, maxy), xlab="Day", ylab="Number of individuals")
lines(x=trajectory$time, y=trajectory$I, lwd=3, col="firebrick2")
points(times, data.points)
legend(x="topleft", legend=c("Active COVID-19 case count", 'ML-fitted projection of "I" (Infected)'), lty=c("blank", "solid"), lwd=c(1,3), pch=c(1, "."), col=c("black","firebrick2"), cex=0.8)

titletxt = paste0("ML fit, EUDC active cases from: ", country_name, "\nModel 3 (a 3-regime model) max lnL = ", round(ML_results$value, 2))
title(titletxt)

# Save this model's parameters and log-likelihood
thetas_states_ML_model3 = thetas_states_ML
total_lnL_Model3 = ML_results$value


#######################################################
# Model 4: A 4-regime model
#######################################################
times = 1:nrow(d5)      # Number of days since first case
time = c(1, 16, 20, 24) # NZ Level 4 lockdown went in on March 26, day 24 since 1st case
R0 = c(3.0, 1.0, 1.0, 0.3)
D_inf = c(14.0, 14.0, 14.0, 14.0)
S = c(as.numeric(d5$popData2018[1]), 0, 0, 0) # putting in the country's population size
I = c(1, 0, 0, 0)
R = c(0, 0, 0, 0)
thetas_states_table = cbind(time, R0, D_inf, S, I, R)
thetas_states_df = as.data.frame(thetas_states_table, stringsAsFactors=FALSE)

# Make some parameters into "free" parameters, to be inferred
thetas_states_df$R0 = c("free", "free", "free", "free")
thetas_states_df$I[1] = "free"
thetas_states_df

# Run the params_to_likelihood_v1() function once, to see your starting likelihood
data.points = d5$active_cases
params = c(3.0, 2.0, 1.0, 0.3, 0.03) # starting parameter values
params_to_likelihood_v1(params, data.points, thetas_states_df, delay_val=14, printvals=TRUE, makePlots=FALSE)

# Running the Maximum Likelihood search with the optim() function.
# LOOK AT THE OUTPUT THAT PRINTS TO SCREEN!!
delay_val = 14
ML_results = optim(par=params, fn=params_to_likelihood_v1, data.points=data.points, thetas_states_df=thetas_states_df, delay_val=delay_val, printvals=TRUE, method="L-BFGS-B", lower=0.0, control=list(fnscale=-1))

# Graphing the ML model
# Take the learned parameters from "ML_results", put them
# into a theta_states data.frame for simulation and plotting
thetas_states_ML = thetas_states_df
TF = thetas_states_ML == "free"
thetas_states_ML[TF] = ML_results$par
for (i in 1:ncol(thetas_states_ML))
	{ thetas_states_ML[,i] = as.numeric(thetas_states_ML[,i]) }
thetas_states_ML$time[2:length(thetas_states_ML$time)] = thetas_states_ML$time[2:length(thetas_states_ML$time)] + delay_val


# Plot the results
trajectory = simulate_SIR_changes(thetas_states_df=thetas_states_ML, times=times)
maxy = max(max(trajectory$I), max(data.points))
xvals = c(trajectory$time, times)
yvals = c(trajectory$I, data.points)
plot(x=xvals, y=yvals, pch=".", col="white", xlim=c(0, max(trajectory$time)), ylim=c(0, maxy), xlab="Day", ylab="Number of individuals")
lines(x=trajectory$time, y=trajectory$I, lwd=3, col="firebrick2")
points(times, data.points)
legend(x="topleft", legend=c("Active COVID-19 case count", 'ML-fitted projection of "I" (Infected)'), lty=c("blank", "solid"), lwd=c(1,3), pch=c(1, "."), col=c("black","firebrick2"), cex=0.8)

titletxt = paste0("ML fit, EUDC active cases from: ", country_name, "\nModel 4 (a 4-regime model) max lnL = ", round(ML_results$value, 2))
title(titletxt)

# Save this model's parameters and log-likelihood
thetas_states_ML_model4 = thetas_states_ML
total_lnL_Model4 = ML_results$value


# Calculate AIC and AIC model weights by
#
# (Option A) Re-purposing the R code from
# the Week 6 exercise
#
# (Option B) Editing the example Excel or Google Sheet files:
#
#
# Example spreadsheet calculating AIC and AIC model weights:
#
# Excel file on Canvas:
# BIOSCI 220 -> Files -> Module 2 -> Week 6 -> Week6_AIC_with_SIRs_fit_to_NZ_EUDC_data.xlsx
#
# On public web:
# http://phylo.wikidot.com/local--files/biosci220:quantitative-biology/Week6_AIC_with_SIRs_fit_to_NZ_EUDC_data.xlsx
#
# On Google Sheets:
# (You will have to copy-paste to your own Google Sheet, this one is edit-locked.)
# https://docs.google.com/spreadsheets/d/1g6rTd0O2Ec48AdR6-_B7N5-BjGtMEI1n2bvNCnmY8co/edit?usp=sharing
#