BIOSCI220: Lab 4 exercise and questions
# Module 2: "Model-based Inference"
# https://canvas.auckland.ac.nz/courses/48087/pages/module-2-model-based-inference?module_item_id=747842

# You can find the Lab 4 Worksheet questions at:
#
# Canvas -> BIOSCI 220 -> Quizzes -> Lab 4 Worksheet

# The below Lab 4 code is posted at:
#
# https://gist.github.com/nmatzke/e7a49f963912174136c3d319e2981231

#######################################################
# WEEK 4 LAB
#######################################################
# When I planned Week 4 and recorded the lectures,
# COVID-19 was a faraway news story. Now, of course,
# it has taken over our lives.
#
# My goals for BIOSCI220 have been to (1) communicate
# the importance of model-based inference in modern
# science, but also (2) encourage you to think critically
# about models, and how decisions are made based on them.
#
# I usually like to emphasize #2, because there is a
# tendency, inside and outside of science, to naively
# believe whatever some model says, without much
# understanding of what goes into the model, or what
# the consequences might be if the model is wrong.
#
# However, in the light of COVID-19, we are now in the
# midst of a situation where pretty literal life-and-
# death decisions are based on models -- frankly,
# rightly so, because we are in a situation where
# ignoring the models and trying to carry on with
# "business as usual" would have devastating
# consequences.
#
# Even in this situation, it is better to understand
# what is going on, and to understand why decisions
# are being made the way they are, than to just
# shrug and decide to believe or disbelieve the
# authorities on faith.
#
# To help you in this endeavour, I have scrapped some
# of my original material to focus "Module 2: Model-based
# Inference" on epidemiological models. The basic plan is:
#
# Week 4: Basics of model-based inference
#   Lectures: What are models?
#   Lab: Likelihood and maximizing the likelihood
#
# Week 5: The SIR model (Susceptible, Infectious, Recovered)
#   Lectures: Intro to SIR models
#   Lab: Forecasting epidemics with SIR models in R (simulation)
#
# Week 6: Inference and model comparison
#   Lectures: Methods of inference
#   Lab: Inference of SIR model parameters from data
#
#######################################################
#######################################################
#
# LAB 4 SETUP INSTRUCTIONS
#
# For lab 4, you will need access to these:
#
# 1. Web browser, and access to CANVAS (for quizzes,
#    lectures, online discussions, etc.).
#
# 2. Ability to copy/paste text. This is possible even
#    on a smartphone, but probably is much easier on
#    a desktop/laptop/tablet.
#
# 3. R. You can either install R on your desktop/laptop
#    machine, or you can use R through the web, for
#    example with this website: https://rdrr.io/snippets/
#
#    Note that, for the R-through-the-web pages, you
#    will have to paste ALL necessary code in one go --
#    they do not remember whatever you ran previously.
#    This may mean that, later in the lab exercise, you
#    have to paste a large chunk of the lab code into
#    the run window at once.
#
# 4. The TAs and I will take questions on Piazza all
#    week. I also plan to do live Zoom help sessions,
#    so if you want to access these, please install
#    Zoom.
#
#######################################################
#######################################################
# LAB 4, PART 1: READING AND THINKING --
# ADVANTAGES/DISADVANTAGES OF MODELS
#######################################################
#
# In lieu of an in-person discussion in class, I ask
# all students to contribute to a Canvas discussion
# on "models and COVID".
#
# You will recall that we have 3 assigned readings for
# the week:
#
# 1. Asimov - The Relativity of Wrong
# 2. O'Neil - Weapons of Math Destruction
# 3. Stevens - "Why outbreaks like coronavirus spread
#    exponentially, and how to 'flatten the
#    curve'" (free online). March 14, 2020.
#    https://www.washingtonpost.com/graphics/2020/world/corona-simulator/
#
# ASSIGNMENT: Each student should read the above
# articles, and think about what they mean about
# models, their importance, and where and how
# critical thinking should be applied to them.
#
# THEN, with those thoughts in mind, *find another
# media article* that discusses the modeling of
# COVID-19 in New Zealand or elsewhere. There
# are *many* such articles up right now. They cover
# topics like describing the model projections,
# describing debates between epidemiologists (some
# of which have been very public!), or describing
# the reaction/nonreaction of politicians or the
# public to model projections.
#
# Instructions:
#
# 1. Make a post on the CANVAS discussion page.
#
# 2. Start the post with the reference for the article:
#    Author, date, title, publication, link.
#
# 3. Write a short paragraph summarizing the
#    article.
#
# 4. Write a second short paragraph giving your
#    reaction. Don't overthink this -- we are
#    in a stressful time, and I want this to be
#    open to anything from a highly academic
#    evaluation ("This article gets the science
#    wrong because...") to questions ("I wish
#    this article answered question X that I have
#    because..."), to personal reactions ("It makes
#    me angry/sad/happy that person/country X/Y
#    did/did not do Z because...").
#
#    Include the reference & 2 paragraphs in a SINGLE post.
#
# 5. After a number of posts have gone up,
#    please post a short reply to at least one
#    other post. Ideally, a discussion of
#    interesting topics will naturally emerge, but
#    I would like to see every post get at least
#    one reaction. OF COURSE, REMEMBER TO BE POLITE
#    AND COLLEGIAL AND SCHOLARLY.
#
# 6. It's OK if different posts discuss the same articles,
#    although, for your own interest, you might try to
#    find a new article.
#
# 7. This discussion assignment will be worth 10%
#    of your lab 4 grade. The only ways to lose points
#    are (a) not doing it, or (b) doing something
#    insulting/flippant/mean.
#
#######################################################
#######################################################
# LAB 4, PART 2: INTRODUCTION TO ML INFERENCE
#######################################################
#
# In Week 4, we will begin our exploration of
# model-based inference with a very simple dataset
# and model.
#
# In Weeks 5 & 6, we will apply these basic skills to
# SIR models from epidemiology, which are not nearly
# as complex or hard to understand as you might have
# thought. Basic SIR models can be coded and run in
# a few lines of R code.
#
# BACKGROUND
#
# Many biological problems, in many different fields,
# from ecology to epidemiology, are concerned with
# trying to estimate the prevalence or frequency of
# something.
#
# For example:
#
# - Out of 1000 ponds in a region, how many contain a
#   particular endangered frog species?
#
# - Out of a million people, how many are resistant
#   to measles (due to e.g. vaccination or previous
#   infection)?
#
# These examples (leaving out numerous real-world
# complexities) boil down to this: there are 2
# possible outcomes, and we are interested in how
# often each outcome occurs.
#
# Another process with 2 outcomes is coin-flipping.
#
# Coin-flipping, and any other 2-outcome process,
# can be described with a one-parameter model, using
# a binomial distribution ("bi-nomial" = "two names").
#
# (We will use coin-flipping as an example, but you can
# mentally substitute any other binomial process if
# you like.)
#
# The single parameter describing a binomial process
# is "p", the probability of one of the outcomes. The
# probability of the other outcome is 1-p.
#
# Because there are so many different probabilities and
# "p" terms used in probability and statistics, I will
# use "P_heads" -- the probability that a coin will
# produce "Heads" on a single flip -- to describe this
# parameter.
#
#######################################################
# Let's consider some coin-flip data.
#
# Here are 100 coin flips:
coin_flips = c('H','T','H','T','H','H','T','H','H','H','T','H','H','T','T','T','T','H','H','H','H','H','H','H','H','H','H','H','H','H','H','H','H','T','T','T','H','T','T','T','H','T','T','T','H','H','H','T','T','H','H','H','T','H','H','H','T','T','H','H','H','H','H','H','H','T','T','H','H','H','H','T','T','H','H','H','T','T','H','H','H','H','H','H','T','T','T','H','H','H','H','H','H','T','H','T','H','H','T','T')

# Look at the data
coin_flips

# What is your guess at "P_heads", the probability of heads?
#
#
# In the case of binomial data, we actually have a simple
# formula to calculate the best estimate of P_heads:

# Find the heads
heads_TF = (coin_flips == "H")
heads_TF

# Find the tails
tails_TF = (coin_flips == "T")
tails_TF

numHeads = sum(heads_TF)
numHeads

numTails = sum(tails_TF)
numTails

numTotal = length(coin_flips)
numTotal
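
# A quick cross-check (a minimal sketch): table() tallies the
# outcomes in one step, and should match numHeads and numTails.
table(coin_flips)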
# Here's the formula:
P_heads_ML_estimate = numHeads / numTotal
P_heads_ML_estimate

# PLEASE ANSWER LAB 4 QUESTION #1, NOW:
#
# 1. What number (exactly) was reported for P_heads_ML_estimate?
#
# (Obviously the answer is 0.65, but I want you to realize that
# the computer will auto-grade this. So, 0.65 will be marked correct,
# but .65, 0.650, etc., will be marked incorrect. Extra spaces may
# also cause the answer to be marked incorrect, e.g. "0.65" is correct,
# but " 0.65" or "0.65 " are incorrect. For future answers, when numbers
# are asked for, be sure to report the exact number R gave, without
# spaces etc.)

# Well, duh, that seems pretty obvious. At least it would have been, if we
# weren't thinking of coins, where we have a strong prior belief that the
# coin is probably fair.
#
# It turns out that this formula can be justified through a technique
# known as Maximum Likelihood.
#
# What does it mean to say we have a "maximum likelihood" estimate of P_heads?
#
# "Likelihood", in statistics, means "the probability of the data under the model".
#
# So, a "Maximum Likelihood estimate" of P_heads is the value of the parameter
# P_heads that maximizes the probability of seeing the data you saw, namely,
# 65 Heads and 35 Tails.
#
# | |
# Comparing likelihoods for different possible parameter values of P_heads | |
# | |
# Let's calculate the probability of these same coin flip data under the | |
# hypothesis/model that P_heads=0.5 | |
# | |
# We'll be very inefficient, and use a for-loop, and | |
# if/else statements | |
# Loop through all 100 flips | |
# Make a list of the probability of | |
# each datum | |
P_heads_guess = 0.5 | |
# Empty list of probabilities | |
probs_list = rep(NA, times=length(coin_flips)) | |
probs_list | |
for (i in 1:length(coin_flips)) | |
{ | |
# Print an update | |
cat("\nAnalysing coin flip #", i, "/", length(coin_flips), sep="") | |
# Get the current coin flip | |
coin_flip = coin_flips[i] | |
# If the coin flip is heads, give that datum | |
# probability P_heads_guess. | |
# If tails, give it (1-P_heads_guess) | |
if (coin_flip == "H") | |
{ | |
probs_list[i] = P_heads_guess | |
} # End if heads | |
if (coin_flip == "T") | |
{ | |
probs_list[i] = (1-P_heads_guess) | |
} # End if tails | |
} # End for-loop | |
# Look at the resulting probabilities | |
probs_list | |
# We get the probability of all the data by multiplying | |
# all the probabilities, with the prod() function. | |
likelihood_of_data_given_P_heads_guess1 = prod(probs_list) | |
likelihood_of_data_given_P_heads_guess1 | |
# PLEASE ANSWER LAB 4, QUESTION #2, NOW: | |
# 2. What number (exactly) was reported for likelihood_of_data_given_P_heads_guess1? | |
# That's a pretty small number! You'll see that it's | |
# just 0.5^100: | |
0.5^100 | |
# A probability of 0.5 is not small, but multiply it | |
# 100 values of 0.5 together, and you get a small value. | |
# That's the probability of that specific sequence of | |
# heads/tails, given the hypothesis that the true | |
# probability is P_heads_guess. | |
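
# The same calculation can be done without a loop. A minimal
# vectorized sketch: ifelse() picks P_heads_guess or (1-P_heads_guess)
# for each flip, and prod() multiplies them, exactly as the loop did.
probs_list_vectorized = ifelse(coin_flips == "H", P_heads_guess, 1 - P_heads_guess)
prod(probs_list_vectorized)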
# Let's try another parameter value:

# Loop through all 100 flips.
# Make a list of the probability of
# each datum.
P_heads_guess = 0.7

# Empty list of probabilities
probs_list = rep(NA, times=length(coin_flips))
probs_list

for (i in 1:length(coin_flips))
{
    # Print an update
    cat("\nAnalysing coin flip #", i, "/", length(coin_flips), sep="")

    # Get the current coin flip
    coin_flip = coin_flips[i]

    # If the coin flip is heads, give that datum
    # probability P_heads_guess.
    # If tails, give it (1-P_heads_guess)
    if (coin_flip == "H")
    {
        probs_list[i] = P_heads_guess
    } # End if heads

    if (coin_flip == "T")
    {
        probs_list[i] = (1-P_heads_guess)
    } # End if tails
} # End for-loop

# Look at the resulting probabilities
probs_list

# We get the probability of all the data by multiplying
# all the probabilities
likelihood_of_data_given_P_heads_guess2 = prod(probs_list)
likelihood_of_data_given_P_heads_guess2

# PLEASE ANSWER LAB 4, QUESTION #3, NOW:
# 3. What number (exactly) was reported for likelihood_of_data_given_P_heads_guess2?

# We got a different likelihood. It's also very small.
# But that's not important. What's important is,
# how many times higher is it?
likelihood_of_data_given_P_heads_guess2 / likelihood_of_data_given_P_heads_guess1

# PLEASE ANSWER LAB 4, QUESTION #4, NOW:
# 4. What number (exactly) was reported for
#    likelihood_of_data_given_P_heads_guess2 / likelihood_of_data_given_P_heads_guess1 ?

# Whoa! That's a lot higher! This means the coin flip data is 54 times more
# probable under the hypothesis that P_heads=0.7 than under the
# hypothesis that P_heads=0.5.

# Maximum likelihood: Hopefully you can see that the data is in some sense
# "best explained" by the parameter value (the value of P_heads) that
# maximizes the probability of the data. This is what we call the Maximum
# Likelihood solution.
# To try different values of P_heads, we could keep copying and pasting code,
# but that seems annoying. Let's define an R "function" instead.
#
# Functions, in R, collect a number of commands, and run them when you
# type the name of the function. Functions are simple programs. All of
# the R commands you have ever run, e.g. mean(), prod(), etc., are
# just functions that R has pre-programmed.
#
# NOTES ON FUNCTIONS IN R:
#
# * To run, functions have to be followed by (). Usually, but not always,
#   inputs go into the (). One example of a function that needs no inputs
#   is getwd(), which gets your current working directory (wd) and prints it
#   to screen:
getwd()

# * To see the code that is running inside the function, you can often
#   just type the name of the function, without the (). This works for
#   most R functions except for "primitive" ones (these are core R
#   functions, like length, sum, etc., that are programmed a different
#   way to be super-fast). (An example appears below, after Question 5.)
#
# * Typing the names of functions and figuring out exactly what is
#   happening in the code is a great way to learn R and learn a
#   specific scientific field, since you can see *exactly* what
#   some scientist is doing in their analysis.
#
# * For functions that are part of packages, you can get the help
#   page by typing a ? before the function name. E.g., ?mean will open
#   the help page for the "mean" function. R help is often not
#   written for beginners, but it is better than nothing.
#
# * When you try to run a function and get an error, this usually
#   means you left out a required input, or you put in the wrong
#   kind of input (e.g., if the input is supposed to be a number,
#   and you put in a word like "hello", you will get an error
#   message).
#
# When you type
mean("hello")
# ...what happens?

# PLEASE ANSWER LAB 4, QUESTION #5, NOW:
# 5. When you run 'mean("hello")', paste in the last part of the error message you get back.
#    Your paste should begin with "argument" and end with "NA".
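
# As noted above, typing a (non-primitive) function's name without
# parentheses prints its source code. For example, sd, which computes
# the standard deviation, is written in R code you can read:
sd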
# Let's define a function.
#
# This function calculates the probability of a
# sequence of coin flip data, given a value of P_heads_guess
calc_prob_coin_flip_data <- function(P_heads_guess, coin_flips)
{
    # Empty list of probabilities
    probs_list = rep(NA, times=length(coin_flips))
    probs_list

    for (i in 1:length(coin_flips))
    {
        # Print an update
        #cat("\nAnalysing coin flip #", i, "/", length(coin_flips), sep="")

        # Get the current coin flip
        coin_flip = coin_flips[i]

        # If the coin flip is heads, give that datum
        # probability P_heads_guess.
        # If tails, give it (1-P_heads_guess)
        if (coin_flip == "H")
        {
            probs_list[i] = P_heads_guess
        } # End if heads

        if (coin_flip == "T")
        {
            probs_list[i] = (1-P_heads_guess)
        } # End if tails
    } # End for-loop

    # Look at the resulting probabilities
    probs_list

    # We get the probability of all the data by multiplying
    # all the probabilities
    likelihood_of_data_given_P_heads_guess = prod(probs_list)

    # Return result
    return(likelihood_of_data_given_P_heads_guess)
}

# Now, we can just use this function:
calc_prob_coin_flip_data(P_heads_guess=0.5, coin_flips=coin_flips)
calc_prob_coin_flip_data(P_heads_guess=0.6, coin_flips=coin_flips)
calc_prob_coin_flip_data(P_heads_guess=0.7, coin_flips=coin_flips)

# Look at that! We did all of that work in a split-second.

# PLEASE ANSWER LAB 4, QUESTION #6, NOW:
# 6. What number (exactly) was reported for
#    calc_prob_coin_flip_data(P_heads_guess=0.6, coin_flips=coin_flips) ?

# In fact, we can make another for-loop, and search for the ML
# value of P_heads by trying all of the values and plotting them.

# Sequence of 50 possible values of P_heads between 0 and 1
P_heads_values_to_try = seq(from=0, to=1, length.out=50)
likelihoods = rep(NA, times=length(P_heads_values_to_try))

for (i in 1:length(P_heads_values_to_try))
{
    # Get the current guess at P_heads_guess
    P_heads_guess = P_heads_values_to_try[i]

    # Calculate likelihood of the coin flip data under
    # this value of P_heads
    likelihood = calc_prob_coin_flip_data(P_heads_guess=P_heads_guess, coin_flips=coin_flips)

    # Store the likelihood value
    likelihoods[i] = likelihood
} # End for-loop

# Here are the resulting likelihoods:
likelihoods
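
# A compact alternative (a sketch using sapply() in place of the loop):
# sapply() calls our function once per candidate value of P_heads,
# returning the same vector of likelihoods.
likelihoods_sapply = sapply(P_heads_values_to_try, calc_prob_coin_flip_data, coin_flips=coin_flips)
all.equal(likelihoods, likelihoods_sapply)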
# Let's try plotting the likelihoods to see if there's a peak
plot(x=P_heads_values_to_try, y=likelihoods)
lines(x=P_heads_values_to_try, y=likelihoods)

# Whoa! That's quite a peak! You can see that the likelihoods
# vary over several orders of magnitude.
#
# Partially because of this extreme variation, we often use the
# logarithm of the likelihood, also known as
#
# * log-likelihood
# * lnL
#
# In R, the log() function returns the natural logarithm, while
# log10() returns the base-10 logarithm. I mention this because
# on some calculators, the LN button returns the natural log,
# and the LOG button returns the base-10 log.
#
# There are several reasons scientists, and scientific papers,
# often use the log-likelihood instead of the raw likelihood.
# These include:
#
# * Computers have a minimum numerical precision. If a likelihood
#   value gets too small, it gets rounded to 0. You can see this
#   happen if you calculate the raw likelihood of over 1000 coin-flips
#   using the parameter P_heads = 0.5
#
#   Machine precision can vary between computers, so I will give you
#   the results on my machine (but, try it on yours):
0.5^1073
# ...equals 9.881313e-324, which is a very small number
0.5^1074
# ...equals 4.940656e-324, which is half the size of the previous number,
# which makes sense, because we just multiplied it by 0.5
#
# However, on my machine,
0.5^1075
# ...equals 0. Instead of one more coin-flip causing the likelihood to be
# halved again, the value has "underflowed" to zero -- it became smaller
# than the smallest number the computer can represent. So the
# computer has made a "mistake" here.
#
# Log-likelihoods do not have this problem. Compare:
1073 * log(0.5)
1074 * log(0.5)
1075 * log(0.5)
1076 * log(0.5)

# We can convert log-likelihoods to raw likelihoods with exp(), which reverses
# the log() operation.
exp(1073 * log(0.5))
exp(1074 * log(0.5))
exp(1075 * log(0.5))
exp(1076 * log(0.5))

# You can see that, working in log-likelihood space, we have no
# numerical precision issues, but with raw likelihoods, we do.

# PLEASE ANSWER LAB 4, QUESTION #7, NOW:
# 7. What number (exactly) was reported for 1075 * log(0.5) ?

# Other reasons scientists use log-likelihoods instead of likelihoods:
#
# * The log-likelihood and raw likelihood will have their maxima
#   at the same parameter values.
#
# * Taking the logarithm of an equation converts multiplication to
#   addition, which can make calculus much easier. (See the quick
#   check after this list.)
#
# * Log-likelihoods are easier to read and interpret than raw
#   likelihoods, once you get used to log-likelihoods. For any
#   large dataset, the raw likelihood will be a very tiny
#   number, hard to read/remember/interpret.
#
# * It turns out that a number of fundamental statistical measures
#   and tests use the log-likelihood as an input. These include
#   the Likelihood Ratio Test (LRT) and Akaike Information Criterion (AIC).
#   We will talk about these in future weeks.
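
# A quick check of the multiplication-to-addition point (a sketch,
# reusing the probs_list computed above): the log of a product
# equals the sum of the logs.
log(prod(probs_list))
sum(log(probs_list))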
# PLOTTING THE LOG-LIKELIHOOD
#
# We will repeat the code to calculate the likelihood
# for a sequence of 50 possible values of P_heads between 0 and 1:
P_heads_values_to_try = seq(from=0, to=1, length.out=50)
likelihoods = rep(NA, times=length(P_heads_values_to_try))

for (i in 1:length(P_heads_values_to_try))
{
    # Get the current guess at P_heads_guess
    P_heads_guess = P_heads_values_to_try[i]

    # Calculate likelihood of the coin flip data under
    # this value of P_heads
    likelihood = calc_prob_coin_flip_data(P_heads_guess=P_heads_guess, coin_flips=coin_flips)

    # Store the likelihood value
    likelihoods[i] = likelihood
} # End for-loop

# Here are the resulting likelihoods:
likelihoods

# Let's take the log (the natural log, i.e. the base is exp(1)).
log_likelihoods = log(likelihoods, base=exp(1))
plot(x=P_heads_values_to_try, y=log_likelihoods)
lines(x=P_heads_values_to_try, y=log_likelihoods)

# Let's plot the raw- and log-likelihood together:
par(mfrow=c(2,1))
plot(x=P_heads_values_to_try, y=likelihoods, main="Likelihood (L) of the data")
lines(x=P_heads_values_to_try, y=likelihoods)
plot(x=P_heads_values_to_try, y=log_likelihoods, main="Log-likelihood (LnL) of the data")
lines(x=P_heads_values_to_try, y=log_likelihoods)

# PLEASE ANSWER LAB 4, QUESTION #8, NOW:
# 8. Compare the raw-likelihood curve and the log-likelihood curve. Does it look
#    like the top of each curve appears in about the same location on the
#    x-axis? (i.e., at the same parameter value for P_heads?)
#
# Maximum likelihood optimization
#
# You can see that the maximum likelihood of the data occurs when
# P_heads is somewhere around 0.6 or 0.7. What is it
# exactly?
#
# We could just keep trying more values until we reach whatever
# precision we desire. But, R has a function for
# maximum likelihood optimization!
#
# It's called optim(). The optim() function takes a function as an input.
# Fortunately, we've already written a function!
#
# Let's modify our function a bit to return the log-likelihood,
# and print the result:

# Function that calculates the probability of coin flip data
# given a value of P_heads_guess
calc_prob_coin_flip_data2 <- function(P_heads_guess, coin_flips)
{
    # Empty list of probabilities
    probs_list = rep(NA, times=length(coin_flips))
    probs_list

    for (i in 1:length(coin_flips))
    {
        # Print an update
        #cat("\nAnalysing coin flip #", i, "/", length(coin_flips), sep="")

        # Get the current coin flip
        coin_flip = coin_flips[i]

        # If the coin flip is heads, give that datum
        # probability P_heads_guess.
        # If tails, give it (1-P_heads_guess)
        if (coin_flip == "H")
        {
            probs_list[i] = P_heads_guess
        } # End if heads

        if (coin_flip == "T")
        {
            probs_list[i] = (1-P_heads_guess)
        } # End if tails
    } # End for-loop

    # Look at the resulting probabilities
    probs_list

    # We get the probability of all the data by multiplying
    # all the probabilities
    likelihood_of_data_given_P_heads_guess = prod(probs_list)

    # Get the log-likelihood
    LnL = log(likelihood_of_data_given_P_heads_guess)
    LnL

    # Error correction: if -Inf, reset to a low value
    if (is.finite(LnL) == FALSE)
    {
        LnL = -1000
    }

    # Print some output
    print_txt = paste("\nWhen P_heads=", P_heads_guess, ", LnL=", LnL, sep="")
    cat(print_txt)

    # Return result
    return(LnL)
}

# Try the function out:
LnL = calc_prob_coin_flip_data2(P_heads_guess=0.1, coin_flips=coin_flips)
LnL = calc_prob_coin_flip_data2(P_heads_guess=0.2, coin_flips=coin_flips)
LnL = calc_prob_coin_flip_data2(P_heads_guess=0.3, coin_flips=coin_flips)

# Looks like it works! Let's use optim() to search for the
# best P_heads value:

# Set a starting value of P_heads
starting_value = 0.1

# Set the limits of the search
limit_bottom = 0
limit_top = 1

optim_result = optim(par=starting_value, fn=calc_prob_coin_flip_data2, coin_flips=coin_flips, method="L-BFGS-B", lower=limit_bottom, upper=limit_top, control=list(fnscale=-1))
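
# Note: optim() minimizes by default; control=list(fnscale=-1)
# flips the sign so that it maximizes our log-likelihood instead.
# method="L-BFGS-B" is the optimizer that accepts the lower/upper
# bounds, keeping the search for P_heads inside [0, 1].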
# You can see the search print out as it proceeds.

# Let's see what the ML search decided on:
optim_result

# Let's compare the ML estimate of P_heads found by the search
# with the simple formula (the fraction of heads):
optim_result$par

# Here's the formula:
P_heads_ML_estimate = numHeads / numTotal
P_heads_ML_estimate

# PLEASE ANSWER LAB 4, QUESTION #9, NOW:
# 9. Is the ML value of P_heads_guess found by numerical optimization
#    (found in optim_result$par ) very close to the ML value found
#    by analytical solution (P_heads_ML_estimate)?

# PLEASE ANSWER LAB 4, QUESTION #10, NOW:
# 10. What was the maximized log-likelihood found by optim (hint: look at
#     optim_result$value)? Numerical optimization results might vary
#     slightly, so round this value to 2 decimal places in your answer.
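
# As a sanity check (a sketch, using the 65 heads and 35 tails counted
# earlier): the maximized log-likelihood can also be computed directly
# from the counts and the analytic estimate. It should match
# optim_result$value closely.
numHeads * log(P_heads_ML_estimate) + numTails * log(1 - P_heads_ML_estimate)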
# Where does that formula -- the "analytic solution" for P_heads_ML_estimate --
# come from? We will discuss this further in future lectures, but here is a summary.
#
# It turns out that, in simple cases, we can use calculus to derive
# "analytical solutions" for maximizing the likelihood. Basically, this involves:
#
# - taking the derivative of the likelihood function
#   (the derivative is the slope of a function)
# - setting the formula for the derivative to equal 0
#   (because, when the derivative=0, you are at a maximum or minimum)
# - solving for the parameter value that causes the derivative to
#   equal 0.
#
# This same basic calculus-based procedure provides a justification for
# the formulas that are used in e.g. linear regression.
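
# A quick numerical sketch of that recipe for our coin data: for
# LnL = numHeads*log(p) + numTails*log(1-p), the derivative is
# numHeads/p - numTails/(1-p). It should be ~0 at the analytic
# solution p = numHeads/numTotal.
p = P_heads_ML_estimate
numHeads/p - numTails/(1 - p)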
#
# The nice things about calculus-derived formulas for maximizing likelihood
# -- known as "analytic solutions" -- include:
#
# - they are very fast
# - the solution can be found without a computer (this was useful back in
#   the 20th century, when statistics was being developed without computers)
# - there is no chance of optimization error (optimization errors can occur
#   if likelihood surfaces are too flat, too bumpy, etc.)
#
# This raises the question:
# Why would anyone ever go through all the optimization rigamarole, when
# they could just calculate P_heads directly?
#
# Well, only in simple cases do we have a formula for the maximum likelihood
# estimate. The optim() strategy works whether or not
# there is a simple formula.
#
# In real-life science, with complex models of biological processes (such as
# epidemiological models), ML optimization gets used A LOT, but most scientists
# don't learn it until graduate school, if then.
#
# WHAT WE LEARNED
#
# You can now see that there is nothing magical, or even particularly
# difficult, about these concepts:
#
# - likelihood
# - log-likelihood
# - maximum likelihood (ML)
# - inferring a parameter through ML
#
# IMPLICATIONS
#
# Despite their simplicity, it turns out that likelihood is one of
# the core concepts in model-based inference, and statistics
# generally. Bayesian analyses have likelihood as a core component
# (we will talk about this more later), and the methods from introductory
# statistics (linear models etc.) can also be derived from
# likelihood-based reasoning.
#######################################################
#######################################################
# PART 3: BUILDING BLOCKS OF MODELS IN R
#######################################################
#
# R has a number of functions that speed up the process
# of building models, and simulating from them.
#
# For example, for binomial processes, these are:
#
# dbinom(x, size, prob, log) -- the probability density
#   (the likelihood) of x "heads"
#
#   size = the number of tries (the number of coin flips)
#
#   prob = the parameter value (e.g. P_heads)
#
#   log = should the likelihood be log-transformed when output?
#
# The other functions are:
#
# rbinom(n, size, prob) -- simulate n binomial outcomes,
#   using "size" and "prob" as above
#
# pbinom(q, ...) -- used for calculating tail probabilities
#   (the proportion of the distribution less than or equal to the value q)
#
# qbinom(p, ...) -- the inverse of pbinom()
#
# Any time you hear about "probability densities", "distributions",
# "random number generators", "random draws", or "simulations"
# in R, functions like these are being used.
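
# A few example calls (a sketch, reusing the 65-heads count from above):
dbinom(x=65, size=100, prob=0.65, log=TRUE)  # log-probability of 65 heads in 100 flips
rbinom(n=5, size=100, prob=0.65)             # simulate 5 experiments of 100 flips each
pbinom(q=50, size=100, prob=0.65)            # probability of 50 or fewer heads out of 100
qbinom(p=0.5, size=100, prob=0.65)           # the median number of heads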
#
# For uniform distributions, the equivalent functions are:
#
# dunif, runif, punif, qunif
#
# For normal (=Gaussian/bell-curve) distributions, the equivalents are:
#
# dnorm, rnorm, pnorm, qnorm

# Random seeds, and random number generators
#
# Random numbers in computers are not truly random. To make random numbers
# repeatable, a "seed" number is always used.

# Compare:
rnorm(n=1, mean=6, sd=1)
rnorm(n=1, mean=6, sd=1)
rnorm(n=1, mean=6, sd=1)

set.seed(54321)
rnorm(n=1, mean=6, sd=1)
rnorm(n=1, mean=6, sd=1)

set.seed(54321)
rnorm(n=1, mean=6, sd=1)
rnorm(n=1, mean=6, sd=1)

set.seed(54321)
rnorm(n=1, mean=6, sd=1)
rnorm(n=1, mean=6, sd=1)

# By re-setting the seed, we re-set the random number generator.

# PLEASE ANSWER LAB 4, QUESTION #11, NOW:
# 11. If you set the random number seed to 54322, what is the first
#     random number generated by rnorm(n=1, mean=6, sd=1)?

# Generate 1000 random numbers from a normal distribution, and
# then plot a histogram of them, with this code:
random_numbers = rnorm(n=1000, mean=6, sd=1)
hist(random_numbers)

# PLEASE ANSWER LAB 4, QUESTION #12, NOW:
# 12. What shape do you see?

# PLEASE ANSWER LAB 4, QUESTION #13, NOW:
# 13. I am 6 feet tall. What is the raw likelihood of my height,
#     assuming that I am a random draw from a population that is
#     7 feet tall, with a standard deviation of 0.5 feet?
dnorm(x=6.0, mean=7.0, sd=0.5)

# PLEASE ANSWER LAB 4, QUESTION #14, NOW:
# 14. What is the log-likelihood of my height?
log(dnorm(x=6.0, mean=7.0, sd=0.5))

# PLEASE ANSWER LAB 4, QUESTION #15, NOW:
# 15. If we change the model of my population from mean height=7
#     feet to mean height=6 feet, will the likelihood of my 6-foot
#     height go up or down?

# PLEASE ANSWER LAB 4, QUESTION #16, NOW:
# 16. What is the raw likelihood of my height, assuming I am a
#     random draw from a population that is 6 feet tall, with
#     a standard deviation of 0.5 feet?
dnorm(x=6.0, mean=6.0, sd=0.5)

# NOTE: The "d" in "dnorm" refers to probability density. For continuous data,
# the probability of any exact value is 0, because there are an infinite number
# of possible values for a continuous variable. So we talk about the probability
# density instead of the exact probability (or probability mass). Probability
# densities work the same as probability masses for our purposes.
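
# A quick illustration of that NOTE (a sketch): the probability of
# landing in a narrow interval is approximately the density times
# the interval width. Compare:
dnorm(x=6.0, mean=6.0, sd=0.5) * 0.01
pnorm(q=6.005, mean=6.0, sd=0.5) - pnorm(q=5.995, mean=6.0, sd=0.5)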
# PLEASE ANSWER LAB 4, QUESTION #17, NOW:
# 17. What is the log-likelihood corresponding to the Question 16
#     likelihood?

# PLEASE ANSWER LAB 4, QUESTION #18, NOW:
# 18. What shape does dnorm() make, for a variety of
#     height values? Try:
height_values = seq(4, 8, 0.01)
probability_densities = dnorm(x=height_values, mean=6.0, sd=0.5)
plot(x=height_values, y=probability_densities)

#######################################################
# LAB 4 EXERCISE: PLEASE FILL OUT THE EXERCISE ON CANVAS.
# IT IS DUE BY 11:59 PM, SATURDAY, APRIL 4.
#######################################################

#######################################################
# LAB 5: WHEN YOU FINISH LAB 4, WHICH WAS JUST
# INTRODUCTORY MATERIAL, YOU ARE WELCOME TO MOVE
# ON TO LAB 5, WHEN I POST IT. IT WILL INVOLVE
# SIR MODELS OF EPIDEMICS, AND SO IT WILL BE MUCH MORE
# INTERACTIVE.
#######################################################