jacobcvt12/abc_convergence.R

## abc_convergence.R
set.seed(1) # reproducibility

S <- 1000 # number of desired samples from posterior

theta_true <- 3.4 # true mean
n <- 1000 # number of data points
y <- rnorm(n, theta_true) # observed data

theta <- rnorm(S, 0, 10) # allocate chain for mean

s <- 1 # sample tracker
j <- 1 # iteration tracker (just FYI)

epsilon <- 0.5

while (s <= S) {
    # 1. Draw theta_i ~ pi(theta)
    theta_i <- rnorm(1, 0, 10)

    # 2. Simulate x_i ~ p(x | theta_i)
    x_i <- rnorm(n, theta_i)

    # 3. Reject theta_i if rho(S(x_i), S(y)) > epsilon
    if (abs(mean(x_i) - mean(y)) < epsilon) {
        theta[s] <- theta_i
        s <- s + 1
    }

    j <- j + 1
}

# MCMC chain plot
ts.plot(theta)

# posterior summaries
quantile(theta, c(0.025, 0.5, 0.975))

# algorithm 3

theta <- rnorm(S, 0, 10) # allocate chain for mean
theta_t <- theta[1]  # initialize by sampling from prior

j <- 1 # acceptance tracker (just FYI)

epsilon <- 0.5
delta <- 2 # proposal density SD

for (s in 2:S) {
    # 1. Simulate theta_prime ~ F(theta_t, delta)
    theta_prime <- rnorm(1, theta_t, delta)

    # 2. Simulate x ~ p(x | theta_prime)
    x <- rnorm(n, theta_prime)

    # 3. If rho(S(x), S(y)) < epsilon
    if (abs(mean(x) - mean(y)) < epsilon) {
        # a. u ~ U(0, 1)
        u <- log(runif(1, 0, 1))

        # b. if u...
        alpha <- dnorm(theta_prime, 0, 10, log=TRUE) -
                 dnorm(theta_t, 0, 10, log=TRUE)

        if (u < alpha) {
            theta[s] <- theta_prime
            theta_t <- theta_prime
            j <- j + 1 # track acceptance
        } else {
            theta[s] <- theta_t
        }
    }

    theta[s] <- theta_t
}

# MCMC chain plot
ts.plot(theta)

# acceptance ratio
j / S

# posterior summaries
quantile(theta, c(0.025, 0.5, 0.975))

## abc_convergence.Rmd
---
title: "ABC and Convergence of MCMC"
author: "Jacob Carey"
date: \today
output:
    beamer_presentation:
        toc: true
        theme: "Szeged"
        slide_level: 2
        fig_caption: no
---

# Approximate Bayesian Computation

## Posterior Approximation

>- Likelihood: $P(X | \theta)$

>- Prior: $P(\theta)$

>- Posterior: $P(\theta | X) \propto P(X | \theta) P(\theta)$

## Example - Normal Model with Known Variance

- Likelihood: $X \sim \text{Normal}(\mu)$ ($\sigma^2$ known)

- Prior: $\mu \sim \text{Normal}(\mu_0, \sigma^2_0)$

- Posterior: $P(\mu | X)$

## Metropolis-Hastings (Type of MCMC Algorithm)

>- Generic (nearly) always works instance of MCMC

>- Easy to implement

>- Draw $\theta_{t+1} \sim F(\theta | \theta_t, \delta)$ ($F$ is proposal distribution)

>- Accept $\theta_{t+1}$ with probability $\alpha=P(X | \theta_{t+1})P(\theta_{t+1})/(P(X | \theta_t)P(\theta_t))$

## MCMC Overview

- Draw $S$ samples from $P(\theta | X)$

- Repeat $L$ times (creating $L$ chains of $S$ samples each)

- Compare each set of $S$ samples to check convergence (next section!)

- Combine posterior samples from chains, determine posterior summaries (e.g. mean of samples for posterior mean), publish

## Likelihood

>- Sometimes, likelihood is computationally difficult

>- Sometimes, it's not really possible to formulate a likelihood

>- This is where Approximate Bayesian Computation (ABC) can help

>- Also called Likelihood-free method

>- Note: is generally slower than MH

## ABC Alg. 1

- Given observation $y$, repeat until $S$ points have been accepted

>- Draw $\theta_i \sim p(\theta)$

>- Simulate $X_i \sim p(x | \theta_i)$

>- Accept $\theta_i$ if $X_i = y$

## ABC Alg. 2

- Given observation $y$, repeat until $S$ points have been accepted

>- Draw $\theta_i \sim p(\theta)$

>- Simulate $X_i \sim p(x | \theta_i)$

>- Accept $\theta_i$ if $\rho(S(X_i), S(y)) < \epsilon$

## ABC Alg. 2 Example

See code

## ABC Alg. 3 (MCMC-ABC)

>- Draw $\theta_{t+1} \sim F(\theta | \theta_t, \delta)$ ($F$ is proposal distribution)

>- Simulate $X_i \sim p(x | \theta_i)$

>- If $\theta_i$ if $\rho(S(X_i), S(y)) < \epsilon$

>- Accept $\theta_{t+1}$ with probability $\alpha=P(\theta_{t+1})/P(\theta_t)$

>- If rejected at any point, retain previous value

## ABC Alg. 3 Example

See code

## ABC Further

- Many improvements that can be made

- One of which is assigning a prior to epsilon

- Others involve SMC

- See review by Beaumont (2010) for more information

# Convergence

## MCMC Overview

- Draw $S$ samples from $P(\theta | X)$

- Repeat $L$ times (creating $L$ chains of $S$ samples each)

- Compare each set of $S$ samples to check convergence (this section!)

- Combine posterior samples from chains, determine posterior summaries (e.g. mean of samples for posterior mean), publish

## Gelman-Rubin Diagnostic (Estimated Potential Scale Reduction Factor)

- Compute m independent Markov chains
- Compares variance of each chain to pooled variance
- Provides estimate of how much variance could be reduced by running chains longer

http://astrostatistics.psu.edu/RLectures/diagnosticsMCMC.pdf

## R implementation (Convergence)

```{r gelm_diag}
library(coda); set.seed(1)
x <- mcmc.list(mcmc(rnorm(1000)),
               mcmc(rnorm(1000)),
               mcmc(rnorm(1000)))
gelman.diag(x)
```

## R implementation (Not converged)

```{r gelm_bad}
x <- mcmc.list(mcmc(rnorm(1000, 1)),
               mcmc(rnorm(1000, 2)),
               mcmc(rnorm(1000, 1)))
gelman.diag(x)
```

## Gelman-Rubin with CoGAPS

```{r cogaps, message=FALSE, warning=FALSE, results='hide'}
library(CoGAPS)
data(SimpSim)
A <- vector("list", 3)
atoms <- vector("list", 3)
for (l in 1:3) {
  out <- gapsRun(SimpSim.D, SimpSim.S, nFactor=3,
                 numSnapshots=1000)
  A[[l]] <- out$ASnapshots
  atoms[[l]] <- mcmc(out$atomsASamp)
}
```

```{r cogaps-clean, echo=FALSE}
atoms_mcmc <- mcmc.list(atoms)
A_11 <- mcmc.list(mcmc(A[[1]][1, 1, ]),
                  mcmc(A[[2]][1, 1, ]),
                  mcmc(A[[3]][1, 1, ]))
```

## Checking Convergence

```{r convergence}
gelman.diag(atoms_mcmc)
gelman.diag(A_11)
```
	set.seed(1) # reproducibility

	S <- 1000 # number of desired samples from posterior

	theta_true <- 3.4 # true mean
	n <- 1000 # number of data points
	y <- rnorm(n, theta_true) # observed data

	theta <- rnorm(S, 0, 10) # allocate chain for mean

	s <- 1 # sample tracker
	j <- 1 # iteration tracker (just FYI)

	epsilon <- 0.5

	while (s <= S) {
	# 1. Draw theta_i ~ pi(theta)
	theta_i <- rnorm(1, 0, 10)

	# 2. Simulate x_i ~ p(x \| theta_i)
	x_i <- rnorm(n, theta_i)

	# 3. Reject theta_i if rho(S(x_i), S(y)) > epsilon
	if (abs(mean(x_i) - mean(y)) < epsilon) {
	theta[s] <- theta_i
	s <- s + 1
	}

	j <- j + 1
	}

	# MCMC chain plot
	ts.plot(theta)

	# posterior summaries
	quantile(theta, c(0.025, 0.5, 0.975))

	# algorithm 3

	theta <- rnorm(S, 0, 10) # allocate chain for mean
	theta_t <- theta[1] # initialize by sampling from prior

	j <- 1 # acceptance tracker (just FYI)

	epsilon <- 0.5
	delta <- 2 # proposal density SD

	for (s in 2:S) {
	# 1. Simulate theta_prime ~ F(theta_t, delta)
	theta_prime <- rnorm(1, theta_t, delta)

	# 2. Simulate x ~ p(x \| theta_prime)
	x <- rnorm(n, theta_prime)

	# 3. If rho(S(x), S(y)) < epsilon
	if (abs(mean(x) - mean(y)) < epsilon) {
	# a. u ~ U(0, 1)
	u <- log(runif(1, 0, 1))

	# b. if u...
	alpha <- dnorm(theta_prime, 0, 10, log=TRUE) -
	dnorm(theta_t, 0, 10, log=TRUE)

	if (u < alpha) {
	theta[s] <- theta_prime
	theta_t <- theta_prime
	j <- j + 1 # track acceptance
	} else {
	theta[s] <- theta_t
	}
	}

	theta[s] <- theta_t
	}

	# MCMC chain plot
	ts.plot(theta)

	# acceptance ratio
	j / S

	# posterior summaries
	quantile(theta, c(0.025, 0.5, 0.975))
	---
	title: "ABC and Convergence of MCMC"
	author: "Jacob Carey"
	date: \today
	output:
	beamer_presentation:
	toc: true
	theme: "Szeged"
	slide_level: 2
	fig_caption: no
	---

	# Approximate Bayesian Computation

	## Posterior Approximation

	>- Likelihood: $P(X \| \theta)$

	>- Prior: $P(\theta)$

	>- Posterior: $P(\theta \| X) \propto P(X \| \theta) P(\theta)$

	## Example - Normal Model with Known Variance

	- Likelihood: $X \sim \text{Normal}(\mu)$ ($\sigma^2$ known)

	- Prior: $\mu \sim \text{Normal}(\mu_0, \sigma^2_0)$

	- Posterior: $P(\mu \| X)$

	## Metropolis-Hastings (Type of MCMC Algorithm)

	>- Generic (nearly) always works instance of MCMC

	>- Easy to implement

	>- Draw $\theta_{t+1} \sim F(\theta \| \theta_t, \delta)$ ($F$ is proposal distribution)

	>- Accept $\theta_{t+1}$ with probability $\alpha=P(X \| \theta_{t+1})P(\theta_{t+1})/(P(X \| \theta_t)P(\theta_t))$

	## MCMC Overview

	- Draw $S$ samples from $P(\theta \| X)$

	- Repeat $L$ times (creating $L$ chains of $S$ samples each)

	- Compare each set of $S$ samples to check convergence (next section!)

	- Combine posterior samples from chains, determine posterior summaries (e.g. mean of samples for posterior mean), publish

	## Likelihood

	>- Sometimes, likelihood is computationally difficult

	>- Sometimes, it's not really possible to formulate a likelihood

	>- This is where Approximate Bayesian Computation (ABC) can help

	>- Also called Likelihood-free method

	>- Note: is generally slower than MH

	## ABC Alg. 1

	- Given observation $y$, repeat until $S$ points have been accepted

	>- Draw $\theta_i \sim p(\theta)$

	>- Simulate $X_i \sim p(x \| \theta_i)$

	>- Accept $\theta_i$ if $X_i = y$

	## ABC Alg. 2

	- Given observation $y$, repeat until $S$ points have been accepted

	>- Draw $\theta_i \sim p(\theta)$

	>- Simulate $X_i \sim p(x \| \theta_i)$

	>- Accept $\theta_i$ if $\rho(S(X_i), S(y)) < \epsilon$

	## ABC Alg. 2 Example

	See code

	## ABC Alg. 3 (MCMC-ABC)

	>- Draw $\theta_{t+1} \sim F(\theta \| \theta_t, \delta)$ ($F$ is proposal distribution)

	>- Simulate $X_i \sim p(x \| \theta_i)$

	>- If $\theta_i$ if $\rho(S(X_i), S(y)) < \epsilon$

	>- Accept $\theta_{t+1}$ with probability $\alpha=P(\theta_{t+1})/P(\theta_t)$

	>- If rejected at any point, retain previous value

	## ABC Alg. 3 Example

	See code

	## ABC Further

	- Many improvements that can be made

	- One of which is assigning a prior to epsilon

	- Others involve SMC

	- See review by Beaumont (2010) for more information

	# Convergence

	## MCMC Overview

	- Draw $S$ samples from $P(\theta \| X)$

	- Repeat $L$ times (creating $L$ chains of $S$ samples each)

	- Compare each set of $S$ samples to check convergence (this section!)

	- Combine posterior samples from chains, determine posterior summaries (e.g. mean of samples for posterior mean), publish

	## Gelman-Rubin Diagnostic (Estimated Potential Scale Reduction Factor)

	- Compute m independent Markov chains
	- Compares variance of each chain to pooled variance
	- Provides estimate of how much variance could be reduced by running chains longer

	http://astrostatistics.psu.edu/RLectures/diagnosticsMCMC.pdf

	## R implementation (Convergence)

	```{r gelm_diag}
	library(coda); set.seed(1)
	x <- mcmc.list(mcmc(rnorm(1000)),
	mcmc(rnorm(1000)),
	mcmc(rnorm(1000)))
	gelman.diag(x)
	```

	## R implementation (Not converged)

	```{r gelm_bad}
	x <- mcmc.list(mcmc(rnorm(1000, 1)),
	mcmc(rnorm(1000, 2)),
	mcmc(rnorm(1000, 1)))
	gelman.diag(x)
	```

	## Gelman-Rubin with CoGAPS

	```{r cogaps, message=FALSE, warning=FALSE, results='hide'}
	library(CoGAPS)
	data(SimpSim)
	A <- vector("list", 3)
	atoms <- vector("list", 3)
	for (l in 1:3) {
	out <- gapsRun(SimpSim.D, SimpSim.S, nFactor=3,
	numSnapshots=1000)
	A[[l]] <- out$ASnapshots
	atoms[[l]] <- mcmc(out$atomsASamp)
	}
	```

	```{r cogaps-clean, echo=FALSE}
	atoms_mcmc <- mcmc.list(atoms)
	A_11 <- mcmc.list(mcmc(A[[1]][1, 1, ]),
	mcmc(A[[2]][1, 1, ]),
	mcmc(A[[3]][1, 1, ]))
	```

	## Checking Convergence

	```{r convergence}
	gelman.diag(atoms_mcmc)
	gelman.diag(A_11)
	```