chrishanretty/multivariate_normal_approximation.md

## multivariate_normal_approximation.md

      
    Raw
  

              multivariate_normal_approximation.md
            
          
    % Constrained multivariate normals in Stan
% Chris Hanretty
% 2014/02/27
Suppose we are interested in recovered the location parameters of a multivariate normal distribution with a known covariance matrix.
Specifically, suppose we are interested in recovering the location parameters given a covariance matrix that forces draws from this multivariate normal to sum to one.
This would be useful for recovering information about `true' latent levels of party support given manifest polling data.
In this case, we could use polling figures ($y_i$ for $i=1,...,I$) to construct a covariance matrix of the following form:
$$
\Sigma =
\begin{pmatrix}
y_1(1-y_1) & -y_1y_2 & \cdots &  -y_1y_I \\
-y_2y_1 & y_2(1-y_2) & \cdots & -y_2y_I \\
\vdots  & \vdots  & \ddots & \vdots  \\
-y_Iy_1 & -y_Iy_2 & \cdots & y_I(1-y_I)
\end{pmatrix}
$$
We could then pass this to Stan. However, Stan turns out to be very strict about the positive-definiteness of the covariance matrices it receives.
This note is an aide memoire that this strictness can cause seemingly valid code to choke.
We begin by loading all the libraries we need for this note.
Here, $corpcor$ is used for a utility function which adjusts matrices so that they become positive definite.
$MCMCpack$ is used to generate samples from a Dirichlet distribution which we will use as if they were polling data.
library(corpcor)
library(MCMCpack)
library(MASS)
library(rstan)
set.seed(1982)
We can now generate some combinations of party support -- about fifty realizations for eight parties should be enough.
nSims <- 50
nParties <- 8
shape <- rdirichlet(1, rep(1, nParties))
shape
##         [,1]    [,2]   [,3]    [,4]    [,5]  [,6]    [,7]    [,8]
## [1,] 0.08621 0.03882 0.0337 0.02104 0.07379 0.673 0.02248 0.05096

party.means <- rdirichlet(nSims, shape)
summary(party.means)
##        V1               V2               V3               V4        
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0002   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.1009   Mean   :0.0371   Mean   :0.0395   Mean   :0.0189  
##  3rd Qu.:0.0844   3rd Qu.:0.0001   3rd Qu.:0.0001   3rd Qu.:0.0000  
##  Max.   :0.9205   Max.   :0.5739   Max.   :0.7923   Max.   :0.4899  
##        V5               V6               V7               V8        
##  Min.   :0.0000   Min.   :0.0404   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.3976   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.8040   Median :0.0000   Median :0.0000  
##  Mean   :0.0514   Mean   :0.6775   Mean   :0.0377   Mean   :0.0370  
##  3rd Qu.:0.0256   3rd Qu.:0.9679   3rd Qu.:0.0000   3rd Qu.:0.0105  
##  Max.   :0.7460   Max.   :1.0000   Max.   :0.9399   Max.   :0.6055

We now create the covariance matrices described above, and create a separate covariance matrix for each realization of party support.
These we save in the list $res$.
compCov <- function(x) {
    retval <- outer(x, -x, "*")
    diag(retval) <- x * (1 - x)
    return(retval)
}
## And now apply to each draw
res <- vector("list", nSims)
for (i in 1:nSims) {
    res[[i]] <- compCov(party.means[i, ])
}
We can draw multivariate normal samples using these covariance matrices without any problems in R.
Here is just such a draw, and a test to show that the draws are all (approximately) equal to one.
out <- vector("list", nSims)
for (i in 1:nSims) {
    out[[i]] <- mvrnorm(n = nSims, mu = party.means[i, ], Sigma = res[[i]])
}
## Do the samples sum to one ?
sum1.test <- unlist(lapply(out, function(x) all.equal(rowSums(x), rep(1, nrow(x)))))
table(sum1.test)
## sum1.test
## Mean relative difference: 1.718e-08 Mean relative difference: 1.814e-08 
##                                   1                                   1 
##                                TRUE 
##                                  48

The problem arises because not all of these matrices are strictly positive definite for certain tolerance thresholds.
MASS's $mvrnorm$ command seems to accept all of these as positive definite.
Other packages do not.
We can see this by applying $is.positive.definite$ from the $corpcor$ package.
## Are all covariance matrices positive-definite?
PosDef.test <- unlist(lapply(res, is.positive.definite))
table(PosDef.test)
## PosDef.test
## FALSE 
##    50

We must therefore transform these matrices so that they become positive definite for certain tolerance thresholds.
Here, I create two lists of positive definite matrices.
res2 <- lapply(res, make.positive.definite)
res3 <- lapply(res, make.positive.definite, tol = 1e-05)
### How close are these?
cor(unlist(res), unlist(res2))
## [1] 1

cor(unlist(res), unlist(res3))
## [1] 1

cor(unlist(res2), unlist(res3))
## [1] 1

The problem is that Stan will not accept the first of these two lists, because it is too picky.
Here is some Stan code that will try and recover the means of these realizations.
stan_code <- '
	data{
		int<lower=0> k; // size of vectors
		int<lower=0> nObs; // 
		vector[k] y[nObs]; // 
		cov_matrix[k] Sigma[nObs];
	}
	parameters {
		simplex[k] mu;
	}
	model {
		for (n in 1:nObs)
			y[n] ~ multi_normal(mu, Sigma[n]);
	}
'
We can estimate this by passing the following data to R.
forStan <- list(k = nParties, nObs = nSims, Sigma = res2, y = party.means)
fit <- stan(model_code = stan_code, data = forStan, iter = 2000, chains = 2)
### Error: Invalid value of Sigma: Error in function
### model32aa4661e444_stan_code_namespace::model32aa4661e444_stan_code(d):
### Sigma[k0__]Sigma[k0__] is not positive definite. Sigma[k0__](0,0) is
### 0.00028837592811670703.  failed to create the sampler; sampling not done
... but we would get the message shown above.
We must therefore pass the second list of covariance matrices, which Stan just about accepts.
### Try with a slightly different set of cov. mats
forStan <- list(k = nParties, nObs = nSims, Sigma = res3, y = party.means)
fit <- stan(model_code = stan_code, data = forStan, iter = 2000, chains = 2)
## 
## TRANSLATING MODEL 'stan_code' FROM Stan CODE TO C++ CODE NOW.
## COMPILING THE C++ CODE FOR MODEL 'stan_code' NOW.
## SAMPLING FOR MODEL 'stan_code' NOW (CHAIN 1).
## 
Iteration:    1 / 2000 [  0%]  (Warmup)
Iteration:  200 / 2000 [ 10%]  (Warmup)
Iteration:  400 / 2000 [ 20%]  (Warmup)
Iteration:  600 / 2000 [ 30%]  (Warmup)
Iteration:  800 / 2000 [ 40%]  (Warmup)
Iteration: 1000 / 2000 [ 50%]  (Warmup)
Iteration: 1200 / 2000 [ 60%]  (Sampling)
Iteration: 1400 / 2000 [ 70%]  (Sampling)
Iteration: 1600 / 2000 [ 80%]  (Sampling)
Iteration: 1800 / 2000 [ 90%]  (Sampling)
Iteration: 2000 / 2000 [100%]  (Sampling)
## Elapsed Time: 58.52 seconds (Warm-up)
##               50.62 seconds (Sampling)
##               109.14 seconds (Total)
## 
## SAMPLING FOR MODEL 'stan_code' NOW (CHAIN 2).
## 
Iteration:    1 / 2000 [  0%]  (Warmup)
Iteration:  200 / 2000 [ 10%]  (Warmup)
Iteration:  400 / 2000 [ 20%]  (Warmup)
Iteration:  600 / 2000 [ 30%]  (Warmup)
Iteration:  800 / 2000 [ 40%]  (Warmup)
Iteration: 1000 / 2000 [ 50%]  (Warmup)
Iteration: 1200 / 2000 [ 60%]  (Sampling)
Iteration: 1400 / 2000 [ 70%]  (Sampling)
Iteration: 1600 / 2000 [ 80%]  (Sampling)
Iteration: 1800 / 2000 [ 90%]  (Sampling)
Iteration: 2000 / 2000 [100%]  (Sampling)
## Elapsed Time: 63.05 seconds (Warm-up)
##               81.29 seconds (Sampling)
##               144.34 seconds (Total)

In this instance, because there's almost no information, Stan ends up with one mode where mu[6] = 1;
fit
## Inference for Stan model: stan_code.
## 2 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=2000.
## 
##         mean se_mean  sd   2.5%  25%    50%    75%  97.5% n_eff Rhat
## mu[1]    0.0     0.0 0.0    0.0    0    0.0    0.0    0.0  1304    1
## mu[2]    0.0     0.0 0.0    0.0    0    0.0    0.0    0.0  1082    1
## mu[3]    0.0     0.0 0.0    0.0    0    0.0    0.0    0.0  1311    1
## mu[4]    0.0     0.0 0.0    0.0    0    0.0    0.0    0.0  1404    1
## mu[5]    0.0     0.0 0.0    0.0    0    0.0    0.0    0.0  1500    1
## mu[6]    1.0     0.0 0.0    1.0    1    1.0    1.0    1.0  1276    1
## mu[7]    0.0     0.0 0.0    0.0    0    0.0    0.0    0.0  1541    1
## mu[8]    0.0     0.0 0.0    0.0    0    0.0    0.0    0.0  1380    1
## lp__  -111.9     0.1 2.1 -117.1 -113 -111.6 -110.3 -108.9   433    1
## 
## Samples were drawn using NUTS(diag_e) at Thu Feb 27 15:20:51 2014.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).