SirmaXX/solution.R

## gistfile1.txt
---
title: "R Notebook"
output: html_notebook
---


# Chapter 10: Canonical Correlation Analysis (Ch. 10, p. 539)

This chapter builds on the partitioning of the covariance matrix (Ch. 2, p. 73).

Canonical correlation analysis (CCA), developed by Hotelling, seeks to identify and quantify the associations between two sets of variables.
The two sets are denoted $X^{(1)}$ and $X^{(2)}$.

CCA focuses on the correlation between a linear combination of the variables in one set and a linear combination of the variables in the other set.

The idea in CCA is first to determine the pair of linear combinations having the largest correlation.
(Possible exam question: what is the purpose of CCA?)

Next, we determine the pair of linear combinations having the largest correlation among all pairs uncorrelated with the initially selected pair, and so on.


The pairs of linear combinations are called the **canonical variables**, and their correlations are called the **canonical correlations**.


## Canonical Variates and Canonical Correlations

For two random vectors $\mathbf{X}^{(1)}$ and $\mathbf{X}^{(2)}$, the mean vectors are $E(\mathbf{X}^{(1)})=\boldsymbol{\mu}^{(1)}$ and $E(\mathbf{X}^{(2)})=\boldsymbol{\mu}^{(2)}$.
(The covariance matrices involved must have full rank, because otherwise they cannot be inverted.)


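The covariance matrix of $\mathbf{X}^{(1)}$ and $\mathbf{X}^{(2)}$ taken jointly is partitioned as

$$
\boldsymbol{\Sigma}=
\begin{bmatrix}
\Sigma_{11} & \Sigma_{12} \\
\Sigma_{21} & \Sigma_{22}
\end{bmatrix}
$$

where the blocks are the separate covariances
$\operatorname{Cov}(\mathbf{X}^{(1)})=\Sigma_{11}$, $\operatorname{Cov}(\mathbf{X}^{(2)})=\Sigma_{22}$, and $\operatorname{Cov}(\mathbf{X}^{(1)},\mathbf{X}^{(2)})=\Sigma_{12}=\Sigma_{21}'$.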

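The stacked data vector is

$\mathbf{X}= [\mathbf{X}^{(1)\prime},\mathbf{X}^{(2)\prime}]'=[X_1^{(1)},X_2^{(1)},\dots,X_p^{(1)} \mid X_1^{(2)},X_2^{(2)},\dots,X_q^{(2)}]'$

and its mean vector is

$E(\mathbf{X})=\boldsymbol{\mu}= [\boldsymbol{\mu}^{(1)\prime},\boldsymbol{\mu}^{(2)\prime}]'$, a $(p+q)\times 1$ vector, where $p$ and $q$ are the numbers of variables in the first and second set.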


Linear combinations provide simple summary measures of a set of variables.

Set $U=a' X^{(1)}$ and $V=b' X^{(2)}$ for some pair of coefficient vectors $\mathbf{a}$ and $\mathbf{b}$. Then we obtain the variances and the covariance as
(these calculations come from Chapter 2)

$\operatorname{Var}(U)=\operatorname{Cov}(a' X^{(1)})= a' \Sigma_{11} a$

$\operatorname{Var}(V)=\operatorname{Cov}(b' X^{(2)})= b' \Sigma_{22} b$

$\operatorname{Cov}(U,V)=a' \Sigma_{12} b$

$\operatorname{Corr}(U,V)=\frac{a' \Sigma_{12} b}{\sqrt{a' \Sigma_{11} a}\sqrt{b' \Sigma_{22} b}}$
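As a minimal sketch of the correlation formula above, the following R snippet evaluates $\operatorname{Corr}(U,V)$ for given coefficient vectors. It reuses the 2×2 correlation blocks from the example code in solution.R as illustrative inputs. The helper name `corr_uv` and the coefficient vectors are assumptions chosen only for illustration.

```r
# Correlation blocks taken from the example code in solution.R.
S11 <- matrix(c(1.0, 0.4,
                0.4, 1.0), ncol = 2, byrow = TRUE)
S22 <- matrix(c(1.0, 0.2,
                0.2, 1.0), ncol = 2, byrow = TRUE)
S12 <- matrix(c(0.5, 0.6,
                0.3, 0.4), ncol = 2, byrow = TRUE)

# Hypothetical helper: Corr(U, V) for U = a'X1 and V = b'X2.
corr_uv <- function(a, b, S11, S12, S22) {
  cov_uv <- drop(t(a) %*% S12 %*% b)  # a' S12 b
  var_u  <- drop(t(a) %*% S11 %*% a)  # a' S11 a
  var_v  <- drop(t(b) %*% S22 %*% b)  # b' S22 b
  cov_uv / sqrt(var_u * var_v)
}

# Arbitrary illustrative coefficient vectors (assumptions).
r_first  <- corr_uv(c(1, 0), c(1, 0), S11, S12, S22)  # first variable of each set
r_sum    <- corr_uv(c(1, 1), c(1, 1), S11, S12, S22)  # sum of each set
r_scaled <- corr_uv(c(2, 2), c(3, 3), S11, S12, S22)  # rescaled version of r_sum
```

Rescaling `a` or `b` by a positive constant leaves the correlation unchanged (`r_scaled` equals `r_sum`), which is why the canonical variables are normalized to unit variance.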


Note: review what a scalar, a matrix, and a vector are.
When there are several linear combinations, the correlations form a matrix.

The correlation should be as large as possible.

### We define the following

The first pair of canonical variables (canonical variates) is the pair of linear combinations $U_{1}$, $V_{1}$ having unit variances which maximizes the correlation.

The second pair of canonical variables is the pair of linear combinations $U_2$, $V_{2}$ having unit variances which maximizes the correlation among all choices that are uncorrelated with the first pair of canonical variables.

....


At the $k^{th}$ step:

The $k^{th}$ pair of canonical variables is the pair of linear combinations $U_k$, $V_{k}$ having unit variances which maximizes the correlation among all choices uncorrelated with the previous $k-1$ canonical variable pairs.

Memorize the definitions (canonical variables, canonical correlation, etc.); they may be asked on the exam.

The correlation between the $k^{th}$ pair of canonical variables is called the $k^{th}$ canonical correlation.

(In CCA the goal is the largest possible correlation, i.e. the maximum of $\operatorname{Corr}(U,V)$.)


$\max_{a,b} \operatorname{Corr}(U,V)=\rho_1^{*}$ is attained by the linear combinations $U_1=a'X^{(1)}$ and $V_1=b'X^{(2)}$.

(Compare with standard PCA.)

Note: to obtain the coefficient vectors $a$, find the eigenvectors.


The spectral decomposition $A=\sum_{i=1}^{k} \lambda_i e_i e_i'$ helps to find the inverse of a covariance matrix:

$A^{-1}=\sum_{i=1}^{k} \frac{1}{\lambda_i} e_i e_i'$
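The spectral decomposition and the inverse built from it can be checked numerically. The sketch below assumes a small symmetric positive definite matrix (the first correlation block from the example code) and a hypothetical helper `spectral_sum`. The same idea also gives $A^{-1/2}$, by replacing $1/\lambda_i$ with $\lambda_i^{-1/2}$.

```r
# Illustrative symmetric positive definite matrix (an assumption; it matches
# the first correlation block used in the example code).
A <- matrix(c(1.0, 0.4,
              0.4, 1.0), ncol = 2, byrow = TRUE)

eig    <- eigen(A, symmetric = TRUE)
lambda <- eig$values   # eigenvalues lambda_i
E      <- eig$vectors  # columns are the eigenvectors e_i

# Hypothetical helper: sum_i g(lambda_i) * e_i e_i'
spectral_sum <- function(g) {
  terms <- lapply(seq_along(lambda),
                  function(i) g(lambda[i]) * outer(E[, i], E[, i]))
  Reduce(`+`, terms)
}

A_rebuilt  <- spectral_sum(function(l) l)         # recovers A
A_inv      <- spectral_sum(function(l) 1 / l)     # A^{-1}
A_inv_sqrt <- spectral_sum(function(l) l^(-0.5))  # A^{-1/2}
```

Here `A_inv` agrees with `solve(A)`, and multiplying `A_inv_sqrt` by itself gives `A_inv` again.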

$U_{1}=e_{1}' \Sigma_{11}^{-1/2} X^{(1)}$

$V_{1}=f_{1}' \Sigma_{22}^{-1/2} X^{(2)}$

Here $\rho_1^{*2} \ge \rho_2^{*2} \ge \dots \ge \rho_p^{*2}$ are the eigenvalues of $\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2}$ (the canonical correlations $\rho_k^{*}$ are their square roots),
and $e_1,e_2,\dots,e_p$ are the corresponding eigenvectors.

$f_{1},f_{2},\dots,f_{p}$ are the eigenvectors of $\Sigma_{22}^{-1/2}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1/2}$ corresponding to the same eigenvalues.

Each $f_i$ is proportional to the matrix-vector product $\Sigma_{22}^{-1/2}\Sigma_{21}\Sigma_{11}^{-1/2}\, e_i$.
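The eigenvalue recipe above can be sketched in R as follows. The correlation blocks are those of the example code in solution.R. The helper `mat_pow` and all variable names are assumptions for illustration, not part of the original notes.

```r
S11 <- matrix(c(1.0, 0.4,
                0.4, 1.0), ncol = 2, byrow = TRUE)
S22 <- matrix(c(1.0, 0.2,
                0.2, 1.0), ncol = 2, byrow = TRUE)
S12 <- matrix(c(0.5, 0.6,
                0.3, 0.4), ncol = 2, byrow = TRUE)
S21 <- t(S12)

# Hypothetical helper: power of a symmetric matrix via its spectral decomposition.
mat_pow <- function(S, n) {
  eig <- eigen(S, symmetric = TRUE)
  eig$vectors %*% diag(eig$values^n, nrow = nrow(S)) %*% t(eig$vectors)
}
S11_inv_sqrt <- mat_pow(S11, -0.5)
S22_inv_sqrt <- mat_pow(S22, -0.5)

# The two matrices whose eigenvectors are the e_i and the f_i.
M1 <- S11_inv_sqrt %*% S12 %*% solve(S22) %*% S21 %*% S11_inv_sqrt
M2 <- S22_inv_sqrt %*% S21 %*% solve(S11) %*% S12 %*% S22_inv_sqrt
eig1 <- eigen(M1, symmetric = TRUE)
eig2 <- eigen(M2, symmetric = TRUE)

rho <- sqrt(eig1$values)  # canonical correlations (square roots of the eigenvalues)
e1  <- eig1$vectors[, 1]
f1  <- eig2$vectors[, 1]

# f1 is proportional to S22^{-1/2} S21 S11^{-1/2} e1; normalize to unit length.
f1_from_e1 <- drop(S22_inv_sqrt %*% S21 %*% S11_inv_sqrt %*% e1)
f1_from_e1 <- f1_from_e1 / sqrt(sum(f1_from_e1^2))

# Coefficient vectors of the first canonical pair and their correlation.
a1 <- drop(S11_inv_sqrt %*% e1)
b1 <- drop(S22_inv_sqrt %*% f1_from_e1)
corr_1 <- drop(t(a1) %*% S12 %*% b1) /
  sqrt(drop(t(a1) %*% S11 %*% a1) * drop(t(b1) %*% S22 %*% b1))
```

Both matrices share the same eigenvalues, `f1` and `f1_from_e1` agree up to sign, and `corr_1` reproduces `rho[1]`. Building $b_1$ from `f1_from_e1` rather than from `f1` fixes the sign so that the correlation comes out positive.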


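## Uncorrelated canonical variates

The canonical variates have unit variances, $\operatorname{Var}(U_k)=\operatorname{Var}(V_k)=1$, and satisfy

1. $\operatorname{Cov}(U_k,U_l)=0 \quad k\neq l$
2. $\operatorname{Cov}(V_k,V_l)=0 \quad k\neq l$
3. $\operatorname{Cov}(U_k,V_l)=0 \quad k\neq l$

for $k,l=1,\dots,p$.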
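These properties can be checked numerically for all pairs at once. The sketch below takes an equivalent route that is an assumption of this example rather than the method of the notes: it uses the singular value decomposition of $K=\Sigma_{22}^{-1/2}\Sigma_{21}\Sigma_{11}^{-1/2}$, whose right and left singular vectors play the roles of the $e_k$ and $f_k$ and whose singular values are the canonical correlations. The correlation blocks come from the example code; the helper `inv_sqrt` is hypothetical.

```r
S11 <- matrix(c(1.0, 0.4,
                0.4, 1.0), ncol = 2, byrow = TRUE)
S22 <- matrix(c(1.0, 0.2,
                0.2, 1.0), ncol = 2, byrow = TRUE)
S12 <- matrix(c(0.5, 0.6,
                0.3, 0.4), ncol = 2, byrow = TRUE)

# Hypothetical helper: inverse square root of a symmetric positive definite matrix.
inv_sqrt <- function(S) {
  eig <- eigen(S, symmetric = TRUE)
  eig$vectors %*% diag(eig$values^(-0.5), nrow = nrow(S)) %*% t(eig$vectors)
}

K  <- inv_sqrt(S22) %*% t(S12) %*% inv_sqrt(S11)
sv <- svd(K)  # K = u diag(d) v'

A <- inv_sqrt(S11) %*% sv$v  # columns are the coefficient vectors a_k
B <- inv_sqrt(S22) %*% sv$u  # columns are the coefficient vectors b_k

cov_U  <- t(A) %*% S11 %*% A  # Cov(U_k, U_l): should be the identity
cov_V  <- t(B) %*% S22 %*% B  # Cov(V_k, V_l): should be the identity
cov_UV <- t(A) %*% S12 %*% B  # Cov(U_k, V_l): diagonal, canonical correlations
```

The diagonal of `cov_UV` holds the canonical correlations `sv$d`, and every off-diagonal entry of the three matrices is zero up to rounding.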


### Example (p. 543)

Suppose $Z^{(1)}=[Z_1^{(1)},Z_2^{(1)}]'$ and $Z^{(2)}=[Z_1^{(2)},Z_2^{(2)}]'$ are both vectors of standardized variables.


## solution.R
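# Source: https://stat.ethz.ch/pipermail/r-help/2008-April/160662.html

# Blocks of the partitioned correlation matrix of (Z1, Z2)
p11 <- matrix(c(1.0, 0.4,
                0.4, 1.0), ncol = 2, byrow = TRUE)
p12 <- matrix(c(0.5, 0.6,
                0.3, 0.4), ncol = 2, byrow = TRUE)

# p21 is the transpose of p12
p21 <- matrix(c(0.5, 0.3,
                0.6, 0.4), ncol = 2, byrow = TRUE)

p22 <- matrix(c(1.0, 0.2,
                0.2, 1.0), ncol = 2, byrow = TRUE)

# Power of a symmetric matrix via its spectral decomposition:
# x^n = sum_i lambda_i^n e_i e_i'
"%^%" <- function(x, n)
  with(eigen(x), vectors %*% (values^n * t(vectors)))

# p11^(-1/2) p12 p22^(-1) p21 p11^(-1/2); its eigenvalues are the
# squared canonical correlations
result <- (p11 %^% (-0.5)) %*% p12 %*% solve(p22) %*% p21 %*% (p11 %^% (-0.5))

result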