petered/kasper-em-on-graph

## kasper-em-on-graph
$$
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\pderivsq}[2]{\frac{\partial^2 #1}{\partial #2^2}}
\newcommand{\lderiv}[1]{\frac{\partial \mathcal L}{\partial #1}}
\newcommand{\pderivgiven}[3]{\left.\frac{\partial #1}{\partial #2}\right|_{#3}}
\newcommand{\norm}[1]{\frac12\| #1 \|_2^2}
\newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}
\newcommand{\blue}[1]{\color{blue}{#1}}
\newcommand{\red}[1]{\color{red}{#1}}
\newcommand{\numel}[1]{|#1|}
\newcommand{\switch}[3]{\begin{cases} #2 & \text{if } {#1} \\ #3 &\text{otherwise}\end{cases}}
\newcommand{\pderivdim}[4]{\overset{\big[#3 \times #4 \big]}{\frac{\partial #1}{\partial #2}}}
\newcommand{\softmax}{\operatorname{softmax}}
\newcommand{\Bern}{\operatorname{Bern}}
\newcommand{\Cat}{\operatorname{Cat}}
\newcommand{\sigm}{\operatorname{sigm}}
\newcommand{\logfrac}[2]{\log \left( \frac{#1}{#2} \right)}
$$


We've assumed the following graphical model:

![enter image description here](https://docs.google.com/drawings/d/e/2PACX-1vRal4bK4gu7zruAjhV3R0CvjpqDP9sbAUHGop1FojAeaJnZmedx6bwoBQY762f-MTnWuuOkpyCoG8DX/pub?w=186&h=213)

This graph tells us that we can factorize our distribution as:

\begin{align}
p(X, C, F, ID; \theta)=p(X|C, F;\theta) p(C|ID;\theta) p(ID;\theta) p(F;\theta)
\end{align}

(where $\theta$ summarizes all model parameters)
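To make the factorization concrete, here is a minimal sketch that evaluates the joint as a product of the four factors. It assumes, purely for illustration, that $X$, $C$, $F$ and $ID$ are all discrete and that each factor is a lookup table; the sizes, the `theta` dictionary and the helper `random_cpt` are hypothetical and not part of the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(*shape):
    """Random probability table, normalized over its last axis."""
    table = rng.random(shape)
    return table / table.sum(axis=-1, keepdims=True)

# Assumed (hypothetical) sizes: 2 identities, 3 classes, 2 values of F, 4 values of X.
N_ID, N_C, N_F, N_X = 2, 3, 2, 4

theta = {
    "p_id": random_cpt(N_ID),                 # p(ID)
    "p_f": random_cpt(N_F),                   # p(F)
    "p_c_given_id": random_cpt(N_ID, N_C),    # p(C | ID), indexed [id, c]
    "p_x_given_cf": random_cpt(N_C, N_F, N_X) # p(X | C, F), indexed [c, f, x]
}

def joint(x, c, f, id_, theta):
    """p(X=x, C=c, F=f, ID=id_; theta) as the product of the four factors."""
    return (theta["p_x_given_cf"][c, f, x]
            * theta["p_c_given_id"][id_, c]
            * theta["p_id"][id_]
            * theta["p_f"][f])

# Sanity check: the joint sums to one over all configurations.
total = sum(joint(x, c, f, i, theta)
            for x in range(N_X) for c in range(N_C)
            for f in range(N_F) for i in range(N_ID))
```

If the factorization is implemented correctly, `total` should equal 1 up to floating-point error.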


Now, how do we do EM on such a model?

**E-Step**
Find "responsibilities": $p(C | X, F, ID; \theta_{old})$

Using Bayes' rule and the dependencies in our graph, we can rewrite this in terms of factors that we can evaluate directly. Note that $p(ID)$ and $p(F)$ do not depend on $C$, so they cancel between numerator and denominator.

\begin{align}
p(C=c | X, F, ID; \theta_{old}) &= \frac{p(X, C=c, F, ID; \theta_{old})}{p(X, F, ID;\theta_{old})} \\
&= \frac{p(X, C=c, F, ID; \theta_{old})}{\sum_{c'=1}^{|C|} p(X, C=c', F, ID;\theta_{old})} \\
&= \frac{p(X|C=c, F;\theta_{old}) p(C=c|ID;\theta_{old}) p(ID;\theta_{old}) p(F;\theta_{old})}{\sum_{c'=1}^{|C|} p(X|C=c', F;\theta_{old}) p(C=c'|ID;\theta_{old}) p(ID;\theta_{old}) p(F;\theta_{old})} \\
&= \frac{p(X|C=c, F;\theta_{old}) p(C=c|ID;\theta_{old})}{\sum_{c'=1}^{|C|} p(X|C=c', F;\theta_{old}) p(C=c'|ID;\theta_{old})} \\
&:= \gamma(c)
\end{align}
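The E-step can be sketched in a few lines of NumPy. This is only an illustration under the assumption that all variables are discrete and the factors are stored as tables; the function name `responsibilities` and the example numbers are made up. Since $p(ID)$ and $p(F)$ are constant in $c$, the sketch leaves them out before normalizing.

```python
import numpy as np

def responsibilities(x, f, id_, p_x_given_cf, p_c_given_id):
    """Posterior over C for one observation (x, f, id_).

    p_x_given_cf is indexed [c, f, x]; p_c_given_id is indexed [id, c].
    p(ID) and p(F) do not depend on c, so they cancel in the normalization.
    """
    unnormalized = p_x_given_cf[:, f, x] * p_c_given_id[id_, :]
    return unnormalized / unnormalized.sum()

# Hypothetical tables with 2 classes, 2 values of F, 2 values of X, 2 identities.
p_c_given_id = np.array([[0.5, 0.5],
                         [0.9, 0.1]])
p_x_given_cf = np.array([
    [[0.8, 0.2], [0.6, 0.4]],  # c = 0: rows are f = 0, f = 1
    [[0.2, 0.8], [0.4, 0.6]],  # c = 1
])

gamma = responsibilities(x=0, f=0, id_=0,
                         p_x_given_cf=p_x_given_cf,
                         p_c_given_id=p_c_given_id)
```

For these numbers the unnormalized weights are $0.8 \cdot 0.5 = 0.4$ and $0.2 \cdot 0.5 = 0.1$, so `gamma` comes out as approximately `[0.8, 0.2]`.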


**M-Step**
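Choose the parameters that maximize the expected complete-data log-likelihood, where the expectation over $C$ is taken under the responsibilities from the E-step:

\begin{align}
\theta_{new} &\leftarrow \argmax{\theta} \sum_{c=1}^{|C|} p(C=c | X, F, ID; \theta_{old}) \log p(X, C=c, F, ID; \theta) \\
&=\argmax{\theta} \sum_{c=1}^{|C|} \gamma(c) \log p(X, C=c, F, ID; \theta) \\
&=\argmax{\theta} \sum_{c=1}^{|C|} \gamma(c) \left[ \log p(X|C=c, F;\theta) + \log p(C=c|ID;\theta) + \log p(ID;\theta) + \log p(F;\theta) \right]
\end{align}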
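As a rough illustration of what this maximization can look like in practice, the sketch below assumes a dataset of $N$ independent observations $(x_n, f_n, id_n)$ with per-observation responsibilities $\gamma_n(c)$, and assumes all factors are discrete tables. Under those assumptions (which go beyond what is stated above), the maximizers of $p(C|ID)$ and $p(X|C,F)$ are normalized responsibility-weighted counts. The function name `m_step` and the toy data are hypothetical, and the code assumes every identity and every $(c, f)$ cell receives some responsibility mass.

```python
import numpy as np

def m_step(xs, fs, ids, gammas, n_id, n_f, n_x):
    """Closed-form M-step for tabular p(C | ID) and p(X | C, F).

    gammas has shape [N, n_c]; row n holds gamma_n(c) from the E-step.
    """
    n_c = gammas.shape[1]
    c_counts = np.zeros((n_id, n_c))       # soft counts for (id, c)
    x_counts = np.zeros((n_c, n_f, n_x))   # soft counts for (c, f, x)
    for x, f, i, g in zip(xs, fs, ids, gammas):
        c_counts[i] += g
        x_counts[:, f, x] += g
    p_c_given_id = c_counts / c_counts.sum(axis=1, keepdims=True)
    p_x_given_cf = x_counts / x_counts.sum(axis=2, keepdims=True)
    return p_c_given_id, p_x_given_cf

# Toy data: four observations, two classes.
xs, fs, ids = [0, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 1]
gammas = np.array([[0.8, 0.2],
                   [0.3, 0.7],
                   [0.5, 0.5],
                   [0.1, 0.9]])

p_c_given_id, p_x_given_cf = m_step(xs, fs, ids, gammas, n_id=2, n_f=2, n_x=2)
```

The terms $\log p(ID;\theta)$ and $\log p(F;\theta)$ involve only observed variables, so under the same tabular assumption they could be fit separately from plain counts, independently of the responsibilities.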