zhouzhuojie/Summary

## Summary
List
--------------

* Statistics - MCMC
* Statistics - One long run vs Many short runs
* CS - Non-backtracking
* CS - Tree Sampling vs Graph Sampling
* Other things


Statistics - MCMC
-------------

#### 1. Asymptotic Variance is Key

Consider random walk sequence as a AR(1) series.

$$ \mu = E(X_n) = \rho E(X_{n-1}) + E(e_n) $$

And starting from stationary distribution:

$$ \sigma^2_{CLT} = \gamma_0 + 2\sum_{k=1}^\infty \gamma_k,$$
where $\gamma_k = Cov(Y_j, Y_{j+k})$,

and $Y_j$ is some attribute function of $X_j$: $Y_j = f(X_j)$

#### 2. Esitmate the Asymptotic Variance Using Batch Means

* Batch Mean Method
* Overlapping batch mean method

#### 3. Effective Sample Size
Roughly $\sqrt{\frac{1+\rho}{1-\rho}}$

Statistics - One long run vs Many short runs
----------------

#### Questions

1. Samples in the sample pool are **dependent**, should we 'thin' it?
  * Yes, if space/storage is not computational or statistical efficient
  * Yes, if $g(X_i)$ is computational expensive
  * No, otherwise
  * typical, for Online Social Network, we do not need to 'thin' the random walk samples.
2. Should we discard "burn-in"?
  * Burn-ins are just pushing the initial distribution from a point {x} to {$P^nx$}
  * See experiment table [here](https://docs.google.com/spreadsheet/ccc?key=0AmIBiL5nyHqXdHZOdE9vcnpOV01ON2pMcmtHUENTLVE&usp=sharing). Basically we do not need burn-in if we adopt one long run random walk.
3. Multiple short runs?
  * "Diversity of the initial distribution". Still, there is no guarantee to improve statistical or computational efficiency.
  * Likely to give indication of convergence faster
4. How long is enough?
  * For one long run, just use the query budget.


Statistics - Rule of Thumb for Importance Sampling
----------------
#### Introduction

$$\hat{\mu} = \frac{1}{N}\sum_{k=1}^Nh(X_k)\frac{\pi(X_k)}{p(X_k)}$$
where

$h(\cdot)$ is the attribute function,

$\pi(\cdot)$ is the expected/target distribution density function,

$p(\cdot)$ is the sampling/obtained distribution density function. $p(\cdot)$ is easy to obtain, e.g. SRW: $p(v) = \frac{k_v}{2|E|}$

#### Ideal case for Important Sampling

$$p(X_k) \approx h(X_k)\pi(X_k)$$

Non-backtracking Random Walk
----------------------

#### Introduction
$P$ and $P'$ are two random walks' transition matrix. If:
$$
\begin{equation}
\begin{split}
P(j,i)P'(e_{ij}, e_{jk}) &= P(j,k)P'(e_{kj},e_{ji}) \\
P'(e_{ij}, e_{jk}) &\geq P(j,k)
\end{split}
\end{equation}
$$
then the random walk with $P'$ with have a unique stationary distribution $\pi'(e_{ij}) = \pi(i)P(i,j)$, and $\sigma'^2(f)\leq \sigma^2(f)$.

#### Design of the non-backtracking random walk

* Never go back from the income edge
* $P(j,i) = 1/d(j)$
* $P'(e_{ji}, e_{jk}) = 1/(d(j)-1)$
* $P'(e_{ji}, e_{jk}) \geq P(j,i)$

#### New Ideas
1. Pushing $P'(e_{ij}, e_{jk}) \geq P(j,k) $
  * Divide the neighbors into two groups, $C_1$ and $C_2$
    * Come in $C_1$ then go out to $C_2$, and vice versa.
    * Although sometime it increase the probability of going out to another group, the probability $P'(e_{ji}, e_{jk})$ remains the same.
      * $$ P'(e_{ji}, e_{jk}) = 0.5\cdot 0 + 0.5\cdot 1/(0.5\cdot d) = 1/d = P(j,k) $$
    * Maybe some experiments?


CS - Tree Sampling vs Graph Sampling
--------------

#### Questions
1. Is tree sampling better than graph sampling?
2. Find better starting node for tree sampling?
3. On-the-fly drill down for tree sampling. How to accurately calculate the probability of a node being accessed?

#### Some observations
1. Adding random edges to a tree will increase the conductance of the graph.
2. By selecting the nodes that are connected to cross-cutting edges, can we design random walk that are much faster than original simple random walk?
	List
	--------------

	* Statistics - MCMC
	* Statistics - One long run vs Many short runs
	* CS - Non-backtracking
	* CS - Tree Sampling vs Graph Sampling
	* Other things



	Statistics - MCMC
	-------------

	#### 1. Asymptotic Variance is Key

	Consider random walk sequence as a AR(1) series.

	$$ \mu = E(X_n) = \rho E(X_{n-1}) + E(e_n) $$

	And starting from stationary distribution:

	$$ \sigma^2_{CLT} = \gamma_0 + 2\sum_{k=1}^\infty \gamma_k,$$
	where $\gamma_k = Cov(Y_j, Y_{j+k})$,

	and $Y_j$ is some attribute function of $X_j$: $Y_j = f(X_j)$

	#### 2. Esitmate the Asymptotic Variance Using Batch Means

	* Batch Mean Method
	* Overlapping batch mean method

	#### 3. Effective Sample Size
	Roughly $\sqrt{\frac{1+\rho}{1-\rho}}$

	Statistics - One long run vs Many short runs
	----------------

	#### Questions

	1. Samples in the sample pool are dependent, should we 'thin' it?
	* Yes, if space/storage is not computational or statistical efficient
	* Yes, if $g(X_i)$ is computational expensive
	* No, otherwise
	* typical, for Online Social Network, we do not need to 'thin' the random walk samples.
	2. Should we discard "burn-in"?
	* Burn-ins are just pushing the initial distribution from a point {x} to {$P^nx$}
	* See experiment table [here](https://docs.google.com/spreadsheet/ccc?key=0AmIBiL5nyHqXdHZOdE9vcnpOV01ON2pMcmtHUENTLVE&usp=sharing). Basically we do not need burn-in if we adopt one long run random walk.
	3. Multiple short runs?
	* "Diversity of the initial distribution". Still, there is no guarantee to improve statistical or computational efficiency.
	* Likely to give indication of convergence faster
	4. How long is enough?
	* For one long run, just use the query budget.



	Statistics - Rule of Thumb for Importance Sampling
	----------------
	#### Introduction

	$$\hat{\mu} = \frac{1}{N}\sum_{k=1}^Nh(X_k)\frac{\pi(X_k)}{p(X_k)}$$
	where

	$h(\cdot)$ is the attribute function,

	$\pi(\cdot)$ is the expected/target distribution density function,

	$p(\cdot)$ is the sampling/obtained distribution density function. $p(\cdot)$ is easy to obtain, e.g. SRW: $p(v) = \frac{k_v}{2\|E\|}$

	#### Ideal case for Important Sampling

	$$p(X_k) \approx h(X_k)\pi(X_k)$$

	Non-backtracking Random Walk
	----------------------

	#### Introduction
	$P$ and $P'$ are two random walks' transition matrix. If:
	$$
	\begin{equation}
	\begin{split}
	P(j,i)P'(e_{ij}, e_{jk}) &= P(j,k)P'(e_{kj},e_{ji}) \\
	P'(e_{ij}, e_{jk}) &\geq P(j,k)
	\end{split}
	\end{equation}
	$$
	then the random walk with $P'$ with have a unique stationary distribution $\pi'(e_{ij}) = \pi(i)P(i,j)$, and $\sigma'^2(f)\leq \sigma^2(f)$.

	#### Design of the non-backtracking random walk

	* Never go back from the income edge
	* $P(j,i) = 1/d(j)$
	* $P'(e_{ji}, e_{jk}) = 1/(d(j)-1)$
	* $P'(e_{ji}, e_{jk}) \geq P(j,i)$

	#### New Ideas
	1. Pushing $P'(e_{ij}, e_{jk}) \geq P(j,k) $
	* Divide the neighbors into two groups, $C_1$ and $C_2$
	* Come in $C_1$ then go out to $C_2$, and vice versa.
	* Although sometime it increase the probability of going out to another group, the probability $P'(e_{ji}, e_{jk})$ remains the same.
	* $$ P'(e_{ji}, e_{jk}) = 0.5\cdot 0 + 0.5\cdot 1/(0.5\cdot d) = 1/d = P(j,k) $$
	* Maybe some experiments?


	CS - Tree Sampling vs Graph Sampling
	--------------

	#### Questions
	1. Is tree sampling better than graph sampling?
	2. Find better starting node for tree sampling?
	3. On-the-fly drill down for tree sampling. How to accurately calculate the probability of a node being accessed?

	#### Some observations
	1. Adding random edges to a tree will increase the conductance of the graph.
	2. By selecting the nodes that are connected to cross-cutting edges, can we design random walk that are much faster than original simple random walk?