{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "## 近似推論\n\n何がしかの分布(しばしば事後分布 $p(z_1, \\ldots , z_n | D)$)をより計算しやすい近似分布 $q(z_1, \\ldots, z_n)$で表す手法\n\nしばしば平均場近似 $q(z_1, \\ldots, z_n) = q(z_1)\\ldots q(z_n) $と併用される\n\n平均場近似から来る近似精度の限界、仮定した近似分布と真の分布とのずれに注意\n\n## 変分推論\n平均場近似を採用することにすれば、個々のファクター q(z_i)を一つずつ順に最適化していくことで簡単に計算が可能になる。\n本文 (4.17) - (4.23) の変形により、一般に最適な q(z_i)は (4.25)の\n\n$ \\ln q(z_i) = <\\ln p(z_1, \\ldots z_n | D)>_{q(Z\\smallsetminus i)} + const = <\\ln p(D, z_1, \\ldots z_n)>_{q(Z\\smallsetminus i)} $\n\nで表される。実際には等号は達成できないかもしれないが、両辺のKLダイバージェンスを最小にするように変分を行う。"
},
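{
"metadata": {},
"cell_type": "markdown",
"source": "As a concrete illustration of (4.25), the next cell is a minimal coordinate-ascent sketch on a toy problem of my own choosing (not from the text): approximating a correlated bivariate Gaussian $p(z_1, z_2) = \\mathcal{N}(\\mu, \\Lambda^{-1})$ by a factorized $q(z_1)q(z_2)$. Plugging the Gaussian log density into (4.25) gives Gaussian factors $q(z_i) = \\mathcal{N}(m_i, \\Lambda_{ii}^{-1})$ with the fixed-point mean update $m_1 = \\mu_1 - \\Lambda_{11}^{-1} \\Lambda_{12} (m_2 - \\mu_2)$ (and symmetrically for $m_2$); iterating the two updates drives the factor means to the true mean $\\mu$."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# mean-field / coordinate-ascent sketch on a toy bivariate Gaussian\n# (μ_toy and Λ_toy are made-up illustration values, not part of the model below)\nμ_toy = [1.0, -1.0]\nΛ_toy = [2.0 0.8; 0.8 1.0]    # precision matrix of the target Gaussian\nm = [10.0, 10.0]              # deliberately bad initialization of the factor means\nfor iter in 1:20\n    # update q(z_1) holding q(z_2) fixed, then vice versa; each line is (4.25)\n    m[1] = μ_toy[1] - Λ_toy[1, 2] * (m[2] - μ_toy[2]) / Λ_toy[1, 1]\n    m[2] = μ_toy[2] - Λ_toy[2, 1] * (m[1] - μ_toy[1]) / Λ_toy[2, 2]\nend\nprintln(\"mean-field means m = $m (true mean μ = $μ_toy)\")",
"execution_count": null,
"outputs": []
},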
{
"metadata": {},
"cell_type": "markdown",
"source": "## 4.3 ポアソン混合モデルへの適用例\n### 1. ギブスサンプリング"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "using Distributions\nusing Plots\ngr()\nusing LinearAlgebra # for norm()",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "K = 3\n# hyper parameters for generating λ_k's\n# by the properties of gamma function, μ = αθ, v = αθ^2\n# the hyperparameters are chosen so that mean(λ) = 60, stddev(λ) = 30\nμ_λ = 60.0\nv_λ = 900.0\na = μ_λ^2 / v_λ\nθ = v_λ / μ_λ\n\np_λ = Gamma(a, θ)\nλ = rand(p_λ, K)\np_x = Poisson.(λ)",
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 3,
"data": {
"text/plain": "3-element Array{Poisson{Float64},1}:\n Poisson{Float64}(λ=118.00795488309349)\n Poisson{Float64}(λ=46.6318596876326) \n Poisson{Float64}(λ=64.20714993231182) "
},
"metadata": {}
}
]
},
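{
"metadata": {},
"cell_type": "markdown",
"source": "A quick sanity check (my addition, not in the original) that the moment matching above works: Distributions.jl's `mean` and `std` applied to the constructed `Gamma` should return the intended 60 and 30."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# sanity check (my addition): the moment-matched prior should have\n# mean(λ) = 60 and stddev(λ) = 30\nprintln(\"mean = $(mean(p_λ)), std = $(std(p_λ))\")",
"execution_count": null,
"outputs": []
},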
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# hyper parameter of the prior for π\nα = rand(K)\nα = α./sum(α)\np_π = Dirichlet(α)\nπ = rand(p_π)",
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 4,
"data": {
"text/plain": "3-element Array{Float64,1}:\n 1.9049715232217527e-9\n 0.8906382252489854 \n 0.109361772846043 "
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# generate training data. S is represented as one-hot vector in the text, but we implement it as an integer label for the categories\n# for the sake of memory efficiency.\nN = 100000\np_s = Categorical(π)\ns = rand(p_s, N)\nx = rand.(p_x[s])\n# histogram(x, nbins=max(N, 200))",
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 5,
"data": {
"text/plain": "100000-element Array{Int64,1}:\n 47\n 64\n 57\n 52\n 39\n 45\n 52\n 39\n 60\n 43\n 50\n 47\n 64\n ⋮\n 51\n 42\n 40\n 55\n 48\n 52\n 59\n 39\n 44\n 51\n 58\n 52"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "From now on, suppose that we don't know the values of $\\pi$, $\\lambda$, $s$, but we assume we know the distribution is a mixed Poisson for sure. We then aim to guess the values for these parameters and latent variables, i.e. obtain their posterior distributions, in the form of sampling. Therefore, we want the distribution $p(S, \\lambda, \\pi | X)$. Further we assume the latent variable $S$ and parameters $\\lambda$, $\\pi$ are sampled separately as follows:\n\\begin{align}\nS & \\sim p(S|X, \\lambda, \\pi)\\\\\n\\lambda, \\pi &\\sim p(\\lambda, \\pi | X, S)\n\\end{align}\n\n$p(S|X, \\lambda, \\pi)$ can be evaluated (apart from factors independent of $S$) as\n\n\\begin{align}\np(S|X, \\lambda, \\pi) & \\propto p(X, S, \\lambda, \\pi) \\\\\n& \\propto p(X|S, \\lambda) p(S|\\pi) \\\\\n& = \\prod_{n=1}^N p(x_n | s_n, \\lambda) p(s_n | \\pi)\n\\end{align}\nwhere in our model $p(x_n | s_n, \\lambda)$ is given by Possion distribution and $p(s_n | \\pi)$ a categorical one.\nBy the calculation in (4.34) - (4.38), $s$ is sampled from a categorical distribution $s_n \\sim Cat(s_n|\\eta_n)$, where the parameter $\\eta_{n, k}$ is given by\n\\begin{align}\n\\eta_{n, k} &=\n\\exp (x_n \\ln \\lambda_k - \\lambda_k + \\ln \\pi_k) / \\sum_{k=1}^K \\exp (x_n \\ln \\lambda_k - \\lambda_k + \\ln \\pi_k)\n\\end{align}\nNote that we pretend to know $\\lambda$ and $\\pi$ here, since we are going to sample $(\\lambda, \\pi)$ and $S$ alternately.\n(We used the fact $\\sum_{k=1}^K s_{n, k} = 1$ in the above derivation)\n\nNext we consider sampling of $\\lambda$ and $\\pi$, provided we already have the value for $S$.\n\\begin{align}\np(\\lambda, \\pi | X, S) &\\sim p(X, S, \\lambda, \\pi)\\\\\n&= p(X|S, \\lambda)p(\\lambda) \\cdot p(S|\\pi)p(\\pi)\n\\end{align}\n\ni. e. $\\pi$ and $\\lambda$ can be sampled independently, given $X$ and $S$.\nFirst, look at the distribution for $\\lambda$. As in the calculation (4.40) - (4.41),\n\n$\\lambda_k \\sim Gam(\\lambda_k | \\hat{a}_k, \\hat{b}_k)$, where $(\\hat{a}_k, \\hat{b}_k) = (a, b) + (\\sum_{n=1}^N s_{n, k}x_n, \\sum_{n=1}^N s_{n, k})$.\n\nNext look at the distribution of $\\pi$. \n\n\\begin{align}\n\\ln p(S|\\pi)p(\\pi) &= \\sum_{n=1}^N \\ln Cat(s_n|\\pi) + \\ln Dir(\\pi|\\alpha)\\\\\n&= \\sum_{k=1}^K (\\sum_{n=1}^N s_{n, k} + \\alpha_k - 1) \\ln \\pi_k + const\n\\end{align}\n\nThis means $\\pi \\sim Dir(\\pi|\\hat{\\alpha})$, where $\\hat{\\alpha}_k = \\alpha_k + \\sum_{n=1}^N s_{n, k}$.\n\nWe use these sampling iteratively untill they converge to some values. (Or just iterate for fixed times)"
},
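{
"metadata": {},
"cell_type": "markdown",
"source": "Before wiring up the full sampler, a stand-alone sanity check (my own addition, not from the text) of the conjugate Gamma update for a single Poisson component: with all $s_{n,1} = 1$ the update reduces to $\\hat{a} = a + \\sum_n x_n$, $\\hat{b} = b + N$, and the posterior mean $\\hat{a}/\\hat{b}$ should sit close to the true rate."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# sanity check (my addition): conjugate Gamma update for a single Poisson component\nλ_true = 50.0\nxs = rand(Poisson(λ_true), 10_000)\na0, b0 = 1.0, 1.0              # arbitrary weak prior\na_post = a0 + sum(xs)          # â = a + Σ_n x_n\nb_post = b0 + length(xs)       # b̂ = b + N\nprintln(\"posterior mean = $(a_post / b_post), true λ = $λ_true\")",
"execution_count": null,
"outputs": []
},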
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "function η_hat(n, λ, π)\n eta = exp.(x[n] * log.(λ) - λ + log.(π))\n return eta./sum(eta)\nend\n\nfunction sample_S(λ, π)\n S = [rand(Categorical(η_hat(n, λ, π))) for n in 1:N]\n return S\nend",
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 6,
"data": {
"text/plain": "sample_S (generic function with 1 method)"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# these update formulae look different from the text because our s is an integer category label, but they are indeed the same.\n\nfunction a_hat_b_hat(a, b, S)\n a_hat = fill(Float64(a), K)\n b_hat = fill(Float64(b), K)\n for n in 1:N\n a_hat[s[n]] += x[n]\n b_hat[s[n]] += 1.0\n end\n return a_hat, b_hat\nend\n\nfunction sample_λ(a, b, S)\n ah, bh = a_hat_b_hat(a, b, S)\n return [rand(Gamma(ah[k], 1/bh[k])) for k in 1:K]\nend",
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 7,
"data": {
"text/plain": "sample_λ (generic function with 1 method)"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "function α_hat(α, s)\n α_h = α\n for n in 1:N\n α_h[s[n]] += 1\n end\n return α_h\nend\n\nfunction sample_π(α, s)\n α_h = α_hat(α, s)\n return rand(Dirichlet(α_h))\nend",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 8,
"data": {
"text/plain": "sample_π (generic function with 1 method)"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# hyperparameters. should not have siginificant influence on the result.\na, b = 1.0, 1.0\nα = ones(K)\n\n# initial values for parameters\nλ_guess = ones(K)\nπ_guess = ones(K)./K\nS_guess = ones(N)\n\nMAXITER = 1000\nε = 1e-6\n# main update loop. see how each parameters/latent variables are updated sequentially.\n\nprintln(\"K = $K, N = $N\")\nprintln(\"Ground truth: λ = $λ, π = $π\")\n\nfor i in 1:MAXITER\n λ_old, π_old = λ_guess, π_guess\n S_guess = sample_S(π_guess, λ_guess)\n λ_guess = sample_λ(a, b, S_guess)\n π_guess = sample_π(α, S_guess)\n if norm(λ_guess - λ_old) <= ε && norm(π_guess - π_old) <= ε\n println(\"loop break at i = $i\")\n break\n end\n if i == 1 || i % 100 == 0\n println(\"i = $i\")\n println(\"λ = $λ_guess\")\n println(\"π = $π_guess\")\n end\nend\n\nprintln(\"Ground truth: λ = $λ, π = $π\")\nprintln(\"Estimation: λ = $λ_guess, π = $π_guess\")\n\nmatch = 0\nfor n in 1:N\n if s[n] == S_guess[n]\n match += 1\n end\nend\nratio = match/N\nprintln(\"Correct estimation rate for latent variable S = $ratio\")\n",
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"text": "K = 3, N = 100000\nGround truth: λ = [118.00795488309349, 46.6318596876326, 64.20714993231182], π = [1.9049715232217527e-9, 0.8906382252489854, 0.109361772846043]\ni = 1\nλ = [1.5416062854595833, 46.64364864829661, 64.19416377772673]\nπ = [0.32968283922348596, 0.3366454773176629, 0.333671683458851]\ni = 100\nλ = [0.31221582105593004, 46.621504060391054, 64.26285292743405]\nπ = [0.003398820693033906, 0.9878048417067425, 0.008796337600223542]\ni = 200\nλ = [0.7230700163717231, 46.6170432955912, 64.2750632390672]\nπ = [0.0016775132464311926, 0.9939380033727077, 0.004384483380861193]\ni = 300\nλ = [2.8027150846121534, 46.61700591071832, 64.38839714959498]\nπ = [0.0011199101575549062, 0.9959662283214522, 0.0029138615209927856]\ni = 400\nλ = [0.215208580045904, 46.58420315605968, 64.38209944603673]\nπ = [0.0008335991450396957, 0.996979885612969, 0.0021865152419912894]\ni = 500\nλ = [1.0813728925676536, 46.6469048404866, 64.48467055774918]\nπ = [0.0006729509289677877, 0.9975717564736041, 0.0017552925974281097]\ni = 600\nλ = [0.017266478136415787, 46.594139893550405, 64.28214369930886]\nπ = [0.0005642613950946934, 0.9979744989531113, 0.0014612396517940955]\ni = 700\nλ = [0.4782541282648395, 46.63425140851955, 64.26781857103164]\nπ = [0.000480404135454949, 0.998265479569746, 0.001254116294799106]\ni = 800\nλ = [1.532726492740051, 46.59986702668407, 64.28038375258559]\nπ = [0.0004222340389990922, 0.9984744757680901, 0.001103290192910978]\ni = 900\nλ = [0.04792213313403554, 46.632364217867334, 64.23942421855224]\nπ = [0.00037369882348920375, 0.9986500410222977, 0.0009762601542132641]\ni = 1000\nλ = [1.4125508629319141, 46.643754281982744, 64.38006599431712]\nπ = [0.0003367484554374458, 0.9987847031776255, 0.0008785483669371093]\nGround truth: λ = [118.00795488309349, 46.6318596876326, 64.20714993231182], π = [1.9049715232217527e-9, 0.8906382252489854, 0.109361772846043]\nEstimation: λ = [1.4125508629319141, 46.643754281982744, 64.38006599431712], π = [0.0003367484554374458, 0.9987847031776255, 0.0008785483669371093]\nCorrect estimation rate for latent variable S = 0.89013\n",
"name": "stdout"
}
]
},
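{
"metadata": {},
"cell_type": "markdown",
"source": "A caveat on the match rate above: a mixture model identifies the components only up to a permutation of the labels, so the chain may converge to a relabeled version of the ground truth. The sketch below (my own addition; it assumes the Combinatorics.jl package is installed) scores the agreement under the best label permutation."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# best-permutation match rate (sketch; assumes Combinatorics.jl is available)\nusing Combinatorics\nbest = maximum(permutations(1:K)) do perm\n    count(n -> perm[S_guess[n]] == s[n], 1:N) / N\nend\nprintln(\"match rate under the best label permutation = $best\")",
"execution_count": null,
"outputs": []
},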
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "julia-1.3",
"display_name": "Julia 1.3.0",
"language": "julia"
},
"language_info": {
"file_extension": ".jl",
"name": "julia",
"mimetype": "application/julia",
"version": "1.3.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}