Skip to content

Instantly share code, notes, and snippets.

@takatakamanbou
Created May 31, 2020 06:05
Show Gist options
  • Save takatakamanbou/4ea8028bf0cca5059e47cc626e5b28c6 to your computer and use it in GitHub Desktop.
Save takatakamanbou/4ea8028bf0cca5059e47cc626e5b28c6 to your computer and use it in GitHub Desktop.
Vision2020-ex04.ipynb
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Vision2020-ex04.ipynb",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/takatakamanbou/4ea8028bf0cca5059e47cc626e5b28c6/vision2020-ex04.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NI3mY1FGZWgV",
"colab_type": "text"
},
"source": [
"# 正規分布とその最尤推定\n",
"\n",
"![hoge](https://www-tlab.math.ryukoku.ac.jp/~takataka/course/Vision/Vision-logo-96x96.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ffaGVK2o5VWe",
"colab_type": "text"
},
"source": [
"## はじめに"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1_zfjNUjvD1T",
"colab_type": "text"
},
"source": [
"### Notebook の動かし方\n",
"\n",
"この Notebook では,上の方のセルで作った変数や関数を後のセルで使うことがあります.そのため,上の方のセルを実行せずに下の方を実行すると,エラーになることがあります.上から順に実行していきましょう.\n",
"\n",
"また,メニューの「ランタイム」から,「すべてのセルを実行」したりすることも可能です."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L2ajEMYLWlhT",
"colab_type": "text"
},
"source": [
"## 準備"
]
},
{
"cell_type": "code",
"metadata": {
"id": "E13UrMaqWnFB",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "eXNmkfe8V-Qb",
"colab_type": "text"
},
"source": [
"## 1次元の正規分布"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fcPNZXjzWL8B",
"colab_type": "text"
},
"source": [
"1次元正規分布の確率密度関数\n",
"$$ p(x) = \\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp\\left(-\\frac{(x-\\mu)^2}{2\\sigma^2}\\right) $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5t6WgZ3lZfVl",
"colab_type": "text"
},
"source": [
"上記に相当する関数 Gaussian1 を定義.自分で式を書いても大したことないが,ここでは\n",
"SciPy の関数 norm を利用: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy.stats.norm"
]
},
{
"cell_type": "code",
"metadata": {
"id": "3zqAFWAnZMf1",
"colab_type": "code",
"colab": {}
},
"source": [
"from scipy.stats import norm\n",
"\n",
"def Gaussian1(x, mu, sigma2):\n",
" return norm.pdf(x, loc = mu, scale = np.sqrt(sigma2))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "KVx1dHxHaXdP",
"colab_type": "text"
},
"source": [
"1次元正規分布の確率密度関数を描画"
]
},
{
"cell_type": "code",
"metadata": {
"id": "rkrs311NW0Hm",
"colab_type": "code",
"colab": {}
},
"source": [
"x = np.linspace(-5,5, num = 100)\n",
"fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 6))\n",
"mu, sig2 = 0.0, 1.0\n",
"ax.plot(x, Gaussian1(x, mu, sig2), '-', label = '$\\mu = {0}, \\sigma^2 = {1}$'.format(mu, sig2))\n",
"mu, sig2 = 2.0, 0.5\n",
"ax.plot(x, Gaussian1(x, mu, sig2), '-', label = '$\\mu = {0}, \\sigma^2 = {1}$'.format(mu, sig2))\n",
"mu, sig2 = -2.0, 2.0\n",
"ax.plot(x, Gaussian1(x, mu, sig2), '-', label = '$\\mu = {0}, \\sigma^2 = {1}$'.format(mu, sig2))\n",
"ax.set_xlim(-5, 5)\n",
"ax.legend()\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ggzK7VbkbMOY",
"colab_type": "text"
},
"source": [
"## 1次元正規分布の最尤推定"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6biAxFFOzLYW",
"colab_type": "text"
},
"source": [
"np.random.normal で正規乱数を生成し,np.mean, np.var で平均と分散を推定\n",
"- https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.normal.html\n",
"- np.mean, np.var はデフォルトでは N で割る"
]
},
{
"cell_type": "code",
"metadata": {
"id": "1_GFryADbaD8",
"colab_type": "code",
"colab": {}
},
"source": [
"# loc: 平均, scale: 標準偏差, size: データ数 or shape\n",
"N = 10\n",
"data = np.random.normal(loc = 0.0, scale = 1.0, size = N)\n",
"print(N, np.mean(data), np.var(data))\n",
"N = 100\n",
"data = np.random.normal(loc = 0.0, scale = 1.0, size = N)\n",
"print(N, np.mean(data), np.var(data))\n",
"N = 1000\n",
"data = np.random.normal(loc = 0.0, scale = 1.0, size = N)\n",
"print(N, np.mean(data), np.var(data))"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "dQHvAW683SLD",
"colab_type": "text"
},
"source": [
"上記と同様の実験,ただし結果をグラフに描く.N をいろいろ変えて実行し直してみよう.同じ N でも実行のたびに乱数の値が変わるので,何度か実行して様子を眺めるのがよい."
]
},
{
"cell_type": "code",
"metadata": {
"id": "zASVP9if1HYH",
"colab_type": "code",
"colab": {}
},
"source": [
"# 真のパラメータ\n",
"mu_true, sigma_true = 0.0, 1.0\n",
"print('# mu_true = {0:.2f}, sigma_true = {1:.2f}'.format(mu_true, sigma_true))\n",
"\n",
"# 上記のパラメータをもつ1次元正規分布から標本を抽出\n",
"N = 10\n",
"data = np.random.normal(loc = mu_true, scale = sigma_true, size = N)\n",
"\n",
"# 上記の標本から正規分布のパラメータを最尤推定\n",
"mu_est, sigma_est = np.mean(data), np.sqrt(np.var(data))\n",
"print('# mu_est = {0:.2f}, sigma_est = {1:.2f}'.format(mu_est, sigma_est))\n",
"\n",
"# グラフ\n",
"fig, ax = plt.subplots(2, facecolor=\"white\", figsize=(8, 12))\n",
"\n",
"# 標本のヒストグラム\n",
"ax[0].hist(data, density = True)\n",
"\n",
"# 真の正規分布と推定された正規分布\n",
"x = np.linspace(-5,5, num = 100)\n",
"#fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 6))\n",
"ax[1].plot(x, Gaussian1(x, mu_true, sigma_true**2), '-', label = 'true')\n",
"ax[1].plot(x, Gaussian1(x, mu_est, sigma_est**2), '-', label = 'estimated')\n",
"ax[1].set_xlim(-5, 5)\n",
"ax[1].legend()\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "9FilFMmAzDXG",
"colab_type": "text"
},
"source": [
"## 多変量正規分布\n",
"$D$次元正規分布の確率密度関数\n",
"$$ p(\\mathbf{x}) = \\frac{1}{\\sqrt{(2\\pi)^D |\\Sigma|}} \\exp\\left( -\\frac{1}{2}(\\mathbf{x} - \\mathbf{\\mu})^{\\top}\\Sigma^{-1}(\\mathbf{x} - \\mathbf{\\mu}) \\right)$$\n",
"$\\mathbf{\\mu}$ は平均ベクトル,$\\Sigma$ は分散共分散行列."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "emv02H_s2C0N",
"colab_type": "text"
},
"source": [
"上記に相当する関数 Gaussian を定義.自分で式を書いても大したことないが,ここでは\n",
"SciPy の関数 multivariate_normal を利用: \n",
"\n",
"https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multivariate_normal.html"
]
},
{
"cell_type": "code",
"metadata": {
"id": "XSJR18wo2YBA",
"colab_type": "code",
"colab": {}
},
"source": [
"from scipy.stats import multivariate_normal\n",
"\n",
"def GaussianM(X, mean, cov):\n",
" return multivariate_normal.pdf(X, mean = mean, cov = cov)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "odulAy2d29Cf",
"colab_type": "code",
"colab": {}
},
"source": [
"x, y = np.mgrid[-3:3:0.01, -3:3:0.01]\n",
"X = np.dstack((x, y))\n",
"print(X.shape)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Uh3ymEeO4_CH",
"colab_type": "text"
},
"source": [
"$\\mathbf{\\mu} = \\mathbf{0}, \\Sigma = I$ のとき"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Ng7mQhXQ4wLA",
"colab_type": "code",
"colab": {}
},
"source": [
"# グラフ描画用のグリッドデータの作成\n",
"xmin, xmax = -4, 4\n",
"ymin, ymax = -4, 4\n",
"x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
"X = np.dstack((x, y))\n",
"\n",
"# 平均と共分散行列\n",
"mean = np.zeros(2)\n",
"cov = np.eye(2)\n",
"\n",
"# グラフを描く\n",
"fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
"ax.contourf(x, y, GaussianM(X, mean , cov))\n",
"ax.axhline(0, color='white')\n",
"ax.axvline(0, color='white')\n",
"ax.set_xlim(xmin,xmax)\n",
"ax.set_ylim(ymin,ymax)\n",
"ax.set_aspect('equal')\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "kvKsR4zI4XHu",
"colab_type": "text"
},
"source": [
"$\\mathbf{\\mu} = \\begin{pmatrix} 2 \\\\ 1 \\end{pmatrix} , \\Sigma = \\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix}$ のとき"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Hpmstog-3nX-",
"colab_type": "code",
"colab": {}
},
"source": [
"# グラフ描画用のグリッドデータの作成\n",
"xmin, xmax = -3, 5\n",
"ymin, ymax = -3, 5\n",
"x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
"X = np.dstack((x, y))\n",
"\n",
"# 平均と共分散行列\n",
"mean = np.array([2, 1])\n",
"cov = np.array([[2, 0], [0, 0.5]])\n",
"\n",
"# グラフを描く\n",
"fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
"ax.contourf(x, y, GaussianM(X, mean , cov))\n",
"ax.axhline(0, color='white')\n",
"ax.axvline(0, color='white')\n",
"ax.set_xlim(xmin,xmax)\n",
"ax.set_ylim(ymin,ymax)\n",
"ax.set_aspect('equal')\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "jYuL4huM43dw",
"colab_type": "code",
"colab": {}
},
"source": [
"# 回転行列\n",
"def R(theta):\n",
" c, s = np.cos(theta), np.sin(theta)\n",
" return np.array([[c, -s], [s, c]])"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "6ItF5S-xWSqb",
"colab_type": "text"
},
"source": [
"$\\mathbf{\\mu} = \\begin{pmatrix} 2 \\\\ 1 \\end{pmatrix} , \\Sigma = R(\\pi/3)\\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix} R(\\pi/3)^{\\top} = \\begin{pmatrix}\\frac{1}{2} & -\\frac{\\sqrt{3}}{2} \\\\ \\frac{\\sqrt{3}}{2} & \\frac{1}{2} \\end{pmatrix}\\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix}\\begin{pmatrix}\\frac{1}{2} & \\frac{\\sqrt{3}}{2} \\\\ -\\frac{\\sqrt{3}}{2} & \\frac{1}{2} \\end{pmatrix} = \\begin{pmatrix} \\frac{7}{8} & \\frac{3}{8}\\sqrt{3} \\\\ \\frac{3}{8}\\sqrt{3} & \\frac{13}{8}\\end{pmatrix}$ のとき"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Rx9ShszRWQtY",
"colab_type": "code",
"colab": {}
},
"source": [
"# グラフ描画用のグリッドデータの作成\n",
"xmin, xmax = -3, 5\n",
"ymin, ymax = -3, 5\n",
"x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
"X = np.dstack((x, y))\n",
"\n",
"# 平均と共分散行列\n",
"mean = np.array([2, 1])\n",
"cov = np.array([[7.0/8, 3*np.sqrt(3)/8], [3*np.sqrt(3)/8, 13.0/8]])\n",
"print(cov)\n",
"cov = R(np.pi/3) @ np.diag([2, 0.5]) @ R(np.pi/3).T\n",
"print(cov)\n",
"\n",
"# グラフを描く\n",
"fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
"ax.contourf(x, y, GaussianM(X, mean , cov))\n",
"ax.axhline(0, color='white')\n",
"ax.axvline(0, color='white')\n",
"ax.set_xlim(xmin,xmax)\n",
"ax.set_ylim(ymin,ymax)\n",
"ax.set_aspect('equal')\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "92i_FYoEctjT",
"colab_type": "text"
},
"source": [
"これは,$\\mathbf{\\mu} = \\mathbf{0} , \\Sigma = \\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix}$ の正規分布を $\\theta$ 回転して $\\mathbf{\\mu}$ 平行移動したものになっている."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5lBhf2pJdOc0",
"colab_type": "text"
},
"source": [
"## 多変量正規分布の最尤推定"
]
},
{
"cell_type": "code",
"metadata": {
"id": "tppDoZ3GYnwD",
"colab_type": "code",
"colab": {}
},
"source": [
"# 2次元正規分布のデータ生成(標準正規分布の乱数から自分で回転平行移動)\n",
"\n",
"def gendat(N):\n",
" \n",
" # 標準正規分布からとったサンプルを 2 x N にならべる\n",
" x = np.random.normal(loc = 0.0, scale = 1.0, size = (2, N))\n",
" # \\Sigma^{\\frac{1}{2}} をかけて共分散行列 \\Sigma のサンプルにする\n",
" y = np.diag([np.sqrt(2), np.sqrt(0.5)]) @ x\n",
" # R(theta) 回転 して mu 平行移動\n",
" z = R(np.pi/3) @ y + np.array([2.0, 1.0])[:, np.newaxis]\n",
" \n",
" return z"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "28lSPbtTkUbM",
"colab_type": "code",
"colab": {}
},
"source": [
"# 2次元正規分布のデータ生成(np.random.multivariate_normal を使う)\n",
"\n",
"def gendat2(N):\n",
" \n",
" mu = np.array([2.0, 1.0])\n",
" cov = R(np.pi/3) @ np.diag([2, 0.5]) @ R(np.pi/3).T\n",
" x = np.random.multivariate_normal(mu, cov, size = N) # N x 2\n",
"\n",
" return x.T # 2 x N"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "0F0UcxBAeDQr",
"colab_type": "code",
"colab": {}
},
"source": [
"# 真の平均,共分散行列\n",
"mu_true = np.array([2.0, 1.0])\n",
"cov_true = R(np.pi/3) @ np.diag([2, 0.5]) @ R(np.pi/3).T\n",
"print('### 真の平均と共分散行列')\n",
"print(mu_true)\n",
"print(cov_true)\n",
"print()\n",
"\n",
"for N in [10, 100, 1000, 10000]:\n",
" print('### N = {0} での平均と共分散行列の推定値'.format(N))\n",
" data = gendat2(N)\n",
" mu_est = np.mean(data, axis = 1)\n",
" x = data - mu_est[:, np.newaxis]\n",
" cov_est = x @ x.T / N\n",
" print(mu_est)\n",
" print(cov_est)\n",
" print()"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "L78k7Yu3qNe4",
"colab_type": "text"
},
"source": [
"上記と同様の実験.ただしグラフを描く.N をいろいろ変えて実行し直してみよう.同じ N でも実行のたびに乱数の値が変わるので,何度か実行して様子を眺めるのがよい."
]
},
{
"cell_type": "code",
"metadata": {
"id": "muDLP9okmlgC",
"colab_type": "code",
"colab": {}
},
"source": [
"# データの生成\n",
"N = 10\n",
"data = gendat2(N)\n",
"\n",
"# 平均,共分散行列の推定\n",
"mu_est = np.mean(data, axis = 1)\n",
"x = data - mu_est[:, np.newaxis]\n",
"cov_est = x @ x.T / N\n",
"\n",
"# グラフ描画用のグリッドデータの作成\n",
"xmin, xmax = -3, 5\n",
"ymin, ymax = -3, 5\n",
"x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
"X = np.dstack((x, y))\n",
"\n",
"# グラフを描く\n",
"fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
"ax.scatter(data[0, :], data[1, :], s = 10)\n",
"ax.contour(x, y, GaussianM(X, mu_est , cov_est), colors = ['#ffa0a0', 'r'], levels = [0.05, 0.1]) # 推定された正規分布\n",
"ax.contour(x, y, GaussianM(X, mu_true , cov_true), colors = ['#a0ffa0', 'g'], levels = [0.05, 0.1]) # 真の正規分布\n",
"ax.set_aspect('equal')\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment