takatakamanbou/vision2020-ex04.ipynb Secret

## vision2020-ex04.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Vision2020-ex04.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/takatakamanbou/4ea8028bf0cca5059e47cc626e5b28c6/vision2020-ex04.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NI3mY1FGZWgV",
        "colab_type": "text"
      },
      "source": [
        "# 正規分布とその最尤推定\n",
        "\n",
        "![hoge](https://www-tlab.math.ryukoku.ac.jp/~takataka/course/Vision/Vision-logo-96x96.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ffaGVK2o5VWe",
        "colab_type": "text"
      },
      "source": [
        "## はじめに"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1_zfjNUjvD1T",
        "colab_type": "text"
      },
      "source": [
        "### Notebook の動かし方\n",
        "\n",
        "この Notebook では，上の方のセルで作った変数や関数を後のセルで使うことがあります．そのため，上の方のセルを実行せずに下の方を実行すると，エラーになることがあります．上から順に実行していきましょう．\n",
        "\n",
        "また，メニューの「ランタイム」から，「すべてのセルを実行」したりすることも可能です．"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "L2ajEMYLWlhT",
        "colab_type": "text"
      },
      "source": [
        "## 準備"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "E13UrMaqWnFB",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import numpy as np\n",
        "import matplotlib.pyplot as plt"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eXNmkfe8V-Qb",
        "colab_type": "text"
      },
      "source": [
        "## 1次元の正規分布"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fcPNZXjzWL8B",
        "colab_type": "text"
      },
      "source": [
        "1次元正規分布の確率密度関数\n",
        "$$ p(x) = \\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp\\left(-\\frac{(x-\\mu)^2}{2\\sigma^2}\\right) $$"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5t6WgZ3lZfVl",
        "colab_type": "text"
      },
      "source": [
        "上記に相当する関数 Gaussian1 を定義．自分で式を書いても大したことないが，ここでは\n",
        "SciPy の関数 norm を利用: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy.stats.norm"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "3zqAFWAnZMf1",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from scipy.stats import norm\n",
        "\n",
        "def Gaussian1(x, mu, sigma2):\n",
        "    return norm.pdf(x, loc = mu, scale = np.sqrt(sigma2))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KVx1dHxHaXdP",
        "colab_type": "text"
      },
      "source": [
        "1次元正規分布の確率密度関数を描画"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "rkrs311NW0Hm",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x = np.linspace(-5,5, num = 100)\n",
        "fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 6))\n",
        "mu, sig2 = 0.0, 1.0\n",
        "ax.plot(x, Gaussian1(x, mu, sig2), '-', label = '$\\mu = {0}, \\sigma^2 = {1}$'.format(mu, sig2))\n",
        "mu, sig2 = 2.0, 0.5\n",
        "ax.plot(x, Gaussian1(x, mu, sig2), '-', label = '$\\mu = {0}, \\sigma^2 = {1}$'.format(mu, sig2))\n",
        "mu, sig2 = -2.0, 2.0\n",
        "ax.plot(x, Gaussian1(x, mu, sig2), '-', label = '$\\mu = {0}, \\sigma^2 = {1}$'.format(mu, sig2))\n",
        "ax.set_xlim(-5, 5)\n",
        "ax.legend()\n",
        "plt.show()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ggzK7VbkbMOY",
        "colab_type": "text"
      },
      "source": [
        "## 1次元正規分布の最尤推定"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6biAxFFOzLYW",
        "colab_type": "text"
      },
      "source": [
        "np.random.normal で正規乱数を生成し，np.mean, np.var で平均と分散を推定\n",
        "- https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.normal.html\n",
        "- np.mean, np.var はデフォルトでは N で割る"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "1_GFryADbaD8",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# loc: 平均, scale: 標準偏差, size: データ数 or shape\n",
        "N = 10\n",
        "data = np.random.normal(loc = 0.0, scale = 1.0, size = N)\n",
        "print(N, np.mean(data), np.var(data))\n",
        "N = 100\n",
        "data = np.random.normal(loc = 0.0, scale = 1.0, size = N)\n",
        "print(N, np.mean(data), np.var(data))\n",
        "N = 1000\n",
        "data = np.random.normal(loc = 0.0, scale = 1.0, size = N)\n",
        "print(N, np.mean(data), np.var(data))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dQHvAW683SLD",
        "colab_type": "text"
      },
      "source": [
        "上記と同様の実験，ただし結果をグラフに描く．N をいろいろ変えて実行し直してみよう．同じ N でも実行のたびに乱数の値が変わるので，何度か実行して様子を眺めるのがよい．"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "zASVP9if1HYH",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# 真のパラメータ\n",
        "mu_true, sigma_true = 0.0, 1.0\n",
        "print('# mu_true = {0:.2f}, sigma_true = {1:.2f}'.format(mu_true, sigma_true))\n",
        "\n",
        "# 上記のパラメータをもつ1次元正規分布から標本を抽出\n",
        "N = 10\n",
        "data = np.random.normal(loc = mu_true, scale = sigma_true, size = N)\n",
        "\n",
        "# 上記の標本から正規分布のパラメータを最尤推定\n",
        "mu_est, sigma_est = np.mean(data), np.sqrt(np.var(data))\n",
        "print('# mu_est = {0:.2f}, sigma_est = {1:.2f}'.format(mu_est, sigma_est))\n",
        "\n",
        "# グラフ\n",
        "fig, ax = plt.subplots(2, facecolor=\"white\", figsize=(8, 12))\n",
        "\n",
        "# 標本のヒストグラム\n",
        "ax[0].hist(data, density = True)\n",
        "\n",
        "# 真の正規分布と推定された正規分布\n",
        "x = np.linspace(-5,5, num = 100)\n",
        "#fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 6))\n",
        "ax[1].plot(x, Gaussian1(x, mu_true, sigma_true**2), '-', label = 'true')\n",
        "ax[1].plot(x, Gaussian1(x, mu_est, sigma_est**2), '-', label = 'estimated')\n",
        "ax[1].set_xlim(-5, 5)\n",
        "ax[1].legend()\n",
        "plt.show()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9FilFMmAzDXG",
        "colab_type": "text"
      },
      "source": [
        "## 多変量正規分布\n",
        "$D$次元正規分布の確率密度関数\n",
        "$$ p(\\mathbf{x}) = \\frac{1}{\\sqrt{(2\\pi)^D |\\Sigma|}} \\exp\\left( -\\frac{1}{2}(\\mathbf{x} - \\mathbf{\\mu})^{\\top}\\Sigma^{-1}(\\mathbf{x} - \\mathbf{\\mu}) \\right)$$\n",
        "$\\mathbf{\\mu}$ は平均ベクトル，$\\Sigma$ は分散共分散行列．"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "emv02H_s2C0N",
        "colab_type": "text"
      },
      "source": [
        "上記に相当する関数 Gaussian を定義．自分で式を書いても大したことないが，ここでは\n",
        "SciPy の関数 multivariate_normal を利用: \n",
        "\n",
        "https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multivariate_normal.html"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XSJR18wo2YBA",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "from scipy.stats import multivariate_normal\n",
        "\n",
        "def GaussianM(X, mean, cov):\n",
        "    return multivariate_normal.pdf(X, mean = mean, cov = cov)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "odulAy2d29Cf",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x, y = np.mgrid[-3:3:0.01, -3:3:0.01]\n",
        "X = np.dstack((x, y))\n",
        "print(X.shape)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Uh3ymEeO4_CH",
        "colab_type": "text"
      },
      "source": [
        "$\\mathbf{\\mu} = \\mathbf{0}, \\Sigma = I$ のとき"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Ng7mQhXQ4wLA",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# グラフ描画用のグリッドデータの作成\n",
        "xmin, xmax = -4, 4\n",
        "ymin, ymax = -4, 4\n",
        "x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
        "X = np.dstack((x, y))\n",
        "\n",
        "# 平均と共分散行列\n",
        "mean = np.zeros(2)\n",
        "cov = np.eye(2)\n",
        "\n",
        "# グラフを描く\n",
        "fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
        "ax.contourf(x, y, GaussianM(X, mean , cov))\n",
        "ax.axhline(0, color='white')\n",
        "ax.axvline(0, color='white')\n",
        "ax.set_xlim(xmin,xmax)\n",
        "ax.set_ylim(ymin,ymax)\n",
        "ax.set_aspect('equal')\n",
        "plt.show()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kvKsR4zI4XHu",
        "colab_type": "text"
      },
      "source": [
        "$\\mathbf{\\mu} = \\begin{pmatrix} 2 \\\\ 1 \\end{pmatrix} , \\Sigma = \\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix}$ のとき"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Hpmstog-3nX-",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# グラフ描画用のグリッドデータの作成\n",
        "xmin, xmax = -3, 5\n",
        "ymin, ymax = -3, 5\n",
        "x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
        "X = np.dstack((x, y))\n",
        "\n",
        "# 平均と共分散行列\n",
        "mean = np.array([2, 1])\n",
        "cov = np.array([[2, 0], [0, 0.5]])\n",
        "\n",
        "# グラフを描く\n",
        "fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
        "ax.contourf(x, y, GaussianM(X, mean , cov))\n",
        "ax.axhline(0, color='white')\n",
        "ax.axvline(0, color='white')\n",
        "ax.set_xlim(xmin,xmax)\n",
        "ax.set_ylim(ymin,ymax)\n",
        "ax.set_aspect('equal')\n",
        "plt.show()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jYuL4huM43dw",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# 回転行列\n",
        "def R(theta):\n",
        "    c, s = np.cos(theta), np.sin(theta)\n",
        "    return np.array([[c, -s], [s, c]])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6ItF5S-xWSqb",
        "colab_type": "text"
      },
      "source": [
        "$\\mathbf{\\mu} = \\begin{pmatrix} 2 \\\\ 1 \\end{pmatrix}  , \\Sigma = R(\\pi/3)\\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix} R(\\pi/3)^{\\top} = \\begin{pmatrix}\\frac{1}{2} & -\\frac{\\sqrt{3}}{2} \\\\ \\frac{\\sqrt{3}}{2} & \\frac{1}{2} \\end{pmatrix}\\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix}\\begin{pmatrix}\\frac{1}{2} & \\frac{\\sqrt{3}}{2} \\\\ -\\frac{\\sqrt{3}}{2} & \\frac{1}{2} \\end{pmatrix} = \\begin{pmatrix} \\frac{7}{8} & \\frac{3}{8}\\sqrt{3} \\\\ \\frac{3}{8}\\sqrt{3} & \\frac{13}{8}\\end{pmatrix}$ のとき"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Rx9ShszRWQtY",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# グラフ描画用のグリッドデータの作成\n",
        "xmin, xmax = -3, 5\n",
        "ymin, ymax = -3, 5\n",
        "x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
        "X = np.dstack((x, y))\n",
        "\n",
        "# 平均と共分散行列\n",
        "mean = np.array([2, 1])\n",
        "cov = np.array([[7.0/8, 3*np.sqrt(3)/8], [3*np.sqrt(3)/8, 13.0/8]])\n",
        "print(cov)\n",
        "cov = R(np.pi/3) @ np.diag([2, 0.5]) @ R(np.pi/3).T\n",
        "print(cov)\n",
        "\n",
        "# グラフを描く\n",
        "fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
        "ax.contourf(x, y, GaussianM(X, mean , cov))\n",
        "ax.axhline(0, color='white')\n",
        "ax.axvline(0, color='white')\n",
        "ax.set_xlim(xmin,xmax)\n",
        "ax.set_ylim(ymin,ymax)\n",
        "ax.set_aspect('equal')\n",
        "plt.show()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "92i_FYoEctjT",
        "colab_type": "text"
      },
      "source": [
        "これは，$\\mathbf{\\mu} = \\mathbf{0} , \\Sigma = \\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix}$ の正規分布を $\\theta$ 回転して $\\mathbf{\\mu}$ 平行移動したものになっている．"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5lBhf2pJdOc0",
        "colab_type": "text"
      },
      "source": [
        "## 多変量正規分布の最尤推定"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "tppDoZ3GYnwD",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# 2次元正規分布のデータ生成（標準正規分布の乱数から自分で回転平行移動）\n",
        "\n",
        "def gendat(N):\n",
        "        \n",
        "    # 標準正規分布からとったサンプルを 2 x N にならべる\n",
        "    x = np.random.normal(loc = 0.0, scale = 1.0, size = (2, N))\n",
        "    # \\Sigma^{\\frac{1}{2}} をかけて共分散行列 \\Sigma のサンプルにする\n",
        "    y = np.diag([np.sqrt(2), np.sqrt(0.5)]) @ x\n",
        "    # R(theta) 回転 して mu 平行移動\n",
        "    z =  R(np.pi/3) @ y + np.array([2.0, 1.0])[:, np.newaxis]\n",
        "    \n",
        "    return z"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "28lSPbtTkUbM",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# 2次元正規分布のデータ生成（np.random.multivariate_normal を使う）\n",
        "\n",
        "def gendat2(N):\n",
        "    \n",
        "    mu = np.array([2.0, 1.0])\n",
        "    cov = R(np.pi/3) @ np.diag([2, 0.5]) @ R(np.pi/3).T\n",
        "    x = np.random.multivariate_normal(mu, cov, size = N)  #  N x 2\n",
        "\n",
        "    return x.T  # 2 x N"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0F0UcxBAeDQr",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# 真の平均，共分散行列\n",
        "mu_true = np.array([2.0, 1.0])\n",
        "cov_true = R(np.pi/3) @ np.diag([2, 0.5]) @ R(np.pi/3).T\n",
        "print('### 真の平均と共分散行列')\n",
        "print(mu_true)\n",
        "print(cov_true)\n",
        "print()\n",
        "\n",
        "for N in [10, 100, 1000, 10000]:\n",
        "    print('### N = {0} での平均と共分散行列の推定値'.format(N))\n",
        "    data = gendat2(N)\n",
        "    mu_est = np.mean(data, axis = 1)\n",
        "    x = data - mu_est[:, np.newaxis]\n",
        "    cov_est = x @ x.T / N\n",
        "    print(mu_est)\n",
        "    print(cov_est)\n",
        "    print()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "L78k7Yu3qNe4",
        "colab_type": "text"
      },
      "source": [
        "上記と同様の実験．ただしグラフを描く．N をいろいろ変えて実行し直してみよう．同じ N でも実行のたびに乱数の値が変わるので，何度か実行して様子を眺めるのがよい．"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "muDLP9okmlgC",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "#  データの生成\n",
        "N = 10\n",
        "data = gendat2(N)\n",
        "\n",
        "# 平均，共分散行列の推定\n",
        "mu_est = np.mean(data, axis = 1)\n",
        "x = data - mu_est[:, np.newaxis]\n",
        "cov_est = x @ x.T / N\n",
        "\n",
        "# グラフ描画用のグリッドデータの作成\n",
        "xmin, xmax = -3, 5\n",
        "ymin, ymax = -3, 5\n",
        "x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
        "X = np.dstack((x, y))\n",
        "\n",
        "# グラフを描く\n",
        "fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
        "ax.scatter(data[0, :], data[1, :], s = 10)\n",
        "ax.contour(x, y, GaussianM(X, mu_est , cov_est), colors = ['#ffa0a0', 'r'], levels = [0.05, 0.1])  # 推定された正規分布\n",
        "ax.contour(x, y, GaussianM(X, mu_true , cov_true), colors = ['#a0ffa0', 'g'], levels = [0.05, 0.1])  # 真の正規分布\n",
        "ax.set_aspect('equal')\n",
        "plt.show()"
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}
	{
	"nbformat": 4,
	"nbformat_minor": 0,
	"metadata": {
	"colab": {
	"name": "Vision2020-ex04.ipynb",
	"provenance": [],
	"collapsed_sections": [],
	"include_colab_link": true
	},
	"kernelspec": {
	"name": "python3",
	"display_name": "Python 3"
	}
	},
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "view-in-github",
	"colab_type": "text"
	},
	"source": [
	"<a href=\"https://colab.research.google.com/gist/takatakamanbou/4ea8028bf0cca5059e47cc626e5b28c6/vision2020-ex04.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "NI3mY1FGZWgV",
	"colab_type": "text"
	},
	"source": [
	"# 正規分布とその最尤推定\n",
	"\n",
	"![hoge](https://www-tlab.math.ryukoku.ac.jp/~takataka/course/Vision/Vision-logo-96x96.png)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "ffaGVK2o5VWe",
	"colab_type": "text"
	},
	"source": [
	"## はじめに"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "1_zfjNUjvD1T",
	"colab_type": "text"
	},
	"source": [
	"### Notebook の動かし方\n",
	"\n",
	"この Notebook では，上の方のセルで作った変数や関数を後のセルで使うことがあります．そのため，上の方のセルを実行せずに下の方を実行すると，エラーになることがあります．上から順に実行していきましょう．\n",
	"\n",
	"また，メニューの「ランタイム」から，「すべてのセルを実行」したりすることも可能です．"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "L2ajEMYLWlhT",
	"colab_type": "text"
	},
	"source": [
	"## 準備"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "E13UrMaqWnFB",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"import numpy as np\n",
	"import matplotlib.pyplot as plt"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "eXNmkfe8V-Qb",
	"colab_type": "text"
	},
	"source": [
	"## 1次元の正規分布"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "fcPNZXjzWL8B",
	"colab_type": "text"
	},
	"source": [
	"1次元正規分布の確率密度関数\n",
	"$$ p(x) = \\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp\\left(-\\frac{(x-\\mu)^2}{2\\sigma^2}\\right) $$"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "5t6WgZ3lZfVl",
	"colab_type": "text"
	},
	"source": [
	"上記に相当する関数 Gaussian1 を定義．自分で式を書いても大したことないが，ここでは\n",
	"SciPy の関数 norm を利用: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy.stats.norm"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "3zqAFWAnZMf1",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"from scipy.stats import norm\n",
	"\n",
	"def Gaussian1(x, mu, sigma2):\n",
	" return norm.pdf(x, loc = mu, scale = np.sqrt(sigma2))"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "KVx1dHxHaXdP",
	"colab_type": "text"
	},
	"source": [
	"1次元正規分布の確率密度関数を描画"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "rkrs311NW0Hm",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"x = np.linspace(-5,5, num = 100)\n",
	"fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 6))\n",
	"mu, sig2 = 0.0, 1.0\n",
	"ax.plot(x, Gaussian1(x, mu, sig2), '-', label = '$\\mu = {0}, \\sigma^2 = {1}$'.format(mu, sig2))\n",
	"mu, sig2 = 2.0, 0.5\n",
	"ax.plot(x, Gaussian1(x, mu, sig2), '-', label = '$\\mu = {0}, \\sigma^2 = {1}$'.format(mu, sig2))\n",
	"mu, sig2 = -2.0, 2.0\n",
	"ax.plot(x, Gaussian1(x, mu, sig2), '-', label = '$\\mu = {0}, \\sigma^2 = {1}$'.format(mu, sig2))\n",
	"ax.set_xlim(-5, 5)\n",
	"ax.legend()\n",
	"plt.show()"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "ggzK7VbkbMOY",
	"colab_type": "text"
	},
	"source": [
	"## 1次元正規分布の最尤推定"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "6biAxFFOzLYW",
	"colab_type": "text"
	},
	"source": [
	"np.random.normal で正規乱数を生成し，np.mean, np.var で平均と分散を推定\n",
	"- https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.random.normal.html\n",
	"- np.mean, np.var はデフォルトでは N で割る"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "1_GFryADbaD8",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# loc: 平均, scale: 標準偏差, size: データ数 or shape\n",
	"N = 10\n",
	"data = np.random.normal(loc = 0.0, scale = 1.0, size = N)\n",
	"print(N, np.mean(data), np.var(data))\n",
	"N = 100\n",
	"data = np.random.normal(loc = 0.0, scale = 1.0, size = N)\n",
	"print(N, np.mean(data), np.var(data))\n",
	"N = 1000\n",
	"data = np.random.normal(loc = 0.0, scale = 1.0, size = N)\n",
	"print(N, np.mean(data), np.var(data))"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "dQHvAW683SLD",
	"colab_type": "text"
	},
	"source": [
	"上記と同様の実験，ただし結果をグラフに描く．N をいろいろ変えて実行し直してみよう．同じ N でも実行のたびに乱数の値が変わるので，何度か実行して様子を眺めるのがよい．"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "zASVP9if1HYH",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# 真のパラメータ\n",
	"mu_true, sigma_true = 0.0, 1.0\n",
	"print('# mu_true = {0:.2f}, sigma_true = {1:.2f}'.format(mu_true, sigma_true))\n",
	"\n",
	"# 上記のパラメータをもつ1次元正規分布から標本を抽出\n",
	"N = 10\n",
	"data = np.random.normal(loc = mu_true, scale = sigma_true, size = N)\n",
	"\n",
	"# 上記の標本から正規分布のパラメータを最尤推定\n",
	"mu_est, sigma_est = np.mean(data), np.sqrt(np.var(data))\n",
	"print('# mu_est = {0:.2f}, sigma_est = {1:.2f}'.format(mu_est, sigma_est))\n",
	"\n",
	"# グラフ\n",
	"fig, ax = plt.subplots(2, facecolor=\"white\", figsize=(8, 12))\n",
	"\n",
	"# 標本のヒストグラム\n",
	"ax[0].hist(data, density = True)\n",
	"\n",
	"# 真の正規分布と推定された正規分布\n",
	"x = np.linspace(-5,5, num = 100)\n",
	"#fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 6))\n",
	"ax[1].plot(x, Gaussian1(x, mu_true, sigma_true**2), '-', label = 'true')\n",
	"ax[1].plot(x, Gaussian1(x, mu_est, sigma_est**2), '-', label = 'estimated')\n",
	"ax[1].set_xlim(-5, 5)\n",
	"ax[1].legend()\n",
	"plt.show()"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "9FilFMmAzDXG",
	"colab_type": "text"
	},
	"source": [
	"## 多変量正規分布\n",
	"$D$次元正規分布の確率密度関数\n",
	"$$ p(\\mathbf{x}) = \\frac{1}{\\sqrt{(2\\pi)^D \|\\Sigma\|}} \\exp\\left( -\\frac{1}{2}(\\mathbf{x} - \\mathbf{\\mu})^{\\top}\\Sigma^{-1}(\\mathbf{x} - \\mathbf{\\mu}) \\right)$$\n",
	"$\\mathbf{\\mu}$ は平均ベクトル，$\\Sigma$ は分散共分散行列．"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "emv02H_s2C0N",
	"colab_type": "text"
	},
	"source": [
	"上記に相当する関数 Gaussian を定義．自分で式を書いても大したことないが，ここでは\n",
	"SciPy の関数 multivariate_normal を利用: \n",
	"\n",
	"https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multivariate_normal.html"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "XSJR18wo2YBA",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"from scipy.stats import multivariate_normal\n",
	"\n",
	"def GaussianM(X, mean, cov):\n",
	" return multivariate_normal.pdf(X, mean = mean, cov = cov)"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "odulAy2d29Cf",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"x, y = np.mgrid[-3:3:0.01, -3:3:0.01]\n",
	"X = np.dstack((x, y))\n",
	"print(X.shape)"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "Uh3ymEeO4_CH",
	"colab_type": "text"
	},
	"source": [
	"$\\mathbf{\\mu} = \\mathbf{0}, \\Sigma = I$ のとき"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "Ng7mQhXQ4wLA",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# グラフ描画用のグリッドデータの作成\n",
	"xmin, xmax = -4, 4\n",
	"ymin, ymax = -4, 4\n",
	"x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
	"X = np.dstack((x, y))\n",
	"\n",
	"# 平均と共分散行列\n",
	"mean = np.zeros(2)\n",
	"cov = np.eye(2)\n",
	"\n",
	"# グラフを描く\n",
	"fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
	"ax.contourf(x, y, GaussianM(X, mean , cov))\n",
	"ax.axhline(0, color='white')\n",
	"ax.axvline(0, color='white')\n",
	"ax.set_xlim(xmin,xmax)\n",
	"ax.set_ylim(ymin,ymax)\n",
	"ax.set_aspect('equal')\n",
	"plt.show()"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "kvKsR4zI4XHu",
	"colab_type": "text"
	},
	"source": [
	"$\\mathbf{\\mu} = \\begin{pmatrix} 2 \\\\ 1 \\end{pmatrix} , \\Sigma = \\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix}$ のとき"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "Hpmstog-3nX-",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# グラフ描画用のグリッドデータの作成\n",
	"xmin, xmax = -3, 5\n",
	"ymin, ymax = -3, 5\n",
	"x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
	"X = np.dstack((x, y))\n",
	"\n",
	"# 平均と共分散行列\n",
	"mean = np.array([2, 1])\n",
	"cov = np.array([[2, 0], [0, 0.5]])\n",
	"\n",
	"# グラフを描く\n",
	"fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
	"ax.contourf(x, y, GaussianM(X, mean , cov))\n",
	"ax.axhline(0, color='white')\n",
	"ax.axvline(0, color='white')\n",
	"ax.set_xlim(xmin,xmax)\n",
	"ax.set_ylim(ymin,ymax)\n",
	"ax.set_aspect('equal')\n",
	"plt.show()"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "jYuL4huM43dw",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# 回転行列\n",
	"def R(theta):\n",
	" c, s = np.cos(theta), np.sin(theta)\n",
	" return np.array([[c, -s], [s, c]])"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "6ItF5S-xWSqb",
	"colab_type": "text"
	},
	"source": [
	"$\\mathbf{\\mu} = \\begin{pmatrix} 2 \\\\ 1 \\end{pmatrix} , \\Sigma = R(\\pi/3)\\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix} R(\\pi/3)^{\\top} = \\begin{pmatrix}\\frac{1}{2} & -\\frac{\\sqrt{3}}{2} \\\\ \\frac{\\sqrt{3}}{2} & \\frac{1}{2} \\end{pmatrix}\\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix}\\begin{pmatrix}\\frac{1}{2} & \\frac{\\sqrt{3}}{2} \\\\ -\\frac{\\sqrt{3}}{2} & \\frac{1}{2} \\end{pmatrix} = \\begin{pmatrix} \\frac{7}{8} & \\frac{3}{8}\\sqrt{3} \\\\ \\frac{3}{8}\\sqrt{3} & \\frac{13}{8}\\end{pmatrix}$ のとき"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "Rx9ShszRWQtY",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# グラフ描画用のグリッドデータの作成\n",
	"xmin, xmax = -3, 5\n",
	"ymin, ymax = -3, 5\n",
	"x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
	"X = np.dstack((x, y))\n",
	"\n",
	"# 平均と共分散行列\n",
	"mean = np.array([2, 1])\n",
	"cov = np.array([[7.0/8, 3np.sqrt(3)/8], [3np.sqrt(3)/8, 13.0/8]])\n",
	"print(cov)\n",
	"cov = R(np.pi/3) @ np.diag([2, 0.5]) @ R(np.pi/3).T\n",
	"print(cov)\n",
	"\n",
	"# グラフを描く\n",
	"fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
	"ax.contourf(x, y, GaussianM(X, mean , cov))\n",
	"ax.axhline(0, color='white')\n",
	"ax.axvline(0, color='white')\n",
	"ax.set_xlim(xmin,xmax)\n",
	"ax.set_ylim(ymin,ymax)\n",
	"ax.set_aspect('equal')\n",
	"plt.show()"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "92i_FYoEctjT",
	"colab_type": "text"
	},
	"source": [
	"これは，$\\mathbf{\\mu} = \\mathbf{0} , \\Sigma = \\begin{pmatrix} 2 & 0 \\\\ 0 & \\frac{1}{2} \\end{pmatrix}$ の正規分布を $\\theta$ 回転して $\\mathbf{\\mu}$ 平行移動したものになっている．"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "5lBhf2pJdOc0",
	"colab_type": "text"
	},
	"source": [
	"## 多変量正規分布の最尤推定"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "tppDoZ3GYnwD",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# 2次元正規分布のデータ生成（標準正規分布の乱数から自分で回転平行移動）\n",
	"\n",
	"def gendat(N):\n",
	" \n",
	" # 標準正規分布からとったサンプルを 2 x N にならべる\n",
	" x = np.random.normal(loc = 0.0, scale = 1.0, size = (2, N))\n",
	" # \\Sigma^{\\frac{1}{2}} をかけて共分散行列 \\Sigma のサンプルにする\n",
	" y = np.diag([np.sqrt(2), np.sqrt(0.5)]) @ x\n",
	" # R(theta) 回転して mu 平行移動\n",
	" z = R(np.pi/3) @ y + np.array([2.0, 1.0])[:, np.newaxis]\n",
	" \n",
	" return z"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "28lSPbtTkUbM",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# 2次元正規分布のデータ生成（np.random.multivariate_normal を使う）\n",
	"\n",
	"def gendat2(N):\n",
	" \n",
	" mu = np.array([2.0, 1.0])\n",
	" cov = R(np.pi/3) @ np.diag([2, 0.5]) @ R(np.pi/3).T\n",
	" x = np.random.multivariate_normal(mu, cov, size = N) # N x 2\n",
	"\n",
	" return x.T # 2 x N"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "0F0UcxBAeDQr",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# 真の平均，共分散行列\n",
	"mu_true = np.array([2.0, 1.0])\n",
	"cov_true = R(np.pi/3) @ np.diag([2, 0.5]) @ R(np.pi/3).T\n",
	"print('### 真の平均と共分散行列')\n",
	"print(mu_true)\n",
	"print(cov_true)\n",
	"print()\n",
	"\n",
	"for N in [10, 100, 1000, 10000]:\n",
	" print('### N = {0} での平均と共分散行列の推定値'.format(N))\n",
	" data = gendat2(N)\n",
	" mu_est = np.mean(data, axis = 1)\n",
	" x = data - mu_est[:, np.newaxis]\n",
	" cov_est = x @ x.T / N\n",
	" print(mu_est)\n",
	" print(cov_est)\n",
	" print()"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "L78k7Yu3qNe4",
	"colab_type": "text"
	},
	"source": [
	"上記と同様の実験．ただしグラフを描く．N をいろいろ変えて実行し直してみよう．同じ N でも実行のたびに乱数の値が変わるので，何度か実行して様子を眺めるのがよい．"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "muDLP9okmlgC",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"# データの生成\n",
	"N = 10\n",
	"data = gendat2(N)\n",
	"\n",
	"# 平均，共分散行列の推定\n",
	"mu_est = np.mean(data, axis = 1)\n",
	"x = data - mu_est[:, np.newaxis]\n",
	"cov_est = x @ x.T / N\n",
	"\n",
	"# グラフ描画用のグリッドデータの作成\n",
	"xmin, xmax = -3, 5\n",
	"ymin, ymax = -3, 5\n",
	"x, y = np.mgrid[xmin:xmax:0.02, ymin:ymax:0.02]\n",
	"X = np.dstack((x, y))\n",
	"\n",
	"# グラフを描く\n",
	"fig, ax = plt.subplots(facecolor=\"white\", figsize=(8, 8))\n",
	"ax.scatter(data[0, :], data[1, :], s = 10)\n",
	"ax.contour(x, y, GaussianM(X, mu_est , cov_est), colors = ['#ffa0a0', 'r'], levels = [0.05, 0.1]) # 推定された正規分布\n",
	"ax.contour(x, y, GaussianM(X, mu_true , cov_true), colors = ['#a0ffa0', 'g'], levels = [0.05, 0.1]) # 真の正規分布\n",
	"ax.set_aspect('equal')\n",
	"plt.show()"
	],
	"execution_count": 0,
	"outputs": []
	}
	]
	}