Skip to content

Instantly share code, notes, and snippets.

@alinaselega
Last active September 26, 2019 21:27
Show Gist options
  • Save alinaselega/594450cef2bffd7ade8a6967c492b526 to your computer and use it in GitHub Desktop.
Save alinaselega/594450cef2bffd7ade8a6967c492b526 to your computer and use it in GitHub Desktop.
This notebook numerically checks the form for the marginal multinomial likelihood
Display the source blob
Display the rendered blob
Raw
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "multinomial_pmf_numerics.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "9d8yzppwKINZ",
"colab_type": "text"
},
"source": [
"This notebook numerically checks the value for a marginal multinomial PMF.\n",
"\n",
"If $(X_1, ..., X_m)$ are a random variables counting the number of successes of each of the $m$ categories with probabilities of success $(p_1, ..., p_m)$ out of a total of $n$ trials, then the marginal likelihood of the subset of the random variables $(X_1, ..., X_k)$, where $k < m$ is given by:\n",
"\n",
"\\begin{eqnarray}\n",
" p(X_1 = x_1, ..., X_k = x_k) = \\frac{n!}{x_1! ... x_k! (n-z_k)!} p_1^{x_1} ... p_k^{x_k} q_k^{n-z_k} \\\\\n",
" z_k = x_1 + ... + x_k \\\\ \n",
" q_k = 1 - p_1 - ... - p_k\n",
"\\end{eqnarray}\n",
"\n",
"The general form of this marginal likelihood is given by:\n",
"\n",
"\\begin{eqnarray}\n",
" p(X_1 = x_1, ..., X_k = x_k) = \\sum_{x_{k+1}} ... \\sum_{x_m} p(X_1 = x_1, ..., X_m = x_m) = \\sum_{x_{k+1}} ... \\sum_{x_m} \\frac{n!}{x_1! ... x_m!} p_1^{x_1} ... p_k^{x_m}\n",
"\\end{eqnarray}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "wON6XvmXbxwn",
"colab_type": "text"
},
"source": [
"Generate some multinomial observations:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "6P0zgo6LJNLu",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "2c5246a9-bdef-4f06-b94e-5a9636578b70"
},
"source": [
"import scipy.stats as st\n",
"import numpy as np\n",
"\n",
"G = 7\n",
"N = 21\n",
"\n",
"e = np.random.normal(size=G)\n",
"p = np.exp(e) / sum(np.exp(e))\n",
"\n",
"rv = st.multinomial(N, p)\n",
"obs = rv.rvs()\n",
"print(obs)"
],
"execution_count": 19,
"outputs": [
{
"output_type": "stream",
"text": [
"[[5 1 7 5 1 2 0]]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zOsumzhWb1oP",
"colab_type": "text"
},
"source": [
"This is the full log PMF:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "fJpYvjIuLntZ",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "c0e4c315-f7b5-41cf-a498-02e79b974b4f"
},
"source": [
"rv.logpmf(obs)"
],
"execution_count": 20,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([-8.09912776])"
]
},
"metadata": {
"tags": []
},
"execution_count": 20
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2MUH0qvjb41I",
"colab_type": "text"
},
"source": [
"Here is the subset we care about; isolate corresponding observations:"
]
},
{
"cell_type": "code",
"metadata": {
"id": "JOt8CZWBMZV3",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "560fd719-9543-4655-80e3-7c1ba4f36797"
},
"source": [
"# Evaluate the marginal likelihood of the first 2 count variables\n",
"k = 2\n",
"subset = range(k)\n",
"\n",
"obs = obs[0][:]\n",
"print(obs)"
],
"execution_count": 21,
"outputs": [
{
"output_type": "stream",
"text": [
"[5 1 7 5 1 2 0]\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "8khJ29s0R4HA",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "8fbe4a4d-ea6a-4e7a-986e-fbb056af19db"
},
"source": [
"subset_sum = sum(obs[0:k])\n",
"subset_sum"
],
"execution_count": 22,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"6"
]
},
"metadata": {
"tags": []
},
"execution_count": 22
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1SMfoG1Mb94_",
"colab_type": "text"
},
"source": [
"Make an array with every permutation of remaining categories' counts; note that the $\\sum_{i=1}^{m} x_i = n$, so the possible values for the remaining categories counts that we must sum over are bounded by $n - z_k$."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ivg-k4CVR6i4",
"colab_type": "code",
"colab": {}
},
"source": [
"from sympy.utilities.iterables import variations\n",
"\n",
"perms = list(variations(range(N - subset_sum + 1), G-k-1, True))\n",
"#print(perms)\n",
"\n",
"perms_array = np.zeros((len(perms), G - k))\n",
"\n",
"for i in range(len(perms)):\n",
" perms_array[i, 0:(G-k-1)] = perms[i]\n",
" sum_item = sum(perms[i])\n",
" perms_array[i, G-k-1] = max(N - subset_sum - sum_item, 0)\n",
" if (sum(perms_array[i,:]) > N - subset_sum):\n",
" perms_array[i,:] = np.nan\n",
" \n",
"mask = np.where(np.isnan(perms_array[:,0]))\n",
"perms_array = np.delete(perms_array, mask[0], 0)\n",
"#perms_array\n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZT3dR0zicc9_",
"colab_type": "text"
},
"source": [
"Compute the exact log PMF."
]
},
{
"cell_type": "code",
"metadata": {
"id": "D6Do9wSyZYcn",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "6e4cd7c8-107b-49a4-8a2c-28af958098ae"
},
"source": [
"exact_pmf = 0\n",
"\n",
"values = np.zeros(G)\n",
"values[0:k] = obs[0:k]\n",
"\n",
"for item in perms_array:\n",
" values[k:G] = item\n",
" exact_pmf = exact_pmf + rv.pmf(values)\n",
" \n",
"np.log(exact_pmf)"
],
"execution_count": 24,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"-2.61695788104343"
]
},
"metadata": {
"tags": []
},
"execution_count": 24
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1ZrN-CMacgaA",
"colab_type": "text"
},
"source": [
"Now compute it with my implementation for evaluating the multinomial term and the marginal log-likelihood."
]
},
{
"cell_type": "code",
"metadata": {
"id": "ZBbjdwCOawC3",
"colab_type": "code",
"colab": {}
},
"source": [
"from scipy.special import binom\n",
"from scipy import special as special\n",
"\n",
"# Compute a multinomial coefficient as a product of binomials\n",
"def log_multinomial_coef(ks):\n",
" if len(ks) == 1:\n",
" return 0\n",
" return np.log(binom(sum(ks), ks[-1])) + log_multinomial_coef(ks[:-1])\n",
"\n",
"\n",
"# Compute a log marginal multinomial coefficient\n",
"def log_marginal_multinomial_coef(ys, n):\n",
" in_segment_counts = sum(ys)\n",
" component1 = log_multinomial_coef(ys)\n",
" \n",
" num = np.zeros(n - in_segment_counts)\n",
" denom = np.zeros(n - in_segment_counts)\n",
" for i in range(1, n - in_segment_counts + 1):\n",
" num[i-1] = in_segment_counts + i\n",
" denom[i-1] = i\n",
"\n",
" new_num = [x for x in num if x not in denom]\n",
" new_denom = [x for x in denom if x not in num]\n",
" \n",
" if((len(new_num) == 0) and (len(new_denom) == 0)):\n",
" component2 = 0\n",
" else:\n",
" component2 = sum(np.log(new_num)) - sum(np.log(new_denom))\n",
" \n",
" return component1 + component2"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "cD-n9IXDa7W3",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 71
},
"outputId": "08042636-a248-42c8-adb6-416dc10bead2"
},
"source": [
"y_marg = np.zeros(k + 1)\n",
"y_marg[0:k] = obs[0:k]\n",
"y_marg[-1] = N - subset_sum\n",
"\n",
"p_marg = np.zeros(k + 1)\n",
"p_marg[0:k] = p[0:k]\n",
"p_marg[-1] = 1 - sum(p[0:k])\n",
"\n",
"print(y_marg)\n",
"print(p_marg)\n",
"\n",
"# Log marginal loglik\n",
"loglik = log_marginal_multinomial_coef(obs[0:k], N) + sum(y_marg * np.log(p_marg))\n",
"loglik\n"
],
"execution_count": 26,
"outputs": [
{
"output_type": "stream",
"text": [
"[ 5. 1. 15.]\n",
"[0.23912625 0.03520192 0.72567183]\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"-2.6169578810434277"
]
},
"metadata": {
"tags": []
},
"execution_count": 26
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0xB9gmric0cv",
"colab_type": "text"
},
"source": [
"This is computing the multinomial factor directly as the values are small enough."
]
},
{
"cell_type": "code",
"metadata": {
"id": "E7k6USpecQ9o",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"outputId": "034f629d-dee1-47a6-80e8-37bf7bb4bd92"
},
"source": [
"#Let's directly compute that factor\n",
"\n",
"from scipy.special import factorial\n",
"lik = factorial(N) / (np.prod(factorial(obs[0:k])) * factorial(N - subset_sum)) * np.prod(np.power(p_marg, y_marg))\n",
"np.log(lik)"
],
"execution_count": 27,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"-2.616957881043428"
]
},
"metadata": {
"tags": []
},
"execution_count": 27
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bdsW5ei4c9iP",
"colab_type": "text"
},
"source": [
"All values match!"
]
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment