Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save NickRSearcy/47b84d8f42bcf1f50ca6 to your computer and use it in GitHub Desktop.
Save NickRSearcy/47b84d8f42bcf1f50ca6 to your computer and use it in GitHub Desktop.
Murphy Machine Learning - Chapter 5 Problem 8
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using matplotlib backend: MacOSX\n",
"Populating the interactive namespace from numpy and matplotlib\n",
"IPython console for SymPy 0.7.6.1 (Python 3.5.0-64-bit) (ground types: python)\n",
"\n",
"These commands were executed:\n",
">>> from __future__ import division\n",
">>> from sympy import *\n",
">>> x, y, z, t = symbols('x y z t')\n",
">>> k, m, n = symbols('k m n', integer=True)\n",
">>> f, g, h = symbols('f g h', cls=Function)\n",
">>> init_printing()\n",
"\n",
"Documentation can be found at http://www.sympy.org\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING: pylab import has clobbered these variables: ['solve', 'Polygon', 'sin', 'nan', 'test', 'power', 'flatten', 'poly', 'floor', 'reshape', 'plotting', 'plot', 'eye', 'mod', 'diff', 'trace', 'gamma', 'conjugate', 'product', 'transpose', 'random', 'cos', 'cbrt', 'add', 'binomial', 'trunc', 'roots', 'sqrt', 'sinh', 'pi', 'tanh', 'prod', 'vectorize', 'diag', 'multinomial', 'det', 'zeros', 'ones', 'var', 'tan', 'interactive', 'source', 'Circle', 'seterr', 'f', 'cosh', 'take', 'invert', 'beta', 'sign', 'exp', 'log']\n",
"`%matplotlib` prevents importing * from pylab and numpy\n"
]
}
],
"source": [
"%pylab\n",
"import random\n",
"from sympy import *\n",
"init_session()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, find the number of times the coin, `x`, takes on each value and how many times the person, `y`, lies."
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x = \"1 1 0 1 1 0 0\".split()\n",
"y = \"1 0 0 0 1 0 1\".split()\n",
"x_1 = x.count('1') #x_1 is the number of heads\n",
"x_0 = x.count('0')\n",
"y_1 = int(sum([x_i == y_i for x_i,y_i in zip(x,y)])) #y_1 is the number of 'lies' which is what we care about\n",
"y_0 = int(sum([x_i != y_i for x_i,y_i in zip(x,y)])) #bug: these must either all be ints or all be floats"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can put this all into a Binomial distribution for $P(\\mathcal{D}|t_1,t_2)$. We can ignore the coefficient because it would divide out anyway. ($t$ stands for $\\theta$ here. Not sure how to put that into sympy.)"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAOQAAAAbBAMAAACevT5yAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAEN0iVJnNiUSru3Yy\nZu9l18v4AAAACXBIWXMAAA7EAAAOxAGVKw4bAAADKUlEQVRIDaVVS2sTURQ+8zJtM1NChYoU2mAX\n7nQQ3AjFKLpS2iJEyULMQoLFLrJpQbIwuBFBUHxsLGpEXLiQxl9gFurWQNGFINRfYMQHVpB6zr3n\nzCummcELmXvud77vfGfunZkA/NdoVArZ9JkFyfJO12klsR3XEYEIx3YU9CWdutfrA3cCQoH9lXkL\nGfcJnBYr004i2MOWxtWsltOdtF7ME8FLtszPZLTMX87oKAL3GFu+zmoJTiujJwtyBlvW2XIuRR3P\nJ5L7XVHTCDYUkwUbbOn52tJqqeyQy12A6Y71h1ipBKPYoxYYs0ZbWU74U6Atd6kb0I4TkTjsYb4A\nMAlg+t5nAocKbKQZVRHYPbtc/tEFWICT5QvnqcJ7uvBYkIDmU3ph3H+Mlrk2uI0rqqNhgr0VOrkG\nnoMSOFXcGELO4u8olgKYpQsPQoPBlgCvkGdsBvhQQZ4MpoVv1gE+bpUA9gN46/cQtoqSwxnRcMQs\n4ZIkhguU5a6mFtxYf96maOTB7zWNQH6RgxhKWNyyLLThAmXpSF05LLsnFca6EgEQ6lZlHbf8IHBS\nsPKswCkRKEtP6shh0ZnqYZY4wAlRt1KUtVRQZwnHBTZLEilBzh+tMiACZWkXGaXDeos/OlPd3ngT\n34B9NJYVytTVWu1ArbakZPT4wDsV4iUhMH33G6UiAmXp4o7RMH4BvLmDwUwTuD2qIANRCLpLnGXM\nMiIwmxYWVSN2l2I50oOC2qJbYHB7Zknz6YroQMtrQjNLEmkBBH+GMcuRoqblqnZHWa7CaW4v+jQg\nOtDyn48PCcBscw8xS3l8nNtToCwrBvJUe7nFsGlCB23sOaElBbAimZilU9ew9aKrLY98QkC1JxtA\nBEL7LStfllC1rCvg21uUSAu8oGe2dB79fIhf4k5ICx933V70+4WsfksltYpBhYTgOtzklNylWh4K\n+BjIG8btRb/SmLU3hbtbAprxsy4jLnAvnnjCmZigIXSaxZLbG/OjSevMVlg8kpgM47hgfHu7F+aC\nyKgGIQbzeiHtWa1oclD8NEykEsT6OnxwTcmD9ubCagMjL7oVaQQbXOovAAjDMzFdJqAAAAAASUVO\nRK5CYII=\n",
"text/latex": [
"$$t_{1}^{4} t_{2}^{4} \\left(- t_{1} + 1\\right)^{3} \\left(- t_{2} + 1\\right)^{3}$$"
],
"text/plain": [
" 4 4 3 3\n",
"t₁ ⋅t₂ ⋅(-t₁ + 1) ⋅(-t₂ + 1) "
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t_1, t_2 = symbols('t_1 t_2')\n",
"f_1 = (t_1**x_1)*((1-t_1)**x_0)*(t_2**y_1)*((1-t_2)**y_0)\n",
"f_1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 'Maximum Likelihood Estimate' is just the `t_1`, `t_2` combination that maximizes $P(\\mathcal{D}|t_1,t_2)$. I'm not sure the best way to show this analytically, but the answer is just the empirical ratio."
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAAwCAMAAAAxbW3/AAAAOVBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACXHtMAAAAEnRSTlMAEN0i\nVJnNiUSru3YyZu/z+72BNYgjAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAGMUlEQVR4Ae1bjZKkKAwG\nFf9173z/h72EgMQEdavGmVOnrdruNhI/yBdCCLPG/JJr/CXjfMgw54f085d080PHKdHN4Dp72uor\nDRLCh44zO7rJGNeftfrKc4bwQDqUaZTgK7ZRuq4xZlqU+EIBQ3geHdUiIocSXGip8KrvnR0IEhAe\nR4dtBB1KcD0bdv7uBDQiPI6OwQo6lOByOoZ6uPyd2xeuCE+jY6wEHUqwHeg1d59gtWPHxgg6pKCE\nROiSayrTa8alSjdXIXAAExAeNjtgCFs6pKC8Ls3qPB8+qpdLIvk6BAIwHOE5dNgCqIAgvtKhBOjA\nNfPp5NAG2/7tVc2Uutnaa6DmFEQo+DqCADAM4c/fdvJ/b1dB7l85uBbcNsGlBCCbnH+kPrBt/qq2\ny7TtXBdzhcY/GpuhCX6Mb/gqggYwDOGffCdvKHXkrKaM1lIC6PTeJIht9bgEHdBgiAA2FzouQDgA\nyCHqPt9BUsDuGK6pXzpaV5XAmLLY6Wlom3l6QIepW6VwBcJKhwZ4CB1t1y99x+KKEnjDDdlYlW8b\nLH1Eh9OvuwIh0aEAHkIHrRQbX80tB31KgLJtxxDw2MMjOibd/AShcb0TNRyjEBIdCuAxdKjorwRg\n44JtD5jJDbUdXaeDmTJWWjtMpZsfIzSQhXWSQ4WQ6FAAj6EjRv/VwYNg448zBXtbF+tVY7kpKm+G\n39R4FbP/6hJ7yVqjzseOEXCdH5c1p95BOAB4Ch128ZZODh4EW38MxkqmpV+hLUQ87e7Kd9ns2Kdj\nB6GATpbU09RCIbyAjhYdFaNytGgQbP1xJ5Qo5WQrHdkZHa1m7wwBFzmxeBzQoQBOZ8cAu6DY+7Eb\nywlWy6aycKF0DR2GihMxdExw3ggNtILUEAoZDYJoICJXGHciHUGw9cedhVYpxwHh+1i6RuLkvGql\nNeYMAQKjfKFCOAAIdPgIyzq5/pyAiyoG1nZZFvzdwfey9CaFjuATMXRMYDgLBEkFpSEVlEaEcJ2v\nkCQ6VgGIVn9sdGaKI1nbRi7X4R3Tgcd04jpDMLrBAR0KgOhY60ACHLjGkB2H21ajnxNdC1ePP+P4\nwhlQDB09rmbgz1pBaEiFjAZBlD1OS4a4Crg/qsnvVczaNnaXxP5TGst181J3hJVx0jOETA1FIBwC\nEB2ZblKP7YIBYg75Y0uJi8Hbyv+OiuEMKIYOB2vqCPNIK6x0kIZUyGhECOrQqh9u4Yv7I9J7cIl3\nYUthLK6c3YEfI7Qwn1o0Gbv2ETTAv6BXuqJ24h3hdZS1FWF31brBLwnw0NI8DuPjZ0AYOmyx9Bg3\ntEI0J9PgChkNYUJxK2p6GecMI/FflV4q7T6BVELk+vD7EGGsq6ryYYNp7SNoAOpfzbdPnU/E8WOC\nJBpDUlydSpgRE63Z4S+PgnH4GZBvPM0LupFWiHQwDa6Q0RD2F7dG+KMPk8wW/Ce43cILLfxZ5nco\nsMsnRwizX1Slxt59BoDoCDsX/dddNDtmliyQZAynPGQcfgbkQ0fTmGoOZwRbhUAH01AKYSO1hViH\nJBxc+uN1h0Nmx+7XIWQAPB0tfTazmri0dsSNDdJFW6oYrz0d/FDIz2WL+bqFFF0rEB1MY6uQ0dhM\nB+Xgyh+vOjo14DH56yqEHIAnYoAsEsFpmUrBCibFJrNCuui4oQ4poLcVOxSi0EEONFRGKxAdSUMo\nZDQ2dOQN9B6pp8MNxps3kzVgQRn3HSUuUA3+86n9HBb3NXQQSzF0+ANMmBpaAYJYMJ7XkAoZjVXh\nPUbfHwmFqW7wiVWGDjB/4/fXPpA1cOff5ScNpmRhbQyHQjF0WAd7eZxvQkFpKAWpkSD2x/CiJ9FX\nych74fJFA773UD503IofRsc4LI7vP27Vz1/SGUZHfsQ3qejmO/c6aaAjUyyjod6lovs6w+cHRHTc\nv6Kb7/3rpHSUsbvVul9F93UMqAFBbv+ciq7q/QsFvqI7No0vaqUiyR0rui+0vhqSr+jiYao8q79f\nRVf1/X0CqujCYapa0u9W0X2f7TMjihVdKremYPWp6GaM9f2iWNG1+r+SfCq6329+idBSRdfq00As\nqH8qutJeP3JvXRn+IOFH4D4gxxYo4NxdZlbHGj/29D9Q5UggHw6scwAAAABJRU5ErkJggg==\n",
"text/latex": [
"$$\\left. t_{1}^{4} t_{2}^{4} \\left(- t_{1} + 1\\right)^{3} \\left(- t_{2} + 1\\right)^{3} \\right|_{\\substack{ t_{1}=0.571428571428571\\\\ t_{2}=0.571428571428571 }}$$"
],
"text/plain": [
"⎛ 4 4 3 3⎞│ \n",
"⎝t₁ ⋅t₂ ⋅(-t₁ + 1) ⋅(-t₂ + 1) ⎠│t₁=0.571428571428571, t₂=0.571428571428571"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"t_1_mle = x_1/(x_1+x_0)\n",
"t_2_mle = y_1/(y_1+y_0)\n",
"f_1_mle = Subs(f_1,(t_1, t_2), (t_1_mle, t_2_mle))\n",
"f_1_mle"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAM4AAAAVBAMAAAD8yMwbAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAMt2rmYlmIkR2uxDN\nVO+L8+I6AAAACXBIWXMAAA7EAAAOxAGVKw4bAAADNElEQVRIDb1UTWgTQRh9abLZxE1ikHpOQEXw\nYrSiiGBXmuLvIVJXUITmoiJIu0jVW83Vi40oiFDtird6cLEe1QZEQTw0gl70UH9AxZ/aWm2rrcT3\nzWyK2F6b7/Bm3rfzvZdvZjLA0kdow5ni0rsAsdpkI2wQe+Y3xqfUEBvEdl9uSENmKephxSbHKbAv\nq/1omkPI1mDmB0mf+M1V9Oy5AShuOoerXNjhIrSq3XFw19kCDSz6LwI9nQ1No7tWq+XIElVrLYcm\nT8MQjBngW+0nrBw6S1C8B8kpPCiaHxBl1aT1Hr2uAiD5mXX1EKL0Xg4y/LBvzWAnEJbvXcARDtc9\nDduAMeDRVhdhH6nXUHygiu/YDmxCxAVy4dcI2wow1DbLuiAUCfQklXLNWRSA/UJuAqM+zJMe94jw\nFmhNQxpNZRGe0PxCyZrDHNAJAzDSyyswxhUA8X98FNF6ogzDN2wOsbIQ/pX6qjASHvOEEb/uk5gV\nH825b9ZvYCDNgvMY9hCfULCIj9YTZcBpF1wmYP2gTxnHxUcBsN7H5hM75GOT+rHkuFPAF/ZTYtJD\niv3MKFjoE+hJdT0OysT8CWQKVpk+CtgmM7dwrsSPfQWC8NOrfW4aRopsushNQ2JKwUIfrcey+bDG\nZWqyn0zBAH0UsImc5JdlCatkprjxEYlysrMKDDP1BC/+aKifjyUrhWg9xQKIV2Si+3wmPgqAvPoe\nmQYitkw1H/Nx4mJnGuhnKnp15YSGus9wRZbSZ+G+qX+Mugejrwr0sQS41uaFlwMALqta8rPAbbaC\nVh9YJ0mE5eQEgvumN0EI78GorJ2P4ZyaDgLd91taWt9dEyjjOUyXZxyZRdIWA+E1X/s8ZP/6qU9I\nsUDgo1WFiJ4/b8JJxlaM/6t9nKQ8DSEbcTfCU6mgGbgi71Hc/cRL6Ha5fEUQ4nlGPyDjKljEJ9BT\n2gpGbR5qBU1V6w35ck/DKefARpg2ulxzjdOWheLHkPyFfv9eke8MfeJPrccaFvEJ9KgWRG9V3SXr\n0KU0b+vtSd5Xwkit9h3o6diLBJ+yLBSP5feUYOR3sdL8SjiQLwUQ2T63mQkdimi9eqph41+49C3r\naG1s4wAAAABJRU5ErkJggg==\n",
"text/latex": [
"$$7.0442522397991 \\cdot 10^{-5}$$"
],
"text/plain": [
"7.04425223979910e-5"
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f_1_mle.doit()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want to find the likelihood of the data given the MLE and the 2 parameter model, $P(\\mathcal{D}|t_1,t_2,f_1)$. In order to do this we must marginalize over all possible parameters of $f_1$, giving us\n",
"\n",
"$$P(\\mathcal{D}|t_1,t_2,f_1) = \\frac{P(\\mathcal{D}|t_{1mle},t_{2mle})}{\\int_{t_1} \\int_{t_2} P(\\mathcal{D}|t_{1},t_{2}) dt_1 dt_2}$$\n",
"\n",
"The numerator, we've already found. The denominator requires us to compute the double definite integral. Sympy is very good at that."
]
},
{
"cell_type": "code",
"execution_count": 161,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAADQAAAArBAMAAADF1raWAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAVO8Qq5l2zWYy3Yki\nRLuihtmPAAAACXBIWXMAAA7EAAAOxAGVKw4bAAABNElEQVQ4Ee2Qv0vDQBSAv7OmJKSpDv0DlO4i\noh3s0CBF6mQGtyI46eBgpXvon5BBdwVn6VgpSEY3nRyc+ge4tKk/QGh9dxGKQ8BRxMfl4+W+u8d7\nBz8LVc46t76XZCny/2r2NtbWW2P293ezaWb8lpmdq2a/T+V4Xzd0Dhu9my9gS/NjAuYXRV3DKZtx\nCqwYfNWh0IL8CvYSbssAXFmROyB3AXchRR+VGEgReMSbrLod6IYs+HhDA6MGsPveAzsOqQV4rwZa\n6fPutA0VQu67OB8GWtXkKz2NHuhqFWilodWRlPI5KatI1PeCMsxcjDOsIko6ULoNNZRLuXFas1Gq\n10fbxQF2YiDKeTG3eJbcjGzpkS15AZQor419IPkELlmLUkBhWfbOmvp5b6cNqoc7pJCNrPgE7XR+\nlfRREQgAAAAASUVORK5CYII=\n",
"text/latex": [
"$$\\frac{1}{78400}$$"
],
"text/plain": [
"1/78400"
]
},
"execution_count": 161,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"integrate(f_1, (t_1, 0, 1), (t_2, 0, 1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now to combine them. (Notice that the answer is greater than 1!)"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAJUAAAAOBAMAAADULUP8AAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAzXYQMplU74mrIma7\nRN0SDTw+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAACV0lEQVQ4EVWTP2gUQRjFf3eb3N3u/UXBRiQh\nkibVNqI23jVCKrMip0YEbQS7nAgRFMMhCZ4EdEWDYKHbhIiFud4iV4g2Ua6xztlIrJJ4IQQlWb/5\nZpU4xXsz8973bua7WVLHBy4iQ9mrr9ZQIPdgFk60XsKb1kc4t9BqNa3XKvNGNrB49UNSTTaOmyZL\n+TS5PRScY5zHucdS5AVMtVmJ4zhUjyoc4Uyk4IQ86SYpqUdjJgrlZ3AHhULAKm4DN0jvUO5xBVzr\nUSXfIFdRcH2yDauIww7lGXhdU1iLZLtcIbdZvE+1QhNmrVeVQkimr5Dt4W4nKf9lDfmSpfBJoqh2\nSG8Lyx2hOGwrVCmHFHcUUv0DWe9Ha+LETXjDl8WGvzU+Kl2Qc+2Bs24MhcSjSjWg+FNBlJLEaXXB\n98QuVsvFXZkXd52tNtNyQVK/8R5fMIZ567HKcpOBXQVR1pp/q+GLscowXAoVnNjnks8hbu7I+mFX\nDrcpE/FYZTkwWQZk74dVFK9FShium6nAPqx0yY+Mm35lb0O6YhTxqHLwjoNBovAU8z5IOG32DXyT\nrLbM3b4XUZDDlTqJRxVpe8b0PiPK6L/qW/Jbvqws30BKDbzQc8mzC8t9zaqGiUeVQod8X4FMwNuk\n2p4BlAcC0pHChPSL/F0momyH0i+YEIN6rNJg0LzVwQpnYcwqLJoKeYzK71pzr1Bwm8406cPOUfJt\npobljUmQelThOydrCt56a7JnFbzLC77595SH4ngfBeYmazBX78LX+mfpwdJ1rMcqp0aeg4GUfKg9\nq/wBiHf1OjXx4JYAAAAASUVORK5CYII=\n",
"text/latex": [
"$$5.5226937560025$$"
],
"text/plain": [
"5.52269375600250"
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f_1_mle.doit() / integrate(f_1, (t_1, 0, 1), (t_2, 0, 1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There's a similar drill for the next part. Do everything for the three parameter model (notice that in the equation the fourth 'parameter' is just a combination of the other three). Here, we're going to treat each combination of `x` and `y` as a separate parameter."
]
},
{
"cell_type": "code",
"execution_count": 164,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"xy = [x_i+y_i for x_i,y_i in zip(x,y)] # combine x and y into pairs\n",
"xy_00 = xy.count('00')\n",
"xy_01 = xy.count('01')\n",
"xy_10 = xy.count('10')\n",
"xy_11 = xy.count('11')\n",
"t_00, t_01, t_10 = symbols('t_00 t_01 t_10')"
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAARgAAAAbBAMAAABVQ42lAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAEN0iVJnNiUSru3Yy\nZu9l18v4AAAACXBIWXMAAA7EAAAOxAGVKw4bAAADe0lEQVRIDc1WPWhUQRCee/fCneTucijYKCQk\nooGgHoJNGmNI4w/5ETTE6lA5FCwO5NLYnDYiCAZFC1OIiIVF8JqQJuBZmEpIGmOjkMJCsEhCmijC\nOTO7+97My3vVJZCFfbs733zzvtmd9wOwX1rtXXG/SIFM6UCZxWSuT8SLSgTi3duxeqXcFvPvwcv4\nOIlAvHs7Vq+e3mZ+HyyXYgMlArHebRr9TQ7QD+fqsZESgVjvNo1ewwYYS6rkRKDNO8fQa8424CbR\nMQTSK1Fs19Z5KpL8uI2XcZNofAEMRrFdXD/HWI/gCUCqD+CrC3yQNNqmAICbzm4o4UrOJF/a+S7K\nECxGiwCHAXK3Rt4A+BuQrWabBhwLfCJAuIvogZSEJvnKZQflkoFTL16jmEwDCq0WRu0ow9LIRbRQ\nu2YGvioAvFIIIZLQJF+57KBYMQAf8dapNevsVWG91bKLEyKCAuCMRsRKTiVf2gGD6abEwG0DPv7w\nvuHcsq/+zro5SACN8wEQRUJA8UMzziLB0KLFXLXe8pjpZBcWsJrKCEoAYM6604BI5/0qd2HFqeIr\niIItGZqxazGr1lkeM55svurP5KZ6EJQAwDHrTgMi3+ACd2HFqeIrCCmfnxmasWsxw9YZjzlIE0/W\nq8Mm+D0I0vmHyfzCMrvRi+14k5B+6CpSR5/ADFQZIR+fWPLvvYs+HAzvaCnTlcqpSuUOAVzAsMxT\nSOG3Mkizuw7dTfjJYggQyaAY1xBJ/YPCeexDzmZGyVcIBYNhpg0ZQO+MFZPdAE6RNT+FFH6/T7IY\nApDvkhHHhMjRLfC+Y19RtwTJV4gNhr8sjqLFPDTembL/KUhzGi4vF2GAxSDQFMmIAkbkyyZ4i9iD\nB9EEk3wlxgbLhRQtZtV4d8wcIb0mzalUA3fmN4tBgHbWJRN8U7FMmVJY3IJCRIzkKzFhMEfRYiaN\nd3puhfSaNAd/UM1ssxgEWIxNxm4kkRDBmul6gL1ugrir5DsbjzYY0eoGcGKm1u/gfbjIGcDkwzTl\n02B3hpORnwNkiafJBHdXzXdWHsMCxKUTw0i6hwe6qDQ7x/0Z8NcsyA8AJ5MftyYzzON7hvqOpvkK\nHqX3uKMckhB+KIOm0qzViukrfywqknkbuNMkMznBXRl5ofgSPnt6NoFCvxBBS0wTRDJ7+XMlE01M\nUyaz17+duDX/AYnGBokxUxrJAAAAAElFTkSuQmCC\n",
"text/latex": [
"$$t_{00}^{2} t_{01} t_{10}^{2} \\left(- t_{00} - t_{01} - t_{10} + 1\\right)^{2}$$"
],
"text/plain": [
" 2 2 2\n",
"t₀₀ ⋅t₀₁⋅t₁₀ ⋅(-t₀₀ - t₀₁ - t₁₀ + 1) "
]
},
"execution_count": 165,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f_2 = (t_00**xy_00)*(t_01**xy_01)*(t_10**xy_10)*((1 - t_00 - t_01 - t_10)**xy_11)\n",
"f_2"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAcsAAAA5CAMAAABu8fDTAAAAOVBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACXHtMAAAAEnRSTlMAEN0i\nVJnNiUSru3YyZu/z+72BNYgjAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAJGUlEQVR4Ae1b29qqOAwt\nCiIizgzv/7CTY5uWBmT/e7byDVxIoVltSNIDKxLCefgWePpVZ83BLPA6mL6nur4FTl/6tjlGTd89\nuoZUPX15DI+5WvbXEIbboX2ZotF9yqJiP6Jo4DsvL+DL5ww/IRx0XJpofNPE+xFvNvxhscs9hOsM\nP4f1pYnGN225H/Fmw18g1s60YB50XJpofNOW+xFvNvwFYpeRlDioL1F3icYdttyP2NH450T7jvs+\nsC8lGnfYsI64Tjua+BbRiXY7pM0krjzq3gceQqPxfevWEdfH+y18keSgzrz3IdyJvjveuGxwFxNC\njMZoX6mI17HgIkjipkaJ8lRwW8vF7NV+yH5EaF9MDDT8Thmet7ZtH3TvL6vMIcrtjGqmaIxKc0W8\nTAUXgSLLmGCg21pqtyzth2wjWt7WSFfN0A28Z4V5STY8Mx5U/3ep0NdfdxiQJhqjwlQRr0zBRaAM\nj3IjLUW3taWo3tkP2UbkvoSeRvVls5hSFzdUsa89X2B9CK8UjVFRqohXpuAiQOZ6MYK26LZmhfLy\nfsg2wvdluBFBYFQ4mC/vw2N+DNm8ww+zv4Jxo24CjU1gBve6yaSyi/2QtxArvuxK1d/25diPOCD4\nUG5z6kfk6J/D8zrBzr5vGzhQ5Ckrcwi0SyzkVwF1ROj7kd4d3CVGKqKeUQUXgYo+nBeSorXYKmKc\no4AYGzgAeEPmGlG11smKL6doY2lffHnbyklP4Mh2EJBymxOgGnDWHWY8rBto5nuEZzfo1EWv56X8\nGoApgAUCJpQr6eouMVyhehoVXAQ+zaWVZypOeWvaaiGUX+YQo0AuZq4YoZLVTlZ82aqNtUX2ZaML\nqt5enJECCyql3OYDt/MQHff2SaNxuMNB+2Ptp+kRU8qvAEIdMUKoNDS1yhKjMaxn6ISmjaSnqiAV\nca5ICHzIFy86ze0SD4rrvLXUKmLCQpru5hAIfWPrNYRKFp30NzwuLzrpKDJ7H8iOULfph31pu011\nptTMOHBfEsTYK3LzHVjiCd3cZRXG6pbK2uBIUVLKrwBCHaE9h4ZTAhrDegb7UoXRU1UQhBP94kvz\nqFTMWzOtloLpOofAfVUgiRQl0Uwk652sjMuqL6/d5datT7KcIbuYxQUnz+YyP3ABvncjrZtQbHhR\nled4tnHEW3kfEBzE3EoPdwzGJmgM6xnneawweqopBaG2TQiQd+fYvDXTKoGqPzkERFSBqjTezDWr\nd7Liy7sZ99QHj8sbjqimn3qYEOWE2Wo9JhjPOIlaPpPK02vG+fMKY3FiKmygyVafo4fhwjcYK/I+\nINQR1xlChnTvYUqHGV2mCT2Dblxh9FRTcgVI0A2DgFvu3idvzbRKoOpPDgERVaAqjTdzzeqdrPjS\n2ftQXhoWOtzIyCnXgIPmlV4GiNvse6CUhFNiiSd7VJ4DCGD15UJekuE5IDiIZsZgw1m0G3Dd1BjW\nM1TaCtJTTckVIEE3DIKesM7RFq0xyDw9QYsfV4FCLl3WNCs7WfFll94ruE0al3f6xUkKHCOn1CWW\nOJx5rcJrYr4aHOQN7ARxMPLkr6bhMQCuF1/m8h6AtjdVBPU8wxR/feDLj8awnkEPrjB6qi+5AiTE\nl/kEs5ioQBCOvDXTKlfXfnMISKgCNWG6l2tW76TwZTe85tvAS93i3YO8CLvE5kpb/stTTtBZmmPB\nJ/kyw2woZxjGNrxwnqXJ9CaxQs/RdnDMHfRcyHuA4CJuqH4KJR0oek7mSnouTCm+xL23iX6Pw6Mm\ntTU9p37cUhJdKOBiWDIhjWDhS1OzZKz+wdpuhHVq4UuDg60wLFj4fnnFVw5lQynFAIMSl1kekS8O\nGZh6BU0uLuXXABwUJeIOc3eb3o01hvWcVFU9QVxV0Eqy2ALhceuE0tb0rE2tnJPoQgEXxZIJaQQb\nDL3qIdy6qaMHvg8jbGPX5lj0VU8UD45BZUObDsgg7KyHWmqUYivAxnhmom16zAOMW2FPo7wPgD1U\nFTFx96q5xrCe9X7UM6kQq5zop5fkKFQU5Kn16Yva6qVAKgpUxWF4qLW0M0cuv605L3PXBO/a3scg\nvqKoMaznd5Ryov+guehKCBpfNt2Eg0xO71jngzIaw3reVMWP/qP/RyQ+uvFlvFct5NRXJK6rTPgm\n1w5zFv9xviDnV9j5sZNX16p2502wgPhysb8tjZPIMqhRPgyKVe58i2sPJXWuAJedx81wJ++upWrn\ntViAfalv9L5Zik2G7rfrTPgW174g2xUQPHYet1YTbs7Ow7cA+1Jd48rxVj4y3PE9uM6Eb3Ht9LZa\nI+eDx86jYue4dN3DFehL2Bbs5dbF+Q4T/g7XTrNzhZx32fnmtU7/bzzo/6CaxyVx6+XTJt5nya2L\nL+tMuE+dG/mMbDfkPPF7oErZw3hLZHCp6XlNFmBfEreeEiRIWeZHSZaxpR0mnKCMyKlzI8880ZKc\nDz7inGNznyyuyJfMrQtXcO+R2imOkvoiX2Kiv8qEb3PtLjnPPobeyx6QUMfV9DxcC5AviVtXDg94\ndPZlmmMX3Dpb2mXCN7n2kmxPgFBl52mtvGKe5Dx8C9AfSSy3DqK11IGSZcStG+K6yp1vce0r5Hyo\ns/Oo0iSZUv9hzhpISSG3npJdNV8qu/ykIaRssMeEb3DtS7I9Aji5lvhm6eEJJH78GuZ02aYFYhK6\n6stN+CnwRRaIfw45fflFXvk1VSRB8hzn7tww/poJD4LK8iROmuNPfINwEHN9UM19eRIvzfEnvkH4\noJGO0fXOPAkuqMjrfOIbhGMY9INaCvnpa5DnSTABVktzeFmOUv4H3yD4Kp41aIFfyJNwErqS5viv\nv0E4PbZlAcmT0L8wEsWunyDAn1PNn4qlrQ99g7D1JGc95UmEVI+vmdYsZZ7ES3O4WY7f9Q2CVeos\nVyzAeRIhYiP9YwXLPEn+TcF2XiSX/8E3CFaps1yxAOdJ2JeJll3JkxTfFKQ0RzXL8fu/Qag8w3mL\nLUB5Eijiu0byZWadLE9SfiGwlRcp5X/8DUKm2nlhLcDfIKzNsfG//ZQnWXxTENMc9He9RZZjIf/j\nbxCs9me5ZgEi1at7n5r0ee97LSCk+jG+QfheM56avWeBfwFjumwR611jTwAAAABJRU5ErkJggg==\n",
"text/latex": [
"$$\\left. t_{00}^{2} t_{01} t_{10}^{2} \\left(- t_{00} - t_{01} - t_{10} + 1\\right)^{2} \\right|_{\\substack{ t_{00}=0.285714285714286\\\\ t_{01}=0.142857142857143\\\\ t_{10}=0.285714285714286 }}$$"
],
"text/plain": [
"⎛ 2 2 2⎞│ \n",
"⎝t₀₀ ⋅t₀₁⋅t₁₀ ⋅(-t₀₀ - t₀₁ - t₁₀ + 1) ⎠│t₀₀=0.285714285714286, t₀₁=0.142857142\n",
"\n",
" \n",
"857143, t₁₀=0.285714285714286"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xy_all = xy_00+xy_01+xy_10+xy_11\n",
"t_00_mle = xy_00/xy_all\n",
"t_01_mle = xy_01/xy_all\n",
"t_10_mle = xy_10/xy_all\n",
"f_2_mle = Subs(f_2,(t_00,t_01,t_10),(t_00_mle, t_01_mle, t_10_mle))\n",
"f_2_mle"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAANgAAAAVBAMAAADWeD20AAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAMt2rmYlmIkR2uxDN\nVO+L8+I6AAAACXBIWXMAAA7EAAAOxAGVKw4bAAADXElEQVRIDb2VT4gTdxTHv/kzmSSTP4vY8y5Y\nKQhiZKV60Q64S6ntIS07C11EA8VKaWlzcLsX0Rx6KZTuQoWy0D9TeilbisN2e21yKpQijAc9eMmK\noEulsna1KtmYft97E//gPQ/yefP+zHt5b36TACOS1P5PGyNqBeQHWyPrhfzlaITNWqPrhfwbF0Y3\nmtvKhdhxIAjqnFD10ADcYC4G1mbfTjAf/CJrSPkJ1oKDSO2aDgKcOfYD4C3PtAyS9US86VNjT4zU\nA3wyGAxqdKg2o3wbOIPyf8AlrDYVro9ug1np0ODdxLlmjrdueTWcbCELHDYwbiJVSrH3Cq6tUKJM\n5D3E60BGoqoVvx7tAd/H+Be5dTgdRaGP6gSzvgsNmXVk/GwTqGUiVNZxFThrYFxFq3wMHE/sStPt\noQ5wVTBtRoHNvmx52yjWUO4p8htod7jcj0JDtQNn0wGcscoEMncxB6wapJaKVPkR6EZmOpHj8yq/\nZKZqgaRB1litId9X0JY1OqXQ0A5RuEvn5yj1pNniz5g10GkiVfgeL8aJHUzLRTGxVAus2WodbR/5\nRwoegd2MnJZmggone0gHTT5I9hucahjEocIq3j02S0ZJvLPPajG02fzLEc7XkbqvgPvOm2y4xD4K\nZxMlHiCnIfcu1oGT21EC8YiwinsfGGfwqXibdq1aYZM5GzjvSzMBU0604IDNFPgTV/pAW2/dxZ0f\n2rdhoMdTrzTjZM83K3Q0BtUGfWa4E+kGFUyp3MZlaaZAbvklPrNv5NasD7yHwuNIQYccJZ3sxTWm\nQwk9fYVsjQvAxZhnoywHpNx3myj2vTqbKSQ/w6+0Ry4u8HOLI8QKGumauGU/PCDdWIyhtDXE7yNa\nIWmDiM2KIXI9RbXHZuXJyddufCtYYmqpBk/+Nco+FmT7mQ8FDXpMpMoKfyyioUP0uG+WaoWk/Q3s\na/J9zspLne1UQqQfMI9akbuF8SZSfCbYCXwtkxWbCnpMpApf6reGtuqub1sWDYWkvY/yI2Avfh9T\n5BroyjjV0FD4y/uDQ7GZuzs4OoEvInxgYNxEqqRj7/rQVn0uti2LhiB7ZPtV5KeOtYDflvcn+Gzq\nJ4adi1sNw8wUw+4/3CZ/ICeQOjDTMjBLRat47341NnSMWv8PkZY9KFD1B6AAAAAASUVORK5CYII=\n",
"text/latex": [
"$$7.77130034497288 \\cdot 10^{-5}$$"
],
"text/plain": [
"7.77130034497288e-5"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f_2_mle.doit()"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAACoAAAArBAMAAAD8sQfNAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAVO8Qq5l2zWYy3Yki\nRLuihtmPAAAACXBIWXMAAA7EAAAOxAGVKw4bAAABjElEQVQoFdXSO0sDQRDA8f/lEk2iuQS/gAcW\nNoIhKIIWOQIBxcIrYuUDGxVRMIKFVgYLW9PY2BjBziaFiJIiQbAT9BuYxsJOCfjAR5xbXeHQWnCL\nXeZ3s7u3zMDX2M1WKmVz4qimAaMLtpvNppMqtPdo7cs1YBKCZCCrlRbRMpzyCFWfQmzGfIflomYv\nFyzoltwrv56o7RsFn5p30DpjVBM+bclLuLVQLfo0WlfhWNKnaUeFqxrV/7LnwmYt8urXTtH55EBB\nayjzOAxnCWgbn9L4v1Yp4s/x1084n77VV1YWi/SfnEpoOqoFOqRcA1ehPCsM1qS5kgRsM+sV8Qar\nHrZpy0OgRPBZGilJzCuZ5WBIJ7U2tFq2aNwhdi8rROWj5MbXj6dIu8QelF6XlabXCCQuy0RelB7I\nLLnpBqGuS/dLQ+6nxkuE375PmPU2SG40T/hJbjO82wyXC6VBW3KtOmGvx3dgSWlEzrXlFfJA2g8r\nuZJSRkkl2CdVlFdIfUrkelcL0mUjMDQn0+/jAxK+hRxvELeTAAAAAElFTkSuQmCC\n",
"text/latex": [
"$$\\frac{179}{2160}$$"
],
"text/plain": [
"179 \n",
"────\n",
"2160"
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"integrate(f_2, (t_00, 0, 1), (t_01, 0, 1), (t_10, 0, 1))"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAMoAAAAPBAMAAABXbk2cAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAEJmJZjLNVN0i77ur\nRHZ72Yd1AAAACXBIWXMAAA7EAAAOxAGVKw4bAAADHElEQVQ4EbWUz2sUZxjHP7szm3Gzuzq0IJKD\nWROKSBsMbjwZmgU9FYmLJw+CK23xInaEnkrB4EEPFVwsFOol60VREWNP7cll/QWCZk+leLBppYIX\nY2pIjL/G7/u8hvEfcNh5npnv5/3O87zvO7Pwydh23GE5C7nh8RbF2o6Efd1abbRwarhWg9qjBkfi\nvQm5p9t69P1T2GxeBUc2PhrBewyb0YscYkPLDbWchYF68SJniBbZk6ZpJ1R4zfpevsnV9Dnk4Q79\naToq5w8xnlQpJ95j2IyYGLYJmhpqOQvchS7zCS/YAmXy6qTDb5Rm+Pq+rr+HX+gbfwK57qyqOBLV\nKTa9x7AZvVjqEC2oiuUssAQTPOjllhiF7QQQNCorGqhiOnbABhW3Yz7GSDBDbsp7DHujies6VJY1\n2HIWcq9gvoFbMajMuac9pNR2uePC39cY+6CKkcqrOKiLyWPYjF4crFJ5K2Q5C5zXXHpqWA1R0gkz\nrPv2S23vzcnPtIbpzjrlXf+qETQXT2YXx9xAeQx7o4nToxS0m1jOgpaL7+r8fDwWM6/aHPyG/oRr\nTKv8xFJMKS6+EVYVT4L0oG7N47A3mjhdfV/FZbvxSt9cNJFoM65of5/JwKB+C+TP6bI0RXTvtAj8\nrtNVMfLjXy9V33k8NqOJtkhClrMAk/9NNKRfilnbVOax9m6K0DWfX2E3a1/Gutza8ivmSNhh3jUh\nj8fO6EVtePR+96Nlu1lVmI1/hesJa2ac9SdY0yRcLDcJ3nARDiS3/d5pLkb6WxSWMY9hM3pRH0Bo\nb7LLdrOqcJg0dlUGOyqSe60tb6vj/ib5BbcW5brekK2akKoY0Zpy0zxHDZvRi2GbfFPYchb4qlVY\n4TKcbnGgKl74X6dWv63PWp2r2VJL+gkhVTGitvnDewyb0YucZKChb8/yB+FxvL7OLqK3cMxViVSF\nGwwkxao64M+YSf2ZhHqcq2KkcpBwznsMm9GLfPr0FpzCZ7uxEAyN6Ksa+rwHZxM9qnhBIRjS2DPj\nmzStrv4ti8NdFfji6uG6J/uHVz2GvdFEeT/+8Q7DNU0mVYH1XAAAAABJRU5ErkJggg==\n",
"text/latex": [
"$$0.000937765851683878$$"
],
"text/plain": [
"0.000937765851683878"
]
},
"execution_count": 108,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f_2_mle.doit() / integrate(f_2, (t_00, 0, 1), (t_01, 0, 1), (t_10, 0, 1))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"For the next step, we have to compute the leave-one-out cross validation. This would be a pain to do by hand but we can define a function for the MLE for each model and then just have the computer compute the value for each thing to be left out."
]
},
{
"cell_type": "code",
"execution_count": 166,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def mle_2_param(x,y):\n",
" x_1 = x.count('1')\n",
" x_0 = x.count('0')\n",
" y_1 = int(sum([x_i == y_i for x_i,y_i in zip(x,y)])) #bug: these must either all be ints or all be floats\n",
" y_0 = int(sum([x_i != y_i for x_i,y_i in zip(x,y)]))\n",
" f_1 = (t_1**x_1)*((1-t_1)**x_0)*(t_2**y_1)*((1-t_2)**y_0)\n",
" t_1_mle = x_1/(x_1+x_0)\n",
" t_2_mle = y_1/(y_1+y_0)\n",
" return Subs(f_1,(t_1, t_2), (t_1_mle, t_2_mle)).doit()\n",
"# log(mle_2_param(\"0 1 1\".split(),\"0 0 1\".split())) # test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The log likelihood of the leave-one-out cross validation of the 2 parameter model is:"
]
},
{
"cell_type": "code",
"execution_count": 167,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAK8AAAAPBAMAAABgoIKoAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAEM3dMpm7du8iZolU\nq0RaI+fpAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAC20lEQVQ4EbWTy2sTURTGv2keTWbyGFvEgkim\nUBBUpFRwpRgQVyKNoqDU6hRRdGUVSpUKpu4KVYNQcNfgpj4Qs2gtNJRm4UoQg1gs1dqgC5c1EbX2\nNX733GnwH/BAfjn3fud8c+feuUA9ghmmAjRcC54FrMlxHygOpLFt4Byw0z5S0h2HZ4vFArZOsUbJ\nwYNdxaJWjP6Jis40Ay7/BYh4XgHGXpzQQFMl5CCDeAn3vVVWxQaBF57nlbEHzTmRTQ7XtRICPrGo\nHjdcpgI0TN8Ewhlc0sBFhN1YHlEHuz/kgB3dVeAMEIeZheWIHOJ8WSuvgQd1VyD6iMYC1cFYYqlG\n4g8zy4XRzl4VjTQuAOMIlxGrapkVaa28BJqlTMNqcKkp+MaXVSYIZ5klftlW/h9jzvQiWUZiTWRW\nfONPPXKpB1NMN+OJ8hTQ+GlfGrWjfRWNZNskz21xheXzYyfZoNq5V0Aqg8RvLQPsFyXuPcsz9cPo\npbGAE2E7umHU8hgWILULkRIsrxXowWhl05jPGS0guKpl8H18ZaZmM/XDAo0FeuKC4dk4vUXBTlUR\nGsSbr8sVauF2v934TuOMMhYZKdWo3iX2+dA9oGmfija8UsYCpQPncz/4SVUEyXaYG2YZtwcphHiS\nshWNzuZWKBm4rtqU8hyNy/UlGwUaC5T8DpipdNA4Lwg4MFciOQTX4g4seohxwIU6vNiayMB+1amU\nIaCzpAYqYgsLi3dvKfRydIcrtufUigXxLJek3nQ+4iDEVjFOlbkvLsyqyDB4PURRWxTPq4EfEZeJ\ngJcMHejk9moEuYlZrhhXeKm4Pm3cySozi5AjMoI/lY+/4jCL65F0mRIpByMwhxAvGMMa+IiWUqIV\nZm80g8fskRW/pTEOoCUtMmJ14y82xuquvBtX1/OCQBnRrlkbmOhO+7BOvQeOdfFjHpk+zvObq80D\nD0vs3t5PReToMIeiBGcnKsz/U/wFv8kEDR/w3AEAAAAASUVORK5CYII=\n",
"text/latex": [
"$$-44.5666851706718$$"
],
"text/plain": [
"-44.5666851706718"
]
},
"execution_count": 167,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"log_cv_2 = [log(mle_2_param(x[0:i]+x[i+1:-1],y[0:i]+y[i+1:-1])) \n",
" for i in range(len(x))]\n",
"sum(log_cv_2)"
]
},
{
"cell_type": "code",
"execution_count": 168,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAK8AAAAPBAMAAABgoIKoAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAEM3dMiKJu1SZZnZE\n76v5rQUQAAAACXBIWXMAAA7EAAAOxAGVKw4bAAADBElEQVQ4EbWUy2sTQRjAf9vmsdk8urZQEKmJ\nICgiVhR8nLpeBC+6KAp6MaC0VLGtiKigmKsebPRQDfjYkwdFraKCeHDRkwexVYoeFOoDsfVRW2kb\nakv8Ziapf4Fz+H3fzOz+dvLNTKDaou0dBZVGS8c8rL4un7q7kUtYjztD2NV+XKbaH3osenhZstjB\n/VUsPCCZBmyRt5WlppIHpe3GnlFxMbFZWoLoEOlKJU8CvhIt0hvQGCZyFMkUsFZywYDlNPkGsL5q\nqamUDp4U+K3iB3jPDxihrusBnIXtJMs0rOYE8R47IJojXuSYgTNALKcByTdVS02ldPAitCZUfA1j\nnmTdZFS3A5pIvSObS01KN9aDtZqXvqQa8WHscQ3Ys6xqqankId1MKVpdxjZNy7c9LX55mANqtjeI\nD0hITbuxgFNqRKNhmFRZA/IixhTUqFRXtaa8DtDmvpIVh5kbjzwyleuBDFvraFh6UPZtbEa+M7H5\nUWiQLZL6o4Hja7G2zKuUcddGVwVZ1ZTUgdYg7kZn5QMTMhzdd57sCtIFYpUlWBMBbzXYmycypcEi\nlFhb5lXaR+ydifXD1PXb3QXpHcH+3qaHD4XZcRKDnHs2F1oVlysLFNy9RSVWIK/FxqJUjWtVW6qc\n782SS5LevN/tSbjm3yY5p4bTg3IunFlnmCeD6vjcCTX+lcL2jNhYqip5kZ1wuqCSZFGRMfejKvQQ\nbCtEfeLl+hzOTNonUmaNiAMN2TdbbZ5dbkaJtWVepUXyy4z4DOKBVcgOXlvwCzJBw7iIMwOy4qzM\njKpjfifUiPfgjGuc+/lz7hvaMq/SYjndbUoYKZL0b/mRSblksjRZcdxP91A/GZEaD8iKOco2Ka+B\n3I2EuiCJnLwqCm2pqbSXG9h/5Bqwo73zE/fcxkAuuTPEU5ebOAG9/XyhpZBagtNPJm+9NWADLZ4B\nTGMsWmWswlTpYogciNZK5Tex0mU5Y30jLpER9Sf0vHRSbl3pM2ztkxk6r3pVND+WQQ06KqPGolXy\n1P9pfwEuTCGTAs3q1wAAAABJRU5ErkJggg==\n",
"text/latex": [
"$$-3.29583686600433$$"
],
"text/plain": [
"-3.29583686600433"
]
},
"execution_count": 168,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def mle_3_param(x,y):\n",
" xy = [x_i+y_i for x_i,y_i in zip(x,y)]\n",
" xy_00 = xy.count('00')\n",
" xy_01 = xy.count('01')\n",
" xy_10 = xy.count('10')\n",
" xy_11 = xy.count('11')\n",
" xy_all = xy_00+xy_01+xy_10+xy_11\n",
" f_2 = (t_00**xy_00)*(t_01**xy_01)*(t_10**xy_10)*((1 - t_00 - t_01 - t_10)**xy_11)\n",
" t_00_mle = xy_00/xy_all\n",
" t_01_mle = xy_01/xy_all\n",
" t_10_mle = xy_10/xy_all\n",
" return Subs(f_2,(t_00,t_01,t_10),(t_00_mle, t_01_mle, t_10_mle)).doit()\n",
"log(mle_3_param(\"0 1 1\".split(),\"0 0 1\".split()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The log likelihood of the leave-one-out cross validation of the 3 parameter model is:"
]
},
{
"cell_type": "code",
"execution_count": 169,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAKUAAAAPBAMAAAB3ghJhAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAEM3dMiKJu1SZZnZE\n76v5rQUQAAAACXBIWXMAAA7EAAAOxAGVKw4bAAACtUlEQVQ4EbVUTWsTURQ9k68mTTMO7cqFJoK4\nsNAGBMVNGxCErjooCl0Us5ApddG0ILjsuBEUxAjaEpCSlQuLWooupIsGtyItSlGkpRERjB80/fKj\n1o7n3pf4D7wwp/eed87Je2+SAo2KekM+rLnhiszR0pUclMF+7yMwMHkZkZ4JzzOKUzXPyysYc6fT\nR+/EiNvwGJY2xHeQAD4JcRCxXcNY7zDoWlUUKq1BEOwZxWO2VQXgpgOsBt+BA+XokvE0IzHvYxM3\ngDPCvAdWDGMvwi7aDlKLCReoGsVdwIaCVVtj5pEvXPsG1IznX+bLirWBIaBDmNfAWk6ZfRnE6qks\n7K0YEMsZRR64CAVgnplVWrABFKAemRrFsy+M4ZJM3Q4zIbeRHkVyK7TOTNIv0FS0zXBUaGZaP5hv\nPFxqVkcednC/3Bh7+flkUtznLqnwOmGUZzaKFlEpSObn6TvAK+6zoh5ZM3X+JFcLGwSpNt66MLE6\nQjucF/I8epmNUehpFCRzDBcqPDi6y+qhqlmxZcS/9i6bMVzlXzLoxLWfbHv4pPkYhVVnq6D3yS1n\nEZqJF3zjaT8mdZgarDiPkPxjNloSggxa507zKhNFTrN8jCKZYavQyExsA9NPCjnS9DTrHDDuLwH9\nvjBJZijD3uZVPhPyBB+jCI+yVZBMu3HnWHOaHpEDgYPxq3UmlGW6jqgrjM8+VEW8yE+w9sx5qUhX\nyStIJt9jQl4hujTFl1brDdDrchctLsdIEUlXmdYl9LvoA54i8osrRtHPY0BBMvnrC2fw0I1sQz2a\nJ/AA8d947mAa6QzOesMfDJM8ZB1F9Lg3lUVcMlWBW5KpIJnRIgMx67SXjUfitNpKkxVEavwfwlfe\nHQSbUAbDpQpC/H1nEX1LoSow6LNVmFrtKmNg5DZfeOkev4KS8j/qLzd8/lvMZUBlAAAAAElFTkSu\nQmCC\n",
"text/latex": [
"$$-38.239278771593$$"
],
"text/plain": [
"-38.2392787715930"
]
},
"execution_count": 169,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"log_cv_3 = [log(mle_3_param(x[0:i]+x[i+1:-1],y[0:i]+y[i+1:-1])) \n",
" for i in range(len(x))]\n",
"sum(log_cv_3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"The the model with the greatest BIC score is chosen. Does BIC score favor the 2-parameter model?"
]
},
{
"cell_type": "code",
"execution_count": 170,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAC0AAAAPBAMAAACCUFuUAAAAMFBMVEX///8AAAAAAAAAAAAAAAAA\nAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv3aB7AAAAD3RSTlMAMqvdzRC772ZUdiKJ\nmUS6hfrNAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA90lEQVQYGWMQsv+kpPI/gAEIhHxAJAzIJzAw\niAuAeKxFMDEQLa/AwMAxASySBCahBEicdQEDCKCLb2U4gFW8m4Gha80CkHrGlV0ODHt7tIHKgOZk\nM3BaM1wCiYcxMBjwmTI8B7pPvkj9NwNnIcP7AKA4UGAB+wEGdqBmsHoGxjX3NwDFmb+tZpC/0dHU\nABbvZmB12yAvABRnPfL/Qj/E0SB3MrAZMMiLBiQxbGXg+CcPNAQIwOL8BxjmywkkMcxiYNBgdmBg\nXQAVZ09gOC8RkMYwPYAhm/EHA28Aw177b7cZGLh110R5Cf3XftPV84Ah6NZRoDnYAQAwwD6NhCuI\njAAAAABJRU5ErkJggg==\n",
"text/latex": [
"$$\\mathrm{False}$$"
],
"text/plain": [
"False"
]
},
"execution_count": 170,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bic_2 = log(f_1_mle.doit()) - log(len(x))\n",
"bic_3 = log(f_2_mle.doit()) - (3/2)*log(len(x))\n",
"bic_2 < bic_3\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment