Last active
October 19, 2020 15:49
{ | |
"metadata": { | |
"name": "", | |
"signature": "sha256:a96922a4a2453eba6073c16430fff29b217a257985a8e9e533c3647bd0d3d815" | |
}, | |
"nbformat": 3, | |
"nbformat_minor": 0, | |
"worksheets": [ | |
{ | |
"cells": [ | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": [ | |
"Singular Value Decomposition and Applications" | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 3, | |
"metadata": {}, | |
"source": [ | |
"Frank Cleary | <a href=\"http://www.frankcleary.com\">www.frankcleary.com</a> | See also: <a href=\"http://www.frankcleary.com/svdimage\">SVD Image Compression</a> | <a href=\"https://gist.github.com/frankcleary/a89da479d85c98f86e31\">Notebook Gist</a>" | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": [ | |
"Introduction" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The singular value decomposition of a matrix has many applications. Here I'll focus on an introduction to singular value decomposition and an application in clustering articles by topic. In another notebook (<a href=\"http://nbviewer.ipython.org/gist/frankcleary/4d2bd178708503b556b0\">link</a>) I show how singular value decomposition can be used in image compression.\n", | |
"\n", | |
"Any matrix $A$ can be decomposed to three matrices $U$, $\\Sigma$, and $V$ such that $A = U \\Sigma V$, this is called singular value decomposition. The columns of $U$ and $V$ are orthonormal and $\\Sigma$ is diagonal. Most scientific computing packages have a function to compute the singular value decomposition, I won't go into the details of how to find $U$, $\\Sigma$ and $V$ here. Some sources write the decomposition as $A = U \\Sigma V^T$, so that their $V^T$ is our $V$. The usage in this notebook is consistent with how numpy's singular value decomposition function returns $V$." | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Example with a small matrix $A$:" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"If $A = \\begin{bmatrix} 1 & 0 \\\\ 1 & 2 \\end{bmatrix}$\n", | |
" \n", | |
"$A$ can be written as $U \\Sigma V$ where $U$, $\\Sigma$, and $V$ are, rounded to 2 decimal places:\n", | |
"\n", | |
"$U = \\begin{bmatrix} -0.23 & -0.97 \\\\ -0.97 & 0.23 \\end{bmatrix}$\n", | |
" \n", | |
"$S = \\begin{bmatrix} 2.29 & 0 \\\\ 0 & 0.87 \\end{bmatrix}$\n", | |
" \n", | |
"$V = \\begin{bmatrix} -0.53 & -0.85 \\\\ -0.85 & 0.53 \\end{bmatrix}$" | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": [ | |
"Interpretation" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Although the singular value decomposition has interesting properties from a linear algebra standpoint, I'm going to focus here on some of its applications and skip the derivation and geometric interpretations.\n", | |
"\n", | |
"Let $A$ be a $m \\times n$ matrix with column vectors $\\vec{a}_1, \\vec{a}_2, ..., \\vec{a}_n$. In the singular value decomposition of $A$, $U$ will be $m \\times m$, $\\Sigma$ will be $m \\times n$ and $V$ will be $n \\times n$. We denote the column vectors of $U$ as $\\vec{u}_1, \\vec{u}_2, ..., \\vec{u}_m$ and $V$ as $\\vec{v}_1, \\vec{v}_2, ..., \\vec{v}_n$, similarly to $A$. We'll call the values along the diagonal of $\\Sigma$ as $\\sigma_1, \\sigma_2, ...$.\n", | |
"\n", | |
"We have that $A = U \\Sigma V$ where:\n", | |
"\n", | |
"$U = \\begin{bmatrix} \\\\ \\\\ \\\\ \\vec{u}_1 & \\vec{u}_2 & \\dots & \\vec{u}_m \\\\ \\\\ \\\\ \\end{bmatrix}$\n", | |
"\n", | |
"$\\Sigma = \\begin{bmatrix} \\sigma_1 & 0 & \\dots \\\\ 0 & \\sigma_2 & \\dots \\\\ \\vdots & \\vdots & \\ddots \\end{bmatrix}$\n", | |
"\n", | |
"$V = \\begin{bmatrix} \\\\ \\\\ \\\\ \\vec{v}_1 & \\vec{v}_2 & \\dots & \\vec{v}_n \\\\ \\\\ \\\\ \\end{bmatrix}$\n", | |
"\n", | |
"Because $\\Sigma$ is diagonal, the columns of $A$ can be written as:\n", | |
"\n", | |
"$\\vec{a}_i = \\vec{u}_1 * \\sigma_1 * V_{1,i} + \n", | |
" \\vec{u}_2 * \\sigma_2 * V_{2,i} + ... = U * \\Sigma * \\vec{v}_i$\n", | |
" \n", | |
"This is equivalent to creating a vector $\\vec{w}_i$, where the elements of $\\vec{w}_i$ are the elements of $\\vec{v}_i$, weighted by the $\\sigma$'s:\n", | |
"\n", | |
"$\\vec{w}_i = \\begin{bmatrix} \\sigma_1V_{1,i} \\\\ \\sigma_2V_{2,i} \\\\\n", | |
" \\sigma_3V_{3,i} \\\\ \\vdots \\end{bmatrix} = \\Sigma * \\vec{v}_i$\n", | |
" \n", | |
"Then $\\vec{a}_i = U * \\vec{w}_i$. That is to say that every column $\\vec{a}_i$ of $A$ is expressed by a sum over all the columns of $U$, weighted by the values in the $i^{th}$ column of $V$, and the $\\sigma$'s. By convention the order of the columns in $U$ and rows in $V$ is chosen such that the values in \n", | |
"$\\Sigma = \\begin{bmatrix} \\sigma_1 & 0 & \\dots \\\\ 0 & \\sigma_2 & \\dots \\\\ \\vdots & \\vdots & \\ddots \\end{bmatrix}$ obey $\\sigma_1 > \\sigma_2 > \\sigma_3 > ...$. This means that as a whole, the first column of $U$ and the first row of $V$ contribute more to the final values of $A$ than subsequent columns. This has applications in image compression (<a href=\"http://nbviewer.ipython.org/gist/frankcleary/4d2bd178708503b556b0\">link to another notebook</a>) and reducing the dimensionality of data by selecting the most import components." | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Brief discussion of dimensionality" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"This section isn't required to understand how singular value decomposition is useful, but I've included it for completeness.\n", | |
"\n", | |
"If $A$ is $m \\times n$ ($m$ rows and $n$ columns), $U$ will be $m \\times m$, $\\Sigma$ will be $m \\times n$ and $V$ will be $n \\times n$. However, there are only $r = rank(A)$ non-zero values in $\\Sigma$, i.e. $\\sigma_1, ..., \\sigma_r \\neq 0$; $\\sigma_{r+1}, ..., \\sigma_n = 0$. Therefore columns of $U$ beyond the $r^{th}$ column and rows of $V$ beyond the $r^{th}$ row do not contribute to $A$ and are usually omitted, leaving $U$ an $m \\times r$ matrix, $\\Sigma$ an $r \\times r$ diagonal matrix and $V$ an $r \\times n$ matrix.\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Example with data:" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Singular value decomposition can be used to classify similar objects (for example, news articles on a particular topic). Note above that similar $\\vec{a_i}$'s will have similar $\\vec{v_i}$'s.\n", | |
"\n", | |
"Imagine four blog posts, two about skiing and two about hockey. I've made up some data about five different words and the number of times they appear in each post:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import pandas as pd\n", | |
"\n", | |
"c_names = ['post1', 'post2', 'post3', 'post4']\n", | |
"words = ['ice', 'snow', 'tahoe', 'goal', 'puck']\n", | |
"post_words = pd.DataFrame([[4, 4, 6, 2],\n", | |
" [6, 1, 0, 5],\n", | |
" [3, 0, 0, 5],\n", | |
" [0, 6, 5, 1],\n", | |
" [0, 4, 5, 0]],\n", | |
" index = words,\n", | |
" columns = c_names)\n", | |
"post_words.index.names = ['word:']\n", | |
"post_words" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>post1</th>\n", | |
" <th>post2</th>\n", | |
" <th>post3</th>\n", | |
" <th>post4</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>word:</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>ice</th>\n", | |
" <td> 4</td>\n", | |
" <td> 4</td>\n", | |
" <td> 6</td>\n", | |
" <td> 2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>snow</th>\n", | |
" <td> 6</td>\n", | |
" <td> 1</td>\n", | |
" <td> 0</td>\n", | |
" <td> 5</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>tahoe</th>\n", | |
" <td> 3</td>\n", | |
" <td> 0</td>\n", | |
" <td> 0</td>\n", | |
" <td> 5</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>goal</th>\n", | |
" <td> 0</td>\n", | |
" <td> 6</td>\n", | |
" <td> 5</td>\n", | |
" <td> 1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>puck</th>\n", | |
" <td> 0</td>\n", | |
" <td> 4</td>\n", | |
" <td> 5</td>\n", | |
" <td> 0</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 1, | |
"text": [ | |
" post1 post2 post3 post4\n", | |
"word: \n", | |
"ice 4 4 6 2\n", | |
"snow 6 1 0 5\n", | |
"tahoe 3 0 0 5\n", | |
"goal 0 6 5 1\n", | |
"puck 0 4 5 0" | |
] | |
} | |
], | |
"prompt_number": 1 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"It looks like posts 1 and 4 pertain to skiing, and while posts 2 and 3 are about hockey." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Imagine the DataFrame <code>post_words</code> as the matrix $A$, where the entries represent the number of times a given word appears in the post. The singular value decomposition of $A$ can be calculated using numpy." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import numpy as np\n", | |
"\n", | |
"U, sigma, V = np.linalg.svd(post_words)\n", | |
"print \"V = \"\n", | |
"print np.round(V, decimals=2)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"V = \n", | |
"[[-0.4 -0.57 -0.63 -0.35]\n", | |
" [-0.6 0.33 0.41 -0.6 ]\n", | |
" [ 0.6 -0.41 0.32 -0.61]\n", | |
" [-0.34 -0.63 0.58 0.39]]\n" | |
] | |
} | |
], | |
"prompt_number": 2 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Recall that $\\vec{a}_i = U * \\Sigma * \\vec{v}_i$, that is each column $\\vec{v}_i$ of $V$ defines the entries in that column, $\\vec{a}_i$, of our data matrix, $A$. Let's label V with the identities of the posts using a DataFrame:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"V_df = pd.DataFrame(V, columns=c_names)\n", | |
"V_df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>post1</th>\n", | |
" <th>post2</th>\n", | |
" <th>post3</th>\n", | |
" <th>post4</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>-0.395634</td>\n", | |
" <td>-0.570869</td>\n", | |
" <td>-0.630100</td>\n", | |
" <td>-0.347212</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>-0.599836</td>\n", | |
" <td> 0.331743</td>\n", | |
" <td> 0.408279</td>\n", | |
" <td>-0.602870</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> 0.604001</td>\n", | |
" <td>-0.405353</td>\n", | |
" <td> 0.321932</td>\n", | |
" <td>-0.605996</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>-0.344752</td>\n", | |
" <td>-0.632253</td>\n", | |
" <td> 0.576751</td>\n", | |
" <td> 0.385695</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 3, | |
"text": [ | |
" post1 post2 post3 post4\n", | |
"0 -0.395634 -0.570869 -0.630100 -0.347212\n", | |
"1 -0.599836 0.331743 0.408279 -0.602870\n", | |
"2 0.604001 -0.405353 0.321932 -0.605996\n", | |
"3 -0.344752 -0.632253 0.576751 0.385695" | |
] | |
} | |
], | |
"prompt_number": 3 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note how post1 and post4 agree closely in value in the first two rows of $V$, as do post2 and post3. This indicates that posts 1 and 4 contain similar words (in this case words relating to skiing). However, the agreement is less close in the last two rows, even among related posts. This is because the weights of the last two rows, $\\sigma_3$ and $\\sigma_4$, are small compared to $\\sigma_1$ and $\\sigma_2$. Let's look at the values for the $\\sigma$'s." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"sigma" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 4, | |
"text": [ | |
"array([ 13.3221948 , 9.2609512 , 2.41918664, 1.37892883])" | |
] | |
} | |
], | |
"prompt_number": 4 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"$\\sigma_1$ and $\\sigma_2$ are about an order of magnitude greater than $\\sigma_3$ and $\\sigma_4$, indicating that the values in the first two rows of $V$ are much more important than the values in the last two. In fact we could closely reproduce $A$ using just the first two rows of $V$ and first two columns of $U$, with an error of at most 1 word:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"A_approx = np.matrix(U[:, :2]) * np.diag(sigma[:2]) * np.matrix(V[:2, :])\n", | |
"\n", | |
"print \"A calculated using only the first two components:\\n\"\n", | |
"print pd.DataFrame(A_approx, index=words, columns=c_names)\n", | |
"print \"\\nError from actual value:\\n\"\n", | |
"print post_words - A_approx" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"A calculated using only the first two components:\n", | |
"\n", | |
" post1 post2 post3 post4\n", | |
"ice 3.197084 4.818556 5.325736 2.792675\n", | |
"snow 5.619793 0.588201 0.384675 5.412204\n", | |
"tahoe 4.043943 0.071665 -0.123639 3.917015\n", | |
"goal 0.682117 5.089628 5.762122 0.336491\n", | |
"puck 0.129398 4.219523 4.799185 -0.143946\n", | |
"\n", | |
"Error from actual value:\n", | |
"\n", | |
" post1 post2 post3 post4\n", | |
"word: \n", | |
"ice 0.802916 -0.818556 0.674264 -0.792675\n", | |
"snow 0.380207 0.411799 -0.384675 -0.412204\n", | |
"tahoe -1.043943 -0.071665 0.123639 1.082985\n", | |
"goal -0.682117 0.910372 -0.762122 0.663509\n", | |
"puck -0.129398 -0.219523 0.200815 0.143946\n" | |
] | |
} | |
], | |
"prompt_number": 5 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To help visualize the similarity between posts, $V$ can be displayed as an image. Notice how the similar posts (1 and 4, 2 and 3) have similar color values in the first two rows:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"%matplotlib inline\n", | |
"import matplotlib.pyplot as plt\n", | |
"\n", | |
"plt.imshow(V, interpolation='none')\n", | |
"plt.xticks(xrange(len(c_names)))\n", | |
"plt.yticks(xrange(len(words)))\n", | |
"plt.ylim([len(words) - 1.5, -.5])\n", | |
"ax = plt.gca()\n", | |
"ax.set_xticklabels(c_names)\n", | |
"ax.set_yticklabels(xrange(1, len(words) + 1))\n", | |
"plt.title(\"$V$\")\n", | |
"plt.colorbar();" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "display_data", | |
"png": "iVBORw0KGgoAAAANSUhEUgAAATMAAAEKCAYAAAB+LbI7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFWZJREFUeJzt3X2QnVV9wPHvZhPASBEpGPLaDQhBaxWiBgVtVgw2BUmg\nTotYJfWPDs4Y6lDrAGI1sW+A9aUWy6QQaBCUGVExDAYSlY04IBAMgarJJiUpCSERiVJU8rKb7R/n\n2ezN3XvvPs+9z9179zzfz8yZfZ57z3mec2Y3v5xznpcDkiRJkiRJkiRJkiRJkiRJkhSRacBtwC7g\nQyWfnwL8FLgeeHUL6iVJmc0G/qfss2nAX7SgLpJUt+OA/cC4ks+uaFFdJKkhvwK6ku33ATNaVxUp\nH+NbXQG1xNPATEIPbQLwTGurIzXOYFZMTwMnA2cAX2hxXSSpbp8DHiX0zqQo2DMrpl5ga5IkSZIk\nxWI+sBHYDFxZJU83sB74b6AnY1lJarpOYAvhVp8JwBPA68ryHEt4wmRasn98hrKpjRs5iyRVNYcQ\nkLYBB4A7gYVleT4AfBPYkez/MkPZ1AxmkhoxFdhesr8j+azUKYQnTx4A1jH0XHCasqk1fDVzxqlz\nB57pXdvoYSTVZy1hPqouR8HA3mxFXgKOKdkfSFFmAuGZ4HcDE4GHgR+nLJtaw8Hsmd610J1rnWrb\nugRmLhm98/WsGr1zAXA78MFRPN+jo3guCP85v2v0Trf4M6N3LoBHlsCZS0blVBe8Ae75SMfcRo6x\nF/jHDPk/Bb9X9tGzwPSS/ekMDScHbScMLV9O0g+BNyX5RiqbmsNMqeAmZEgVrCMMI7uAI4CLgZVl\neb4DvIMw4T8ROBP4WcqyqXnTrFRwDQaBPmAxcD8hWC0Hfg5clny/jHDrxX3Ak8BB4CZCMKNK2bqM\nvWB2bHera9Bkb2x1BZqsq9UVaK6p3a2uQWavaPwQq5JUalnZ/r8mKU3Zuoy9YPbq7lbXoMliD2aR\nPw46rbvVNcisyvBxzBl7wUxSrmIJArG0Q1Kd7JlJikIsQSCWdkiqkz0zSVEwmEmKQg63ZrQFg5lU\ncLEEgVjaIalODjMlRSGWIBBLOyTVyZ6ZpCjEEgRiaYekOtkzkxQFb82QFAV7ZpKiEEsQiKUdkuo0\nIUsU6GtaNRpmMJMKbnwkwSzNgia3ALuBp5pcF0ktMKEzfapiPuE9/5uBK2uc6q2EcPi+ks+2EdYG\nWE+DS4Wlicm3Av8O3NbIiSS1p0w9s+E6gRuAeYRl5x4jrLBUvjBJJ3AdYWGTUgOEdT/3NFQL0gWz\nB4l+FQqpuCYc2VDxOcAWQg8L4E5gIcOD2eXAXYTeWbmOhmqQcN1MqejGZ0jDTSUs8jtoR/JZeZ6F\nwI3Jfumq4QPA9whraP51/Y3wAoCkGlGgZ29INQzU/Db4EnBVkreDw3tiZwPPAScAawhzbw+mOOYw\n+QSzrUuGto/tLsBycFKL7OiBZ3sA2LQ1p2PWiALdR4c0aOn/DcvyLDC9ZH86oXdW6s2E4SfA8cCf\nAgcIc2vPJZ8/D3ybMGxtYTCbuSSXw0gawbTuQ2tzznoD9N67tPFjVr9KmcY64BTCvPpO4GLgkrI8\nJ5Vs3wrcQwhkE5OzvwS8EngPUHeD0gSzrwNzgd8njI0/nVRIUgwa69L0AYuB+wmBaTlh8v+y5Pvy\nlc1LnQh8q6QWdwCr661ImmaUR1lJMWnsaibAqiSVqhbEPlyy/TRwesNnT3gBQCq6SKJAJM2QVLdI\nokAkzZBUt8YuALQNg5lUdJFEgUiaIalukUSBSJohqW6RRIFImiGpbo3fmtEWDGZS0UUSBSJphqS6\neTVTUhQiiQKRNENS3SKJApE0Q1LdHGZKikIkUSCSZkiq21GtrkA+DGZS0TnMlBSFSKJAJM2QVLdI\nokAkzZBUt0iGma6bKRVdY+tmAswnLBG3GbiywvcLgQ
3AeuBx4JwMZTM1Q1KRNRYFOoEbgHmEZece\nI6y8VLqi+feA7yTbf0RYUu61KcumZs9MKrojM6Th5gBbgG2EtTDvJPTESv22ZPto4JcZyqZmMJOK\nrrFh5lTCEpSDdiSflbuQ0ONaBfxNxrKpOMyUiq5GFOjZDD1bapYeSHmWu5P0TuCrwGkpy6WWSzDr\nWHtTHodpS/1XXzZypjFs4KFW16C5Om+Y0uoqNM8FM/I5To2rmd2nhTRo6X3DsjwLTC/Zn07oYVXz\nICHuHJfky1K2JoeZUtE1NsxcB5wCdAFHABcTJvFLnQx0JNuzk58vpCybqRmSiqyxKNAHLAbuJ/Tx\nlhPmxgaHNMuA9wGXEib5fwO8f4SydTGYSUXX+E2zq5JUalnJ9vVJSlu2LgYzqeh8a4akKEQSBSJp\nhqS6RfJspsFMKrpIokAkzZBUt0iiQCTNkFQ3h5mSouDVTElRsGcmKQqRRIFImiGpbpFEgUiaIalu\nkUSBSJohqW7OmUmKQiRRIJJmSKpb5Xf7jzkGM6noIokCkTRDUt0iiQKRNENS3SKJApE0Q1K9BiK5\nmumCJlLB9Y9Pn6qYD2wENgNXVvj+NOBhYC/w8bLvtgFPAuuBRxtphz0zqeBqBKk0OoEbgHmEZece\nI6ywVLowyQvA5YSFgMsNAN3AnoZqQbpgNh24DXhNcuL/BL7c6IkltYd9Rx6RIff+8g/mAFsIPSyA\nO4GFHB7Mnk/S+VUO2lHl80zSBLMDwBXAE8DRwOPAGhpYEkpS++jvbGjSbCqwvWR/B3BmhvIDwPeA\nfsKKTnWvKJ4mmO1KEoQ1734OTMFgJkWhv7HnmQYaPP3ZwHPACYRO0kbCqueZZR0tdwFnAI/UczJJ\n7aevRjB7qKePh3r6ahV/ljAVNWg6oXeW1nPJz+eBbxOGrU0PZkcDdwEfI/TQDhkYKF1RfRYdHbPq\nqYukEW1KEmza9KpcjthfIwyc2T2eM7uH9j+/dF95lnXAKYSOzk7gYuCSKocrnxubSLiA8BLwSuA9\nwNKU1R4mbTCbAHwTuB24e1gNOxbUe35JmcxKEsyaNYPe3jsaPmKDw8w+YDFwPyEwLSdMQV2WfL8M\nOJFwlfMY4CChQ/R6wkXFbyX5xgN3AKvrrUiaYNaRVPBnwJfqPZGk9tRgMANYlaRSy0q2d3H4UHTQ\nb4DTGz35oDTB7Gzggwzd2AZwNXBfXpWQ1Dr7yHJrRvtKE8x+hE8KSNGqNWc2lsTRCkl1y2GY2RYM\nZlLBGcwkRaHWfWZjicFMKjjnzCRFwWGmpCjsL9CtGZIi5pyZpCg4ZyYpCs6ZSYqCwUxSFJwzkxSF\n/RzZ6irkwmAmFZzDTElRcJgpKQremiEpCrEMM33polRw/XSmTlXMJywRtxm4skqeLyffbyCs8Jal\nbCr2zKSCa7Bn1gncAMwjLDv3GLCSw9fVPQ94LWEVpzOBG4G3pSybmsFMKrh9jd2aMQfYAmxL9u8E\nFnJ4QFoArEi2HwGOJazYNDNF2dQMZlLBNdgzmwpsL9nfQeh9jZRnKjAlRdnUDGZSwTUYzAZS5itf\nADh3BjOp4GrdZ7a1Zztbe7ZX/Z4w11W6JuZ0Qg+rVp5pSZ4JKcqmZjCTCq7WfWYzumcyo3vmof0H\nlj5cnmUdYWK/C9gJXAxcUpZnJWHV8zsJE/+/BnYDL6Qom1ouwezvBz6Sx2Ha0vievlZXoan6/iTy\n/8/Wxvz768/pKA0NM/sIgep+wtXJ5YQJ/MuS75cB3yVc0dwC/Bb48Ahl6xL5X7KkkeRw0+yqJJVa\nVra/OEPZuhjMpILb5xoAkmLgs5mSohDLs5kGM6ngDGaSouD7zCRFwTkzSVFwmCkpCvu9NUNSDJwz\nkxQF58wkRcE5M0lRMJhJioJzZpKi4JyZpCh4a4akKDjMlBQFh5mSohDL1cxxra6ApNbqpzN1yug4\nYA3QC6wmLP5byS
2EBU6eKvt8CWG1pvVJml/rZAYzqeCaGMyuIgSzU4HvJ/uV3ErlQDUAfAE4I0n3\n1TqZw0yp4PZxZLMOvQCYm2yvAHqoHNAeJCw3V0nqxYPtmUkF18Se2STC8JHk56Q6qnc5sIGwDF21\nYSqQLpgdBTwCPAH8DPiXOiokqU01GMzWEOa6ytOCsnwDScriRmAmcDrwHPD5WpnTDDP3Au8Cfpfk\n/xHwjuSnpDGu1n1me3seZW/Po7WKn1vju93AicAuYDLwi4xVK81/M3BPrcxp58x+l/w8grDy8J6M\nlZLUpmrdZzah+ywmdJ91aP/FpV/JcuiVwCLguuTn3RmrNpnQIwO4iOFXOw+Tds5sHGGYuRt4gDDc\nlBSBJs6ZXUvoufUC5yT7AFOAe0vyfR14iHDVczvw4eTz64AnCXNmc4Erap0sbc/sIGHc+irgfqCb\ncGUCgB4OHsrYRQdd6S9ASMqkN0mwaVPN+fDUmnjT7B5gXoXPdwLnl+xfUqX8pVlOlvXWjBcJEfUt\nlASzbi+KSqPk1CTBrFl/QG/v1xo+4r79xXnQ/HigD/g18ApCt3FpMyslafT098Vxu2maVkwm3PA2\nLklfJdzNKykC/X1xPJuZJpg9BcxudkUktUaRgpmkiPUdMJhJisDB/jjCQBytkFQ/h5mSorA3jjAQ\nRysk1a+v1RXIh8FMKjqDmaQoGMwkReFAqyuQD4OZVHT9ra5APgxmUtE5zJQUhb2trkA+DGZS0dkz\nkxQFg5mkKEQSzHxFrFR0BzKkbI4jLEXXC6ym8rqXtZayTFP+EIOZVHT9GVI2VxGC0amEF7pWWs18\ncCnL04E3JttnZyh/iMFMKrq+DCmbBYS3VJP8vLBKvvKlLH+VsTzgnJmk5t2aMYmwPCXJz0lV8o0D\nfgKcTFjFfHApy7TlAYOZpMYuAKwhrFpe7pqy/YEkVVJzKcsU5QGDmaRawWxzD2zpqVX63Brf7SYE\nul2EhZF+MUJNBpeyfDMhmGUq75yZVHS15shmdsO5S4ZSNiuBRcn2IuDuCnmOZ+gq5eBSlk9kKH+I\nwUwquubdmnEtITj1Auck+wBTCD2wwe0fEALYI8A9DC1lWa18RQ4zpaJr3lsz9gDzKny+Ezg/2X6S\n6ktZVitfUS7B7LMfjeQW4go6vhL34u3LflxzTnXMO3jRR1tdheZ5ywWMuyeH4/iguaQoRNIXMZhJ\nReebZiVFwTfNSoqCw0xJUTCYSYqCc2aSorCv1RXIh8FMKjqHmZKi4DBTUhS8NUNSFBxmSoqCwUxS\nFJwzkxQFb82QFAWHmZKi4DBTUhQiuTXDNQCkomveIsDHEZai6wVWM7RwSSWdwHrCGgCDlgA7ks/X\nA/NrncxgJhVd84LZVYRgdiphkZKrauT9GGHx39L3uA8AXwDOSNJ9tU5mMJOKrnmrMy0AViTbK4AL\nq+SbBpwH3Ax0lH1Xvl+VwUwqun0ZUjaTCAv5kvycVCXfF4FPEFY2L3c5sAFYTu1hqsFMKrzGhplr\ngKcqpAVl+QY4fAg56L2ElcrXM7wXdiMwEzgdeA74fK1mpL2a2QmsI0zGXZCyjKSxoNbwcX8PHOip\nVfrcGt/tBk4EdgGTCUGr3FmEwHcecBRwDHAbcGlZ/ps5/OLAMGl7ZpUm5yTFoL9G6uyGo5YMpWxW\nAouS7UXA3RXyfBKYTuiBvZ+wuvmlyXeTS/JdROjxVZUmmNWanJM01jXvaua1hJ5bL3BOsg8wBbi3\nSpnSDtN1hBXPNwBzgStqnSzNMHNwcu6YFHkljTXNe5xpDzCvwuc7gfMrfL42SYMurZCnqpF6ZrUm\n5yTFoHm3ZoyqkXpmtSbnhjy6ZGh7andIknLX8/wAPb9Mdl7cmM9BC/Kg+SeTBGHM+ndU6vrNWZJr\npSRV1n1CB90nJDtvOY3P/qC3pfVpJ1nvM/NqpqS2lOWtGeWTc5LUNnwFkFR4bT6z
n5LBTCq8OK4A\nGMykwrNnJikKL7e6ArkwmEmFZ89MUhScM5MUBXtmkqJgz0xSFOyZSYqCVzMlRcFhpqQoOMyUFAV7\nZpKiEEfPzHUzpcJr2oomxxHW1ewFVlN9Ed9jgbuAnxNWgXtbxvKAwUxS8xYBuIoQjE4Fvp/sV/Jv\nwHeB1wFvJAS1LOUBg5kkXs6QMlkArEi2VwAXVsjzKuCdwC3Jfh/wYobyhxjMpMJrWs9sEmFVc5Kf\nkyrkmQk8D9wK/AS4CZiYofwhXgCQCq/WXNimJFW1BjixwufXlO0PUHkNkfHAbGAx8BjwJcJw8tMp\nyx92IEmFVqvHdVKSBt1TnuHcGoV3EwLdLmAyYQ3ecjuS9Fiy/03gygzlDxl7w8xne1pdg6YaYFur\nq9BUvZGv79Xz/FhsYNOuZq4EFiXbi4C7K+TZBWwnTPIDvBv4aYbyhxjM2s62VlegqWJf5fHQAr1j\nStPmzK4l9Nx6gXOSfYApwL0l+S4H7gA2EK5m/vMI5StymCkVXtOeANgDzKvw+U7g/JL9DcBbM5Sv\nKJdgNnt6HkdJZ+dGmDKK5+uYPXn0Tgbs3Hk0U6aM3jmP75g9aucCmLhzJ8dPmTJ6JzypY/TOBbBr\nJ5w0Su078aSR86QSx1sz8vhN9wBzcziOpOzWAt0NlM86yfcrwp35kiRJkiSpvS0kPKA66M8J96n0\nE+4sHuvK2/c5woO4G4BvEZ5tG8vK2/cPhLY9QXi4eBQv9zRFefsGfRw4iPNPuRl795kNdxHw+pL9\np5LPftia6uSuvH2rgT8E3kS4/+bqVlQqR+Xtu57QttMJN0l+phWVylF5+yAE6HOB/x396qhZuoCN\nwO2E9xh9A3gF4S7gnwBPAsuBI5L81xJ6XRsIPZS3Ay8ATwPrOfy5iwdofc+si+a1D8I/lNub2YAR\ndNHc9l3NCDdKNlkX+bZvZpLvG4SbQ7dizywaXYSu9tuT/eXAp4BngNcmn60APkb4pW8sKXtM8vNW\n4M8qHLtdglmz2gfhQbkP5FfdzLpoTvv+KTnGRkZ4IV+TdZF/+xYCX0y2DWY5aodh5nbg4WT7dsJj\nC08DW5LPVgB/THjH0V7CH9RFHH6n3yjfGZlJs9p3DbAf+Fr+Vc6kGe27BpgB/BdD//BbJc/2TQQ+\nyeFD53b+2x1T2iGYld601wH8msN/wYPb/cAcwut13wvcV+UY7aYZ7fsr4DzgL/OsaJ2a+fv7GpUf\ncxlNebbvZEJvbwOhVzYNeBx4Td6VLqJ2CGYzGHrn9weAdYRf+MnJZx8iPGXwSsKQYxXwt4RJYoCX\nGOrSl2uH//Xybt984BOE4cre5lU7tbzbd0rJ9kLCXFMr5dm+pwgvGJyZpB2EqZCar7bR2NBFuM3g\nqwxNsB5F6MoPTrDeDEwgvM/oEcL/ak8S/ogAziJMuj5OmEC+iDA0eJnwepFVo9KSyrrIv32bCVfB\n1ifpP0alJZV1kX/77iL8o3+C8G6rVvZausi/faWexjmzaHQR/nBj1YXtG8u6iLt9UWmHYWY7z3fl\nwfaNbbG3T5IkSZIkSZIkSZIkSRrJ/wM9br5pjU4ufgAAAABJRU5ErkJggg==\n", | |
"text": [ | |
"<matplotlib.figure.Figure at 0x3ce64e0>" | |
] | |
} | |
], | |
"prompt_number": 6 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Another thing the singular value decomposition tells us is what most defines the different categories of posts. The skiing posts have very different values from the hockey posts in the second row of $V$, i.e. $V_{2,1} \\approx V_{2, 4}$ and $V_{2,2} \\approx V_{2, 3}$ but $V_{2,1} \\neq V_{2, 2}$.\n", | |
"\n", | |
"Recall from above that:\n", | |
"\n", | |
"$\\vec{a}_i = \\vec{u}_1 * \\sigma_1 * V_{1,i} + \n", | |
" \\vec{u}_2 * \\sigma_2 * V_{2,i} + ...$\n", | |
" \n", | |
"Thus the posts differ very much in how much the values in $\\vec{u}_2$ contribute to their final word count. Here is $\\vec{u}_2$:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"pd.DataFrame(U[:,1], index=words)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>0</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>ice</th>\n", | |
" <td> 0.018526</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>snow</th>\n", | |
" <td>-0.678291</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>tahoe</th>\n", | |
" <td>-0.519801</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>goal</th>\n", | |
" <td> 0.370263</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>puck</th>\n", | |
" <td> 0.363717</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 7, | |
"text": [ | |
" 0\n", | |
"ice 0.018526\n", | |
"snow -0.678291\n", | |
"tahoe -0.519801\n", | |
"goal 0.370263\n", | |
"puck 0.363717" | |
] | |
} | |
], | |
"prompt_number": 7 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"From this we can conclude that, at least in this small data set, the words 'snow' and 'tahoe' identify a different class of posts from the words 'goal' and 'puck'." | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 1, | |
"metadata": {}, | |
"source": [ | |
"Identifying similar research papers using singular value decomposition" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Moving on from the simple example above, here is an application using singular value decomposition to find similar research papers.\n", | |
"\n", | |
"I've collect several different papers for analysis. Unfortunately due to the sorry state of open access for scientific papers I can't share the full article text that was used for analysis. <em>Cell</em>, for example, cautions that <b>\"you may not copy, display, distribute, modify, publish, reproduce, store, transmit, post, ...\"</b> Yikes. However I did chose articles such that you should be able to download the pdf's from the publisher for free.\n", | |
"\n", | |
"<h3>Here are the papers included in analysis (with shortened names in parentheses):</h3>\n", | |
"\n", | |
"<h4>Two papers on the molecular motor ClpX, describing very similar experiments:</h4>\n", | |
"<li><a href=\"http://www.cell.com/retrieve/pii/S0092867411004296\">ClpX(P) Generates Mechanical Force to Unfold and Translocate Its Protein Substrates</a> (clpx1)\n", | |
"<li><a href=\"http://www.cell.com/retrieve/pii/S0092867411003138\">Single-Molecule Protein Unfolding and Translocation by an ATP-Fueled Proteolytic Machine</a> (clpx2)\n", | |
"\n", | |
"<h4>Papers on a very different molecular motor, <a href=\"http://www.frankcleary.com/research\">dynein</a>:</h4>\n", | |
"<li><a href=\"http://www.cell.com/fulltext/S0092-8674(12)00928-2\">Lis1 Acts as a \u201cClutch\u201d between the ATPase and Microtubule-Binding Domains of the Dynein Motor</a> (dyn-lis1)\n", | |
"<li><a href=\"http://www.cell.com/abstract/S0092-8674(06)00862-2\">Single-Molecule Analysis of Dynein Processivity and Stepping Behavior</a> (dyn-steps1)\n", | |
"<li><a href=\"https://reck-peterson.med.harvard.edu/sites/reck-peterson.med.harvard.edu/files/publication_pdf/Qiu_2012.pdf\">Dynein achieves processive motion using both stochastic and coordinated stepping</a> (dyn-steps2)\n", | |
"<li><a href=\"http://www2.mrc-lmb.cam.ac.uk/groups/cartera/pdffiles/2012_Schmidt_NSMB.pdf\">Insights into dynein motor domain function from a 3.3-A crystal structure</a> (dyn-structure)\n", | |
"\n", | |
"<h4>A paper on T-cell signaling:</h4>\n", | |
"<li><a href=\"https://valelab.ucsf.edu/external/publications/2012jamesNature.pdf\">Biophysical mechanism of T-cell receptor triggering in a reconsistuted system</a> (tcell)" | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Reading in the data" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To start, we'll need to read in the word counts for each paper. I used python <a href=\"http://www.unixuser.org/~euske/python/pdfminer/\">PDFMiner</a> to convert the pdf documents to plain text. I also used a list of \"stop words\" (<a href=\"http://norm.al/2009/04/14/list-of-english-stop-words/\">link</a>), words such as \"the\", and \"and\", that appear in all English documents." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"with open('input/stopwords.txt') as f:\n", | |
" stopwords = f.read().strip().split(',')\n", | |
" stopwords = set(stopwords) # use a set for fast membership testing" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 8 | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"import collections\n", | |
"import os\n", | |
"import re\n", | |
"\n", | |
"def word_count(fname):\n", | |
" \"\"\"Return a collections.Counter instance counting\n", | |
" the words in file fname.\"\"\"\n", | |
" \n", | |
" with open(fname) as f:\n", | |
" file_content = f.read()\n", | |
" words = re.split(r'\\W+', file_content.lower())\n", | |
" words = [word for word in words \n", | |
" if len(word) > 3 and word not in stopwords]\n", | |
" word_count = collections.Counter(words)\n", | |
" return word_count\n", | |
" \n", | |
" \n", | |
"file_list = ['input/papers/' + f for f in os.listdir('input/papers/')\n", | |
" if f.endswith('.txt')]\n", | |
"word_df = pd.DataFrame()\n", | |
"for fname in file_list:\n", | |
" word_counter = word_count(fname)\n", | |
" file_df = pd.DataFrame.from_dict(word_counter,\n", | |
" orient='index')\n", | |
" file_df.columns = [fname.replace('input/papers/', '').replace('.txt', '')]\n", | |
" # normalize word count by the total number of words in the file:\n", | |
" file_df.ix[:, 0] = file_df.values.flatten() / float(file_df.values.sum())\n", | |
" word_df = word_df.join(file_df, how='outer', )\n", | |
"\n", | |
"word_df = word_df.fillna(0)\n", | |
"print \"Number of unique words: %s\" % len(word_df)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"Number of unique words: 5657\n" | |
] | |
} | |
], | |
"prompt_number": 9 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Here are the results, sorted by the most common words in the first paper:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"word_df.sort(columns=word_df.columns[0], ascending=False).head(10)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>clpx1</th>\n", | |
" <th>clpx2</th>\n", | |
" <th>dyn-lis1</th>\n", | |
" <th>dyn-steps1</th>\n", | |
" <th>dyn-steps2</th>\n", | |
" <th>dyn-structure</th>\n", | |
" <th>tcell</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>clpx</th>\n", | |
" <td> 0.027648</td>\n", | |
" <td> 0.006701</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000535</td>\n", | |
" <td> 0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>unfolding</th>\n", | |
" <td> 0.019516</td>\n", | |
" <td> 0.021117</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000268</td>\n", | |
" <td> 0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>force</th>\n", | |
" <td> 0.016060</td>\n", | |
" <td> 0.007919</td>\n", | |
" <td> 0.000666</td>\n", | |
" <td> 0.000170</td>\n", | |
" <td> 0.001911</td>\n", | |
" <td> 0.001071</td>\n", | |
" <td> 0.001265</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>figure</th>\n", | |
" <td> 0.012604</td>\n", | |
" <td> 0.009137</td>\n", | |
" <td> 0.011322</td>\n", | |
" <td> 0.011923</td>\n", | |
" <td> 0.001699</td>\n", | |
" <td> 0.002142</td>\n", | |
" <td> 0.001898</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>translocation</th>\n", | |
" <td> 0.011588</td>\n", | |
" <td> 0.014213</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.001265</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>clpxp</th>\n", | |
" <td> 0.011384</td>\n", | |
" <td> 0.021117</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>motor</th>\n", | |
" <td> 0.009555</td>\n", | |
" <td> 0.001218</td>\n", | |
" <td> 0.009491</td>\n", | |
" <td> 0.011923</td>\n", | |
" <td> 0.018896</td>\n", | |
" <td> 0.009103</td>\n", | |
" <td> 0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>substrate</th>\n", | |
" <td> 0.008538</td>\n", | |
" <td> 0.018071</td>\n", | |
" <td> 0.000167</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000212</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000316</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>velocity</th>\n", | |
" <td> 0.008335</td>\n", | |
" <td> 0.002640</td>\n", | |
" <td> 0.005495</td>\n", | |
" <td> 0.002044</td>\n", | |
" <td> 0.000637</td>\n", | |
" <td> 0.000803</td>\n", | |
" <td> 0.000000</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>extension</th>\n", | |
" <td> 0.007522</td>\n", | |
" <td> 0.001015</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.001071</td>\n", | |
" <td> 0.000000</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 10, | |
"text": [ | |
" clpx1 clpx2 dyn-lis1 dyn-steps1 dyn-steps2 \\\n", | |
"clpx 0.027648 0.006701 0.000000 0.000000 0.000000 \n", | |
"unfolding 0.019516 0.021117 0.000000 0.000000 0.000000 \n", | |
"force 0.016060 0.007919 0.000666 0.000170 0.001911 \n", | |
"figure 0.012604 0.009137 0.011322 0.011923 0.001699 \n", | |
"translocation 0.011588 0.014213 0.000000 0.000000 0.000000 \n", | |
"clpxp 0.011384 0.021117 0.000000 0.000000 0.000000 \n", | |
"motor 0.009555 0.001218 0.009491 0.011923 0.018896 \n", | |
"substrate 0.008538 0.018071 0.000167 0.000000 0.000212 \n", | |
"velocity 0.008335 0.002640 0.005495 0.002044 0.000637 \n", | |
"extension 0.007522 0.001015 0.000000 0.000000 0.000000 \n", | |
"\n", | |
" dyn-structure tcell \n", | |
"clpx 0.000535 0.000000 \n", | |
"unfolding 0.000268 0.000000 \n", | |
"force 0.001071 0.001265 \n", | |
"figure 0.002142 0.001898 \n", | |
"translocation 0.000000 0.001265 \n", | |
"clpxp 0.000000 0.000000 \n", | |
"motor 0.009103 0.000000 \n", | |
"substrate 0.000000 0.000316 \n", | |
"velocity 0.000803 0.000000 \n", | |
"extension 0.001071 0.000000 " | |
] | |
} | |
], | |
"prompt_number": 10 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now to calculate the singular value decomposition of this data." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"U, sigma, V = np.linalg.svd(word_df)" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [], | |
"prompt_number": 11 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Here is a look at $V$, with the column names added:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"v_df = pd.DataFrame(V, columns=word_df.columns)\n", | |
"v_df.apply(lambda x: np.round(x, decimals=2))" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>clpx1</th>\n", | |
" <th>clpx2</th>\n", | |
" <th>dyn-lis1</th>\n", | |
" <th>dyn-steps1</th>\n", | |
" <th>dyn-steps2</th>\n", | |
" <th>dyn-structure</th>\n", | |
" <th>tcell</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>-0.19</td>\n", | |
" <td>-0.20</td>\n", | |
" <td>-0.55</td>\n", | |
" <td>-0.48</td>\n", | |
" <td>-0.53</td>\n", | |
" <td>-0.27</td>\n", | |
" <td>-0.15</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>-0.61</td>\n", | |
" <td>-0.59</td>\n", | |
" <td> 0.25</td>\n", | |
" <td> 0.13</td>\n", | |
" <td> 0.20</td>\n", | |
" <td>-0.03</td>\n", | |
" <td>-0.41</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td> 0.33</td>\n", | |
" <td> 0.28</td>\n", | |
" <td>-0.09</td>\n", | |
" <td> 0.08</td>\n", | |
" <td> 0.08</td>\n", | |
" <td>-0.05</td>\n", | |
" <td>-0.89</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>-0.09</td>\n", | |
" <td>-0.05</td>\n", | |
" <td>-0.77</td>\n", | |
" <td> 0.32</td>\n", | |
" <td> 0.53</td>\n", | |
" <td> 0.01</td>\n", | |
" <td> 0.10</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td> 0.07</td>\n", | |
" <td> 0.04</td>\n", | |
" <td> 0.14</td>\n", | |
" <td> 0.14</td>\n", | |
" <td> 0.14</td>\n", | |
" <td>-0.96</td>\n", | |
" <td> 0.10</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>-0.68</td>\n", | |
" <td> 0.73</td>\n", | |
" <td> 0.02</td>\n", | |
" <td> 0.03</td>\n", | |
" <td>-0.05</td>\n", | |
" <td>-0.02</td>\n", | |
" <td>-0.03</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td> 0.01</td>\n", | |
" <td> 0.07</td>\n", | |
" <td> 0.09</td>\n", | |
" <td>-0.79</td>\n", | |
" <td> 0.60</td>\n", | |
" <td>-0.01</td>\n", | |
" <td> 0.00</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 12, | |
"text": [ | |
" clpx1 clpx2 dyn-lis1 dyn-steps1 dyn-steps2 dyn-structure tcell\n", | |
"0 -0.19 -0.20 -0.55 -0.48 -0.53 -0.27 -0.15\n", | |
"1 -0.61 -0.59 0.25 0.13 0.20 -0.03 -0.41\n", | |
"2 0.33 0.28 -0.09 0.08 0.08 -0.05 -0.89\n", | |
"3 -0.09 -0.05 -0.77 0.32 0.53 0.01 0.10\n", | |
"4 0.07 0.04 0.14 0.14 0.14 -0.96 0.10\n", | |
"5 -0.68 0.73 0.02 0.03 -0.05 -0.02 -0.03\n", | |
"6 0.01 0.07 0.09 -0.79 0.60 -0.01 0.00" | |
] | |
} | |
], | |
"prompt_number": 12 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Here are the values of $V$ represented as an image:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"plt.imshow(V, interpolation='none')\n", | |
"ax = plt.gca()\n", | |
"plt.xticks(xrange(len(v_df.columns.values)))\n", | |
"plt.yticks(xrange(len(v_df.index.values)))\n", | |
"plt.title(\"$V$\")\n", | |
"ax.set_xticklabels(v_df.columns.values, rotation=90)\n", | |
"plt.colorbar();" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "display_data", | |
"png": "iVBORw0KGgoAAAANSUhEUgAAAS0AAAFDCAYAAABvHVjEAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHA5JREFUeJzt3Xu8VGW9x/HPZgMCogIHMQUV8A4C3g3NHC+llXfLLloe\ns/JVWZ6yjmYXN9U5x9vpYJmVmqZ5vJRo2tEyU0clb4AXbgopkqghkjfMG3sz54/fGvYwzJ5Za9bl\nmWfW9/16rdfMmr1m/R6U/eN5nvVcQERERERERERERERERERERERERERERHJpDHAVsBz4dMXn2wEL\ngPOA4Q7KJSLSp92Ap6s+GwMc56AsIiINjQDeBfpVfPY1R2UREQnlFWBs8P5YYCt3RRGJpr/rAogT\nS4BxWI1rAPCs2+KIhKeklU9LgG2AXYEfOS6LiEhD5wMPY7UtEa+oppVPi4FngkNERERERERERERE\nREQkIRP2370E6NChw81RJIZB0eO9HCdeEjoSuEfpuNKvmv7y/K6b2Lnr6Ka/31PqbPq7AAu7bmRC\n1zFNf3/GWSfEis99XbBfV/Pf3zFeeG7qgqNjxF8aM/7dXXBA8/GnnP1grPDLuy7jPV2fa/r7+3B/\nrPizum5nz65DmvruJMbzpY6jId7vcemHES7+jr0kkTeapnFaIjk3wHUBIlLSEsk535KA8/KOKsRt\n38SzaWEnp/HZquA2/o6O4491G39oYTen8bcobOM0PqimFdkox0nDedLauuA2/k6O449zG9910hpd\n2NZpfGiBJBCRb+UVkYSppiUiXhnsugARKWmJ5JxvScC38opIwtQ8FBGv+Ja0+jW+hEOBJ4G/Amek\nWxwRyVr/CEcraJS0OoGLsMQ1Afgk4HiMgIgkaUCEow9hKjYF4FFgPjHnSzZKnnsBT9E7w+w64Ejg\niThBRaR1xKxBlSs2BwPPA7OAW1g3RwwDfgocAjwHjIwTsFF5RwPLKs6fA/aOE1BEWkvMIQ9hKjaf\nAmZg+QNgZZyAjZqHpTg3F5HWF7N5WKtiM7rqmu2wnc3vBmYDn45T3kY1reeBLSvOt6Q3W641v+um\nte9HFXZ0PjVHpF09X3yKF4pPA/A3hidyz5jNwzAVmwHAbsBBwBDgAeBBrA8sskblnY1lybHAC8DH\nsc74dcRZD0tEwhtd2HbtfMVJjOe2adfHvme9IQ8PB0cdYSo2y7Am4VvBcS8whZSSVjdwKnA71uH2\nS9QJL9JW6iWBfYKj7OL1LwlTsbkZ66zvBDbA+sWb3tk8TM3wD8EhIm0o5uDSvio2pwQ//wU2HOKP\nwFxgDXApsLDZgK0yXkxEHElgRHytis0vqs4vCI7YlLREcm5wlCzQnVoxQlPSEsm5/kpaIuKTAfE2\ntMqckpZIzkWqabUAz4orIkkb4FkW8Ky4IpI4NQ9FxCueZQHPiisiiRvkugDRKGmJ5J2ahyLiFc+y\ngGfFFZHEeZYFEinub447MYnbNKVjqNt1Ctd8IszeIOl5/UC3e6lsPH+10/jXcqTT+MfPuanxRSk5\nfJOEbqTmoYh4xbMs4FlxRSRxnmUBz4orIonbwHUBolHSEsk7z7KAZ8UVkcR5lgU8K66IJM6zp4du\nn9eLiHv9Ixy1HYqtA/9X4Iw6kfbElhE8Jm5xRSTP4mWBTmynnYOx7cRmAbew/q5dncC52AYXHXEC\nKmmJ5F28p4d7AU8BS4Pz64AjWT9pfQW4AattxaLmoUjexWsejsY2Yy17Lvis+pojgZ8F57GmsYSp\naV0OfARYAUyKE0xEWlCdjvji36G4vO63wySg6cCZwbUdZNA8vAL4CXBVnEAi0qLqZIHClnaUTXt8\nvUueByquYEustlVpd6zZCDAS+BCwGuv7iixM0roP2/JaRNpRvJ7t2cB2WI54Afg48Mmqa8ZXvL8C\n+D1NJixQR7yIxMsC3cCpwO1YQ/OXWCf8KcHPq3eajk
1JSyTv4g8u/UNwVOorWZ0UN1gySWtBV+/7\nTQswqpDIbUWkyuwizCkCsCiptd1zuUb8xK5EbiMiDexRsAPYYRNY/JNp8e/ZhtN4rgXuB7bHxmPE\nrt6JSAuJP40nU2GKUf0kQETaSYsko7A8K66IJM6zLOBZcUUkcZ71aSlpieSdZ1nAs+KKSOK0RryI\neMWzLOBZcUUkcZ5lAc+KKyKJU0e8iHjFsyzgWXFFJHGeZQHPiisiifMsC3hWXBFJnIY8iIhXPMsC\niRR3zS4ON/XZyl1ogAs++CWn8Y/iZqfxV+ziNDxvlQa7LcCeM9zFPnzzZO7j2dNDbSEmknfp7zB9\nPPA4MBf4CzA5bnFFJM/S32F6CfB+4DUswV0CvLfZgEpaInkXr3kYZofpByrePwSMiRNQSUsk7+Jl\ngVo7TO9d5/qTgdviBFTSEsm7eBtbRNni/gDgs8C+cQIqaYnkXZ3mYXGWHXWE2WEarPP9UqxP65Wo\nRaykpCWSd3WyQGGqHWXTfrbeJWF2mN4KuBE4Aev/ikVJSyTv0t9h+nvAcKCc8lZjHfhNUdISybv0\nd5j+XHAkQklLJO88ywJhRsRvCdwNLADmA19NtUQikq023Kx1NfA14DFgKDAHuIN1B4+JiKdKbbjK\nw/LgAHgDS1ZboKQl0hZ6WqQGFVbU4o4FdsWG4otIG2jnpDUUuAE4DatxiUgb6O6MstjLmtTKEVbY\npDUAmAFcDfyu+oddd/aO5C+Mg8L4jkQKJyJVSvOxZ2Kw6MmNErllT/8odZd3E4kZR5jSdmADxhYC\n02td0HWQkpRIJjp2BnYGYIcdN2fx4stj37Kn069VAMMkrX2x4fdzgUeDz74F/DGtQolIdt5hYISr\n30qtHGGFSVoz0QqnIm2rp1UGYIXkV2lFJHE9ni0Sr6QlknNKWiLiFSUtEfFKt5KWiPhEHfEi4pV3\nIw15cE9JSyTnfGseavyVSM710D/00YdGO0wD/Dj4+ePYogtNU01LJOdiPj0Ms8P0h4FtsQ0w9sbW\nitcO0yLSnJhJK8wO00cAVwbvHwKGAZsBLzYTUElLJOdiJq0wO0zXumYMSloi0oyYHfFhd5iuXgom\nys7U60gkaZ17lru9LkaVVjiLDXBm54VO43/zRz91Gp8L3IZfumxZ44tStKznlMYXpWQQh7BpAo/S\n3qXvReLnFl9hXrHuhtBhdpiuvmZM8FlTVNMSybl6zcOJhZFMLIxce37ttGeqLwmzw/Qt2Iau12Ed\n8K/SZNMQlLREci9m8zDMDtO3YU8QnwL+CZwUJ6CSlkjOJTCNp9EO02CJLRFKWiI5p1UeRMQrSloi\n4hXf5h4qaYnkXL0hD61ISUsk59Q8FBGvKGmJiFfasU9rEHAPsAEwELgZ26xVRNpAOy63/DZwAPBm\ncP1M4H3Bq4h4rl2bh28GrwOxofovp1McEclauyatfsAjwDbYqoMLUyuRiGTqHc+GPIRd2GINsAu2\npMT7gUJaBRKRbPXQGfpoBVF74F4DbgX2AIrlD//c9eDaC8YXxjC+MCaJsolIlQeKq3mguBqA/sxN\n5J6tkozCCpO0RmLLT7wKDAY+AEyrvODgrqbXqBeRCKYWBjC1MACAQUzm/O/Pj33Pdkxam2OL0vcL\njl8Dd6ZZKBHJTjuO05oH7JZ2QUTEjXYcpyUibexdBrouQiRKWiI551vzMIG9PETEZ5Xb3jc6IhoB\n3AEsBv6EbdJabUvgbmABMB9ouLWXkpZIzqU4TutMLGltjz28O7PGNauBrwETsZ16vgzsVO+mSloi\nOZdi0joCG3lA8HpUjWuWA48F79/AdvLZot5N1aclknMpjtPajN79DV8MzusZC+wKPFTvIiUtkZyL\n2RF/B/CeGp9/u+q8FBx9GQrcAJyG1bj6pKQlknP11oh/qbiQlcW66yN8oM7PXsQS2nJskPqKPq4b\nAMwArgZ+Vy8YKG
mJ5F695uGIwiRGFCatPV80bUaUW98CnAicG7zWSkgd2K7UC4HpYW6qjniRnOum\nM/QR0TlYTWwxcGBwDtbRfmvwfl/gBGyh0UeD49B6N1VNSyTnUpzG8zJwcI3PXwA+EryfScTKk5KW\nSM614yoPDR3VuO8sNas6NnIWG+CN1wY4jb+6x2l4Sp9xG//QzgOdxn+2x90ivh0J3SeXSUtE/OXb\n3EMlLZGcqzfkoRUpaYnknJqHIuIVJS0R8Yr6tETEK1puWUS8ouahiHhFSUtEvPKOhjyIiE/atabV\nCcwGngMOT684IpK1dk1ap2Hr3bid6CciietZ41fSCrMkxBjgw8BlJDdHU0RaRHd3Z+ijFYSpaf0P\n8E1g45TLIiIO9HT71bXdqLSHYes6PwoUUi+NiGSup0VqUGE1Slr7YHuXfRgYhNW2rgLWWUXpoq5X\n177fqzCIvQqDki2liABwf3E1DxRXA9CfuYnc8523BiZynxpGANcDWwNLgeOAV/u4NvTDvih9VPsD\n36hxw9ITpa0j3CZZqxw/G9j5jQVO4/fvqbcrU/pKa5yGZ/DI7zmN/2zPRc5iD+JQRnZcA/H6mks8\n/3b4q0cPihLvPGBl8HoGMJzau0wDfB3YHXvYd0S9m0bd2MLtb4iIJK+7M/wRTZgdpiHiw74oPXD3\nBIeItJP0+rTC7jAd6WGfX48NRCR53bFGMsXdYTrywz4lLZG8667zs4eLMKtY79txd5gO9bCvkpKW\nSN7V64efXLCj7OJpUe4cZofps4IDeh/21d3jSTtMi+Td6ghHNGF2mK7W8GGfaloieZfe3plhdpiu\nFOphn5KWSN7V69NqQUpaInmnpCUiXlHSEhGvKGmJiFciTD1sBUpaInkXfSiDU4kkrQkTliRxm6ac\nvcjtWkCreqY6jd+Z4vPqMJ5mG6fxr+1+xGn8e9nPWezRTEjmRm7/CkWmmpZI3qlPS0S8oqQlIl5R\n0hIRryhpiYhXNORBRLySxyEPIuIxDXkQEa+oT0tEvKKkJSJeUdISEa+0aUf8UuB1rMtuNbBXWgUS\nkYy9k9qdRwDXA1tjOeQ44NUa1w3DNmqdiK0R/1ngwb5uGnZjixK2J9muKGGJtJfuCEc0Z2L7Im4P\n3Bmc13IhcBuwEzAZeKLeTaPsxhNrR0cRaVHp7cZzBHBl8P5K4Kga12wC7AdcHpx3A6/Vu2mUmtaf\ngdnA50N+R0R80BPhiGYzbMNWgtfNalwzDngJuAJ4BLgUGFLvpmH7tPYF/g5silX3ngTuK/+w9FLF\nBo5D9qdjw0LI24pIFAuLK1lYXAnAxryezE3jPT28A9tFutq3q85L1N7TsD+wG3AqMAuYjjUjv9dX\nwLBJ6+/B60vATVi/1tqk1bHp2SFvIyJxTCiMZEJhJACj2ZNfTbs7/k3rJa0Xi7CiWO/bH6j3bSyh\nLQc2B1bUuOa54JgVnN9A331fQLikNQToBFYBGwIfBCLtjS0iLaxeX9WIgh1l8yP96t8CnAicG7z+\nrsY1y4FlWGf9Ymxz1wX1bhomaW2G1a7K1/8v8KdQRRaR1pfekIdzgN8AJ9M75AFgC6zvqrzL9Few\nvDIQeBo4qd5NwyStZ4BdIhdXRPyQ3oj4l7GaU7UX6E1YAI8De4a9qUbEi+Rdm46IF5F2paVpRMQr\nmjAtIl5R0hIRr2iNeBHximpaIuIVJS0R8YqGPIiIVzTkQUS8ouahiHglj0nr1gUHJXGbpoxja2ex\nAQaUljqNv33nQqfxH+0Y4TT+Uz2HOo0/hDedxR7MW8ncSEMeRMQreaxpiYjHlLRExCsa8iAiXtGQ\nBxHxSq3tJlpYlH0PRUSiGIHt1rMYW6J9WB/XfQtbF34ecA2wQb2bKmmJSFrC7DA9FttLdTdgEraJ\nzifq3VRJSyT3UttiOswO068HNx6CdVcNAZ6vd1P1aYnkXmpjHsLsMP0y8N/As8Bb
wO3YbvZ9UtIS\nyb1YYx7i7jC9DfBvWDPxNeC3wPHYlmI1hUlaw4DLgIlB0M8CD4b4noh4oV5Naybwl3pfjrvD9B7A\n/cA/gvMbgX2ImbQuBG4DPhpcv2GI74iIN+rVtPYOjrLzotw4zA7TTwLfBQZjsyAPBh6ud9NGHfGb\nAPsBlwfn3VgVTkTaRmod8edgNbHFwIHBOdgO07cG7x8HrgJmA3ODzy6pd9NGNa1xwEvAFcAUYA5w\nGjic2i4iCUtotYj1hd1h+jwiVOEa1bT6Y+MnLg5e/0ntsRYi4q3uCId7jWpazwXHrOD8Bmokrau7\nnln7fnJhGJMLw5Mqn4hUmFd8hXnFVwDYMLGZzn7NmG6UtJYDy7ARrYuxqt6C6otO6BqXfMlEZD2T\nCsOZFFQKNmMql027L4G7tkYNKqwwTw+/gj1+HAg8DZyUaolEJGPtVdMC693fM+2CiIgr7VfTEpG2\n1n41LRFpa6kNeUiFkpZI7ql5KCJeUfNQRLyipCUiXlHzUES8opqWiHjFr6eHzteInxvMo3Ll4eLb\nTuM/WHzHafxSaabT+LNKbvevml/8R+OLUjTP8d9/49eE6RZIWq86je86aT1UfNdp/BJuk9Zsx3vu\nLSi+7DR+aySt1NbTSoWahyK51xo1qLCUtERyrzVqUGF1JHCPIrB/AvcRkejuAQoxvh+1gf4KtnO0\niIiIiIiIiIiIiESQ1RLaOwEHAUOrPj80o/iVhjiIKW1kXgYxtgKuw/b5PgsYUPGzWjvfZqnu5pQZ\nWJZBjK8Ci7D/1n8Djqr42aMZxC/bB1hI7595F2yLvLQdCxwTvFYfx2QQv21kOU7r2BqflbBhF5tn\nEP9ybAu0h4CTsUfFRwArga0ziN/XY+IO1t24Mi31/mEYlUH8LwC7A28AY7H/F2OB6RnErjQdq9nd\nHJw/RjZDdg6n/vCCGzMoQ1vIMmldB1wDrKn6vAMYlEH8TYGfB+9PBU4A7sX+MmVhJVbDqGXTDOKP\nwn5Za80buT+D+B1YwgJYio0tmoH9g5HEeMEonq06z2JI+L9mECMXskxa84ALqP0v/kEZxO+PJcfy\nZMOrsX0dbwc2zCD+EuzPWStxZdE8uxXrS6rVFLsng/grsKbYY8H5G8BhwC+ByRnEL3sW2Dd4PxBr\ntj6RQdzT6W1ZlJXPS8CPMihDW+jMMNYT2F/c12r8bCbwQsrxB2F/3qUVny0B7gOmAL9OOf4a4CUs\nUVbrwZqtabqZ9WsYZTNSjg1wF/A6vbUtsP8mNwF30nfZ0ijH2VgCPR14E9vbM+31WQ7BkmTlsUHF\naxb/cEiCBjqOv4Hj+Fnalt7m+AFYTWNYTuL3xzYeFonkHmBcxflewNwcxT8O2Dh4/12sprFbhvEf\nx355twUWA+cDt+Uo/kzc/iO1A1azXBCcTwa+4644EsYhwJPAl4H/xPpYsvyldR2/3Kf3Pmyy+WHA\nwxnGL/dp/TvWLKr8LA/xfw3Mwv7BOD04vp5h/HuBven9M3fQm8AkBBdL09wOfBG4A+vj2ZXa/Tzt\nGr8neD0MuBT4P+AHGcZ/F/gU8Bl6n5wO6Pvytov/dHD0wx5MlDvCszKEdfsvS/i2NkwOfReYD0wF\nTsEGHB6Wo/i3YoNJn8H6cgZhTaasTAR+AnwyOB8PnJGj+K79AWsal2taHw0+kxY2HRhccb41VuvJ\nS/wNsYG22wXnmwMfzDA+WJ/OFGASbh6CuIx/d43jrgzjb4P1ab2JPTH/CzbIVkLKelBfpY2xqvGq\nnMTfGHvk39fI+KwWK/8INsh2SXA+HqtxZtUZ7jr+HhXvB2H/gHQD38woftlQrIn6esZxpQl7Yp3R\nfwuOx1n3L1K7xr81eF2KNQ0rjyV9fCcNi7DmSdk2wWd5iV/LrAxj/RfrDvEYDvwww/jShHnAfhXn\n7yPbIQeu47tW/QvaUeOzdo4/ouIYiU1tyjJp
PlbjsyyfnnrPxdPDbmwUetlMst0OxFX8RsMqHsmg\nDABzsKbYb4LzjwGz6V1pIO2Ju67jP0Lv08JurOZ7csoxK/Vj3elkg3E/uNorLvq0yh3h1wbnH8f+\nB5an0aT9y+sqfpH6j9YPSClutV8Fr+WyVD/yT3ttLdfxKxNGvc/Scga2usjl2J/9JOAW4NyM4nvP\nRdIqsu5f0uq/tGn/8rqOL249wvq13lqfpeU87Inlwdjfuz8DB2KDbUVanovF/1xPI3EVf3NsPa8n\nsQS1e/BaCD7LSq3+qywWwWwbWda0Tq94X6umk/bSHK7j1/IoNiI/S/dij/d/HsTuwAbbTmzz+Cdi\na1rtgfWhla3Cmqxp96V9EfgS9rT06YrPN8LGah2fcvy2kWVH/EZkO12i1eLXssJBTNfTSFzFvzI4\njiWbpXiqXYONfD8H69cqVxhWAf9wUB6J4CpsbErZCOCKHMV3zfU0EtfxNU5KIqs1TqXWZ+0afwds\novQd5HMaiev4GiflORfjtDqw2k152soIsl1B1XX83wI/Ay6jd8WHLJuta+jdxqs8jWRc3W+0V3yN\nk5LIPoONQP4BVi1fFHyWl/hzMoxVS61aRZZlch3/DKx2dzLwueB9nlaZ8J6LmtZV2F/SA7EaxtHY\nPnR5if97bAHCG4F3Kj5Pe8L0TsAEYBNs9Hn5qenGZLMbkuv4Zedi07bK46S+j62xJp5wucpDXi1l\n/eZgCVvtIE1HYgn6cGwEdtkqbHu3tLcRcx1fRDw1Nefx38AS5SqsprsGLQ/jlX6uC5BDc7Dm4fBG\nF6bkGKxJNgB7ircS+HSO4g/FxuxthHXCHwNcnGF8Ee9sh22o8RRwPbbRRpbN9PLSzkdjG6VuQrZL\n87iOX0uWQ14kJhcd8Xn3V+AsbL7dYdhs/zXB64Wk3yFf/n9+GHADtnlulkMuXMc/tuJ9P2wOYtob\ntUqClLTcmIItSfIhbErJNdhihHdhOx+n6ffYBOG3sflwo8huWZZWiF+5iUl5Pa0jM4wv4p05WHL6\nFOtvGnpTRmX4F3oH1G4IvCejuK7jd5LtHoeSAg15yE55lYlOao+Ed7HKxCXAFxzEdRl/FrZPgHhK\nzcPslFeZ2AH7pbkF+0cj6x2mK7n+5XURfyZwEfYQ5J8Vn2e13LWId+7DEljZRqy7Zn2WXI8EdxG/\nSO29D0WkD4tYd9rKINxvoZUntWYepD0bQRKk5mH2rsKagzdizcOjsMXpsrID8A1sOZjy//8SNhcz\nD/FvYP314H+LDX0QDyhpZe8/gD9iey+WsCWAs1zPyfXSOK7ilydsD8PthG2JSUnLjTm4W6JmNZY0\nXHEVf3tssvYmwWvZKuDzDsojTdKQh/zpAl4i+6VxWiX+VOCBjGKJSAKWAs9UHUtyFP983E7YFhGJ\npBUnbEsEWpomf1wvjeM6vusJ2xKTklb+fAIYjU1ncbE0juv45Qnbu2PNw6wnbItIk/oBRwDPA8uA\nadjORHmI73rCuIhENAWYjo3E/zHwXmzAZ1aL4bmOX3ZJxvFEpAmul8ZxHb+SNmn1kMZp5YfrpXFc\nx6/ldqxPTTyiEfH54XppHNfxa1HCEvGA66VxXMffAbgUuIPeZWnuyjC+xKSaVv6Mwub/la0OPstL\nfNcTxiUmJa38cb00juv4rieMS0zqiM+n3eldGudesn+K5jJ+F24nbEtMSlqSN0tZvzlYQquXiohI\nGjobXyLSVuZgy9I8heYcekkTpiVvXE/YFhFpiusJ49Ik1bQkj6Zg04bOB2YAH8PWitcgUxFpOa00\nYVuaoLa85EUrTtiWJmhEvORFK07YFhFpyPWEbYlJHfGSN64nbEtMah5K3riesC0xqSNe8sj1hHER\nEREREREREREREREREYnt/wGyBtFvcDfoPQAAAABJRU5ErkJggg==\n", | |
"text": [ | |
"<matplotlib.figure.Figure at 0xba560b8>" | |
] | |
} | |
], | |
"prompt_number": 13 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note how in the above image, in the first three rows the similarities between the clpx papers is apparent, as well as between the first three dyn papers. The last dyn paper is somewhat different, but this is to be expected since it is a structure paper and the other three dyn papers involve more microscopy. In terms of comparing the papers, singular value decomposition allowed us to reduce the 5657 different words found in the papers into 6 values that are pre-sorted in order of importance!" | |
] | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 2, | |
"metadata": {}, | |
"source": [ | |
"Quantifying similarity" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we'll look in more detail at how similar each paper is to the others. I've defined a function to calculate the distance between two column vectors of $V$, weighted by the weights in $\\Sigma$. For $\\vec{v}_i$ and $\\vec{v}_j$ the function calculates $\\|\\Sigma * (\\vec{v}_i - \\vec{v}_j)\\|$. This function is applied to every pairwise combination of $\\vec{v}_i$ and $\\vec{v}_j$, giving a metric of how similar two papers are (smaller values are more similar)." | |
] | |
}, | |
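{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A side note on this distance, offered as a sketch rather than as part of the original analysis. The symbols $\\sigma_k$ (the $k$-th singular value), $V_{ki}$ (the entries of $V$) and $\\vec{a}_i$ (column $i$ of $A$, i.e. the word frequencies of paper $i$) are notation introduced here for illustration. Written out per component, the distance is\n", | |
"\n", | |
"$$d_{ij} = \\|\\Sigma (\\vec{v}_i - \\vec{v}_j)\\| = \\sqrt{\\sum_k \\sigma_k^2 \\, (V_{ki} - V_{kj})^2}$$\n", | |
"\n", | |
"Assuming the convention $A = U \\Sigma V$ used in this notebook, with $U$ restricted to its first columns (one per singular value) so that $\\Sigma$ is square, each column of $A$ is $\\vec{a}_i = U \\Sigma \\vec{v}_i$. Since the columns of $U$ are orthonormal, multiplying by $U$ does not change lengths, so\n", | |
"\n", | |
"$$\\|\\vec{a}_i - \\vec{a}_j\\| = \\|U \\Sigma (\\vec{v}_i - \\vec{v}_j)\\| = \\|\\Sigma (\\vec{v}_i - \\vec{v}_j)\\| = d_{ij}$$\n", | |
"\n", | |
"In other words, when every singular value is kept this should agree with the ordinary Euclidean distance between the normalized word-frequency columns; dropping the terms with the smallest $\\sigma_k$ would give an approximation to it." | |
] | |
}, | |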
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"def dist(col1, col2, sigma=sigma):\n", | |
" \"\"\"Return the norm of (col1 - col2), where the differences in \n", | |
" each dimension are wighted by the values in sigma.\"\"\"\n", | |
" return np.linalg.norm(np.array(col1 - col2) * sigma)\n", | |
"\n", | |
"dist_df = pd.DataFrame(index=v_df.columns, columns=v_df.columns)\n", | |
"for cname in v_df.columns:\n", | |
" dist_df[cname] = v_df.apply(lambda x: dist(v_df[cname].values, x.values))\n", | |
"plt.imshow(dist_df.values, interpolation='none')\n", | |
"ax = plt.gca()\n", | |
"plt.xticks(xrange(len(dist_df.columns.values)))\n", | |
"plt.yticks(xrange(len(dist_df.index.values)))\n", | |
"ax.set_xticklabels(dist_df.columns.values, rotation=90)\n", | |
"ax.set_yticklabels(dist_df.index.values)\n", | |
"plt.title(\"Similarity between papers\\nLower value = more similar\")\n", | |
"plt.colorbar()\n", | |
"dist_df" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"html": [ | |
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>clpx1</th>\n", | |
" <th>clpx2</th>\n", | |
" <th>dyn-lis1</th>\n", | |
" <th>dyn-steps1</th>\n", | |
" <th>dyn-steps2</th>\n", | |
" <th>dyn-structure</th>\n", | |
" <th>tcell</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>clpx1</th>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.044530</td>\n", | |
" <td> 0.091754</td>\n", | |
" <td> 0.077374</td>\n", | |
" <td> 0.086122</td>\n", | |
" <td> 0.074950</td>\n", | |
" <td> 0.082144</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>clpx2</th>\n", | |
" <td> 0.044530</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.090552</td>\n", | |
" <td> 0.075129</td>\n", | |
" <td> 0.083906</td>\n", | |
" <td> 0.072627</td>\n", | |
" <td> 0.079379</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>dyn-lis1</th>\n", | |
" <td> 0.091754</td>\n", | |
" <td> 0.090552</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.065258</td>\n", | |
" <td> 0.071804</td>\n", | |
" <td> 0.079625</td>\n", | |
" <td> 0.096965</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>dyn-steps1</th>\n", | |
" <td> 0.077374</td>\n", | |
" <td> 0.075129</td>\n", | |
" <td> 0.065258</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.042777</td>\n", | |
" <td> 0.068084</td>\n", | |
" <td> 0.086867</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>dyn-steps2</th>\n", | |
" <td> 0.086122</td>\n", | |
" <td> 0.083906</td>\n", | |
" <td> 0.071804</td>\n", | |
" <td> 0.042777</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.073860</td>\n", | |
" <td> 0.093479</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>dyn-structure</th>\n", | |
" <td> 0.074950</td>\n", | |
" <td> 0.072627</td>\n", | |
" <td> 0.079625</td>\n", | |
" <td> 0.068084</td>\n", | |
" <td> 0.073860</td>\n", | |
" <td> 0.000000</td>\n", | |
" <td> 0.081524</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>tcell</th>\n", | |
" <td> 0.082144</td>\n", | |
" <td> 0.079379</td>\n", | |
" <td> 0.096965</td>\n", | |
" <td> 0.086867</td>\n", | |
" <td> 0.093479</td>\n", | |
" <td> 0.081524</td>\n", | |
" <td> 0.000000</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"metadata": {}, | |
"output_type": "pyout", | |
"prompt_number": 14, | |
"text": [ | |
" clpx1 clpx2 dyn-lis1 dyn-steps1 dyn-steps2 \\\n", | |
"clpx1 0.000000 0.044530 0.091754 0.077374 0.086122 \n", | |
"clpx2 0.044530 0.000000 0.090552 0.075129 0.083906 \n", | |
"dyn-lis1 0.091754 0.090552 0.000000 0.065258 0.071804 \n", | |
"dyn-steps1 0.077374 0.075129 0.065258 0.000000 0.042777 \n", | |
"dyn-steps2 0.086122 0.083906 0.071804 0.042777 0.000000 \n", | |
"dyn-structure 0.074950 0.072627 0.079625 0.068084 0.073860 \n", | |
"tcell 0.082144 0.079379 0.096965 0.086867 0.093479 \n", | |
"\n", | |
" dyn-structure tcell \n", | |
"clpx1 0.074950 0.082144 \n", | |
"clpx2 0.072627 0.079379 \n", | |
"dyn-lis1 0.079625 0.096965 \n", | |
"dyn-steps1 0.068084 0.086867 \n", | |
"dyn-steps2 0.073860 0.093479 \n", | |
"dyn-structure 0.000000 0.081524 \n", | |
"tcell 0.081524 0.000000 " | |
] | |
}, | |
{ | |
"metadata": {}, | |
"output_type": "display_data", | |
"png": "iVBORw0KGgoAAAANSUhEUgAAAWkAAAFSCAYAAAAwxi4gAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm8XdP9//HXzU1QYmxLSUIQQ6mqIugXvUWLNKWmqrZf\nNRTVb8rXVF/V1k0HOn1b1A9pzYqoqY0hDS03iWqR0ZwKriG+NZQQQ5Cb8/vjvbaz77lnumfY+wzv\n5+OxH/fsca1zyOess/banwVmZmZmZmZmZmZmZmZmZmZmZmZmZpbjq8C0Cs/dBXgstt4L7F5FXZYA\no6s4P66X6upiZpaYnYF7gMXAv4G7ge3qUM5TwG41utZlwI+qOL+auvRWca5Z4oamXQGrymrALcAx\nwB+AFVEL+J00K5VjKLAs7UrEZICOtCvRBBrtv5tZU9oOeLXI/sOAmbH15cCxwOPA68APgY2Bv6OW\n+GRgWDi2C3g2dm689To2nPMq8Dzwm9h5UTnfCuU8Edu2MXA08C76IlkCTAFOBq7Pqfu5wNkF3tdT\nwP8ADwOvAJegL6jIeGBeqN/fgK3C9iuBPuCtUPYpqFV/Ytg/IlZ3Qn3/XcZ1AdYDbgBeBJ4Evh3b\n142+RC9Hn/tDwLYF3huhDt9Gn91LwM/JfrFsDNwJvBz2/R5YPXZuL5V9NtG53wEeAN4GOoFTgedC\nvR/Dv0LMBmVV9I/1MmAvYM2c/YcxMEjfBAwHtkCB8k7UV7wa+od9aDi2i8JB+pMoUA8BNgAeAY7P\nKWcasAbZALEc2Ci8vhR9QUQ+ArxBNtgMBV4Atsn/tulFgWREeM93k+0+2Sacuz0KbIeGukdfIrld\nJYejLwqArwAL0ZcVwBHo8yp13SHAbOB7oe4bogD7uXBuNwp6e4Vzz0RfcoUsB/6KPr9RwALgyLBv\nY9QfPwz4EDAd+HWNPpteYE44d0VgM+AZ9N8HYH2y/w3NrEybo6D3LPAe8Cdg7bDvMAYG6Z1i67NQ\nazLyS7L/4LsoHKRz/TdwY045XTnH5Abp3D7pqcA3wuvxqLVZyFOoRR7ZGwVXgAvo/wUAagHuEjs3\n/j42Ri3OjnDu0WTf9+XovRW77q7ADsDTOftOQ61YUJC+PbZvC9SaL2Q52QAP+vXzlwLHfhEF1ki1\nn81hsX1jUFCPvhQsBUPSroBV7THUGhwFfAz97C7UTQD6Rxd5O2d9KWpll7Ip6gv/P+A14CfAB3OO\neTb3pBIuB74WXn8NdU0UE7/+M+h9g1r2J6Gf89EyMrY/1xPAm8AnULC6BXXhbIoC8PQS11037Fsv\nZ99pZL8sof/n/BawEsX//RV6f+uglv5z6LO/kuKf/WA/m/i5C9GXVHeo/zXo/VqCHKRbywIU7D5W\n4fmZMo+7AHVxjEFdFKcz8P+lYtfKt+9PwMdR3T8PXFWiDuvnvF4UXj+DvjTWjC3DgWuLlD0dOAi1\nFp8P64eFc+eVcd1nUCs0vm819IugUJmlFHp/Z6J+9Y+hz/4/GfjZV/rZ5KvrNejLa4Ow72eDfytW\nDQfp5rYZuuk1IqyPAg6heH9nro4Cr4sZjm68vYW6W44dRHmgVllu3+bb6Mbb1cC9qKVYSAfwX+h9\nr4W+JKJA8zvgm6jPvANYBQX96BfCC6iLI246MAGYEdZ7wvpMskGr2HXvQ5/Hd4APoBtuHyM7FLKS\n0SQnk+2TPi72/oajlv/r4f2fknNeB7rxWclnk2tT1DW0Irp/sRR9QViCHKSb2xLUH3ovuvH2d3TT\n6KSwP0P/llG+Fl3u/lLHgwLIV1Cg+C36+T2Yci5G/bKv0r8vO/oVUKqrI4Na2rej7orHgR+HfbOB\no4DzUF/z42RvhgKchW7wvUp2VMcMFKiiIP03FGxnxM4rdt3lqNX8CTSy4yX0uawWq2/uZ1Kqdf2n\nUOZc1AUT9W9PRDduXwNuRl9suZ/91QzusylUlxXR5/
US6tr6EOrGMbM2NQq1EsvpF29l8Zusg1XL\nh46sAbglbY1iCPoFcA36VWBm+IlDawyroL7ip9BY4nZXyY1GMzMzMzMzM2so3ZQe0WEDVZP+9SH0\nsA74829ovnHYXHppzTzK7oOtzKro/4lKfIzsEEN//g3MQbq55Btvm4Za33Bup9ShnWlXII9qPn/H\nkDrzB9waVkT5OhaF5dfACmHfdGD/8Po/0BjccWF9d/SwROQI9Lj3K8Cf6f94cTz96II8dZiKngKM\nm48SAAGcgx5Lfg0ldtq5wHvpYmDej16yvyA6UCrOhSgD4LUMzP5XrcPQAy2/Qg+9LAQ+hXKkPING\nosQfkFkduAKlKe1FT/l15LnWy8AZ6L/NL1FSpn+hx+xXKlCXMei/4WL0UMnk2L74eOrLgPOB21A3\nyEyUve6c8B4eRQ/bRHopPJ76OvTwyuJQ9haxfZeF+t6Ghkp2FbiG1YiDdGs4HT3qu3VYxqKn6kCP\nOHeF159GT8TtGlvvCa/3RU+T7YeeLJuJxizH7YvSXG7BQFejR9IjW6Agf2tYvy/Ubc1w7HVkv0hK\nif+COA7YJ7yHdVEA+n8Fzluf/smEcpcvFylzLPqSWQt9Dn9AT/ptjBJAnQesHI79Dep62BB9poei\ngB6/1hMo4dKZKP/FGPR5jEGPcP+gQD1+hL4w1wjHnVukzgeh/xc+hHJ2/wO4P7yH69EXRaTYL7Jb\nQ70+jDLs5eZROSTUazj6AjKzoNDTZAvpP774c+FYUAt0fng9FeUljnJ7TCfb0p2KWtKRIejpv1Fh\nPV/60bhVUcsqOv4nwEVFjn+FbML5brI3rroY2JKOv+9H6P8ZrIsCUi0bHIcB/4ytb4Xe/4dj215G\nCaE6UV6LzWP7jgbuil0rnsa0A31O8ScKd0JfnvlcDkwim58lLjf966TYvgkoP3j8PcQniIh/pt0U\nvnG4Rihn1bB+WVgsIW5Jt4b16B8I4ukp/44S5ayNfu5egQLpB1GrOLp5tAHZn8avkp2RJB4ciqUf\nXYJaYFFr+sv0b4GdjALs4nD91VGLb7BGo0T8UT0fQdM8rVPBtYrJTekK6m6IbxuO3sMwBn7+hT63\nD6MW+Gyy72EqhT+L76DAfh8akXF4geNA3S2RpTnrUX1L6QR+ir74XyP7ZR/VL8Pg09BaFRykW8Pz\n9B+KtX7YBspUNxvlBX4QTQxwD3oEeyFq0YICy9H0T2O5CvrJHCl10/IaFKR3Qn2sUWtyF5St7SDU\nMlsTBYB8N6zeJNuNAAoa8RbsM2RnoYmWlVEfaq710ZdHoeWQPOcM1svoMx2dU248i18m5/i3UXdQ\nVP81yCZjyvUC+u8yAs1leT71nR3lK6g7aXf0Rbph2N5ON3cbioN081kBBcBoGYqC4/dQa+dDqH8z\n/vN1OrqpFyWw70E/h6fHjrkQ+C7Z/ubVUVAdjNtQi3wi/W9wrYpauy+H+v+AwkHpn+F9jUMt1O/R\nf46+C1G/bnRT88MoqOTzTCi70JLb516JPtRf/RPUUt0AOAHNPZjPcpQy9GyyXz4j6D8TS9xBKDE/\n6FdIJlwjV62C6HDUffMK+pI+s07lWJkcpJvPbah1HC0/QKkoZ6E0pQ+E1z+OnTOd/qk4Z6B/gPFU\nnH9EN7Qmo1bug8Cesf3lDP17F6Ue3R3dHIz8OSz/RKMK3kYBNH7t6PqvoVEkF6HW6Bv0/3l9DpqT\n8HaUKvXv6MZcLQ02tei30S+AJ9EN16tQH3Gha52KfsX8A73fO1CXVD7bheOWoPSlx5EdG10qzWy5\n7yF+7BWo62YR6l75exnXNTMzMzMzMzMzMzMzMzOzevJwmoptkOn//IKZJWQ6VeYMWQkyS8s//FX0\naH0qHKQrl6HjjCrO7oGOropPP6RvVOmDiniw+09s1b1vxecfPuQbFZ97Bf2zEw3WZ4+p4mSgexZ0\nb1f5+Znrqiz/Le
heufRxhXQcX2X5d0H3Z6q4wO1Vlv8MdK9f+ri8hg6jY/p7UH3syvy49DHA+0lw\nUouVnuPQzNrSsLQrUCYHaTNrS80S/Jqlni1odKqlr921WWplb51aydK1Xulj6lp+yk24rtEpl796\nuuVH3JK24jpGp1r8Ol2blz6oThykUy5/w9LH1LX8BgnSzRL8mqWeZmY15Za0mVkD+0DaFSiTg7SZ\ntaVmCX7NUk8zs5pyd4eZWQNrliDdqkn/u9H0UJXYFc2Q/B5wQK0qZGaNZWiZSwF7AY8Bj6NJHPI5\nN+yfD2wT2348mlTjofC6qFYN0tXMHPE08HX6zyxiZi1mWJlLHp3AeShQb4HmyvxozjHjgDHAJmiO\nygvC9o8B30CTQG8NjAc2LlbPVgnSh6Jvq3koNURcD5pPbi769to+bD8b+H54vSfZ+f6eDsflm0fO\nzFpEFS3psWj6s170i3sykJsIZx/g8vD6XjTZ8EdQML8Xzebeh+LO/qXq2ey2BE5HM1S/gmZfPo5s\nazqDRttsg2atvgTYCjgNuB+4G82bt3eitTazVFUxBG8E/efdfA7YoYxj1kMNwB+jrHpLgc8D9xUr\nrBWC9G5otuZXwvqreY6JZoWeiWapXg1NYnpU2HY88FR9q2lmjaSKG4fldqfmy5z3GJrw+XY0efFc\nSvxqb4UgnWHwaQSjD/njwEvoW6/YcQX29sRWRqf+qLdZK+p5TQsAQ/pqdt1Cwe9+YFbxUxcB8VzB\no1BLudgxI8M20K/5S8LrM4FnKqlnM7kTuAn4FWpNR8m5O2J/D0Z90zsDi4ElwAbAiagbZCrwR/r/\n7OigVPCvIh+0mZWna/VYvo+hnUx8uja3iwq1pD8VlsiFAw+ZhW4IjgaeR/HlkJxjpgATUH/1jiju\nvBD2rQ28CKwP7MfArpJ+WiFIPwL8BHXA96GfD73075NeiobVDQWOCNsvQsP0/gUcCVwGbIfuuN6I\n+rbHo+F8W9X7TZhZsqoIfstQAJ6GRnpcDDwKRNNRTAJuQyM8FqJujcNj518PfBDddPwW6nqtRz0b\nyhUMHNURdyVwQs62z8Zez0FdH6BfO9VNe2JmDa/Kh1mmhiVuUs76hALn7jqYglolSJuZDUqzPHHY\nDkG6mtnczKxFfaDc6LesrtUoqR2CtJnZAEMdpM3MGtewzrRrUB4HaTNrS2W3pFPWJNU0M6utYU0S\n/ZqkmmZmNebuDjOzBtYk0a9JqmlmVmMrpV2B8jhIm1l7cneHmVkDa5Lo1yTVNDOrsSaJfk1Szcb0\n5WXrp1b25M6iKWjrruTsmfW0TpqFQ8fa6Zaf9vtnvRTLrmUXhbs7zMwaWJNEv1aZiNbMbHCqmIkW\nzRT+GPA4cGqBY84N++ejyUUipwEPo/kOrwZWLFZNB2kza08rlrkM1AmchwL1FmhWlo/mHDMOGINm\ncDkauCBsH43mVv0kmkykE/hysWo2SYPfzKzGKo9+Y9GMK71hfTKwL5qdJbIPcHl4fS+wBrqb8Dqa\nkWVlNJPUymTnPszLLWkza0+Vd3eMAJ6NrT/HwMmsCx3zCvC/aPLZ59Hch38pVk0HaTNrT51lLgNl\n8m4dKN9E1hsD/426PdYDhgNfLXYRd3eYWXsqEP16XtZSxCL6z4M6CrWUix0zMmzrAu4B/h2234gm\nJ79qkNU0M2txBaJf10e0RCb+c8Ahs9ANwdGoy+JgdPMwbgqaiHYysCPq1ngBWAB8H/gAsBTYA7iv\ngmqambW4ogPfilqGAvA01CFyMbppeEzYPwm4DY3wWAi8CRwe9s0DrkCBfjkwB/htscIcpM2sPVUX\n/aaGJW5SzvqEAuf+PCxladUbh93ASRWeeyIaaD4f3XVN79lvM6ufym8cJqpVg3S5d1/zmQNsC2wN\nXM8gvvHMrIlU98RhYlolSB+KWr5Rf09cD3A2MBc9hrl92H426sAH2BOYHjt+aXh9
L7ora2atpkmC\ndANUoWpbAqcDO6GB4msCx5FtTWfQndRtgF2AS9DjmKcB9wN3A+cAe+e59pHoBoCZtZomiX5NUs2i\ndgP+gAI0wKt5jrkm/J0JrBaW19Ez9DNR5s2ncs75Gnq+/oQa19fMGkED9DeXoxWCdIb8T/aUOgfg\n48BLDHykcw/gu8Cu6Dn7vB6c+Kf3X6/96c1Yp2vzQVbDzErpeRF6XgorHX21u7DnOEzMncBNwK9Q\na3qtsL0j9vdg1Ne8MxpUvgTYAI3k2AYNpfkjGlS+DXAh6qcu+tzRVmfsW7t3YWZ5da2tBYDOTiY+\nvLw2F3ZLOjGPAD9BN/760A3CXvr3SS9FozaGAkeE7RehYXr/Qn3Pl6Gbij8HVkEjOwCeBr5Y37dg\nZolrkujXJNUs6QoGjuqIu5KBfcufjb2eg7o+crebWatqkujXJNU0M6uxJol+TVLNqnwm7QqYWQNy\nn7SZWQNrkujXJNU0M6uxyrPgJcpB2szaU5NEv1bJ3WFmNjjV5e7YC3gMeBw4tcAx54b989HzFwCb\noWHC0fIaSmNRtJpmZu2n8huHncB56MnkRSgH0BT6zxY+DhiDZnDZAbgAzdCygGzAHhLOv6lYYW5J\nm1l7qrwlPRbNuNKL0kZMBnIfP94HuDy8vhdYA1gn55g9gCfoP6v4AA7SZtaeKg/SI+gfWJ9jYP6f\nfMfkpj3+MnB1OdU0M2s/lUe/cicVyU38Fj9vBeALFO7Pfp+DtJm1pwJD8Hoe1FLEImBUbH0UaikX\nO2Zk2BbZG5iNsnAW5SBtZu2pQPTr2kZLZOLkAYfMQjcERwPPoyybh+QcMwVNRDsZ3TBcDLwQ238I\n2Tz3lVTTynHE0G+kVvZx1cziWAOf6vhBamX3ff2HqZUN6J9mmm5Ot/iJRcci1NeQYTW8WOWjO5ah\nADwtXOViNLLjmLB/EprRaRy6wfgmcHjs/FXQTcOjyinMQdrM2lN10W9qWOIm5axPKHDum8CHyi3I\nQdrM2lOTRL8mqaaZWY05C56ZWQNrkujXJNU0M6sxT0RrZtbA3N1hZtbAmiT6NUk1zcxqrEmiX5NU\n08ysxtzdYWbWwJok+jVDqtJu4KQaXKcH+GR4fSuwWpFjJ6DHOZcDa9WgbDNrNNXNzJKYBqhCSbXK\nUhG/zudLHHs3ypDQU6OyzazBZJpkItpGbUmfjqaZmYnmBOtEaf0im8TWe1FrezbwQDi+lF7UQl4F\ntarnAQ8CXwr75wFPV159M2t0fUPLW9LWAFUYYFuU+m9rYBgwBwXg18K2+Sij1CXh+AzKybotcCxw\nMqWzS0Wt6r1QjteoZV2sC8TMWkgjBOByNGJLehfgRmApsATlZQW4CAXnIajFG5925sbwdw6DSyT5\nAPBZ4KfAzsDrlVbazJrLss4hZS1pS78GA2XoP+1MR9h2A5rNYDxqWb8aO+ad8LeP7K+DaWjK9N8W\nKetxNHPvg8CPge8PpqJXZDLvL/MzKSd4NmtRTwF3heXOvr6aXbdv6NCylgL2Ah5DMaTQFFjnhv3z\nyc4QDpqU9nqUg/oRNClAQY3Y4J8BXAachbo7xgMXokA8DU2NfkQZ19mzjGPWRcH+KtSdcmSeY3Ln\nKXvfoR0Fd5lZjWwYFoAhnZ30LF9ek+v2dVY8ULoTOA8l7l8E3I9+8T8aO2YcMAbdP9sBxa0oGJ+D\nJgU4EMXgVYoV1ohBei5wLfr2eRG4L7bvamA/4PbYtkzO63KatNExWwG/QEPt3gO+GbYfB5yCpmB/\nAN1cPHowb8LMGts7rFDmkW/nbhiLhuj2hvXJwL70D9L7AJeH1/ei1vM6qBt3F+DrYd8y1EAsqBGD\nNMCZYcm1M7phGA/EG8VezwZ2K3DNz+Q553b6B/zIuWExsxbVV3n4GwE8G1t/DrWWSx0zEnXJvgRc\nigZCzAaOB94qVFgj9kkXchPwNfRTwcysKn10
lrXkUe4NqNz+0AxqGH8SOD/8fRP4n2IXadSWdD77\npV0BM2sdBQIw/+h5h3/0vFvs1EXAqNj6KNRSLnbMyLCtIxx7f9h+PS0UpM3MaqZQkN6+a2W271r5\n/fVzJ76Re8gsdENwNPA8eq7jkJxjpqD0EpPRDcPFwAth37PApsA/0c3Hh4vV00HazNrSssrT4C1D\nAXgaGulxMbppeEzYPwmN3hiHbjC+iZ7xiHwbjShbAXgiZ98ADtJm1paquHEIMDUscZNy1icUOHc+\nsH25BTlIm1lberfsIXjpcpA2s7ZURXdHohykzawtVdndkZjmqKWZWY0VGt3RaBykzawtOUibmTUw\nB2kzswbWLDcOnWuzcpnMMaUPqpt1UiwbWH5oev/rDN30B6mVDbD3sq1SLf/Wvx2Yavlv7ZVi4cOG\nscri96D62JW5JbN7WQeO7/hrLcqrmFvSZtaW3N1hZtbAmqW7w0HazNqSx0mbmTUwd3eYmTUwB2kz\nswbWLH3SzTR9lplZzbzLimUtBewFPAY8Dpxa4Jhzw/75wDax7b1oguu59J9oOy+3pM2sLVXR3dEJ\nnIdmVVmEpsKaQv/ZwscBY9AMLjsAF6AZWkBzHXYBr5RTmFvSZtaWqpiIdiyacaUXeA9NkbVvzjH7\nAJeH1/cCa9D/EbSyH45xkDaztrSMzrKWPEageQojz4Vt5R6TAf6C5ko8qlQ9k+ju6AaWAP9bh2vv\niyZzfLTUgYN0CfB54EUg3WeAzawuqhgnnSnzuEKt5Z3RBLYfBu5AfdszC10kiSBd7huqxH7AzdQ+\nSF8K/Aa4osbXNbMGUahP+ome53iiZ1GxUxcBo2Lro1BLudgxI8M2UIAGeAm4CXWfFAzS9eruOB1Y\nEAreDHW0z47t3yS23ota27PRHc/NClzzp2jq8/nAL4CdgC+E13OBDYGN0eSQs4AZsWtdBlyIOvgX\noFYywJaov2huuO6YsH0m8Oqg3rGZNZVCfdCjuzZg9+5Pvb/kMQvFsNFoxu+D0Y3DuCnAoeH1jsBi\n4AVgZWDVsH0V4HPAg8XqWY+W9Lao0lsDw4A5KAC/FrbNR1OYXxKOz6BvlG2BY4GTGdhP80Hgi8Dm\nYX014HX0QdwM3Bi2/xVNq74Q3VE9H4hSXa2PZugdA9wV/n4TOAe4Gn0WHu1i1ibeKTy8rpRlaCbw\naagBejH6NR/lxZwE3IZGeCwE3kQxD+AjZOPVUOAq4PZihdUjKO0SKrE0LNE3zEWooicCX6L/lOZR\npecA++e55uJwrYuBW8ISifp9hqPW9XWxfdF0wBngD+H1QuBJFPDvQa3+kaEOC8t7i2bW7Kp84nBq\nWOIm5axPyHPek8AnBlNQPYJ0hv4d5h1h2w3AGcCdqGUd7054J/zti9VpGrA26qI4GvXb7A4ciN58\n1EKO+ryHoGAeHzRezHLgGuAfwHj0zXcMamWXpXtW9nXXelrMrLZm9MHMvrCyrK/osYPRzo+Fz0B9\nwGeh7o7xqD/4HRR4LwCOKOM6e8ZerxKWqaj1+0TYvgR1fYC6P55CQfx69OWwFern7gAOQuMWNwrL\ngvD3SXSTcP1wfPlBertyjzSzSu3aqQWAYZ2c+c7ymly3WYJ0PW4czgWuRX3Pt9H/scerUQs23geT\nyXmdbzTIqqjveT66qXdC2D4ZOAW1zDcEvgocCcwDHkIDyqPrPhPqErWY30WB+6FQ5y3Jjua4Bn0Z\nbIrGOkb9SWbWIqoYJ52oet0oOzMsuXZGNwzjgXij2OvZwG55zvsXuhGY6x4UXOP2LlCnO9CNybif\nhSXXIQWuYWYtwvmkB7oJtXbzBWEzs0S9+/64gsaWZJDeL8Gycrm7wsz6aYSujHI0R3vfzKzG3N1h\nZtbAmmV0h4O0mbUlB2kzswbmIG1m1sB849DMrIEVmb+woThIm1lbcneHmVkDa5buDs9xaGZtqY+h\nZS0F7IWm
vXocOLXAMeeG/fMZmJ2zE+UMurlUPd2SNrO2VEV3RydwHrAHmhLrfpQ3Pz6N3zg0scgm\nKO/QBWiGlsjxwCNkZ2kpyEG6CpnrSh9TLx3rlD6mruVvVPqYetnzvY+nVzgwdegDqZbPz9Mt/uY3\n0yt7yLDaXauKID0WTRDSG9Yno0mx40F6H5QaGTRF3xrAOmgKrZEoiP8ETYJSlLs7zKwtVZGqdARK\nYRx5Lmwr95hfoxTLZSXGdkvazNpSFUPw8uW8z6cjz/p44EXUH91VzkUcpM2sLRXq7ljSM4clPXOL\nnboIGBVbH4VaysWOGRm2HYC6QsYBK6GZpa4gO7P4AA7SZtaWCgXplbu2Z+Wu7DzZ/zfx0txDZqEb\ngqOB54GDGThRyBQ0F+tkdMNwMZq85LthAfg0cDJFAjQ4SJtZm6pinPQyFICnoZEeF6ObhseE/ZPQ\nNH3j0A3GNymc075k14mDtJm1pSrzSU8NS9yknPUJJa4xPSxFOUibWVvyY+FmZg3MQdrMrIG94yx4\nZmaNq1la0kk8cdgNnFSna+8LfLTG1xwF3AU8DDwEHFfj65tZA+ijs6wlbUm0pMt9OqcS+6EsUo+W\nOnAQ3gNOAOYBw4HZwB01LsPMUta3PP0AXI56taRPBxYAM4HN0FjC2bH9m8TWe1FrezbwQDg+n5+i\n1u184BfATsAXwuu5wIbAxmhYzCxgRuxalwEXomxVC4DPh+1bouQnc8N1x6AB5/PC/jdQcF6v/Ldu\nZs1g2bLOspa01aMlvS16AmdrYBgwBwXg18K2+Whg9yXh+AzwUjjvWPQEzlE51/wg8EVg87C+GvA6\neqrnZuDGsP2vaED5QpQe8Hxg97BvfWB7FIjvCn+/CZwDXI0+i9zPYzTKA3vv4D4CM2t0fcua45Zc\nPWq5CwqaS8MyJWy/CAXnE4EvoYAZiYLsHGD/PNdcHK51MXBLWCJREpPhqHUdTyC6QvibAf4QXi8E\nnkQB/x7U6h8Z6rAwdu5w4HqU9/WNwm/XzJpRXwO0kstRjyCdoX/2p46w7QbgDOBO1LJ+NXbMO+Fv\nX6xO04C1URfF0SiH6+7AgehJnqiFHPV5D0HBPHcGhEKWA9cA/0CZqW5DrfC70C+AG4DfA38sdIHu\nt7Kvu4ZpMbPaehhlxwfo6Our2XXfeXuF0gc1gHoE6RmoD/gsFOzGo/7gd1DgvQA4oozr7Bl7vUpY\npqLW7xNh+xLU9QHq/ngKBfHr0ZfDVqifuwM4CCXh3igsC8LfJ4HfoO6QrVCQvhj9f3F2sQp2r1zG\nuzCzqmzaWVOUAAAUnElEQVQZFoAhnZ1cv7ysNMwlLe9rju6Oetw4nAtci/qebwPui+27GrVgb49t\ny+S8zjcaZFXU9zwf3Yw8IWyfjJJnz0Y3Dr8KHIlu/D2EUgJG130m1CVqMb+LAvdDoc5bopSBOwNf\nAz4Tts9F85mZWStZ1lnekrJ6fZWcGZZcO6MbhvFAHJ+IaTawW57z/oVuBOa6h+yXbGTvAnW6A92Y\njPtZWOLuxjPWmLW+BgjA5UiyvX8Tau3mC8JmZslaljtxSmNKssW4H/AJ4JUEy4wcTnYEiZmZskKX\ns+S3F/AY8DhwaoFjzg3755Md0LASGtI7D933OqtUNZuj59zMrNaWVnxmJ3AesAeaEut+NNQ4/lTy\nOPQsxiaoq/YCNEPLUnS/6y0Uf+9G3cB3FyrMfa9m1p7eK3MZaCx6pqI3HDEZ5RGK2weNJgO1nNcA\n1gnr0eDdFVDAL9q74CBtZu2pr8xloBHAs7H158K2UseMDK87UXfHC2jI7yMU4SBtZu2p8j7pcpPG\n5d6ZjM7rQ/fnRgK7Al3FLuI+aTNrT4VuCs7rgfk9xc5chFIaR0ahlnKxY0aGbXGvAbcC2wEFC3SQ\nNrP2VChIf6xLS+SKiblHzEI3BEcDz6OEcofkHDMFpa+YjG4YLkbdGx8KJS
8GPgB8FhhQQJyDtJm1\np8LD68o5cwJKc9GJ0kg8ip5kBs0afhsa4bEQeBMNAwZYF91QHBKWK1H2zoIcpM2sPVU+BA+UR2hq\nzrZJOesT8pz3IPDJwRTkIG1m7Sn/8LqG4yBdhY7jUyx87RTLBjqm1HNWtOJuG3NgamUDZHKzvSSs\n85QfpFp+36d/mF7hQynROTAItct6WlcO0mbWnirvk06Ug7SZtScHaTOzBuYgbWbWwBykzcwaWHVD\n8BLjIG1m7clD8MzMGpiH4JmZNTD3SZuZNTAHaTOzBuYgbWbWwJrkxuFgZ2bpBk6qQz1Ac4R9tIbX\n+zpKC2hmNtA7ZS4pG2yQrmdWnf2ALQrs66zgeocB6w3yHP+yMGsXlU+fBbAX8BjwOHBqgWPODfvn\nA9uEbaPQvIYPAw8Bx5WqZjlB+nRgATAT2AwFzNmx/ZvE1ntRa3s28EA4Pp+fhkrOB34B7AR8Ibye\nA2yEppP5NZou/XjgUuCA2DXeiL0+NZQ3DzgrHLcdcFW43kqhbmuF47dDHxShvleiKdUvRzMnXA/c\nF5ZPFXgPZtbMKp8tvBM4DwXqLdCsLLm9AOOAMSg+Hg1cECv1BGBLNGPLf+U5t59SLcdt0dQwWwPD\nUMCbjebm2hoF2cOBS8LxGeClcN6xwMnAUTnX/CDwRWDzsL4a8DqabuZm4MbYtYYB24f1S3OuE7Xq\n90bTp49FzxCtgaammYC6ZubkHJ/P5sDO6MfN1ejL4W/A+sCfKdzCN7NmVfk46bFoxpXesD4Zddc+\nGjtmH9ToA7gXxaV1gH+FBdTQfBT94o+f20+pIL0LCppLwzIlbL8IBecTgS+RDaSQDbJzgP3zXHNx\nuNbFwC1hieTOrnttifoB7IG+JKKHPBcXuV4+GfS+ot6nPej/zbYqsDLwVu6J3XdlX3eNhq4NyyjN\nzAalZ7EWAIbU8AmUykd3jACeja0/B+xQxjEj0TyHkdGoG+TeYoWVCtIZ+ge6jrDtBuAM4E7Usn41\ndkwU7Ppi15+G0tTfj5r+Y4HdgQNRi3f3WHlxb8ZeLyPbPTMEWKFAHXPrn+/8lXKOiwfgDvSBv1vg\nmu/r/kypI8ysWl1raAFgaCcTn1pemwtXHqTLvTeXG5fi5w1H3arH07/rdoBSQXoGcBnq5x0GjAcu\nRIF4GupnOaKMyu4Ze71KWKYC9wBPhO1LUNdHIb2oG+U69FNiWNh+B/AD1P/8NrAm+tLIvV4v6ov+\nM/37tnM/yNtRZ/4vw/onUF+3mbWSQkPwXuyBl3qKnbkI3QCMjEIt5WLHjAzbQLHrBuD3wB9LVbPU\njcO5qMthPpr99r7YvquB5SioRTI5r/N946yK+p7no5uRJ4Ttk4FTUMt8ozzn/Q74NAqYO5L99pmG\nuitmhfpGQwQvQ18o0Y3DicA5qDW/LFa33Hoeh4L5fHRz8+g8dTGzZldoyN3qXTCmO7sMNAvdEByN\nftEfTLYrODIFODS83hF1w76AGoUXA48AZ5dTzXL6bAs5GQXcM6q4RjPLZCamWHrKcxxyR3pFZ06s\n5n/bGpT/91SLZ9h3vp9q+X27pjnH4TA6/voeVBe7ADLsXWavxdSOfOXtjYJsJwq6ZwHHhH3RrOHR\nCJA30T28OWiAwgw0Gi2qwGnoF35elY4LvgnYENitwvPNzNJV3ROHU8MSNylnfUKe8+5mkM+nVBqk\n96vwPDOzxuBUpWZmDcwJlszMGpiDtJlZA/Mch2ZmDcwtaTOzBuYgbWbWwJok6b+DtJm1Jw/BMzNr\nYO7uMDNrYE0SpNNNgtDcMpn/SLH0wU4MVmMTr0uv7FOGp1c2wC1vlj6mng7cNd1/tkOn/yC1socN\nG8K7750BtcjdsWaZuTtezZu7IzFuSZtZe2qSlrSDtJm1JwdpM7MG1iRD8AaVMs/MrGX0lbnktxfw\nGPA4cGqBY84N++ejuQwjl6AJAB4sp5
oO0mbWnjJlLgN1kk3ovwVwCP0nrwYYB4xBM7gcjaYajFwa\nzi2Lg7SZ2eCMBRaieVPfQ1P/7ZtzzD7A5eH1vcAawEfC+kz6T95dlIO0mdngjACeja0/F7YN9piy\n+MahmbWpiu8cljnAesDY6nLP68dB2szaVKExeDPCUtAiYFRsfRRqKRc7ZmTYNmgO0mbWpgq1pHcK\nS+TM3ANmoRuCo4HngYPRzcO4KWgi2snAjsBiNKJj0Fq5T3p14NgKz+0GTgqvLwMOqEF9zKyhLCtz\nyXviBGAa8AhwLfAocExYAG4DnkQ3GCcB34qdfw1wD7Ap6rc+vFgtW7klvSb6YC4odWAe8cE3hQfi\nmFkTq+pplqlhiZuUsz6hwLm5re6iWjlI/xTYGJgL3AH8G/gqsBx9uKeF/ecBHwbeAo4CFoTz453+\nTkRl1nKa45HDVg7SpwJboid99ga+h8Y3LkVjFgF+i36eLAR2AM4Hdk+8pmaWgrfTrkBZWjlIx1u/\ne6BHMaP5gRcDw9HdgXjSzRWSqZqZpa85Miy1cpCOyzCwy2IICtbbDDz8/XOK6n4m+7prdS1mVlsZ\netHDfbCsr5Y9j+7uSNsSYNXw+i/A94Gr0G+cNdFjmU8BBwLXoyC+FfBAOKfk/w3d69e2wmY2UAej\n0Wg3GNo5hHeX99Toys3Rkm7lIXj/Bv6GMk3thsYtzkI3EqPhdV8FjgTmAQ+h5+0jmQKvzawlvFfm\nkq5WbkmDgnDcz3LWe9FNxVwTY6+LjmE0s2bVHC3pVg/SZmYFpN9KLoeDtJm1KQ/BMzNrYO7uMDNr\nYO7uMDNrYA7SZmYNzN0dZmYNrDla0q38MIuZWRFvl7nktRfwGPA4SuaWz7lh/3z6p58o59z3OUin\npOe1lMt/Mb2yn0qvaABmpPwr9+GUn1/tWZxuBTIhD0f6Kk7634lSHO8FbIHyQ38055hxwBg0g8vR\nZPPal3NuPw7SKUk9SL+UXtm96RUNwMy+dMt/JN3imb445Qqk/n9ApOLHwsei9Ma94YDJwL45x+wD\nXB5e34vSI3+kzHP7cZA2szZVcUt6BJr2KvJc2FbOMeuVcW4/vnFoZm2q4huH5fYXeUanlPWQnf/Q\nixcvyS09VG8w5b2ec+6OwJ9j66cx8AbghcCXY+uPAeuUea6ZmVVhKPAESnK9Akp1nO/G4W3h9Y7A\nPwZxrpmZVWlvNGn1QtQaBs2XekzsmPPC/vnAJ0uca2ZmZmZmZlYDSc2481FgdzQ7fdxeCZUft3IK\nZZo1tQcTKGN9NGD+buC7wLDYvj8mUH4hv02xbOg/VrVejkN9j38Enga+GNs3N4HyI59Cz9BE7/kT\nwPkJlHsAsH/4m7vsn0D5LcHjpOvvgDzbMmgM5boJlH8Jmg39XjTp7nT0NNTLwAZ1LnutAts7gM/X\nuWwo/iW4dgLlHw1sC7yB7uZfH/6enUDZcWejlvufwvo84NMJlPsF9P96ITcmUIem5yBdf5OBq4Hl\nOds7gJUSKP/DaMwmwATga8AM9A+o3l5GLchC9aq3tVFwejXPvnsSKL8DBWjQY8BdwA3oyzHpBx2e\nyVlPIoPJYQmU0fIcpOvvQeCX5G/V7Z5A+UPRl8HSsP574F/ANGCVOpf9JHqP+QJ1Et0Nt6K+4Hxd\nC9MTKP9F1LUwL6y/AYwHLgY+nkD5kWeA/wivV0DdMI8mUO5JZH81RqL1DPCrBOrQ9DrTrkAbeBT9\nY82XUulu4Pk6l78S+u/cG9v2JDAT2Bq4so5lLwdeQl8KufpQF0w9/YmBLcjIDXUuG+BO9LTaG7Ft\ny4GbgL9SuG71qMcZ6AvjJOAt4NvUfybWPdGXQnxZMfY3iS9Ks6qskHL5K6ZcflLGkO1a+gxqSa7R\nJuUPBa5KqCyzpjYd2DC2PhZ4oE3K/xKwWnj9fdSS/GThw2tuPgpWY4B/Ar8g+8huO5R/N+l+IW+G\nfj
k8HNY/DnwvveqY5bcnSrLyX8CZqJ80yUCVZvlRf/zOKDnOeOC+hMqGbJ/0d9DP/Pi2dij/SuB+\n9AV5UlhOTLD8GcAOZN9zB9mAbSX4xmFypgHHAnegftptyN9X24rlR2n2xwO/A24BfpRQ2QDvAl8B\nDiU7qmVY4cNbrvwnwjIE3UiNbtwlZWX633/I0CwTDFpb+T7wELATSsKyAAWtdij/VvTwylOoL3Yl\n1AWQlC2B36CpigA2Itn0kGmXn7apqKsnakkfGLaZNZSzgQ/E1jdArdp2KH8V9FDPJmF9XeBzCZUd\nWRGNZtmKdG7Ypln+XXmWOxMsf2PUJ/0WGs30N/RQj5XBMwckbzX0c29JG5S/GhqCVujJw1cSqAPo\n6cYL0dBDUEv2GJK7eZd2+dvFXq+EvjCXAackVH5kOOpyyU2ib9YQtkc30J4Oy3z6/+NpxfJvDX97\nUVdHfHmywDn1sAD93I5sHLa1S/n53J9gWWfRf8jhmsCPEyzfrCwPArvE1ncm2SF4aZefptyA1JFn\nWyuXv1Zs+RB6VD7JL4l5ebYlObqlqXl0R3KWoaf8IneTTP6ENMsvNcRvTp3Lj8xGXQt/COsHAbPI\nZmKrd6KftMufQ3Y0xzL0y+bIOpcZN4T+qQk+QPoPcjUN90knJ7pxd01YPxj9Txs9ll3vgJVG+T0U\nH+r1mTqUmc9l4W9Ul9whaPXOLZ12+fEAWWxbvZyKMi9egt774cAU4GcJld/UHKST00P/f5i5/1Dr\nHbDSLt/SM4eBv2rybauXn6MRJXug/+f+AuyGHu4xswaTRrL/tB9LTqv8dVE+68dQQN42/O0K25KS\nr/85iQkvWoJb0vV3Uux1vpZsvdM1pl1+rrnoacckzUDDzS4MZXegB3u2bPHyv45yOm+H+sAjS1AX\nTL37wo8FvoVGszwR274qGiv91TqX3xJ847D+ViXZR3AbrfxcL6ZQZtqPJadV/uVhOYBkUrPmuho9\nWfhT1C8dNQqXAP9OoT5mRV2BxodG1gIubaPy05T2Y8lpl+9xymZlyDdWNN+2Vix/M5RY6Q7a87Hk\ntMv3OOUm5u6O5HSg1mv0KPRaJDszTprlXwdcAFxENiNekl0wy9E0XvHHkjcsekZrle9xymZlOBQ9\n5fUj9FNzQdjWDuXPTqicQvK1GpOsU9rln4pa70cC3wiv2ykLX1NzSzo5V6B/mLuhVuR+wCNtUv7N\naLKBG4F3YtvrnWDpo8AWwOro6b5oRMtqJDNTe9rlR36GUgBE45R/iPKLWxPwEDxLQi8DuzcyKBtc\nPe2Lvoy+gJ5wiywBJgP3tHj5ZmZNYac2L/8N9MWwBP2SWY7ThTaNIWlXwNrCbNTdsWapA+tkf9TF\nMAyNsngZ+M82Kn84Gi+/KrppuD9wfoLlm1mD2wRNfrsQuBZNiptkV1s0Vdd+wMWojzjJNK1pl59P\nksM/rQq+cWhJeBz4LspXMR5lQ1se/p5D/W8gRv+fjweuB14j2SGAaZd/QOz1EJTD4+0Ey7cqOEhb\nUrZGKSr3Ro8oX40mHrgT+ESdy74ZJRRaivJJrE1yaTobofz4hMNRPul9EyzfzBrcbBSMv4ImZI27\nKaE6fJDswzurAB9JqNy0y+8ETkyoLKsDD8Gzeooy8HWS/0nDpDPwgVKlHp1CuWmWfz+a49KakLs7\nrJ6iDHyboSAxBTUMxgP3pVSntINVGuXfDZyHbtq+Gdue1PRlZtbgZqKAHVmV/vMtJintJ+3SKL+H\nbGKr+GJmBihPSPwx6JVIdrbqdpfvyc56P+1pNeLuDkvCFah740bU3fFFlIw+KZsBJ6P0oNH/8xmU\nx6Qdyr+egfMZXoeG4lmDc5C2JPwE+DOwCwpOh5FsPuO0U6WmVX6U4GkN0k3wZFVwkLakzCa9lKXv\noSCZlrTK3xQld1o9/I0sAY5KoT5WAQ/Bs3bQDbxE8qlSG6X8nYC/
J1SWmdmg9QJP5SxPtlH5vyDd\nBE9mZlZEIyZ4sjI5Vam1g7RTpaZdftoJnqwKDtLWDr4MjECPR6eRKjXt8qMET9ui7o6kEzyZmZVl\nCLAPsAh4FpiIZk1vh/LTTjBlZlbU1sDZ6EnHc4Ed0QMmSSW/T7v8yG8TLs/MrKS0U6WmXX5ckg8R\nWQ14nLS1srRTpaZdfj7TUJ+4NQk/cWitLO1UqWmXn48DtJk1nLRTpaZd/mbA74A7yKYpvTPB8q0K\nbklbO1gb5c+IvBe2tUv5aSeYsio4SFs7SDtVatrlp51gyqrgG4fWLrYlmyp1BsmPckiz/G7STfBk\nVXCQNmt9vQzs3sjg2VnMzMyq01n6EDNrcrNRmtKFOGdH03GCJbPWl3aCJzMzK0PaCaasAm5Jm7WH\nrdFj6L8AbgAOQnMd+qEWM7OUNVKCJxsk90uZta5GTPBkg+QnDs1aVyMmeDIzsxxpJ3iyKvjGoVnr\nSzvBk1XB3R1mrS/tBE9WBd84NGsPaSeYMjMzMzMzMzMzMzMzMzMzMzNrQf8fVDf+6C9mQpEAAAAA\nSUVORK5CYII=\n", | |
"text": [ | |
"<matplotlib.figure.Figure at 0x3cd1e80>" | |
] | |
} | |
], | |
"prompt_number": 14 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As expected, the two clpx papers are most similar to each other, as are the two dyn-steps papers, while all the dyn papers bear some similarity to one another. For a quicker readout, I've grouped the distances into three similarity levels (in practice this could be done automatically with a clustering algorithm)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"# Distance thresholds separating the three similarity levels\n", | |
"levels = [0.06, 0.075]\n", | |
"binned_df = dist_df.copy()\n", | |
"# 1 = most similar, 3 = least similar; exact zeros are left unchanged\n", | |
"binned_df[(dist_df <= levels[0]) & (dist_df > 0)] = 1\n", | |
"binned_df[(dist_df <= levels[1]) & (dist_df > levels[0])] = 2\n", | |
"binned_df[(dist_df < 1) & (dist_df > levels[1])] = 3\n", | |
"plt.imshow(binned_df.values, interpolation='none')\n", | |
"ax = plt.gca()\n", | |
"plt.xticks(range(len(binned_df.columns.values)))\n", | |
"plt.yticks(range(len(binned_df.index.values)))\n", | |
"ax.set_xticklabels(binned_df.columns.values, rotation=90)\n", | |
"ax.set_yticklabels(binned_df.index.values)\n", | |
"plt.title(\"Similarity between papers\\nLower value = more similar\")\n", | |
"plt.colorbar();" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"metadata": {}, | |
"output_type": "display_data", | |
"png": "iVBORw0KGgoAAAANSUhEUgAAAWIAAAFSCAYAAADIJtXXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYHFW5x/HvzCQQJAmETRESJgQMiyEgi0aDDKACGnZR\nBEWQC4gX8coiV0GSoGzi9QIi+xJAAsgmQYgBJZMFkCUriCCBDEG4CAECARJIZvr+8Z6ia2q6unt6\nqVPT9fs8Tz3TXV1V53RD3j596pz3gIiIiIiIiIiIiIiIiIiIiIiIiIhITR0OTKvw3F2BZ0LPO4A9\nq6jLcqC1ivPDOqiuLiIiNTUWeBhYBrwBzAZ2qkM5i4E9anStScAvqji/mrp0VHGuSF30810Bqcpg\n4E/AccAfgDWxluwHPisV0Q9Y7bsSITmgyXcl+oC0/XcTSa2dgLeKvH4kMCv0vAs4HngOeAc4CxgB\nPIK1qG8B+rtj24CXQueGW6G7uHPeAl4Bfhs6LyjnB66c50P7RgDHAh9iXxbLgSnAKcDtkbpfDFwY\n874WA/8N/B14E7gW+xIKjAPmu/o9BIxy+28EOoH3XdmnYq3zk9zrm4TqjqvvG2VcF+CTwB3Aa8AL\nwA9Dr03Aviivxz73p4AdY94brg4/xD6714Ffkf/yGAE8CCx1r/0eWCd0bgeVfTbBuT8BFgIrgBbg\nNOBfrt7PoF8TIj0Mwv5BTgL2BoZEXj+SnoH4LmAgsA0WDB/E+m4HY/94j3DHthEfiD+DBeNmYDPg\naeBHkXKmAeuSDwJdwObu8XXYl0DgE8C75ANKP+DfwA6F3zYdWLDYxL3n2eS7OnZw5+6MBa8jXN2D\nL4pot8ZR2JcBwGHAIuwLCeB72OdV6rrNwBzgDFf34VgQ/Yo7dwIW2PZ2556DfZHF6QL+in1+Q4Fn\ngaPdayOw/vH+wAbADOB/a/TZdABz3blrAiOBJdh/H4Bh5P8bikjIVlhgewlYBdwNbOReO5KegXhM\n6PkTWKsw8Gvy/6jbiA/EUf8F3Bkppy1yTDQQR/uIpwL/4R6Pw1qNcRZjLevAPlgABbiM7kEerCW3\na+jc8PsYgbUcm9y5x5J/39dj763Ydb8IfBZ4MfLaT7HWKFggvj/02jZYqzxOF/kgDvYr5i8xxx6A\nBc9AtZ/NkaHXtsACdxD4pU6afVdAqvYM1qobCnwa+4kc95Me7B9WYEXk+UqstVzKp7C+6f8D3gbO\nBtaPHPNS9KQSrge+7R5/G+tGKCZ8/SXY+wZroZ+M/fQOtk1Dr0c9D7wHbI8FpD9h3S2fwoLsjBLX\n3di99snIaz8l/4UI3T/n94EBFP/3F/f+Po612P+FffY3Uvyz7+1nEz53EfZFNMHV/2bs/UqNKRA3\nlmexgPbpCs/PlXncZVh3xBZYd8Lp9Px/qdi1Cr12N7AdVvevATeVqMOwyOOX3eMl2BfDkNA2ELi1\nSNkzgEOwVt8r7vmR7tz5ZVx3CdaaDL82GGvZx5VZStz7Owfr5/409tl/h56ffaWfTaG63ox9QW3m\nXju/929FSlEg7ttGYjeaNnHPhwLfonj/Y1RTzONiBmI3u97HukaO70V5YK2raF/jCuxm12TgUazF\nF6cJ+E/sfa+HfREEweQq4PtYH3YTsDYW2IOW/r+x7oiwGcAJwEz3vN09n0U+MBW77mPY5/ETYC3s\nJtenyQ8jrGSUxink+4hPDL2/gVgL/h33/k+NnNeE3Wys5LOJ+hTWjbMmdj9hJfYlIDWmQNy3Lcf6\nJx/FbnY9gt2oOdm9nqN7C6dQyyz6eqnjwYLEYVgwuBL7qdybcq7B+knfonvfctCaL9UtkcNazPdj\nXQvPAb90r80BjgEuwfp+nyN/AxLgXOym2lvkR0vMxIJREIgfwgLqzNB5xa7bhbV+t8dGTLyOfS6D\nQ/WNfialWsl3uzLnYd0lQX/zROxm6dvAPdiXV/Szn0
zvPpu4uqyJfV6vY91QG2BdLiLSwIZirb1y\n+qkbWfjGZm/VcuKNJEQtYkmLZqwlfzPWuhfJDM2skzRYG+u7XYyNtc26Sm7uiYiIiIiIiEjFJlB6\npIT0VE1q0aewCSugz9873azrWzpozDy86hOtzCDs/4lKfJr88Dx9/p4pEPcthcaj+lDrm7xZSkvZ\n4rsCBVTz+SuG1IA+xMawJpZf4mW3/S+whnttBnCQe/wFbIzqV93zPbEJA4HvYVOX3wT+TPepsuHU\nls8WqMNUbLZb2AIsKQ3ARdgU27exZENjY95LGz3zVHSQ/yXQhKV5XIRlnruVnlnnqnUkNqnjN9jE\nj0XA57GcHkuwER7hSSLrADdgKTA7sNlsTQWutRQYj/23+TWWKOhVbMr4gJi6bIH9N1yGTay4JfRa\neLzxJOBS4D6sy2IWljXtIvce/oFNOAl0ED/e+DZsAscyV/Y2odcmufrehw0zbIu5hvSCAnFjOB2b\ntjrabbtgs8fApuu2uce7YTO/vhh63u4e74/NmjoQm0E1CxvTG7Y/lkJxG3qajE2vDmyDBfJ73fPH\nXN2GuGNvI/9lUUr4l8CJwH7uPWyMBZnfxZw3jO4JbqLboUXK3AX7IlkP+xz+gM1oG4ElJboE+Jg7\n9rdYN8Fw7DM9Agva4Ws9jyUBOgfL17AF9nlsgU1HPjOmHr/AvhTXdcddXKTOh2D/L2yA5Xz+G/C4\new+3Y18GgWK/rO519doQy+wWzfvxLVevgdiXjEimxM2aWkT38bdfcceCtSQXuMdTsby2QS6KGeRb\nrFOxFnGgGZvlNtQ9L5TaMmwQ1kIKjj8buLrI8W+ST0o+gfzNojZ6tojD7/tpun8GG2NBp5aNiiOB\nf4aej8Le/4ahfUuxJEUtWB6GrUKvHQtMD10rnCKzCfucwjPnxmBfkIVcD1xBPp9IWDS16BWh107A\n8kuH30N4EYHwZzqB+Jt167pyBrnnk9wmNaQWcWP4JN3/sYdTHz6CJW/ZCPtpegMWLNfHWrfBDZvN\nyP+MfYv8yhThAFAsteVyrCUVtIoPpXtL6hQsiC5z118Ha7n1ViuWrD2o59PYkj4fr+BaxUTThYJ1\nDYT3DcTeQ396fv5xn9uGWEt6Dvn3MJX4z+InWPB+DBvpcFTMcWBdI4GVkedBfUtpAc7DvtzfJv+F\nHtQvR+9TnEoJCsSN4RW6D2Ma5vaBZUibg+WVfRJLHv8wNp14EdYyBQsex9I9ReLa2M/bQKkbhTdj\ngXgM1ucZtAp3xbKEHYK1sIZg/8gL3SR6j/xPfrDAEG6JLiG/GkmwfQzr04wahn1BxG3fKnBOby3F\nPtPWSLnh7HG5yPErsK6boP7rkk8QFPVv7L/LJtjahJdS31UyDsO6fvbEviyHu/1ZuqGaOAXivmcN\nLMgFWz8sAJ6BtVo2wPobwz81Z2A30oIk5+3YT9cZoWMuB35Gvv93HSxw9sZ9WMt6It1vKg3CWq1L\nXf3PJD7w/NO9r69iLc0z6L7m2uVYP2twI3FDLHAUssSVHbdF+8Ar0Yn1H5+NtTg3A36MrSVXSBeW\njvJC8l8wm9B9RY6wQ7Dk7WC/JnLuGlG1CpQDsa6WN7Ev4nPqVI6EKBD3PfdhrdxgOxNLc/gElgJz\noXv8y9A5M+ie5nEm9o8snObxj9hNpFuw1uqTwF6h18sZNvchltZyT+yGXODPbvsndrd+BRYkw9cO\nrv82NjrjaqxV+S7dfwpfhK0xdz+WhvMR7GZYLfU2beUPsZb8C9hNzpuwPtu4a52G/Rr5G/Z+H8C6\njwrZyR23HEuNeSL5scOlUpiW+x7Cx96AdbO8jHWFPFLGdUVERBI1AMsBPh+7R3FuzHEXY8M9FxC/\nEK6IiFQouI/RD/vFEh0X/1Xs1yvY4g1/owh1TYiI9F6wCvca2A3lNyOv74cNPQRrPa9LkZE9CsQi\nIr3XjHVN/BsbHf
R05PVN6H5v41/kb7oWvJiIiPROFzYuf1NslmdbgWOiI0xib3JqhY6KbZbrPoZf\nRBIygypzXAyA3MryD19O/HDLt7GJTDuRTxcANupkaOj5pm5fQQrEFXsRmiZUfnpuOjTtXvn5C+NS\nE5Tp0gnwgwkVnz5+VOU/ptrxmymm2vKbqhxJOz0Hu1dxjfFVzmub8D8w4eTSx8WeH/sDuzztVP75\nN/fvz/hVq3arrgY27fCXJY8yZ+Sndwc2wMbFL8NW+/4yNnY+bAo2Vv8W4HPu2H8TQ4FYRDKpf+Wn\nbozdiGt2243AX7GZj2A5P+7DRk4swsaYF5uarkAsItlURfB7EsvEF3VF5PkJCdRFqtPqt/id27wV\n3eqtZJUP0DbGb/mtfov/SBUt4ppTIPalaXjpY+pJgdib4Z6zNbR93m/5rX6L/0iagl+a6iIikhi1\niEVEPFvLdwVCFIhFJJPSFPzSVBcRkcSoa0JExLM0BeJGzTUxAVsKqBJfxFauXQUcXKsKiUi69Ctz\nS6oujaiaFQReBL6LLXYpIg1KLeLaOwLLgj8fW+olrB1bH2weNiNmZ7f/QuDn7vFe5Ndve9EdV2hd\nMBFpEGoR19a2wOnYysFvYqvinki+VZzDRqrsgK0mfC0wCvgp8DgwG1sHbZ9Eay0iXmn4Wm3tga2i\nG2TIf6vAMcFqvbOwdHaDsYUnj3H7fgQsrm81RSRN0tQ10QiBOEfvl/gOWsvbAa9j2fSLHRfz6vTQ\nk1b/05ZFGlAH+WWrmzo7a3bdNAW/NNWlUg8CdwG/wVrF67n9TaG/38T6isdieUGXA5sBJ2FdFlOx\n5eQfC123iVIBvpp8wiJSllby+SmaW1po76rN7Ru1iGvraeBs7GZbJ3ZTroPufcQrsSFp/YDvuf1X\nY0PcXgWOBiZhWfZHA3difc3jsKFwo+r9JkQkWVUEv6HYoICNsPhyJXBx5JhTgMNDRW2NJZRfVuO6\npMoN9BwtEXYj8OPIvi+HHs/FuinAbuANRUQaWhUt4lVYPJkPDATmAA8A/wgd82u3gTXo/ouYIAyN\nE4hFRHqlikD8qtsA3sUC8CfpHojDDiM/YKCgLARideSKSA9rlRv9Vhd9tRW7z/RozOsfw+Yp/KDY\nRbIQiEVEeugXE/1mdcLs8u4HDgRux4a/vhtzzL7YXIXYbglQIBaRjOrfUnj/Hi02OSFw3nuFTwfu\nAH6PjbiKcygluiVAgVhEMiquRVyGJuAabMTWhUWOWwdLInZYybpUXBURkT6sf+XR7wvAt4GF2HBZ\ngJ8Bw9zjYDXnA4BpwIpSF1QgFpFsiumaKMNsykuYdr3bSlIgFpFsSlH0S1FVREQSNMB3BfIUiEUk\nmyrvmqg5BWIRyaYURb8UVUVEJEEpin4pqkoftOBMf2VvN9Ff2SKNQF0TIiKepSj6pagqIiIJSlH0\nS1FVREQStKbvCuQpEItINqUo+qWoKiIiCUpR9EtRVUREEqRREyIinqUo+pWTQUhEpPH0K3PraSgw\nHfg78BRwYpFSdsYWWzqoVFVERLKn8lET5aziDNb5cT7wZyyZfCy1iEUkmypvEb+KBWHovopz1A+x\nNe1eL1WVRg3EE4CTKzz3JOwnxwLgL+Sz7otII2kpcyuulcKrOG8C7A9c5p7nil2kUQNx0Tddwlxg\nR2A09m32q5rUSETSpfIWcaDYKs4XAv+NxaImSnRNNEof8RFYCziHrSP1fOi1duxnxG7Y+/0e8Dj2\nQb0B/ALYC1tzajd3fOBRbG0qEWk0MdGv/WVof6Xk2aVWcd4RuMU93gDYB+tbntKLqvQp2wKnA2OA\nN4Eh2F3MoFWcA9bCfj7sClwLjAJ+igXk2cBF2AcVdTRwXx3rLiK+xES/ts1sC0yc0+OQclZx3jz0\n+DrgHmKCcJGq9Cl7AH/AgjDAWwWOudn9nQUMdts7wDFu34+AxZFzvg18Brs7KiKN
pvIJHeWu4ly2\nRgjEQR9Mb88B2A67o7lJ5PUvYR/sF7GfE4VdOiH/eOc220SkpjrcBtDU2Vm7C1e+Zl25qzgHjip1\nQCME4geBu4DfYK3i9dz+ptDfb2J9v2OBZcByYDNshMQOwFSsn+cx9/xyrN94adGSfzChVu9BRGK0\nug2guaWF9q6u2lxYU5xr6mngbGAG0In9VOigex/xSmw0RHCzDuBq7Abfq1hf8CRsFsyvgLWxu6EA\nLwIH1PctiEjiUhT9UlSVqtzgtjg30rOv98uhx3OxborofhFpVCmKfimqiohIglIU/VJUlbrZ3XcF\nRCSF1EcsIuJZiqJfiqoiIpIgrVknIuJZiqJfiqoiIpKgFEW/FFVFRCRBulknIuJZiqJfiqoiIpKg\nFEW/FFVFRCRBKYp+jbpCh4hIcWuWufVUzirOWwGPYHluSi7blqLvBBGRBFUe/cpZxfkNbPHQshKG\nKRBXYfxof7ddc9WsylcDZzWd6a3srpfO8lY2+P/sJw71W37DqPyf76tug+6rOIcD8etu+1o5F1Qg\nFpFsqk30a6XwKs4eqiIi0tdUH/2KreKccFVERPqimK6J9rnQPq/wayGlVnHuFQViEcmmuFWcd7Et\nMPG6HoeUs4pz+NhKqyIi0uAqXzy0nFWcPwE8jq0Y34V1X2xDTBeGArGIZFPloybKWcX5VWy8cVkU\niEUkm1IU/VJUFRGRBKUo+qWoKiIiCVIaTBERz1IU/fpC0p8JlJE0owztwGfc43uxu5lxTgAWYXc7\n16tB2SKSNv3K3BKqStrVamZ/+Dql5n/PBu7BgreINKBcihYPTWuL+HTgWWAWMBLrzZkTen3L0PMO\nrNU8BxvXN7KM63dgLd21sdbxfOBJ4Bvu9fnAi5VXX0TSrrNfeVsS0tgi3hH4JjAam0Y4Fwuyb7t9\nC4CjgGvd8Tksy9GOwPHAKcAxJcoIWsd7Ay+TbyEX664QkQaSVJAtRxpbxLsCd2IJlZcDU9z+q7EA\n3Iy1XCeHzrnT/Z2LZUMq10Lgy8B5wFjgnUorLSJ9y+qW5rK2JKToO+EjObrPz25y++4AxgMPYi3k\nt0LHfOD+dpJ/T9OAjbBphsfGlPUclsLua8Avgb8Cvyi3ou2hxLStQGtTWdPKRaQXOtwG0NTZWbPr\ndvYrN/x9WLMy46QxEM8EJgHnYl0T44DLsWA7DbgM+F4Z19mrjGM2xgL6TVjXx9EFjomNrm0KvCJ1\n10r+Z25zSwvtXV01uW5nS3oGEqcxEM8DbsX6gl8DHgu9Nhk4ELg/tC8XeVzOKIvgmFHABdgwtVXA\n993+E4FTgY9j3Rf3Et+qFpE+6APWKPPIFXWtB6QzEAOc47aosdhNunCw3Tz0eA6wR8w1dy9wzv10\nD+qBi90mIg2qM0XhLz01Ke0uYDjxgVZEpGydlc9xvha7r/Qa9qs6agMsYfwnsBj7a6y7NVYaR03E\nORDYHnjTd0VEpO/rpKWsrYDrsKGvcU7Auli3B9qA/6FEo7cvtYhFRGqmihbxLIoPk/0/YDv3eDDw\nBrC62AUViEUkk1bXL/3aVdgw21eAQeRn7MZSIBaRTKrjzbqfYWkS2oARwAPYrODlcScoEItIJn0Y\nM3ztifb3eKL9/Wou/XngbPf4eWAxlgPnibgTFIhFJJPiuia2bxvM9m35tDNXTlza20s/A3wJeAib\nizASeKHYCQrEIpJJVXRN3Azshg1TewlLvdDfvXYFNgfiOmxSWjPwE0qM9lIgFpFMqmLUxLdKvL4U\n2Lc3F1QgFpFMqiIQ15wCsYhkkgKxiIhndRxH3GsKxNXI1Wo5vd5r9pyBc/WSstM211zzsPHeygZo\nWnim1/JXL/EbQCYO9Vp8zXxIehatUyAWkUxS14SIiGfqmhAR8Uz5iEVEPFPXhIiIZwrEIiKeqY9Y\nRMQzDV8TEfFMXRMiIp4pEIuIeJamPuIkVnGe
AJxcp2vvD2xdh+teC/wbeLIO1xaRFOikX1lbAaXi\nQxvwNraS8zzgjFJ1SSIQ1zMhw4HANnW4bqnlskWkj+ukpaytgHLiwwxgB7f9slRd6hWITweexZad\nHgm0AHNCr28Zet6BtZrnAAvd8YWcB/wdy3p/ATAGS758AfatMxxbqG8qtjbUzNC1JgGXA4+7en3N\n7d8WeNSdvwDYwu2fBbzVq3csIn1KFYG4nPjQq7Rc9egj3hH4JrZqaX9gLhZk33b7FgBHYc17sBbz\n6+6844FTgGMi11wfOADYyj0fDLwDTAHuAe50+/8KHAcsAj4LXArs6V4bBuyMBdvp7u/3gYuAydhn\noT5zkYz4oH7D13LYAqILgJexmPZ0sRPqEXh2xQLjSrdNcfuvxgLwScA3sKAYCALpXOCgAtdc5q51\nDfAntwWCb56BWCv5ttBrwTKtOeAP7vEibCG/rYCHsdb7pq4Oi8p7iyLS19Vx1MRcYCjwPrAP8Efg\nU8VOqEcgztG9Wd7k9t2BLbL3INZCDjftP3B/O0N1mgZshHUnHAvsgrVuvw6cQL6lG/RBN2MBe4cy\n69mFLQL4N2AccB/Wmp5e5vm0hx63uk1EaqvDbQBNnZ01u25cIH6xvYMX21+s5tLLQ4+nYr/M16PI\nAqL1CMQzsT7Zc7GuiXFY/+wHWHC9DPheGdfZK/R4bbdNxVqxz7v9y7FuCrCuisVYoL4d+wIYhfU7\nNwGHANcDm7vtWff3BeC3WNfFKHoRiNvKPVBEKtZKvpHT3NJCe1dXTa4bF4g3bRvBpm0jPno+e+LM\n3l7648BrWCNxFyz+JL6K8zzgVqx/5DXgsdBrk7GRDveH9uUijwuNshgE3A0MwN7Uj93+W4CrgB9i\nAfhwLNCfgX0J3IwF4hywxNVlMNby/RALzt8BVgH/B5ztrhssl70+tlz2mdidUhFpEFWMIw7iwwZY\nfBiPxRuAK7BYdDywGuueOLTUBet1c+oct0WNxW7ShYPt5qHHc4A9Cpz3KnbzLephbORD2D4xdXoA\n+3DCzndbVKnlskWkj6siH3Gp+PA7t5UtyVECd2FDzAoFWhGRRH340b18/5IMxAcmWFbUUR7LFpEU\nStMUZ42bFZFM0lJJIiKeKfuaiIhnCsQiIp4pEIuIeKabdSIinmnNOhERz9Q1ISLimbomREQ80zhi\nERHP1DUhfV5TEqsdxsjNH++vcIDtJnotvmmJ1+IbRpoCscd/TiIi/qympaytgFKrOB+OpQFeCDwE\nbFeqLmoRi0gmVTF87TpsMYkbYl5/Afgitk7n3sCVwOeKXVCBWEQyqYquiVkUXxntkdDjR7E1MYtS\nIBaRTEqoj/hobD3MohSIRSSTEhhHvDu2PucXSh2oQCwimRQ3jvi99id4v/2Jai+/Hbae5t50X7G+\nIAViEcmkuK6JAW2fZUBbfonMpROv7O2lhwF3At8GFpVzggKxiGRSFX3EpVZxPhMYgq0oD7ZK/C7F\nLqhALCKZ9EHlw9dKreL8H24rmwKxiGRS1mbWTQBOrtO19we2rvE1hwLTgb8DTwEn1vj6IpICnbSU\ntSUhiRZxro7XPhC4B/hHDa+5CvgxMB8YCMwBHqhxGSLiWWdX47eITweexWagjARasIAW2DL0vANr\nNc/B5maPjLnmeVgrdQFwATAG2Nc9ngcMB0YAU4EngJmha00CLgced/X6mtu/LTbzZZ677hbAq1gQ\nBngXC8CfLP+ti0hfsHp1S1lbEurRIt4R+CYwGruTOBcLsm+7fQuAo7DEGWAt5tfdeccDpwDHRK65\nPnAAsJV7Phh4B5iCtYjvdPv/ChyHDRn5LHApsKd7bRiwMxZsp7u/3wcuAiZjn0X082gFdsCCtYg0\nkM7V6blFVo+a7IoFxpVum+L2X40F4JOAb2BBMRAE0rnAQQWuucxd6xrgT24LNLm/A7FW8m2h19Zw\nf3PAH9zj
RVhSjq2Ah7HW+6auDuExfwOB24EfYS1jEWkgnQm1dstRj0CcIx8ccY9zwB3YeLsHsRZy\neLbJB+5vZ6hO04CNsO6EY7FxeHsCXwdOIN/SDfqgm7GAvUOZ9ezCxgP+DRiHzQc/Dmst93f1/T3w\nx7gLtIcet1I8C4iIVKbDbQBNnZ01u+4HK9YofVBC6hGIZ2J9sudiAW0c1j/7ARZcL8PmX5eyV+jx\n2m6birVin3f7l2PdFGBdFYuxQH079gUwCut3bgIOAa4HNnfbs+7vC1hKu2Hu+OlYy/tp4MJiFWwr\n402ISHVayTdymltaaO/qqsl1uzrT0zVRj5t184Bbsb7g+4DHQq9Nxlqi94f25SKPC42yGIT1BS/A\nbgD+2O2/BTgVa2EPxxIyH43dbHsK2C903SWuLkHL90MsOD/l6rwtll90LDY1cXe3fx42X1xEGsnq\nlvK2BNTrK+Ect0WNxW7ShYPt5qHHc4A9Cpz3KnbzLephLICG7RNTpwewm4Fh57stbDZauUSk8TV4\nH3Gcu7BWa6FAKyKSrNVNpY9JSJKB+MAEy4o6ymPZIpJGq31XIC89vdUiIkla6bsCeeoLFZFsWlXm\nVtjewDPAc8BpBV4fgnXHLsAmhEXvZXWjQCwi2dRZ5tZTC3AJFoy3wdJiRpOP/QyboDYaOAKbwRtL\ngVhEsml1mVtPu2CzcDuwNvMtWCbIsK2xOQlgcxZagQ3jqqJALCLZVHkg3gRbmSPwL7cvbAH5dA27\nAJthqRQK0s06EcmmykdNlJPa9zysO2Ie8KT7Gzs/W4FYRLIpLhAvbIcn24ud+TK2gERgKNYqDltO\n91QOi7F0CgUpEItINsUNX/tUm22ByROjRzyB5VRvBV7B0v5G17FbB1iBpVI4BphBkSyOCsQikk3x\nQ9NKWY1lgJyGjaC4BltA4jj3+hXYaIpJWDfGU1gOnFjpmePX9+Ry0R8jiRbur2yAszbzV/aZL/n9\n3zZXm+RfFes37Odeyz8zd5a3spv792f8qlVQfezKcVOZ/4gOb6pFeUWpRSwi2aQpziIinikQi4h4\npkAsIuKZArGIiGcpyr6mQCwi2VT58LWaUyAWkWyq3YLQVVMgFpFsUh+xiIhnCsQiIp4pEIuIeJai\nm3W9TQw/ATi5DvUAy3AfXW6kGt8FNq7h9USkkXxQ5paA3gbieqaaORDLWFRISwXXOxL4ZC/P0S8E\nkayofIWOmisnEJ+Orbk0CxiJBcU5ode3DD3vwFrNc4CF7vhCzgP+ji0ncgEwBtjXPZ4LbA60A/8L\nPA78CLhBnf+0AAAQg0lEQVQOODh0jXBuz9NcefOBc91xOwE3uesNcHVbzx2/E/n1pCYANwKzgeuB\nDYDbgcfc9vmY9yAifVl1qzjXVKkW4I5Y0uPRQH8sqM0B3nb7FgBHAde643PA6+6844FTsKTIYesD\nBwBbueeDgXeAKcA9wJ2ha/UHdnbPr4tcJ2id7wPsh60LtRJYF1iG5Qs92dU5fHwhWwFjsR8ik7Ev\ngIeAYcCfiW+pi0hfVd044r2BC7GG6dXA+QWOacNiSX9gqXteUKlAvCsWGFe6bYrbfzUWgE8CvkE+\nWEI+kM4lv3he2DJ3rWuAP7ktEM35eWuJ+gF8CfsiCCYsLityvUJy2PsKeoO+RPe+6kHAx4D3oydO\n+J/847Yx0Ka2s0jNdbgNoKmzhrMwKu92aAEuwWLFy9iv9ilYcvjAusDvgL2wZZQ2KHbBUoE4R/dg\n1uT23QGMBx7EWshvhY4JAlpn6PrTgI1chY/FWq97Al/HWq57hsoLey/0eDX5rpRmYI2YOkbrX+j8\nAZHjwkG2CfgstsRJURPqddtSRD7S6jaA5pYW2rtqlJm/8kC8C7CI/PfDLdhgg3AgPgyLk8HyEUuL\nXbBUH/FMrBthANYyHOf2f4AF18vId0sUsxewAxaE18a+LaZiLerR7pjlWD
dFnA6sywOsK6K/e/wA\n1jpfyz0fEnO9DqxvGLr3NUeD+P3AiaHn2xepk4j0VZX3EW8CvBR6/i+3L2xL7J7UdGyNu+8Uq0qp\nQDwP6x5YANyH3bwKTAa6sMAVyEUeF+qXHYT1BS/AbgD+2O2/BTgVa2FvXuC8q4DdsBtynyN/s24a\n9rPgCVffoJ06Cbic/M26idjy1o9j34VB3aL1PBEL2AuwG4rHFqiLiPR1lQ9fK2f0WH/gM8BXsYbo\nz7HgXFA5w7XOcVvUWKw1HK5UOIDOAfYocN6r2E//qIeBbUPPd4+8/ho2uiLw36HH59Ozs/xO8v3V\nYKMiCo3iiC7R+gZwaIHjRKSRxHVNvNEOb7YXO/NlYGjo+VDyXRCBl7DuiBVum4n9+n+u0AUrHTd7\nFzCcwoFWRCT94oamDW6zLbAo2lbjCax12wq8go0s+1bkmLuxG3otwJpY4/M3cVWpNBAfWOF5IiLp\nUPkAjNXYIINpWKC9BrtRd5x7/QrgGWzo60KsC/cq4Om4C2ommYhkU3Wz5qa6LeyKyPNfu60kBWIR\nySZlXxMR8Uxr1omIeKYWsYiIZwrEIiKepSgxvAKxiGSTVnEWEfFMXRMiIp6lKBCXk69XCstNyPCn\nl6vnolklNGX4cwfoyvn9AM7iTG9l9+/fzKpV46H62JVjSJn/E7/VVIvyilKLWESyKUUtYgViEckm\nBWIREc80fE1ExLMUDV8rtUKHiEhjypW5FbY3luryOeC0Aq/vj63yM4/4RTI+ohaxiEjvlLOK81+w\n5PAAo7DFNLaIu6BaxCIivRNexXkV+VWcw8Ir0A+kxCrOahGLSEZVfLeu0CrOhdbhPAA4F9gY+Eqx\nC6pFLCIZtbrMrYdypzP9Edga2Be4sdiBahGLSEbFtYhnYYu+xypnFefoBfsB62OrxPfQyC3idYDj\nKzx3AnCyezwJOLgG9RGRVIlrAY8BTg1tPYRXcV4DW8V5SuSYEeSnRX/G/S0YhKGxW8RDgB8Al1Vw\nbnjgSvFBLCLSR1XcR1zOKs4HA0e4Qt4FDi12wUYOxOdh30rzgAewb6PDsaWtpwI/da9fAmwIvA8c\nAzzrzg8n+ch4mhmRRlTV1LpSqzj/ym1laeRAfBqwLbADsA9wBjbsZCWwrjvmSuxbbBF21/NSYM/E\nayoiHqzwXYGPNHIgDrdivwRcS37d1mXY2L4xwG2h49ZIpmoi4l96sv40ciAOy9Gze6EZC8g7FDmn\nqOmhI1qB4erAEKmDDrdBZ2ct/5GlJ+tPIwfi5cAg9/gvwM+Bm7DfI0OAt4DFwNeB27FAPQpY6M4p\n+V98dwVekQS0ug1aWprp6mqv0XXT0yJu5OFrbwAPAU9iCTemYMNO5pEfmnY4cDQwH3gK2C90fi7m\nsYg0hFVlbvXXyC1isEAbdn7keQd2Iy9qYujxUbWskIikRXpaxI0eiEVEYqiPWETEMw1fExHxTF0T\nIiKeqWtCRMQzBWIREc/UNSEi4ll6WsSNPKFDRKSIFWVuBZVaxRngYvf6AuJTKQAKxN4s9jxXz2f5\nHf6KBrL92QN0eJ8o2uG5/EDFSyUFqzjvDWwDfAtbEinsq9iqzVsCx1IiL7oCsScdGS7fZ9kq33/5\naaiBqXiKczmrOO8HXO8eP4ql3v14XE0UiEUkoypuERdaxXmTMo7ZNK4mulknIhlV8c26cvt2ovkZ\nffcJNaR28uvZadOmLbmtner1prx3Iud+Dvhz6PlP6XnD7nK6r1P3DEW6JkREpHf6Ac+TX8V5PoVv\n1t3nHn8O+FtSlRMRyYp9sIWGF2EtYrD1L48LHXOJe30B8JlEayciIiIiIiJ9QFIrr2wN7ImtGh62\nd0Llh33MQ5kiqfdkAmUMwwaczwZ+BvQPvfbHBMqPc6XHsqH7GM96ORHrR/wj8CJwQOi1eQmUH/g8\n8DT597w9cGkC5R4MHOT+RreDEii/z9
A44vo7uMC+HDbGcOMEyr8WW6X6UWyh1BnYrJ+lwGZ1Lnu9\nmP1NwNfqXDYU/6LbKIHyjwV2BN7F7rDf7v5emEDZYRdiLfC73fP5wG4JlLsv9v96nDsTqEOfoEBc\nf7cAk4GuyP4mYEAC5W+IjWkEOAH4NjAT+0dSb0uxlmBcveptIywAvVXgtYcTKL8JC8Jg02HbgDuw\nL8DoYP96WxJ5nkQOyCMTKKMhKBDX35PAryncOtszgfL7YQF/pXv+e+BVYBqwdp3LfgF7j4WCcRJd\nA/difbOFugFmJFD+a1g3wHz3/F1gHHANsF0C5QeWAF9wj9fAukz+kUC5J5P/9RcInueA3yRQhz6h\nxXcFMuAf2D/Itwu8Nht4pc7lD8D+O3eE9r0AzAJGAzfWsewu4HUs8Ed1Yt0l9XQ3PVuCgTvqXDbA\ng9isrHdD+7qAu4C/El+3etRjPPalcDLwPvBD6r965l5Y4A9va4b+JvFlKFLSGp7LX9Nz+UnZgnw3\n0O5Yi3DdjJTfD7gpobJEUm8GMDz0fBdgYUbK/wYw2D3+OdYiTHKm0QIsIG0B/BO4gPz00yyUPxu/\nX7ojsV8Af3fPtwPO8FcdybK9sMQf/wmcg/VbJhmMfJYf9I+PxRK2jAMeS6hsyPcR/wT7SR7el4Xy\nbwQex74ET3bbSQmWPxP4LPn33EQ+KAu6WZekacDxwANYv+kOFO47bcTyO93fccBVwJ+AXyRUNsCH\nwGHAEeRHi/SPP7zhyn/ebc3YzcvgZllSPkb3+wE50rRgnGTKz4GngDFYYpBnscCUhfLvxSZwLMb6\nRgdgP9eTsi3wW2xJG4DNiV9nrBHL920q1i0TtIi/7vaJJO5CYK3Q882w1mkWyl8bm9iypXu+MfCV\nhMoOrImNEhmFn5ukPsufXmB7MMHyR2B9xO9jo4Qewia2iJP0oHKxm1Y5YHkGyh+MDd+Km2H3ZgJ1\nAJvFdzk2bA+sRXocyd0w813+TqHHA7AvxdXAqQmVHxiIdY9EE62LJGZn7KbVi25bQPd/II1Y/r3u\nbwfWLRHeXog5px6exX4aB0a4fVkpv5DHEyzrXLoP1xsC/DLB8kU+8iSwa+j5WJIdvua7fJ+iQaep\nwL5GLn+90LYBNu07yS+C+QX2JTlqJPU0aiI5q7HZbIHZJDPf32f5pYbHza1z+YE5WDfAH9zzQ4An\nyGcAq3fyGd/lzyU/SmI19gvl6DqXGdZM92n2a+F/MlOqqI84OcHNspvd829i/2MGU4zrHZR8lN9O\n8WFSu9ehzEImub9BXaLDt+qdm9h3+eEgWGxfvZyGZfy7FnvvRwFTgPMTKj/1FIiT0073f3zRf4z1\nDkq+yxd/5tLz10mhffXyK2ykxpew/+f+AuyBTXAREQ98JIT3PcXWV/kbY/mQn8GC7o7ub5vbl5RC\n/cFJLIrQZ6hFXH8nhx4XapHWOxWg7/Kj5mGz+pI0Exuqdbkruwmb3LJtg5f/XSwn8E5Yn3RgOdZd\nUu++6eOBH2CjRJ4P7R+EjSU+vM7l9xm6WVd/g0h2Omnayo96zUOZvqfY+ir/ercdTDJpP6MmYzPo\nzsP6iYOG33LgDQ/1EeEGbPxkYD3gugyV75PvKba+y9c4XhGn0FjKQvsasfyRWLKfB8jmFFvf5Wsc\nb8qpayI5TVgrNJjWux7JrpDis/zbgMuAq8lnYkuyu6SL/JL2wRTb4UXPaKzyNY5XxDkCm830C+xn\n4bNuXxbKn5NQOXEKtf6SrJPv8k/DWuFHA//hHmcp+1vqqUWcnBuwf3x7YK3BA4GnM1L+PVhC+juB\nD0L76530Z2tgG2AdbBZbMFJkMMmsoO27/MD52HT2YBzvWVh+akkJDV+TJHTQsysih2Uhq6f9sS+c\nfbGZXIHlwC3Aww1evohIaozJePnvYsF/OfaLpAulokyVZt8VkEyYg3VNDCl1YJ0chHUH9MdGLywF\nvp
Oh8gdi48kHYTfqDgIuTbB8EUmBLbEFSxcBt2ILmSbZLRYsy3QgcA3WZ5tkClDf5ReS5NBJKUE3\n6yQJzwE/w/IrjMOycHW5vxdR/5t2wf/n44DbgbdJdvic7/IPDj1uxnJOrEiwfClBgViSMhpLf7gP\nNt12Mpac/kFg+zqXfQ+W5GYllv9gI5JLAZmG8sOLxAb5iPdPsHwRSYE5WMA9DFtEM+yuhOqwPvkJ\nLGsDn0ioXN/ltwAnJVSWVEjD16SegsxvLRSeUZd05jewNJzHeijXZ/mPY2sWSkqpa0LqKcj8NhIL\nBFOwL/9xwGOe6uQ7IPkofzZwCXaj9L3Q/qSWqhKRFJiFBeXAILqvn5ck3zPKfJTfTj7ZUngTkQx5\nlu5Tegfgfzn5LCk0g7HesxqlF9Q1IUm4AeuKuBPrmjgAS1ielJHAKVjqyeD/+RyWdyML5d9Oz/Xp\nbsOGsUkKKBBLEs4G/gzsigWgI0k2H67vNJy+yg+SDq2L36RDUoICsSRlDv7SYa7CAqEvvsr/FJZw\naB33N7AcOMZDfSSGhq9JFkwAXif5NJxpKX8M8EhCZYmIFNQBLI5sL2So/Avwm3RIRCTz0ph0SEKU\nBlOywHcaTt/l+046JCUoEEsWHApsgk319ZGG03f5QdKhHbGuiaSTDomIfKQZ2A94GXgJmIitZp2F\n8n0nPRIRYTRwITaj72Lgc9gki6QSpPsuP3BlwuWJiAD+03D6Lj8syYk0UiaNI5ZG5jsNp+/yC5mG\n9VFLimhmnTQy32k4fZdfiIKwiHjhOw2n7/JHAlcBD5BPgflgguVLCWoRSxZshOV7CKxy+7JSvu+k\nR1KCArFkge80nL7L9530SErQzTrJih3Jp+GcSfKjB3yWPwG/SYekBAVikcbXQc+uiBxapUNERMS0\nlD5ERPq4OVgKzEUox0QqKemPSOPznXRIREQc30mPJIZaxCLZMBqbUn0BcAdwCLZ2nSZ2iIgkIE1J\nh6QA9ROJNK40Jh2SAjSzTqRxpTHpkIhIJvlOOiQl6GadSOPznXRISlDXhEjj8510SErQzTqRbPCd\n9EhEREREREREREREREREREREJOL/AQGL+2m2DGQqAAAAAElFTkSuQmCC\n", | |
"text": [ | |
"<matplotlib.figure.Figure at 0x3cd1b00>" | |
] | |
} | |
], | |
"prompt_number": 15 | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Finally, for each paper, let's list the other papers in order of decreasing similarity:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"collapsed": false, | |
"input": [ | |
"for paper in dist_df.columns:\n", | |
"    # Ascending distance = decreasing similarity\n", | |
"    sim_papers_df = dist_df.sort_values(by=paper)[paper]\n", | |
"    # Drop the paper itself so only the other papers are listed\n", | |
"    sim_papers = sim_papers_df.drop([paper]).index\n", | |
"    print('Papers most similar to ' + paper + ':')\n", | |
"    print(', '.join(sim_papers))\n", | |
"    print('\\n')" | |
], | |
"language": "python", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"stream": "stdout", | |
"text": [ | |
"Papers most similar to clpx1:\n", | |
"clpx2, dyn-structure, dyn-steps1, tcell, dyn-steps2, dyn-lis1\n", | |
"\n", | |
"\n", | |
"Papers most similar to clpx2:\n", | |
"clpx1, dyn-structure, dyn-steps1, tcell, dyn-steps2, dyn-lis1\n", | |
"\n", | |
"\n", | |
"Papers most similar to dyn-lis1:\n", | |
"dyn-steps1, dyn-steps2, dyn-structure, clpx2, clpx1, tcell\n", | |
"\n", | |
"\n", | |
"Papers most similar to dyn-steps1:\n", | |
"dyn-steps2, dyn-lis1, dyn-structure, clpx2, clpx1, tcell\n", | |
"\n", | |
"\n", | |
"Papers most similar to dyn-steps2:\n", | |
"dyn-steps1, dyn-lis1, dyn-structure, clpx2, clpx1, tcell\n", | |
"\n", | |
"\n", | |
"Papers most similar to dyn-structure:\n", | |
"dyn-steps1, clpx2, dyn-steps2, clpx1, dyn-lis1, tcell\n", | |
"\n", | |
"\n", | |
"Papers most similar to tcell:\n", | |
"clpx2, dyn-structure, clpx1, dyn-steps1, dyn-steps2, dyn-lis1\n", | |
"\n", | |
"\n" | |
] | |
} | |
], | |
"prompt_number": 16 | |
}, | |
{ | |
"cell_type": "heading", | |
"level": 3, | |
"metadata": {}, | |
"source": [ | |
"See also: <a href=\"http://www.frankcleary.com/svdimage\">SVD Image Compression</a>" | |
] | |
} | |
], | |
"metadata": {} | |
} | |
] | |
} |
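
The notebook bins the pairwise distances into three levels by hand and notes that a clustering algorithm could do the grouping automatically. Below is a minimal sketch of that idea using average-linkage hierarchical clustering. It is not part of the original notebook: it assumes SciPy is installed, the helper name `cluster_papers` is hypothetical, and the toy distance matrix is invented purely for illustration (it does not reproduce the notebook's values).

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_papers(dist_df, n_clusters):
    """Group the rows of a symmetric distance DataFrame into flat clusters.

    Hypothetical helper: `dist_df` plays the role of the notebook's
    square paper-by-paper distance table.
    """
    # Symmetrize and zero the diagonal to absorb floating-point noise.
    dist = dist_df.values.astype(float)
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    # linkage expects the condensed (upper-triangle) form of the matrix.
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method='average')
    # Cut the tree so that at most n_clusters flat clusters remain.
    labels = fcluster(tree, t=n_clusters, criterion='maxclust')
    return pd.Series(labels, index=dist_df.index, name='cluster')


# Toy distances, invented for illustration only (not the notebook's data).
names = ['a1', 'a2', 'b1', 'b2', 'c']
toy = pd.DataFrame(
    [[0.00, 0.02, 0.09, 0.10, 0.12],
     [0.02, 0.00, 0.08, 0.09, 0.11],
     [0.09, 0.08, 0.00, 0.03, 0.10],
     [0.10, 0.09, 0.03, 0.00, 0.12],
     [0.12, 0.11, 0.10, 0.12, 0.00]],
    index=names, columns=names)

clusters = cluster_papers(toy, n_clusters=3)
print(clusters)
```

With the notebook's own data, the call would presumably be `cluster_papers(dist_df, n_clusters=3)`. The numeric cluster labels are arbitrary; what matters is which papers share a label. Average linkage is only one reasonable choice here, and the number of clusters still has to be chosen by hand.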