@frankcleary
Last active October 19, 2020 15:49
{
"metadata": {
"name": "",
"signature": "sha256:a96922a4a2453eba6073c16430fff29b217a257985a8e9e533c3647bd0d3d815"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Singular Value Decomposition and Applications"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Frank Cleary | <a href=\"http://www.frankcleary.com\">www.frankcleary.com</a> | See also: <a href=\"http://www.frankcleary.com/svdimage\">SVD Image Compression</a> | <a href=\"https://gist.github.com/frankcleary/a89da479d85c98f86e31\">Notebook Gist</a>"
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The singular value decomposition of a matrix has many applications. Here I'll focus on an introduction to singular value decomposition and an application in clustering articles by topic. In another notebook (<a href=\"http://nbviewer.ipython.org/gist/frankcleary/4d2bd178708503b556b0\">link</a>) I show how singular value decomposition can be used in image compression.\n",
"\n",
"Any matrix $A$ can be decomposed into three matrices $U$, $\\Sigma$, and $V$ such that $A = U \\Sigma V$; this is called the singular value decomposition. The columns of $U$ and the rows of $V$ are orthonormal, and $\\Sigma$ is diagonal. Most scientific computing packages have a function to compute the singular value decomposition, so I won't go into the details of how to find $U$, $\\Sigma$ and $V$ here. Some sources write the decomposition as $A = U \\Sigma V^T$, so that their $V^T$ is our $V$. The usage in this notebook is consistent with how numpy's singular value decomposition function returns $V$."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Example with a small matrix $A$:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If $A = \\begin{bmatrix} 1 & 0 \\\\ 1 & 2 \\end{bmatrix}$\n",
" \n",
"$A$ can be written as $U \\Sigma V$ where $U$, $\\Sigma$, and $V$ are, rounded to 2 decimal places:\n",
"\n",
"$U = \\begin{bmatrix} -0.23 & -0.97 \\\\ -0.97 & 0.23 \\end{bmatrix}$\n",
" \n",
"$\\Sigma = \\begin{bmatrix} 2.29 & 0 \\\\ 0 & 0.87 \\end{bmatrix}$\n",
" \n",
"$V = \\begin{bmatrix} -0.53 & -0.85 \\\\ -0.85 & 0.53 \\end{bmatrix}$"
]
},
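{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of the orthonormality claim (my own arithmetic on the rounded entries above, so it only holds to about two decimal places), take the two columns of $U$:\n",
"\n",
"$(-0.23)^2 + (-0.97)^2 = 0.0529 + 0.9409 \\approx 0.99$ (the first column has length close to 1, and the second column has the same squared entries)\n",
"\n",
"$(-0.23)(-0.97) + (-0.97)(0.23) = 0.2231 - 0.2231 = 0$ (the two columns are perpendicular)\n",
"\n",
"The same arithmetic on $V$ gives $(-0.53)^2 + (-0.85)^2 = 0.2809 + 0.7225 \\approx 1.00$ and $(-0.53)(-0.85) + (-0.85)(0.53) = 0$."
]
},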
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Interpretation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although the singular value decomposition has interesting properties from a linear algebra standpoint, I'm going to focus here on some of its applications and skip the derivation and geometric interpretations.\n",
"\n",
"Let $A$ be an $m \\times n$ matrix with column vectors $\\vec{a}_1, \\vec{a}_2, ..., \\vec{a}_n$. In the singular value decomposition of $A$, $U$ will be $m \\times m$, $\\Sigma$ will be $m \\times n$ and $V$ will be $n \\times n$. We denote the column vectors of $U$ as $\\vec{u}_1, \\vec{u}_2, ..., \\vec{u}_m$ and $V$ as $\\vec{v}_1, \\vec{v}_2, ..., \\vec{v}_n$, similarly to $A$. We'll denote the values along the diagonal of $\\Sigma$ by $\\sigma_1, \\sigma_2, ...$.\n",
"\n",
"We have that $A = U \\Sigma V$ where:\n",
"\n",
"$U = \\begin{bmatrix} \\\\ \\\\ \\\\ \\vec{u}_1 & \\vec{u}_2 & \\dots & \\vec{u}_m \\\\ \\\\ \\\\ \\end{bmatrix}$\n",
"\n",
"$\\Sigma = \\begin{bmatrix} \\sigma_1 & 0 & \\dots \\\\ 0 & \\sigma_2 & \\dots \\\\ \\vdots & \\vdots & \\ddots \\end{bmatrix}$\n",
"\n",
"$V = \\begin{bmatrix} \\\\ \\\\ \\\\ \\vec{v}_1 & \\vec{v}_2 & \\dots & \\vec{v}_n \\\\ \\\\ \\\\ \\end{bmatrix}$\n",
"\n",
"Because $\\Sigma$ is diagonal, the columns of $A$ can be written as:\n",
"\n",
"$\\vec{a}_i = \\vec{u}_1 * \\sigma_1 * V_{1,i} + \n",
" \\vec{u}_2 * \\sigma_2 * V_{2,i} + ... = U * \\Sigma * \\vec{v}_i$\n",
" \n",
"This is equivalent to creating a vector $\\vec{w}_i$, where the elements of $\\vec{w}_i$ are the elements of $\\vec{v}_i$, weighted by the $\\sigma$'s:\n",
"\n",
"$\\vec{w}_i = \\begin{bmatrix} \\sigma_1V_{1,i} \\\\ \\sigma_2V_{2,i} \\\\\n",
" \\sigma_3V_{3,i} \\\\ \\vdots \\end{bmatrix} = \\Sigma * \\vec{v}_i$\n",
" \n",
"Then $\\vec{a}_i = U * \\vec{w}_i$. That is to say, every column $\\vec{a}_i$ of $A$ is expressed as a sum over all the columns of $U$, weighted by the values in the $i^{th}$ column of $V$ and the $\\sigma$'s. By convention the order of the columns in $U$ and rows in $V$ is chosen such that the values in \n",
"$\\Sigma = \\begin{bmatrix} \\sigma_1 & 0 & \\dots \\\\ 0 & \\sigma_2 & \\dots \\\\ \\vdots & \\vdots & \\ddots \\end{bmatrix}$ obey $\\sigma_1 \\geq \\sigma_2 \\geq \\sigma_3 \\geq ... \\geq 0$. This means that, as a whole, the first column of $U$ and the first row of $V$ contribute more to the final values of $A$ than subsequent columns and rows. This has applications in image compression (<a href=\"http://nbviewer.ipython.org/gist/frankcleary/4d2bd178708503b556b0\">link to another notebook</a>) and in reducing the dimensionality of data by selecting the most important components."
]
},
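{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the weighting concrete, here is a sketch using the rounded $2 \\times 2$ example from the introduction (my own arithmetic, so entries are only good to about two decimal places). I write $V_{k,:}$ for the $k^{th}$ row of $V$, a notation introduced only for this sketch. Splitting $A$ into one term per singular value:\n",
"\n",
"$\\sigma_1 \\vec{u}_1 V_{1,:} = 2.29 \\begin{bmatrix} -0.23 \\\\ -0.97 \\end{bmatrix} \\begin{bmatrix} -0.53 & -0.85 \\end{bmatrix} \\approx \\begin{bmatrix} 0.28 & 0.45 \\\\ 1.18 & 1.89 \\end{bmatrix}$\n",
"\n",
"$\\sigma_2 \\vec{u}_2 V_{2,:} = 0.87 \\begin{bmatrix} -0.97 \\\\ 0.23 \\end{bmatrix} \\begin{bmatrix} -0.85 & 0.53 \\end{bmatrix} \\approx \\begin{bmatrix} 0.72 & -0.45 \\\\ -0.17 & 0.11 \\end{bmatrix}$\n",
"\n",
"Adding the two rounded terms gives $\\begin{bmatrix} 1.00 & 0.00 \\\\ 1.01 & 2.00 \\end{bmatrix}$, which matches $A$ up to rounding. The $\\sigma_1$ term already supplies most of the large entries in the second row of $A$, while the $\\sigma_2$ term acts as a smaller correction."
]
},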
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Brief discussion of dimensionality"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This section isn't required to understand how singular value decomposition is useful, but I've included it for completeness.\n",
"\n",
"If $A$ is $m \\times n$ ($m$ rows and $n$ columns), $U$ will be $m \\times m$, $\\Sigma$ will be $m \\times n$ and $V$ will be $n \\times n$. However, there are only $r = rank(A)$ non-zero values in $\\Sigma$, i.e. $\\sigma_1, ..., \\sigma_r \\neq 0$ and $\\sigma_{r+1}, ..., \\sigma_{\\min(m,n)} = 0$. Therefore columns of $U$ beyond the $r^{th}$ column and rows of $V$ beyond the $r^{th}$ row do not contribute to $A$ and are usually omitted, leaving $U$ an $m \\times r$ matrix, $\\Sigma$ an $r \\times r$ diagonal matrix and $V$ an $r \\times n$ matrix.\n",
"\n"
]
},
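{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small hand-worked illustration (my own example, not taken from the data used below): the matrix $\\begin{bmatrix} 1 & 2 \\\\ 2 & 4 \\end{bmatrix}$ has rank 1 because its second row is twice its first. Its only non-zero singular value is $\\sigma_1 = 5$, so the reduced decomposition keeps a single column of $U$ and a single row of $V$. One valid choice (the signs of the column and the row can be flipped together) is:\n",
"\n",
"$\\begin{bmatrix} 1 & 2 \\\\ 2 & 4 \\end{bmatrix} = \\frac{1}{\\sqrt{5}} \\begin{bmatrix} 1 \\\\ 2 \\end{bmatrix} \\begin{bmatrix} 5 \\end{bmatrix} \\frac{1}{\\sqrt{5}} \\begin{bmatrix} 1 & 2 \\end{bmatrix}$\n",
"\n",
"Here $U$ is $2 \\times 1$, $\\Sigma$ is $1 \\times 1$ and $V$ is $1 \\times 2$, instead of three $2 \\times 2$ matrices."
]
},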
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Example with data:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Singular value decomposition can be used to group similar objects (for example, news articles on a particular topic). Note from the expression for $\\vec{a}_i$ above that similar $\\vec{a}_i$'s will have similar $\\vec{v}_i$'s.\n",
"\n",
"Imagine four blog posts, two about skiing and two about hockey. I've made up some data about five different words and the number of times they appear in each post:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"c_names = ['post1', 'post2', 'post3', 'post4']\n",
"words = ['ice', 'snow', 'tahoe', 'goal', 'puck']\n",
"post_words = pd.DataFrame([[4, 4, 6, 2],\n",
"                           [6, 1, 0, 5],\n",
"                           [3, 0, 0, 5],\n",
"                           [0, 6, 5, 1],\n",
"                           [0, 4, 5, 0]],\n",
"                          index=words,\n",
"                          columns=c_names)\n",
"post_words.index.names = ['word:']\n",
"post_words"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>post1</th>\n",
" <th>post2</th>\n",
" <th>post3</th>\n",
" <th>post4</th>\n",
" </tr>\n",
" <tr>\n",
" <th>word:</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>ice</th>\n",
" <td> 4</td>\n",
" <td> 4</td>\n",
" <td> 6</td>\n",
" <td> 2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>snow</th>\n",
" <td> 6</td>\n",
" <td> 1</td>\n",
" <td> 0</td>\n",
" <td> 5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tahoe</th>\n",
" <td> 3</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>goal</th>\n",
" <td> 0</td>\n",
" <td> 6</td>\n",
" <td> 5</td>\n",
" <td> 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>puck</th>\n",
" <td> 0</td>\n",
" <td> 4</td>\n",
" <td> 5</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"text": [
" post1 post2 post3 post4\n",
"word: \n",
"ice 4 4 6 2\n",
"snow 6 1 0 5\n",
"tahoe 3 0 0 5\n",
"goal 0 6 5 1\n",
"puck 0 4 5 0"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It looks like posts 1 and 4 pertain to skiing, while posts 2 and 3 are about hockey."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Imagine the DataFrame <code>post_words</code> as the matrix $A$, where the entries represent the number of times a given word appears in the post. The singular value decomposition of $A$ can be calculated using numpy."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"U, sigma, V = np.linalg.svd(post_words)\n",
"print(\"V = \")\n",
"print(np.round(V, decimals=2))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"V = \n",
"[[-0.4 -0.57 -0.63 -0.35]\n",
" [-0.6 0.33 0.41 -0.6 ]\n",
" [ 0.6 -0.41 0.32 -0.61]\n",
" [-0.34 -0.63 0.58 0.39]]\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Recall that $\\vec{a}_i = U * \\Sigma * \\vec{v}_i$; that is, each column $\\vec{v}_i$ of $V$ determines the entries in the corresponding column, $\\vec{a}_i$, of our data matrix, $A$. Let's label $V$ with the identities of the posts using a DataFrame:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"V_df = pd.DataFrame(V, columns=c_names)\n",
"V_df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>post1</th>\n",
" <th>post2</th>\n",
" <th>post3</th>\n",
" <th>post4</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-0.395634</td>\n",
" <td>-0.570869</td>\n",
" <td>-0.630100</td>\n",
" <td>-0.347212</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-0.599836</td>\n",
" <td> 0.331743</td>\n",
" <td> 0.408279</td>\n",
" <td>-0.602870</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.604001</td>\n",
" <td>-0.405353</td>\n",
" <td> 0.321932</td>\n",
" <td>-0.605996</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>-0.344752</td>\n",
" <td>-0.632253</td>\n",
" <td> 0.576751</td>\n",
" <td> 0.385695</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
" post1 post2 post3 post4\n",
"0 -0.395634 -0.570869 -0.630100 -0.347212\n",
"1 -0.599836 0.331743 0.408279 -0.602870\n",
"2 0.604001 -0.405353 0.321932 -0.605996\n",
"3 -0.344752 -0.632253 0.576751 0.385695"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note how post1 and post4 agree closely in value in the first two rows of $V$, as do post2 and post3. This indicates that posts 1 and 4 contain similar words (in this case words relating to skiing). However, the agreement is less close in the last two rows, even among related posts. This is because the weights of the last two rows, $\\sigma_3$ and $\\sigma_4$, are small compared to $\\sigma_1$ and $\\sigma_2$. Let's look at the values for the $\\sigma$'s."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sigma"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"array([ 13.3221948 , 9.2609512 , 2.41918664, 1.37892883])"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$\\sigma_1$ and $\\sigma_2$ are roughly 4 to 10 times larger than $\\sigma_3$ and $\\sigma_4$, indicating that the values in the first two rows of $V$ are much more important than the values in the last two. In fact we could closely reproduce $A$ using just the first two rows of $V$ and first two columns of $U$, with an error of roughly 1 word at most in any entry:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# keep only the first two singular values and their vectors\n",
"A_approx = np.dot(U[:, :2], np.dot(np.diag(sigma[:2]), V[:2, :]))\n",
"\n",
"print(\"A calculated using only the first two components:\\n\")\n",
"print(pd.DataFrame(A_approx, index=words, columns=c_names))\n",
"print(\"\\nError from actual value:\\n\")\n",
"print(post_words - A_approx)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"A calculated using only the first two components:\n",
"\n",
" post1 post2 post3 post4\n",
"ice 3.197084 4.818556 5.325736 2.792675\n",
"snow 5.619793 0.588201 0.384675 5.412204\n",
"tahoe 4.043943 0.071665 -0.123639 3.917015\n",
"goal 0.682117 5.089628 5.762122 0.336491\n",
"puck 0.129398 4.219523 4.799185 -0.143946\n",
"\n",
"Error from actual value:\n",
"\n",
" post1 post2 post3 post4\n",
"word: \n",
"ice 0.802916 -0.818556 0.674264 -0.792675\n",
"snow 0.380207 0.411799 -0.384675 -0.412204\n",
"tahoe -1.043943 -0.071665 0.123639 1.082985\n",
"goal -0.682117 0.910372 -0.762122 0.663509\n",
"puck -0.129398 -0.219523 0.200815 0.143946\n"
]
}
],
"prompt_number": 5
},
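{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to quantify this error is the Frobenius norm (the square root of the sum of squared entries). I'm assuming here the standard result, usually credited to Eckart and Young and not derived in this notebook, that the Frobenius norm of the truncation error equals the root-sum-of-squares of the discarded singular values:\n",
"\n",
"$\\|A - A_{approx}\\|_F = \\sqrt{\\sigma_3^2 + \\sigma_4^2} = \\sqrt{2.419^2 + 1.379^2} \\approx \\sqrt{5.85 + 1.90} \\approx 2.78$\n",
"\n",
"Squaring and summing the 20 entries of the error table above by hand gives about $7.75$, whose square root is also about $2.78$, so the two agree."
]
},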
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To help visualize the similarity between posts, $V$ can be displayed as an image. Notice how the similar posts (1 and 4, 2 and 3) have similar color values in the first two rows:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"\n",
"plt.imshow(V, interpolation='none')\n",
"plt.xticks(range(len(c_names)))\n",
"# V has one row per singular value (4 here), not one per word\n",
"plt.yticks(range(V.shape[0]))\n",
"plt.ylim([V.shape[0] - 0.5, -.5])\n",
"ax = plt.gca()\n",
"ax.set_xticklabels(c_names)\n",
"ax.set_yticklabels(range(1, V.shape[0] + 1))\n",
"plt.title(\"$V$\")\n",
"plt.colorbar();"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAATMAAAEKCAYAAAB+LbI7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFWZJREFUeJzt3X2QnVV9wPHvZhPASBEpGPLaDQhBaxWiBgVtVgw2BUmg\nTotYJfWPDs4Y6lDrAGI1sW+A9aUWy6QQaBCUGVExDAYSlY04IBAMgarJJiUpCSERiVJU8rKb7R/n\n2ezN3XvvPs+9z9179zzfz8yZfZ57z3mec2Y3v5xznpcDkiRJkiRJkiRJkiRJkiRJkhSRacBtwC7g\nQyWfnwL8FLgeeHUL6iVJmc0G/qfss2nAX7SgLpJUt+OA/cC4ks+uaFFdJKkhvwK6ku33ATNaVxUp\nH+NbXQG1xNPATEIPbQLwTGurIzXOYFZMTwMnA2cAX2hxXSSpbp8DHiX0zqQo2DMrpl5ga5IkSZIk\nxWI+sBHYDFxZJU83sB74b6AnY1lJarpOYAvhVp8JwBPA68ryHEt4wmRasn98hrKpjRs5iyRVNYcQ\nkLYBB4A7gYVleT4AfBPYkez/MkPZ1AxmkhoxFdhesr8j+azUKYQnTx4A1jH0XHCasqk1fDVzxqlz\nB57pXdvoYSTVZy1hPqouR8HA3mxFXgKOKdkfSFFmAuGZ4HcDE4GHgR+nLJtaw8Hsmd610J1rnWrb\nugRmLhm98/WsGr1zAXA78MFRPN+jo3guCP85v2v0Trf4M6N3LoBHlsCZS0blVBe8Ae75SMfcRo6x\nF/jHDPk/Bb9X9tGzwPSS/ekMDScHbScMLV9O0g+BNyX5RiqbmsNMqeAmZEgVrCMMI7uAI4CLgZVl\neb4DvIMw4T8ROBP4WcqyqXnTrFRwDQaBPmAxcD8hWC0Hfg5clny/jHDrxX3Ak8BB4CZCMKNK2bqM\nvWB2bHera9Bkb2x1BZqsq9UVaK6p3a2uQWavaPwQq5JUalnZ/r8mKU3Zuoy9YPbq7lbXoMliD2aR\nPw46rbvVNcisyvBxzBl7wUxSrmIJArG0Q1Kd7JlJikIsQSCWdkiqkz0zSVEwmEmKQg63ZrQFg5lU\ncLEEgVjaIalODjMlRSGWIBBLOyTVyZ6ZpCjEEgRiaYekOtkzkxQFb82QFAV7ZpKiEEsQiKUdkuo0\nIUsU6GtaNRpmMJMKbnwkwSzNgia3ALuBp5pcF0ktMKEzfapiPuE9/5uBK2uc6q2EcPi+ks+2EdYG\nWE+DS4Wlicm3Av8O3NbIiSS1p0w9s+E6gRuAeYRl5x4jrLBUvjBJJ3AdYWGTUgOEdT/3NFQL0gWz\nB4l+FQqpuCYc2VDxOcAWQg8L4E5gIcOD2eXAXYTeWbmOhmqQcN1MqejGZ0jDTSUs8jtoR/JZeZ6F\nwI3Jfumq4QPA9whraP51/Y3wAoCkGlGgZ29INQzU/Db4EnBVkreDw3tiZwPPAScAawhzbw+mOOYw\n+QSzrUuGto/tLsBycFKL7OiBZ3sA2LQ1p2PWiALdR4c0aOn/DcvyLDC9ZH86oXdW6s2E4SfA8cCf\nAgcIc2vPJZ8/D3ybMGxtYTCbuSSXw0gawbTuQ2tzznoD9N67tPFjVr9KmcY64BTCvPpO4GLgkrI8\nJ5Vs3wrcQwhkE5OzvwS8EngPUHeD0gSzrwNzgd8njI0/nVRIUgwa69L0AYuB+wmBaTlh8v+y5Pvy\nlc1LnQh8q6QWdwCr661ImmaUR1lJMWnsaibAqiSVqhbEPlyy/TRwesNnT3gBQCq6SKJAJM2QVLdI\nokAkzZBUt8YuALQNg5lUdJFEgUiaIalukUSBSJohqW6RRIFImiGpbo3fmtEWDGZS0UUSBSJphqS6\neTVTUhQiiQKRNENS3SKJApE0Q1LdHGZKikIkUSCSZkiq21GtrkA+DGZS0TnMlBSFSKJAJM2QVLdI\nokAkzZBUt0iGma6bKRVdY+tmAswnLBG3GbiywvcLgQ
3AeuBx4JwMZTM1Q1KRNRYFOoEbgHmEZece\nI6y8VLqi+feA7yTbf0RYUu61KcumZs9MKrojM6Th5gBbgG2EtTDvJPTESv22ZPto4JcZyqZmMJOK\nrrFh5lTCEpSDdiSflbuQ0ONaBfxNxrKpOMyUiq5GFOjZDD1bapYeSHmWu5P0TuCrwGkpy6WWSzDr\nWHtTHodpS/1XXzZypjFs4KFW16C5Om+Y0uoqNM8FM/I5To2rmd2nhTRo6X3DsjwLTC/Zn07oYVXz\nICHuHJfky1K2JoeZUtE1NsxcB5wCdAFHABcTJvFLnQx0JNuzk58vpCybqRmSiqyxKNAHLAbuJ/Tx\nlhPmxgaHNMuA9wGXEib5fwO8f4SydTGYSUXX+E2zq5JUalnJ9vVJSlu2LgYzqeh8a4akKEQSBSJp\nhqS6RfJspsFMKrpIokAkzZBUt0iiQCTNkFQ3h5mSouDVTElRsGcmKQqRRIFImiGpbpFEgUiaIalu\nkUSBSJohqW7OmUmKQiRRIJJmSKpb5Xf7jzkGM6noIokCkTRDUt0iiQKRNENS3SKJApE0Q1K9BiK5\nmumCJlLB9Y9Pn6qYD2wENgNXVvj+NOBhYC/w8bLvtgFPAuuBRxtphz0zqeBqBKk0OoEbgHmEZece\nI6ywVLowyQvA5YSFgMsNAN3AnoZqQbpgNh24DXhNcuL/BL7c6IkltYd9Rx6RIff+8g/mAFsIPSyA\nO4GFHB7Mnk/S+VUO2lHl80zSBLMDwBXAE8DRwOPAGhpYEkpS++jvbGjSbCqwvWR/B3BmhvIDwPeA\nfsKKTnWvKJ4mmO1KEoQ1734OTMFgJkWhv7HnmQYaPP3ZwHPACYRO0kbCqueZZR0tdwFnAI/UczJJ\n7aevRjB7qKePh3r6ahV/ljAVNWg6oXeW1nPJz+eBbxOGrU0PZkcDdwEfI/TQDhkYKF1RfRYdHbPq\nqYukEW1KEmza9KpcjthfIwyc2T2eM7uH9j+/dF95lnXAKYSOzk7gYuCSKocrnxubSLiA8BLwSuA9\nwNKU1R4mbTCbAHwTuB24e1gNOxbUe35JmcxKEsyaNYPe3jsaPmKDw8w+YDFwPyEwLSdMQV2WfL8M\nOJFwlfMY4CChQ/R6wkXFbyX5xgN3AKvrrUiaYNaRVPBnwJfqPZGk9tRgMANYlaRSy0q2d3H4UHTQ\nb4DTGz35oDTB7Gzggwzd2AZwNXBfXpWQ1Dr7yHJrRvtKE8x+hE8KSNGqNWc2lsTRCkl1y2GY2RYM\nZlLBGcwkRaHWfWZjicFMKjjnzCRFwWGmpCjsL9CtGZIi5pyZpCg4ZyYpCs6ZSYqCwUxSFJwzkxSF\n/RzZ6irkwmAmFZzDTElRcJgpKQremiEpCrEMM33polRw/XSmTlXMJywRtxm4skqeLyffbyCs8Jal\nbCr2zKSCa7Bn1gncAMwjLDv3GLCSw9fVPQ94LWEVpzOBG4G3pSybmsFMKrh9jd2aMQfYAmxL9u8E\nFnJ4QFoArEi2HwGOJazYNDNF2dQMZlLBNdgzmwpsL9nfQeh9jZRnKjAlRdnUDGZSwTUYzAZS5itf\nADh3BjOp4GrdZ7a1Zztbe7ZX/Z4w11W6JuZ0Qg+rVp5pSZ4JKcqmZjCTCq7WfWYzumcyo3vmof0H\nlj5cnmUdYWK/C9gJXAxcUpZnJWHV8zsJE/+/BnYDL6Qom1ouwezvBz6Sx2Ha0vievlZXoan6/iTy\n/8/Wxvz768/pKA0NM/sIgep+wtXJ5YQJ/MuS75cB3yVc0dwC/Bb48Ahl6xL5X7KkkeRw0+yqJJVa\nVra/OEPZuhjMpILb5xoAkmLgs5mSohDLs5kGM6ngDGaSouD7zCRFwTkzSVFwmCkpCvu9NUNSDJwz\nkxQF58wkRcE5M0lRMJhJioJzZpKi4JyZpCh4a4akKDjMlBQFh5mSohDL1cxxra6ApNbqpzN1yug4\nYA3QC6wmLP5byS
2EBU6eKvt8CWG1pvVJml/rZAYzqeCaGMyuIgSzU4HvJ/uV3ErlQDUAfAE4I0n3\n1TqZw0yp4PZxZLMOvQCYm2yvAHqoHNAeJCw3V0nqxYPtmUkF18Se2STC8JHk56Q6qnc5sIGwDF21\nYSqQLpgdBTwCPAH8DPiXOiokqU01GMzWEOa6ytOCsnwDScriRmAmcDrwHPD5WpnTDDP3Au8Cfpfk\n/xHwjuSnpDGu1n1me3seZW/Po7WKn1vju93AicAuYDLwi4xVK81/M3BPrcxp58x+l/w8grDy8J6M\nlZLUpmrdZzah+ywmdJ91aP/FpV/JcuiVwCLguuTn3RmrNpnQIwO4iOFXOw+Tds5sHGGYuRt4gDDc\nlBSBJs6ZXUvoufUC5yT7AFOAe0vyfR14iHDVczvw4eTz64AnCXNmc4Erap0sbc/sIGHc+irgfqCb\ncGUCgB4OHsrYRQdd6S9ASMqkN0mwaVPN+fDUmnjT7B5gXoXPdwLnl+xfUqX8pVlOlvXWjBcJEfUt\nlASzbi+KSqPk1CTBrFl/QG/v1xo+4r79xXnQ/HigD/g18ApCt3FpMyslafT098Vxu2maVkwm3PA2\nLklfJdzNKykC/X1xPJuZJpg9BcxudkUktUaRgpmkiPUdMJhJisDB/jjCQBytkFQ/h5mSorA3jjAQ\nRysk1a+v1RXIh8FMKjqDmaQoGMwkReFAqyuQD4OZVHT9ra5APgxmUtE5zJQUhb2trkA+DGZS0dkz\nkxQFg5mkKEQSzHxFrFR0BzKkbI4jLEXXC6ym8rqXtZayTFP+EIOZVHT9GVI2VxGC0amEF7pWWs18\ncCnL04E3JttnZyh/iMFMKrq+DCmbBYS3VJP8vLBKvvKlLH+VsTzgnJmk5t2aMYmwPCXJz0lV8o0D\nfgKcTFjFfHApy7TlAYOZpMYuAKwhrFpe7pqy/YEkVVJzKcsU5QGDmaRawWxzD2zpqVX63Brf7SYE\nul2EhZF+MUJNBpeyfDMhmGUq75yZVHS15shmdsO5S4ZSNiuBRcn2IuDuCnmOZ+gq5eBSlk9kKH+I\nwUwquubdmnEtITj1Auck+wBTCD2wwe0fEALYI8A9DC1lWa18RQ4zpaJr3lsz9gDzKny+Ezg/2X6S\n6ktZVitfUS7B7LMfjeQW4go6vhL34u3LflxzTnXMO3jRR1tdheZ5ywWMuyeH4/iguaQoRNIXMZhJ\nReebZiVFwTfNSoqCw0xJUTCYSYqCc2aSorCv1RXIh8FMKjqHmZKi4DBTUhS8NUNSFBxmSoqCwUxS\nFJwzkxQFb82QFAWHmZKi4DBTUhQiuTXDNQCkomveIsDHEZai6wVWM7RwSSWdwHrCGgCDlgA7ks/X\nA/NrncxgJhVd84LZVYRgdiphkZKrauT9GGHx39L3uA8AXwDOSNJ9tU5mMJOKrnmrMy0AViTbK4AL\nq+SbBpwH3Ax0lH1Xvl+VwUwqun0ZUjaTCAv5kvycVCXfF4FPEFY2L3c5sAFYTu1hqsFMKrzGhplr\ngKcqpAVl+QY4fAg56L2ElcrXM7wXdiMwEzgdeA74fK1mpL2a2QmsI0zGXZCyjKSxoNbwcX8PHOip\nVfrcGt/tBk4EdgGTCUGr3FmEwHcecBRwDHAbcGlZ/ps5/OLAMGl7ZpUm5yTFoL9G6uyGo5YMpWxW\nAouS7UXA3RXyfBKYTuiBvZ+wuvmlyXeTS/JdROjxVZUmmNWanJM01jXvaua1hJ5bL3BOsg8wBbi3\nSpnSDtN1hBXPNwBzgStqnSzNMHNwcu6YFHkljTXNe5xpDzCvwuc7gfMrfL42SYMurZCnqpF6ZrUm\n5yTFoHm3ZoyqkXpmtSbnhjy6ZGh7andIknLX8/wAPb9Mdl7cmM9BC/Kg+SeTBGHM+ndU6vrNWZJr\npSRV1n1CB90nJDtvOY3P/qC3pfVpJ1nvM/NqpqS2lOWtGeWTc5LUNnwFkFR4bT6z
n5LBTCq8OK4A\nGMykwrNnJikKL7e6ArkwmEmFZ89MUhScM5MUBXtmkqJgz0xSFOyZSYqCVzMlRcFhpqQoOMyUFAV7\nZpKiEEfPzHUzpcJr2oomxxHW1ewFVlN9Ed9jgbuAnxNWgXtbxvKAwUxS8xYBuIoQjE4Fvp/sV/Jv\nwHeB1wFvJAS1LOUBg5kkXs6QMlkArEi2VwAXVsjzKuCdwC3Jfh/wYobyhxjMpMJrWs9sEmFVc5Kf\nkyrkmQk8D9wK/AS4CZiYofwhXgCQCq/WXNimJFW1BjixwufXlO0PUHkNkfHAbGAx8BjwJcJw8tMp\nyx92IEmFVqvHdVKSBt1TnuHcGoV3EwLdLmAyYQ3ecjuS9Fiy/03gygzlDxl7w8xne1pdg6YaYFur\nq9BUvZGv79Xz/FhsYNOuZq4EFiXbi4C7K+TZBWwnTPIDvBv4aYbyhxjM2s62VlegqWJf5fHQAr1j\nStPmzK4l9Nx6gXOSfYApwL0l+S4H7gA2EK5m/vMI5StymCkVXtOeANgDzKvw+U7g/JL9DcBbM5Sv\nKJdgNnt6HkdJZ+dGmDKK5+uYPXn0Tgbs3Hk0U6aM3jmP75g9aucCmLhzJ8dPmTJ6JzypY/TOBbBr\nJ5w0Su078aSR86QSx1sz8vhN9wBzcziOpOzWAt0NlM86yfcrwp35kiRJkiSpvS0kPKA66M8J96n0\nE+4sHuvK2/c5woO4G4BvEZ5tG8vK2/cPhLY9QXi4eBQv9zRFefsGfRw4iPNPuRl795kNdxHw+pL9\np5LPftia6uSuvH2rgT8E3kS4/+bqVlQqR+Xtu57QttMJN0l+phWVylF5+yAE6HOB/x396qhZuoCN\nwO2E9xh9A3gF4S7gnwBPAsuBI5L81xJ6XRsIPZS3Ay8ATwPrOfy5iwdofc+si+a1D8I/lNub2YAR\ndNHc9l3NCDdKNlkX+bZvZpLvG4SbQ7dizywaXYSu9tuT/eXAp4BngNcmn60APkb4pW8sKXtM8vNW\n4M8qHLtdglmz2gfhQbkP5FfdzLpoTvv+KTnGRkZ4IV+TdZF/+xYCX0y2DWY5aodh5nbg4WT7dsJj\nC08DW5LPVgB/THjH0V7CH9RFHH6n3yjfGZlJs9p3DbAf+Fr+Vc6kGe27BpgB/BdD//BbJc/2TQQ+\nyeFD53b+2x1T2iGYld601wH8msN/wYPb/cAcwut13wvcV+UY7aYZ7fsr4DzgL/OsaJ2a+fv7GpUf\ncxlNebbvZEJvbwOhVzYNeBx4Td6VLqJ2CGYzGHrn9weAdYRf+MnJZx8iPGXwSsKQYxXwt4RJYoCX\nGOrSl2uH//Xybt984BOE4cre5lU7tbzbd0rJ9kLCXFMr5dm+pwgvGJyZpB2EqZCar7bR2NBFuM3g\nqwxNsB5F6MoPTrDeDEwgvM/oEcL/ak8S/ogAziJMuj5OmEC+iDA0eJnwepFVo9KSyrrIv32bCVfB\n1ifpP0alJZV1kX/77iL8o3+C8G6rVvZausi/faWexjmzaHQR/nBj1YXtG8u6iLt9UWmHYWY7z3fl\nwfaNbbG3T5IkSZIkSZIkSZIkSRrJ/wM9br5pjU4ufgAAAABJRU5ErkJggg==\n",
"text": [
"<matplotlib.figure.Figure at 0x3ce64e0>"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another thing the singular value decomposition tells us is what most defines the different categories of posts. The skiing posts have very different values from the hockey posts in the second row of $V$, i.e. $V_{2,1} \\approx V_{2, 4}$ and $V_{2,2} \\approx V_{2, 3}$ but $V_{2,1} \\neq V_{2, 2}$.\n",
"\n",
"Recall from above that:\n",
"\n",
"$\\vec{a}_i = \\vec{u}_1 * \\sigma_1 * V_{1,i} + \n",
" \\vec{u}_2 * \\sigma_2 * V_{2,i} + ...$\n",
" \n",
"Thus the posts differ very much in how much the values in $\\vec{u}_2$ contribute to their final word count. Here is $\\vec{u}_2$:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"pd.DataFrame(U[:,1], index=words)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>ice</th>\n",
" <td> 0.018526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>snow</th>\n",
" <td>-0.678291</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tahoe</th>\n",
" <td>-0.519801</td>\n",
" </tr>\n",
" <tr>\n",
" <th>goal</th>\n",
" <td> 0.370263</td>\n",
" </tr>\n",
" <tr>\n",
" <th>puck</th>\n",
" <td> 0.363717</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
" 0\n",
"ice 0.018526\n",
"snow -0.678291\n",
"tahoe -0.519801\n",
"goal 0.370263\n",
"puck 0.363717"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From this we can conclude that, at least in this small data set, the words 'snow' and 'tahoe' identify a different class of posts from the words 'goal' and 'puck'."
]
},
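{
"cell_type": "markdown",
"metadata": {},
"source": [
"To put rough numbers on this (my own arithmetic from the rounded values of $\\sigma_2$, $\\vec{u}_2$ and $V$ printed above), here is what the second term contributes to the 'snow' count of post1 and of post2:\n",
"\n",
"$\\sigma_2 \\, U_{\\text{snow},2} \\, V_{2,1} \\approx 9.26 \\times (-0.678) \\times (-0.600) \\approx 3.77$\n",
"\n",
"$\\sigma_2 \\, U_{\\text{snow},2} \\, V_{2,2} \\approx 9.26 \\times (-0.678) \\times 0.332 \\approx -2.08$\n",
"\n",
"So this single component adds to the 'snow' count of the skiing post and subtracts from that of the hockey post, which is the sense in which $\\vec{u}_2$ separates the two classes."
]
},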
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Identifying similar research papers using singular value decomposition"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Moving on from the simple example above, here is an application using singular value decomposition to find similar research papers.\n",
"\n",
"I've collected several different papers for analysis. Unfortunately, due to the sorry state of open access for scientific papers, I can't share the full article text that was used for analysis. <em>Cell</em>, for example, cautions that <b>\"you may not copy, display, distribute, modify, publish, reproduce, store, transmit, post, ...\"</b> Yikes. However, I did choose articles such that you should be able to download the PDFs from the publisher for free.\n",
"\n",
"<h3>Here are the papers included in analysis (with shortened names in parentheses):</h3>\n",
"\n",
"<h4>Two papers on the molecular motor ClpX, describing very similar experiments:</h4>\n",
"<li><a href=\"http://www.cell.com/retrieve/pii/S0092867411004296\">ClpX(P) Generates Mechanical Force to Unfold and Translocate Its Protein Substrates</a> (clpx1)\n",
"<li><a href=\"http://www.cell.com/retrieve/pii/S0092867411003138\">Single-Molecule Protein Unfolding and Translocation by an ATP-Fueled Proteolytic Machine</a> (clpx2)\n",
"\n",
"<h4>Papers on a very different molecular motor, <a href=\"http://www.frankcleary.com/research\">dynein</a>:</h4>\n",
"<li><a href=\"http://www.cell.com/fulltext/S0092-8674(12)00928-2\">Lis1 Acts as a \u201cClutch\u201d between the ATPase and Microtubule-Binding Domains of the Dynein Motor</a> (dyn-lis1)\n",
"<li><a href=\"http://www.cell.com/abstract/S0092-8674(06)00862-2\">Single-Molecule Analysis of Dynein Processivity and Stepping Behavior</a> (dyn-steps1)\n",
"<li><a href=\"https://reck-peterson.med.harvard.edu/sites/reck-peterson.med.harvard.edu/files/publication_pdf/Qiu_2012.pdf\">Dynein achieves processive motion using both stochastic and coordinated stepping</a> (dyn-steps2)\n",
"<li><a href=\"http://www2.mrc-lmb.cam.ac.uk/groups/cartera/pdffiles/2012_Schmidt_NSMB.pdf\">Insights into dynein motor domain function from a 3.3-\u00c5 crystal structure</a> (dyn-structure)\n",
"\n",
"<h4>A paper on T-cell signaling:</h4>\n",
"<li><a href=\"https://valelab.ucsf.edu/external/publications/2012jamesNature.pdf\">Biophysical mechanism of T-cell receptor triggering in a reconstituted system</a> (tcell)"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Reading in the data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To start, we'll need to read in the word counts for each paper. I used the Python package <a href=\"http://www.unixuser.org/~euske/python/pdfminer/\">PDFMiner</a> to convert the pdf documents to plain text. I also used a list of \"stop words\" (<a href=\"http://norm.al/2009/04/14/list-of-english-stop-words/\">link</a>), words such as \"the\" and \"and\" that appear in nearly all English documents and so say little about a document's topic."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"with open('input/stopwords.txt') as f:\n",
"    stopwords = f.read().strip().split(',')\n",
"    stopwords = set(stopwords)  # use a set for fast membership testing"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import collections\n",
"import os\n",
"import re\n",
"\n",
"def word_count(fname):\n",
"    \"\"\"Return a collections.Counter instance counting\n",
"    the words in file fname.\"\"\"\n",
"\n",
"    with open(fname) as f:\n",
"        file_content = f.read()\n",
"    # split on runs of non-word characters, ignoring case\n",
"    tokens = re.split(r'\\W+', file_content.lower())\n",
"    # drop short words and stop words\n",
"    tokens = [token for token in tokens\n",
"              if len(token) > 3 and token not in stopwords]\n",
"    return collections.Counter(tokens)\n",
"\n",
"\n",
"file_list = ['input/papers/' + f for f in os.listdir('input/papers/')\n",
"             if f.endswith('.txt')]\n",
"word_df = pd.DataFrame()\n",
"for fname in file_list:\n",
"    word_counter = word_count(fname)\n",
"    file_df = pd.DataFrame.from_dict(word_counter,\n",
"                                     orient='index')\n",
"    file_df.columns = [fname.replace('input/papers/', '').replace('.txt', '')]\n",
"    # normalize word count by the total number of counted words in the file:\n",
"    file_df = file_df / float(file_df.values.sum())\n",
"    word_df = word_df.join(file_df, how='outer')\n",
"\n",
"word_df = word_df.fillna(0)\n",
"print(\"Number of unique words: %s\" % len(word_df))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Number of unique words: 5657\n"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are the results, sorted by the most common words in the first paper:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"word_df.sort_values(by=word_df.columns[0], ascending=False).head(10)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>clpx1</th>\n",
" <th>clpx2</th>\n",
" <th>dyn-lis1</th>\n",
" <th>dyn-steps1</th>\n",
" <th>dyn-steps2</th>\n",
" <th>dyn-structure</th>\n",
" <th>tcell</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>clpx</th>\n",
" <td> 0.027648</td>\n",
" <td> 0.006701</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000535</td>\n",
" <td> 0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unfolding</th>\n",
" <td> 0.019516</td>\n",
" <td> 0.021117</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000268</td>\n",
" <td> 0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>force</th>\n",
" <td> 0.016060</td>\n",
" <td> 0.007919</td>\n",
" <td> 0.000666</td>\n",
" <td> 0.000170</td>\n",
" <td> 0.001911</td>\n",
" <td> 0.001071</td>\n",
" <td> 0.001265</td>\n",
" </tr>\n",
" <tr>\n",
" <th>figure</th>\n",
" <td> 0.012604</td>\n",
" <td> 0.009137</td>\n",
" <td> 0.011322</td>\n",
" <td> 0.011923</td>\n",
" <td> 0.001699</td>\n",
" <td> 0.002142</td>\n",
" <td> 0.001898</td>\n",
" </tr>\n",
" <tr>\n",
" <th>translocation</th>\n",
" <td> 0.011588</td>\n",
" <td> 0.014213</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.001265</td>\n",
" </tr>\n",
" <tr>\n",
" <th>clpxp</th>\n",
" <td> 0.011384</td>\n",
" <td> 0.021117</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>motor</th>\n",
" <td> 0.009555</td>\n",
" <td> 0.001218</td>\n",
" <td> 0.009491</td>\n",
" <td> 0.011923</td>\n",
" <td> 0.018896</td>\n",
" <td> 0.009103</td>\n",
" <td> 0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>substrate</th>\n",
" <td> 0.008538</td>\n",
" <td> 0.018071</td>\n",
" <td> 0.000167</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000212</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000316</td>\n",
" </tr>\n",
" <tr>\n",
" <th>velocity</th>\n",
" <td> 0.008335</td>\n",
" <td> 0.002640</td>\n",
" <td> 0.005495</td>\n",
" <td> 0.002044</td>\n",
" <td> 0.000637</td>\n",
" <td> 0.000803</td>\n",
" <td> 0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>extension</th>\n",
" <td> 0.007522</td>\n",
" <td> 0.001015</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.001071</td>\n",
" <td> 0.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
" clpx1 clpx2 dyn-lis1 dyn-steps1 dyn-steps2 \\\n",
"clpx 0.027648 0.006701 0.000000 0.000000 0.000000 \n",
"unfolding 0.019516 0.021117 0.000000 0.000000 0.000000 \n",
"force 0.016060 0.007919 0.000666 0.000170 0.001911 \n",
"figure 0.012604 0.009137 0.011322 0.011923 0.001699 \n",
"translocation 0.011588 0.014213 0.000000 0.000000 0.000000 \n",
"clpxp 0.011384 0.021117 0.000000 0.000000 0.000000 \n",
"motor 0.009555 0.001218 0.009491 0.011923 0.018896 \n",
"substrate 0.008538 0.018071 0.000167 0.000000 0.000212 \n",
"velocity 0.008335 0.002640 0.005495 0.002044 0.000637 \n",
"extension 0.007522 0.001015 0.000000 0.000000 0.000000 \n",
"\n",
" dyn-structure tcell \n",
"clpx 0.000535 0.000000 \n",
"unfolding 0.000268 0.000000 \n",
"force 0.001071 0.001265 \n",
"figure 0.002142 0.001898 \n",
"translocation 0.000000 0.001265 \n",
"clpxp 0.000000 0.000000 \n",
"motor 0.009103 0.000000 \n",
"substrate 0.000000 0.000316 \n",
"velocity 0.000803 0.000000 \n",
"extension 0.001071 0.000000 "
]
}
],
"prompt_number": 10
},
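{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we compute the singular value decomposition of this data. Note that numpy returns the singular values as a one-dimensional array (here `sigma`), sorted in descending order, rather than as the diagonal matrix $\\Sigma$."
]
},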
{
"cell_type": "code",
"collapsed": false,
"input": [
"U, sigma, V = np.linalg.svd(word_df)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a look at $V$, with the column names added:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"v_df = pd.DataFrame(V, columns=word_df.columns)\n",
"v_df.apply(lambda x: np.round(x, decimals=2))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>clpx1</th>\n",
" <th>clpx2</th>\n",
" <th>dyn-lis1</th>\n",
" <th>dyn-steps1</th>\n",
" <th>dyn-steps2</th>\n",
" <th>dyn-structure</th>\n",
" <th>tcell</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-0.19</td>\n",
" <td>-0.20</td>\n",
" <td>-0.55</td>\n",
" <td>-0.48</td>\n",
" <td>-0.53</td>\n",
" <td>-0.27</td>\n",
" <td>-0.15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-0.61</td>\n",
" <td>-0.59</td>\n",
" <td> 0.25</td>\n",
" <td> 0.13</td>\n",
" <td> 0.20</td>\n",
" <td>-0.03</td>\n",
" <td>-0.41</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.33</td>\n",
" <td> 0.28</td>\n",
" <td>-0.09</td>\n",
" <td> 0.08</td>\n",
" <td> 0.08</td>\n",
" <td>-0.05</td>\n",
" <td>-0.89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>-0.09</td>\n",
" <td>-0.05</td>\n",
" <td>-0.77</td>\n",
" <td> 0.32</td>\n",
" <td> 0.53</td>\n",
" <td> 0.01</td>\n",
" <td> 0.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 0.07</td>\n",
" <td> 0.04</td>\n",
" <td> 0.14</td>\n",
" <td> 0.14</td>\n",
" <td> 0.14</td>\n",
" <td>-0.96</td>\n",
" <td> 0.10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>-0.68</td>\n",
" <td> 0.73</td>\n",
" <td> 0.02</td>\n",
" <td> 0.03</td>\n",
" <td>-0.05</td>\n",
" <td>-0.02</td>\n",
" <td>-0.03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> 0.01</td>\n",
" <td> 0.07</td>\n",
" <td> 0.09</td>\n",
" <td>-0.79</td>\n",
" <td> 0.60</td>\n",
" <td>-0.01</td>\n",
" <td> 0.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
" clpx1 clpx2 dyn-lis1 dyn-steps1 dyn-steps2 dyn-structure tcell\n",
"0 -0.19 -0.20 -0.55 -0.48 -0.53 -0.27 -0.15\n",
"1 -0.61 -0.59 0.25 0.13 0.20 -0.03 -0.41\n",
"2 0.33 0.28 -0.09 0.08 0.08 -0.05 -0.89\n",
"3 -0.09 -0.05 -0.77 0.32 0.53 0.01 0.10\n",
"4 0.07 0.04 0.14 0.14 0.14 -0.96 0.10\n",
"5 -0.68 0.73 0.02 0.03 -0.05 -0.02 -0.03\n",
"6 0.01 0.07 0.09 -0.79 0.60 -0.01 0.00"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here are the values of $V$ represented as an image:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"plt.imshow(V, interpolation='none')\n",
"ax = plt.gca()\n",
"plt.xticks(range(len(v_df.columns.values)))\n",
"plt.yticks(range(len(v_df.index.values)))\n",
"plt.title(\"$V$\")\n",
"ax.set_xticklabels(v_df.columns.values, rotation=90)\n",
"plt.colorbar();"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAS0AAAFDCAYAAABvHVjEAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHA5JREFUeJzt3Xu8VGW9x/HPZgMCogIHMQUV8A4C3g3NHC+llXfLLloe\ns/JVWZ6yjmYXN9U5x9vpYJmVmqZ5vJRo2tEyU0clb4AXbgopkqghkjfMG3sz54/fGvYwzJ5Za9bl\nmWfW9/16rdfMmr1m/R6U/eN5nvVcQERERERERERERERERERERERERERERHJpDHAVsBz4dMXn2wEL\ngPOA4Q7KJSLSp92Ap6s+GwMc56AsIiINjQDeBfpVfPY1R2UREQnlFWBs8P5YYCt3RRGJpr/rAogT\nS4BxWI1rAPCs2+KIhKeklU9LgG2AXYEfOS6LiEhD5wMPY7UtEa+oppVPi4FngkNERERERERERERE\nREQkIRP2370E6NChw81RJIZB0eO9HCdeEjoSuEfpuNKvmv7y/K6b2Lnr6Ka/31PqbPq7AAu7bmRC\n1zFNf3/GWSfEis99XbBfV/Pf3zFeeG7qgqNjxF8aM/7dXXBA8/GnnP1grPDLuy7jPV2fa/r7+3B/\nrPizum5nz65DmvruJMbzpY6jId7vcemHES7+jr0kkTeapnFaIjk3wHUBIlLSEsk535KA8/KOKsRt\n38SzaWEnp/HZquA2/o6O4491G39oYTen8bcobOM0PqimFdkox0nDedLauuA2/k6O449zG9910hpd\n2NZpfGiBJBCRb+UVkYSppiUiXhnsugARKWmJ5JxvScC38opIwtQ8FBGv+Ja0+jW+hEOBJ4G/Amek\nWxwRyVr/CEcraJS0OoGLsMQ1Afgk4HiMgIgkaUCEow9hKjYF4FFgPjHnSzZKnnsBT9E7w+w64Ejg\niThBRaR1xKxBlSs2BwPPA7OAW1g3RwwDfgocAjwHjIwTsFF5RwPLKs6fA/aOE1BEWkvMIQ9hKjaf\nAmZg+QNgZZyAjZqHpTg3F5HWF7N5WKtiM7rqmu2wnc3vBmYDn45T3kY1reeBLSvOt6Q3W641v+um\nte9HFXZ0PjVHpF09X3yKF4pPA/A3hidyz5jNwzAVmwHAbsBBwBDgAeBBrA8sskblnY1lybHAC8DH\nsc74dcRZD0tEwhtd2HbtfMVJjOe2adfHvme9IQ8PB0cdYSo2y7Am4VvBcS8whZSSVjdwKnA71uH2\nS9QJL9JW6iWBfYKj7OL1LwlTsbkZ66zvBDbA+sWb3tk8TM3wD8EhIm0o5uDSvio2pwQ//wU2HOKP\nwFxgDXApsLDZgK0yXkxEHElgRHytis0vqs4vCI7YlLREcm5wlCzQnVoxQlPSEsm5/kpaIuKTAfE2\ntMqckpZIzkWqabUAz4orIkkb4FkW8Ky4IpI4NQ9FxCueZQHPiisiiRvkugDRKGmJ5J2ahyLiFc+y\ngGfFFZHEeZYFEinub447MYnbNKVjqNt1Ctd8IszeIOl5/UC3e6lsPH+10/jXcqTT+MfPuanxRSk5\nfJOEbqTmoYh4xbMs4FlxRSRxnmUBz4orIonbwHUBolHSEsk7z7KAZ8UVkcR5lgU8K66IJM6zp4du\nn9eLiHv9Ixy1HYqtA/9X4Iw6kfbElhE8Jm5xRSTP4mWBTmynnYOx7cRmAbew/q5dncC52AYXHXEC\nKmmJ5F28p4d7AU8BS4Pz64AjWT9pfQW4AattxaLmoUjexWsejsY2Yy17Lvis+pojgZ8F57GmsYSp\naV0OfARYAUyKE0xEWlCdjvji36G4vO63wySg6cCZwbUdZNA8vAL4CXBVnEAi0qLqZIHClnaUTXt8\nvUueByquYEustlVpd6zZCDAS+BCwGuv7iixM0roP2/JaRNpRvJ7t2cB2WI54Afg48Mmqa8ZXvL8C\n+D1NJixQR7yIxMsC3cCpwO1YQ/OXWCf8KcHPq3eajk
1JSyTv4g8u/UNwVOorWZ0UN1gySWtBV+/7\nTQswqpDIbUWkyuwizCkCsCiptd1zuUb8xK5EbiMiDexRsAPYYRNY/JNp8e/ZhtN4rgXuB7bHxmPE\nrt6JSAuJP40nU2GKUf0kQETaSYsko7A8K66IJM6zLOBZcUUkcZ71aSlpieSdZ1nAs+KKSOK0RryI\neMWzLOBZcUUkcZ5lAc+KKyKJU0e8iHjFsyzgWXFFJHGeZQHPiisiifMsC3hWXBFJnIY8iIhXPMsC\niRR3zS4ON/XZyl1ogAs++CWn8Y/iZqfxV+ziNDxvlQa7LcCeM9zFPnzzZO7j2dNDbSEmknfp7zB9\nPPA4MBf4CzA5bnFFJM/S32F6CfB+4DUswV0CvLfZgEpaInkXr3kYZofpByrePwSMiRNQSUsk7+Jl\ngVo7TO9d5/qTgdviBFTSEsm7eBtbRNni/gDgs8C+cQIqaYnkXZ3mYXGWHXWE2WEarPP9UqxP65Wo\nRaykpCWSd3WyQGGqHWXTfrbeJWF2mN4KuBE4Aev/ikVJSyTv0t9h+nvAcKCc8lZjHfhNUdISybv0\nd5j+XHAkQklLJO88ywJhRsRvCdwNLADmA19NtUQikq023Kx1NfA14DFgKDAHuIN1B4+JiKdKbbjK\nw/LgAHgDS1ZboKQl0hZ6WqQGFVbU4o4FdsWG4otIG2jnpDUUuAE4DatxiUgb6O6MstjLmtTKEVbY\npDUAmAFcDfyu+oddd/aO5C+Mg8L4jkQKJyJVSvOxZ2Kw6MmNErllT/8odZd3E4kZR5jSdmADxhYC\n02td0HWQkpRIJjp2BnYGYIcdN2fx4stj37Kn069VAMMkrX2x4fdzgUeDz74F/DGtQolIdt5hYISr\n30qtHGGFSVoz0QqnIm2rp1UGYIXkV2lFJHE9ni0Sr6QlknNKWiLiFSUtEfFKt5KWiPhEHfEi4pV3\nIw15cE9JSyTnfGseavyVSM710D/00YdGO0wD/Dj4+ePYogtNU01LJOdiPj0Ms8P0h4FtsQ0w9sbW\nitcO0yLSnJhJK8wO00cAVwbvHwKGAZsBLzYTUElLJOdiJq0wO0zXumYMSloi0oyYHfFhd5iuXgom\nys7U60gkaZ17lru9LkaVVjiLDXBm54VO43/zRz91Gp8L3IZfumxZ44tStKznlMYXpWQQh7BpAo/S\n3qXvReLnFl9hXrHuhtBhdpiuvmZM8FlTVNMSybl6zcOJhZFMLIxce37ttGeqLwmzw/Qt2Iau12Ed\n8K/SZNMQlLREci9m8zDMDtO3YU8QnwL+CZwUJ6CSlkjOJTCNp9EO02CJLRFKWiI5p1UeRMQrSloi\n4hXf5h4qaYnkXL0hD61ISUsk59Q8FBGvKGmJiFfasU9rEHAPsAEwELgZ26xVRNpAOy63/DZwAPBm\ncP1M4H3Bq4h4rl2bh28GrwOxofovp1McEclauyatfsAjwDbYqoMLUyuRiGTqHc+GPIRd2GINsAu2\npMT7gUJaBRKRbPXQGfpoBVF74F4DbgX2AIrlD//c9eDaC8YXxjC+MCaJsolIlQeKq3mguBqA/sxN\n5J6tkozCCpO0RmLLT7wKDAY+AEyrvODgrqbXqBeRCKYWBjC1MACAQUzm/O/Pj33Pdkxam2OL0vcL\njl8Dd6ZZKBHJTjuO05oH7JZ2QUTEjXYcpyUibexdBrouQiRKWiI551vzMIG9PETEZ5Xb3jc6IhoB\n3AEsBv6EbdJabUvgbmABMB9ouLWXkpZIzqU4TutMLGltjz28O7PGNauBrwETsZ16vgzsVO+mSloi\nOZdi0joCG3lA8HpUjWuWA48F79/AdvLZot5N1aclknMpjtPajN79DV8MzusZC+wKPFTvIiUtkZyL\n2RF/B/CeGp9/u+q8FBx9GQrcAJyG1bj6pKQlknP11oh/qbiQlcW66yN8oM7PXsQS2nJskPqKPq4b\nAMwArgZ+Vy8YKG
mJ5F695uGIwiRGFCatPV80bUaUW98CnAicG7zWSkgd2K7UC4HpYW6qjniRnOum\nM/QR0TlYTWwxcGBwDtbRfmvwfl/gBGyh0UeD49B6N1VNSyTnUpzG8zJwcI3PXwA+EryfScTKk5KW\nSM614yoPDR3VuO8sNas6NnIWG+CN1wY4jb+6x2l4Sp9xG//QzgOdxn+2x90ivh0J3SeXSUtE/OXb\n3EMlLZGcqzfkoRUpaYnknJqHIuIVJS0R8Yr6tETEK1puWUS8ouahiHhFSUtEvPKOhjyIiE/atabV\nCcwGngMOT684IpK1dk1ap2Hr3bid6CciietZ41fSCrMkxBjgw8BlJDdHU0RaRHd3Z+ijFYSpaf0P\n8E1g45TLIiIO9HT71bXdqLSHYes6PwoUUi+NiGSup0VqUGE1Slr7YHuXfRgYhNW2rgLWWUXpoq5X\n177fqzCIvQqDki2liABwf3E1DxRXA9CfuYnc8523BiZynxpGANcDWwNLgeOAV/u4NvTDvih9VPsD\n36hxw9ITpa0j3CZZqxw/G9j5jQVO4/fvqbcrU/pKa5yGZ/DI7zmN/2zPRc5iD+JQRnZcA/H6mks8\n/3b4q0cPihLvPGBl8HoGMJzau0wDfB3YHXvYd0S9m0bd2MLtb4iIJK+7M/wRTZgdpiHiw74oPXD3\nBIeItJP0+rTC7jAd6WGfX48NRCR53bFGMsXdYTrywz4lLZG8667zs4eLMKtY79txd5gO9bCvkpKW\nSN7V64efXLCj7OJpUe4cZofps4IDeh/21d3jSTtMi+Td6ghHNGF2mK7W8GGfaloieZfe3plhdpiu\nFOphn5KWSN7V69NqQUpaInmnpCUiXlHSEhGvKGmJiFciTD1sBUpaInkXfSiDU4kkrQkTliRxm6ac\nvcjtWkCreqY6jd+Z4vPqMJ5mG6fxr+1+xGn8e9nPWezRTEjmRm7/CkWmmpZI3qlPS0S8oqQlIl5R\n0hIRryhpiYhXNORBRLySxyEPIuIxDXkQEa+oT0tEvKKkJSJeUdISEa+0aUf8UuB1rMtuNbBXWgUS\nkYy9k9qdRwDXA1tjOeQ44NUa1w3DNmqdiK0R/1ngwb5uGnZjixK2J9muKGGJtJfuCEc0Z2L7Im4P\n3Bmc13IhcBuwEzAZeKLeTaPsxhNrR0cRaVHp7cZzBHBl8P5K4Kga12wC7AdcHpx3A6/Vu2mUmtaf\ngdnA50N+R0R80BPhiGYzbMNWgtfNalwzDngJuAJ4BLgUGFLvpmH7tPYF/g5silX3ngTuK/+w9FLF\nBo5D9qdjw0LI24pIFAuLK1lYXAnAxryezE3jPT28A9tFutq3q85L1N7TsD+wG3AqMAuYjjUjv9dX\nwLBJ6+/B60vATVi/1tqk1bHp2SFvIyJxTCiMZEJhJACj2ZNfTbs7/k3rJa0Xi7CiWO/bH6j3bSyh\nLQc2B1bUuOa54JgVnN9A331fQLikNQToBFYBGwIfBCLtjS0iLaxeX9WIgh1l8yP96t8CnAicG7z+\nrsY1y4FlWGf9Ymxz1wX1bhomaW2G1a7K1/8v8KdQRRaR1pfekIdzgN8AJ9M75AFgC6zvqrzL9Few\nvDIQeBo4qd5NwyStZ4BdIhdXRPyQ3oj4l7GaU7UX6E1YAI8De4a9qUbEi+Rdm46IF5F2paVpRMQr\nmjAtIl5R0hIRr2iNeBHximpaIuIVJS0R8YqGPIiIVzTkQUS8ouahiHglj0nr1gUHJXGbpoxja2ex\nAQaUljqNv33nQqfxH+0Y4TT+Uz2HOo0/hDedxR7MW8ncSEMeRMQreaxpiYjHlLRExCsa8iAiXtGQ\nBxHxSq3tJlpYlH0PRUSiGIHt1rMYW6J9WB/XfQtbF34ecA2wQb2bKmmJSFrC7DA9FttLdTdgEraJ\nzifq3VRJSyT3UttiOswO068HNx6CdVcNAZ6vd1P1aYnkXmpjHsLsMP0y8N/As8Bb
wO3YbvZ9UtIS\nyb1YYx7i7jC9DfBvWDPxNeC3wPHYlmI1hUlaw4DLgIlB0M8CD4b4noh4oV5Naybwl3pfjrvD9B7A\n/cA/gvMbgX2ImbQuBG4DPhpcv2GI74iIN+rVtPYOjrLzotw4zA7TTwLfBQZjsyAPBh6ud9NGHfGb\nAPsBlwfn3VgVTkTaRmod8edgNbHFwIHBOdgO07cG7x8HrgJmA3ODzy6pd9NGNa1xwEvAFcAUYA5w\nGjic2i4iCUtotYj1hd1h+jwiVOEa1bT6Y+MnLg5e/0ntsRYi4q3uCId7jWpazwXHrOD8Bmokrau7\nnln7fnJhGJMLw5Mqn4hUmFd8hXnFVwDYMLGZzn7NmG6UtJYDy7ARrYuxqt6C6otO6BqXfMlEZD2T\nCsOZFFQKNmMql027L4G7tkYNKqwwTw+/gj1+HAg8DZyUaolEJGPtVdMC693fM+2CiIgr7VfTEpG2\n1n41LRFpa6kNeUiFkpZI7ql5KCJeUfNQRLyipCUiXlHzUES8opqWiHjFr6eHzteInxvMo3Ll4eLb\nTuM/WHzHafxSaabT+LNKbvevml/8R+OLUjTP8d9/49eE6RZIWq86je86aT1UfNdp/BJuk9Zsx3vu\nLSi+7DR+aySt1NbTSoWahyK51xo1qLCUtERyrzVqUGF1JHCPIrB/AvcRkejuAQoxvh+1gf4KtnO0\niIiIiIiIiIiIiESQ1RLaOwEHAUOrPj80o/iVhjiIKW1kXgYxtgKuw/b5PgsYUPGzWjvfZqnu5pQZ\nWJZBjK8Ci7D/1n8Djqr42aMZxC/bB1hI7595F2yLvLQdCxwTvFYfx2QQv21kOU7r2BqflbBhF5tn\nEP9ybAu0h4CTsUfFRwArga0ziN/XY+IO1t24Mi31/mEYlUH8LwC7A28AY7H/F2OB6RnErjQdq9nd\nHJw/RjZDdg6n/vCCGzMoQ1vIMmldB1wDrKn6vAMYlEH8TYGfB+9PBU4A7sX+MmVhJVbDqGXTDOKP\nwn5Za80buT+D+B1YwgJYio0tmoH9g5HEeMEonq06z2JI+L9mECMXskxa84ALqP0v/kEZxO+PJcfy\nZMOrsX0dbwc2zCD+EuzPWStxZdE8uxXrS6rVFLsng/grsKbYY8H5G8BhwC+ByRnEL3sW2Dd4PxBr\ntj6RQdzT6W1ZlJXPS8CPMihDW+jMMNYT2F/c12r8bCbwQsrxB2F/3qUVny0B7gOmAL9OOf4a4CUs\nUVbrwZqtabqZ9WsYZTNSjg1wF/A6vbUtsP8mNwF30nfZ0ijH2VgCPR14E9vbM+31WQ7BkmTlsUHF\naxb/cEiCBjqOv4Hj+Fnalt7m+AFYTWNYTuL3xzYeFonkHmBcxflewNwcxT8O2Dh4/12sprFbhvEf\nx355twUWA+cDt+Uo/kzc/iO1A1azXBCcTwa+4644EsYhwJPAl4H/xPpYsvyldR2/3Kf3Pmyy+WHA\nwxnGL/dp/TvWLKr8LA/xfw3Mwv7BOD04vp5h/HuBven9M3fQm8AkBBdL09wOfBG4A+vj2ZXa/Tzt\nGr8neD0MuBT4P+AHGcZ/F/gU8Bl6n5wO6Pvytov/dHD0wx5MlDvCszKEdfsvS/i2NkwOfReYD0wF\nTsEGHB6Wo/i3YoNJn8H6cgZhTaasTAR+AnwyOB8PnJGj+K79AWsal2taHw0+kxY2HRhccb41VuvJ\nS/wNsYG22wXnmwMfzDA+WJ/OFGASbh6CuIx/d43jrgzjb4P1ab2JPTH/CzbIVkLKelBfpY2xqvGq\nnMTfGHvk39fI+KwWK/8INsh2SXA+HqtxZtUZ7jr+HhXvB2H/gHQD38woftlQrIn6esZxpQl7Yp3R\nfwuOx1n3L1K7xr81eF2KNQ0rjyV9fCcNi7DmSdk2wWd5iV/LrAxj/RfrDvEYDvwww/jShHnAfhXn\n7yPbIQeu47tW/QvaUeOzdo4/ouIYiU1tyjJp
PlbjsyyfnnrPxdPDbmwUetlMst0OxFX8RsMqHsmg\nDABzsKbYb4LzjwGz6V1pIO2Ju67jP0Lv08JurOZ7csoxK/Vj3elkg3E/uNorLvq0yh3h1wbnH8f+\nB5an0aT9y+sqfpH6j9YPSClutV8Fr+WyVD/yT3ttLdfxKxNGvc/Scga2usjl2J/9JOAW4NyM4nvP\nRdIqsu5f0uq/tGn/8rqOL249wvq13lqfpeU87Inlwdjfuz8DB2KDbUVanovF/1xPI3EVf3NsPa8n\nsQS1e/BaCD7LSq3+qywWwWwbWda0Tq94X6umk/bSHK7j1/IoNiI/S/dij/d/HsTuwAbbTmzz+Cdi\na1rtgfWhla3Cmqxp96V9EfgS9rT06YrPN8LGah2fcvy2kWVH/EZkO12i1eLXssJBTNfTSFzFvzI4\njiWbpXiqXYONfD8H69cqVxhWAf9wUB6J4CpsbErZCOCKHMV3zfU0EtfxNU5KIqs1TqXWZ+0afwds\novQd5HMaiev4GiflORfjtDqw2k152soIsl1B1XX83wI/Ay6jd8WHLJuta+jdxqs8jWRc3W+0V3yN\nk5LIPoONQP4BVi1fFHyWl/hzMoxVS61aRZZlch3/DKx2dzLwueB9nlaZ8J6LmtZV2F/SA7EaxtHY\nPnR5if97bAHCG4F3Kj5Pe8L0TsAEYBNs9Hn5qenGZLMbkuv4Zedi07bK46S+j62xJp5wucpDXi1l\n/eZgCVvtIE1HYgn6cGwEdtkqbHu3tLcRcx1fRDw1Nefx38AS5SqsprsGLQ/jlX6uC5BDc7Dm4fBG\nF6bkGKxJNgB7ircS+HSO4g/FxuxthHXCHwNcnGF8Ee9sh22o8RRwPbbRRpbN9PLSzkdjG6VuQrZL\n87iOX0uWQ14kJhcd8Xn3V+AsbL7dYdhs/zXB64Wk3yFf/n9+GHADtnlulkMuXMc/tuJ9P2wOYtob\ntUqClLTcmIItSfIhbErJNdhihHdhOx+n6ffYBOG3sflwo8huWZZWiF+5iUl5Pa0jM4wv4p05WHL6\nFOtvGnpTRmX4F3oH1G4IvCejuK7jd5LtHoeSAg15yE55lYlOao+Ed7HKxCXAFxzEdRl/FrZPgHhK\nzcPslFeZ2AH7pbkF+0cj6x2mK7n+5XURfyZwEfYQ5J8Vn2e13LWId+7DEljZRqy7Zn2WXI8EdxG/\nSO29D0WkD4tYd9rKINxvoZUntWYepD0bQRKk5mH2rsKagzdizcOjsMXpsrID8A1sOZjy//8SNhcz\nD/FvYP314H+LDX0QDyhpZe8/gD9iey+WsCWAs1zPyfXSOK7ilydsD8PthG2JSUnLjTm4W6JmNZY0\nXHEVf3tssvYmwWvZKuDzDsojTdKQh/zpAl4i+6VxWiX+VOCBjGKJSAKWAs9UHUtyFP983E7YFhGJ\npBUnbEsEWpomf1wvjeM6vusJ2xKTklb+fAIYjU1ncbE0juv45Qnbu2PNw6wnbItIk/oBRwDPA8uA\nadjORHmI73rCuIhENAWYjo3E/zHwXmzAZ1aL4bmOX3ZJxvFEpAmul8ZxHb+SNmn1kMZp5YfrpXFc\nx6/ldqxPTTyiEfH54XppHNfxa1HCEvGA66VxXMffAbgUuIPeZWnuyjC+xKSaVv6Mwub/la0OPstL\nfNcTxiUmJa38cb00juv4rieMS0zqiM+n3eldGudesn+K5jJ+F24nbEtMSlqSN0tZvzlYQquXiohI\nGjobXyLSVuZgy9I8heYcekkTpiVvXE/YFhFpiusJ49Ik1bQkj6Zg04bOB2YAH8PWitcgUxFpOa00\nYVuaoLa85EUrTtiWJmhEvORFK07YFhFpyPWEbYlJHfGSN64nbEtMah5K3riesC0xqSNe8sj1hHER\nEREREREREREREREREYnt/wGyBtFvcDfoPQAAAABJRU5ErkJggg==\n",
"text": [
"<matplotlib.figure.Figure at 0xba560b8>"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note how, in the first three rows of the image above, the similarity between the two clpx papers is apparent, as is the similarity among the first three dyn papers. The last dyn paper is somewhat different, which is to be expected since it is a structure paper while the other three dyn papers involve more microscopy. In terms of comparing the papers, singular value decomposition allowed us to reduce the 5657 different words found in the papers to 6 values that are pre-sorted in order of importance!"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Quantifying similarity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we'll look in more detail at how similar each paper is to the others. I've defined a function to calculate the distance between two column vectors of $V$, with each component weighted by the corresponding singular value in $\\Sigma$. For $\\vec{v}_i$ and $\\vec{v}_j$ the function calculates $\\|\\Sigma (\\vec{v}_i - \\vec{v}_j)\\|$; because $\\Sigma$ is diagonal, this is the same as multiplying the difference element by element by the singular values. The function is applied to every pairwise combination of $\\vec{v}_i$ and $\\vec{v}_j$, giving a measure of how similar two papers are (smaller values mean more similar)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def dist(col1, col2, sigma=sigma):\n",
" \"\"\"Return the norm of (col1 - col2), where the differences in\n",
" each dimension are weighted by the values in sigma.\"\"\"\n",
" return np.linalg.norm(np.array(col1 - col2) * sigma)\n",
"\n",
"# Build the matrix of pairwise distances between the columns (papers) of V\n",
"dist_df = pd.DataFrame(index=v_df.columns, columns=v_df.columns)\n",
"for cname in v_df.columns:\n",
" dist_df[cname] = v_df.apply(lambda x: dist(v_df[cname].values, x.values))\n",
"# Show the distance matrix as an image, labelled by paper name\n",
"plt.imshow(dist_df.values, interpolation='none')\n",
"ax = plt.gca()\n",
"plt.xticks(range(len(dist_df.columns.values)))\n",
"plt.yticks(range(len(dist_df.index.values)))\n",
"ax.set_xticklabels(dist_df.columns.values, rotation=90)\n",
"ax.set_yticklabels(dist_df.index.values)\n",
"plt.title(\"Similarity between papers\\nLower value = more similar\")\n",
"plt.colorbar()\n",
"dist_df"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>clpx1</th>\n",
" <th>clpx2</th>\n",
" <th>dyn-lis1</th>\n",
" <th>dyn-steps1</th>\n",
" <th>dyn-steps2</th>\n",
" <th>dyn-structure</th>\n",
" <th>tcell</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>clpx1</th>\n",
" <td> 0.000000</td>\n",
" <td> 0.044530</td>\n",
" <td> 0.091754</td>\n",
" <td> 0.077374</td>\n",
" <td> 0.086122</td>\n",
" <td> 0.074950</td>\n",
" <td> 0.082144</td>\n",
" </tr>\n",
" <tr>\n",
" <th>clpx2</th>\n",
" <td> 0.044530</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.090552</td>\n",
" <td> 0.075129</td>\n",
" <td> 0.083906</td>\n",
" <td> 0.072627</td>\n",
" <td> 0.079379</td>\n",
" </tr>\n",
" <tr>\n",
" <th>dyn-lis1</th>\n",
" <td> 0.091754</td>\n",
" <td> 0.090552</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.065258</td>\n",
" <td> 0.071804</td>\n",
" <td> 0.079625</td>\n",
" <td> 0.096965</td>\n",
" </tr>\n",
" <tr>\n",
" <th>dyn-steps1</th>\n",
" <td> 0.077374</td>\n",
" <td> 0.075129</td>\n",
" <td> 0.065258</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.042777</td>\n",
" <td> 0.068084</td>\n",
" <td> 0.086867</td>\n",
" </tr>\n",
" <tr>\n",
" <th>dyn-steps2</th>\n",
" <td> 0.086122</td>\n",
" <td> 0.083906</td>\n",
" <td> 0.071804</td>\n",
" <td> 0.042777</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.073860</td>\n",
" <td> 0.093479</td>\n",
" </tr>\n",
" <tr>\n",
" <th>dyn-structure</th>\n",
" <td> 0.074950</td>\n",
" <td> 0.072627</td>\n",
" <td> 0.079625</td>\n",
" <td> 0.068084</td>\n",
" <td> 0.073860</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.081524</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tcell</th>\n",
" <td> 0.082144</td>\n",
" <td> 0.079379</td>\n",
" <td> 0.096965</td>\n",
" <td> 0.086867</td>\n",
" <td> 0.093479</td>\n",
" <td> 0.081524</td>\n",
" <td> 0.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
" clpx1 clpx2 dyn-lis1 dyn-steps1 dyn-steps2 \\\n",
"clpx1 0.000000 0.044530 0.091754 0.077374 0.086122 \n",
"clpx2 0.044530 0.000000 0.090552 0.075129 0.083906 \n",
"dyn-lis1 0.091754 0.090552 0.000000 0.065258 0.071804 \n",
"dyn-steps1 0.077374 0.075129 0.065258 0.000000 0.042777 \n",
"dyn-steps2 0.086122 0.083906 0.071804 0.042777 0.000000 \n",
"dyn-structure 0.074950 0.072627 0.079625 0.068084 0.073860 \n",
"tcell 0.082144 0.079379 0.096965 0.086867 0.093479 \n",
"\n",
" dyn-structure tcell \n",
"clpx1 0.074950 0.082144 \n",
"clpx2 0.072627 0.079379 \n",
"dyn-lis1 0.079625 0.096965 \n",
"dyn-steps1 0.068084 0.086867 \n",
"dyn-steps2 0.073860 0.093479 \n",
"dyn-structure 0.000000 0.081524 \n",
"tcell 0.081524 0.000000 "
]
},
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAWkAAAFSCAYAAAAwxi4gAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm8XdP9//HXzU1QYmxLSUIQQ6mqIugXvUWLNKWmqrZf\nNRTVb8rXVF/V1k0HOn1b1A9pzYqoqY0hDS03iWqR0ZwKriG+NZQQQ5Cb8/vjvbaz77lnumfY+wzv\n5+OxH/fsca1zyOess/banwVmZmZmZmZmZmZmZmZmZmZmZmZmZpbjq8C0Cs/dBXgstt4L7F5FXZYA\no6s4P66X6upiZpaYnYF7gMXAv4G7ge3qUM5TwG41utZlwI+qOL+auvRWca5Z4oamXQGrymrALcAx\nwB+AFVEL+J00K5VjKLAs7UrEZICOtCvRBBrtv5tZU9oOeLXI/sOAmbH15cCxwOPA68APgY2Bv6OW\n+GRgWDi2C3g2dm689To2nPMq8Dzwm9h5UTnfCuU8Edu2MXA08C76IlkCTAFOBq7Pqfu5wNkF3tdT\nwP8ADwOvAJegL6jIeGBeqN/fgK3C9iuBPuCtUPYpqFV/Ytg/IlZ3Qn3/XcZ1AdYDbgBeBJ4Evh3b\n142+RC9Hn/tDwLYF3huhDt9Gn91LwM/JfrFsDNwJvBz2/R5YPXZuL5V9NtG53wEeAN4GOoFTgedC\nvR/Dv0LMBmVV9I/1MmAvYM2c/YcxMEjfBAwHtkCB8k7UV7wa+od9aDi2i8JB+pMoUA8BNgAeAY7P\nKWcasAbZALEc2Ci8vhR9QUQ+ArxBNtgMBV4Atsn/tulFgWREeM93k+0+2Sacuz0KbIeGukdfIrld\nJYejLwqArwAL0ZcVwBHo8yp13SHAbOB7oe4bogD7uXBuNwp6e4Vzz0RfcoUsB/6KPr9RwALgyLBv\nY9QfPwz4EDAd+HWNPpteYE44d0VgM+AZ9N8HYH2y/w3NrEybo6D3LPAe8Cdg7bDvMAYG6Z1i67NQ\nazLyS7L/4LsoHKRz/TdwY045XTnH5Abp3D7pqcA3wuvxqLVZyFOoRR7ZGwVXgAvo/wUAagHuEjs3\n/j42Ri3OjnDu0WTf9+XovRW77q7ADsDTOftOQ61YUJC+PbZvC9SaL2Q52QAP+vXzlwLHfhEF1ki1\nn81hsX1jUFCPvhQsBUPSroBV7THUGhwFfAz97C7UTQD6Rxd5O2d9KWpll7Ip6gv/P+A14CfAB3OO\neTb3pBIuB74WXn8NdU0UE7/+M+h9g1r2J6Gf89EyMrY/1xPAm8AnULC6BXXhbIoC8PQS11037Fsv\nZ99pZL8sof/n/BawEsX//RV6f+uglv5z6LO/kuKf/WA/m/i5C9GXVHeo/zXo/VqCHKRbywIU7D5W\n4fmZMo+7AHVxjEFdFKcz8P+lYtfKt+9PwMdR3T8PXFWiDuvnvF4UXj+DvjTWjC3DgWuLlD0dOAi1\nFp8P64eFc+eVcd1nUCs0vm819IugUJmlFHp/Z6J+9Y+hz/4/GfjZV/rZ5KvrNejLa4Ow72eDfytW\nDQfp5rYZuuk1IqyPAg6heH9nro4Cr4sZjm68vYW6W44dRHmgVllu3+bb6Mbb1cC9qKVYSAfwX+h9\nr4W+JKJA8zvgm6jPvANYBQX96BfCC6iLI246MAGYEdZ7wvpMskGr2HXvQ5/Hd4APoBtuHyM7FLKS\n0SQnk+2TPi72/oajlv/r4f2fknNeB7rxWclnk2tT1DW0Irp/sRR9QViCHKSb2xLUH3ovuvH2d3TT\n6KSwP0P/llG+Fl3u/lLHgwLIV1Cg+C36+T2Yci5G/bKv0r8vO/oVUKqrI4Na2rej7orHgR+HfbOB\no4DzUF/z42RvhgKchW7wvUp2VMcMFKiiIP03FGxnxM4rdt3lqNX8CTSy4yX0uawWq2/uZ1Kqdf2n\nUOZc1AUT9W9PRDduXwNuRl9suZ/91QzusylUlxXR5/
US6tr6EOrGMbM2NQq1EsvpF29l8Zusg1XL\nh46sAbglbY1iCPoFcA36VWBm+IlDawyroL7ip9BY4nZXyY1GMzMzMzMzM2so3ZQe0WEDVZP+9SH0\nsA74829ovnHYXHppzTzK7oOtzKro/4lKfIzsEEN//g3MQbq55Btvm4Za33Bup9ShnWlXII9qPn/H\nkDrzB9waVkT5OhaF5dfACmHfdGD/8Po/0BjccWF9d/SwROQI9Lj3K8Cf6f94cTz96II8dZiKngKM\nm48SAAGcgx5Lfg0ldtq5wHvpYmDej16yvyA6UCrOhSgD4LUMzP5XrcPQAy2/Qg+9LAQ+hXKkPING\nosQfkFkduAKlKe1FT/l15LnWy8AZ6L/NL1FSpn+hx+xXKlCXMei/4WL0UMnk2L74eOrLgPOB21A3\nyEyUve6c8B4eRQ/bRHopPJ76OvTwyuJQ9haxfZeF+t6Ghkp2FbiG1YiDdGs4HT3qu3VYxqKn6kCP\nOHeF159GT8TtGlvvCa/3RU+T7YeeLJuJxizH7YvSXG7BQFejR9IjW6Agf2tYvy/Ubc1w7HVkv0hK\nif+COA7YJ7yHdVEA+n8Fzluf/smEcpcvFylzLPqSWQt9Dn9AT/ptjBJAnQesHI79Dep62BB9poei\ngB6/1hMo4dKZKP/FGPR5jEGPcP+gQD1+hL4w1wjHnVukzgeh/xc+hHJ2/wO4P7yH69EXRaTYL7Jb\nQ70+jDLs5eZROSTUazj6AjKzoNDTZAvpP774c+FYUAt0fng9FeUljnJ7TCfb0p2KWtKRIejpv1Fh\nPV/60bhVUcsqOv4nwEVFjn+FbML5brI3rroY2JKOv+9H6P8ZrIsCUi0bHIcB/4ytb4Xe/4dj215G\nCaE6UV6LzWP7jgbuil0rnsa0A31O8ScKd0JfnvlcDkwim58lLjf966TYvgkoP3j8PcQniIh/pt0U\nvnG4Rihn1bB+WVgsIW5Jt4b16B8I4ukp/44S5ayNfu5egQLpB1GrOLp5tAHZn8avkp2RJB4ciqUf\nXYJaYFFr+sv0b4GdjALs4nD91VGLb7BGo0T8UT0fQdM8rVPBtYrJTekK6m6IbxuO3sMwBn7+hT63\nD6MW+Gyy72EqhT+L76DAfh8akXF4geNA3S2RpTnrUX1L6QR+ir74XyP7ZR/VL8Pg09BaFRykW8Pz\n9B+KtX7YBspUNxvlBX4QTQxwD3oEeyFq0YICy9H0T2O5CvrJHCl10/IaFKR3Qn2sUWtyF5St7SDU\nMlsTBYB8N6zeJNuNAAoa8RbsM2RnoYmWlVEfaq710ZdHoeWQPOcM1svoMx2dU248i18m5/i3UXdQ\nVP81yCZjyvUC+u8yAs1leT71nR3lK6g7aXf0Rbph2N5ON3cbioN081kBBcBoGYqC4/dQa+dDqH8z\n/vN1OrqpFyWw70E/h6fHjrkQ+C7Z/ubVUVAdjNtQi3wi/W9wrYpauy+H+v+AwkHpn+F9jUMt1O/R\nf46+C1G/bnRT88MoqOTzTCi70JLb516JPtRf/RPUUt0AOAHNPZjPcpQy9GyyXz4j6D8TS9xBKDE/\n6FdIJlwjV62C6HDUffMK+pI+s07lWJkcpJvPbah1HC0/QKkoZ6E0pQ+E1z+OnTOd/qk4Z6B/gPFU\nnH9EN7Qmo1bug8Cesf3lDP17F6Ue3R3dHIz8OSz/RKMK3kYBNH7t6PqvoVEkF6HW6Bv0/3l9DpqT\n8HaUKvXv6MZcLQ02tei30S+AJ9EN16tQH3Gha52KfsX8A73fO1CXVD7bheOWoPSlx5EdG10qzWy5\n7yF+7BWo62YR6l75exnXNTMzMzMzMzMzMzMzMzOzevJwmoptkOn//IKZJWQ6VeYMWQkyS8s//FX0\naH0qHKQrl6HjjCrO7oGOropPP6RvVOmDiniw+09s1b1vxecfPuQbFZ97Bf2zEw3WZ4+p4mSgexZ0\nb1f5+Znrqiz/Le
heufRxhXQcX2X5d0H3Z6q4wO1Vlv8MdK9f+ri8hg6jY/p7UH3syvy49DHA+0lw\nUouVnuPQzNrSsLQrUCYHaTNrS80S/Jqlni1odKqlr921WWplb51aydK1Xulj6lp+yk24rtEpl796\nuuVH3JK24jpGp1r8Ol2blz6oThykUy5/w9LH1LX8BgnSzRL8mqWeZmY15Za0mVkD+0DaFSiTg7SZ\ntaVmCX7NUk8zs5pyd4eZWQNrliDdqkn/u9H0UJXYFc2Q/B5wQK0qZGaNZWiZSwF7AY8Bj6NJHPI5\nN+yfD2wT2348mlTjofC6qFYN0tXMHPE08HX6zyxiZi1mWJlLHp3AeShQb4HmyvxozjHjgDHAJmiO\nygvC9o8B30CTQG8NjAc2LlbPVgnSh6Jvq3koNURcD5pPbi769to+bD8b+H54vSfZ+f6eDsflm0fO\nzFpEFS3psWj6s170i3sykJsIZx/g8vD6XjTZ8EdQML8Xzebeh+LO/qXq2ey2BE5HM1S/gmZfPo5s\nazqDRttsg2atvgTYCjgNuB+4G82bt3eitTazVFUxBG8E/efdfA7YoYxj1kMNwB+jrHpLgc8D9xUr\nrBWC9G5otuZXwvqreY6JZoWeiWapXg1NYnpU2HY88FR9q2lmjaSKG4fldqfmy5z3GJrw+XY0efFc\nSvxqb4UgnWHwaQSjD/njwEvoW6/YcQX29sRWRqf+qLdZK+p5TQsAQ/pqdt1Cwe9+YFbxUxcB8VzB\no1BLudgxI8M20K/5S8LrM4FnKqlnM7kTuAn4FWpNR8m5O2J/D0Z90zsDi4ElwAbAiagbZCrwR/r/\n7OigVPCvIh+0mZWna/VYvo+hnUx8uja3iwq1pD8VlsiFAw+ZhW4IjgaeR/HlkJxjpgATUH/1jiju\nvBD2rQ28CKwP7MfArpJ+WiFIPwL8BHXA96GfD73075NeiobVDQWOCNsvQsP0/gUcCVwGbIfuuN6I\n+rbHo+F8W9X7TZhZsqoIfstQAJ6GRnpcDDwKRNNRTAJuQyM8FqJujcNj518PfBDddPwW6nqtRz0b\nyhUMHNURdyVwQs62z8Zez0FdH6BfO9VNe2JmDa/Kh1mmhiVuUs76hALn7jqYglolSJuZDUqzPHHY\nDkG6mtnczKxFfaDc6LesrtUoqR2CtJnZAEMdpM3MGtewzrRrUB4HaTNrS2W3pFPWJNU0M6utYU0S\n/ZqkmmZmNebuDjOzBtYk0a9JqmlmVmMrpV2B8jhIm1l7cneHmVkDa5Lo1yTVNDOrsSaJfk1Szcb0\n5WXrp1b25M6iKWjrruTsmfW0TpqFQ8fa6Zaf9vtnvRTLrmUXhbs7zMwaWJNEv1aZiNbMbHCqmIkW\nzRT+GPA4cGqBY84N++ejyUUipwEPo/kOrwZWLFZNB2kza08rlrkM1AmchwL1FmhWlo/mHDMOGINm\ncDkauCBsH43mVv0kmkykE/hysWo2SYPfzKzGKo9+Y9GMK71hfTKwL5qdJbIPcHl4fS+wBrqb8Dqa\nkWVlNJPUymTnPszLLWkza0+Vd3eMAJ6NrT/HwMmsCx3zCvC/aPLZ59Hch38pVk0HaTNrT51lLgNl\n8m4dKN9E1hsD/426PdYDhgNfLXYRd3eYWXsqEP16XtZSxCL6z4M6CrWUix0zMmzrAu4B/h2234gm\nJ79qkNU0M2txBaJf10e0RCb+c8Ahs9ANwdGoy+JgdPMwbgqaiHYysCPq1ngBWAB8H/gAsBTYA7iv\ngmqambW4ogPfilqGAvA01CFyMbppeEzYPwm4DY3wWAi8CRwe9s0DrkCBfjkwB/htscIcpM2sPVUX\n/aaGJW5SzvqEAuf+PCxladUbh93ASRWeeyIaaD4f3XVN79lvM6ufym8cJqpVg3S5d1/zmQNsC2wN\nXM8gvvHMrIlU98RhYlolSB+KWr5Rf09cD3A2MBc9hrl92H426sAH2BOYHjt+aXh9
L7ora2atpkmC\ndANUoWpbAqcDO6GB4msCx5FtTWfQndRtgF2AS9DjmKcB9wN3A+cAe+e59pHoBoCZtZomiX5NUs2i\ndgP+gAI0wKt5jrkm/J0JrBaW19Ez9DNR5s2ncs75Gnq+/oQa19fMGkED9DeXoxWCdIb8T/aUOgfg\n48BLDHykcw/gu8Cu6Dn7vB6c+Kf3X6/96c1Yp2vzQVbDzErpeRF6XgorHX21u7DnOEzMncBNwK9Q\na3qtsL0j9vdg1Ne8MxpUvgTYAI3k2AYNpfkjGlS+DXAh6qcu+tzRVmfsW7t3YWZ5da2tBYDOTiY+\nvLw2F3ZLOjGPAD9BN/760A3CXvr3SS9FozaGAkeE7RehYXr/Qn3Pl6Gbij8HVkEjOwCeBr5Y37dg\nZolrkujXJNUs6QoGjuqIu5KBfcufjb2eg7o+crebWatqkujXJNU0M6uxJol+TVLNqnwm7QqYWQNy\nn7SZWQNrkujXJNU0M6uxyrPgJcpB2szaU5NEv1bJ3WFmNjjV5e7YC3gMeBw4tcAx54b989HzFwCb\noWHC0fIaSmNRtJpmZu2n8huHncB56MnkRSgH0BT6zxY+DhiDZnDZAbgAzdCygGzAHhLOv6lYYW5J\nm1l7qrwlPRbNuNKL0kZMBnIfP94HuDy8vhdYA1gn55g9gCfoP6v4AA7SZtaeKg/SI+gfWJ9jYP6f\nfMfkpj3+MnB1OdU0M2s/lUe/cicVyU38Fj9vBeALFO7Pfp+DtJm1pwJD8Hoe1FLEImBUbH0UaikX\nO2Zk2BbZG5iNsnAW5SBtZu2pQPTr2kZLZOLkAYfMQjcERwPPoyybh+QcMwVNRDsZ3TBcDLwQ238I\n2Tz3lVTTynHE0G+kVvZx1cziWAOf6vhBamX3ff2HqZUN6J9mmm5Ot/iJRcci1NeQYTW8WOWjO5ah\nADwtXOViNLLjmLB/EprRaRy6wfgmcHjs/FXQTcOjyinMQdrM2lN10W9qWOIm5axPKHDum8CHyi3I\nQdrM2lOTRL8mqaaZWY05C56ZWQNrkujXJNU0M6sxT0RrZtbA3N1hZtbAmiT6NUk1zcxqrEmiX5NU\n08ysxtzdYWbWwJok+jVDqtJu4KQaXKcH+GR4fSuwWpFjJ6DHOZcDa9WgbDNrNNXNzJKYBqhCSbXK\nUhG/zudLHHs3ypDQU6OyzazBZJpkItpGbUmfjqaZmYnmBOtEaf0im8TWe1FrezbwQDi+lF7UQl4F\ntarnAQ8CXwr75wFPV159M2t0fUPLW9LWAFUYYFuU+m9rYBgwBwXg18K2+Sij1CXh+AzKybotcCxw\nMqWzS0Wt6r1QjteoZV2sC8TMWkgjBOByNGJLehfgRmApsATlZQW4CAXnIajFG5925sbwdw6DSyT5\nAPBZ4KfAzsDrlVbazJrLss4hZS1pS78GA2XoP+1MR9h2A5rNYDxqWb8aO+ad8LeP7K+DaWjK9N8W\nKetxNHPvg8CPge8PpqJXZDLvL/MzKSd4NmtRTwF3heXOvr6aXbdv6NCylgL2Ah5DMaTQFFjnhv3z\nyc4QDpqU9nqUg/oRNClAQY3Y4J8BXAachbo7xgMXokA8DU2NfkQZ19mzjGPWRcH+KtSdcmSeY3Ln\nKXvfoR0Fd5lZjWwYFoAhnZ30LF9ek+v2dVY8ULoTOA8l7l8E3I9+8T8aO2YcMAbdP9sBxa0oGJ+D\nJgU4EMXgVYoV1ohBei5wLfr2eRG4L7bvamA/4PbYtkzO63KatNExWwG/QEPt3gO+GbYfB5yCpmB/\nAN1cPHowb8LMGts7rFDmkW/nbhiLhuj2hvXJwL70D9L7AJeH1/ei1vM6qBt3F+DrYd8y1EAsqBGD\nNMCZYcm1M7phGA/EG8VezwZ2K3DNz+Q553b6B/zIuWExsxbVV3n4GwE8G1t/DrWWSx0zEnXJvgRc\nigZCzAaOB94qVFgj9kkXchPwNfRTwcysKn10
lrXkUe4NqNz+0AxqGH8SOD/8fRP4n2IXadSWdD77\npV0BM2sdBQIw/+h5h3/0vFvs1EXAqNj6KNRSLnbMyLCtIxx7f9h+PS0UpM3MaqZQkN6+a2W271r5\n/fVzJ76Re8gsdENwNPA8eq7jkJxjpqD0EpPRDcPFwAth37PApsA/0c3Hh4vV00HazNrSssrT4C1D\nAXgaGulxMbppeEzYPwmN3hiHbjC+iZ7xiHwbjShbAXgiZ98ADtJm1paquHEIMDUscZNy1icUOHc+\nsH25BTlIm1lberfsIXjpcpA2s7ZURXdHohykzawtVdndkZjmqKWZWY0VGt3RaBykzawtOUibmTUw\nB2kzswbWLDcOnWuzcpnMMaUPqpt1UiwbWH5oev/rDN30B6mVDbD3sq1SLf/Wvx2Yavlv7ZVi4cOG\nscri96D62JW5JbN7WQeO7/hrLcqrmFvSZtaW3N1hZtbAmqW7w0HazNqSx0mbmTUwd3eYmTUwB2kz\nswbWLH3SzTR9lplZzbzLimUtBewFPAY8Dpxa4Jhzw/75wDax7b1oguu59J9oOy+3pM2sLVXR3dEJ\nnIdmVVmEpsKaQv/ZwscBY9AMLjsAF6AZWkBzHXYBr5RTmFvSZtaWqpiIdiyacaUXeA9NkbVvzjH7\nAJeH1/cCa9D/EbSyH45xkDaztrSMzrKWPEageQojz4Vt5R6TAf6C5ko8qlQ9k+ju6AaWAP9bh2vv\niyZzfLTUgYN0CfB54EUg3WeAzawuqhgnnSnzuEKt5Z3RBLYfBu5AfdszC10kiSBd7huqxH7AzdQ+\nSF8K/Aa4osbXNbMGUahP+ome53iiZ1GxUxcBo2Lro1BLudgxI8M2UIAGeAm4CXWfFAzS9eruOB1Y\nEAreDHW0z47t3yS23ota27PRHc/NClzzp2jq8/nAL4CdgC+E13OBDYGN0eSQs4AZsWtdBlyIOvgX\noFYywJaov2huuO6YsH0m8Oqg3rGZNZVCfdCjuzZg9+5Pvb/kMQvFsNFoxu+D0Y3DuCnAoeH1jsBi\n4AVgZWDVsH0V4HPAg8XqWY+W9Lao0lsDw4A5KAC/FrbNR1OYXxKOz6BvlG2BY4GTGdhP80Hgi8Dm\nYX014HX0QdwM3Bi2/xVNq74Q3VE9H4hSXa2PZugdA9wV/n4TOAe4Gn0WHu1i1ibeKTy8rpRlaCbw\naagBejH6NR/lxZwE3IZGeCwE3kQxD+AjZOPVUOAq4PZihdUjKO0SKrE0LNE3zEWooicCX6L/lOZR\npecA++e55uJwrYuBW8ISifp9hqPW9XWxfdF0wBngD+H1QuBJFPDvQa3+kaEOC8t7i2bW7Kp84nBq\nWOIm5axPyHPek8AnBlNQPYJ0hv4d5h1h2w3AGcCdqGUd7054J/zti9VpGrA26qI4GvXb7A4ciN58\n1EKO+ryHoGAeHzRezHLgGuAfwHj0zXcMamWXpXtW9nXXelrMrLZm9MHMvrCyrK/osYPRzo+Fz0B9\nwGeh7o7xqD/4HRR4LwCOKOM6e8ZerxKWqaj1+0TYvgR1fYC6P55CQfx69OWwFern7gAOQuMWNwrL\ngvD3SXSTcP1wfPlBertyjzSzSu3aqQWAYZ2c+c7ymly3WYJ0PW4czgWuRX3Pt9H/scerUQs23geT\nyXmdbzTIqqjveT66qXdC2D4ZOAW1zDcEvgocCcwDHkIDyqPrPhPqErWY30WB+6FQ5y3Jjua4Bn0Z\nbIrGOkb9SWbWIqoYJ52oet0oOzMsuXZGNwzjgXij2OvZwG55zvsXuhGY6x4UXOP2LlCnO9CNybif\nhSXXIQWuYWYtwvmkB7oJtXbzBWEzs0S9+/64gsaWZJDeL8Gycrm7wsz6aYSujHI0R3vfzKzG3N1h\nZtbAmmV0h4O0mbUlB2kzswbmIG1m1sB849DMrIEVmb+woThIm1lbcneHmVkDa5buDs9xaGZtqY+h\nZS0F7IWm
vXocOLXAMeeG/fMZmJ2zE+UMurlUPd2SNrO2VEV3RydwHrAHmhLrfpQ3Pz6N3zg0scgm\nKO/QBWiGlsjxwCNkZ2kpyEG6CpnrSh9TLx3rlD6mruVvVPqYetnzvY+nVzgwdegDqZbPz9Mt/uY3\n0yt7yLDaXauKID0WTRDSG9Yno0mx40F6H5QaGTRF3xrAOmgKrZEoiP8ETYJSlLs7zKwtVZGqdARK\nYRx5Lmwr95hfoxTLZSXGdkvazNpSFUPw8uW8z6cjz/p44EXUH91VzkUcpM2sLRXq7ljSM4clPXOL\nnboIGBVbH4VaysWOGRm2HYC6QsYBK6GZpa4gO7P4AA7SZtaWCgXplbu2Z+Wu7DzZ/zfx0txDZqEb\ngqOB54GDGThRyBQ0F+tkdMNwMZq85LthAfg0cDJFAjQ4SJtZm6pinPQyFICnoZEeF6ObhseE/ZPQ\nNH3j0A3GNymc075k14mDtJm1pSrzSU8NS9yknPUJJa4xPSxFOUibWVvyY+FmZg3MQdrMrIG94yx4\nZmaNq1la0kk8cdgNnFSna+8LfLTG1xwF3AU8DDwEHFfj65tZA+ijs6wlbUm0pMt9OqcS+6EsUo+W\nOnAQ3gNOAOYBw4HZwB01LsPMUta3PP0AXI56taRPBxYAM4HN0FjC2bH9m8TWe1FrezbwQDg+n5+i\n1u184BfATsAXwuu5wIbAxmhYzCxgRuxalwEXomxVC4DPh+1bouQnc8N1x6AB5/PC/jdQcF6v/Ldu\nZs1g2bLOspa01aMlvS16AmdrYBgwBwXg18K2+Whg9yXh+AzwUjjvWPQEzlE51/wg8EVg87C+GvA6\neqrnZuDGsP2vaED5QpQe8Hxg97BvfWB7FIjvCn+/CZwDXI0+i9zPYzTKA3vv4D4CM2t0fcua45Zc\nPWq5CwqaS8MyJWy/CAXnE4EvoYAZiYLsHGD/PNdcHK51MXBLWCJREpPhqHUdTyC6QvibAf4QXi8E\nnkQB/x7U6h8Z6rAwdu5w4HqU9/WNwm/XzJpRXwO0kstRjyCdoX/2p46w7QbgDOBO1LJ+NXbMO+Fv\nX6xO04C1URfF0SiH6+7AgehJnqiFHPV5D0HBPHcGhEKWA9cA/0CZqW5DrfC70C+AG4DfA38sdIHu\nt7Kvu4ZpMbPaehhlxwfo6Our2XXfeXuF0gc1gHoE6RmoD/gsFOzGo/7gd1DgvQA4oozr7Bl7vUpY\npqLW7xNh+xLU9QHq/ngKBfHr0ZfDVqifuwM4CCXh3igsC8LfJ4HfoO6QrVCQvhj9f3F2sQp2r1zG\nuzCzqmzaWVOUAAAUnElEQVQZFoAhnZ1cv7ysNMwlLe9rju6Oetw4nAtci/qebwPui+27GrVgb49t\ny+S8zjcaZFXU9zwf3Yw8IWyfjJJnz0Y3Dr8KHIlu/D2EUgJG130m1CVqMb+LAvdDoc5bopSBOwNf\nAz4Tts9F85mZWStZ1lnekrJ6fZWcGZZcO6MbhvFAHJ+IaTawW57z/oVuBOa6h+yXbGTvAnW6A92Y\njPtZWOLuxjPWmLW+BgjA5UiyvX8Tau3mC8JmZslaljtxSmNKssW4H/AJ4JUEy4wcTnYEiZmZskKX\ns+S3F/AY8DhwaoFjzg3755Md0LASGtI7D933OqtUNZuj59zMrNaWVnxmJ3AesAeaEut+NNQ4/lTy\nOPQsxiaoq/YCNEPLUnS/6y0Uf+9G3cB3FyrMfa9m1p7eK3MZaCx6pqI3HDEZ5RGK2weNJgO1nNcA\n1gnr0eDdFVDAL9q74CBtZu2pr8xloBHAs7H158K2UseMDK87UXfHC2jI7yMU4SBtZu2p8j7pcpPG\n5d6ZjM7rQ/fnRgK7Al3FLuI+aTNrT4VuCs7rgfk9xc5chFIaR0ahlnKxY0aGbXGvAbcC2wEFC3SQ\nNrP2VChIf6xLS+SKiblHzEI3BEcDz6OEcofkHDMFpa+YjG4YLkbdGx8KJS
8GPgB8FhhQQJyDtJm1\np8LD68o5cwJKc9GJ0kg8ip5kBs0afhsa4bEQeBMNAwZYF91QHBKWK1H2zoIcpM2sPVU+BA+UR2hq\nzrZJOesT8pz3IPDJwRTkIG1m7Sn/8LqG4yBdhY7jUyx87RTLBjqm1HNWtOJuG3NgamUDZHKzvSSs\n85QfpFp+36d/mF7hQynROTAItct6WlcO0mbWnirvk06Ug7SZtScHaTOzBuYgbWbWwBykzcwaWHVD\n8BLjIG1m7clD8MzMGpiH4JmZNTD3SZuZNTAHaTOzBuYgbWbWwJrkxuFgZ2bpBk6qQz1Ac4R9tIbX\n+zpKC2hmNtA7ZS4pG2yQrmdWnf2ALQrs66zgeocB6w3yHP+yMGsXlU+fBbAX8BjwOHBqgWPODfvn\nA9uEbaPQvIYPAw8Bx5WqZjlB+nRgATAT2AwFzNmx/ZvE1ntRa3s28EA4Pp+fhkrOB34B7AR8Ibye\nA2yEppP5NZou/XjgUuCA2DXeiL0+NZQ3DzgrHLcdcFW43kqhbmuF47dDHxShvleiKdUvRzMnXA/c\nF5ZPFXgPZtbMKp8tvBM4DwXqLdCsLLm9AOOAMSg+Hg1cECv1BGBLNGPLf+U5t59SLcdt0dQwWwPD\nUMCbjebm2hoF2cOBS8LxGeClcN6xwMnAUTnX/CDwRWDzsL4a8DqabuZm4MbYtYYB24f1S3OuE7Xq\n90bTp49FzxCtgaammYC6ZubkHJ/P5sDO6MfN1ejL4W/A+sCfKdzCN7NmVfk46bFoxpXesD4Zddc+\nGjtmH9ToA7gXxaV1gH+FBdTQfBT94o+f20+pIL0LCppLwzIlbL8IBecTgS+RDaSQDbJzgP3zXHNx\nuNbFwC1hieTOrnttifoB7IG+JKKHPBcXuV4+GfS+ot6nPej/zbYqsDLwVu6J3XdlX3eNhq4NyyjN\nzAalZ7EWAIbU8AmUykd3jACeja0/B+xQxjEj0TyHkdGoG+TeYoWVCtIZ+ge6jrDtBuAM4E7Usn41\ndkwU7Ppi15+G0tTfj5r+Y4HdgQNRi3f3WHlxb8ZeLyPbPTMEWKFAHXPrn+/8lXKOiwfgDvSBv1vg\nmu/r/kypI8ysWl1raAFgaCcTn1pemwtXHqTLvTeXG5fi5w1H3arH07/rdoBSQXoGcBnq5x0GjAcu\nRIF4GupnOaKMyu4Ze71KWKYC9wBPhO1LUNdHIb2oG+U69FNiWNh+B/AD1P/8NrAm+tLIvV4v6ov+\nM/37tnM/yNtRZ/4vw/onUF+3mbWSQkPwXuyBl3qKnbkI3QCMjEIt5WLHjAzbQLHrBuD3wB9LVbPU\njcO5qMthPpr99r7YvquB5SioRTI5r/N946yK+p7no5uRJ4Ttk4FTUMt8ozzn/Q74NAqYO5L99pmG\nuitmhfpGQwQvQ18o0Y3DicA5qDW/LFa33Hoeh4L5fHRz8+g8dTGzZldoyN3qXTCmO7sMNAvdEByN\nftEfTLYrODIFODS83hF1w76AGoUXA48AZ5dTzXL6bAs5GQXcM6q4RjPLZCamWHrKcxxyR3pFZ06s\n5n/bGpT/91SLZ9h3vp9q+X27pjnH4TA6/voeVBe7ADLsXWavxdSOfOXtjYJsJwq6ZwHHhH3RrOHR\nCJA30T28OWiAwgw0Gi2qwGnoF35elY4LvgnYENitwvPNzNJV3ROHU8MSNylnfUKe8+5mkM+nVBqk\n96vwPDOzxuBUpWZmDcwJlszMGpiDtJlZA/Mch2ZmDcwtaTOzBuYgbWbWwJok6b+DtJm1Jw/BMzNr\nYO7uMDNrYE0SpNNNgtDcMpn/SLH0wU4MVmMTr0uv7FOGp1c2wC1vlj6mng7cNd1/tkOn/yC1socN\nG8K7750BtcjdsWaZuTtezZu7IzFuSZtZe2qSlrSDtJm1JwdpM7MG1iRD8AaVMs/MrGX0lbnktxfw\nGPA4cGqBY84N++ejuQwjl6AJAB4sp5
oO0mbWnjJlLgN1kk3ovwVwCP0nrwYYB4xBM7gcjaYajFwa\nzi2Lg7SZ2eCMBRaieVPfQ1P/7ZtzzD7A5eH1vcAawEfC+kz6T95dlIO0mdngjACeja0/F7YN9piy\n+MahmbWpiu8cljnAesDY6nLP68dB2szaVKExeDPCUtAiYFRsfRRqKRc7ZmTYNmgO0mbWpgq1pHcK\nS+TM3ANmoRuCo4HngYPRzcO4KWgi2snAjsBiNKJj0Fq5T3p14NgKz+0GTgqvLwMOqEF9zKyhLCtz\nyXviBGAa8AhwLfAocExYAG4DnkQ3GCcB34qdfw1wD7Ap6rc+vFgtW7klvSb6YC4odWAe8cE3hQfi\nmFkTq+pplqlhiZuUsz6hwLm5re6iWjlI/xTYGJgL3AH8G/gqsBx9uKeF/ecBHwbeAo4CFoTz453+\nTkRl1nKa45HDVg7SpwJboid99ga+h8Y3LkVjFgF+i36eLAR2AM4Hdk+8pmaWgrfTrkBZWjlIx1u/\ne6BHMaP5gRcDw9HdgXjSzRWSqZqZpa85Miy1cpCOyzCwy2IICtbbDDz8/XOK6n4m+7prdS1mVlsZ\netHDfbCsr5Y9j+7uSNsSYNXw+i/A94Gr0G+cNdFjmU8BBwLXoyC+FfBAOKfk/w3d69e2wmY2UAej\n0Wg3GNo5hHeX99Toys3Rkm7lIXj/Bv6GMk3thsYtzkI3EqPhdV8FjgTmAQ+h5+0jmQKvzawlvFfm\nkq5WbkmDgnDcz3LWe9FNxVwTY6+LjmE0s2bVHC3pVg/SZmYFpN9KLoeDtJm1KQ/BMzNrYO7uMDNr\nYO7uMDNrYA7SZmYNzN0dZmYNrDla0q38MIuZWRFvl7nktRfwGPA4SuaWz7lh/3z6p58o59z3OUin\npOe1lMt/Mb2yn0qvaABmpPwr9+GUn1/tWZxuBTIhD0f6Kk7634lSHO8FbIHyQ38055hxwBg0g8vR\nZPPal3NuPw7SKUk9SL+UXtm96RUNwMy+dMt/JN3imb445Qqk/n9ApOLHwsei9Ma94YDJwL45x+wD\nXB5e34vSI3+kzHP7cZA2szZVcUt6BJr2KvJc2FbOMeuVcW4/vnFoZm2q4huH5fYXeUanlPWQnf/Q\nixcvyS09VG8w5b2ec+6OwJ9j66cx8AbghcCXY+uPAeuUea6ZmVVhKPAESnK9Akp1nO/G4W3h9Y7A\nPwZxrpmZVWlvNGn1QtQaBs2XekzsmPPC/vnAJ0uca2ZmZmZmZlYDSc2481FgdzQ7fdxeCZUft3IK\nZZo1tQcTKGN9NGD+buC7wLDYvj8mUH4hv02xbOg/VrVejkN9j38Enga+GNs3N4HyI59Cz9BE7/kT\nwPkJlHsAsH/4m7vsn0D5LcHjpOvvgDzbMmgM5boJlH8Jmg39XjTp7nT0NNTLwAZ1LnutAts7gM/X\nuWwo/iW4dgLlHw1sC7yB7uZfH/6enUDZcWejlvufwvo84NMJlPsF9P96ITcmUIem5yBdf5OBq4Hl\nOds7gJUSKP/DaMwmwATga8AM9A+o3l5GLchC9aq3tVFwejXPvnsSKL8DBWjQY8BdwA3oyzHpBx2e\nyVlPIoPJYQmU0fIcpOvvQeCX5G/V7Z5A+UPRl8HSsP574F/ANGCVOpf9JHqP+QJ1Et0Nt6K+4Hxd\nC9MTKP9F1LUwL6y/AYwHLgY+nkD5kWeA/wivV0DdMI8mUO5JZH81RqL1DPCrBOrQ9DrTrkAbeBT9\nY82XUulu4Pk6l78S+u/cG9v2JDAT2Bq4so5lLwdeQl8KufpQF0w9/YmBLcjIDXUuG+BO9LTaG7Ft\ny4GbgL9SuG71qMcZ6AvjJOAt4NvUfybWPdGXQnxZMfY3iS9Ks6qskHL5K6ZcflLGkO1a+gxqSa7R\nJuUPBa5KqCyzpjYd2DC2PhZ4oE3K/xKwWnj9fdSS/GThw2tuPgpWY4B/Ar8g+8huO5R/N+l+IW+G\nfj
k8HNY/DnwvveqY5bcnSrLyX8CZqJ80yUCVZvlRf/zOKDnOeOC+hMqGbJ/0d9DP/Pi2dij/SuB+\n9AV5UlhOTLD8GcAOZN9zB9mAbSX4xmFypgHHAnegftptyN9X24rlR2n2xwO/A24BfpRQ2QDvAl8B\nDiU7qmVY4cNbrvwnwjIE3UiNbtwlZWX633/I0CwTDFpb+T7wELATSsKyAAWtdij/VvTwylOoL3Yl\n1AWQlC2B36CpigA2Itn0kGmXn7apqKsnakkfGLaZNZSzgQ/E1jdArdp2KH8V9FDPJmF9XeBzCZUd\nWRGNZtmKdG7Ypln+XXmWOxMsf2PUJ/0WGs30N/RQj5XBMwckbzX0c29JG5S/GhqCVujJw1cSqAPo\n6cYL0dBDUEv2GJK7eZd2+dvFXq+EvjCXAackVH5kOOpyyU2ib9YQtkc30J4Oy3z6/+NpxfJvDX97\nUVdHfHmywDn1sAD93I5sHLa1S/n53J9gWWfRf8jhmsCPEyzfrCwPArvE1ncm2SF4aZefptyA1JFn\nWyuXv1Zs+RB6VD7JL4l5ebYlObqlqXl0R3KWoaf8IneTTP6ENMsvNcRvTp3Lj8xGXQt/COsHAbPI\nZmKrd6KftMufQ3Y0xzL0y+bIOpcZN4T+qQk+QPoPcjUN90knJ7pxd01YPxj9Txs9ll3vgJVG+T0U\nH+r1mTqUmc9l4W9Ul9whaPXOLZ12+fEAWWxbvZyKMi9egt774cAU4GcJld/UHKST00P/f5i5/1Dr\nHbDSLt/SM4eBv2rybauXn6MRJXug/+f+AuyGHu4xswaTRrL/tB9LTqv8dVE+68dQQN42/O0K25KS\nr/85iQkvWoJb0vV3Uux1vpZsvdM1pl1+rrnoacckzUDDzS4MZXegB3u2bPHyv45yOm+H+sAjS1AX\nTL37wo8FvoVGszwR274qGiv91TqX3xJ847D+ViXZR3AbrfxcL6ZQZtqPJadV/uVhOYBkUrPmuho9\nWfhT1C8dNQqXAP9OoT5mRV2BxodG1gIubaPy05T2Y8lpl+9xymZlyDdWNN+2Vix/M5RY6Q7a87Hk\ntMv3OOUm5u6O5HSg1mv0KPRaJDszTprlXwdcAFxENiNekl0wy9E0XvHHkjcsekZrle9xymZlOBQ9\n5fUj9FNzQdjWDuXPTqicQvK1GpOsU9rln4pa70cC3wiv2ykLX1NzSzo5V6B/mLuhVuR+wCNtUv7N\naLKBG4F3YtvrnWDpo8AWwOro6b5oRMtqJDNTe9rlR36GUgBE45R/iPKLWxPwEDxLQi8DuzcyKBtc\nPe2Lvoy+gJ5wiywBJgP3tHj5ZmZNYac2L/8N9MWwBP2SWY7ThTaNIWlXwNrCbNTdsWapA+tkf9TF\nMAyNsngZ+M82Kn84Gi+/KrppuD9wfoLlm1mD2wRNfrsQuBZNiptkV1s0Vdd+wMWojzjJNK1pl59P\nksM/rQq+cWhJeBz4LspXMR5lQ1se/p5D/W8gRv+fjweuB14j2SGAaZd/QOz1EJTD4+0Ey7cqOEhb\nUrZGKSr3Ro8oX40mHrgT+ESdy74ZJRRaivJJrE1yaTobofz4hMNRPul9EyzfzBrcbBSMv4ImZI27\nKaE6fJDswzurAB9JqNy0y+8ETkyoLKsDD8Gzeooy8HWS/0nDpDPwgVKlHp1CuWmWfz+a49KakLs7\nrJ6iDHyboSAxBTUMxgP3pVSntINVGuXfDZyHbtq+Gdue1PRlZtbgZqKAHVmV/vMtJintJ+3SKL+H\nbGKr+GJmBihPSPwx6JVIdrbqdpfvyc56P+1pNeLuDkvCFah740bU3fFFlIw+KZsBJ6P0oNH/8xmU\nx6Qdyr+egfMZXoeG4lmDc5C2JPwE+DOwCwpOh5FsPuO0U6WmVX6U4GkN0k3wZFVwkLakzCa9lKXv\noSCZlrTK3xQld1o9/I0sAY5KoT5WAQ/Bs3bQDbxE8qlSG6X8nYC/
J1SWmdmg9QJP5SxPtlH5vyDd\nBE9mZlZEIyZ4sjI5Vam1g7RTpaZdftoJnqwKDtLWDr4MjECPR6eRKjXt8qMET9ui7o6kEzyZmZVl\nCLAPsAh4FpiIZk1vh/LTTjBlZlbU1sDZ6EnHc4Ed0QMmSSW/T7v8yG8TLs/MrKS0U6WmXX5ckg8R\nWQ14nLS1srRTpaZdfj7TUJ+4NQk/cWitLO1UqWmXn48DtJk1nLRTpaZd/mbA74A7yKYpvTPB8q0K\nbklbO1gb5c+IvBe2tUv5aSeYsio4SFs7SDtVatrlp51gyqrgG4fWLrYlmyp1BsmPckiz/G7STfBk\nVXCQNmt9vQzs3sjg2VnMzMyq01n6EDNrcrNRmtKFOGdH03GCJbPWl3aCJzMzK0PaCaasAm5Jm7WH\nrdFj6L8AbgAOQnMd+qEWM7OUNVKCJxsk90uZta5GTPBkg+QnDs1aVyMmeDIzsxxpJ3iyKvjGoVnr\nSzvBk1XB3R1mrS/tBE9WBd84NGsPaSeYMjMzMzMzMzMzMzMzMzMzMzNrQf8fVDf+6C9mQpEAAAAA\nSUVORK5CYII=\n",
"text": [
"<matplotlib.figure.Figure at 0x3cd1e80>"
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two clpx papers are most similar to each other, as are the two dyn-steps papers, as expected, while all of the dyn papers bear some similarity to one another. For a quicker readout, I've grouped the data into three similarity levels (in practice this could be done automatically with a clustering algorithm)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Thresholds separating the three similarity levels\n",
"levels = [0.06, 0.075]\n",
"binned_df = dist_df.copy()\n",
"# Bin the values: 1 = most similar, 3 = least similar; zeros are left as-is\n",
"binned_df[(dist_df <= levels[0]) & (dist_df > 0)] = 1\n",
"binned_df[(dist_df <= levels[1]) & (dist_df > levels[0])] = 2\n",
"binned_df[(dist_df < 1) & (dist_df > levels[1])] = 3\n",
"plt.imshow(binned_df.values, interpolation='none')\n",
"ax = plt.gca()\n",
"# range works in both Python 2 and Python 3 (xrange is Python 2 only)\n",
"plt.xticks(range(len(binned_df.columns.values)))\n",
"plt.yticks(range(len(binned_df.index.values)))\n",
"ax.set_xticklabels(binned_df.columns.values, rotation=90)\n",
"ax.set_yticklabels(binned_df.index.values)\n",
"plt.title(\"Similarity between papers\\nLower value = more similar\")\n",
"plt.colorbar();"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"png": "iVBORw0KGgoAAAANSUhEUgAAAWIAAAFSCAYAAADIJtXXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYHFW5x/HvzCQQJAmETRESJgQMiyEgi0aDDKACGnZR\nBEWQC4gX8coiV0GSoGzi9QIi+xJAAsgmQYgBJZMFkCUriCCBDEG4CAECARJIZvr+8Z6ia2q6unt6\nqVPT9fs8Tz3TXV1V53RD3j596pz3gIiIiIiIiIiIiIiIiIiIiIiIiIhITR0OTKvw3F2BZ0LPO4A9\nq6jLcqC1ivPDOqiuLiIiNTUWeBhYBrwBzAZ2qkM5i4E9anStScAvqji/mrp0VHGuSF30810Bqcpg\n4E/AccAfgDWxluwHPisV0Q9Y7bsSITmgyXcl+oC0/XcTSa2dgLeKvH4kMCv0vAs4HngOeAc4CxgB\nPIK1qG8B+rtj24CXQueGW6G7uHPeAl4Bfhs6LyjnB66c50P7RgDHAh9iXxbLgSnAKcDtkbpfDFwY\n874WA/8N/B14E7gW+xIKjAPmu/o9BIxy+28EOoH3XdmnYq3zk9zrm4TqjqvvG2VcF+CTwB3Aa8AL\nwA9Dr03Aviivxz73p4AdY94brg4/xD6714Ffkf/yGAE8CCx1r/0eWCd0bgeVfTbBuT8BFgIrgBbg\nNOBfrt7PoF8TIj0Mwv5BTgL2BoZEXj+SnoH4LmAgsA0WDB/E+m4HY/94j3DHthEfiD+DBeNmYDPg\naeBHkXKmAeuSDwJdwObu8XXYl0DgE8C75ANKP+DfwA6F3zYdWLDYxL3n2eS7OnZw5+6MBa8jXN2D\nL4pot8ZR2JcBwGHAIuwLCeB72OdV6rrNwBzgDFf34VgQ/Yo7dwIW2PZ2556DfZHF6QL+in1+Q4Fn\ngaPdayOw/vH+wAbADOB/a/TZdABz3blrAiOBJdh/H4Bh5P8bikjIVlhgewlYBdwNbOReO5KegXhM\n6PkTWKsw8Gvy/6jbiA/EUf8F3Bkppy1yTDQQR/uIpwL/4R6Pw1qNcRZjLevAPlgABbiM7kEerCW3\na+jc8PsYgbUcm9y5x5J/39dj763Ydb8IfBZ4MfLaT7HWKFggvj/02jZYqzxOF/kgDvYr5i8xxx6A\nBc9AtZ/NkaHXtsACdxD4pU6afVdAqvYM1qobCnwa+4kc95Me7B9WYEXk+UqstVzKp7C+6f8D3gbO\nBtaPHPNS9KQSrge+7R5/G+tGKCZ8/SXY+wZroZ+M/fQOtk1Dr0c9D7wHbI8FpD9h3S2fwoLsjBLX\n3di99snIaz8l/4UI3T/n94EBFP/3F/f+Po612P+FffY3Uvyz7+1nEz53EfZFNMHV/2bs/UqNKRA3\nlmexgPbpCs/PlXncZVh3xBZYd8Lp9Px/qdi1Cr12N7AdVvevATeVqMOwyOOX3eMl2BfDkNA2ELi1\nSNkzgEOwVt8r7vmR7tz5ZVx3CdaaDL82GGvZx5VZStz7Owfr5/409tl/h56ffaWfTaG63ox9QW3m\nXju/929FSlEg7ttGYjeaNnHPhwLfonj/Y1RTzONiBmI3u97HukaO70V5YK2raF/jCuxm12TgUazF\nF6cJ+E/sfa+HfREEweQq4PtYH3YTsDYW2IOW/r+x7oiwGcAJwEz3vN09n0U+MBW77mPY5/ETYC3s\nJtenyQ8jrGSUxink+4hPDL2/gVgL/h33/k+NnNeE3Wys5LOJ+hTWjbMmdj9hJfYlIDWmQNy3Lcf6\nJx/FbnY9gt2oOdm9nqN7C6dQyyz6eqnjwYLEYVgwuBL7qdybcq7B+knfonvfctCaL9UtkcNazPdj\nXQvPAb90r80BjgEuwfp+nyN/AxLgXOym2lvkR0vMxIJREIgfwgLqzNB5xa7bhbV+t8dGTLyOfS6D\nQ/WNfialWsl3uzLnYd0lQX/zROxm6dvAPdiXV/Szn0
zvPpu4uqyJfV6vY91QG2BdLiLSwIZirb1y\n+qkbWfjGZm/VcuKNJEQtYkmLZqwlfzPWuhfJDM2skzRYG+u7XYyNtc26Sm7uiYiIiIiIiEjFJlB6\npIT0VE1q0aewCSugz9873azrWzpozDy86hOtzCDs/4lKfJr88Dx9/p4pEPcthcaj+lDrm7xZSkvZ\n4rsCBVTz+SuG1IA+xMawJpZf4mW3/S+whnttBnCQe/wFbIzqV93zPbEJA4HvYVOX3wT+TPepsuHU\nls8WqMNUbLZb2AIsKQ3ARdgU27exZENjY95LGz3zVHSQ/yXQhKV5XIRlnruVnlnnqnUkNqnjN9jE\nj0XA57GcHkuwER7hSSLrADdgKTA7sNlsTQWutRQYj/23+TWWKOhVbMr4gJi6bIH9N1yGTay4JfRa\neLzxJOBS4D6sy2IWljXtIvce/oFNOAl0ED/e+DZsAscyV/Y2odcmufrehw0zbIu5hvSCAnFjOB2b\ntjrabbtgs8fApuu2uce7YTO/vhh63u4e74/NmjoQm0E1CxvTG7Y/lkJxG3qajE2vDmyDBfJ73fPH\nXN2GuGNvI/9lUUr4l8CJwH7uPWyMBZnfxZw3jO4JbqLboUXK3AX7IlkP+xz+gM1oG4ElJboE+Jg7\n9rdYN8Fw7DM9Agva4Ws9jyUBOgfL17AF9nlsgU1HPjOmHr/AvhTXdcddXKTOh2D/L2yA5Xz+G/C4\new+3Y18GgWK/rO519doQy+wWzfvxLVevgdiXjEimxM2aWkT38bdfcceCtSQXuMdTsby2QS6KGeRb\nrFOxFnGgGZvlNtQ9L5TaMmwQ1kIKjj8buLrI8W+ST0o+gfzNojZ6tojD7/tpun8GG2NBp5aNiiOB\nf4aej8Le/4ahfUuxJEUtWB6GrUKvHQtMD10rnCKzCfucwjPnxmBfkIVcD1xBPp9IWDS16BWh107A\n8kuH30N4EYHwZzqB+Jt167pyBrnnk9wmNaQWcWP4JN3/sYdTHz6CJW/ZCPtpegMWLNfHWrfBDZvN\nyP+MfYv8yhThAFAsteVyrCUVtIoPpXtL6hQsiC5z118Ha7n1ViuWrD2o59PYkj4fr+BaxUTThYJ1\nDYT3DcTeQ396fv5xn9uGWEt6Dvn3MJX4z+InWPB+DBvpcFTMcWBdI4GVkedBfUtpAc7DvtzfJv+F\nHtQvR+9TnEoJCsSN4RW6D2Ma5vaBZUibg+WVfRJLHv8wNp14EdYyBQsex9I9ReLa2M/bQKkbhTdj\ngXgM1ucZtAp3xbKEHYK1sIZg/8gL3SR6j/xPfrDAEG6JLiG/GkmwfQzr04wahn1BxG3fKnBOby3F\nPtPWSLnh7HG5yPErsK6boP7rkk8QFPVv7L/LJtjahJdS31UyDsO6fvbEviyHu/1ZuqGaOAXivmcN\nLMgFWz8sAJ6BtVo2wPobwz81Z2A30oIk5+3YT9cZoWMuB35Gvv93HSxw9sZ9WMt6It1vKg3CWq1L\nXf3PJD7w/NO9r69iLc0z6L7m2uVYP2twI3FDLHAUssSVHbdF+8Ar0Yn1H5+NtTg3A36MrSVXSBeW\njvJC8l8wm9B9RY6wQ7Dk7WC/JnLuGlG1CpQDsa6WN7Ev4nPqVI6EKBD3PfdhrdxgOxNLc/gElgJz\noXv8y9A5M+ie5nEm9o8snObxj9hNpFuw1uqTwF6h18sZNvchltZyT+yGXODPbvsndrd+BRYkw9cO\nrv82NjrjaqxV+S7dfwpfhK0xdz+WhvMR7GZYLfU2beUPsZb8C9hNzpuwPtu4a52G/Rr5G/Z+H8C6\njwrZyR23HEuNeSL5scOlUpiW+x7Cx96AdbO8jHWFPFLGdUVERBI1AMsBPh+7R3FuzHEXY8M9FxC/\nEK6IiFQouI/RD/vFEh0X/1Xs1yvY4g1/owh1TYiI9F6wCvca2A3lNyOv74cNPQRrPa9LkZE9CsQi\nIr3XjHVN/BsbHf
R05PVN6H5v41/kb7oWvJiIiPROFzYuf1NslmdbgWOiI0xib3JqhY6KbZbrPoZf\nRBIygypzXAyA3MryD19O/HDLt7GJTDuRTxcANupkaOj5pm5fQQrEFXsRmiZUfnpuOjTtXvn5C+NS\nE5Tp0gnwgwkVnz5+VOU/ptrxmymm2vKbqhxJOz0Hu1dxjfFVzmub8D8w4eTSx8WeH/sDuzztVP75\nN/fvz/hVq3arrgY27fCXJY8yZ+Sndwc2wMbFL8NW+/4yNnY+bAo2Vv8W4HPu2H8TQ4FYRDKpf+Wn\nbozdiGt2243AX7GZj2A5P+7DRk4swsaYF5uarkAsItlURfB7EsvEF3VF5PkJCdRFqtPqt/id27wV\n3eqtZJUP0DbGb/mtfov/SBUt4ppTIPalaXjpY+pJgdib4Z6zNbR93m/5rX6L/0iagl+a6iIikhi1\niEVEPFvLdwVCFIhFJJPSFPzSVBcRkcSoa0JExLM0BeJGzTUxAVsKqBJfxFauXQUcXKsKiUi69Ctz\nS6oujaiaFQReBL6LLXYpIg1KLeLaOwLLgj8fW+olrB1bH2weNiNmZ7f/QuDn7vFe5Ndve9EdV2hd\nMBFpEGoR19a2wOnYysFvYqvinki+VZzDRqrsgK0mfC0wCvgp8DgwG1sHbZ9Eay0iXmn4Wm3tga2i\nG2TIf6vAMcFqvbOwdHaDsYUnj3H7fgQsrm81RSRN0tQ10QiBOEfvl/gOWsvbAa9j2fSLHRfz6vTQ\nk1b/05ZFGlAH+WWrmzo7a3bdNAW/NNWlUg8CdwG/wVrF67n9TaG/38T6isdieUGXA5sBJ2FdFlOx\n5eQfC123iVIBvpp8wiJSllby+SmaW1po76rN7Ru1iGvraeBs7GZbJ3ZTroPufcQrsSFp/YDvuf1X\nY0PcXgWOBiZhWfZHA3difc3jsKFwo+r9JkQkWVUEv6HYoICNsPhyJXBx5JhTgMNDRW2NJZRfVuO6\npMoN9BwtEXYj8OPIvi+HHs/FuinAbuANRUQaWhUt4lVYPJkPDATmAA8A/wgd82u3gTXo/ouYIAyN\nE4hFRHqlikD8qtsA3sUC8CfpHojDDiM/YKCgLARideSKSA9rlRv9Vhd9tRW7z/RozOsfw+Yp/KDY\nRbIQiEVEeugXE/1mdcLs8u4HDgRux4a/vhtzzL7YXIXYbglQIBaRjOrfUnj/Hi02OSFw3nuFTwfu\nAH6PjbiKcygluiVAgVhEMiquRVyGJuAabMTWhUWOWwdLInZYybpUXBURkT6sf+XR7wvAt4GF2HBZ\ngJ8Bw9zjYDXnA4BpwIpSF1QgFpFsiumaKMNsykuYdr3bSlIgFpFsSlH0S1FVREQSNMB3BfIUiEUk\nmyrvmqg5BWIRyaYURb8UVUVEJEEpin4pqkoftOBMf2VvN9Ff2SKNQF0TIiKepSj6pagqIiIJSlH0\nS1FVREQStKbvCuQpEItINqUo+qWoKiIiCUpR9EtRVUREEqRREyIinqUo+pWTQUhEpPH0K3PraSgw\nHfg78BRwYpFSdsYWWzqoVFVERLKn8lET5aziDNb5cT7wZyyZfCy1iEUkmypvEb+KBWHovopz1A+x\nNe1eL1WVRg3EE4CTKzz3JOwnxwLgL+Sz7otII2kpcyuulcKrOG8C7A9c5p7nil2kUQNx0Tddwlxg\nR2A09m32q5rUSETSpfIWcaDYKs4XAv+NxaImSnRNNEof8RFYCziHrSP1fOi1duxnxG7Y+/0e8Dj2\nQb0B/ALYC1tzajd3fOBRbG0qEWk0MdGv/WVof6Xk2aVWcd4RuMU93gDYB+tbntKLqvQp2wKnA2OA\nN4Eh2F3MoFWcA9bCfj7sClwLjAJ+igXk2cBF2AcVdTRwXx3rLiK+xES/ts1sC0yc0+OQclZx3jz0\n+DrgHmKCcJGq9Cl7AH/AgjDAWwWOudn9nQUMdts7wDFu34+AxZFzvg18Brs7KiKN
pvIJHeWu4ly2\nRgjEQR9Mb88B2A67o7lJ5PUvYR/sF7GfE4VdOiH/eOc220SkpjrcBtDU2Vm7C1e+Zl25qzgHjip1\nQCME4geBu4DfYK3i9dz+ptDfb2J9v2OBZcByYDNshMQOwFSsn+cx9/xyrN94adGSfzChVu9BRGK0\nug2guaWF9q6u2lxYU5xr6mngbGAG0In9VOigex/xSmw0RHCzDuBq7Abfq1hf8CRsFsyvgLWxu6EA\nLwIH1PctiEjiUhT9UlSVqtzgtjg30rOv98uhx3OxborofhFpVCmKfimqiohIglIU/VJUlbrZ3XcF\nRCSF1EcsIuJZiqJfiqoiIpIgrVknIuJZiqJfiqoiIpKgFEW/FFVFRCRBulknIuJZiqJfiqoiIpKg\nFEW/FFVFRCRBKYp+jbpCh4hIcWuWufVUzirOWwGPYHluSi7blqLvBBGRBFUe/cpZxfkNbPHQshKG\nKRBXYfxof7ddc9WsylcDZzWd6a3srpfO8lY2+P/sJw71W37DqPyf76tug+6rOIcD8etu+1o5F1Qg\nFpFsqk30a6XwKs4eqiIi0tdUH/2KreKccFVERPqimK6J9rnQPq/wayGlVnHuFQViEcmmuFWcd7Et\nMPG6HoeUs4pz+NhKqyIi0uAqXzy0nFWcPwE8jq0Y34V1X2xDTBeGArGIZFPloybKWcX5VWy8cVkU\niEUkm1IU/VJUFRGRBKUo+qWoKiIiCVIaTBERz1IU/fpC0p8JlJE0owztwGfc43uxu5lxTgAWYXc7\n16tB2SKSNv3K3BKqStrVamZ/+Dql5n/PBu7BgreINKBcihYPTWuL+HTgWWAWMBLrzZkTen3L0PMO\nrNU8BxvXN7KM63dgLd21sdbxfOBJ4Bvu9fnAi5VXX0TSrrNfeVsS0tgi3hH4JjAam0Y4Fwuyb7t9\nC4CjgGvd8Tksy9GOwPHAKcAxJcoIWsd7Ay+TbyEX664QkQaSVJAtRxpbxLsCd2IJlZcDU9z+q7EA\n3Iy1XCeHzrnT/Z2LZUMq10Lgy8B5wFjgnUorLSJ9y+qW5rK2JKToO+EjObrPz25y++4AxgMPYi3k\nt0LHfOD+dpJ/T9OAjbBphsfGlPUclsLua8Avgb8Cvyi3ou2hxLStQGtTWdPKRaQXOtwG0NTZWbPr\ndvYrN/x9WLMy46QxEM8EJgHnYl0T44DLsWA7DbgM+F4Z19mrjGM2xgL6TVjXx9EFjomNrm0KvCJ1\n10r+Z25zSwvtXV01uW5nS3oGEqcxEM8DbsX6gl8DHgu9Nhk4ELg/tC8XeVzOKIvgmFHABdgwtVXA\n993+E4FTgY9j3Rf3Et+qFpE+6APWKPPIFXWtB6QzEAOc47aosdhNunCw3Tz0eA6wR8w1dy9wzv10\nD+qBi90mIg2qM0XhLz01Ke0uYDjxgVZEpGydlc9xvha7r/Qa9qs6agMsYfwnsBj7a6y7NVYaR03E\nORDYHnjTd0VEpO/rpKWsrYDrsKGvcU7Auli3B9qA/6FEo7cvtYhFRGqmihbxLIoPk/0/YDv3eDDw\nBrC62AUViEUkk1bXL/3aVdgw21eAQeRn7MZSIBaRTKrjzbqfYWkS2oARwAPYrODlcScoEItIJn0Y\nM3ztifb3eKL9/Wou/XngbPf4eWAxlgPnibgTFIhFJJPiuia2bxvM9m35tDNXTlza20s/A3wJeAib\nizASeKHYCQrEIpJJVXRN3Azshg1TewlLvdDfvXYFNgfiOmxSWjPwE0qM9lIgFpFMqmLUxLdKvL4U\n2Lc3F1QgFpFMqiIQ15wCsYhkkgKxiIhndRxH3GsKxNXI1Wo5vd5r9pyBc/WSstM211zzsPHeygZo\nWnim1/JXL/EbQCYO9Vp8zXxIehatUyAWkUxS14SIiGfqmhAR8Uz5iEVEPFPXhIiIZwrEIiKeqY9Y\nRMQzDV8TEfFMXRMiIp4pEIuIeJamPuIkVnGe
AJxcp2vvD2xdh+teC/wbeLIO1xaRFOikX1lbAaXi\nQxvwNraS8zzgjFJ1SSIQ1zMhw4HANnW4bqnlskWkj+ukpaytgHLiwwxgB7f9slRd6hWITweexZad\nHgm0AHNCr28Zet6BtZrnAAvd8YWcB/wdy3p/ATAGS758AfatMxxbqG8qtjbUzNC1JgGXA4+7en3N\n7d8WeNSdvwDYwu2fBbzVq3csIn1KFYG4nPjQq7Rc9egj3hH4JrZqaX9gLhZk33b7FgBHYc17sBbz\n6+6844FTgGMi11wfOADYyj0fDLwDTAHuAe50+/8KHAcsAj4LXArs6V4bBuyMBdvp7u/3gYuAydhn\noT5zkYz4oH7D13LYAqILgJexmPZ0sRPqEXh2xQLjSrdNcfuvxgLwScA3sKAYCALpXOCgAtdc5q51\nDfAntwWCb56BWCv5ttBrwTKtOeAP7vEibCG/rYCHsdb7pq4Oi8p7iyLS19Vx1MRcYCjwPrAP8Efg\nU8VOqEcgztG9Wd7k9t2BLbL3INZCDjftP3B/O0N1mgZshHUnHAvsgrVuvw6cQL6lG/RBN2MBe4cy\n69mFLQL4N2AccB/Wmp5e5vm0hx63uk1EaqvDbQBNnZ01u25cIH6xvYMX21+s5tLLQ4+nYr/M16PI\nAqL1CMQzsT7Zc7GuiXFY/+wHWHC9DPheGdfZK/R4bbdNxVqxz7v9y7FuCrCuisVYoL4d+wIYhfU7\nNwGHANcDm7vtWff3BeC3WNfFKHoRiNvKPVBEKtZKvpHT3NJCe1dXTa4bF4g3bRvBpm0jPno+e+LM\n3l7648BrWCNxFyz+JL6K8zzgVqx/5DXgsdBrk7GRDveH9uUijwuNshgE3A0MwN7Uj93+W4CrgB9i\nAfhwLNCfgX0J3IwF4hywxNVlMNby/RALzt8BVgH/B5ztrhssl70+tlz2mdidUhFpEFWMIw7iwwZY\nfBiPxRuAK7BYdDywGuueOLTUBet1c+oct0WNxW7ShYPt5qHHc4A9Cpz3KnbzLephbORD2D4xdXoA\n+3DCzndbVKnlskWkj6siH3Gp+PA7t5UtyVECd2FDzAoFWhGRRH340b18/5IMxAcmWFbUUR7LFpEU\nStMUZ42bFZFM0lJJIiKeKfuaiIhnCsQiIp4pEIuIeKabdSIinmnNOhERz9Q1ISLimbomREQ80zhi\nERHP1DUhfV5TEqsdxsjNH++vcIDtJnotvmmJ1+IbRpoCscd/TiIi/qympaytgFKrOB+OpQFeCDwE\nbFeqLmoRi0gmVTF87TpsMYkbYl5/Afgitk7n3sCVwOeKXVCBWEQyqYquiVkUXxntkdDjR7E1MYtS\nIBaRTEqoj/hobD3MohSIRSSTEhhHvDu2PucXSh2oQCwimRQ3jvi99id4v/2Jai+/Hbae5t50X7G+\nIAViEcmkuK6JAW2fZUBbfonMpROv7O2lhwF3At8GFpVzggKxiGRSFX3EpVZxPhMYgq0oD7ZK/C7F\nLqhALCKZ9EHlw9dKreL8H24rmwKxiGRS1mbWTQBOrtO19we2rvE1hwLTgb8DTwEn1vj6IpICnbSU\ntSUhiRZxro7XPhC4B/hHDa+5CvgxMB8YCMwBHqhxGSLiWWdX47eITweexWagjARasIAW2DL0vANr\nNc/B5maPjLnmeVgrdQFwATAG2Nc9ngcMB0YAU4EngJmha00CLgced/X6mtu/LTbzZZ677hbAq1gQ\nBngXC8CfLP+ti0hfsHp1S1lbEurRIt4R+CYwGruTOBcLsm+7fQuAo7DEGWAt5tfdeccDpwDHRK65\nPnAAsJV7Phh4B5iCtYjvdPv/ChyHDRn5LHApsKd7bRiwMxZsp7u/3wcuAiZjn0X082gFdsCCtYg0\nkM7V6blFVo+a7IoFxpVum+L2X40F4JOAb2BBMRAE0rnAQQWuucxd6xrgT24LNLm/A7FW8m2h19Zw\nf3PAH9zj
RVhSjq2Ah7HW+6auDuExfwOB24EfYS1jEWkgnQm1dstRj0CcIx8ccY9zwB3YeLsHsRZy\neLbJB+5vZ6hO04CNsO6EY7FxeHsCXwdOIN/SDfqgm7GAvUOZ9ezCxgP+DRiHzQc/Dmst93f1/T3w\nx7gLtIcet1I8C4iIVKbDbQBNnZ01u+4HK9YofVBC6hGIZ2J9sudiAW0c1j/7ARZcL8PmX5eyV+jx\n2m6birVin3f7l2PdFGBdFYuxQH079gUwCut3bgIOAa4HNnfbs+7vC1hKu2Hu+OlYy/tp4MJiFWwr\n402ISHVayTdymltaaO/qqsl1uzrT0zVRj5t184Bbsb7g+4DHQq9Nxlqi94f25SKPC42yGIT1BS/A\nbgD+2O2/BTgVa2EPxxIyH43dbHsK2C903SWuLkHL90MsOD/l6rwtll90LDY1cXe3fx42X1xEGsnq\nlvK2BNTrK+Ect0WNxW7ShYPt5qHHc4A9Cpz3KnbzLephLICG7RNTpwewm4Fh57stbDZauUSk8TV4\nH3Gcu7BWa6FAKyKSrNVNpY9JSJKB+MAEy4o6ymPZIpJGq31XIC89vdUiIkla6bsCeeoLFZFsWlXm\nVtjewDPAc8BpBV4fgnXHLsAmhEXvZXWjQCwi2dRZ5tZTC3AJFoy3wdJiRpOP/QyboDYaOAKbwRtL\ngVhEsml1mVtPu2CzcDuwNvMtWCbIsK2xOQlgcxZagQ3jqqJALCLZVHkg3gRbmSPwL7cvbAH5dA27\nAJthqRQK0s06EcmmykdNlJPa9zysO2Ie8KT7Gzs/W4FYRLIpLhAvbIcn24ud+TK2gERgKNYqDltO\n91QOi7F0CgUpEItINsUNX/tUm22ByROjRzyB5VRvBV7B0v5G17FbB1iBpVI4BphBkSyOCsQikk3x\nQ9NKWY1lgJyGjaC4BltA4jj3+hXYaIpJWDfGU1gOnFjpmePX9+Ry0R8jiRbur2yAszbzV/aZL/n9\n3zZXm+RfFes37Odeyz8zd5a3spv792f8qlVQfezKcVOZ/4gOb6pFeUWpRSwi2aQpziIinikQi4h4\npkAsIuKZArGIiGcpyr6mQCwi2VT58LWaUyAWkWyq3YLQVVMgFpFsUh+xiIhnCsQiIp4pEIuIeJai\nm3W9TQw/ATi5DvUAy3AfXW6kGt8FNq7h9USkkXxQ5paA3gbieqaaORDLWFRISwXXOxL4ZC/P0S8E\nkayofIWOmisnEJ+Orbk0CxiJBcU5ode3DD3vwFrNc4CF7vhCzgP+ji0ncgEwBtjXPZ4LbA60A/8L\nPA78CLhBnf+0AAAQg0lEQVQOODh0jXBuz9NcefOBc91xOwE3uesNcHVbzx2/E/n1pCYANwKzgeuB\nDYDbgcfc9vmY9yAifVl1qzjXVKkW4I5Y0uPRQH8sqM0B3nb7FgBHAde643PA6+6844FTsKTIYesD\nBwBbueeDgXeAKcA9wJ2ha/UHdnbPr4tcJ2id7wPsh60LtRJYF1iG5Qs92dU5fHwhWwFjsR8ik7Ev\ngIeAYcCfiW+pi0hfVd044r2BC7GG6dXA+QWOacNiSX9gqXteUKlAvCsWGFe6bYrbfzUWgE8CvkE+\nWEI+kM4lv3he2DJ3rWuAP7ktEM35eWuJ+gF8CfsiCCYsLityvUJy2PsKeoO+RPe+6kHAx4D3oydO\n+J/847Yx0Ka2s0jNdbgNoKmzhrMwKu92aAEuwWLFy9iv9ilYcvjAusDvgL2wZZQ2KHbBUoE4R/dg\n1uT23QGMBx7EWshvhY4JAlpn6PrTgI1chY/FWq97Al/HWq57hsoLey/0eDX5rpRmYI2YOkbrX+j8\nAZHjwkG2CfgstsRJURPqddtSRD7S6jaA5pYW2rtqlJm/8kC8C7CI/PfDLdhgg3AgPgyLk8HyEUuL\nXbBUH/FMrBthANYyHOf2f4AF18vId0sUsxewAxaE18a+LaZiLerR7pjlWD
dFnA6sywOsK6K/e/wA\n1jpfyz0fEnO9DqxvGLr3NUeD+P3AiaHn2xepk4j0VZX3EW8CvBR6/i+3L2xL7J7UdGyNu+8Uq0qp\nQDwP6x5YANyH3bwKTAa6sMAVyEUeF+qXHYT1BS/AbgD+2O2/BTgVa2FvXuC8q4DdsBtynyN/s24a\n9rPgCVffoJ06Cbic/M26idjy1o9j34VB3aL1PBEL2AuwG4rHFqiLiPR1lQ9fK2f0WH/gM8BXsYbo\nz7HgXFA5w7XOcVvUWKw1HK5UOIDOAfYocN6r2E//qIeBbUPPd4+8/ho2uiLw36HH59Ozs/xO8v3V\nYKMiCo3iiC7R+gZwaIHjRKSRxHVNvNEOb7YXO/NlYGjo+VDyXRCBl7DuiBVum4n9+n+u0AUrHTd7\nFzCcwoFWRCT94oamDW6zLbAo2lbjCax12wq8go0s+1bkmLuxG3otwJpY4/M3cVWpNBAfWOF5IiLp\nUPkAjNXYIINpWKC9BrtRd5x7/QrgGWzo60KsC/cq4Om4C2ommYhkU3Wz5qa6LeyKyPNfu60kBWIR\nySZlXxMR8Uxr1omIeKYWsYiIZwrEIiKepSgxvAKxiGSTVnEWEfFMXRMiIp6lKBCXk69XCstNyPCn\nl6vnolklNGX4cwfoyvn9AM7iTG9l9+/fzKpV46H62JVjSJn/E7/VVIvyilKLWESyKUUtYgViEckm\nBWIREc80fE1ExLMUDV8rtUKHiEhjypW5FbY3luryOeC0Aq/vj63yM4/4RTI+ohaxiEjvlLOK81+w\n5PAAo7DFNLaIu6BaxCIivRNexXkV+VWcw8Ir0A+kxCrOahGLSEZVfLeu0CrOhdbhPAA4F9gY+Eqx\nC6pFLCIZtbrMrYdypzP9Edga2Be4sdiBahGLSEbFtYhnYYu+xypnFefoBfsB62OrxPfQyC3idYDj\nKzx3AnCyezwJOLgG9RGRVIlrAY8BTg1tPYRXcV4DW8V5SuSYEeSnRX/G/S0YhKGxW8RDgB8Al1Vw\nbnjgSvFBLCLSR1XcR1zOKs4HA0e4Qt4FDi12wUYOxOdh30rzgAewb6PDsaWtpwI/da9fAmwIvA8c\nAzzrzg8n+ch4mhmRRlTV1LpSqzj/ym1laeRAfBqwLbADsA9wBjbsZCWwrjvmSuxbbBF21/NSYM/E\nayoiHqzwXYGPNHIgDrdivwRcS37d1mXY2L4xwG2h49ZIpmoi4l96sv40ciAOy9Gze6EZC8g7FDmn\nqOmhI1qB4erAEKmDDrdBZ2ct/5GlJ+tPIwfi5cAg9/gvwM+Bm7DfI0OAt4DFwNeB27FAPQpY6M4p\n+V98dwVekQS0ug1aWprp6mqv0XXT0yJu5OFrbwAPAU9iCTemYMNO5pEfmnY4cDQwH3gK2C90fi7m\nsYg0hFVlbvXXyC1isEAbdn7keQd2Iy9qYujxUbWskIikRXpaxI0eiEVEYqiPWETEMw1fExHxTF0T\nIiKeqWtCRMQzBWIREc/UNSEi4ll6WsSNPKFDRKSIFWVuBZVaxRngYvf6AuJTKQAKxN4s9jxXz2f5\nHf6KBrL92QN0eJ8o2uG5/EDFSyUFqzjvDWwDfAtbEinsq9iqzVsCx1IiL7oCsScdGS7fZ9kq33/5\naaiBqXiKczmrOO8HXO8eP4ql3v14XE0UiEUkoypuERdaxXmTMo7ZNK4mulknIhlV8c26cvt2ovkZ\nffcJNaR28uvZadOmLbmtner1prx3Iud+Dvhz6PlP6XnD7nK6r1P3DEW6JkREpHf6Ac+TX8V5PoVv\n1t3nHn8O+FtSlRMRyYp9sIWGF2EtYrD1L48LHXOJe30B8JlEayciIiIiIiJ9QFIrr2wN7ImtGh62\nd0Llh33MQ5kiqfdkAmUMwwaczwZ+BvQPvfbHBMqPc6XHsqH7GM96ORHrR/wj8CJwQOi1eQmUH/g8\n8DT597w9cGkC5R4MHOT+RreDEii/z9
A44vo7uMC+HDbGcOMEyr8WW6X6UWyh1BnYrJ+lwGZ1Lnu9\nmP1NwNfqXDYU/6LbKIHyjwV2BN7F7rDf7v5emEDZYRdiLfC73fP5wG4JlLsv9v96nDsTqEOfoEBc\nf7cAk4GuyP4mYEAC5W+IjWkEOAH4NjAT+0dSb0uxlmBcveptIywAvVXgtYcTKL8JC8Jg02HbgDuw\nL8DoYP96WxJ5nkQOyCMTKKMhKBDX35PAryncOtszgfL7YQF/pXv+e+BVYBqwdp3LfgF7j4WCcRJd\nA/difbOFugFmJFD+a1g3wHz3/F1gHHANsF0C5QeWAF9wj9fAukz+kUC5J5P/9RcInueA3yRQhz6h\nxXcFMuAf2D/Itwu8Nht4pc7lD8D+O3eE9r0AzAJGAzfWsewu4HUs8Ed1Yt0l9XQ3PVuCgTvqXDbA\ng9isrHdD+7qAu4C/El+3etRjPPalcDLwPvBD6r965l5Y4A9va4b+JvFlKFLSGp7LX9Nz+UnZgnw3\n0O5Yi3DdjJTfD7gpobJEUm8GMDz0fBdgYUbK/wYw2D3+OdYiTHKm0QIsIG0B/BO4gPz00yyUPxu/\nX7ojsV8Af3fPtwPO8FcdybK9sMQf/wmcg/VbJhmMfJYf9I+PxRK2jAMeS6hsyPcR/wT7SR7el4Xy\nbwQex74ET3bbSQmWPxP4LPn33EQ+KAu6WZekacDxwANYv+kOFO47bcTyO93fccBVwJ+AXyRUNsCH\nwGHAEeRHi/SPP7zhyn/ebc3YzcvgZllSPkb3+wE50rRgnGTKz4GngDFYYpBnscCUhfLvxSZwLMb6\nRgdgP9eTsi3wW2xJG4DNiV9nrBHL920q1i0TtIi/7vaJJO5CYK3Q882w1mkWyl8bm9iypXu+MfCV\nhMoOrImNEhmFn5ukPsufXmB7MMHyR2B9xO9jo4Qewia2iJP0oHKxm1Y5YHkGyh+MDd+Km2H3ZgJ1\nAJvFdzk2bA+sRXocyd0w813+TqHHA7AvxdXAqQmVHxiIdY9EE62LJGZn7KbVi25bQPd/II1Y/r3u\nbwfWLRHeXog5px6exX4aB0a4fVkpv5DHEyzrXLoP1xsC/DLB8kU+8iSwa+j5WJIdvua7fJ+iQaep\nwL5GLn+90LYBNu07yS+C+QX2JTlqJPU0aiI5q7HZbIHZJDPf32f5pYbHza1z+YE5WDfAH9zzQ4An\nyGcAq3fyGd/lzyU/SmI19gvl6DqXGdZM92n2a+F/MlOqqI84OcHNspvd829i/2MGU4zrHZR8lN9O\n8WFSu9ehzEImub9BXaLDt+qdm9h3+eEgWGxfvZyGZfy7FnvvRwFTgPMTKj/1FIiT0073f3zRf4z1\nDkq+yxd/5tLz10mhffXyK2ykxpew/+f+AuyBTXAREQ98JIT3PcXWV/kbY/mQn8GC7o7ub5vbl5RC\n/cFJLIrQZ6hFXH8nhx4XapHWOxWg7/Kj5mGz+pI0Exuqdbkruwmb3LJtg5f/XSwn8E5Yn3RgOdZd\nUu++6eOBH2CjRJ4P7R+EjSU+vM7l9xm6WVd/g0h2Omnayo96zUOZvqfY+ir/ercdTDJpP6MmYzPo\nzsP6iYOG33LgDQ/1EeEGbPxkYD3gugyV75PvKba+y9c4XhGn0FjKQvsasfyRWLKfB8jmFFvf5Wsc\nb8qpayI5TVgrNJjWux7JrpDis/zbgMuAq8lnYkuyu6SL/JL2wRTb4UXPaKzyNY5XxDkCm830C+xn\n4bNuXxbKn5NQOXEKtf6SrJPv8k/DWuFHA//hHmcp+1vqqUWcnBuwf3x7YK3BA4GnM1L+PVhC+juB\nD0L76530Z2tgG2AdbBZbMFJkMMmsoO27/MD52HT2YBzvWVh+akkJDV+TJHTQsysih2Uhq6f9sS+c\nfbGZXIHlwC3Aww1evohIaozJePnvYsF/OfaLpAulokyVZt8VkEyYg3VNDCl1YJ0chHUH9MdGLywF\nvp
Oh8gdi48kHYTfqDgIuTbB8EUmBLbEFSxcBt2ILmSbZLRYsy3QgcA3WZ5tkClDf5ReS5NBJKUE3\n6yQJzwE/w/IrjMOycHW5vxdR/5t2wf/n44DbgbdJdvic7/IPDj1uxnJOrEiwfClBgViSMhpLf7gP\nNt12Mpac/kFg+zqXfQ+W5GYllv9gI5JLAZmG8sOLxAb5iPdPsHwRSYE5WMA9DFtEM+yuhOqwPvkJ\nLGsDn0ioXN/ltwAnJVSWVEjD16SegsxvLRSeUZd05jewNJzHeijXZ/mPY2sWSkqpa0LqKcj8NhIL\nBFOwL/9xwGOe6uQ7IPkofzZwCXaj9L3Q/qSWqhKRFJiFBeXAILqvn5ck3zPKfJTfTj7ZUngTkQx5\nlu5Tegfgfzn5LCk0g7HesxqlF9Q1IUm4AeuKuBPrmjgAS1ielJHAKVjqyeD/+RyWdyML5d9Oz/Xp\nbsOGsUkKKBBLEs4G/gzsigWgI0k2H67vNJy+yg+SDq2L36RDUoICsSRlDv7SYa7CAqEvvsr/FJZw\naB33N7AcOMZDfSSGhq9JFkwAXif5NJxpKX8M8EhCZYmIFNQBLI5sL2So/Avwm3RIRCTz0ph0SEKU\nBlOywHcaTt/l+046JCUoEEsWHApsgk319ZGG03f5QdKhHbGuiaSTDomIfKQZ2A94GXgJmIitZp2F\n8n0nPRIRYTRwITaj72Lgc9gki6QSpPsuP3BlwuWJiAD+03D6Lj8syYk0UiaNI5ZG5jsNp+/yC5mG\n9VFLimhmnTQy32k4fZdfiIKwiHjhOw2n7/JHAlcBD5BPgflgguVLCWoRSxZshOV7CKxy+7JSvu+k\nR1KCArFkge80nL7L9530SErQzTrJih3Jp+GcSfKjB3yWPwG/SYekBAVikcbXQc+uiBxapUNERMS0\nlD5ERPq4OVgKzEUox0QqKemPSOPznXRIREQc30mPJIZaxCLZMBqbUn0BcAdwCLZ2nSZ2iIgkIE1J\nh6QA9ROJNK40Jh2SAjSzTqRxpTHpkIhIJvlOOiQl6GadSOPznXRISlDXhEjj8510SErQzTqRbPCd\n9EhEREREREREREREREREREREJOL/AQGL+2m2DGQqAAAAAElFTkSuQmCC\n",
"text": [
"<matplotlib.figure.Figure at 0x3cd1b00>"
]
}
],
"prompt_number": 15
},
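{
"cell_type": "markdown",
"metadata": {},
"source": [
"The manual thresholds above could be replaced by a clustering algorithm. As a minimal sketch, the cell below runs average-linkage hierarchical clustering on `dist_df` with SciPy. It assumes that `dist_df` is the square, symmetric distance matrix built earlier in this notebook and that SciPy is installed. Asking for three clusters is a guess based on the three topics suggested by the paper names (clpx, dyn, tcell), and the linkage method is an illustrative choice rather than a tuned one."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Sketch only: assumes dist_df is a square, symmetric distance DataFrame\n",
"from scipy.cluster.hierarchy import linkage, fcluster\n",
"from scipy.spatial.distance import squareform\n",
"\n",
"# Convert the square matrix to the condensed form that linkage expects\n",
"condensed = squareform(dist_df.values, checks=False)\n",
"# Average-linkage hierarchical clustering (illustrative choice)\n",
"Z = linkage(condensed, method='average')\n",
"# Cut the tree into at most three flat clusters (assumed number of topics)\n",
"labels = fcluster(Z, t=3, criterion='maxclust')\n",
"for name, label in zip(dist_df.index, labels):\n",
"    print('%s: cluster %d' % (name, label))"
],
"language": "python",
"metadata": {},
"outputs": []
},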
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, for each paper, let's list the other papers in order of decreasing similarity:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
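"for paper in dist_df.columns:\n",
"    # Sort ascending by this paper's column, so the most similar papers come first\n",
"    sim_papers_df = dist_df.sort_values(by=paper)[paper]\n",
"    # Drop the paper itself from its own ranking\n",
"    sim_papers = sim_papers_df.drop([paper]).index\n",
"    print('Papers most similar to ' + paper + ':')\n",
"    print(', '.join(sim_papers))\n",
"    print('\\n')"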
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Papers most similar to clpx1:\n",
"clpx2, dyn-structure, dyn-steps1, tcell, dyn-steps2, dyn-lis1\n",
"\n",
"\n",
"Papers most similar to clpx2:\n",
"clpx1, dyn-structure, dyn-steps1, tcell, dyn-steps2, dyn-lis1\n",
"\n",
"\n",
"Papers most similar to dyn-lis1:\n",
"dyn-steps1, dyn-steps2, dyn-structure, clpx2, clpx1, tcell\n",
"\n",
"\n",
"Papers most similar to dyn-steps1:\n",
"dyn-steps2, dyn-lis1, dyn-structure, clpx2, clpx1, tcell\n",
"\n",
"\n",
"Papers most similar to dyn-steps2:\n",
"dyn-steps1, dyn-lis1, dyn-structure, clpx2, clpx1, tcell\n",
"\n",
"\n",
"Papers most similar to dyn-structure:\n",
"dyn-steps1, clpx2, dyn-steps2, clpx1, dyn-lis1, tcell\n",
"\n",
"\n",
"Papers most similar to tcell:\n",
"clpx2, dyn-structure, clpx1, dyn-steps1, dyn-steps2, dyn-lis1\n",
"\n",
"\n"
]
}
],
"prompt_number": 16
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"See also: <a href=\"http://www.frankcleary.com/svdimage\">SVD Image Compression</a>"
]
}
],
"metadata": {}
}
]
}