Jessime/Pearson Check.ipynb

## Pearson Check.ipynb
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a random matrix to use in place of the full kmer counts matrix. We'll use numbers between 0 and 10, and use 6 rows and 4 columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "counts = np.random.randint(0, 10, [6, 4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[9, 7, 7, 6],\n",
       "       [5, 8, 5, 8],\n",
       "       [7, 1, 8, 8],\n",
       "       [4, 6, 0, 4],\n",
       "       [7, 5, 7, 9],\n",
       "       [2, 1, 7, 1]])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "counts"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll save the matrix for reproducibility and so we can load it from a file into my pearson correlation script."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "counts_file = '/home/jessime/Desktop/count_test.npy'\n",
    "np.save(counts_file, counts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I've got a script that will calculate the pearson correlation matrix. We can go ahead and run that. Since we want to calculate the pearson of all of the rows of this column against itself, we'll pass the pearson function the count matrix twice, along with a file name to save the correlation matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/home/jessime/Code/kmers\n"
     ]
    }
   ],
   "source": [
    "cd kmers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from basic_similarity import pearson"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done with 0 rows.\n"
     ]
    }
   ],
   "source": [
    "pearson(counts_file, counts_file, '/home/jessime/Desktop/counts_pearson.npy')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can retrieve the matrix we just made and see what that look like. We can round to 3 decimals so it looks better."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 1.    -0.688  0.     0.053 -0.324  0.023]\n",
      " [-0.688  1.    -0.514  0.688  0.    -0.704]\n",
      " [ 0.    -0.514  1.    -0.708  0.849  0.448]\n",
      " [ 0.053  0.688 -0.708  1.    -0.324 -0.945]\n",
      " [-0.324  0.     0.849 -0.324  1.     0.   ]\n",
      " [ 0.023 -0.704  0.448 -0.945  0.     1.   ]]\n"
     ]
    }
   ],
   "source": [
    "dist_mat = np.load('/home/jessime/Desktop/counts_pearson.npy')\n",
    "dist_mat = np.around(dist_mat, decimals=3)\n",
    "print dist_mat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This looks fine. The next step is to row standardize the original counts matrix. We can center out the means first and then normalize the variance by dividing by the standard deviation.\n",
    "\n",
    "Numpy naturally does things columnwise, so we'll have to transpose the matrix, subtract, and transpose it back. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1.75, -0.25, -0.25, -1.25],\n",
       "       [-1.5 ,  1.5 , -1.5 ,  1.5 ],\n",
       "       [ 1.  , -5.  ,  2.  ,  2.  ],\n",
       "       [ 0.5 ,  2.5 , -3.5 ,  0.5 ],\n",
       "       [ 0.  , -2.  ,  0.  ,  2.  ],\n",
       "       [-0.75, -1.75,  4.25, -1.75]])"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "counts_centered = (counts.T - np.mean(counts, axis=1)).T\n",
    "counts_centered"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The third row makes it particularly easy to make sure that the centering is correct."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1.60591014, -0.22941573, -0.22941573, -1.14707867],\n",
       "       [-1.        ,  1.        , -1.        ,  1.        ],\n",
       "       [ 0.34299717, -1.71498585,  0.68599434,  0.68599434],\n",
       "       [ 0.22941573,  1.14707867, -1.60591014,  0.22941573],\n",
       "       [ 0.        , -1.41421356,  0.        ,  1.41421356],\n",
       "       [-0.30151134, -0.70352647,  1.70856429, -0.70352647]])"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "counts_norm = (counts_centered.T / np.std(counts_centered, axis=1)).T\n",
    "counts_norm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can take the dot product of this normalized matrix and its transpose (equivalent to the inner product). We'll also print the pearson calculated matrix just to have it nearby for comparison."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 4.    -2.753  0.     0.211 -1.298  0.092]\n",
      " [-2.753  4.    -2.058  2.753  0.    -2.814]\n",
      " [ 0.    -2.058  4.    -2.833  3.395  1.793]\n",
      " [ 0.211  2.753 -2.833  4.    -1.298 -3.781]\n",
      " [-1.298  0.     3.395 -1.298  4.     0.   ]\n",
      " [ 0.092 -2.814  1.793 -3.781  0.     4.   ]]\n",
      "\n",
      "[[ 1.    -0.688  0.     0.053 -0.324  0.023]\n",
      " [-0.688  1.    -0.514  0.688  0.    -0.704]\n",
      " [ 0.    -0.514  1.    -0.708  0.849  0.448]\n",
      " [ 0.053  0.688 -0.708  1.    -0.324 -0.945]\n",
      " [-0.324  0.     0.849 -0.324  1.     0.   ]\n",
      " [ 0.023 -0.704  0.448 -0.945  0.     1.   ]]\n"
     ]
    }
   ],
   "source": [
    "count_product = np.dot(counts_norm, counts_norm.T)\n",
    "count_product = np.around(count_product, decimals=3)\n",
    "print count_product\n",
    "print \"\"\n",
    "print dist_mat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now all that's left to do is divide by n where n is the number of columns of the counts matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 1.    -0.688  0.     0.053 -0.324  0.023]\n",
      " [-0.688  1.    -0.514  0.688  0.    -0.704]\n",
      " [ 0.    -0.514  1.    -0.708  0.849  0.448]\n",
      " [ 0.053  0.688 -0.708  1.    -0.324 -0.945]\n",
      " [-0.324  0.     0.849 -0.324  1.     0.   ]\n",
      " [ 0.023 -0.704  0.448 -0.945  0.     1.   ]]\n",
      "\n",
      "[[ 1.    -0.688  0.     0.053 -0.324  0.023]\n",
      " [-0.688  1.    -0.514  0.688  0.    -0.704]\n",
      " [ 0.    -0.514  1.    -0.708  0.849  0.448]\n",
      " [ 0.053  0.688 -0.708  1.    -0.324 -0.945]\n",
      " [-0.324  0.     0.849 -0.324  1.     0.   ]\n",
      " [ 0.023 -0.704  0.448 -0.945  0.     1.   ]]\n"
     ]
    }
   ],
   "source": [
    "print np.around(count_product/4, decimals=3)\n",
    "print \"\"\n",
    "print dist_mat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#Column Standardizing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I'm going to rerun through everything, but column standardize this time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[9, 7, 7, 6],\n",
       "       [5, 8, 5, 8],\n",
       "       [7, 1, 8, 8],\n",
       "       [4, 6, 0, 4],\n",
       "       [7, 5, 7, 9],\n",
       "       [2, 1, 7, 1]])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "counts = np.load('/home/jessime/Desktop/count_test.npy')\n",
    "counts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 3.3333,  2.3333,  1.3333,  0.    ],\n",
       "       [-0.6667,  3.3333, -0.6667,  2.    ],\n",
       "       [ 1.3333, -3.6667,  2.3333,  2.    ],\n",
       "       [-1.6667,  1.3333, -5.6667, -2.    ],\n",
       "       [ 1.3333,  0.3333,  1.3333,  3.    ],\n",
       "       [-3.6667, -3.6667,  1.3333, -5.    ]])"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Subtract the mean\n",
    "counts_col = counts - np.mean(counts, axis=0)\n",
    "np.around(counts_col, decimals=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1.4586,  0.8489,  0.4961,  0.    ],\n",
       "       [-0.2917,  1.2127, -0.2481,  0.7223],\n",
       "       [ 0.5835, -1.3339,  0.8682,  0.7223],\n",
       "       [-0.7293,  0.4851, -2.1086, -0.7223],\n",
       "       [ 0.5835,  0.1213,  0.4961,  1.0835],\n",
       "       [-1.6045, -1.3339,  0.4961, -1.8058]])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Divide out the standard deviation\n",
    "counts_col = counts_col / np.std(counts_col, axis=0)\n",
    "np.around(counts_col, decimals=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 3.0944,  0.4808,  0.1495, -1.6982,  1.2002, -3.2266],\n",
       "       [ 0.4808,  2.139 , -1.4815,  0.8023,  0.6364, -2.577 ],\n",
       "       [ 0.1495, -1.4815,  3.3954, -3.4251,  1.392 , -0.0303],\n",
       "       [-1.6982,  0.8023, -3.4251,  5.7351, -2.1955,  0.7813],\n",
       "       [ 1.2002,  0.6364,  1.392 , -2.1955,  1.7752, -2.8083],\n",
       "       [-3.2266, -2.577 , -0.0303,  0.7813, -2.8083,  7.8609]])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Take the dot product of the column normalized count matrix and its transpose.\n",
    "counts_col = np.dot(counts_col, counts_col.T)\n",
    "np.around(counts_col, decimals=4)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
	{
	"cells": [
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"import numpy as np"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Create a random matrix to use in place of the full kmer counts matrix. We'll use numbers between 0 and 10, and use 6 rows and 4 columns."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 15,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"counts = np.random.randint(0, 10, [6, 4])"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 16,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[9, 7, 7, 6],\n",
	" [5, 8, 5, 8],\n",
	" [7, 1, 8, 8],\n",
	" [4, 6, 0, 4],\n",
	" [7, 5, 7, 9],\n",
	" [2, 1, 7, 1]])"
	]
	},
	"execution_count": 16,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"counts"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We'll save the matrix for reproducibility and so we can load it from a file into my pearson correlation script."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 17,
	"metadata": {
	"collapsed": false
	},
	"outputs": [],
	"source": [
	"counts_file = '/home/jessime/Desktop/count_test.npy'\n",
	"np.save(counts_file, counts)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"I've got a script that will calculate the pearson correlation matrix. We can go ahead and run that. Since we want to calculate the pearson of all of the rows of this column against itself, we'll pass the pearson function the count matrix twice, along with a file name to save the correlation matrix."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 19,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"/home/jessime/Code/kmers\n"
	]
	}
	],
	"source": [
	"cd kmers"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"from basic_similarity import pearson"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 20,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"Done with 0 rows.\n"
	]
	}
	],
	"source": [
	"pearson(counts_file, counts_file, '/home/jessime/Desktop/counts_pearson.npy')"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Now we can retrieve the matrix we just made and see what that look like. We can round to 3 decimals so it looks better."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 44,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[[ 1. -0.688 0. 0.053 -0.324 0.023]\n",
	" [-0.688 1. -0.514 0.688 0. -0.704]\n",
	" [ 0. -0.514 1. -0.708 0.849 0.448]\n",
	" [ 0.053 0.688 -0.708 1. -0.324 -0.945]\n",
	" [-0.324 0. 0.849 -0.324 1. 0. ]\n",
	" [ 0.023 -0.704 0.448 -0.945 0. 1. ]]\n"
	]
	}
	],
	"source": [
	"dist_mat = np.load('/home/jessime/Desktop/counts_pearson.npy')\n",
	"dist_mat = np.around(dist_mat, decimals=3)\n",
	"print dist_mat"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"This looks fine. The next step is to row standardize the original counts matrix. We can center out the means first and then normalize the variance by dividing by the standard deviation.\n",
	"\n",
	"Numpy naturally does things columnwise, so we'll have to transpose the matrix, subtract, and transpose it back. "
	]
	},
	{
	"cell_type": "code",
	"execution_count": 29,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[ 1.75, -0.25, -0.25, -1.25],\n",
	" [-1.5 , 1.5 , -1.5 , 1.5 ],\n",
	" [ 1. , -5. , 2. , 2. ],\n",
	" [ 0.5 , 2.5 , -3.5 , 0.5 ],\n",
	" [ 0. , -2. , 0. , 2. ],\n",
	" [-0.75, -1.75, 4.25, -1.75]])"
	]
	},
	"execution_count": 29,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"counts_centered = (counts.T - np.mean(counts, axis=1)).T\n",
	"counts_centered"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"The third row makes it particularly easy to make sure that the centering is correct."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 49,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[ 1.60591014, -0.22941573, -0.22941573, -1.14707867],\n",
	" [-1. , 1. , -1. , 1. ],\n",
	" [ 0.34299717, -1.71498585, 0.68599434, 0.68599434],\n",
	" [ 0.22941573, 1.14707867, -1.60591014, 0.22941573],\n",
	" [ 0. , -1.41421356, 0. , 1.41421356],\n",
	" [-0.30151134, -0.70352647, 1.70856429, -0.70352647]])"
	]
	},
	"execution_count": 49,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"counts_norm = (counts_centered.T / np.std(counts_centered, axis=1)).T\n",
	"counts_norm"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Now we can take the dot product of this normalized matrix and its transpose (equivalent to the inner product). We'll also print the pearson calculated matrix just to have it nearby for comparison."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 50,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[[ 4. -2.753 0. 0.211 -1.298 0.092]\n",
	" [-2.753 4. -2.058 2.753 0. -2.814]\n",
	" [ 0. -2.058 4. -2.833 3.395 1.793]\n",
	" [ 0.211 2.753 -2.833 4. -1.298 -3.781]\n",
	" [-1.298 0. 3.395 -1.298 4. 0. ]\n",
	" [ 0.092 -2.814 1.793 -3.781 0. 4. ]]\n",
	"\n",
	"[[ 1. -0.688 0. 0.053 -0.324 0.023]\n",
	" [-0.688 1. -0.514 0.688 0. -0.704]\n",
	" [ 0. -0.514 1. -0.708 0.849 0.448]\n",
	" [ 0.053 0.688 -0.708 1. -0.324 -0.945]\n",
	" [-0.324 0. 0.849 -0.324 1. 0. ]\n",
	" [ 0.023 -0.704 0.448 -0.945 0. 1. ]]\n"
	]
	}
	],
	"source": [
	"count_product = np.dot(counts_norm, counts_norm.T)\n",
	"count_product = np.around(count_product, decimals=3)\n",
	"print count_product\n",
	"print \"\"\n",
	"print dist_mat"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Now all that's left to do is divide by n where n is the number of columns of the counts matrix."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 52,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[[ 1. -0.688 0. 0.053 -0.324 0.023]\n",
	" [-0.688 1. -0.514 0.688 0. -0.704]\n",
	" [ 0. -0.514 1. -0.708 0.849 0.448]\n",
	" [ 0.053 0.688 -0.708 1. -0.324 -0.945]\n",
	" [-0.324 0. 0.849 -0.324 1. 0. ]\n",
	" [ 0.023 -0.704 0.448 -0.945 0. 1. ]]\n",
	"\n",
	"[[ 1. -0.688 0. 0.053 -0.324 0.023]\n",
	" [-0.688 1. -0.514 0.688 0. -0.704]\n",
	" [ 0. -0.514 1. -0.708 0.849 0.448]\n",
	" [ 0.053 0.688 -0.708 1. -0.324 -0.945]\n",
	" [-0.324 0. 0.849 -0.324 1. 0. ]\n",
	" [ 0.023 -0.704 0.448 -0.945 0. 1. ]]\n"
	]
	}
	],
	"source": [
	"print np.around(count_product/4, decimals=3)\n",
	"print \"\"\n",
	"print dist_mat"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"#Column Standardizing"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"I'm going to rerun through everything, but column standardize this time."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[9, 7, 7, 6],\n",
	" [5, 8, 5, 8],\n",
	" [7, 1, 8, 8],\n",
	" [4, 6, 0, 4],\n",
	" [7, 5, 7, 9],\n",
	" [2, 1, 7, 1]])"
	]
	},
	"execution_count": 2,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"counts = np.load('/home/jessime/Desktop/count_test.npy')\n",
	"counts"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 18,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[ 3.3333, 2.3333, 1.3333, 0. ],\n",
	" [-0.6667, 3.3333, -0.6667, 2. ],\n",
	" [ 1.3333, -3.6667, 2.3333, 2. ],\n",
	" [-1.6667, 1.3333, -5.6667, -2. ],\n",
	" [ 1.3333, 0.3333, 1.3333, 3. ],\n",
	" [-3.6667, -3.6667, 1.3333, -5. ]])"
	]
	},
	"execution_count": 18,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"#Subtract the mean\n",
	"counts_col = counts - np.mean(counts, axis=0)\n",
	"np.around(counts_col, decimals=4)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 19,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[ 1.4586, 0.8489, 0.4961, 0. ],\n",
	" [-0.2917, 1.2127, -0.2481, 0.7223],\n",
	" [ 0.5835, -1.3339, 0.8682, 0.7223],\n",
	" [-0.7293, 0.4851, -2.1086, -0.7223],\n",
	" [ 0.5835, 0.1213, 0.4961, 1.0835],\n",
	" [-1.6045, -1.3339, 0.4961, -1.8058]])"
	]
	},
	"execution_count": 19,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"#Divide out the standard deviation\n",
	"counts_col = counts_col / np.std(counts_col, axis=0)\n",
	"np.around(counts_col, decimals=4)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 20,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"array([[ 3.0944, 0.4808, 0.1495, -1.6982, 1.2002, -3.2266],\n",
	" [ 0.4808, 2.139 , -1.4815, 0.8023, 0.6364, -2.577 ],\n",
	" [ 0.1495, -1.4815, 3.3954, -3.4251, 1.392 , -0.0303],\n",
	" [-1.6982, 0.8023, -3.4251, 5.7351, -2.1955, 0.7813],\n",
	" [ 1.2002, 0.6364, 1.392 , -2.1955, 1.7752, -2.8083],\n",
	" [-3.2266, -2.577 , -0.0303, 0.7813, -2.8083, 7.8609]])"
	]
	},
	"execution_count": 20,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"#Take the dot product of the column normalized count matrix and its transpose.\n",
	"counts_col = np.dot(counts_col, counts_col.T)\n",
	"np.around(counts_col, decimals=4)"
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 2",
	"language": "python",
	"name": "python2"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 2
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython2",
	"version": "2.7.10"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 0
	}