@rmitsch
Created November 25, 2017 20:43
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 5-2: PCA"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[1, 0, 1, 0],\n",
" [0, 0, 0, 0],\n",
" [3, 3, 1, 1]])"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"x = np.array( [ (1,0,3), (0,0,3), (1,0,1), (0,0,1) ] ).T\n",
"x"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Do you see any problem with this dataset? How can it be solved?*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Calculate the covariance matrix, its eigenvalues and its eigenvectors."
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"y = [ 0. 0. 0. 0.]\n"
]
}
],
"source": [
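"# Eigendecomposition of the covariance matrix (the rows of x are the variables).\n",
"eigenvalues, normalized_eigenvectors = np.linalg.eigh(np.cov(x, rowvar=True))\n",
"# eigh returns the eigenvectors as columns; select the one with the smallest eigenvalue.\n",
"W = normalized_eigenvectors[:, np.argmin(eigenvalues)]\n",
"# Project the data onto the selected component.\n",
"y = W.T.dot(x)\n",
"print(\"y = \", y)"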
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Problem**: Projecting the data onto the selected component yields only zeros, because dimension #2 has no variance at all. Its eigenvalue is therefore 0, so selecting the eigenvector with the smallest eigenvalue picks exactly this dimension, which gives a useless subspace.\n",
"\n",
"**Solution**: Prune dimensions without any variance before applying PCA."
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"x = x[np.var(x, axis=1) > 0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply PCA again."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"y = [ 1. 0. 1. 0.]\n"
]
}
],
"source": [
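"# Repeat the eigendecomposition on the pruned data.\n",
"eigenvalues, normalized_eigenvectors = np.linalg.eigh(np.cov(x, rowvar=True))\n",
"# eigh returns the eigenvectors as columns; select the one with the smallest eigenvalue.\n",
"W = normalized_eigenvectors[:, np.argmin(eigenvalues)]\n",
"# Project the data onto the selected component.\n",
"y = W.T.dot(x)\n",
"print(\"y = \", y)"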
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please note that this problem wouldn't have occurred if we had taken the eigenvector with the largest eigenvalue (which is the correct procedure, to my understanding), so I'm not quite sure what this actually tells us about the behaviour of eigenvector selection/calculation in the context of PCA."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 @ /development/datamining",
"language": "python",
"name": "datamining"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}