@hrit-ikkumar
Created March 25, 2019 10:12
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Mean/Covariance of a data set and effect of a linear transformation\n",
"\n",
"We are going to investigate how the mean and (co)variance of a dataset changes\n",
"when we apply affine transformation to the dataset."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning objectives\n",
"1. Get Farmiliar with basic programming using Python and Numpy/Scipy.\n",
"2. Learn to appreciate implementing\n",
" functions to compute statistics of dataset in vectorized way.\n",
"3. Understand the effects of affine transformations on a dataset.\n",
"4. Understand the importance of testing in programming for machine learning."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's import the packages that we will use for the week"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# PACKAGE: DO NOT EDIT THIS CELL\n",
"import numpy as np\n",
"import matplotlib\n",
"matplotlib.use('Agg')\n",
"import matplotlib.pyplot as plt\n",
"matplotlib.style.use('fivethirtyeight')\n",
"from sklearn.datasets import fetch_lfw_people, fetch_olivetti_faces\n",
"import time\n",
"import timeit"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"from ipywidgets import interact"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we are going to retrieve Olivetti faces dataset.\n",
"\n",
"When working with some datasets, before digging into further analysis, it is almost always\n",
"useful to do a few things to understand your dataset. First of all, answer the following\n",
"set of questions:\n",
"\n",
"1. What is the size of your dataset?\n",
"2. What is the dimensionality of your data?\n",
"\n",
"The dataset we have are usually stored as 2D matrices, then it would be really important\n",
"to know which dimension represents the dimension of the dataset, and which represents\n",
"the data points in the dataset. \n",
"\n",
"__When you implement the functions for your assignment, make sure you read\n",
"the docstring for what each dimension of your inputs represents the data points, and which \n",
"represents the dimensions of the dataset!__. For this assignment, our data is organized as\n",
"__(D,N)__, where D is the dimensionality of the samples and N is the number of samples."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of the faces dataset: (4096, 400)\n",
"400 data points\n"
]
}
],
"source": [
"image_shape = (64, 64)\n",
"# Load faces data\n",
"dataset = fetch_olivetti_faces('./')\n",
"faces = dataset.data.T\n",
"\n",
"print('Shape of the faces dataset: {}'.format(faces.shape))\n",
"print('{} data points'.format(faces.shape[1]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When your dataset are images, it's a really good idea to see what they look like.\n",
"\n",
"One very\n",
"convenient tool in Jupyter is the `interact` widget, which we use to visualize the images (faces). For more information on how to use interact, have a look at the documentation [here](http://ipywidgets.readthedocs.io/en/stable/examples/Using%20Interact.html).\n",
"\n",
"We have created two function which help you visuzlie the faces dataset. You do not need to modify them."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def show_face(face):\n",
" plt.figure()\n",
" plt.imshow(face.reshape((64, 64)), cmap='gray')\n",
" plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "05ffd545d42e47edaa1c162d4cfb6123",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(IntSlider(value=0, description='n', max=399), Output()), _dom_classes=('widget-interact'…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"@interact(n=(0, faces.shape[1]-1))\n",
"def display_faces(n=0):\n",
" plt.figure()\n",
" plt.imshow(faces[:,n].reshape((64, 64)), cmap='gray')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Mean and Covariance of a Dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this week, you will need to implement functions in the cell below which compute the mean and covariance of a dataset.\n",
"\n",
"You will implement both mean and covariance in two different ways. First, we will implement them using Python's for loops to iterate over the entire dataset. Later, you will learn to take advantage of Numpy and use its library routines. In the end, we will compare the speed differences between the different approaches."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"# ===YOU SHOULD EDIT THIS FUNCTION===\n",
"def mean_naive(X):\n",
" \"\"\"Compute the mean for a dataset by iterating over the dataset\n",
" \n",
" Arguments\n",
" ---------\n",
" X: (N, D) ndarray representing the dataset.\n",
" \n",
" Returns\n",
" -------\n",
" mean: (D, ) ndarray which is the mean of the dataset.\n",
" \"\"\"\n",
" N, D = X.shape\n",
" \n",
" mean = np.zeros(D)\n",
" for n in range(N):\n",
" for i in range(D):\n",
" mean[i] = mean[i] + X[n, i]\n",
" \n",
" mean = mean/N\n",
" \n",
" return mean\n",
"\n",
"# ===YOU SHOULD EDIT THIS FUNCTION===\n",
"def cov_naive(X):\n",
" \"\"\"Compute the covariance for a dataset\n",
" Arguments\n",
" ---------\n",
" X: (N, D) ndarray representing the dataset.\n",
" \n",
" Returns\n",
" -------\n",
" covariance: (D, D) ndarray which is the covariance matrix of the dataset.\n",
" \n",
" \"\"\"\n",
" N, D = X.shape\n",
" covariance = np.zeros((D, D))\n",
" mat = np.zeros((N, D))\n",
" mean = mean_naive(X)\n",
" for i in range(N):\n",
" mat[i] = X[i,:] - mean\n",
" for i in range(D):\n",
" for j in range(D):\n",
" covariance[i, j] = covariance[i, j] + mat[:,i]@mat[:,j]\n",
" return covariance/N\n",
"# GRADED FUNCTION: DO NOT EDIT THIS LINE\n",
"\n",
"# ===YOU SHOULD EDIT THIS FUNCTION===\n",
"def mean(X):\n",
" \"\"\"Compute the mean for a dataset\n",
" \n",
" Arguments\n",
" ---------\n",
" X: (N, D) ndarray representing the dataset.\n",
" \n",
" Returns\n",
" -------\n",
" mean: (D, ) ndarray which is the mean of the dataset.\n",
" \"\"\"\n",
" mean = np.mean(X, axis = 0) # EDIT THIS\n",
" return mean\n",
" \n",
"# ===YOU SHOULD EDIT THIS FUNCTION===\n",
"def cov(X):\n",
" \"\"\"Compute the covariance for a dataset\n",
" Arguments\n",
" ---------\n",
" X: (N, D) ndarray representing the dataset.\n",
" \n",
" Returns\n",
" -------\n",
" covariance_matrix: (D, D) ndarray which is the covariance matrix of the dataset.\n",
" \n",
" \"\"\"\n",
" # It is possible to vectorize our code for computing the covariance, i.e. we do not need to explicitly\n",
" # iterate over the entire dataset as looping in Python tends to be slow\n",
" N, D = X.shape\n",
" covariance_matrix = np.cov(X, rowvar=False, bias=True) # EDIT THIS\n",
" return covariance_matrix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's see whether our implementations are consistent"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X:\n",
" [[0 1 2]\n",
" [3 4 5]]\n",
"Expected mean:\n",
" [[ 1.]\n",
" [ 4.]]\n",
"Expected covariance:\n",
" [[ 0.66666667 0.66666667]\n",
" [ 0.66666667 0.66666667]]\n"
]
}
],
"source": [
"# Let's first test the functions on some hand-crafted dataset.\n",
"\n",
"X_test = np.arange(6).reshape(2,3)\n",
"expected_test_mean = np.array([1., 4.]).reshape(-1, 1)\n",
"expected_test_cov = np.array([[2/3., 2/3.], [2/3.,2/3.]])\n",
"print('X:\\n', X_test)\n",
"print('Expected mean:\\n', expected_test_mean)\n",
"print('Expected covariance:\\n', expected_test_cov)\n",
"\n",
"np.testing.assert_almost_equal(mean(X_test), expected_test_mean)\n",
" \n",
"np.testing.assert_almost_equal(mean_naive(X_test), expected_test_mean)\n",
"\n",
"np.testing.assert_almost_equal(cov(X_test), expected_test_cov)\n",
"\n",
"np.testing.assert_almost_equal(cov_naive(X_test), expected_test_cov)\n",
"print(mean_naive(X_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now test that both implementation should give identical results running on the faces dataset."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"np.testing.assert_almost_equal(mean(faces), mean_naive(faces), decimal=6)\n",
"np.testing.assert_almost_equal(cov(faces), cov_naive(faces))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With the `mean` function implemented, let's take a look at the _mean_ face of our dataset!"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def mean_face(faces):\n",
" return faces.mean(axis=1).reshape((64, 64))\n",
"\n",
"plt.imshow(mean_face(faces), cmap='gray');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Loops in Python are slow, and most of the time you want to utilise the fast native code provided by Numpy without explicitly using\n",
"for loops. To put things into perspective, we can benchmark the two different implementation with the `%time` function\n",
"in the following way:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 3.61 ms, sys: 4.09 ms, total: 7.71 ms\n",
"Wall time: 67.5 ms\n",
"CPU times: user 1.02 ms, sys: 41 µs, total: 1.06 ms\n",
"Wall time: 779 µs\n"
]
}
],
"source": [
"# We have some HUUUGE data matrix which we want to compute its mean\n",
"X = np.random.randn(20, 1000)\n",
"# Benchmarking time for computing mean\n",
"%time mean_naive(X)\n",
"%time mean(X)\n",
"pass"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 2.39 s, sys: 0 ns, total: 2.39 s\n",
"Wall time: 4.81 s\n",
"CPU times: user 34.9 ms, sys: 39.3 ms, total: 74.2 ms\n",
"Wall time: 184 ms\n"
]
}
],
"source": [
"# Benchmarking time for computing covariance\n",
"%time cov_naive(X)\n",
"%time cov(X)\n",
"pass"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, using Numpy's functions makes the code much faster! Therefore, whenever you can use something that's implemented in Numpy, be sure that you take advantage of that."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Affine Transformation of Datasets\n",
"In this week we are also going to verify a few properties about the mean and\n",
"covariance of affine transformation of random variables.\n",
"\n",
"Consider a data matrix $\\boldsymbol X$ of size $(D, N)$. We would like to know\n",
"what is the covariance when we apply affine transformation $\\boldsymbol A\\boldsymbol x_i + \\boldsymbol b$ for each datapoint $\\boldsymbol x_i$ in $\\boldsymbol X$, i.e.,\n",
"we would like to know what happens to the mean and covariance for the new dataset if we apply affine transformation.\n",
"\n",
"For this assignment, you will need to implement the `affine_mean` and `affine_covariance` in the cell below."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"# GRADED FUNCTION: DO NOT EDIT THIS LINE\n",
"def affine_mean(mean, A, b):\n",
" \"\"\"Compute the mean after affine transformation\n",
" Args:\n",
" x: ndarray, the mean vector\n",
" A, b: affine transformation applied to x\n",
" Returns:\n",
" mean vector after affine transformation\n",
" \"\"\"\n",
" ### Edit the code below to compute the mean vector after affine transformation\n",
" affine_m = np.zeros(mean.shape) # affine_m has shape (D, 1)\n",
" ### Update affine_m\n",
" \n",
" ###\n",
" return affine_m\n",
"\n",
"def affine_covariance(S, A, b):\n",
" \"\"\"Compute the covariance matrix after affine transformation\n",
" Args:\n",
" S: ndarray, the covariance matrix\n",
" A, b: affine transformation applied to each element in X \n",
" Returns:\n",
" covariance matrix after the transformation\n",
" \"\"\"\n",
" ### EDIT the code below to compute the covariance matrix after affine transformation\n",
" affine_cov = np.zeros(S.shape) # affine_cov has shape (D, D)\n",
" ### Update affine_cov\n",
" \n",
" ###\n",
" return affine_cov"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the two functions above are implemented, we can verify the correctness our implementation. Assuming that we have some $\\boldsymbol A$ and $\\boldsymbol b$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"random = np.random.RandomState(42)\n",
"A = random.randn(4,4)\n",
"b = random.randn(4,1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next we can generate some random matrix $\\boldsymbol X$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = random.randn(4,100) # D = 4, N = 100"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assuming that for some dataset $\\boldsymbol X$, the mean and covariance are $\\boldsymbol m$, $\\boldsymbol S$, and for the new dataset after affine transformation $\\boldsymbol X'$, the mean and covariance are $\\boldsymbol m'$ and $\\boldsymbol S'$, then we would have the following identity:\n",
"\n",
"$$\\boldsymbol m' = \\text{affine_mean}(\\boldsymbol m, \\boldsymbol A, \\boldsymbol b)$$\n",
"\n",
"$$\\boldsymbol S' = \\text{affine_covariance}(\\boldsymbol S, \\boldsymbol A, \\boldsymbol b)$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X1 = (A @ X) + b # applying affine transformation to each sample in X\n",
"X2 = (A @ X1) + b # twice"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One very useful way to compare whether arrays are equal/similar is use the helper functions\n",
"in `numpy.testing`.\n",
"\n",
"Check the Numpy [documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.testing.html)\n",
"for details. The mostly used function is `np.testing.assert_almost_equal`, which raises AssertionError if the two arrays are not almost equal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.testing.assert_almost_equal(mean(X1), affine_mean(mean(X), A, b))\n",
"np.testing.assert_almost_equal(cov(X1), affine_covariance(cov(X), A, b))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.testing.assert_almost_equal(mean(X2), affine_mean(mean(X1), A, b))\n",
"np.testing.assert_almost_equal(cov(X2), affine_covariance(cov(X1), A, b))"
]
}
],
"metadata": {
"coursera": {
"course_slug": "mathematics-machine-learning-pca",
"graded_item_id": "YoDq1",
"launcher_item_id": "vCPZ0"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
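
A minimal standalone sketch of the affine identities that section 2 of the notebook verifies, namely $\boldsymbol m' = \boldsymbol A\boldsymbol m + \boldsymbol b$ and $\boldsymbol S' = \boldsymbol A\boldsymbol S\boldsymbol A^\top$. It assumes the notebook's (D, N) layout, with one column per data point, and the biased (1/N) covariance. The helper names `dataset_mean` and `dataset_cov`, the seed, and the sizes are hypothetical and are not part of the assignment.

```python
import numpy as np

# Hypothetical helpers; X has shape (D, N), one column per data point.
def dataset_mean(X):
    # Average over the data points and keep a (D, 1) column vector
    return X.mean(axis=1, keepdims=True)

def dataset_cov(X):
    # Biased covariance: mean outer product of the centered data points
    Xc = X - dataset_mean(X)
    return (Xc @ Xc.T) / X.shape[1]

rng = np.random.RandomState(0)
A = rng.randn(3, 3)
b = rng.randn(3, 1)
X = rng.randn(3, 50)  # D = 3, N = 50

# Apply the affine transformation to every data point at once
X_new = A @ X + b

m, S = dataset_mean(X), dataset_cov(X)

# The mean transforms like a data point; the covariance ignores the shift b
np.testing.assert_almost_equal(dataset_mean(X_new), A @ m + b)
np.testing.assert_almost_equal(dataset_cov(X_new), A @ S @ A.T)

# The hand-written covariance agrees with NumPy's biased estimate
np.testing.assert_almost_equal(dataset_cov(X), np.cov(X, bias=True))
```

Because `b` has shape (3, 1), broadcasting adds it to every column of `A @ X`, which is why the whole dataset can be transformed without a loop.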