Stanford Online / DeepLearning.AI. Supervised Machine Learning: Regression and Classification, Python, NumPy and Vectorization.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Optional Lab: Python, NumPy and Vectorization\n",
"A brief introduction to some of the scientific computing used in this course, in particular the NumPy scientific computing package and its use with Python.\n",
"\n",
"# Outline\n",
"- [&nbsp;&nbsp;1.1 Goals](#toc_40015_1.1)\n",
"- [&nbsp;&nbsp;1.2 Useful References](#toc_40015_1.2)\n",
"- [2 Python and NumPy](#toc_40015_2)\n",
"- [3 Vectors](#toc_40015_3)\n",
"- [&nbsp;&nbsp;3.1 Abstract](#toc_40015_3.1)\n",
"- [&nbsp;&nbsp;3.2 NumPy Arrays](#toc_40015_3.2)\n",
"- [&nbsp;&nbsp;3.3 Vector Creation](#toc_40015_3.3)\n",
"- [&nbsp;&nbsp;3.4 Operations on Vectors](#toc_40015_3.4)\n",
"- [4 Matrices](#toc_40015_4)\n",
"- [&nbsp;&nbsp;4.1 Abstract](#toc_40015_4.1)\n",
"- [&nbsp;&nbsp;4.2 NumPy Arrays](#toc_40015_4.2)\n",
"- [&nbsp;&nbsp;4.3 Matrix Creation](#toc_40015_4.3)\n",
"- [&nbsp;&nbsp;4.4 Operations on Matrices](#toc_40015_4.4)\n"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np # Import NumPy under the alias 'np'; using np for numpy is a widely followed convention.\n",
"import time        # Standard-library module for working with time: reading the current time,\n",
"                   # pausing program execution, etc."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_1.1\"></a>\n",
"## 1.1 Goals\n",
"In this lab, you will:\n",
"- Review the features of NumPy and Python that are used in Course 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_1.2\"></a>\n",
"## 1.2 Useful References\n",
"- NumPy Documentation including a basic introduction: [NumPy.org](https://NumPy.org/doc/stable/)\n",
"- A challenging feature topic: [NumPy Broadcasting](https://NumPy.org/doc/stable/user/basics.broadcasting.html)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_2\"></a>\n",
"# 2 Python and NumPy <a name='Python and NumPy'></a>\n",
"Python is the programming language we will be using in this course. It has a set of numeric data types and arithmetic operations. NumPy is a library that extends the base capabilities of Python with a richer set of data types, including more numeric types, vectors, matrices, and many matrix functions. NumPy and Python work together fairly seamlessly: Python arithmetic operators work on NumPy data types, and many NumPy functions accept Python data types.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_3\"></a>\n",
"# 3 Vectors\n",
"<a name=\"toc_40015_3.1\"></a>\n",
"## 3.1 Abstract\n",
"<img align=\"right\" src=\"./images/C1_W2_Lab04_Vectors.PNG\" style=\"width:340px;\" >Vectors, as you will use them in this course, are ordered arrays of numbers. In notation, vectors are denoted with lowercase bold letters such as $\\mathbf{x}$. The elements of a vector are all the same type. A vector does not, for example, contain both characters and numbers. The number of elements in the array is often referred to as the *dimension* of the vector. The vector shown has a dimension of $n$. The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science and these labs, indexing will typically run from 0 to n-1. In notation, an element of a vector referenced individually carries its index as a subscript; for example, the $0^{th}$ element of the vector $\\mathbf{x}$ is $x_0$. Note that $x_0$ is not bold. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_3.2\"></a>\n",
"## 3.2 NumPy Arrays\n",
"\n",
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). Right away, you may notice we have overloaded the term 'dimension'. Above, it was the number of elements in the vector; here, dimension refers to the number of indexes of an array. A one-dimensional or 1-D array has one index. In Course 1, we will represent vectors as NumPy 1-D arrays. \n",
"\n",
" - 1-D array, shape (n,): n elements indexed [0] through [n-1]\n",
" "
]
},
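{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two meanings of 'dimension' can be checked directly. The cell below is a minimal added sketch (not part of the original lab): for a 1-D array, `len(a)` counts the elements, the vector's dimension in the math sense, while the standard NumPy attribute `a.ndim` counts the indexes, the array's dimension in the NumPy sense. The values in `a` are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([10, 20, 30, 40])   # arbitrary example values: a 1-D array with 4 elements\n",
"\n",
"print(len(a))    # 4    -> number of elements ('dimension' of the vector)\n",
"print(a.shape)   # (4,) -> one entry per index\n",
"print(a.ndim)    # 1    -> number of indexes ('dimension' of the NumPy array)"
]
},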
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_3.3\"></a>\n",
"## 3.3 Vector Creation\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data creation routines in NumPy will generally have a first parameter which is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"np.zeros(4) : a = [0. 0. 0. 0.], a shape = (4,), a data type = float64\n",
"np.zeros((4,)) : a = [0. 0. 0. 0.], a shape = (4,), a data type = float64\n",
"np.random.random_sample(4): a = [0.04997798 0.77390955 0.93782363 0.5792328 ], a shape = (4,), a data type = float64\n"
]
}
],
"source": [
"# NumPy routines which allocate memory and fill arrays with values\n",
"\n",
"# Create a 1-D array of 4 zeros (0.), passing the shape as a single integer (4).\n",
"# The default data type is float64.\n",
"a = np.zeros(4); print(f\"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
"\n",
"# Create the same 1-D array of 4 zeros, passing the shape as a tuple (4,).\n",
"# The default data type is again float64.\n",
"a = np.zeros((4,)); print(f\"np.zeros((4,)) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
"\n",
"# Create a 1-D array of 4 random floats drawn uniformly from [0, 1), passing the shape as a single integer (4).\n",
"a = np.random.random_sample(4); print(f\"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some data creation routines do not take a shape tuple:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"np.arange(4.): a = [0. 1. 2. 3.], a shape = (4,), a data type = float64\n",
"np.random.rand(4): a = [0.53516563 0.80204309 0.24814448 0.59096694], a shape = (4,), a data type = float64\n"
]
}
],
"source": [
"# NumPy routines which allocate memory and fill arrays with values, but do not accept a shape tuple as input\n",
"# np.arange(stop) returns evenly spaced values within the half-open interval [start, stop),\n",
"# with start defaulting to 0 and step to 1 (the interval includes start but excludes stop).\n",
"# Passing the float 4. makes the result float64.\n",
"a = np.arange(4.); print(f\"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
"\n",
"# Create a 1-D array with 4 elements, filled with random floats sampled\n",
"# from a uniform distribution over [0, 1).\n",
"a = np.random.rand(4); print(f\"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
]
},
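{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short added sketch contrasting the calling conventions used above (assuming the legacy `np.random` functions the lab already uses): `np.random.random_sample` takes the shape as one argument, an int or a tuple, whereas `np.random.rand` takes each dimension as a separate argument. It also shows that `np.arange` accepts optional `start`, `stop` and `step` values, with `stop` excluded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"x = np.random.random_sample((2, 3))  # shape passed as a single tuple\n",
"y = np.random.rand(2, 3)             # each dimension passed as a separate argument\n",
"print(x.shape, y.shape)              # (2, 3) (2, 3)\n",
"\n",
"print(np.arange(2, 10, 3))           # start=2, stop=10 (excluded), step=3 -> [2 5 8]\n",
"print(np.arange(0., 1., 0.25))       # four float values: 0.0, 0.25, 0.5, 0.75"
]
},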
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy array values can be specified manually as well."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"np.array([5,4,3,2]): a = [5 4 3 2], a shape = (4,), a data type = int64\n",
"np.array([5.,4,3,2]): a = [5. 4. 3. 2.], a shape = (4,), a data type = float64\n"
]
}
],
"source": [
"# NumPy routines which allocate memory and fill with user-specified values\n",
"# Integer elements: the array dtype is inferred as an integer type.\n",
"a = np.array([5,4,3,2]); print(f\"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
"\n",
"# A single float element is enough: all elements are upcast to float64.\n",
"a = np.array([5.,4,3,2]); print(f\"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These have all created a one-dimensional vector `a` with four elements. `a.shape` returns the dimensions. Here we see a.shape = `(4,)` indicating a 1-d array with 4 elements. "
]
},
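{
"cell_type": "markdown",
"metadata": {},
"source": [
"When a particular element type is needed regardless of the input values, `np.array` accepts a `dtype` argument, and an existing array can be converted with `.astype`. A minimal added sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([5, 4, 3, 2], dtype=np.float64)   # integer inputs, stored as float64\n",
"b = np.array([5, 4, 3, 2]).astype(np.float64)  # convert an existing integer array\n",
"print(a.dtype, b.dtype)      # float64 float64\n",
"print(np.array_equal(a, b))  # True"
]
},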
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_3.4\"></a>\n",
"## 3.4 Operations on Vectors\n",
"Let's explore some operations using vectors.\n",
"<a name=\"toc_40015_3.4.1\"></a>\n",
"### 3.4.1 Indexing\n",
"Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the basics needed for the course here. Reference [Slicing and Indexing](https://NumPy.org/doc/stable/reference/arrays.indexing.html) for more details. \n",
"**Indexing** means referring to *an element* of an array by its position within the array. \n",
"**Slicing** means getting a *subset* of elements from an array based on their indices. \n",
"NumPy starts indexing at zero, so the 3rd element of a vector $\\mathbf{a}$ is `a[2]`."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0 1 2 3 4 5 6 7 8 9]\n",
"a[2].shape: () a[2] = 2, Accessing an element returns a scalar\n",
"a[-1] = 9\n",
"The error message you'll see is:\n",
"index 10 is out of bounds for axis 0 with size 10\n"
]
}
],
"source": [
"# Indexing on 1-D arrays/vectors -> array[index]\n",
"# np.arange(10) creates the values 0..9 (start defaults to 0, stop=10 is excluded)\n",
"a = np.arange(10) \n",
"print(a) # a = [0,1,2,3,4,5,6,7,8,9]\n",
"\n",
"# Access the 3rd element of a -> a[2] (indexing starts at 0)\n",
"print(f\"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar\")\n",
"\n",
"# Access the last element using a negative index; negative indexes count from the end.\n",
"# a     = [  0,  1,  2,  3,  4,  5,  6,  7,  8,  9]\n",
"# index = [-10, -9, -8, -7, -6, -5, -4, -3, -2, -1]\n",
"print(f\"a[-1] = {a[-1]}\")\n",
"\n",
"# Indexes must be within the range of a (0..9); otherwise NumPy raises an IndexError.\n",
"# The try/except statement catches the exception so the notebook keeps running.\n",
"try:\n",
"    c = a[10]           # index 10 is out of range for a vector with indexes 0..9\n",
"except IndexError as e: # bind the raised exception object to the name e\n",
"    print(\"The error message you'll see is:\")\n",
"    print(e)            # print the exception's message"
]
},
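{
"cell_type": "markdown",
"metadata": {},
"source": [
"Negative indexes count back from the end, so `a[-k]` refers to the same element as `a[len(a) - k]`. The added sketch below checks this for every valid `k` and catches the out-of-range access with `IndexError`, the specific exception class NumPy raises here, rather than the generic `Exception`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.arange(10)\n",
"n = len(a)\n",
"for k in range(1, n + 1):\n",
"    assert a[-k] == a[n - k]   # negative index -k selects the same element as n - k\n",
"\n",
"try:\n",
"    a[n]                       # valid indexes are 0 .. n-1, so n is out of range\n",
"except IndexError as e:\n",
"    print('IndexError:', e)"
]
},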
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_3.4.2\"></a>\n",
"### 3.4.2 Slicing\n",
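"Slicing selects a subset of elements using a set of three values (`start:stop:step`); the element at `stop` itself is excluded. Any of the three values may be omitted, in which case `start` defaults to 0, `stop` to the end of the array and `step` to 1. Its use is best explained by example:"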
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a = [0 1 2 3 4 5 6 7 8 9]\n",
"a[2:7:1] = [2 3 4 5 6]\n",
"a[2:7:2] = [2 4 6]\n",
"a[3:] = [3 4 5 6 7 8 9]\n",
"a[:3] = [0 1 2]\n",
"a[:] = [0 1 2 3 4 5 6 7 8 9]\n"
]
}
],
"source": [
"# Vector slicing operations -> a[start:stop:step], where stop is excluded\n",
"# np.arange(10) creates the values 0..9\n",
"a = np.arange(10)\n",
"print(f\"a = {a}\")\n",
"\n",
"# Access 5 consecutive elements: start=2, stop=7 (excluded), step=1\n",
"# -> [2,3,4,5,6]\n",
"c = a[2:7:1]; print(\"a[2:7:1] = \", c)\n",
"\n",
"# Access 3 elements, taking every 2nd one: start=2, stop=7 (excluded), step=2\n",
"# -> [2,4,6]\n",
"c = a[2:7:2]; print(\"a[2:7:2] = \", c)\n",
"\n",
"# Access all elements from index 3 to the end -> a[3:]\n",
"c = a[3:]; print(\"a[3:] = \", c)\n",
"\n",
"# Access all elements before index 3 (indexes 0, 1, 2) -> a[:3]\n",
"c = a[:3]; print(\"a[:3] = \", c)\n",
"\n",
"# Access all elements of a -> a[:]\n",
"c = a[:]; print(\"a[:] = \", c)"
]
},
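{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two more slicing facts worth checking (an added sketch, not from the original lab): with in-range bounds and a positive `step`, a slice selects the same positions as Python's `range(start, stop, step)`, and a negative `step` walks backwards, so `a[::-1]` reverses the vector. Because `a = np.arange(10)` satisfies `a[i] == i`, the selected values equal the positions themselves."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.arange(10)\n",
"\n",
"# Same positions as range(2, 7, 2); here a[i] == i, so the values match too\n",
"print(np.array_equal(a[2:7:2], list(range(2, 7, 2))))  # True\n",
"print(a[::-1])   # step=-1 walks backwards: [9 8 7 6 5 4 3 2 1 0]\n",
"print(a[-3:])    # last three elements: [7 8 9]"
]
},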
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_3.4.3\"></a>\n",
"### 3.4.3 Single vector operations\n",
"A number of useful operations act on a single vector."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a : [1 2 3 4]\n",
"b = -a : [-1 -2 -3 -4]\n",
"b = np.sum(a) : 10\n",
"b = np.mean(a): 2.5\n",
"b = a**2 : [ 1 4 9 16]\n"
]
}
],
"source": [
"# Create a NumPy 1-D array/vector a = [1,2,3,4] from values entered manually.\n",
"a = np.array([1,2,3,4])\n",
"print(f\"a : {a}\")\n",
"\n",
"# Negate each element of a -> -a = [-1,-2,-3,-4]\n",
"b = -a \n",
"print(f\"b = -a : {b}\")\n",
"\n",
"# Sum all elements of a; returns a scalar -> np.sum(a) = 1+2+3+4 = 10\n",
"b = np.sum(a) \n",
"print(f\"b = np.sum(a) : {b}\")\n",
"\n",
"# Average of all elements of a -> np.mean(a) = (1+2+3+4) / 4 = 10/4 = 2.5\n",
"b = np.mean(a)\n",
"print(f\"b = np.mean(a): {b}\")\n",
"\n",
"# Square each element of a -> a**2 = [1,4,9,16]\n",
"b = a**2\n",
"print(f\"b = a**2 : {b}\")"
]
},
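{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make clear what `np.sum` and `np.mean` compute, the added sketch below reproduces them with a plain Python loop and prints both results side by side. The loop is only for illustration; in practice the NumPy functions are preferred."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([1, 2, 3, 4])\n",
"\n",
"total = 0\n",
"for value in a:            # add each element in turn\n",
"    total = total + value\n",
"mean = total / len(a)      # mean = sum / number of elements\n",
"\n",
"print(total, np.sum(a))    # 10 10\n",
"print(mean, np.mean(a))    # 2.5 2.5"
]
},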
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_3.4.4\"></a>\n",
"### 3.4.4 Vector Vector element-wise operations\n",
"Most of the NumPy arithmetic, logical and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example \n",
"$$ c_i = a_i + b_i $$"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Binary operators work element wise: [0 0 6 8]\n"
]
}
],
"source": [
"# Create a new Numpy 1D array, introducing inputs manually -> a = [1,2,3,4]\n",
"a = np.array([ 1, 2, 3, 4])\n",
"\n",
"# Create a new Numpy 1D array, introducing inputs manually -> b = [-1,-2,3,4]\n",
"b = np.array([-1,-2, 3, 4])\n",
"\n",
"# Perform element-wise ADD(+) operation a + b = [1,2,3,4] + [-1,-2,3,4] = [0,0,6,8]\n",
"print(f\"Binary operators work element wise: {a + b}\")"
]
},
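{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same element-wise rule applies to the other arithmetic operators, and comparison operators return an array of Booleans. A brief added sketch reusing the values of `a` and `b` from the cell above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([ 1, 2, 3, 4])\n",
"b = np.array([-1,-2, 3, 4])\n",
"\n",
"print(a - b)    # [2 4 0 0]\n",
"print(a * b)    # [-1 -4  9 16]\n",
"print(a == b)   # [False False  True  True]\n",
"print(a > b)    # [ True  True False False]"
]
},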
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, for this to work correctly, the vectors must be of the same size:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The error message you'll see is:\n",
"operands could not be broadcast together with shapes (4,) (2,) \n"
]
}
],
"source": [
"# Try a mismatched vector operation: a + c = [1,2,3,4] + [1,2] raises an error\n",
"# Create a new NumPy 1-D array from values entered manually -> c = [1,2]\n",
"c = np.array([1, 2])\n",
"\n",
"try:\n",
"    d = a + c            # shapes (4,) and (2,) cannot be combined element-wise\n",
"except ValueError as e:  # NumPy raises a ValueError here; bind it to the name e\n",
"    print(\"The error message you'll see is:\")\n",
"    print(e)             # display the exception's message"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_3.4.5\"></a>\n",
"### 3.4.5 Scalar Vector operations\n",
"Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"b = 5 * a : [ 5 10 15 20]\n"
]
}
],
"source": [
"# Create a new Numpy 1D array, introducing inputs manually -> a = [1,2,3,4]\n",
"a = np.array([1, 2, 3, 4])\n",
"\n",
"# Multiply ALL the vector/array a elements, by a number/scalar\n",
"# 5*a = 5*[1,2,3,4] = [5,10,15,20]\n",
"b = 5 * a \n",
"print(f\"b = 5 * a : {b}\")"
]
},
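{
"cell_type": "markdown",
"metadata": {},
"source": [
"Scaling by a scalar gives the same result as multiplying each element individually, and the order of the operands does not matter. An added sketch that checks both, plus division by a scalar (which produces floats):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([1, 2, 3, 4])\n",
"\n",
"looped = np.array([5 * x for x in a])   # scale each element individually\n",
"print(np.array_equal(5 * a, looped))    # True\n",
"print(np.array_equal(5 * a, a * 5))     # True: operand order does not matter\n",
"print(a / 2)                            # [0.5 1.  1.5 2. ]"
]
},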
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_3.4.6\"></a>\n",
"### 3.4.6 Vector Vector dot product\n",
"The dot product is a mainstay of Linear Algebra and NumPy. This is an operation used extensively in this course and should be well understood. The dot product is shown below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"./images/C1_W2_Lab04_dot_notrans.gif\" width=800> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dot product multiplies the values in two vectors element-wise and then sums the result.\n",
"Vector dot product requires the dimensions of the two vectors to be the same. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's implement our own version of the dot product below:\n",
"\n",
"**Using a for loop**, implement a function which returns the dot product of two vectors. Given inputs $a$ and $b$, the function should return:\n",
"$$ x = \\sum_{i=0}^{n-1} a_i b_i $$\n",
"Assume both `a` and `b` are the same shape."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"def my_dot(a, b): \n",
"    \"\"\"\n",
"    Compute the dot product of two vectors\n",
"    \n",
"    Args:\n",
"      a (ndarray (n,)): input vector \n",
"      b (ndarray (n,)): input vector with same dimension as a\n",
"    \n",
"    Returns:\n",
"      x (scalar): the dot product of a and b\n",
"    \"\"\"\n",
"    x = 0                        # running total, starts at 0\n",
"    for i in range(a.shape[0]):  # a.shape[0] == len(a), so i runs over 0, 1, ..., n-1\n",
"        x = x + a[i] * b[i]      # accumulate a0*b0 + a1*b1 + ... + a(n-1)*b(n-1)\n",
"    return x                     # the scalar a . b"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"my_dot(a, b) = 24\n"
]
}
],
"source": [
"# Test of dot product a.b between two 1D vectors a and b\n",
"# Create 1D vector a = [1,2,3,4]\n",
"a = np.array([1, 2, 3, 4])\n",
"\n",
"# Create 1D vector b = [-1,4,3,2]\n",
"b = np.array([-1, 4, 3, 2])\n",
"\n",
"# Execute dot product function a.b = my_dot(a,b)\n",
"# 0 init + (-1) + 8 + 9 + 8 = 24\n",
"print(f\"my_dot(a, b) = {my_dot(a, b)}\") "
]
},
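{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop in `my_dot` accumulates the products $a_i b_i$ one at a time. An equivalent way to read the formula is an element-wise product followed by a sum; the added sketch below computes it that way with NumPy and with plain Python, using the same `a` and `b` as the test above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([1, 2, 3, 4])\n",
"b = np.array([-1, 4, 3, 2])\n",
"\n",
"products = a * b                         # element-wise products: [-1  8  9  8]\n",
"print(np.sum(products))                  # 24\n",
"print(sum(x * y for x, y in zip(a, b)))  # 24, the same sum in plain Python"
]
},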
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note, the dot product is expected to return a scalar value. \n",
"\n",
"Let's try the same operations using `np.dot`. "
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NumPy 1D np.dot(a, b) = 24, np.dot(a, b).shape = () \n",
"NumPy 1D np.dot(b, a) = 24, np.dot(b, a).shape = () \n",
"a.b = b.a = 24\n"
]
}
],
"source": [
"# Test of 'np.dot()' product function, between two 1D vectors a and b\n",
"# Create 1D vector a = [1,2,3,4]\n",
"a = np.array([1, 2, 3, 4])\n",
"\n",
"# Create 1D vector b = [-1,4,3,2]\n",
"b = np.array([-1, 4, 3, 2])\n",
"\n",
"# Get dot product a.b = 24 using 'np.dot()' function \n",
"c = np.dot(a, b)\n",
"print(f\"NumPy 1D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} \") \n",
"\n",
"# Get dot product b.a = 24 using 'np.dot()' function\n",
"c = np.dot(b, a)\n",
"print(f\"NumPy 1D np.dot(b, a) = {c}, np.dot(b, a).shape = {c.shape} \")\n",
"\n",
"print(f\"a.b = b.a = {np.dot(a, b)}\") # the dot product is commutative: a.b = b.a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above, you will note that the results for 1-D matched our implementation."
]
},
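{
"cell_type": "markdown",
"metadata": {},
"source": [
"For 1-D arrays, the `@` matrix-multiplication operator (available since Python 3.5) returns the same inner product as `np.dot`. A minimal added check, which also confirms the commutativity noted above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([1, 2, 3, 4])\n",
"b = np.array([-1, 4, 3, 2])\n",
"\n",
"print(a @ b)                          # 24\n",
"print(np.dot(a, b) == a @ b)          # True\n",
"print(np.dot(a, b) == np.dot(b, a))   # True: a.b = b.a"
]
},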
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_3.4.7\"></a>\n",
"### 3.4.7 The Need for Speed: vector vs for loop\n",
"We use the NumPy library because it improves speed and memory efficiency. Let's demonstrate:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"np.dot(a, b) = 2501072.5817\n",
"Vectorized version duration (ms): 197.5088 ms \n",
"my_dot(a, b) = 2501072.5817\n",
"(non-vectorized) own loop version duration (ms): 9546.0217 ms \n"
]
}
],
"source": [
"# Sets the seed for generating random numbers. By specifying a seed value, the function ensures that \n",
"# the sequence of random numbers generated remains the same across multiple runs, \n",
"# providing deterministic behavior and allowing reproducibility in random number generation\n",
"np.random.seed(1)\n",
"\n",
"# Create two 1-D arrays/vectors a and b with 10,000,000 elements each,\n",
"# populated with random floats sampled from a uniform distribution over [0, 1).\n",
"a = np.random.rand(10000000) # very large arrays\n",
"b = np.random.rand(10000000)\n",
"\n",
"# Get Elapsed time = end time (s) - start time (s)\n",
"tic = time.time() # Capture start time (s) and \n",
"c = np.dot(a, b) # execute vectorized 'np.dot(a,b)' function\n",
"toc = time.time() # After executing vectorized 'np.dot(a,b)' function, capture end time (s).\n",
"\n",
"# Display scalar result of dot product using 'a.b = np.dot(a,b)' function, with 4 decimals\n",
"print(f\"np.dot(a, b) = {c:.4f}\") \n",
"\n",
"# Compute Elapsed time (ms) = [end time (s) - start time (s)] * 1000 (ms), with 4 decimals\n",
"# 1(s) -> 1000(ms) \n",
"print(f\"Vectorized version duration (ms): {1000*(toc-tic):.4f} ms \")\n",
"\n",
"# Get Elapsed time = end time (s) - start time (s)\n",
"tic = time.time() # capture start time (s) and\n",
"c = my_dot(a,b) # execute our 'my_dot(a,b)' function that includes a for loop (non-vectorized)\n",
"toc = time.time() # After executing non-vectorized 'my_dot(a,b)' function, capture end time (s)\n",
"\n",
"# Display scalar result of dot product using 'a.b = my_dot(a,b)' function, with 4 decimals\n",
"print(f\"my_dot(a, b) = {c:.4f}\")\n",
"\n",
"# Compute Elapsed time (ms) = [end time (s) - start time (s)] * 1000 (ms), with 4 decimals\n",
"# 1(s) -> 1000(ms)\n",
"print(f\"(non-vectorized) own loop version duration (ms): {1000*(toc-tic):.4f} ms \")\n",
"\n",
"del(a);del(b) # 'del' is a statement that removes the names a and b, so these large arrays\n",
"              # can be freed from memory. Since everything in Python is an object, 'del' can\n",
"              # also remove other variables, list items, slices of a list, etc."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, vectorization provides a large speed-up in this example. Part of the gain comes from `np.dot` running its loop in compiled code rather than in the Python interpreter, and part from NumPy making better use of the data parallelism available in the underlying hardware: GPUs and modern CPUs implement Single Instruction, Multiple Data (SIMD) pipelines, allowing multiple operations to be issued in parallel. This is critical in Machine Learning, where the data sets are often very large."
]
},
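{
"cell_type": "markdown",
"metadata": {},
"source": [
"`time.time()` reads the wall clock; for timing short intervals the standard library also provides `time.perf_counter()`, a higher-resolution clock intended for this purpose. The added sketch below repeats the comparison with it on a smaller array (100,000 elements, an arbitrary size chosen so the loop finishes quickly) and checks that both methods agree up to floating-point rounding. It uses NumPy's `np.random.default_rng` generator rather than the legacy `np.random.seed`. Timings depend on the machine, so no figures are quoted."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(1)   # seeded generator for reproducible data\n",
"a = rng.random(100_000)          # arbitrary, smaller size than the cell above\n",
"b = rng.random(100_000)\n",
"\n",
"tic = time.perf_counter()\n",
"c_vec = np.dot(a, b)             # vectorized version\n",
"t_vec = time.perf_counter() - tic\n",
"\n",
"tic = time.perf_counter()\n",
"c_loop = 0.0\n",
"for i in range(a.shape[0]):      # same loop as my_dot\n",
"    c_loop = c_loop + a[i] * b[i]\n",
"t_loop = time.perf_counter() - tic\n",
"\n",
"print(f'np.dot: {1000 * t_vec:.4f} ms, loop: {1000 * t_loop:.4f} ms')\n",
"print(np.isclose(c_vec, c_loop)) # True: same result up to rounding"
]
},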
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_12345_3.4.8\"></a>\n",
"### 3.4.8 Vector Vector operations in Course 1\n",
"Vector Vector operations will appear frequently in Course 1. Here is why:\n",
"- Going forward, our examples will be stored in an array, `X_train`, of dimension (m,n). This will be explained more in context, but here it is important to note that it is a 2-dimensional array or matrix (see the next section on matrices).\n",
"- `w` will be a 1-dimensional vector of shape (n,).\n",
"- We will perform operations by looping through the examples, extracting each example to work on individually by indexing X. For example: `X[i]`.\n",
"- `X[i]` returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving `X[i]` are often vector-vector. \n",
"\n",
"That is a somewhat lengthy explanation, but aligning and understanding the shapes of your operands is important when performing vector operations."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[1]\n",
" [2]\n",
" [3]\n",
" [4]]\n",
"X[1] = [2] has shape (1,)\n",
"w = [2] has shape (1,)\n",
"c = 4 has shape ()\n"
]
}
],
"source": [
"# Show a common Course 1 example\n",
"X = np.array([[1],[2],[3],[4]]) # 2-D array (like X_train) with 4 rows, each holding 1 element\n",
"                                # Matrix [m=4 rows x n=1 col]\n",
"                                # [ [1], \n",
"                                #   [2], \n",
"                                #   [3], \n",
"                                #   [4] ] \n",
"w = np.array([2])               # weight parameter: a 1-D array with 1 element, shape (n,) = (1,)\n",
"c = np.dot(X[1], w)             # X[1] = [2], so np.dot([2],[2]) -> c = 2*2 = 4\n",
"\n",
"print(X)                                          # print the 4x1 matrix\n",
"print(f\"X[1] = {X[1]} has shape {X[1].shape}\") # X[1] is the row at index 1 (the 2nd row), a 1-D array\n",
"print(f\"w = {w} has shape {w.shape}\")          # w is a 1-D array with one element\n",
"print(f\"c = {c} has shape {c.shape}\")          # the dot product of two 1-D arrays is a scalar: 2*2 = 4"
]
},
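{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pattern described in the bullet points above, looping over the examples `X[i]` and taking a dot product with `w`, can be sketched as follows. `X` and `w` here are small made-up values with n = 2 features (an added illustration, not course data); for comparison, `X @ w` computes all the dot products at once."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"X = np.array([[1., 2.],\n",
"              [3., 4.],\n",
"              [5., 6.]])      # made-up data: m = 3 examples, n = 2 features\n",
"w = np.array([0.5, -1.])      # made-up weights, shape (n,)\n",
"\n",
"m = X.shape[0]\n",
"f = np.zeros(m)\n",
"for i in range(m):\n",
"    f[i] = np.dot(X[i], w)    # X[i] has shape (n,), so this is a vector-vector dot product\n",
"\n",
"print(f)                      # [-1.5 -2.5 -3.5]\n",
"print(np.allclose(f, X @ w))  # True"
]
},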
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_4\"></a>\n",
"# 4 Matrices\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_4.1\"></a>\n",
"## 4.1 Abstract\n",
"Matrices are two-dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with capital, bold letters such as $\\mathbf{X}$. In this and other labs, `m` is often the number of rows and `n` the number of columns. The elements of a matrix can be referenced with a two-dimensional index. In math settings, numbers in the index typically run from 1 to n. In computer science and these labs, indexing will run from 0 to n-1. \n",
"<figure>\n",
"    <center> <img src=\"./images/C1_W2_Lab04_Matrices.PNG\" alt='missing' width=900></center>\n",
"    <figcaption> Generic Matrix Notation, 1st index is row, 2nd is column </figcaption>\n",
"</figure>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_4.2\"></a>\n",
"## 4.2 NumPy Arrays\n",
"\n",
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). These were described earlier. Matrices have a two-dimensional (2-D) index [m,n].\n",
"\n",
"In Course 1, 2-D matrices are used to hold training data. Training data is $m$ examples by $n$ features creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that. Below you will review: \n",
"- data creation\n",
"- slicing and indexing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_4.3\"></a>\n",
"## 4.3 Matrix Creation\n",
"The same functions that created 1-D vectors will create 2-D or n-D arrays. Here are some examples\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further that NumPy, when printing, will print one row per line.\n"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a shape = (1, 5), a = [[0. 0. 0. 0. 0.]]\n",
"a shape = (2, 1), a = [[0.]\n",
" [0.]]\n",
"a shape = (1, 1), a = [[0.44236513]]\n"
]
}
],
"source": [
"# Create a 2D array/matrix of shape/size (1 row,5 cols) = [m x n] = [1 row x 5 cols] \n",
"# with ALL zeros '0.0' as float dtype -> a = [[0. 0. 0. 0. 0.]]\n",
"a = np.zeros((1, 5)) \n",
"print(f\"a shape = {a.shape}, a = {a}\") \n",
"\n",
"# Create a 2D array/matrix of shape/size (2 rows,1 col) = [m x n] = [2 rows x 1 col] \n",
"# with ALL zeros '0.0' as float dtype -> a = [[0.],\n",
"# [0.]]\n",
"a = np.zeros((2, 1)) \n",
"print(f\"a shape = {a.shape}, a = {a}\") \n",
"\n",
"# Create a 2D array/matrix of shape/size (1 row,1 col) = [m x n] = [1 row x 1 col] \n",
"# with a random value between [0, 1) as float dtype -> a = [[x.xxxxxxxx]]\n",
"a = np.random.random_sample((1, 1)) \n",
"print(f\"a shape = {a.shape}, a = {a}\") "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" a shape = (3, 1), np.array: a = [[5]\n",
" [4]\n",
" [3]]\n",
" a shape = (3, 1), np.array: a = [[5]\n",
" [4]\n",
" [3]]\n"
]
}
],
"source": [
"# NumPy routines which allocate memory and fill with user-specified values\n",
"# Create a 2-D array/matrix of shape (3,1) = [3 rows x 1 col] from values entered manually,\n",
"# then print its shape (rows, cols) and contents.\n",
"# Each inner [...] is one row; commas separate the rows.\n",
"a = np.array([[5], [4], [3]]); print(f\" a shape = {a.shape}, np.array: a = {a}\")\n",
"\n",
"# Create the same (3,1) matrix, writing one row per source line for readability.\n",
"a = np.array([[5],   # One can also\n",
"              [4],   # separate values\n",
"              [3]])  # into separate rows\n",
"print(f\" a shape = {a.shape}, np.array: a = {a}\")"
]
},
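{
"cell_type": "markdown",
"metadata": {},
"source": [
"More generally, each inner list becomes one row, so a list of m inner lists of length n gives shape (m, n). A minimal added sketch with made-up values:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([[1, 2, 3],\n",
"              [4, 5, 6]])   # made-up values: two inner lists -> two rows of three columns\n",
"print(a.shape)              # (2, 3)\n",
"print(a.ndim)               # 2"
]
},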
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_4.4\"></a>\n",
"## 4.4 Operations on Matrices\n",
"Let's explore some operations using matrices."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_4.4.1\"></a>\n",
"### 4.4.1 Indexing\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a.shape: (3, 2), \n",
"a= [[0 1]\n",
" [2 3]\n",
" [4 5]]\n",
"\n",
"a[2,0].shape: (), a[2,0] value = 4, type(a[2,0]) = <class 'numpy.int64'> Accessing an element returns a scalar\n",
"\n",
"a[2].shape: (2,), a[2] entire row = [4 5], type(a[2]) = <class 'numpy.ndarray'>\n"
]
}
],
"source": [
"# Vector indexing operations on matrices\n",
"a = np.arange(6)      # Create the 1-D array a = [0,1,2,3,4,5] with 6 elements\n",
"a = a.reshape(-1, 2)  # Reshape 1-D -> 2-D with 2 columns; -1 tells NumPy to infer the number\n",
"                      # of rows needed (6 / 2 = 3). reshape is a convenient way to create matrices.\n",
"                      #      col0 col1\n",
"                      # a= [[0,   1],   -> row0\n",
"                      #     [2,   3],   -> row1\n",
"                      #     [4,   5]]   -> row2   [3 rows x 2 cols] = 6 elements\n",
"\n",
"# Print a reshaped 2D array/matrix with (3 rows, 2 cols) = [3 rows x 2 cols] = 6 elements\n",
"print(f\"a.shape: {a.shape}, \\na= {a}\") \n",
"\n",
"# Access an element at [row=2, col=0] -> value = 4\n",
"# The type() function returns the type of the object, based on the arguments passed.\n",
"print(f\"\\na[2,0].shape: {a[2, 0].shape}, a[2,0] value = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\\n\")\n",
"\n",
"# Access to the ENTIRE 3rd row -> a[2] = [4, 5]\n",
"# The type() function returns the type of the object, based on the arguments passed.\n",
"print(f\"a[2].shape: {a[2].shape}, a[2] entire row = {a[2]}, type(a[2]) = {type(a[2])}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a *1-D vector*."
]
},
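{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two related forms are often handy (an added sketch): `a[2][0]` reaches the same element as `a[2, 0]` by first taking row 2 and then indexing into it, and a colon in the row position, as in `a[:, 0]`, selects a whole column, which is also returned as a 1-D array."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.arange(6).reshape(-1, 2)   # [[0 1] [2 3] [4 5]]\n",
"\n",
"print(a[2, 0], a[2][0])           # 4 4\n",
"print(a[:, 0])                    # column 0: [0 2 4]\n",
"print(a[:, 0].shape)              # (3,)"
]
},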
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Reshape** \n",
"The previous example used [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) to shape the array. \n",
"`a = np.arange(6).reshape(-1, 2) ` \n",
"This line of code first creates a *1-D vector* of six elements and then reshapes that vector into a *2-D* array using the `reshape` method. It could also have been written as \n",
"`a = np.arange(6).reshape(3, 2) ` \n",
"to arrive at the same 3-row, 2-column array.\n",
"The -1 argument tells `reshape` to compute the number of rows from the size of the array and the number of columns (here 6 / 2 = 3 rows).\n"
]
},
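{
"cell_type": "markdown",
"metadata": {},
"source": [
"The equivalence described above can be verified with a short sketch (an illustrative addition, not from the original lab; the names `a_inferred`, `a_explicit` and `raised` are hypothetical). It also shows that `-1` may stand in for the column count instead of the row count, and that `reshape` raises a `ValueError` when the requested shape cannot hold exactly the same number of elements. At most one dimension may be given as `-1`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a_inferred = np.arange(6).reshape(-1, 2)  # NumPy infers the rows: 6 / 2 = 3\n",
"a_explicit = np.arange(6).reshape(3, 2)   # rows given explicitly\n",
"\n",
"print(a_inferred.shape, a_explicit.shape)      # (3, 2) (3, 2)\n",
"print(np.array_equal(a_inferred, a_explicit))  # True\n",
"\n",
"# -1 can also stand for the number of columns: 2 rows -> 3 columns inferred\n",
"print(np.arange(6).reshape(2, -1).shape)       # (2, 3)\n",
"\n",
"# 6 elements cannot fill rows of 4 columns, so reshape raises ValueError\n",
"raised = False\n",
"try:\n",
"    np.arange(6).reshape(-1, 4)\n",
"except ValueError as err:\n",
"    raised = True\n",
"    print(\"ValueError:\", err)"
]
},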
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_4.4.2\"></a>\n",
"### 4.4.2 Slicing\n",
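"Slicing selects a range of elements using a set of three values (`start:stop:step`), where `stop` is exclusive. A subset of the values is also valid: for the usual positive step, an omitted `start` defaults to the beginning of the axis, an omitted `stop` to its end, and an omitted `step` to 1. Its use is best explained by example:"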
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a = \n",
"[[ 0 1 2 3 4 5 6 7 8 9]\n",
" [10 11 12 13 14 15 16 17 18 19]]\n",
"\n",
"a[0, 2:7:1] = [2 3 4 5 6] , a[0, 2:7:1].shape = (5,) a 1-D array\n",
"\n",
"a[:, 2:7:1] = \n",
" [[ 2 3 4 5 6]\n",
" [12 13 14 15 16]] , a[:, 2:7:1].shape = (2, 5) a 2-D array\n",
"\n",
"a[:,:] = \n",
" [[ 0 1 2 3 4 5 6 7 8 9]\n",
" [10 11 12 13 14 15 16 17 18 19]] , a[:,:].shape = (2, 10) a 2-D array\n",
"\n",
"a[1,:] = [10 11 12 13 14 15 16 17 18 19] , a[1,:].shape = (10,) a 1-D array\n",
"\n",
"a[1] = [10 11 12 13 14 15 16 17 18 19] , a[1].shape = (10,) a 1-D array\n"
]
}
],
"source": [
"# Slicing operations on a 2-D array (matrix)\n",
"a = np.arange(20) # Create the 1-D array a = [0, 1, 2, ..., 18, 19] with 20 elements\n",
"a = a.reshape(-1, 10) # Reshape 1-D -> 2-D array (matrix) with 10 columns; -1 tells NumPy to infer\n",
" # the number of rows (20 / 10 = 2). reshape is a convenient way to create matrices.\n",
" # col0 col1 col2 col3 col4 col5 col6 col7 col8 col9\n",
" # a= [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], -> row0\n",
" # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]] -> row1 [2 rows x 10 cols] = 20 elements\n",
"\n",
"# Print the reshaped 2-D array: shape (2, 10), i.e. 2 rows x 10 cols = 20 elements\n",
"print(f\"a = \\n{a}\")\n",
"\n",
"# Access 5 consecutive elements (start:stop:step) in one row\n",
"# a[row 0, columns 2 up to (but not including) 7, step 1]\n",
"# Selects 5 columns from row 0 -> 1-D array of shape (5,)\n",
"print(\"\\na[0, 2:7:1] = \", a[0, 2:7:1], \", a[0, 2:7:1].shape =\", a[0, 2:7:1].shape, \"a 1-D array\")\n",
"\n",
"# Access 5 consecutive elements (start:stop:step) in ALL rows\n",
"# a[all rows (:), columns 2 up to (but not including) 7, step 1]\n",
"# Selects 5 columns from every row -> 2-D array of shape (2, 5)\n",
"print(\"\\na[:, 2:7:1] = \\n\", a[:, 2:7:1], \", a[:, 2:7:1].shape =\", a[:, 2:7:1].shape, \"a 2-D array\")\n",
"\n",
"# Access ALL rows (:) and ALL columns (:) of the 2-D array\n",
"# a[all rows, all columns] -> the whole matrix, shape (2, 10)\n",
"print(\"\\na[:,:] = \\n\", a[:,:], \", a[:,:].shape =\", a[:,:].shape, \"a 2-D array\")\n",
"\n",
"# Access ALL columns of a single row (very common usage)\n",
"# a[row 1, all columns] -> 1-D array of shape (10,)\n",
"print(\"\\na[1,:] = \", a[1,:], \", a[1,:].shape =\", a[1,:].shape, \"a 1-D array\")\n",
"\n",
"# Indexing only the row gives the same result as a[1, :]\n",
"print(\"\\na[1] = \", a[1], \", a[1].shape =\", a[1].shape, \"a 1-D array\")"
]
},
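{
"cell_type": "markdown",
"metadata": {},
"source": [
"A few more slicing patterns are sketched below (an illustrative addition, not part of the original lab; the names `view` and `independent` are hypothetical). The cell shows omitted slice values falling back to their defaults, negative indices and a negative step, and one behavior worth knowing: a basic slice returns a *view* that shares memory with the original array, so writing into the slice also changes `a`. Calling `.copy()` produces an independent array instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.arange(20).reshape(-1, 10)  # same 2 x 10 matrix as above\n",
"\n",
"# Omitted values use defaults: start = 0, stop = end of axis, step = 1\n",
"print(a[0, :3])    # [0 1 2]    first three columns of row 0\n",
"print(a[0, 7:])    # [7 8 9]    column 7 to the end\n",
"print(a[0, ::3])   # [0 3 6 9]  every third column\n",
"\n",
"# Negative indices count from the end; a negative step walks backwards\n",
"print(a[1, -3:])   # [17 18 19]\n",
"print(a[1, ::-1])  # [19 18 17 16 15 14 13 12 11 10]\n",
"\n",
"# A basic slice is a view: writing through it modifies a itself\n",
"view = a[:, 2:7]\n",
"view[0, 0] = 100\n",
"print(a[0, 2])     # 100\n",
"\n",
"# .copy() creates an independent array that does not share memory with a\n",
"independent = a[:, 2:7].copy()\n",
"print(np.shares_memory(view, a), np.shares_memory(independent, a))  # True False"
]
},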
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_40015_5.0\"></a>\n",
"## Congratulations!\n",
"In this lab you mastered the features of Python and NumPy that are needed for Course 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"dl_toc_settings": {
"rndtag": "40015"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"toc-autonumbering": false
},
"nbformat": 4,
"nbformat_minor": 4
}