Stanford Online/ DeepLearning.AI. Supervised Machine Learning: Regression and Classification, Multiple Variable Linear Regression.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Optional Lab: Multiple Variable Linear Regression\n",
"\n",
"In this lab, you will extend the data structures and previously developed routines to support multiple features. Several routines are updated making the lab appear lengthy, but it makes minor adjustments to previous routines making it quick to review.\n",
"# Outline\n",
"- [  1.1 Goals](#toc_15456_1.1)\n",
"- [  1.2 Tools](#toc_15456_1.2)\n",
"- [  1.3 Notation](#toc_15456_1.3)\n",
"- [2 Problem Statement](#toc_15456_2)\n",
"- [  2.1 Matrix X containing our examples](#toc_15456_2.1)\n",
"- [  2.2 Parameter vector w, b](#toc_15456_2.2)\n",
"- [3 Model Prediction With Multiple Variables](#toc_15456_3)\n",
"- [  3.1 Single Prediction element by element](#toc_15456_3.1)\n",
"- [  3.2 Single Prediction, vector](#toc_15456_3.2)\n",
"- [4 Compute Cost With Multiple Variables](#toc_15456_4)\n",
"- [5 Gradient Descent With Multiple Variables](#toc_15456_5)\n",
"- [  5.1 Compute Gradient with Multiple Variables](#toc_15456_5.1)\n",
"- [  5.2 Gradient Descent With Multiple Variables](#toc_15456_5.2)\n",
"- [6 Congratulations](#toc_15456_6)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_1.1\"></a>\n",
"## 1.1 Goals\n",
"- Extend our regression model routines to support multiple features\n",
" - Extend data structures to support multiple features\n",
" - Rewrite prediction, cost and gradient routines to support multiple features\n",
" - Utilize NumPy `np.dot` to vectorize their implementations for speed and simplicity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_1.2\"></a>\n",
"## 1.2 Tools\n",
"In this lab, we will make use of: \n",
"- NumPy, a popular library for scientific computing\n",
"- Matplotlib, a popular library for plotting data"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import copy, math # Python 'math' module allows you to perform mathematical tasks on numbers.\n",
" # Use the 'copy' module in Python, to create “real copies” or “clones” of the objects.\n",
" \n",
"import numpy as np # Get numpy library as 'np' constructor\n",
"import matplotlib.pyplot as plt # Get matplotlib, pyplot library as 'plt' constructor\n",
"\n",
"plt.style.use('./deeplearning.mplstyle') # Use or import 'available style sheets', using its 'name',\n",
" # on a common set of example plots: scatter plot, image, \n",
" # bar graph, patches, line plot and histogram.\n",
" \n",
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays\n",
" # These options determine the way floating point numbers, arrays \n",
" # and other NumPy objects are displayed. (Precision = integer or None \n",
" # defining the number of digits of precision for floating point output. \n",
" # default 8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_1.3\"></a>\n",
"## 1.3 Notation\n",
"Here is a summary of some of the notation you will encounter, updated for multiple features. \n",
"\n",
"|General <img width=70/> <br /> Notation <img width=70/> | Description<img width=350/>| Python (if applicable) |\n",
"|: ------------|: ------------------------------------------------------------||\n",
"| $a$ | scalar, non bold ||\n",
"| $\\mathbf{a}$ | vector, bold ||\n",
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
"| **Regression** | | | |\n",
"| $\\mathbf{X}$ | training example matrix | `X_train` | \n",
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
"| m | number of training examples | `m`|\n",
"| n | number of features in each example | `n`|\n",
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
"| $b$ | parameter: bias | `b` | \n",
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x^{(i)}}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_2\"></a>\n",
"# 2 Problem Statement\n",
"\n",
"You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors and, age) shown in the table below. Note that, unlike the earlier labs, size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next lab!\n",
"\n",
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
"| 2104 | 5 | 1 | 45 | 460 | \n",
"| 1416 | 3 | 2 | 40 | 232 | \n",
"| 852 | 2 | 1 | 35 | 178 | \n",
"\n",
"You will build a linear regression model using these values so you can then predict the price for other houses. For example, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
"\n",
"Please run the following code cell to create your `X_train` and `y_train` variables."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X_train =\n",
" Xj=0 Xj=1 Xj=2 Xj=3\n",
" [[2104 5 1 45]\n",
" [1416 3 2 40]\n",
" [ 852 2 1 35]]\n",
"\n",
"y_train =\n",
"yi=0 yi=1 yi=2\n",
" [460 232 178]\n"
]
}
],
"source": [
"# X_train = [X0=size (sqft), X1=Number of bedrooms, X2=Number of floors, X3=Age of Home]\n",
"# col0 col1 col2 col3\n",
"# X0 X1 X2 X3\n",
"X_train = np.array([[2104, 5, 1, 45], # row i = 0\n",
" [1416, 3, 2, 40], # row i = 1\n",
" [852, 2, 1, 35]]) # row i = 2\n",
"\n",
"print(f\"X_train =\\n\"+\" Xj=0 Xj=1 Xj=2 Xj=3\\n\",X_train) # Print X_train = [X0, X1, X2, X3] numpy 2D array\n",
"\n",
"# y_train = [y=Price (1000s dollars)]\n",
"# col0\n",
"# y\n",
"y_train = np.array([460, # row i = 0 \n",
" 232, # row i = 1\n",
" 178]) # row i = 2\n",
"\n",
"print(f\"\\ny_train =\\n\"+\"yi=0 yi=1 yi=2\\n\",y_train) # Print y_train = [y] numpy 1D array"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_2.1\"></a>\n",
"## 2.1 Matrix X containing our training set examples\n",
"Similar to the table above, examples are stored in a NumPy matrix `X_train`. Each row of the matrix represents one example. When you have $m$ training examples ( $m$ is three in our example), and there are $n$ features (four in our example), $\\mathbf{X}$ is a matrix with dimensions ($m$, $n$) (m rows, n columns).\n",
"\n",
"\n",
"$$\\mathbf{X} = \n",
"\\begin{pmatrix}\n",
" x^{(0)}_0 & x^{(0)}_1 & \\cdots & x^{(0)}_{n-1} \\\\ \n",
" x^{(1)}_0 & x^{(1)}_1 & \\cdots & x^{(1)}_{n-1} \\\\\n",
" \\cdots \\\\\n",
" x^{(m-1)}_0 & x^{(m-1)}_1 & \\cdots & x^{(m-1)}_{n-1} \n",
"\\end{pmatrix}\n",
"$$\n",
"notation:\n",
"- $\\mathbf{x}^{(i)}$ is vector containing example i. $\\mathbf{x}^{(i)}$ $ = (x^{(i)}_0, x^{(i)}_1, \\cdots,x^{(i)}_{n-1})$\n",
"- $x^{(i)}_j$ is element j in example i. The superscript in parenthesis indicates the example number while the subscript represents an element. \n",
"\n",
"Display the input data."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X Shape: (3, 4), X Type:<class 'numpy.ndarray'>\n",
"[[2104 5 1 45]\n",
" [1416 3 2 40]\n",
" [ 852 2 1 35]]\n",
"\n",
"y Shape: (3,), y Type:<class 'numpy.ndarray'>\n",
"[460 232 178]\n"
]
}
],
"source": [
"# X_train data was stored as 2D numpy array/matrix of size \n",
"# (m = 3 examples/rows, n = 4 features/cols) = [m = 3 x n = 4] \n",
"print(f\"X Shape: {X_train.shape}, X Type:{type(X_train)}\")\n",
"print(X_train) # Print 'X_train' 2D array/matrix \n",
"\n",
"# y_train data was stored as 1D numpy array/vector of size [1 row x 3 examples/elements]\n",
"print(f\"\\ny Shape: {y_train.shape}, y Type:{type(y_train)}\")\n",
"print(y_train) # Print 'y_train' 1D array/vector"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_2.2\"></a>\n",
"## 2.2 Parameter vector w, b\n",
"\n",
"* $\\mathbf{w}$ is a vector with $n$ elements.\n",
" - Each element contains the parameter associated with one feature.\n",
" - in our dataset, n is 4.\n",
" - notionally, we draw this as a column vector\n",
"\n",
"$$\\mathbf{w} = \\begin{pmatrix}\n",
"w_0 \\\\ \n",
"w_1 \\\\\n",
"\\cdots\\\\\n",
"w_{n-1}\n",
"\\end{pmatrix}\n",
"$$\n",
"* $b$ is a scalar parameter. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For demonstration, $\\mathbf{w}$ and $b$ will be loaded with some initial selected values that are near the optimal. $\\mathbf{w}$ is a 1-D NumPy vector."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"w_init shape: (4,), b_init type: <class 'float'>\n"
]
}
],
"source": [
"# Initialize b parameter as a scalar near the minimum\n",
"b_init = 785.1811367994083\n",
"\n",
"# Initialize w vector parameter with n=4 values, (each related with every input feature) \n",
"# near the minimum. - > w = [w0, w1, w2, w3]\n",
"w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])\n",
"\n",
"print(f\"w_init shape: {w_init.shape}, b_init type: {type(b_init)}\") # Print w size and b data type"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_3\"></a>\n",
"# 3 Model Prediction With Multiple Variables\n",
"The model's prediction with multiple variables is given by the linear model:\n",
"\n",
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \\tag{1}$$\n",
"or in vector notation:\n",
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = \\mathbf{w} \\cdot \\mathbf{x} + b \\tag{2} $$ \n",
"where $\\cdot$ is a vector `dot product`\n",
"\n",
"To demonstrate the dot product, we will implement prediction using (1) and (2)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_3.1\"></a>\n",
"## 3.1 Single Prediction element by element\n",
"Our previous prediction multiplied one feature value by one parameter and added a bias parameter. A direct extension of our previous implementation of prediction to multiple features would be to implement (1) above using loop over each element, performing the multiply with its parameter and then adding the bias parameter at the end.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def predict_single_loop(x, w, b): # Input selected 1D vector Xj^(î) = [x0, x1, x2, x3] from X_train matrix/2D array\n",
" # Input wj = [w0, w1, w2, w3] 1D array of paramters (one per each xj)\n",
" # Input fixed b scalar parameter\n",
" \"\"\"\n",
" single predict using linear regression\n",
" \n",
" Input arguments:\n",
" x 1D (ndarray): Shape (n,) example with multiple features\n",
" w 1D (ndarray): Shape (n,) model parameters \n",
" b num (scalar): model parameter \n",
" \n",
" Output returned:\n",
" p num (scalar): y^ = fw_b(x) prediction\n",
" \"\"\"\n",
" n = x.shape[0] # Extract the size/len of rows/vectors/training examples \n",
" # x.shape=(rows,cols) -> x.shape[0]=(rows) \n",
" p = 0 # Init prediction counter as '0'\n",
" \n",
" for i in range(n): # n = rows/examples -> range(n) = i = 0, 1,..., n-1\n",
" p_i = x[i] * w[i] # updated_pred = wj * xj\n",
" p = p + p_i # new_pred = previous_pred + updated_pred\n",
" # (init as 0)\n",
" p = p + b # Overwrite new_pred = new_pred + b, out of for loop \n",
" return p # return new_pred = p + x[i]*w[i] + b"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x_vec shape (4,), x_vec value: [2104 5 1 45]\n",
"f_wb shape (), prediction: 459.9999976194083\n"
]
}
],
"source": [
"# Select the first row/vector, from our training data -> X_train[row = 0, All cols/elements = :]\n",
"# x_vec = [2104, 5, 1, 45]\n",
"x_vec = X_train[0,:]\n",
"\n",
"# Print selected vector size/shape, and the vector/1D array elements\n",
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\") \n",
"\n",
"# Call the previous function 'predict_single_loop()' and make a prediction y^=f_wb\n",
"f_wb = predict_single_loop(x_vec, w_init, b_init)\n",
"\n",
"# Print scalar prediction y^=f_wb size/shape, and its value\n",
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note the shape of `x_vec`. It is a 1-D NumPy vector with 4 elements, (4,). The result, `f_wb` is a scalar."
]
},
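{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked check of equation (1), the prediction above can be reproduced by hand from `w_init`, `b_init` and the first row of `X_train`. The intermediate products are rounded to four decimal places here, so the total agrees with the printed value only to that precision:\n",
"\n",
"$$\\begin{align*}\n",
"f_{\\mathbf{w},b}(\\mathbf{x}^{(0)}) &= (0.39133535)(2104) + (18.75376741)(5) + (-53.36032453)(1) + (-26.42131618)(45) + 785.1811367994083 \\\\\n",
"&\\approx 823.3696 + 93.7688 - 53.3603 - 1188.9592 + 785.1811 \\\\\n",
"&\\approx 460.00\n",
"\\end{align*}$$\n",
"\n",
"This is close to the target price of 460 (in 1000s of dollars) for that example, as expected for parameters chosen near the optimum."
]
},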
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_3.2\"></a>\n",
"## 3.2 Single Prediction, vector\n",
"\n",
"Noting that equation (1) above can be implemented using the dot product as in (2) above. We can make use of vector operations to speed up predictions.\n",
"\n",
"Recall from the Python/Numpy lab that NumPy `np.dot()`[[link](https://numpy.org/doc/stable/reference/generated/numpy.dot.html)] can be used to perform a vector dot product. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def predict(x, w, b): # Input selected 1D vector Xj^(î) = [x0, x1, x2, x3] from X_train matrix/2D array\n",
" # Input wj = [w0, w1, w2, w3] 1D array of paramters (one per each xj)\n",
" # Input fixed b scalar parameter\n",
" \"\"\"\n",
" single predict using linear regression\n",
" \n",
" Input arguments:\n",
" x 1D (ndarray): Shape (n,) example with multiple features\n",
" w 1D (ndarray): Shape (n,) model parameters \n",
" b num (scalar): model parameter \n",
" \n",
" Output returned:\n",
" p num (scalar): y^ = fw_b(x) prediction\n",
" \"\"\"\n",
" p = np.dot(x,w) + b # y^ = x.w + b = x[j].w[j] + b \n",
" return p # return prediction y^ "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x_vec shape (4,), x_vec value: [2104 5 1 45]\n",
"f_wb shape (), prediction: 459.99999761940825\n"
]
}
],
"source": [
"# Select the first row/vector, from our training data -> X_train[row = 0, All cols/elements = :]\n",
"# x_vec = [2104, 5, 1, 45]\n",
"x_vec = X_train[0,:]\n",
"\n",
"# Print selected vector size/shape, and the vector/1D array elements\n",
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\") \n",
"\n",
"# Call the previous function 'predict_single_loop()' and make a prediction y^=f_wb\n",
"f_wb = predict(x_vec, w_init, b_init)\n",
"\n",
"# Print scalar prediction y^=f_wb size/shape, and its value\n",
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The results and shapes are the same as the previous version which used looping. Going forward, `np.dot` will be used for these operations. The prediction is now a single statement. Most routines will implement it directly rather than calling a separate predict routine."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_4\"></a>\n",
"# 4 Compute Cost With Multiple Variables\n",
"The equation for the cost function with multiple variables $J(\\mathbf{w},b)$ is:\n",
"$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})^2 \\tag{3}$$ \n",
"where:\n",
"$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b \\tag{4} $$ \n",
"\n",
"\n",
"In contrast to previous labs, $\\mathbf{w}$ and $\\mathbf{x}^{(i)}$ are vectors rather than scalars supporting multiple features."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is an implementation of equations (3) and (4). Note that this uses a *standard pattern for this course* where a for loop over all `m` examples is used."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def compute_cost(X, y, w, b): # Input X 2D array/Matrix\n",
" # Input y 1D array/vector\n",
" # Input w parameter 1D array/vector\n",
" # Input b parameter fixed scalar\n",
" \"\"\"\n",
" compute cost\n",
" Args:\n",
" X (ndarray (m,n)): Data, m examples with n features\n",
" y (ndarray (m,)) : target values\n",
" w (ndarray (n,)) : model parameters \n",
" b (scalar) : model parameter\n",
" \n",
" Returns:\n",
" cost (scalar): cost\n",
" \"\"\"\n",
" m = X.shape[0] # Select the total number of samples/rows/vectors at 'X_train' \n",
" # 2D numpy array/matrix (_ , ), and assign to (m).\n",
" cost = 0.0 # Init the value of counter cost = J = 0.0\n",
" \n",
" for i in range(m): # Iterate through each ith row/training sample/vector X[i] = [X0, X1, X2, X3] with \n",
" # range(m) = i = [0, 1, 2, ..., m-1]\n",
" \n",
" f_wb_i = np.dot(X[i], w) + b # X[i] = [X0, X1, X2, X3] each 1D vector is selected, with n = 4 features \n",
" # w = [w0, w1, w2, w3], 1D parameters vector, with n = 4 related features \n",
" # y^[i] = f_wb(x) = X[i] * w + b -> (n=4,)(n=4,) + scalar = scalar + scalar\n",
" # y^[i] = f_wb(x) = X[i]0*w0 + X[i]1*w1 + X[i]2*w2 + X[i]3*w3 + b\n",
" \n",
" cost = cost + (f_wb_i - y[i]) ** 2 # cost_new = cost_prev + (y^[i] - y[i])^2 -> scalar + (scalar - scalar)\n",
" # (init as 0)\n",
" cost = cost / (2 * m) # cost_new = cost_new / (2*m) -> scalar / (scalar) \n",
" return cost # return cost_new = J -> scalar or the minimum value of cost function J(w,b)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cost at optimal w : 1.5578904880036537e-12\n"
]
}
],
"source": [
"# Compute and display cost = J(w,b) using our pre-chosen optimal parameters \n",
"# w = w_init = [0.39133535, 18.75376741, -53.36032453, -26.42131618].\n",
"# b = b_init = 785.1811367994083\n",
"# X_train = [2104 5 1 45\n",
"# 1416 3 2 40\n",
"# 852 2 1 35] is a 2D array/matrix. \n",
"# y_train = [460 232 178] is a 1D array/vector\n",
"cost = compute_cost(X_train, y_train, w_init, b_init)\n",
"\n",
"# Print J(w,b) = computed cost value at w = w_init, before gradient descent\n",
"print(f'Cost at optimal w : {cost}') "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Expected Result**: Cost at optimal w : 1.5578904045996674e-12"
]
},
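{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see where a cost this small comes from, equation (3) can be evaluated by hand. The three errors below are approximate values worked out from `w_init`, `b_init` and the training table for illustration (they are not printed by the lab), so the result matches the output above only to about three significant figures:\n",
"\n",
"$$\\begin{align*}\n",
"J(\\mathbf{w},b) &\\approx \\frac{1}{2 \\cdot 3} \\left[ (-2.38 \\times 10^{-6})^2 + (-1.63 \\times 10^{-6})^2 + (-1.01 \\times 10^{-6})^2 \\right] \\\\\n",
"&\\approx \\frac{9.34 \\times 10^{-12}}{6} \\\\\n",
"&\\approx 1.56 \\times 10^{-12}\n",
"\\end{align*}$$\n",
"\n",
"Each prediction misses its target by roughly a millionth of a unit, and squaring those errors pushes the cost down to the order of $10^{-12}$."
]
},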
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_5\"></a>\n",
"# 5 Gradient Descent With Multiple Variables\n",
"Gradient descent for multiple variables:\n",
"\n",
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
"& w_j = w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{5} \\; & \\text{for j = 0..n-1}\\newline\n",
"&b\\ \\ = b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
"\\end{align*}$$\n",
"\n",
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
"\n",
"$$\n",
"\\begin{align}\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{6} \\\\\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{7}\n",
"\\end{align}\n",
"$$\n",
"* m is the number of training examples in the data set\n",
"\n",
" \n",
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
]
},
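{
"cell_type": "markdown",
"metadata": {},
"source": [
"Equations (6) and (7) follow from applying the chain rule to the cost in equation (3). A short derivation sketch, using only the symbols already defined above:\n",
"\n",
"$$\\begin{align*}\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} 2\\,(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})\\,\\frac{\\partial f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})}{\\partial w_j} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})\\,x_{j}^{(i)} \\\\\n",
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} 2\\,(f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})\\,\\frac{\\partial f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})}{\\partial b} = \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})\n",
"\\end{align*}$$\n",
"\n",
"The last step uses $\\frac{\\partial f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})}{\\partial w_j} = x_j^{(i)}$ and $\\frac{\\partial f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})}{\\partial b} = 1$, both read off from equation (1). The factor of 2 produced by differentiating the square cancels the $\\frac{1}{2}$ in equation (3)."
]
},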
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_5.1\"></a>\n",
"## 5.1 Compute Gradient with Multiple Variables\n",
"An implementation for calculating the equations (6) and (7) is below. There are many ways to implement this. In this version, there is an\n",
"- outer loop over all m examples. \n",
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ for the example can be computed directly and accumulated\n",
" - in a second loop over all n features:\n",
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$ is computed for each $w_j$.\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def compute_gradient(X, y, w, b): # Inputs X_train 2D array/matrix\n",
" # Input y 1D array/vector\n",
" # Input w = w_init 1D array/vector\n",
" # Input b = b_init fixed scalar\n",
" \"\"\"\n",
" Computes the gradient for linear regression \n",
" Args:\n",
" X (ndarray (m,n)): Data, m examples with n features\n",
" y (ndarray (m,)) : target values\n",
" w (ndarray (n,)) : model parameters \n",
" b (scalar) : model parameter\n",
" \n",
" Returns:\n",
" dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. \n",
" dj_db (scalar): The gradient of the cost w.r.t. the parameter b. \n",
" \"\"\"\n",
" m,n = X.shape # X.shape = (m,n) -> number of examples m, number of features n\n",
" dj_dw = np.zeros((n,)) # init djdw = [0, 0, 0, 0] 1D array/vector\n",
" dj_db = 0. # init djdb = 0 scalar\n",
"\n",
" for i in range(m): # Iterate rows i = 0 ... m-1 \n",
" err = (np.dot(X[i], w) + b) - y[i] # (X[i].w + b) - y[i] = y^[i] - y[i]\n",
" for j in range(n): # Each i-th row, iterate through cols j = 0 ... n-1 \n",
" # X[i, j] = X[i][j] ----> X_train = i [2104 5 1 45\n",
" # 1416 3 2 40\n",
" # 852 2 1 35] \n",
" dj_dw[j] = dj_dw[j] + err * X[i, j] # new_djdw = prev_djdw + (y^[i] - y[i])*X[i][j] \n",
" # (init as 0) \n",
" dj_db = dj_db + err # new_djdb = prev_djdb + (y^[i] - y[i])\n",
" # (init as 0)\n",
" dj_dw = dj_dw / m # new_djdw = new_djdw / m \n",
" dj_db = dj_db / m # new_djdb = new_djdb / m \n",
" \n",
" return dj_db, dj_dw # return dj_db = new_djdb, dj_dw = new_djdw "
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dj_db at initial w,b: -1.673925169143331e-06\n",
"dj_dw at initial w,b: \n",
" [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]\n"
]
}
],
"source": [
"# Call ‘compute_gradient()’ defined above. \n",
"# Input X_train 2D array/Matrix \n",
"# Input y_train 1D array/vector \n",
"# Input w = w_init 1D array/vector \n",
"# Input b = b_init scalar/number \n",
"tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)\n",
"\n",
"# Then display returned gradient as a number/scalar dj_db\n",
"print(f'dj_db at initial w,b: {tmp_dj_db}')\n",
"\n",
"# and display returned gradient as 1D array/vector dj_dw\n",
"print(f'dj_dw at initial w,b: \\n {tmp_dj_dw}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Expected Result**: \n",
"dj_db at initial w,b: -1.6739251122999121e-06 \n",
"dj_dw at initial w,b: \n",
" [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05] "
]
},
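{
"cell_type": "markdown",
"metadata": {},
"source": [
"The double loop in `compute_gradient` can also be summarized in matrix notation. This is a notational sketch only: the error vector $\\mathbf{e}$ and the all-ones vector $\\mathbf{1}$ are symbols introduced here for illustration and are not used elsewhere in the lab.\n",
"\n",
"$$\\mathbf{e} = \\mathbf{X}\\mathbf{w} + b\\,\\mathbf{1} - \\mathbf{y}, \\qquad \\frac{\\partial J(\\mathbf{w},b)}{\\partial \\mathbf{w}} = \\frac{1}{m}\\,\\mathbf{X}^{T}\\mathbf{e}, \\qquad \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} = \\frac{1}{m}\\,\\mathbf{1} \\cdot \\mathbf{e}$$\n",
"\n",
"Element $j$ of $\\mathbf{X}^{T}\\mathbf{e}$ is $\\sum_{i=0}^{m-1} x_j^{(i)} e_i$, which is the sum the inner loop accumulates into `dj_dw[j]`, and $\\mathbf{1} \\cdot \\mathbf{e}$ is the sum accumulated into `dj_db`. The two forms therefore agree term by term with equations (6) and (7)."
]
},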
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"toc_15456_5.2\"></a>\n",
"## 5.2 Gradient Descent With Multiple Variables\n",
"The routine below implements equation (5) above."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):\n",
" \n",
" # Input X_train 2D array/matrix\n",
" # Input y_train 1D array/vector\n",
" # Input w = w_init parameter 1D array/vector\n",
" # Input b = b_init fixed parameter scalar\n",
" # Input 'cost_function' name = 'compute_cost' function name\n",
" # Input 'gradient_function' name = 'compute_gradient' function name\n",
" # Input Learning Rate = alpha constant\n",
" # Input num_iters = number of times GD loop is executed\n",
" \n",
" \"\"\"\n",
" Performs batch (All samples) gradient descent to learn w and b. Updates w and b by taking \n",
" num_iters gradient steps with learning rate alpha\n",
" \n",
" Input Args:\n",
" X (ndarray (m,n)) : Data, m examples with n features\n",
" y (ndarray (m,)) : target values\n",
" w_in (ndarray (n,)) : initial model parameters \n",
" b_in (scalar) : initial model parameter\n",
" cost_function : function to compute cost\n",
" gradient_function : function to compute the gradient\n",
" alpha (float) : Learning rate\n",
" num_iters (int) : number of iterations to run gradient descent\n",
" \n",
" Outputs Returned:\n",
" w (ndarray (n,)) : Final Updated 1D array/vector with values or parameters \n",
" b (scalar) : Final Updated scalar value or parameter \n",
" \"\"\"\n",
" \n",
" # An array to store cost J(wj,b) and wj's, at each iteration primarily, for graphing later\n",
" J_history = [] # Empty array to be filled with J(wj,b)^(i) COST values, each iter\n",
" w = copy.deepcopy(w_in) # Avoid modifying global w within function. A copy of w_in is copied into w.\n",
" # It means that any change made to w (a copy of w_in) do not affect the original w_in vector.\n",
" b = b_in # First b parameter value = initial parameter b_init\n",
" \n",
" for i in range(num_iters): # GD loop is executed the total number of iters i = 0, 1, 2, ..., N-1\n",
"\n",
" # Calculate gradients/slopes/derivative terms and update the parameters w and b, \n",
" # using JUST the name of the function 'gradient_function(X,y,w,b)' <- 'compute_gradient(X,y,w,b)'\n",
" dj_db,dj_dw = gradient_function(X, y, w, b) \n",
"\n",
" # Update Parameters wj and b simultaneously (one after another), \n",
" # using previous GRADIENT DESCENT equations at (5) above\n",
" w = w - alpha * dj_dw # updated_w = actual_w – learning_rate * dJ(wj,b) / dwj\n",
" b = b - alpha * dj_db # updated_b = actual_b – learning_rate * dJ(wj,b) / db\n",
" \n",
" # Save cost J(wj,b)^(i) and [wj],b parameters, at each i-th GD iteration\n",
" if i<100000: # Append Cost and parameters lists, until total iters < 100.000 (prevent resource exhaustion)\n",
" \n",
" J_history.append(cost_function(X, y, w, b)) # Calculate 'cost_function(X,y,w,b)' <- input 'compute_cost(X,y,w,b)'\n",
" # every GD loop iteration, then append each cost result per iter \n",
" # into 'J_history []' list\n",
"\n",
" # Python 'math.ceil()' method, round a number upward to its nearest integer\n",
" # Print 10 cost values -> step = round (total_iters/10). It begins from 0 until (total iters - step)\n",
" if i% math.ceil(num_iters / 10) == 0: # As total iters i[0-999], then last value 1.000 is not printed\n",
" \n",
" print(f\"Iteration {i:4d}: Cost {J_history[-1]:8.2f} \") # Print as f\"\" full string\n",
" # Iteration value has 4 numbers using {i:4}\n",
" # Select each whole Cost value with [-1] at 'J_history [] list',\n",
" # and display 8 characters in total, but 2 of them are decimals \n",
" # 'xxxxx.xx', using {J_history[-1]:8.2f}\n",
" \n",
" return w, b, J_history # return [wj_final vector], b_final scalar, [Cost J appended list] for graphing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the next cell you will test the implementation. "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Iteration 0: Cost 2529.46 \n",
"Iteration 100: Cost 695.99 \n",
"Iteration 200: Cost 694.92 \n",
"Iteration 300: Cost 693.86 \n",
"Iteration 400: Cost 692.81 \n",
"Iteration 500: Cost 691.77 \n",
"Iteration 600: Cost 690.73 \n",
"Iteration 700: Cost 689.71 \n",
"Iteration 800: Cost 688.70 \n",
"Iteration 900: Cost 687.69 \n",
"b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07] \n",
"prediction: 426.19, target value: 460\n",
"prediction: 286.17, target value: 232\n",
"prediction: 171.47, target value: 178\n"
]
}
],
"source": [
"# Initialize parameters\n",
"initial_w = np.zeros_like(w_init) # Create a numpy array with the same size of its input array w_init, but ALL zeros.\n",
" # w_init = [w0, w1, w2, w3] is a 1D array related to (4) Xj input features, \n",
" # so it has n = 4 elements in total (j = 0, 1, 2, 3).\n",
" # It means initial_w = [0, 0, 0, 0]\n",
"initial_b = 0. # initial_b = 0.\n",
"\n",
"# Some gradient descent settings\n",
"iterations = 1000 # Set total GD iterations = 1000\n",
"alpha = 5.0e-7 # learning_rate alpha = 0.0000005\n",
"\n",
"# Run gradient descent executing 'gradient_descent()' function, which receive inputs\n",
"# X_train, y_train, initial_w, initial_b, compute_cost (function name), \n",
"# compute_gradient (function name), learning_Rate, total_GD_iters\n",
"# outputs final updated 1D array w, final updated scalar b, [list of Cost values per GD iteration]\n",
"# print 10 GD iterations with the Cost value, each 100 steps\n",
"w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,\n",
" compute_cost, compute_gradient, \n",
" alpha, iterations)\n",
"\n",
"# Print out b final scalar with 2 decimals, doing {b_final:0.2f} -> x.xx\n",
"# Print out w final vector with 4 elements in total. Doesn't specify decimals used {w_final}\n",
"print(f\"b,w found by gradient descent: {b_final:0.2f},{w_final} \")\n",
"\n",
"# X_train = [m rows/samples x n cols/features] = (m = 3 rows, n = 4 cols) shape\n",
"# Select the total number of rows/samples/vectors\n",
"m,_ = X_train.shape # m = 3 rows/samples/vectors\n",
"\n",
"for i in range(m): # Iterate through each row/vector/training sample with i = 0, 1, 2\n",
" # for accessing to each X_train[i] vector, and obtain prediction \n",
" # y^ = X[i].w_final = X0w0 + X1w1 + X2w2 + X3w3 \n",
" # Also access to each y_train[i] target element\n",
" # Then print each of 3 predictions y^, one per each training sample/row (m=3)\n",
" print(f\"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Expected Result**: \n",
"b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07] \n",
"prediction: 426.19, target value: 460 \n",
"prediction: 286.17, target value: 232 \n",
"prediction: 171.47, target value: 178 "
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 864x288 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Plot COST J vs iters. \n",
"# Create one figure of size = (12,4) inches, and two subplots at positions (1 row, 1 col) and (1 row, 2 col)\n",
"# Also assign the two subplots to objects 'ax1' and 'ax2' \n",
"fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))\n",
"\n",
"ax1.plot(J_hist) # plot the first 'ax1' subplot object with y1 = J_hist list\n",
"\n",
"#print(len(J_hist))\n",
"#print(J_hist[100:])\n",
"x2 = 100 + np.arange(len(J_hist[100:])) # x2 = Create array [0-999] and add 100 iterations -> x = [100-1099]\n",
"y2 = J_hist[100:] # y2 = Add array with Cost data [from 696 to 686]\n",
"\n",
"ax2.plot(x2, y2) # plot the second 'ax2' subplot object with y2 = J_hist[100:] tail of the list \n",
"\n",
"ax1.set_title(\"Cost vs. iteration\"); ax2.set_title(\"Cost vs. iteration (tail)\") # Set ax1 and ax2 subplots, title names\n",
"ax1.set_ylabel('Cost') ; ax2.set_ylabel('Cost') # Set ax1 and ax2, y-label names\n",
"ax1.set_xlabel('iteration step') ; ax2.set_xlabel('iteration step') # Set ax1 and ax2, x-label names\n",
"plt.show() # Display 'fig' object, with ax1 and ax2 subplots"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*These results are not inspiring*! Cost is still declining and our predictions are not very accurate. The next lab will explore how to improve on this."
]
},
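{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to see why progress is slow is to evaluate equations (6) and (7) by hand at the starting point $\\mathbf{w} = \\mathbf{0}$, $b = 0$ used above, where every prediction is 0 and each error is $-y^{(i)}$. This is an illustrative calculation from the training table, not output produced by the lab:\n",
"\n",
"$$\\begin{align*}\n",
"\\frac{\\partial J}{\\partial w_0} &= -\\tfrac{1}{3}(460 \\cdot 2104 + 232 \\cdot 1416 + 178 \\cdot 852) \\approx -482669.3 \\\\\n",
"\\frac{\\partial J}{\\partial w_1} &= -\\tfrac{1}{3}(460 \\cdot 5 + 232 \\cdot 3 + 178 \\cdot 2) \\approx -1117.3 \\\\\n",
"\\frac{\\partial J}{\\partial w_2} &= -\\tfrac{1}{3}(460 \\cdot 1 + 232 \\cdot 2 + 178 \\cdot 1) \\approx -367.3 \\\\\n",
"\\frac{\\partial J}{\\partial w_3} &= -\\tfrac{1}{3}(460 \\cdot 45 + 232 \\cdot 40 + 178 \\cdot 35) = -12070 \\\\\n",
"\\frac{\\partial J}{\\partial b} &= -\\tfrac{1}{3}(460 + 232 + 178) = -290\n",
"\\end{align*}$$\n",
"\n",
"The gradient for the size feature is hundreds of times larger than the gradient for bedrooms or floors. With $\\alpha = 5.0 \\times 10^{-7}$, the first step changes $w_0$ by about $0.24$ but $w_1$ by only about $0.0006$, which may help explain why the cost is still declining after 1000 iterations. This is consistent with the earlier note that using sqft rather than 1000 sqft causes an issue."
]
},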
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<a name=\"toc_15456_6\"></a>\n",
"# 6 Congratulations!\n",
"In this lab you:\n",
"- Redeveloped the routines for linear regression, now with multiple variables.\n",
"- Utilized NumPy `np.dot` to vectorize the implementations"
]
}
],
"metadata": {
"dl_toc_settings": {
"rndtag": "15456"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"toc-autonumbering": false
},
"nbformat": 4,
"nbformat_minor": 5
}