---
bg: Essential-Math-for-Data-Science-Update/bridge.jpg
layout: post
mathjax: true
categories: posts
tags: [essential-math, python, numpy]
author: hadrienj
date: 2020-12-28
excerpt: <img src="../../assets/images/ch08_linear_equations/ch08_linear_equations_15_0.png" width=200><em>Each point corresponds to the combination of x and y values.</em>
excerpt-image: Essential-Math-for-Data-Science-Update/output_ch06_139_0
essential-math-sample: true
---



# Span, Linear Dependency, and Space Transformation

As you saw earlier in Essential Math for Data Science, being able to manipulate vectors and matrices is critical for creating machine learning and deep learning pipelines, for instance when reshaping your raw data before using it with machine learning libraries.

The goal of this chapter is to get you to the next level of understanding of vectors and matrices. You'll start to see matrices not only as operations on numbers, but also as ways to transform vector spaces. This perspective will give you the foundations needed to understand more complex linear algebra concepts like matrix decomposition. You'll build on what you learned about vector addition and scalar multiplication to understand linear combinations of vectors. You'll also see subspaces, span, and linear dependency, which are major concepts of linear algebra used in machine learning and data science.

## Linear Transformations {#sec:ch06_section_linear_transformations}

### Intuition

A linear transformation (or simply transformation, sometimes called linear map) is a mapping between two vector spaces: it takes a vector as input and transforms it into a new output vector. A function is said to be linear if the properties of additivity and scalar multiplication are preserved, that is, if you get the same result whether you apply these operations before or after the transformation. Linear functions and linear transformations are synonyms.

#### Linear transformations notation

You can encounter the following notation to describe a linear transformation: $T(\boldsymbol{v})$. This refers to the vector $\boldsymbol{v}$ transformed by $T$. A transformation $T$ is associated with a specific matrix. Since additivity and scalar multiplication must be preserved by a linear transformation, you can write:

$$T(\boldsymbol{v}+\boldsymbol{w}) = T(\boldsymbol{v}) + T(\boldsymbol{w})$$

and

$$T(c\boldsymbol{v}) = cT(\boldsymbol{v})$$
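You can check these two properties numerically. Here is a minimal sketch with NumPy (the matrix and vectors are arbitrary examples):

```python
import numpy as np

T = np.array([[1.3, -2.4],
              [0.1, 2]])   # an arbitrary transformation matrix
v = np.array([2, -0.5])
w = np.array([1, 1])
c = 3

# Additivity: T(v + w) == T(v) + T(w)
print(np.allclose(T @ (v + w), T @ v + T @ w))  # True

# Scalar multiplication: T(cv) == cT(v)
print(np.allclose(T @ (c * v), c * (T @ v)))    # True
```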

## Linear Transformations as Vectors and Matrices {#sec:ch06_section_linear_transformations_as_vectors_and_matrices}

In linear algebra, the information concerning a linear transformation can be represented as a matrix: every linear transformation can be expressed as a matrix.

When you carry out the linear transformation associated with a matrix, you say that you apply the matrix to the vector. More concretely, it means that you calculate the matrix-vector product of the matrix and the vector. In this context, the matrix is sometimes called a transformation matrix. For instance, you can apply a matrix $\boldsymbol{A}$ to a vector $\boldsymbol{v}$ with the product $\boldsymbol{A} \boldsymbol{v}$.

#### Applying matrices

Keep in mind that, to apply a matrix to a vector, you *left-multiply* the vector by the matrix: the matrix is on the left of the vector.

When you multiply multiple matrices, the corresponding linear transformations are combined in the order from right to left.

For instance, let's say that a matrix $\boldsymbol{A}$ does a 45-degree clockwise rotation and a matrix $\boldsymbol{B}$ does a stretching; the product $\boldsymbol{B} \boldsymbol{A}$ means that you first do the rotation and then the stretching.

This shows that the matrix product is:

- Not commutative ($\boldsymbol{A}\boldsymbol{B} \neq \boldsymbol{B}\boldsymbol{A}$): the stretching then the rotation is a different transformation than the rotation then the stretching (see the sketch after this list).
- Associative ($\boldsymbol{A}(\boldsymbol{B}\boldsymbol{C}) = (\boldsymbol{A}\boldsymbol{B})\boldsymbol{C}$): the grouping doesn't matter because the transformations associated with $\boldsymbol{A}$, $\boldsymbol{B}$, and $\boldsymbol{C}$ are still applied in the same order.
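Here is a quick numerical illustration of the non-commutativity, with a 45-degree clockwise rotation and an axis-aligned stretching (both matrices are examples):

```python
import numpy as np

theta = np.deg2rad(-45)  # negative angle: clockwise rotation
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])  # rotation matrix
B = np.diag([2, 0.5])                           # stretching matrix

print(np.allclose(A @ B, B @ A))  # False: the order matters
```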

A matrix-vector product can thus be considered as a way to transform a vector. You saw in Essential Math for Data Science that the shapes of $\boldsymbol{A}$ and $\boldsymbol{v}$ must match for the product to be possible.

## Geometric Interpretation {#sec:ch06_section_geometric_interpretation}

A good way to understand the relationship between matrices and linear transformations is to visualize these transformations. To do that, you'll use a grid of points in a two-dimensional space, each point corresponding to a vector (it is easier to visualize points than arrows pointing from the origin).

Let's start by creating the grid using the function `meshgrid()` from NumPy:

https://gist.github.com/0feb9e4f972af934c685a1ab26669b22
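A minimal sketch of this step (the exact ranges are assumptions, chosen to match the outputs shown below):

```python
import numpy as np

x = np.arange(-10, 10)  # 20 values: -10, -9, ..., 9
y = np.arange(-10, 10)
xx, yy = np.meshgrid(x, y)  # two 20 by 20 coordinate matrices
```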

The `meshgrid()` function allows you to create all combinations of points from the arrays `x` and `y`. Let's plot the scatter plot corresponding to `xx` and `yy`.

https://gist.github.com/82fc1256fc69517c3c26185cd27a12a1
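Something along these lines produces this kind of plot (the colormap and marker size are assumptions):

```python
import matplotlib.pyplot as plt

# Color each point by the sum of its coordinates to make
# the transformations easier to follow visually
plt.scatter(xx.flatten(), yy.flatten(), c=(xx + yy).flatten(), s=20)
plt.show()
```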

Figure 1: Each point corresponds to the combination of x and y values.

You can see the grid in Figure 1. The color corresponds to the sum of the `xx` and `yy` values. This will make the transformations easier to visualize.

### The Linear Transformation associated with a Matrix

As a first example, let's visualize the transformation associated with the following two-dimensional square matrix.

$$\boldsymbol{T} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$

Consider that each point of the grid is a vector defined by two coordinates (x and y).

Let's create the transformation matrix T:

https://gist.github.com/a72079c34927a6f13c929ad3e594b5bc
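The snippet is presumably just:

```python
T = np.array([[-1, 0],
              [0, -1]])
```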

First, you need to structure the points of the grid to be able to apply the matrix to each of them. For now, you have two 20 by 20 matrices (`xx` and `yy`) corresponding to $20 \cdot 20 = 400$ points, each having an x value (matrix `xx`) and a y value (matrix `yy`). Let's create a 2 by 400 matrix with the flattened `xx` as its first row and the flattened `yy` as its second row.

https://gist.github.com/f0e643cf76614edd06a0ff6b8728a644

(2, 400)
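A sketch of this step:

```python
xy = np.vstack([xx.flatten(), yy.flatten()])
print(xy.shape)  # (2, 400)
```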

You now have 400 points, each with two coordinates. Let's apply the transformation matrix `T` to the first two-dimensional point (`xy[:, 0]`), for instance:

https://gist.github.com/0d4349aa7332257cb3216b4594bcdf89

array([10, 10])
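This is a single matrix-vector product:

```python
print(T @ xy[:, 0])  # [10 10]: the point (-10, -10) lands on (10, 10)
```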

You can similarly apply `T` to every point by calculating its product with the matrix containing all points:

https://gist.github.com/f825ac9ce93fe1fe13c96e8dbf5ea1fa

(2, 400)
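For instance (the variable name `trans` is an assumption):

```python
trans = T @ xy
print(trans.shape)  # (2, 400)
```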

You can see that the shape is still `(2, 400)`. Each transformed vector (that is, each point of the grid) is one of the columns of this new matrix. Now, let's reshape this array to get two arrays with the same shape as `xx` and `yy`.

https://gist.github.com/5ac81ec2cabeac2b2794d4aed96a31d0
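A sketch of the reshaping (the variable names are assumptions):

```python
xx_transformed = trans[0].reshape(xx.shape)
yy_transformed = trans[1].reshape(yy.shape)
```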

Let's plot the grid before and after the transformation:

https://gist.github.com/2e6cd78b6bf8c2e0d53919ff0156ab82
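A side-by-side plot along these lines would do it (styling details are assumptions):

```python
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].scatter(xx.flatten(), yy.flatten(), c=(xx + yy).flatten(), s=20)
axes[0].set_title("Before transformation")
axes[1].scatter(xx_transformed.flatten(), yy_transformed.flatten(),
                c=(xx + yy).flatten(), s=20)  # keep the original colors
axes[1].set_title("After transformation")
plt.show()
```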

Figure 2: The grid of points before (left) and after (right) its transformation by the matrix $\boldsymbol{T}$.

Figure 2 shows that the matrix $\boldsymbol{T}$ rotated the points of the grid by 180 degrees around the origin.

## Shapes of the Input and Output Vectors {#sec:ch08_section_shapes_of_the_input_and_output_vectors}

In the previous example, the output vectors have the same number of dimensions as the input vectors (two dimensions).

You might notice that the shape of the transformation matrix must match the shape of the vectors you want to transform.

Figure 3: Shape of the transformation of the grid points by $\boldsymbol{T}$.

Figure 3 illustrates the shapes of this example. The first matrix, with shape (2, 2), is the transformation matrix $\boldsymbol{T}$, and the second matrix, with shape (2, 400), corresponds to the 400 stacked vectors. As illustrated in blue, the number of rows of $\boldsymbol{T}$ corresponds to the number of dimensions of the output vectors. As illustrated in red, the number of columns of $\boldsymbol{T}$ must equal the number of dimensions of the vectors you want to transform.

More generally, the size of the transformation matrix tells you the input and output dimensions: an $m$ by $n$ transformation matrix transforms $n$-dimensional vectors into $m$-dimensional vectors.
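For instance, a 3 by 2 matrix takes two-dimensional vectors and outputs three-dimensional vectors (the values here are arbitrary):

```python
M = np.array([[1, 2],
              [3, 4],
              [5, 6]])    # shape (3, 2)
v = np.array([1, 1])      # 2-dimensional input vector
print((M @ v).shape)      # (3,): 3-dimensional output vector
```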

### Stretching and Rotation

Let's now visualize the transformation associated with the following matrix:

$$\boldsymbol{T} = \begin{bmatrix} 1.3 & -2.4 \\ 0.1 & 2 \end{bmatrix}$$

Let's proceed as in the previous example:

https://gist.github.com/e6e3b16bcd36738faee07696cb67b88c
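The same pipeline as before, with the new matrix (a sketch; `xy` is the 2 by 400 matrix of grid points created earlier):

```python
T = np.array([[1.3, -2.4],
              [0.1, 2]])
trans = T @ xy
xx_transformed = trans[0].reshape(xx.shape)
yy_transformed = trans[1].reshape(yy.shape)
# ...then plot the grids before and after, as above
```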

Figure 4: The grid of points before (left) and after (right) the transformation by the new matrix $\boldsymbol{T}$.

Figure 4 shows that the transformation is different from the previous rotation. This time, there is a rotation, but also a stretching of the space.

#### Are these transformations linear?

You might wonder why these transformations are called "linear". You saw that a linear transformation implies that the properties of additivity and scalar multiplication are preserved.

Geometrically, a transformation is linear if vectors lying on the same line in the input space also lie on a common line in the output space, and if the origin remains at the same location.

### Special Cases

#### Inverse Matrices

Transforming the space with a matrix can be reversed if the matrix is invertible. In this case, the inverse $\boldsymbol{T}^{-1}$ of the matrix $\boldsymbol{T}$ is associated with a transformation that takes the space back to its initial state after $\boldsymbol{T}$ has been applied.

Let's take again the example of the transformation associated with the following matrix:

$$\boldsymbol{T} = \begin{bmatrix} 1.3 & -2.4 \\ 0.1 & 2 \end{bmatrix}$$

You'll plot the initial grid of points, the grid after being transformed by $\boldsymbol{T}$, and the grid after the successive application of $\boldsymbol{T}$ and $\boldsymbol{T}^{-1}$ (remember that matrices must be left-multiplied):

https://gist.github.com/6c5395f21ca2ec973f5729a5bb3f92ac
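In code, the round trip might look like this (a sketch; `xy` still holds the grid points):

```python
T = np.array([[1.3, -2.4],
              [0.1, 2]])
T_inv = np.linalg.inv(T)

trans = T @ xy           # transform the grid
back = T_inv @ trans     # transform it back
print(np.allclose(back, xy))  # True: the original grid is recovered
```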

Figure 5: Inverse of a transformation: the initial space (left) is transformed with the matrix $\boldsymbol{T}$ (middle) and transformed back using $\boldsymbol{T}^{-1}$ (right).

As you can see in Figure 5, the inverse $\boldsymbol{T}^{-1}$ of the matrix $\boldsymbol{T}$ is associated with a transformation that reverses the one associated with $\boldsymbol{T}$.

Mathematically, the transformation of a vector $\boldsymbol{v}$ by $\boldsymbol{T}$ is defined as:

$$\boldsymbol{T} \boldsymbol{v}$$

To transform it back, you multiply by the inverse of $\boldsymbol{T}$:

$$\boldsymbol{T}^{-1} \boldsymbol{T} \boldsymbol{v}$$

#### Order of the matrix products

Note that the order of the products is read from right to left: the vector on the right of the product is first transformed by $\boldsymbol{T}$, and the result is then transformed by $\boldsymbol{T}^{-1}$.

Since you saw in Essential Math for Data Science that $\boldsymbol{T}^{-1} \boldsymbol{T} = \boldsymbol{I}$, you have:

$$\boldsymbol{T}^{-1} \boldsymbol{T} \boldsymbol{v} = \boldsymbol{I} \boldsymbol{v} = \boldsymbol{v}$$

meaning that you get back the initial vector $\boldsymbol{v}$.

#### Non-Invertible Matrices

The linear transformation associated with a singular matrix (that is, a non-invertible matrix; see more details in Essential Math for Data Science) can't be reversed. This happens when the transformation loses information. Take the following matrix:

$$\boldsymbol{T} = \begin{bmatrix} 3 & 6 \\ 2 & 4 \end{bmatrix}$$

Let's see how it transforms the space:

https://gist.github.com/dcd32dfd62b52285620cf5be8a9e643f

Figure 6: The initial space (left) is transformed into a line (right) with the matrix $\boldsymbol{T}$. Multiple input vectors land on the same location in the output space.

You can see in Figure 6 that the transformed vectors lie on a line. Some points land in the same place after the transformation, so it is not possible to go back. In this case, the matrix $\boldsymbol{T}$ is not invertible: it is singular.
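You can confirm numerically that this matrix is singular: its determinant is zero, and trying to invert it raises an error.

```python
T = np.array([[3, 6],
              [2, 4]])
print(np.linalg.det(T))  # 0.0 (up to floating-point precision)

try:
    np.linalg.inv(T)
except np.linalg.LinAlgError as error:
    print(error)  # Singular matrix
```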


## Basis

### Definitions

A basis is a coordinate system used to describe vector spaces (sets of vectors). It is a reference that you use to associate numbers with geometric vectors. You'll see in Essential Math for Data Science, for instance, that the concept of basis is important for understanding eigendecomposition.

To be considered a basis, a set of vectors must:

- Be linearly independent.
- Span the space.

Every vector in the space is a unique combination of the basis vectors. The dimension of a space is defined as the size of a basis set. For instance, there are two basis vectors in $\mathbb{R}^2$ (corresponding to the x and y axes in the Cartesian plane), or three in $\mathbb{R}^3$.

As you saw in the last section, if the number of vectors in a set is larger than the dimension of the space, they can't be linearly independent. If a set contains fewer vectors than the number of dimensions, these vectors can't span the whole space.

As you saw, vectors can be represented as arrows going from the origin to a point in space. The coordinates of this point can be stored in a list. The geometric representation of a vector in the Cartesian plane implies that we take a reference: the directions given by the two axes x and y.

Basis vectors are the vectors corresponding to this reference. In the Cartesian plane, the basis vectors are orthogonal unit vectors (length of one), generally denoted as $\boldsymbol{i}$ and $\boldsymbol{j}$.

Figure 7: The basis vectors in the Cartesian plane.

For instance, in Figure 7, the basis vectors $\boldsymbol{i}$ and $\boldsymbol{j}$ point in the directions of the x-axis and the y-axis, respectively. These vectors give the standard basis. If you put these basis vectors into a matrix, you get the following identity matrix (see Essential Math for Data Science):

$$\boldsymbol{I}_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Thus, the columns of $\boldsymbol{I}_2$ span $\mathbb{R}^2$. In the same way, the columns of $\boldsymbol{I}_3$ span $\mathbb{R}^3$, and so on.

#### Orthogonal basis

Basis vectors can be orthogonal because orthogonal vectors are independent. However, the converse is not necessarily true: non-orthogonal vectors can be linearly independent and thus form a basis (but not a standard basis).

The basis of your vector space is very important because the values of a vector's coordinates depend on this basis. By the way, you can choose different basis vectors, like the ones in Figure 8 for instance.

Figure 8: Another set of basis vectors.

Keep in mind that vector coordinates depend on an implicit choice of basis vectors.

### Linear Combination of Basis Vectors

You can consider any vector in a vector space as a linear combination of the basis vectors.

For instance, take the following two-dimensional vector $\boldsymbol{v}$:

$$\boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix}$$

Figure 9: Components of the vector $\boldsymbol{v}$.

The components of the vector $\boldsymbol{v}$ are its projections on the x-axis and on the y-axis ($v_x$ and $v_y$, as illustrated in Figure 9). The vector $\boldsymbol{v}$ corresponds to the sum of its components: $\boldsymbol{v} = v_x + v_y$, and you can obtain these components by scaling the basis vectors: $v_x = 2\boldsymbol{i}$ and $v_y = -0.5\boldsymbol{j}$. Thus, the vector $\boldsymbol{v}$ shown in Figure 9 can be considered as a linear combination of the two basis vectors $\boldsymbol{i}$ and $\boldsymbol{j}$:

$$
\begin{aligned}
\boldsymbol{v} &= 2\boldsymbol{i} - 0.5\boldsymbol{j} \\
&= 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} - 0.5\begin{bmatrix} 0 \\ 1 \end{bmatrix} \\
&= \begin{bmatrix} 2 \cdot 1 \\ 2 \cdot 0 \end{bmatrix} - \begin{bmatrix} 0.5 \cdot 0 \\ 0.5 \cdot 1 \end{bmatrix} \\
&= \begin{bmatrix} 2 \\ -0.5 \end{bmatrix}
\end{aligned}
$$
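You can verify this decomposition numerically:

```python
import numpy as np

i = np.array([1, 0])
j = np.array([0, 1])
print(2 * i - 0.5 * j)  # [ 2.  -0.5]
```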

### Other Bases

The columns of identity matrices are not the only examples of linearly independent column vectors. It is possible to find other sets of $n$ linearly independent vectors in $\mathbb{R}^n$.

For instance, let's consider the following vectors in $\mathbb{R}^2$:

$$\boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix}$$

and

$$\boldsymbol{w} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

The vectors $\boldsymbol{v}$ and $\boldsymbol{w}$ are represented in Figure 10.

Figure 10: Another basis in a two-dimensional space.

From the definition above, the vectors $\boldsymbol{v}$ and $\boldsymbol{w}$ form a basis because they are linearly independent (you can't obtain one of them from a combination of the other) and they span the space (the whole space can be reached from linear combinations of these vectors).
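One way to check the linear independence numerically is to compute the determinant of the matrix with $\boldsymbol{v}$ and $\boldsymbol{w}$ as columns; a nonzero determinant means the vectors are independent:

```python
import numpy as np

v = np.array([2, -0.5])
w = np.array([1, 1])
B = np.column_stack([v, w])
print(np.linalg.det(B))  # 2.5: nonzero, so v and w are linearly independent
```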

It is critical to keep in mind that, when you use the components of vectors (for instance $v_x$ and $v_y$, the x and y components of the vector $\boldsymbol{v}$), the values are relative to the basis you chose. If you use another basis, these values will be different.

You'll see later that the ability to change bases is fundamental in linear algebra and is key to understanding eigendecomposition (Essential Math for Data Science) or the Singular Value Decomposition (Essential Math for Data Science).

### Vectors Are Defined With Respect to a Basis

You saw that, to associate geometric vectors (arrows in the space) with coordinate vectors (arrays of numbers), you need a reference. This reference is the basis of your vector space. For this reason, a vector should always be defined with respect to a basis.

Let's take the following vector:

$$\boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix}$$

The values of the x and y components are 2 and -0.5, respectively. When the basis is not specified, the standard basis is used.

You could write $\boldsymbol{I} \boldsymbol{v}$ to specify that these numbers correspond to coordinates with respect to the standard basis. In this case, $\boldsymbol{I}$ is called the change of basis matrix.

$$\boldsymbol{v} = \boldsymbol{I}\boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix}$$

You can define vectors with respect to another basis by using a matrix other than $\boldsymbol{I}$. You'll see more about change of basis in Essential Math for Data Science.
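As a small preview, here is a sketch of how you could compute the coordinates of a vector with respect to the basis formed by $\boldsymbol{v}$ and $\boldsymbol{w}$ from the previous section (the target vector is an arbitrary example):

```python
import numpy as np

B = np.column_stack([[2, -0.5], [1, 1]])  # basis vectors v and w as columns
target = np.array([1, 2])  # a vector expressed in the standard basis

# Solve B @ coords = target to get the coordinates in the new basis
coords = np.linalg.solve(B, target)
print(np.allclose(B @ coords, target))  # True
```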

...