---
bg: Essential-Math-for-Data-Science-Update/bridge.jpg
layout: post
mathjax: true
categories: posts
author: hadrienj
date: 2020-12-28
excerpt-image: <img src="../../assets/images/ch08_linear_equations/ch08_linear_equations_15_0.png" width=200><em>Each point corresponds to the combination of x and y values.</em>
twitterImg: Essential-Math-for-Data-Science-Update/output_ch06_139_0
essential-math-sample: true
---
https://gist.github.com/2f6855d657fe29c5a229714401032be9
As you saw in previous chapters of Essential Math for Data Science, being able to manipulate vectors and matrices is critical for creating machine learning and deep learning pipelines, for instance when reshaping your raw data before using it with machine learning libraries.
The goal of this chapter is to get you to the next level of understanding of vectors and matrices. You'll start seeing matrices not only as operations on numbers, but also as a way to transform vector spaces. This perspective will give you the foundations needed to understand more complex linear algebra concepts like matrix decomposition. You'll build on what you learned about vector addition and scalar multiplication to understand linear combinations of vectors. You will also encounter subspaces, spans, and linear dependence, which are major concepts of linear algebra used in machine learning and data science.
A linear transformation (or simply transformation, sometimes called linear map) is a mapping between two vector spaces: it takes a vector as input and transforms it into a new output vector. A function is said to be linear if the properties of additivity and scalar multiplication are preserved, that is, the same result is obtained if these operations are done before or after the transformation. Linear functions are synonymously called linear transformations.
Linear transformations notation
You can encounter the following notation to describe a linear transformation: *T(\boldsymbol{v})*. This refers to the vector v transformed by *T*. A transformation *T* is associated with a specific matrix. Since additivity and scalar multiplication must be preserved in linear transformation, you can write:
** T(\boldsymbol{v}+\boldsymbol{w}) = T(\boldsymbol{v}) + T(\boldsymbol{w}) **
and
** T(c\boldsymbol{v}) = cT(\boldsymbol{v}) **
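You can check these two properties numerically. The following is a minimal sketch assuming NumPy; the matrix and vectors are arbitrary examples, and applying a matrix with `@` is the matrix-vector product described below:

```python
import numpy as np

# arbitrary example matrix and vectors
T = np.array([[1.3, -2.4],
              [0.1, 2.0]])
v = np.array([2.0, -0.5])
w = np.array([1.0, 1.0])
c = 3.0

# additivity: T(v + w) == T(v) + T(w)
print(np.allclose(T @ (v + w), T @ v + T @ w))  # True

# scalar multiplication: T(c v) == c T(v)
print(np.allclose(T @ (c * v), c * (T @ v)))    # True
```

Any matrix satisfies both properties, which is why every matrix defines a linear transformation.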
Linear Transformations as Vectors and Matrices {#sec:ch06_section_linear_transformations_as_vectors_and_matrices}
In linear algebra, the information concerning a linear transformation can be represented as a matrix: every linear transformation can be expressed as a matrix.
When you perform the linear transformation associated with a matrix, you say that you apply the matrix to the vector. More concretely, it means that you calculate the matrix-vector product of the matrix and the vector. In this case, the matrix is sometimes called a transformation matrix. For instance, you can apply a matrix A to a vector v through their product \boldsymbol{A} \boldsymbol{v}.
Applying matrices
Keep in mind that, to apply a matrix to a vector, you *left multiply* the vector by the matrix: the matrix is on the left of the vector.
When you multiply multiple matrices, the corresponding linear transformations are combined in the order from right to left.
For instance, let's say that a matrix A does a 45-degree clockwise rotation and a matrix B does a stretching, the product \boldsymbol{B} \boldsymbol{A} means that you first do the rotation and then the stretching.
This shows that the matrix product is:
- Not commutative (\boldsymbol{A}\boldsymbol{B} \neq \boldsymbol{B}\boldsymbol{A}): the stretching then the rotation is a different transformation than the rotation then the stretching.
- Associative (\boldsymbol{A}(\boldsymbol{B}\boldsymbol{C}) = (\boldsymbol{A}\boldsymbol{B})\boldsymbol{C}): the transformations associated with the matrices A, B and C are applied in the same order regardless of the grouping.
A matrix-vector product can thus be considered as a way to transform a vector. You saw in Essential Math for Data Science that the shape of A and v must match for the product to be possible.
A good way to understand the relationship between matrices and linear transformations is to actually visualize these transformations. To do that, you'll use a grid of points in a two-dimensional space, each point corresponding to a vector (it is easier to visualize points instead of arrows pointing from the origin).
Let's start by creating the grid using the function `meshgrid()` from NumPy:
https://gist.github.com/0feb9e4f972af934c685a1ab26669b22
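The gist is embedded above; a minimal sketch of this step could look like the following (the exact ranges are assumptions, chosen to match the 20 by 20 grid used later in the chapter):

```python
import numpy as np

# 20 values per axis, from -10 to 9, giving a 20 by 20 grid of points
x = np.arange(-10, 10, 1)
y = np.arange(-10, 10, 1)
xx, yy = np.meshgrid(x, y)
print(xx.shape, yy.shape)  # (20, 20) (20, 20)
```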
The `meshgrid()` function allows you to create all combinations of points from the arrays `x` and `y`. Let's plot the scatter plot corresponding to `xx` and `yy`.
https://gist.github.com/82fc1256fc69517c3c26185cd27a12a1
Figure 1: Each point corresponds to the combination of x and y values.
You can see the grid in Figure 1. The color corresponds to the sum of the `xx` and `yy` values. This will make the transformations easier to visualize.
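To reproduce a figure like Figure 1, a minimal sketch could look like this (assuming Matplotlib; the colormap is an arbitrary choice, and the grid ranges are assumptions matching the 20 by 20 grid):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10, 10, 1)
y = np.arange(-10, 10, 1)
xx, yy = np.meshgrid(x, y)

# color each point by xx + yy so the transformations are easier to follow
plt.scatter(xx, yy, c=xx + yy, cmap="viridis")
plt.axis("equal")
plt.show()
```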
As a first example, let's visualize the transformation associated with the following two-dimensional square matrix.
** \boldsymbol{T} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} **
Consider that each point of the grid is a vector defined by two coordinates (x and y).
Let's create the transformation matrix T:
https://gist.github.com/a72079c34927a6f13c929ad3e594b5bc
First, you need to structure the points of the grid to be able to apply the matrix to each of them. For now, you have two 20 by 20 matrices (`xx` and `yy`) corresponding to 20 \cdot 20 = 400 points, each having an x value (matrix `xx`) and a y value (matrix `yy`). Let's create a 2 by 400 matrix with the flattened `xx` as the first row and the flattened `yy` as the second row.
https://gist.github.com/f0e643cf76614edd06a0ff6b8728a644
(2, 400)
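A sketch of this stacking step (assuming NumPy; the variable name `xy` matches the snippet used below):

```python
import numpy as np

x = np.arange(-10, 10, 1)
y = np.arange(-10, 10, 1)
xx, yy = np.meshgrid(x, y)

# one column per grid point: row 0 holds the x values, row 1 the y values
xy = np.vstack([xx.flatten(), yy.flatten()])
print(xy.shape)  # (2, 400)
```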
You now have 400 points, each with two coordinates. Let's apply the transformation matrix T to the first two-dimensional point (`xy[:, 0]`), for instance:
https://gist.github.com/0d4349aa7332257cb3216b4594bcdf89
array([10, 10])
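As a sketch (assuming NumPy and the grid built above), this single-point transformation looks like:

```python
import numpy as np

x = np.arange(-10, 10, 1)
y = np.arange(-10, 10, 1)
xx, yy = np.meshgrid(x, y)
xy = np.vstack([xx.flatten(), yy.flatten()])

T = np.array([[-1, 0],
              [0, -1]])

# the first grid point is (-10, -10); T maps it to (10, 10)
print(T @ xy[:, 0])  # [10 10]
```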
You can similarly apply T to each point by calculating its product with the matrix containing all points:
https://gist.github.com/f825ac9ce93fe1fe13c96e8dbf5ea1fa
(2, 400)
You can see that the shape is still (2, 400). Each transformed vector (that is, each point of the grid) is one of the columns of this new matrix. Now, let's reshape this array to get two arrays with the same shape as `xx` and `yy`.
https://gist.github.com/5ac81ec2cabeac2b2794d4aed96a31d0
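A sketch of the full transform-and-reshape step (assuming NumPy; the names `trans`, `xx_t` and `yy_t` are assumptions for illustration):

```python
import numpy as np

x = np.arange(-10, 10, 1)
y = np.arange(-10, 10, 1)
xx, yy = np.meshgrid(x, y)
xy = np.vstack([xx.flatten(), yy.flatten()])
T = np.array([[-1, 0],
              [0, -1]])

trans = T @ xy                      # (2, 400): every point transformed at once
xx_t = trans[0].reshape(xx.shape)   # transformed x coordinates, shape (20, 20)
yy_t = trans[1].reshape(yy.shape)   # transformed y coordinates, shape (20, 20)
print(xx_t.shape, yy_t.shape)  # (20, 20) (20, 20)
```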
Let's plot the grid before and after the transformation:
https://gist.github.com/2e6cd78b6bf8c2e0d53919ff0156ab82
Figure 2: The grid of points before (left) and after (right) its transformation by the matrix T.
Figure 2 shows that the matrix T rotated the points of the grid.
In the previous example, the output vectors have the same number of dimensions as the input vectors (two dimensions).
You might notice that the shape of the transformation matrix must match the shape of the vectors you want to transform.
Figure 3: Shape of the transformation of the grid points by T.
Figure 3 illustrates the shapes of this example. The first matrix, with a shape of (2, 2), is the transformation matrix T, and the second matrix, with a shape of (2, 400), corresponds to the 400 stacked vectors. As illustrated in blue, the number of rows of T corresponds to the number of dimensions of the output vectors. As illustrated in red, the transformation matrix must have the same number of columns as the number of dimensions of the vectors you want to transform.
More generally, the size of the transformation matrix tells you the input and output dimensions. A m by n transformation matrix transforms n-dimensional vectors to m-dimensional vectors.
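For instance (a hypothetical 3 by 2 matrix, not taken from the text), a non-square transformation matrix changes the dimensionality of its input:

```python
import numpy as np

# a 3 by 2 matrix maps two-dimensional vectors to three-dimensional vectors
T = np.array([[1, 0],
              [0, 1],
              [1, 1]])
v = np.array([2.0, -0.5])
print((T @ v).shape)  # (3,)
```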
Let's now visualize the transformation associated with the following matrix:
** \boldsymbol{T} = \begin{bmatrix} 1.3 & -2.4 \\ 0.1 & 2 \end{bmatrix} **
Let's proceed as in the previous example:
https://gist.github.com/e6e3b16bcd36738faee07696cb67b88c
Figure 4: The grid of points before (left) and after (right) the transformation by the new matrix T.
Figure 4 shows that the transformation is different from the previous rotation. This time, there is a rotation, but also a stretching of the space.
Are these transformations linear?
You might wonder why these transformations are called "linear". You saw that a linear transformation implies that the properties of additivity and scalar multiplication are preserved.
Geometrically, there is linearity if the vectors lying on the same line in the input space are also on the same line in the output space, and if the origin remains at the same location.
Transforming the space with a matrix can be reversed if the matrix is invertible. In this case, the inverse \boldsymbol{T}^{-1} of the matrix T is associated with a transformation that takes the space back to its initial state after T has been applied.
Let's take again the example of the transformation associated with the following matrix:
** \boldsymbol{T} = \begin{bmatrix} 1.3 & -2.4 \\ 0.1 & 2 \end{bmatrix} **
You'll plot the initial grid of points, the grid after being transformed by T, and the grid after successive applications of T and \boldsymbol{T}^{-1} (remember that matrices must be left-multiplied):
https://gist.github.com/6c5395f21ca2ec973f5729a5bb3f92ac
Figure 5: Inverse of a transformation: the initial space (left) is transformed with the matrix T (middle) and transformed back using \boldsymbol{T}^{-1} (right).
As you can see in Figure 5, the inverse \boldsymbol{T}^{-1} of the matrix T is associated with a transformation that reverses the one associated with T.
Mathematically, the transformation of a vector v by T is defined as:
** \boldsymbol{T} \boldsymbol{v} **
To transform it back, you multiply by the inverse of T:
** \boldsymbol{T}^{-1} \boldsymbol{T} \boldsymbol{v} **
Order of the matrix products
Note that the order of the products is from right to left. The vector on the right of the product is first transformed by T and then the result is transformed by *\boldsymbol{T}^{-1}*.
Since you saw in Essential Math for Data Science that \boldsymbol{T}^{-1} \boldsymbol{T} = \boldsymbol{I}, you have:
** \boldsymbol{T}^{-1} \boldsymbol{T} \boldsymbol{v} = \boldsymbol{I} \boldsymbol{v} = \boldsymbol{v} **
meaning that you get back the initial vector v.
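You can check this round trip numerically. A sketch assuming NumPy, using the matrix T from this section:

```python
import numpy as np

T = np.array([[1.3, -2.4],
              [0.1, 2.0]])
v = np.array([2.0, -0.5])

T_inv = np.linalg.inv(T)

# transform, then transform back: T^{-1} (T v) = v
print(np.allclose(T_inv @ (T @ v), v))  # True
```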
The linear transformation associated with a singular matrix (that is, a non-invertible matrix; see more details in Essential Math for Data Science) can't be reversed. This happens when the transformation loses information. Take the following matrix:
** \boldsymbol{T} = \begin{bmatrix} 3 & 6 \\ 2 & 4 \end{bmatrix} **
Let's see how it transforms the space:
https://gist.github.com/dcd32dfd62b52285620cf5be8a9e643f
Figure 6: The initial space (left) is transformed into a line (right) with the matrix T. Multiple input vectors land on the same location in the output space.
You can see in Figure 6 that the transformed vectors all lie on a line. Different points land in the same place after the transformation, so it is not possible to go back. In this case, the matrix T is not invertible: it is singular.
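You can verify numerically that this matrix is singular. A sketch assuming NumPy (because of floating-point arithmetic, the computed determinant is only approximately zero, so it is compared against a small tolerance):

```python
import numpy as np

T = np.array([[3, 6],
              [2, 4]])

# the second row is two-thirds of the first: the columns are linearly dependent
print(np.linalg.matrix_rank(T))       # 1 (instead of 2 for an invertible 2x2 matrix)
print(abs(np.linalg.det(T)) < 1e-10)  # True: the determinant is (numerically) zero
```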
Basis
The basis is a coordinate system used to describe vector spaces (sets of vectors). It is a reference that you use to associate numbers with geometric vectors. You'll for instance see in Essential Math for Data Science that the concept of basis is important to understand eigendecomposition.
To be considered as a basis, a set of vectors must:
- Be linearly independent.
- Span the space.
Every vector in the space is a unique combination of the basis vectors. The dimension of a space is defined as the size of a basis set. For instance, there are two basis vectors in ℝ^2 (corresponding to the x and y axes in the Cartesian plane), or three in ℝ^3.
As you saw in the last section, if the number of vectors in a set is larger than the number of dimensions of the space, they can't be linearly independent. If a set contains fewer vectors than the number of dimensions, they can't span the whole space.
As you saw, vectors can be represented as arrows going from the origin to a point in space. The coordinates of this point can be stored in a list. The geometric representation of a vector in the Cartesian plane implies that we take a reference: the directions given by the two axes x and y.
Basis vectors are the vectors corresponding to this reference. In the Cartesian plane, the basis vectors are orthogonal unit vectors (length of one), generally denoted as i and j.
Figure 7: The basis vectors in the Cartesian plane.
For instance, in Figure 7, the basis vectors i and j point in the direction of the axis x and y respectively. These vectors give the standard basis. If you put these basis vectors into a matrix, you have the following identity matrix (see Essential Math for Data Science):
** \boldsymbol{I}_2 = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} **
Thus, the columns of \boldsymbol{I}_2 span ℝ^2. In the same way, the columns of \boldsymbol{I}_3 span ℝ^3 and so on.
Orthogonal basis
Basis vectors are often chosen to be orthogonal because orthogonal vectors are linearly independent. However, the converse is not necessarily true: non-orthogonal vectors can be linearly independent and thus form a basis (but not a standard basis).
The basis of your vector space is very important because the values of the coordinates corresponding to the vectors depend on this basis. By the way, you can choose different basis vectors, like the ones in Figure 8 for instance.
Figure 8: Another set of basis vectors.
Keep in mind that vector coordinates depend on an implicit choice of basis vectors.
You can consider any vector in a vector space as a linear combination of the basis vectors.
For instance, take the following two-dimensional vector v:
** \boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix} **
Figure 9: Components of the vector v.
The components of the vector v are the projections on the x-axis and on the y-axis (v_x and v_y, as illustrated in Figure 9). The vector v corresponds to the sum of its components: \boldsymbol{v} = v_x + v_y, and you can obtain these components by scaling the basis vectors: v_x = 2 \boldsymbol{i} and v_y = -0.5 \boldsymbol{j}. Thus, the vector v shown in Figure 9 can be considered as a linear combination of the two basis vectors i and j:
** \begin{aligned} \boldsymbol{v} &= 2\boldsymbol{i} - 0.5\boldsymbol{j} \\ &= 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} - 0.5\begin{bmatrix} 0 \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} 2 \cdot 1 \\ 2 \cdot 0 \end{bmatrix} - \begin{bmatrix} 0.5 \cdot 0 \\ 0.5 \cdot 1 \end{bmatrix} \\ &= \begin{bmatrix} 2 \\ -0.5 \end{bmatrix} \end{aligned} **
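You can check this linear combination numerically (a sketch assuming NumPy):

```python
import numpy as np

# the standard basis vectors of the Cartesian plane
i = np.array([1, 0])
j = np.array([0, 1])

# v as the linear combination 2 i - 0.5 j
v = 2 * i - 0.5 * j
print(v)  # [ 2.  -0.5]
```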
The columns of identity matrices are not the only example of linearly independent column vectors. It is possible to find other sets of n linearly independent vectors in ℝ^n.
For instance, let's consider the following vectors in ℝ^2:
** \boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix} **
and
** \boldsymbol{w} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} **
The vectors v and w are represented in Figure 10.
Figure 10: Another basis in a two-dimensional space.
From the definition above, the vectors v and w form a basis because they are linearly independent (you can't obtain one of them by scaling the other) and they span the space (every vector of the space can be reached through linear combinations of these vectors).
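You can verify the linear independence of v and w numerically: stack them as the columns of a matrix and check that its determinant is nonzero (a sketch assuming NumPy):

```python
import numpy as np

# v and w as the columns of B
B = np.array([[2.0, 1.0],
              [-0.5, 1.0]])

# a nonzero determinant means the columns are linearly independent
print(np.linalg.det(B))  # 2.5 (nonzero, so {v, w} is a basis of R^2)
```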
It is critical to keep in mind that, when you use the components of vectors (for instance v_x and v_y, the x and y components of the vector v), the values are relative to the basis you chose. If you use another basis, these values will be different.
You'll see later that the ability to change the bases is fundamental in linear algebra and is key to understand eigendecomposition (Essential Math for Data Science) or Singular Value Decomposition (Essential Math for Data Science).
You saw that, to associate geometric vectors (arrows in the space) with coordinate vectors (arrays of numbers), you need a reference. This reference is the basis of your vector space. For this reason, a vector should always be defined with respect to a basis.
Let's take the following vector:
** \boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix} **
The values of the x and y components are respectively 2 and -0.5. The standard basis is used when not specified.
You could write \boldsymbol{I} \boldsymbol{v} to specify that these numbers correspond to coordinates with respect to the standard basis. In this case I is called the change of basis matrix.
** \boldsymbol{v} = \boldsymbol{I}\boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix} **
You can define vectors with respect to another basis by using a matrix other than I. You'll see more about change of basis in Essential Math for Data Science.
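As a sketch of the idea (assuming NumPy), you can find the coordinates of a vector with respect to a new basis by solving a linear system: if the new basis vectors are the columns of a matrix B, the coordinates c of v in that basis satisfy B c = v. Using the basis from Figure 10 (the vector v happens to be the first basis vector, so its new coordinates are simply [1, 0]):

```python
import numpy as np

v = np.array([2.0, -0.5])

# the new basis vectors as the columns of B
B = np.array([[2.0, 1.0],
              [-0.5, 1.0]])

# coordinates of v with respect to B: solve B c = v
c = np.linalg.solve(B, v)
print(c)  # [1. 0.]
```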