---
bg: Essential-Math-for-Data-Science-Update/bridge.jpg
layout: post
mathjax: true
categories: posts
author: hadrienj
date: 2020-12-28
excerpt-image: <img src="../../assets/images/ch08_linear_equations/ch08_linear_equations_15_0.png" width=200><em>Each point corresponds to the combination of x and y values.</em>
twitterImg: Essential-Math-for-Data-Science-Update/output_ch06_139_0
essential-math-sample: true
---
https://gist.github.com/2f6855d657fe29c5a229714401032be9
As you saw in previous chapters of Essential Math for Data Science, being able to manipulate vectors and matrices is critical for creating machine learning and deep learning pipelines, for instance when reshaping your raw data before using it with machine learning libraries.
The goal of this chapter is to get you to the next level of understanding of vectors and matrices. You'll start seeing matrices not only as operations on numbers, but also as a way to transform vector spaces. This perspective will give you the foundations needed to understand more complex linear algebra concepts like matrix decomposition. You'll build on what you learned about vector addition and scalar multiplication to understand linear combinations of vectors. You will also encounter subspaces, spans, and linear dependence, which are major concepts of linear algebra used in machine learning and data science.
A linear transformation (or simply transformation, sometimes called linear map) is a mapping between two vector spaces: it takes a vector as input and transforms it into a new output vector. A function is said to be linear if the properties of additivity and scalar multiplication are preserved, that is, the same result is obtained if these operations are done before or after the transformation. Linear functions are synonymously called linear transformations.
Linear transformations notation
You can encounter the following notation to describe a linear transformation: *T(\boldsymbol{v})*. This refers to the vector v transformed by *T*. A transformation *T* is associated with a specific matrix. Since additivity and scalar multiplication must be preserved in linear transformation, you can write:
** T(\boldsymbol{v}+\boldsymbol{w}) = T(\boldsymbol{v}) + T(\boldsymbol{w}) **
and
** T(c\boldsymbol{v}) = cT(\boldsymbol{v}) **
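You can check these two properties numerically. The following is a minimal sketch assuming NumPy; the matrix and vectors are arbitrary examples, and applying a matrix with `@` is the matrix-vector product described below:

```python
import numpy as np

# arbitrary example matrix and vectors
T = np.array([[1.3, -2.4],
              [0.1, 2.0]])
v = np.array([2.0, -0.5])
w = np.array([1.0, 1.0])
c = 3.0

# additivity: T(v + w) == T(v) + T(w)
print(np.allclose(T @ (v + w), T @ v + T @ w))  # True

# scalar multiplication: T(c v) == c T(v)
print(np.allclose(T @ (c * v), c * (T @ v)))    # True
```

Any matrix satisfies both properties, which is why every matrix defines a linear transformation.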
Linear Transformations as Vectors and Matrices {#sec:ch06_section_linear_transformations_as_vectors_and_matrices}
In linear algebra, the information concerning a linear transformation can be represented as a matrix: every linear transformation can be expressed as a matrix.
When you perform the linear transformation associated with a matrix, you say that you apply the matrix to the vector. More concretely, it means that you calculate the matrix-vector product of the matrix and the vector. In this case, the matrix is sometimes called a transformation matrix. For instance, you can apply a matrix A to a vector v through their product \boldsymbol{A} \boldsymbol{v}.
Applying matrices
Keep in mind that, to apply a matrix to a vector, you *left multiply* the vector by the matrix: the matrix is on the left of the vector.
When you multiply multiple matrices, the corresponding linear transformations are combined in the order from right to left.
For instance, let's say that a matrix A does a 45-degree clockwise rotation and a matrix B does a stretching, the product \boldsymbol{B} \boldsymbol{A} means that you first do the rotation and then the stretching.
This shows that the matrix product is:
- Not commutative (\boldsymbol{A}\boldsymbol{B} \neq \boldsymbol{B}\boldsymbol{A}): the stretching then the rotation is a different transformation than the rotation then the stretching.
- Associative (\boldsymbol{A}(\boldsymbol{B}\boldsymbol{C}) = (\boldsymbol{A}\boldsymbol{B})\boldsymbol{C}): the transformations associated with the matrices A, B and C are applied in the same order regardless of the grouping.
A matrix-vector product can thus be considered as a way to transform a vector. You saw in Essential Math for Data Science that the shape of A and v must match for the product to be possible.
A good way to understand the relationship between matrices and linear transformations is to actually visualize these transformations. To do that, you'll use a grid of points in a two-dimensional space, each point corresponding to a vector (it is easier to visualize points instead of arrows pointing from the origin).
Let's start by creating the grid using the function `meshgrid()` from NumPy:
https://gist.github.com/0feb9e4f972af934c685a1ab26669b22
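The gist is embedded above; a minimal sketch of this step could look like the following (the exact ranges are assumptions, chosen to match the 20 by 20 grid used later in the chapter):

```python
import numpy as np

# 20 values per axis, from -10 to 9, giving a 20 by 20 grid of points
x = np.arange(-10, 10, 1)
y = np.arange(-10, 10, 1)
xx, yy = np.meshgrid(x, y)
print(xx.shape, yy.shape)  # (20, 20) (20, 20)
```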
The `meshgrid()` function allows you to create all combinations of points from the arrays `x` and `y`. Let's plot the scatter plot corresponding to `xx` and `yy`.
https://gist.github.com/82fc1256fc69517c3c26185cd27a12a1
Figure 1: Each point corresponds to the combination of x and y values.
You can see the grid in Figure 1. The color corresponds to the sum of the `xx` and `yy` values. This will make the transformations easier to visualize.
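To reproduce a figure like Figure 1, a minimal sketch could look like this (assuming Matplotlib; the colormap is an arbitrary choice, and the grid ranges are assumptions matching the 20 by 20 grid):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10, 10, 1)
y = np.arange(-10, 10, 1)
xx, yy = np.meshgrid(x, y)

# color each point by xx + yy so the transformations are easier to follow
plt.scatter(xx, yy, c=xx + yy, cmap="viridis")
plt.axis("equal")
plt.show()
```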
As a first example, let's visualize the transformation associated with the following two-dimensional square matrix.
** \boldsymbol{T} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} **
Consider that each point of the grid is a vector defined by two coordinates (x and y).
Let's create the transformation matrix T:
https://gist.github.com/a72079c34927a6f13c929ad3e594b5bc
First, you need to structure the points of the grid to be able to apply the matrix to each of them. For now, you have two 20 by 20 matrices (`xx` and `yy`) corresponding to 20 \cdot 20 = 400 points, each having an x value (matrix `xx`) and a y value (matrix `yy`). Let's create a 2 by 400 matrix with the flattened `xx` as the first row and the flattened `yy` as the second row.
https://gist.github.com/f0e643cf76614edd06a0ff6b8728a644
(2, 400)
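A sketch of this stacking step (assuming NumPy; the variable name `xy` matches the snippet used below):

```python
import numpy as np

x = np.arange(-10, 10, 1)
y = np.arange(-10, 10, 1)
xx, yy = np.meshgrid(x, y)

# one column per grid point: row 0 holds the x values, row 1 the y values
xy = np.vstack([xx.flatten(), yy.flatten()])
print(xy.shape)  # (2, 400)
```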
You now have 400 points, each with two coordinates. Let's apply the transformation matrix T to the first two-dimensional point (`xy[:, 0]`), for instance:
https://gist.github.com/0d4349aa7332257cb3216b4594bcdf89
array([10, 10])
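As a sketch (assuming NumPy and the grid built above), this single-point transformation looks like:

```python
import numpy as np

x = np.arange(-10, 10, 1)
y = np.arange(-10, 10, 1)
xx, yy = np.meshgrid(x, y)
xy = np.vstack([xx.flatten(), yy.flatten()])

T = np.array([[-1, 0],
              [0, -1]])

# the first grid point is (-10, -10); T maps it to (10, 10)
print(T @ xy[:, 0])  # [10 10]
```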
You can similarly apply T to each point by calculating its product with the matrix containing all points:
https://gist.github.com/f825ac9ce93fe1fe13c96e8dbf5ea1fa
(2, 400)
You can see that the shape is still (2, 400). Each transformed vector (that is, each point of the grid) is one of the columns of this new matrix. Now, let's reshape this array to get two arrays with the same shape as `xx` and `yy`.
https://gist.github.com/5ac81ec2cabeac2b2794d4aed96a31d0
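A sketch of the full transform-and-reshape step (assuming NumPy; the names `trans`, `xx_t` and `yy_t` are assumptions for illustration):

```python
import numpy as np

x = np.arange(-10, 10, 1)
y = np.arange(-10, 10, 1)
xx, yy = np.meshgrid(x, y)
xy = np.vstack([xx.flatten(), yy.flatten()])
T = np.array([[-1, 0],
              [0, -1]])

trans = T @ xy                      # (2, 400): every point transformed at once
xx_t = trans[0].reshape(xx.shape)   # transformed x coordinates, shape (20, 20)
yy_t = trans[1].reshape(yy.shape)   # transformed y coordinates, shape (20, 20)
print(xx_t.shape, yy_t.shape)  # (20, 20) (20, 20)
```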
Let's plot the grid before and after the transformation:
https://gist.github.com/2e6cd78b6bf8c2e0d53919ff0156ab82
Figure 2: The grid of points before (left) and after (right) its transformation by the matrix T.
Figure 2 shows that the matrix T rotated the points of the grid.
In the previous example, the output vectors have the same number of dimensions as the input vectors (two dimensions).
You might notice that the shape of the transformation matrix must match the shape of the vectors you want to transform.
Figure 3: Shape of the transformation of the grid points by T.
Figure 3 illustrates the shapes of this example. The first matrix, with a shape of (2, 2), is the transformation matrix T, and the second matrix, with a shape of (2, 400), corresponds to the 400 stacked vectors. As illustrated in blue, the number of rows of T corresponds to the number of dimensions of the output vectors. As illustrated in red, the transformation matrix must have the same number of columns as the number of dimensions of the vectors you want to transform.
More generally, the size of the transformation matrix tells you the input and output dimensions. A m by n transformation matrix transforms n-dimensional vectors to m-dimensional vectors.
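For instance (a hypothetical 3 by 2 matrix, not taken from the text), a non-square transformation matrix changes the dimensionality of its input:

```python
import numpy as np

# a 3 by 2 matrix maps two-dimensional vectors to three-dimensional vectors
T = np.array([[1, 0],
              [0, 1],
              [1, 1]])
v = np.array([2.0, -0.5])
print((T @ v).shape)  # (3,)
```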
Let's now visualize the transformation associated with the following matrix:
** \boldsymbol{T} = \begin{bmatrix} 1.3 & -2.4 \\ 0.1 & 2 \end{bmatrix} **
Let's proceed as in the previous example:
https://gist.github.com/e6e3b16bcd36738faee07696cb67b88c
Figure 4: The grid of points before (left) and after (right) the transformation by the new matrix T.
Figure 4 shows that the transformation is different from the previous rotation. This time, there is a rotation, but also a stretching of the space.
Are these transformations linear?
You might wonder why these transformations are called "linear". You saw that a linear transformation implies that the properties of additivity and scalar multiplication are preserved.
Geometrically, there is linearity if the vectors lying on the same line in the input space are also on the same line in the output space, and if the origin remains at the same location.
Transforming the space with a matrix can be reversed if the matrix is invertible. In this case, the inverse \boldsymbol{T}^{-1} of the matrix T is associated with a transformation that takes the space back to its initial state after T has been applied.
Let's take again the example of the transformation associated with the following matrix:
** \boldsymbol{T} = \begin{bmatrix} 1.3 & -2.4 \\ 0.1 & 2 \end{bmatrix} **
You'll plot the initial grid of points, the grid after being transformed by T, and the grid after successive applications of T and \boldsymbol{T}^{-1} (remember that matrices must be left-multiplied):
https://gist.github.com/6c5395f21ca2ec973f5729a5bb3f92ac
Figure 5: Inverse of a transformation: the initial space (left) is transformed with the matrix T (middle) and transformed back using \boldsymbol{T}^{-1} (right).
As you can see in Figure 5, the inverse \boldsymbol{T}^{-1} of the matrix T is associated with a transformation that reverses the one associated with T.
Mathematically, the transformation of a vector v by T is defined as:
** \boldsymbol{T} \boldsymbol{v} **
To transform it back, you multiply by the inverse of T:
** \boldsymbol{T}^{-1} \boldsymbol{T} \boldsymbol{v} **
Order of the matrix products
Note that the order of the products is from right to left. The vector on the right of the product is first transformed by T and then the result is transformed by *\boldsymbol{T}^{-1}*.
Since you saw in Essential Math for Data Science that \boldsymbol{T}^{-1} \boldsymbol{T} = \boldsymbol{I}, you have:
** \boldsymbol{T}^{-1} \boldsymbol{T} \boldsymbol{v} = \boldsymbol{I} \boldsymbol{v} = \boldsymbol{v} **
meaning that you get back the initial vector v.
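You can check this round trip numerically. A sketch assuming NumPy, using the matrix T from this section:

```python
import numpy as np

T = np.array([[1.3, -2.4],
              [0.1, 2.0]])
v = np.array([2.0, -0.5])

T_inv = np.linalg.inv(T)

# transform, then transform back: T^{-1} (T v) = v
print(np.allclose(T_inv @ (T @ v), v))  # True
```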
The linear transformation associated with a singular matrix (that is, a non-invertible matrix; see more details in Essential Math for Data Science) can't be reversed. This happens when the transformation loses information. Take the following matrix:
** \boldsymbol{T} = \begin{bmatrix} 3 & 6 \\ 2 & 4 \end{bmatrix} **
Let's see how it transforms the space:
https://gist.github.com/dcd32dfd62b52285620cf5be8a9e643f
Figure 6: The initial space (left) is transformed into a line (right) with the matrix T. Multiple input vectors land on the same location in the output space.
You can see in Figure 6 that the transformed vectors all lie on a line. Different points land in the same place after the transformation, so it is not possible to go back. In this case, the matrix T is not invertible: it is singular.
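You can verify numerically that this matrix is singular. A sketch assuming NumPy (because of floating-point arithmetic, the computed determinant is only approximately zero, so it is compared against a small tolerance):

```python
import numpy as np

T = np.array([[3, 6],
              [2, 4]])

# the second row is two-thirds of the first: the columns are linearly dependent
print(np.linalg.matrix_rank(T))       # 1 (instead of 2 for an invertible 2x2 matrix)
print(abs(np.linalg.det(T)) < 1e-10)  # True: the determinant is (numerically) zero
```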
Basis
The basis is a coordinate system used to describe vector spaces (sets of vectors). It is a reference that you use to associate numbers with geometric vectors. You'll for instance see in Essential Math for Data Science that the concept of basis is important to understand eigendecomposition.
To be considered as a basis, a set of vectors must:
- Be linearly independent.
- Span the space.
Every vector in the space is a unique combination of the basis vectors. The dimension of a space is defined as the size of a basis set. For instance, there are two basis vectors in ℝ^2 (corresponding to the x and y axes in the Cartesian plane), or three in ℝ^3.
As you saw in the last section, if the number of vectors in a set is larger than the number of dimensions of the space, they can't be linearly independent. If a set contains fewer vectors than the number of dimensions, they can't span the whole space.
As you saw, vectors can be represented as arrows going from the origin to a point in space. The coordinates of this point can be stored in a list. The geometric representation of a vector in the Cartesian plane implies that we take a reference: the directions given by the two axes x and y.
Basis vectors are the vectors corresponding to this reference. In the Cartesian plane, the basis vectors are orthogonal unit vectors (length of one), generally denoted as i and j.
Figure 7: The basis vectors in the Cartesian plane.
For instance, in Figure 7, the basis vectors i and j point in the direction of the axis x and y respectively. These vectors give the standard basis. If you put these basis vectors into a matrix, you have the following identity matrix (see Essential Math for Data Science):
** \boldsymbol{I}_2 = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} **
Thus, the columns of \boldsymbol{I}_2 span ℝ^2. In the same way, the columns of \boldsymbol{I}_3 span ℝ^3 and so on.
Orthogonal basis
Basis vectors are often chosen to be orthogonal because orthogonal vectors are linearly independent. However, the converse is not necessarily true: non-orthogonal vectors can be linearly independent and thus form a basis (but not a standard basis).
The basis of your vector space is very important because the values of the coordinates corresponding to the vectors depend on this basis. By the way, you can choose different basis vectors, like the ones in Figure 8 for instance.
Figure 8: Another set of basis vectors.
Keep in mind that vector coordinates depend on an implicit choice of basis vectors.
You can consider any vector in a vector space as a linear combination of the basis vectors.
For instance, take the following two-dimensional vector v:
** \boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix} **
Figure 9: Components of the vector v.
The components of the vector v are the projections on the x-axis and on the y-axis (v_x and v_y, as illustrated in Figure 9). The vector v corresponds to the sum of its components: \boldsymbol{v} = v_x + v_y, and you can obtain these components by scaling the basis vectors: v_x = 2 \boldsymbol{i} and v_y = -0.5 \boldsymbol{j}. Thus, the vector v shown in Figure 9 can be considered as a linear combination of the two basis vectors i and j:
** \begin{aligned} \boldsymbol{v} &= 2\boldsymbol{i} - 0.5\boldsymbol{j} \\ &= 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} - 0.5\begin{bmatrix} 0 \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} 2 \cdot 1 \\ 2 \cdot 0 \end{bmatrix} - \begin{bmatrix} 0.5 \cdot 0 \\ 0.5 \cdot 1 \end{bmatrix} \\ &= \begin{bmatrix} 2 \\ -0.5 \end{bmatrix} \end{aligned} **
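You can check this linear combination numerically (a sketch assuming NumPy):

```python
import numpy as np

# the standard basis vectors of the Cartesian plane
i = np.array([1, 0])
j = np.array([0, 1])

# v as the linear combination 2 i - 0.5 j
v = 2 * i - 0.5 * j
print(v)  # [ 2.  -0.5]
```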
The columns of identity matrices are not the only example of linearly independent column vectors. It is possible to find other sets of n linearly independent vectors in ℝ^n.
For instance, let's consider the following vectors in ℝ^2:
** \boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix} **
and
** \boldsymbol{w} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} **
The vectors v and w are represented in Figure 10.
Figure 10: Another basis in a two-dimensional space.
From the definition above, the vectors v and w form a basis because they are linearly independent (you can't obtain one of them by scaling the other) and they span the space (every vector of the space can be reached through linear combinations of these vectors).
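You can verify the linear independence of v and w numerically: stack them as the columns of a matrix and check that its determinant is nonzero (a sketch assuming NumPy):

```python
import numpy as np

# v and w as the columns of B
B = np.array([[2.0, 1.0],
              [-0.5, 1.0]])

# a nonzero determinant means the columns are linearly independent
print(np.linalg.det(B))  # 2.5 (nonzero, so {v, w} is a basis of R^2)
```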
It is critical to keep in mind that, when you use the components of vectors (for instance v_x and v_y, the x and y components of the vector v), the values are relative to the basis you chose. If you use another basis, these values will be different.
You'll see later that the ability to change the bases is fundamental in linear algebra and is key to understand eigendecomposition (Essential Math for Data Science) or Singular Value Decomposition (Essential Math for Data Science).
You saw that, to associate geometric vectors (arrows in the space) with coordinate vectors (arrays of numbers), you need a reference. This reference is the basis of your vector space. For this reason, a vector should always be defined with respect to a basis.
Let's take the following vector:
** \boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix} **
The values of the x and y components are respectively 2 and -0.5. The standard basis is used when not specified.
You could write \boldsymbol{I} \boldsymbol{v} to specify that these numbers correspond to coordinates with respect to the standard basis. In this case I is called the change of basis matrix.
** \boldsymbol{v} = \boldsymbol{I}\boldsymbol{v} = \begin{bmatrix} 2 \\ -0.5 \end{bmatrix} **
You can define vectors with respect to another basis by using a matrix other than I. You'll see more about change of basis in Essential Math for Data Science.
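As a sketch of the idea (assuming NumPy), you can find the coordinates of a vector with respect to a new basis by solving a linear system: if the new basis vectors are the columns of a matrix B, the coordinates c of v in that basis satisfy B c = v. Using the basis from Figure 10 (the vector v happens to be the first basis vector, so its new coordinates are simply [1, 0]):

```python
import numpy as np

v = np.array([2.0, -0.5])

# the new basis vectors as the columns of B
B = np.array([[2.0, 1.0],
              [-0.5, 1.0]])

# coordinates of v with respect to B: solve B c = v
c = np.linalg.solve(B, v)
print(c)  # [1. 0.]
```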