Riemannian Geometry and Machine Learning - An Introduction

A summary of the notes I have taken while studying the intersection between these two topics

NOTE: This is still a work in progress, but please follow if you are interested in the topic


NicolaBernini commented Aug 2, 2020

Introduction to the connection between Riemannian Geometry and Information Theory (Information Geometry)

Let's take the perspective of Information Geometry to build a connection between a Probability Density Function (PDF) and a Riemannian Manifold. The idea is to work on statistics-related aspects in the geometric domain, so we have to build connections between the two domains.

Classically, information geometry considered a parametrized statistical model as a Riemannian manifold.

The next step is to define the proper metric in the geometric domain, i.e. an inner product

Let's recall that an inner product will give us (see the formulas sketched after this list)

  • the notion of angles and hence of orthogonality
  • the notion of length and hence of distance
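
As a quick reference, here is a minimal sketch of how these notions follow from an inner product $\langle \cdot, \cdot \rangle$ defined on each tangent space (i.e. from the Riemannian metric); the notation is generic and not tied to any specific manifold from these notes:

```latex
% Angle between tangent vectors u, v (orthogonality when the inner product is zero)
\cos \alpha = \frac{\langle u, v \rangle}{\lVert u \rVert \, \lVert v \rVert}

% Length (norm) of a tangent vector
\lVert v \rVert = \sqrt{\langle v, v \rangle}

% Length of a curve \gamma on the manifold, and distance as the infimum over curves joining p and q
L(\gamma) = \int_{0}^{1} \lVert \dot{\gamma}(t) \rVert_{\gamma(t)} \, dt ,
\qquad
d(p, q) = \inf_{\gamma : p \to q} L(\gamma)
```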

In the PDF domain a commonly used similarity measure between PDFs is the Kullback–Leibler divergence; what is the corresponding quantity in the geometric domain?

It is worth observing that the Fisher Information Matrix can be interpreted as the curvature of the relative entropy between a pair of PDFs which are extremely similar (for more details see here), so using it as the Riemannian Metric for our manifold builds a solid connection with the statistical side (and also extends the KL divergence from a pseudo-distance to an actual distance)

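As a numerical sketch of this curvature interpretation (this example is not in the original notes; it uses a Bernoulli family purely as an illustration), the KL divergence between two nearby members of a parametric family behaves like $\frac{1}{2} F(\theta_{0}) \, \delta^{2}$, where $F$ is the Fisher Information:

```python
import numpy as np

# Illustrative family (an assumption, not from the original notes): Bernoulli(p).
# Closed-form expressions for this family:
#   KL( Ber(p) || Ber(q) ) = p*log(p/q) + (1-p)*log((1-p)/(1-q))
#   Fisher Information:     F(p) = 1 / (p*(1-p))

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

p0 = 0.3
fisher = 1.0 / (p0 * (1.0 - p0))

# For small displacements delta, KL(p0 || p0 + delta) ~ 0.5 * F(p0) * delta^2,
# i.e. the Fisher Information is the curvature of the KL divergence at delta = 0.
for delta in [1e-1, 1e-2, 1e-3]:
    kl = kl_bernoulli(p0, p0 + delta)
    quad = 0.5 * fisher * delta ** 2
    print(f"delta={delta:g}  KL={kl:.6e}  0.5*F*delta^2={quad:.6e}")
```

The two printed columns agree to leading order as $\delta$ shrinks, which is exactly the curvature interpretation of the Fisher Information Matrix described above.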


NicolaBernini commented Aug 2, 2020

Connection between KL Divergence and Fisher Information Matrix

This section is intended as a guide to reading this more effectively

  • The connection between the KL Divergence / Relative Entropy of a pair of PDFs and the Fisher Information Matrix of a PDF becomes clear when we focus on the limit in which the two PDFs converge to each other
  • Let's take a pair of PDFs $p(\cdot), q(\cdot)$ and let's assume they are part of the same family $f(\cdot, \theta)$ so they differ only in terms of their parameterization $\theta_{0}, \theta_{1}$
  • Let's consider the case when the two PDFs are very similar, which we can express formally as $\theta_{0} - \theta_{1} \rightarrow 0$
  • In this case, let's take $\theta_{0}$ as a reference, so $\theta_{1} \rightarrow \theta_{0}$, and let's change the formalism a little bit, writing the divergence as $D_{KL, \theta_{0}}(\theta)$
    • NOTE: we can't just express this as a function $\Delta \theta = |\theta_{0} - \theta_{1}|$ as the KL Divergence is not symmetric
  • So $D_{KL, \theta_{0}}(\theta)$ is in general a nonlinear function of $\theta$, but since we are interested in the limit $\theta \rightarrow \theta_{0}$ we can approximate it locally with a Taylor series expansion
    • NOTE: In this limit the KL Divergence is expected to become more and more symmetric, which makes the arbitrary choice of the reference frame less and less relevant
  • It can be shown that, in this limit, the zeroth- and first-order terms of the expansion vanish and the second-order coefficient is the Fisher Information Matrix of the PDF $f(\cdot, \theta)$ (see the sketch after this list)
  • As a result, we can interpret the Fisher Information Matrix as the Hessian, or curvature, of the KL Divergence / Relative Entropy of the two PDFs in the limit when they are very close to each other
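
To make the last two bullets concrete, here is a sketch of the standard second-order expansion, written in the notation used above (not part of the original notes):

```latex
% Second-order Taylor expansion of the KL divergence around \theta_0
D_{KL, \theta_{0}}(\theta)
  = \mathbb{E}_{x \sim f(\cdot, \theta_{0})}
    \left[ \log \frac{f(x, \theta_{0})}{f(x, \theta)} \right]
  \approx \frac{1}{2} \, (\theta - \theta_{0})^{\top} F(\theta_{0}) \, (\theta - \theta_{0})

% The zeroth-order term vanishes because D_{KL, \theta_0}(\theta_0) = 0,
% and the first-order term vanishes because the expected score is zero:
% \mathbb{E}_{\theta_0} \left[ \nabla_{\theta} \log f(x, \theta) \big|_{\theta = \theta_0} \right] = 0

% The remaining Hessian is the Fisher Information Matrix
F(\theta_{0})
  = \mathbb{E}_{x \sim f(\cdot, \theta_{0})}
    \left[ \nabla_{\theta} \log f(x, \theta) \, \nabla_{\theta} \log f(x, \theta)^{\top} \right]_{\theta = \theta_{0}}
```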
