Skip to content

Instantly share code, notes, and snippets.

@CarstenSchelp
Last active June 22, 2024 14:24
Show Gist options
  • Save CarstenSchelp/b992645537660bda692f218b562d0712 to your computer and use it in GitHub Desktop.
Save CarstenSchelp/b992645537660bda692f218b562d0712 to your computer and use it in GitHub Desktop.
A function to plot the confidence ellipse of the covariance of a 2D dataset. Uses matplotlib.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms
def confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs):
"""
Create a plot of the covariance confidence ellipse of `x` and `y`
See how and why this works: https://carstenschelp.github.io/2018/09/14/Plot_Confidence_Ellipse_001.html
This function has made it into the matplotlib examples collection:
https://matplotlib.org/devdocs/gallery/statistics/confidence_ellipse.html#sphx-glr-gallery-statistics-confidence-ellipse-py
Or, once matplotlib 3.1 has been released:
https://matplotlib.org/gallery/index.html#statistics
I update this gist according to the version there, because thanks to the matplotlib community
the code has improved quite a bit.
Parameters
----------
x, y : array_like, shape (n, )
Input data.
ax : matplotlib.axes.Axes
The axes object to draw the ellipse into.
n_std : float
The number of standard deviations to determine the ellipse's radiuses.
Returns
-------
matplotlib.patches.Ellipse
Other parameters
----------------
kwargs : `~matplotlib.patches.Patch` properties
"""
if x.size != y.size:
raise ValueError("x and y must be the same size")
cov = np.cov(x, y)
pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
# Using a special case to obtain the eigenvalues of this
# two-dimensionl dataset.
ell_radius_x = np.sqrt(1 + pearson)
ell_radius_y = np.sqrt(1 - pearson)
ellipse = Ellipse((0, 0),
width=ell_radius_x * 2,
height=ell_radius_y * 2,
facecolor=facecolor,
**kwargs)
# Calculating the stdandard deviation of x from
# the squareroot of the variance and multiplying
# with the given number of standard deviations.
scale_x = np.sqrt(cov[0, 0]) * n_std
mean_x = np.mean(x)
# calculating the stdandard deviation of y ...
scale_y = np.sqrt(cov[1, 1]) * n_std
mean_y = np.mean(y)
transf = transforms.Affine2D() \
.rotate_deg(45) \
.scale(scale_x, scale_y) \
.translate(mean_x, mean_y)
ellipse.set_transform(transf + ax.transData)
return ax.add_patch(ellipse)
# render plot with "plt.show()".
@CarstenSchelp
Copy link
Author

Hi artemis-2020,
Thank you for your question. Covariance is a measure of how datasets are linear dependent from each other.
So I doubt that there is anything useful to plot or see when making the axes logarithmic. Even if the ellipse would fit the datapoints. (And it would be quite distorted, then).
If you want to make the resulting shape fit your datapoints, you might want to experiment with randolf-scholz's method (see above). With that method you get actual points instead of a 'shape' and you could try and shift and squeeze them into place. But again, would the result be of any use, visually?
I like your idea to do logarithms on the input data. Then you can investigate on the linear dependency between those "logged" datasets.
The still linear axes would have to be tweaked a little, in order to look like proper "log" axes, though.
But that's where it ends, in my opinion. There is no use in trying to go back to linear, somehow.

After all, the purpose of the confidence ellipse is to give an easy insight in what's "in" and what's "out". When its clear shape and symmetry is gone, you don't see anything much, anymore.
I hope that I understood your question well, an I hope even more that I am making some sense, here ;-)
Is this any help to you?

Kind regards, Carsten

@artemis-2020
Copy link

Carsten, thanks for your reply.
I might try manually generating some shapes and squeezing them to fit.

@EmilioCMartinez
Copy link

how would this be called from a pandas df? can x,y arguments be from columns of a pandas df that are both type:float64 ?

I cannot seem to generate a plot correctly, thank you!

@CarstenSchelp
Copy link
Author

how would this be called from a pandas df? can x,y arguments be from columns of a pandas df that are both type:float64 ?

I cannot seem to generate a plot correctly, thank you!

Yes, pandas is wrapped all around numpy and it is easy to go back and forth, concerning data types.
For plotting, try this:

`import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms

# def confidence_ellipse function, here ...

# create 2D-data

k = 100
data = np.random.normal(size=(2 * k))
data.shape = (2, k)

# create some interdependency

data[1] = (data[0] + data[1])/2

# now pandas:

df = pd.DataFrame(data=data.T, columns=["x", "y"])

# and plot:

fig, ax = plt.subplots(figsize=(10, 10))
confidence_ellipse(df.x, df.y, ax, edgecolor='darkgrey')
plt.xlim((-4, 4))
plt.ylim((-4, 4))
plt.show()`

@ilkot
Copy link

ilkot commented Feb 2, 2021

Hi, thanks for the implementation that helps a lot!
I wonder if there is any method to extract 'inside the ellipse' points?
I tried contains_point() method but it gives False continously even its center, I wonder why this is happening?

@CarstenSchelp
Copy link
Author

Hi, thanks for the implementation that helps a lot!
I wonder if there is any method to extract 'inside the ellipse' points?
I tried contains_point() method but it gives False continously even its center, I wonder why this is happening?

Hi ilkot,
Are you calling methods on the ellipse object that is returned? That is outside my matplotlib knowledge. When submitting this as an example to the matplotlib docs, I was told that it is a good idea to return the ellipse object so that a user cold still operate on it. I just followed that recommendation, without ever using that feature myself.
You are probably looking for the Mahalanobis distance. It is turn-key implemented in scipy, as far as I know.

@vic1309
Copy link

vic1309 commented Feb 7, 2021

Thank you for the implementation, @CarstenSchelp. I was wondering about an easy way to obtain the ellipses border. I would like to plot it in a map object, so I have to convert the original data to the map projection.

Cheers,
V

@CarstenSchelp
Copy link
Author

Thank you for the implementation, @CarstenSchelp. I was wondering about an easy way to obtain the ellipses border. I would like to plot it in a map object, so I have to convert the original data to the map projection.

Cheers,
V

Hi Vic,
Thanks for using it :-)
Are you thinking of creation data for the ellipse, like center x, y, radius a & b and angle of rotation so that you can plot another ellipse object in another framework (like QGis or ArcGis)? Or do you need a set of points on that ellipse?

@vic1309
Copy link

vic1309 commented Feb 8, 2021

Thank you for the quick reply, Carsten!

No, not in GIS frameworks, but rather in Basemap or Cartopy. I have a set of latitude and longitude data which I fitted the ellipse, but I would like to project it on a map now. I kind of solve it by firstly projecting my data into the map object (x, y = m(lon, lat)), then fitting the ellipse and plotting it in the same ax which the map is referenced to. Maybe not the most efficient way..

@CarstenSchelp
Copy link
Author

Thank you for the quick reply, Carsten!

No, not in GIS frameworks, but rather in Basemap or Cartopy. I have a set of latitude and longitude data which I fitted the ellipse, but I would like to project it on a map now. I kind of solve it by firstly projecting my data into the map object (x, y = m(lon, lat)), then fitting the ellipse and plotting it in the same ax which the map is referenced to. Maybe not the most efficient way..

Hi @vic1309
Sounds good to me, actually. Working with your framework and not against it.
Through the kwargs you can have the ellipse rendered in any way matplotlib allows.
Things tend to get kind of messy and hard to maintain when you do things like generating points and such.
So I think your approach is just fine.
Have you read the kwargs section at the end of this article?
https://matplotlib.org/gallery/statistics/confidence_ellipse.html#sphx-glr-gallery-statistics-confidence-ellipse-py

Or do I not understand your question fully?

@adiringenbach
Copy link

Hi Carsten,
Thank you for putting this script together, it's proven helpful for me. I am wondering if there is a (simple) way of getting a functional form of the ellipses (i.e. actual data points that if plotted would map out the trajectory of the ellipse). I am interested in extracting the direction of the major and minor axes and think if I had the data that outlines the ellipses, I could calculate these directions.
Thank you in advance,
Alvaro

Hi Alvaro,
It took a while for me to respond. My apologies. Looks like what you are looking for are the vertices of the ellipse. You can extract them using the code in this gist, but you can also just run the eigen-decomposition of the given covariance-matrix. The eigen_vectors_ give you the direction of the ellipses' axes and the corresponding eigen_values_ give you the square of the length along the eigenvector where the vertex is located. So the vertexes are not a big problem. However, an ellipse that is not aligned with the axes of your coordinate system is a major headache, because very few of those nice and simple ellipse-equations remain valid, in that case. The code in the gist leaves that problem to matplotlib's (great!) transform functionality. You might experiment with rotating the found vertices back and forth to find an arbitrary point on the ellipse. (Remember, if you normalize the covariance first the angle will always be 45 degrees, then rescaling will produce the right dimensions and the actual angle.) If you came to this gist directly without the corresponding article, you might want to check out the article. It explains a bit more: An Alternative Way to Plot the Covariance Ellipse
Kind regards, Carsten

Hi Carsten,
Thanks for the code in the first place. I have a question, similar to Alvaros, or to your supposed answer to it: I want to export the ellipse to a shape file. Therefore I wanted to get the vertices from the ellipse via the ellipse.get_verts() function and write it into a polygon shape file (via shapely). But when plot the vertices and compare them with the original data, the ellipse by far not congruent. As i write the vertices AFTER "matplotlib's (great!)" transformation, i hoped, to get all the verteces right. Do you see, a solution to my problem?

Thanks a lot, Adrian

`#x=np.random.normal(15, 5, 40)
x=[14.04658211, 16.15219176, 8.76999363, 7.77098702, 18.24551371,
23.21029844, 11.31760109, 17.88907604, 3.75276018, 13.56080038,
18.37588605, 11.1607527 , 15.28666164, 14.44561426, 16.78490506,
13.47636812, 16.05632359, 17.68437307, 9.61650059, 10.91866781,
14.81591751, 15.79535801, 17.96959439, 15.13262672, 15.93797137,
16.00695453, 9.52804537, 9.90941689, 24.40166238, 17.34095785,
12.39716438, 8.23243059, 12.55848409, 21.74485295, 5.98996563,
12.15852099, 14.41792028, 11.64349602, 14.74192928, 11.87618266]

#y=zeros(len(x))
#for i in range(0, len(x)):

y[i]=(3+np.random.normal(2,0.8,40)[0])*x[i]

y=[ 65.16719696, 84.66478243, 32.52928637, 42.29133109,
89.42267553, 88.93953844, 55.171863 , 89.85122657,
16.77884709, 66.26381159, 105.93126428, 48.24141126,
58.66830149, 95.62149249, 87.1008412 , 84.53770306,
78.87603231, 88.72455795, 49.59030118, 68.22530499,
81.36355107, 72.32934773, 95.94894464, 75.28114967,
79.98937141, 62.30106299, 51.73602342, 47.20370099,
137.33210697, 109.26329496, 80.72923974, 44.14131401,
56.47549555, 111.51795471, 32.1814089 , 50.24932316,
73.87107096, 52.32507308, 62.02412045, 59.8584618 ]

fig, (ax1) = plt.subplots(1, 1, figsize = (8,8))
ax1.scatter(x,y, color='r')
cov = np.cov(x, y)
pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])

Using a special case to obtain the eigenvalues of this

two-dimensionl dataset.

ell_radius_x = np.sqrt(1 + pearson)
ell_radius_y = np.sqrt(1 - pearson)
ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2, facecolor= None, ec='r'
)

Calculating the stdandard deviation of x from

the squareroot of the variance and multiplying

with the given number of standard deviations.

scale_x = np.sqrt(cov[0, 0]) * 2
mean_x = np.mean(x)

calculating the stdandard deviation of y ...

scale_y = np.sqrt(cov[1, 1]) * 2
mean_y = np.mean(y)

transf = transforms.Affine2D()
.rotate_deg(45)
.scale(scale_x, scale_y)
.translate(mean_x, mean_y)

ellipse.set_transform(transf + ax1.transData)
ax1.add_patch(ellipse)

vertices=ellipse.get_verts()

for i in range(0, len(vertices[:,0])):
ax1.scatter(vertices[i, 0],vertices[i, 1], color='k')`

@liamtoney
Copy link

Hi @CarstenSchelp, thanks for this easy-to-use and well-documented code. I came here from the Matplotlib example page, which contains the following statement:

The default value is 3 which makes the ellipse enclose 99.4% of the points if the data is normally distributed like in these examples (3 standard deviations in 1-D contain 99.7% of the data, which is 99.4% of the data in 2-D).

I believe the 2-D case here is incorrectly represented. From this paper's Eq. 19, for the 2-D case the confidence level should be 98.9%. The paper also includes a table with values. Looking at the conversation above, it looks like the dimensionality factor has been discussed to some degree, but it looks like the Matplotlib example text might need to be updated?

If you (and others) agree, I can make a PR over at Matplotlib. Cheers.

@CarstenSchelp
Copy link
Author

If you (and others) agree, I can make a PR over at Matplotlib. Cheers.

Hi @liamtoney
You are more than welcome to fix that on the Matplotlib page!
Actually, I made an effort to straighten out this flaw, end of last year.
But the review process turned out to be less supportive than I had experienced it earlier.
(Generally, the Matplotlib crew is very positive and enthusiast!)
The agreement that I remember was that the maintainers themselves would submit a minimal change that would correct the wrong statement.
But it seems like it didn't happen.
So if you independently make the same point and can point to an authoritative source, it should be no problem to submit a PR that passes smoothly.
Thanks a lot!
Regards, Carsten

@CarstenSchelp
Copy link
Author

Hi @adiringenbach,
Even this short answer took me a long time. Sorry about that.
I think that a lot of the confusion comes from the fact that I used the word "vertices" to notify only the extreme points of an ellipse.
Whereas in graphicsland "vertex" is just another word for point. More or less.
I have yet an alternative method for getting (the points of) a covariance ellipse.
The documentation is not as detailed as in this gist, though:

In this new gist I am sketching the principle.
I also explain how it can be combined with the code in this current gist and become a complete and usable function.
Hope this is still of any use to you or someone else.

@liamtoney
Copy link

Here's the merged PR with the correct confidence interval: matplotlib/matplotlib#21216

@jensweiser
Copy link

Hi Carsten,
thanks for implementing this! I am using your code in a script and I'd like to cite your work properly. AFAIK Github now offers a way to add a "cite" button to repos (?) as well as getting a DOI. Would you consider getting one set up for this code? Otherwise, I'd just cite the URL, but I always prefer somwthing official as it offers a more consistent solution...
Thanks,
Jens

@CarstenSchelp
Copy link
Author

Hi @jensweiser,
Thank you. DOI sounds interesting, anyway. I will see what I can do.

@CarstenSchelp
Copy link
Author

Hi @jensweiser,

That was quite easy.
Obviously, I cannot get a DOI for a gist, but the repo that feeds the github.io page has a release with a zenodo DOI, now.
The file you want to refer to would be _posts/2018-09-14-Plot_Confidence_Ellipse_001.md
Here's the data of the DOI:

Markdown:
[![DOI](https://zenodo.org/badge/138745361.svg)](https://zenodo.org/badge/latestdoi/138745361)

reStructedText:

.. image:: https://zenodo.org/badge/138745361.svg
   :target: https://zenodo.org/badge/latestdoi/138745361

or as HTML:
<a href="https://zenodo.org/badge/latestdoi/138745361"><img src="https://zenodo.org/badge/138745361.svg" alt="DOI"></a>

@jensweiser
Copy link

Hi @jensweiser,

That was quite easy. Obviously, I cannot get a DOI for a gist, but the repo that feeds the github.io page has a release with a zenodo DOI, now. The file you want to refer to would be _posts/2018-09-14-Plot_Confidence_Ellipse_001.md Here's the data of the DOI:

Markdown: [![DOI](https://zenodo.org/badge/138745361.svg)](https://zenodo.org/badge/latestdoi/138745361)

reStructedText:

.. image:: https://zenodo.org/badge/138745361.svg
   :target: https://zenodo.org/badge/latestdoi/138745361

or as HTML: <a href="https://zenodo.org/badge/latestdoi/138745361"><img src="https://zenodo.org/badge/138745361.svg" alt="DOI"></a>

....fantastic, thanks a lot!

@CarstenSchelp
Copy link
Author

CarstenSchelp commented Apr 8, 2022 via email

@csaquino2
Copy link

Do you think is it possible to apply a variant of this to 3D coordinates (xyz)?

@camillefrayssinhes
Copy link

Hi!
I tested the gist to get the values of the radii and it doesn't seem to give the good values. I spent a bit of time trying to do some debugging but I can't figure out the problem! Did anyone have the same problem?

@Antoine-DupuyUL
Copy link

Hello Carsten,

Firstly, thank you for sharing this script and for all the information concerning it.
I was wondering if it is possible to extract the angle of the ellipse (either in radians or degrees).
I have already the radii and the width (minor and major). It would be awesome if I was able to get the "tilt" / the angle of the ellipse!

Thanks a million.

Best regards,
Antoine Dupuy.

@Antoine-DupuyUL
Copy link

Hi! I tested the gist to get the values of the radii and it doesn't seem to give the good values. I spent a bit of time trying to do some debugging but I can't figure out the problem! Did anyone have the same problem?

If you are taking the "ell_radius_x" & "ell_radius_x" directly, you will find weird results. That is because they are not scaled to your data yet.
Try with this:
scaled_ell_x_radius = np.sqrt(cov[0, 0]) * n_std
scaled_ell_y_radius = np.sqrt(cov[1, 1]) * n_std

There should be more logical results!

@ktevans
Copy link

ktevans commented May 3, 2024

Hello Carsten,

Firstly, thank you for sharing this script and for all the information concerning it. I was wondering if it is possible to extract the angle of the ellipse (either in radians or degrees). I have already the radii and the width (minor and major). It would be awesome if I was able to get the "tilt" / the angle of the ellipse!

Thanks a million.

Best regards, Antoine Dupuy.

This is also exactly what I am trying to get! Still working on it, but if anyone has any insight, I'd appreciate it!

@CarstenSchelp
Copy link
Author

CarstenSchelp commented May 4, 2024

Do you think is it possible to apply a variant of this to 3D coordinates (xyz)?

Guess what I have been banging my head against, the last six years :D
It's tricky, to say the least. 2D is just such a nice special case.
Don't hold your breath, but stubborn as I am, I keep trying.
Which is bold, because I think that people with profoundly deeper math-chops than I have tried this, too.
Kind regards, Carsten

@CarstenSchelp
Copy link
Author

Hello Carsten,
Firstly, thank you for sharing this script and for all the information concerning it. I was wondering if it is possible to extract the angle of the ellipse (either in radians or degrees). I have already the radii and the width (minor and major). It would be awesome if I was able to get the "tilt" / the angle of the ellipse!
Thanks a million.
Best regards, Antoine Dupuy.

This is also exactly what I am trying to get! Still working on it, but if anyone has any insight, I'd appreciate it!

Hi @ktevans and hi @Antoine-DupuyUL,

(And sorry Antoine, your question from months ago must have gotten lost in the shuffle).
I am trying something from top of my head, here. I am not at home with not enough time.

In a normalized case, that is, when the standard deviations of both variables are the same, the angle of the ellipse will be

arctan(std1/std2) = arctan(1) = pi/4 = 45degrees.

In the article I also mention that any deviation from 45degrees only comes forth from different scaling of the variables.
So you might give it a shot to determine the angle by

arctan(std1/std2) = angle in rad

std1 and std2 would be the "original", non-normalized standard deviations.
This will alsways be an "upward" angle.
A seemingly "downward" angle that you see when the pearson coefficient is negative is only due to the "second" radius being longer than the "first" radius.
Does this work for you?

Kind regards, Carsten

@ktevans
Copy link

ktevans commented May 6, 2024

Hello Carsten,
Firstly, thank you for sharing this script and for all the information concerning it. I was wondering if it is possible to extract the angle of the ellipse (either in radians or degrees). I have already the radii and the width (minor and major). It would be awesome if I was able to get the "tilt" / the angle of the ellipse!
Thanks a million.
Best regards, Antoine Dupuy.

This is also exactly what I am trying to get! Still working on it, but if anyone has any insight, I'd appreciate it!

Hi @ktevans and hi @Antoine-DupuyUL,

(And sorry Antoine, your question from months ago must have gotten lost in the shuffle). I am trying something from top of my head, here. I am not at home with not enough time.

In a normalized case, that is, when the standard deviations of both variables are the same, the angle of the ellipse will be

arctan(std1/std2) = arctan(1) = pi/4 = 45degrees.

In the article I also mention that any deviation from 45degrees only comes forth from different scaling of the variables. So you might give it a shot to determine the angle by

arctan(std1/std2) = angle in rad

std1 and std2 would be the "original", non-normalized standard deviations. This will alsways be an "upward" angle. A seemingly "downward" angle that you see when the pearson coefficient is negative is only due to the "second" radius being longer than the "first" radius. Does this work for you?

Kind regards, Carsten

Hi Carsten,

Thank you for this code. It has been of great help to me! I was able to get the eccentricity of the ellipse using this method!

I defined a simple correlation variable via the following:

corr = 0
if pearson > 0:
  corr = 1
if pearson < 0:
  corr = -1

Then, the eccentricity is just:
ecc = corr * (np.std(y) / np.std(x))

I got the following results where the slope in the legend is the calculated eccentricity:

Screen Shot 2024-05-06 at 12 50 44 PM Screen Shot 2024-05-06 at 12 51 00 PM

Thank you again! This was exactly what I needed!

@CarstenSchelp
Copy link
Author

[...] Thank you again! This was exactly what I needed!

Hi @ktevans, This is great. You turned it into your own thing.
You didn't even need the arctan(), after all. The slope was all you needed.
With the sign fixed, then.
I am pleased to see that this was useful.
Kind regards, Carsten

@Antoine-DupuyUL
Copy link

Antoine-DupuyUL commented May 10, 2024

Hello Carsten,
Firstly, thank you for sharing this script and for all the information concerning it. I was wondering if it is possible to extract the angle of the ellipse (either in radians or degrees). I have already the radii and the width (minor and major). It would be awesome if I was able to get the "tilt" / the angle of the ellipse!
Thanks a million.
Best regards, Antoine Dupuy.

This is also exactly what I am trying to get! Still working on it, but if anyone has any insight, I'd appreciate it!

Hi @ktevans and hi @Antoine-DupuyUL,

(And sorry Antoine, your question from months ago must have gotten lost in the shuffle). I am trying something from top of my head, here. I am not at home with not enough time.

In a normalized case, that is, when the standard deviations of both variables are the same, the angle of the ellipse will be

arctan(std1/std2) = arctan(1) = pi/4 = 45degrees.

In the article I also mention that any deviation from 45degrees only comes forth from different scaling of the variables. So you might give it a shot to determine the angle by

arctan(std1/std2) = angle in rad

std1 and std2 would be the "original", non-normalized standard deviations. This will alsways be an "upward" angle. A seemingly "downward" angle that you see when the pearson coefficient is negative is only due to the "second" radius being longer than the "first" radius. Does this work for you?

Kind regards, Carsten

Hello Carsten,

Thanks a million for your help and all the explanation!
It works perfectly 🙏

Again, thank you for making this code available.

Best regards,
Antoine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment