Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
A function to plot the confidence ellipse of the covariance of a 2D dataset. Uses matplotlib.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms
def confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs):
"""
Create a plot of the covariance confidence ellipse of `x` and `y`
See how and why this works: https://carstenschelp.github.io/2018/09/14/Plot_Confidence_Ellipse_001.html
This function has made it into the matplotlib examples collection:
https://matplotlib.org/devdocs/gallery/statistics/confidence_ellipse.html#sphx-glr-gallery-statistics-confidence-ellipse-py
Or, once matplotlib 3.1 has been released:
https://matplotlib.org/gallery/index.html#statistics
I update this gist according to the version there, because thanks to the matplotlib community
the code has improved quite a bit.
Parameters
----------
x, y : array_like, shape (n, )
Input data.
ax : matplotlib.axes.Axes
The axes object to draw the ellipse into.
n_std : float
The number of standard deviations to determine the ellipse's radiuses.
Returns
-------
matplotlib.patches.Ellipse
Other parameters
----------------
kwargs : `~matplotlib.patches.Patch` properties
"""
if x.size != y.size:
raise ValueError("x and y must be the same size")
cov = np.cov(x, y)
pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
# Using a special case to obtain the eigenvalues of this
# two-dimensionl dataset.
ell_radius_x = np.sqrt(1 + pearson)
ell_radius_y = np.sqrt(1 - pearson)
ellipse = Ellipse((0, 0),
width=ell_radius_x * 2,
height=ell_radius_y * 2,
facecolor=facecolor,
**kwargs)
# Calculating the stdandard deviation of x from
# the squareroot of the variance and multiplying
# with the given number of standard deviations.
scale_x = np.sqrt(cov[0, 0]) * n_std
mean_x = np.mean(x)
# calculating the stdandard deviation of y ...
scale_y = np.sqrt(cov[1, 1]) * n_std
mean_y = np.mean(y)
transf = transforms.Affine2D() \
.rotate_deg(45) \
.scale(scale_x, scale_y) \
.translate(mean_x, mean_y)
ellipse.set_transform(transf + ax.transData)
return ax.add_patch(ellipse)
# render plot with "plt.show()".
@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Sep 14, 2018

I am explaining why this works in this blogpost.
Please feel free to give feedback, here!

@colseanm

This comment has been minimized.

Copy link

@colseanm colseanm commented Sep 3, 2019

Hello, Carsten,

Thanks for posting this creative and simple code.

When generating a confidence ellipse with n_std=3.0 for a normally distributed x of length 1000 and a normally distributed y of length 1000, I should expect that on average only 3 (x,y)-points fall outside the ellipse (99.7% of the points should fall inside the ellipse).
However, when running confidence_ellipse() with the above parameters, I notice by visual inspection that on average more than 3 points fall outside the ellipse.

You had also mentioned the work of Vincent Spruyt (https://www.visiondummy.com/2014/04/draw-error-ellipse-representing-covariance-matrix). When implementing his algorithm, I notice by visual and numerical inspection that the confidence ellipse produced by his algorithm is always larger than the ellipse provided by the algorithm here. Furthermore, I noticed that the ellipse generated by his algorithm seems on average to exclude 3 points as expected when using the parameters above.

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Sep 4, 2019

Hi colseanm,
Thank you for sharing your findings. I will look into this. Of course there should not be a systematic difference between two correct algorithms. Otherwise, at least one of them has a problem ;-)

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Sep 5, 2019

Hello, Carsten,

Thanks for posting this creative and simple code.

When generating a confidence ellipse with n_std=3.0 for a normally distributed x of length 1000 and a normally distributed y of length 1000, I should expect that on average only 3 (x,y)-points fall outside the ellipse (99.7% of the points should fall inside the ellipse).
However, when running confidence_ellipse() with the above parameters, I notice by visual inspection that on average more than 3 points fall outside the ellipse.

You had also mentioned the work of Vincent Spruyt (https://www.visiondummy.com/2014/04/draw-error-ellipse-representing-covariance-matrix). When implementing his algorithm, I notice by visual and numerical inspection that the confidence ellipse produced by his algorithm is always larger than the ellipse provided by the algorithm here. Furthermore, I noticed that the ellipse generated by his algorithm seems on average to exclude 3 points as expected when using the parameters above.

Hello colseanm,

I think I know what's going on: to check, I reintroduced the "n_std rectangle" into the code and plotted both ellipse and rectangle.
The ellipse drawn with the above function fits exactly into the rectangle. But this we already knew.
At the bottom of this post I give the code for that rectangle function. (Do not forget to import "Rectangle", next to "Ellipse".)

Looking at rectangle and ellipse again kind of gave me the answer: the expectancy that 3 of 1000 observations should lie outside a range of 2 * 3 standard deviations is valid for one dimension by itself. It does not work for an entire covariant dataset.
If you plot a few datasets with ellipse and rectangle you will notice that only very few observations are actually outside the rectangle while the ellipse follows the bulk of the observations more closely and produces more "outliers".
The rectangle represents the "each dimension for itself" interpretation whereas the covariance ellipse is, well, covariant.

Intuitively, with a lot of datasets tested, this should average out to 3 observations per dimension outside of the rectangle.
Does this make sense to you?

It would be interesting to do the same test with the code given on the visiondummy site. Maybe it is already a challenge to produce a rectangle that will fit the ellipse ... ?

I hope that this answers your issue, sufficiently.
Either way, thank you for really taking a close look, here!
I really appreciate it.

def n_std_rectangle(x, y, ax, n_std=3.0, facecolor='none', **kwargs):
    mean_x = x.mean()
    mean_y = y.mean()
    std_x = x.std()
    std_y = y.std()
    half_width = n_std * std_x
    half_height = n_std * std_y
    rectangle = Rectangle(
        (mean_x - half_width, mean_y - half_height),
        2 * half_width, 2 * half_height, facecolor=facecolor, **kwargs)
    return ax.add_patch(rectangle)

@simbaforrest

This comment has been minimized.

Copy link

@simbaforrest simbaforrest commented Apr 3, 2020

Hello, Carsten,

Thanks for posting this creative and simple code.

When generating a confidence ellipse with n_std=3.0 for a normally distributed x of length 1000 and a normally distributed y of length 1000, I should expect that on average only 3 (x,y)-points fall outside the ellipse (99.7% of the points should fall inside the ellipse).
However, when running confidence_ellipse() with the above parameters, I notice by visual inspection that on average more than 3 points fall outside the ellipse.

You had also mentioned the work of Vincent Spruyt (https://www.visiondummy.com/2014/04/draw-error-ellipse-representing-covariance-matrix). When implementing his algorithm, I notice by visual and numerical inspection that the confidence ellipse produced by his algorithm is always larger than the ellipse provided by the algorithm here. Furthermore, I noticed that the ellipse generated by his algorithm seems on average to exclude 3 points as expected when using the parameters above.

Thanks @CarstenSchelp for the code, first of all.
I think @colseanm's question can be answered by this post here: https://math.stackexchange.com/a/143445

Basically, for 2D Gaussian, the probability mass covered by the 3-sigma interval is less than 99.7%. From this table we can see that sqrt(9.21)-sigma can cover only 99%, so we need some number larger than sqrt(9.21) to reach 99.7%.

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Apr 3, 2020

Hello, Carsten,
Thanks for posting this creative and simple code.
When generating a confidence ellipse with n_std=3.0 for a normally distributed x of length 1000 and a normally distributed y of length 1000, I should expect that on average only 3 (x,y)-points fall outside the ellipse (99.7% of the points should fall inside the ellipse).
However, when running confidence_ellipse() with the above parameters, I notice by visual inspection that on average more than 3 points fall outside the ellipse.
You had also mentioned the work of Vincent Spruyt (https://www.visiondummy.com/2014/04/draw-error-ellipse-representing-covariance-matrix). When implementing his algorithm, I notice by visual and numerical inspection that the confidence ellipse produced by his algorithm is always larger than the ellipse provided by the algorithm here. Furthermore, I noticed that the ellipse generated by his algorithm seems on average to exclude 3 points as expected when using the parameters above.

Thanks @CarstenSchelp for the code, first of all.
I think @colseanm's question can be answered by this post here: https://math.stackexchange.com/a/143445

Basically, for 2D Gaussian, the probability mass covered by the 3-sigma interval is less than 99.7%. From this table we can see that sqrt(9.21)-sigma can cover only 99%, so we need some number larger than sqrt(9.21) to reach 99.7%.

Hi Simbaforest,
That's a nice link that really adds to the issue. Thank you. I am happy to read that people agree on that the 68-95-99.7 rule does not apply to n dimensions just like that. And in fact, the thought "How likely is it that a sample is within 3 sigma for all dimensions" can lead us to a valid estimation of that probability.

@matomatical

This comment has been minimized.

Copy link

@matomatical matomatical commented Apr 15, 2020

I am explaining why this works in this blogpost.
Please feel free to give feedback, here!

Nice technique, and explanation!

In the blog post, your instructions suggest to "scale the ellipse horizontally by (2nσ_x) [and] vertically by (2nσ_y)". Is the factor of 2 really meant to be there? I think it makes sense to scale the ellipse by nσ in each dimension, not by 2nσ. Moreover, the only place a 2 appears in the code is in the conversion from radius to diameter.

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Apr 15, 2020

@matomatical: Thank you, yes the “2” is applied in order to turn the radius into a diameter. Because that’s what most graphics libraries require for plotting an ellipse. That section in the blog post is for good reason labeled as a “recipe” that an impatient reader might follow blindly. The actual scaling as opposed to normalizing would not require that factor 2. But an impatient reader would end up with an ellipse half the size that he or she would expect :-)

@matomatical

This comment has been minimized.

Copy link

@matomatical matomatical commented Apr 15, 2020

Ah, so it was intentional! Now I see. I think the wording just tripped me up---since I, perhaps in a rare moment of patience, did notice the previous instruction was in terms of radii. The scaling instructions then seemed inconsistent with that.

May I suggest a clarification to the post? If you remove the mention of radii (perhaps until later, in the derivation) and write equations (2) and (3) in terms of the diameter explicitly ('horizontal diameter = 2\sqrt{1+p}' and 'vertical diameter = 2\sqrt{1-p}'), and remove the factor of 2 from the scaling instructions, this would make the instructions both self-consistent and also slightly more closely aligned with your implementation.

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Apr 15, 2020

Rare moment of patience? You apparently read it all the way to the end! But thanks, will consider!

@SilasK

This comment has been minimized.

Copy link

@SilasK SilasK commented Jun 17, 2020

@CarstenSchelp

That's a nice link that really adds to the issue. Thank you. I am happy to read that people agree on that the 68-95-99.7 rule does not apply to n dimensions just like that. And in fact, the thought "How likely is it that a sample is within 3 sigma for all dimensions" can lead us to a valid estimation of that probability.

Ok, it is not trivial to map the standard deviation to confidence intervals (68,95 ...) but it is still possible, isn't it?
Is is true, that to plot the error ellipse according to a specific confidence interval of data to be assumed gaussian we can use:

ci=0.95
s= chi2.ppf(ci,2)
scale_x = np.sqrt(cov[0, 0] * s)

scale_y = np.sqrt(cov[1, 1] * s)
@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Jun 18, 2020

@SilasK
Hi Silas,

To me it always feels like in classical statistics all the math is buried in powerful but cryptic functions/conventions.
Even with chi2. Sorry, I was born that way.
So I am careful to just say "Yeah, your code is probably right."
But you might want to check your code against this:

I put this sentence earlier:
"How likely is it that a sample is within 3 sigma for all dimensions?"
(or any other number of standard deviations)

This sentence translates to:

p_n = p_single**n [eq01]

where
p_n is the probability that you aim for, looking at all dimensions,
p_single is the probability that you'd have to apply on each dimension by itself in order to get a resulting p_n
n is the number of dimensions

[eq01] can be rearranged to:

p_single = math.pow(p_n, 1/n)

(note that the ** operator would force 1/n into an integer)

If the resulting "single" probability results in the same scaling factor that your code yields, all should be fine!

@taruntejreddych

This comment has been minimized.

Copy link

@taruntejreddych taruntejreddych commented Sep 16, 2020

Hey, does anyone know how to plot the ellipses when i have their means and variances? I have used gmm to obtain the ellipse parameters, and was wondering how to plot the ellipse

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Sep 16, 2020

Hey, does anyone know how to plot the ellipses when i have their means and variances? I have used gmm to obtain the ellipse parameters, and was wondering how to plot the ellipse

Hi Tarun,
You create the ellipse with the origin ([0, 0]) as its center.
The means are nothing more than the point where you translate (="shift") the center of your ellipse to, then.
This is one of the transforms in the code of this gist.

As for the variances: In terms of the covariance matrix, the variances are the values on its diagonal.
The co-variance(s) are on the other fields. We are talking 2-dimensional, here, so there is only one covariance value, upper right and lower left in the this symmetric matrix.
So this gives you the matrix cov. That's what the not-normalized covariance matrix is called in the code of the gist.
You get from cov to the normalized version of it with
pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
That is your normalized covariance matirx, with the pearson coefficient.

So now you may have a picture how to get from your not yet normalized context into the line along which this code works.

The key for you is probably how to build that covariance matrix from the data you get out of gmm.
The method in the gist takes "raw"-two dimensional data (arrays x and y) but you can skip this and just build your covariance matrix directly.
Does this help?

@taruntejreddych

This comment has been minimized.

Copy link

@taruntejreddych taruntejreddych commented Sep 16, 2020

Hey Carsten,

It helped. Thanks

@randolf-scholz

This comment has been minimized.

Copy link

@randolf-scholz randolf-scholz commented Sep 23, 2020

Here's an even easier way to do it: create a circle and map it under the matrix square root of sigma like so:

T = np.linspace(0, 2*np.pi, num=1000)
circle = radius * np.vstack([np.cos(T), np.sin(T)])
x, y = sqrtm(Sigma) @ circle
plt.plot(x, y, '-r', linewidth=5)

Full demo:

import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import multivariate_normal,chi2
from scipy.linalg import sqrtm, inv

N = 10000
n = 2
X = np.random.randn(n, n)
Sigma = (1/n) * X.T @ X   # random covariance matrix
mu  = np.zeros(n)
distribution = multivariate_normal(mean=mu, cov=Sigma)
samples = distribution.rvs(N)

# confidence region is given by x.T S^{-1} x <= chi^2(p)
confidence = .75
r = np.sqrt( chi2.isf(1-confidence, df=n) )
T = np.linspace(0, 2*np.pi, num=1000)
circle = r * np.vstack([np.cos(T), np.sin(T)])
x, y = sqrtm(Sigma) @ circle
u, v = inv(sqrtm(Sigma)) @ samples.T
inside = u**2 + v**2 <= r**2
outside = ~inside

plt.plot(*samples[inside].T, '.b');
plt.plot(*samples[outside].T, '.g');
plt.plot(x, y, '-r', linewidth=5)
print(F"{confidence=}, empirical: {sum(inside)}/{N}")

Bonus: this approach works in the n-dimensional case as well.

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Sep 28, 2020

Hi Randolf,
Thanks for this alternative solution. Frankly, I didn't even know about scipy's sqrtm function and how you can squeeze a set of points in shape with it.
For processing, it looks like there's quite some heavy matrix-lifting going on, under the hood.
How would you determine how many points you have to generate to get a smooth plot?
That's typically the kind of questions I like to leave to matplotlib.
Also, I would expect that the number of required points skyrockets when the number of dimensions increases.
However, this clearly is beyond plotting, of course.
But maybe your experience with this approach would prove my concerns wrong.
Either way, thanks for teaching me something new!
Kind regards, Carsten

@aaloya

This comment has been minimized.

Copy link

@aaloya aaloya commented Oct 2, 2020

Hi Carsten,

Thank you for putting this script together, it's proven helpful for me. I am wondering if there is a (simple) way of getting a functional form of the ellipses (i.e. actual data points that if plotted would map out the trajectory of the ellipse). I am interested in extracting the direction of the major and minor axes and think if I had the data that outlines the ellipses, I could calculate these directions.

Thank you in advance,

Alvaro

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Oct 20, 2020

Hi Carsten,

Thank you for putting this script together, it's proven helpful for me. I am wondering if there is a (simple) way of getting a functional form of the ellipses (i.e. actual data points that if plotted would map out the trajectory of the ellipse). I am interested in extracting the direction of the major and minor axes and think if I had the data that outlines the ellipses, I could calculate these directions.

Thank you in advance,

Alvaro

Hi Alvaro,
It took a while for me to respond. My apologies. Looks like what you are looking for are the vertices of the ellipse. You can extract them using the code in this gist, but you can also just run the eigen-decomposition of the given covariance-matrix. The eigenvectors give you the direction of the ellipses' axes and the corresponding eigenvalues give you the square of the length along the eigenvector where the vertex is located. So the vertexes are not a big problem. However, an ellipse that is not aligned with the axes of your coordinate system is a major headache, because very few of those nice and simple ellipse-equations remain valid, in that case. The code in the gist leaves that problem to matplotlib's (great!) transform functionality. You might experiment with rotating the found vertices back and forth to find an arbitrary point on the ellipse. (Remember, if you normalize the covariance first the angle will always be 45 degrees, then rescaling will produce the right dimensions and the actual angle.) If you came to this gist directly without the corresponding article, you might want to check out the article. It explains a bit more: An Alternative Way to Plot the Covariance Ellipse
Kind regards, Carsten

@artemis-2020

This comment has been minimized.

Copy link

@artemis-2020 artemis-2020 commented Nov 24, 2020

Hi Carsten, thanks for this code.

It works great for most of my plots, however I have encountered a problem for log-log plots. I would like to draw ellipses around groups of datapoints, and for linear representation in the plot it works fine, for log-log it does not work.
I have tried to fix this by first transforming my data to logarithmic form and back (np.log10) and later convert it back. However, the data coming out of your function doesn't let me do anything to it (patches.Ellipse object).
Inside this object is something called "_path", and I somehow can't inspect it using Spyder (error: maximum recursion depth exceeded).
If this were a group of coordinates, I could perhaps iterate through them and transform them into linear again.
Do you have any pointers for me?

MWE:

`import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms
from scipy.spatial import ConvexHull
from matplotlib import rcParams

def confidence_ellipse(x, y, ax, n_std=3.0, facecolor='none', **kwargs):
if x.size != y.size:
raise ValueError("x and y must be the same size")

cov = np.cov(x, y)
pearson = cov[0, 1]/np.sqrt(cov[0, 0] * cov[1, 1])
# Using a special case to obtain the eigenvalues of this
# two-dimensionl dataset.
ell_radius_x = np.sqrt(1 + pearson)
ell_radius_y = np.sqrt(1 - pearson)
ellipse = Ellipse((0, 0), width=ell_radius_x * 2, height=ell_radius_y * 2,
                  facecolor=facecolor, **kwargs)

# Calculating the stdandard deviation of x from
# the squareroot of the variance and multiplying
# with the given number of standard deviations.
scale_x = np.sqrt(cov[0, 0]) * n_std
mean_x = np.mean(x)

# calculating the stdandard deviation of y ...
scale_y = np.sqrt(cov[1, 1]) * n_std
mean_y = np.mean(y)

transf = transforms.Affine2D() \
    .rotate_deg(45) \
    .scale(scale_x, scale_y) \
    .translate(mean_x, mean_y)

ellipse.set_transform(transf + ax.transData)
return ellipse

plt.close('all')

Data

T_cg = [0.05,0.04,0.001,0.001,0.025,0.05,0.1,0.006,0.053,0.01,0.0035,0.11,0.001,2.36,0.15,0.0044,0.04,0.12,0.12,3.6,0.5,0.001,0.04]
I_cg = [43,35,43,75,40,45,30,69,65,40,47,64,75,65,89,65,60,57,65,65,80,110,110]
L_cg = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]

fig = plt.figure()
plt.grid()
plt.title('Log-log plot')

scattercolors= ['blue','red','orange','purple','green','yellow','gray','cyan','magenta','black']
for i in range(0,len(T_cg)):
labeltext = str(L_cg[i])
xloc = I_cg[i]
yloc = T_cg[i]
sc1 = plt.scatter(xloc,yloc,marker='^',color=scattercolors[0])
plt.text(xloc,yloc,labeltext,c='black',fontsize=12)

ax = plt.gca()
ellipse = confidence_ellipse(np.array(np.log10(I_cg)), np.array(np.log10(T_cg)), ax, n_std=3.0, edgecolor=scattercolors[0], facecolor='none')
ax.add_patch(ellipse)
plt.xlabel('X')
plt.ylabel('Y')
plt.xscale('log')
plt.yscale('log')
plt.show()`

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Nov 24, 2020

Hi artemis-2020,
Thank you for your question. Covariance is a measure of how datasets are linear dependent from each other.
So I doubt that there is anything useful to plot or see when making the axes logarithmic. Even if the ellipse would fit the datapoints. (And it would be quite distorted, then).
If you want to make the resulting shape fit your datapoints, you might want to experiment with randolf-scholz's method (see above). With that method you get actual points instead of a 'shape' and you could try and shift and squeeze them into place. But again, would the result be of any use, visually?
I like your idea to do logarithms on the input data. Then you can investigate on the linear dependency between those "logged" datasets.
The still linear axes would have to be tweaked a little, in order to look like proper "log" axes, though.
But that's where it ends, in my opinion. There is no use in trying to go back to linear, somehow.

After all, the purpose of the confidence ellipse is to give an easy insight in what's "in" and what's "out". When its clear shape and symmetry is gone, you don't see anything much, anymore.
I hope that I understood your question well, an I hope even more that I am making some sense, here ;-)
Is this any help to you?

Kind regards, Carsten

@artemis-2020

This comment has been minimized.

Copy link

@artemis-2020 artemis-2020 commented Nov 24, 2020

Carsten, thanks for your reply.
I might try manually generating some shapes and squeezing them to fit.

@EmilioCMartinez

This comment has been minimized.

Copy link

@EmilioCMartinez EmilioCMartinez commented Dec 15, 2020

how would this be called from a pandas df? can x,y arguments be from columns of a pandas df that are both type:float64 ?

I cannot seem to generate a plot correctly, thank you!

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Dec 15, 2020

how would this be called from a pandas df? can x,y arguments be from columns of a pandas df that are both type:float64 ?

I cannot seem to generate a plot correctly, thank you!

Yes, pandas is wrapped all around numpy and it is easy to go back and forth, concerning data types.
For plotting, try this:

`import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
import matplotlib.transforms as transforms

# def confidence_ellipse function, here ...

# create 2D-data

k = 100
data = np.random.normal(size=(2 * k))
data.shape = (2, k)

# create some interdependency

data[1] = (data[0] + data[1])/2

# now pandas:

df = pd.DataFrame(data=data.T, columns=["x", "y"])

# and plot:

fig, ax = plt.subplots(figsize=(10, 10))
confidence_ellipse(df.x, df.y, ax, edgecolor='darkgrey')
plt.xlim((-4, 4))
plt.ylim((-4, 4))
plt.show()`

@ilkot

This comment has been minimized.

Copy link

@ilkot ilkot commented Feb 2, 2021

Hi, thanks for the implementation that helps a lot!
I wonder if there is any method to extract 'inside the ellipse' points?
I tried contains_point() method but it gives False continously even its center, I wonder why this is happening?

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Feb 2, 2021

Hi, thanks for the implementation that helps a lot!
I wonder if there is any method to extract 'inside the ellipse' points?
I tried contains_point() method but it gives False continously even its center, I wonder why this is happening?

Hi ilkot,
Are you calling methods on the ellipse object that is returned? That is outside my matplotlib knowledge. When submitting this as an example to the matplotlib docs, I was told that it is a good idea to return the ellipse object so that a user cold still operate on it. I just followed that recommendation, without ever using that feature myself.
You are probably looking for the Mahalanobis distance. It is turn-key implemented in scipy, as far as I know.

@vic1309

This comment has been minimized.

Copy link

@vic1309 vic1309 commented Feb 7, 2021

Thank you for the implementation, @CarstenSchelp. I was wondering about an easy way to obtain the ellipses border. I would like to plot it in a map object, so I have to convert the original data to the map projection.

Cheers,
V

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Feb 7, 2021

Thank you for the implementation, @CarstenSchelp. I was wondering about an easy way to obtain the ellipses border. I would like to plot it in a map object, so I have to convert the original data to the map projection.

Cheers,
V

Hi Vic,
Thanks for using it :-)
Are you thinking of creation data for the ellipse, like center x, y, radius a & b and angle of rotation so that you can plot another ellipse object in another framework (like QGis or ArcGis)? Or do you need a set of points on that ellipse?

@vic1309

This comment has been minimized.

Copy link

@vic1309 vic1309 commented Feb 8, 2021

Thank you for the quick reply, Carsten!

No, not in GIS frameworks, but rather in Basemap or Cartopy. I have a set of latitude and longitude data which I fitted the ellipse, but I would like to project it on a map now. I kind of solve it by firstly projecting my data into the map object (x, y = m(lon, lat)), then fitting the ellipse and plotting it in the same ax which the map is referenced to. Maybe not the most efficient way..

@CarstenSchelp

This comment has been minimized.

Copy link
Owner Author

@CarstenSchelp CarstenSchelp commented Feb 8, 2021

Thank you for the quick reply, Carsten!

No, not in GIS frameworks, but rather in Basemap or Cartopy. I have a set of latitude and longitude data which I fitted the ellipse, but I would like to project it on a map now. I kind of solve it by firstly projecting my data into the map object (x, y = m(lon, lat)), then fitting the ellipse and plotting it in the same ax which the map is referenced to. Maybe not the most efficient way..

Hi @vic1309
Sounds good to me, actually. Working with your framework and not against it.
Through the kwargs you can have the ellipse rendered in any way matplotlib allows.
Things tend to get kind of messy and hard to maintain when you do things like generating points and such.
So I think your approach is just fine.
Have you read the kwargs section at the end of this article?
https://matplotlib.org/gallery/statistics/confidence_ellipse.html#sphx-glr-gallery-statistics-confidence-ellipse-py

Or do I not understand your question fully?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment