Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
[Python] Fitting plane/surface to a set of data points

Python version of the MATLAB code in this Stack Overflow post: http://stackoverflow.com/a/18648210/97160

The example shows how to determine the best-fit plane/surface (1st or higher order polynomial) over a set of three-dimensional points.

Implemented in Python + NumPy + SciPy + matplotlib.

quadratic_surface

#!/usr/bin/evn python
import numpy as np
import scipy.linalg
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
# some 3-dim points
mean = np.array([0.0,0.0,0.0])
cov = np.array([[1.0,-0.5,0.8], [-0.5,1.1,0.0], [0.8,0.0,1.0]])
data = np.random.multivariate_normal(mean, cov, 50)
# regular grid covering the domain of the data
X,Y = np.meshgrid(np.arange(-3.0, 3.0, 0.5), np.arange(-3.0, 3.0, 0.5))
XX = X.flatten()
YY = Y.flatten()
order = 1 # 1: linear, 2: quadratic
if order == 1:
# best-fit linear plane
A = np.c_[data[:,0], data[:,1], np.ones(data.shape[0])]
C,_,_,_ = scipy.linalg.lstsq(A, data[:,2]) # coefficients
# evaluate it on grid
Z = C[0]*X + C[1]*Y + C[2]
# or expressed using matrix/vector product
#Z = np.dot(np.c_[XX, YY, np.ones(XX.shape)], C).reshape(X.shape)
elif order == 2:
# best-fit quadratic curve
A = np.c_[np.ones(data.shape[0]), data[:,:2], np.prod(data[:,:2], axis=1), data[:,:2]**2]
C,_,_,_ = scipy.linalg.lstsq(A, data[:,2])
# evaluate it on a grid
Z = np.dot(np.c_[np.ones(XX.shape), XX, YY, XX*YY, XX**2, YY**2], C).reshape(X.shape)
# plot points and fitted surface
fig = plt.figure()
ax = fig.gca(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.2)
ax.scatter(data[:,0], data[:,1], data[:,2], c='r', s=50)
plt.xlabel('X')
plt.ylabel('Y')
ax.set_zlabel('Z')
ax.axis('equal')
ax.axis('tight')
plt.show()
@diffracteD

This comment has been minimized.

Copy link

@diffracteD diffracteD commented Feb 23, 2015

@amroamroamro
mean = np.array([0.0,0.0,0.0])
cov = np.array([[1.0,-0.5,0.8], [-0.5,1.1,0.0], [0.8,0.0,1.0]])
data = np.random.multivariate_normal(mean, cov, 50)

This part i don't understand clearly.
I have x,y,z axis data stored in 3 lists named x, y and z.
for example, x=[1.2, 1.3, 1.6, 2.5, 2,3, 2.8]
y=[167.0, 180.3, 177.8,160.4,179.6, 154.3]
z=[-0.3, -0.8, -0.75, -1.21, -1.65, -0.68]
How can i imply this code with such data ? a little bit more explanation would be very helpful.
thanks.

@amroamroamro

This comment has been minimized.

Copy link
Owner Author

@amroamroamro amroamroamro commented Feb 25, 2015

@diffracteD:
For the sake of the example, I was just using random data.
Obviously if you already have the points, used them instead...

So replace that part with:

x = [1.2, 1.3, 1.6, 2.5, 2.3, 2.8]
y = [167.0, 180.3, 177.8,160.4,179.6, 154.3]
z = [-0.3, -0.8, -0.75, -1.21, -1.65, -0.68]
data = np.c_[x,y,z]

Then data will be a 6x3 matrix of points (each row is a point). Keep in mind that this sort of surface-fitting works better if you have a bit more than just 6 data points.

Also you'll have to adjust the range of the grid created to that of the data. Something like:

# regular grid covering the domain of the data
mn = np.min(data, axis=0)
mx = np.max(data, axis=0)
X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
@ghost

This comment has been minimized.

Copy link

@ghost ghost commented Aug 14, 2015

I have st_patterns,tx_patterns,optimized_matrix axis data stored in 3 lists.
st_patterns =[1,2,3,4,5,6,8,10,12,14,17,20,23,26,29,32,36,40,45,51,57,64,71,78,85,91,96,106,116,122,128,141,151,160,177,192,209,224,241,256,273,288,305,320,337,352,384,416,448,480,512,544,576,608,640,672,704,736,768,800,832,864,896,928,960,992,1024,1056,1088,1120,1152,1184,1216,1248,1280,1312,1344,1376,1408,1440,1472,1504,1536,1568,1600,1632,1664,1696,1728,1760,1792,1824,1856,1888,1920,1952,1984,2016,2048,2080,2112,2144,2176,2208,2240,2272,2304,2336,2368,2400,2432,2464,2496,2528,2560,2592,2624,2656,2688,2720,2752,2784,2816,2848,2880,2912,2944,2976,3008,3040,3072,3104,3136,3168,3200,3232,3264,3296,3328,3360,3392,3424,3456,3488,3520,3552,3584,3616,3648,3680,3712,3744,3776,3808,3840,3872,3904,3936,3968,4000,4032,4064,4096,4128,4160,4192,4224,4256,4288,4320,4352,4384,4416,4448,4480,4512,4544,4576,4608,4640,4672,4704,4736,4768,4800,4832,4864,4896,4928,4940]
tx_patterns =[2,3,4,5,6,7,8,10,12,14,17,20,23,26,29,32,36,40,45,51,57,64,71,78,85,91,96,100,106,116,122,128,141,151,160,177,192,200,209,224,241,256,273,288, 300,337,371, 400,434,467, 500,551, 600, 651, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 3900, 4000, 4100, 4200, 4300, 4400, 4500, 4600, 4700, 4800, 4900, 5000, 5100, 5200, 5300, 5400, 5500, 5600, 5700, 5800, 5900, 6000, 6100, 6200, 6300, 6400, 6500, 6600, 6700, 6800, 6900, 7000, 7100, 7200, 7300, 7400, 7500, 7600, 7700, 7800, 7900, 8000, 8100, 8200, 8300, 8400, 8500, 8600, 8700, 8800, 8900, 9000, 9100, 9200, 9300, 9400, 9500, 9600, 9700, 9800, 9900, 10000, 10100, 10200, 10300, 10400, 10500, 10600, 10700, 10800, 10900, 11000, 11100, 11200, 11300, 11400, 11500, 11600, 11700, 11800, 11842]
And my final list is a matrix which is a list of list as you guess. [167][190]
So how I fit my data into your script?
When I put my data into your script it gives me an error : ValueError: all the input array dimensions except for the concatenation axis must match exactly

@romanrdgz

This comment has been minimized.

Copy link

@romanrdgz romanrdgz commented Feb 11, 2016

Any possibility of using order 3? How? Tried having a look at numpy's Chebysev fit implementations, but no examples are available

@patrickjwright

This comment has been minimized.

Copy link

@patrickjwright patrickjwright commented Feb 12, 2016

This is a great method, thanks! What is the easiest way to get the equation for the 2nd-order (quadratic) plane?

@patrickjwright

This comment has been minimized.

Copy link

@patrickjwright patrickjwright commented Feb 15, 2016

To answer my own question, the 2nd-order equation, using the coefficients returned in C:
Z = C[4]*X**2. + C[5]*Y**2. + C[3]*X*Y + C[1]*X + C[2]*Y + C[0]

@vahidm

This comment has been minimized.

Copy link

@vahidm vahidm commented Jul 12, 2016

Hi, thanks for the code.

Can it easily be generalized for higher dimensions and higher orders? for example fit 2D plane for any n-dimensional dataset.

@sirgogo

This comment has been minimized.

Copy link

@sirgogo sirgogo commented Sep 19, 2016

Hi, thanks for the post. Just to clarify, you are stuffing all the errors into the Z variable, correct? This might be helpful to state, because otherwise the term "best-fit plane/surface" is misleading.

@reox

This comment has been minimized.

Copy link

@reox reox commented Nov 11, 2016

@romanrdgz: sure! I think this is how it should work:

import numpy as np
from itertools import combinations
import scipy.linalg
x = [1.2, 1.3, 1.6, 2.5, 2.3, 2.8]
y = [167.0, 180.3, 177.8, 160.4, 179.6, 154.3]
z = [-0.3, -0.8, -0.75, -1.21, -1.65, -0.68]
f = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]).transpose()

G = np.c_[x, y, z]

A = np.concatenate((G, np.ones((G.shape[0],1))), axis=1)
C, _, _, _ = scipy.linalg.lstsq(A, f)
# C will have now the coefficients for:
# f(x, y, z) = ax + by + cz + d

# quadratic eq.
dim = G.shape[1]
A = np.concatenate((G**2, np.array([np.prod(G[:, k], axis=1) for k in combinations(range(dim), dim-1)]).transpose(), G, np.ones((G.shape[0], 1))), axis=1)
C, _, _, _ = scipy.linalg.lstsq(A, f)
# C will have now the coefficients for:
# f(x, y, z) = ax**2 + by**2 + cz**2 + dxy+ exz + fyz + gx + hy + iz + j

# This can be used then:
def quadratic(a):
    dim = a.shape[0]
    A = np.concatenate((a**2, np.array([np.prod(a[k,]) for k in combinations(range(dim), dim-1)]), a, [1]))
    return np.sum(np.dot(A, C))

for i in range(G.shape[0]):
    print(quadratic(G[i,:]), f[i])

You can easily extend to n dimensions.

@8556732

This comment has been minimized.

Copy link

@8556732 8556732 commented Mar 1, 2017

Hey, python newb here but learning fast. First of all thanks for the code and method, I've adapted some of it to my PhD work.

I was wondering how you would go about projecting the contours of the resultant surface onto a 2D plot. Normally I would do this by plotting polylines for various values of X, Y and Z. Is there a simpler way of just projecting the surface onto one of the axes in 2d?

@alessandropfn

This comment has been minimized.

Copy link

@alessandropfn alessandropfn commented Mar 31, 2017

Everything works, thanks! Now, what I have to do if I want to get the equation of the plane?

@8556732

This comment has been minimized.

Copy link

@8556732 8556732 commented May 5, 2017

In my mind, to get the equation of the plane one just uses the coefficients to evaluate the original equation used to plot the surface. So in the case of the linear plot, the form is:

Z = C[0]*X + C[1]*Y + C[2] 

With your A, B and C constants respectively from the line "print C", wherein C was defined as the constants for the surface in the function

@keerthanamanivannan

This comment has been minimized.

Copy link

@keerthanamanivannan keerthanamanivannan commented Nov 3, 2017

@sirgogo what do you mean stuffing all the errors in Z variable? Can you explain a little bit please?

@wiki-yu

This comment has been minimized.

Copy link

@wiki-yu wiki-yu commented Feb 25, 2018

Thanks for the code. By using the linear part, what we got is minimizing the vertical offsets, not perpendicular offsets. I am wondering do you have any idea how to do implementation for perpendicular offsets?

@sjboege

This comment has been minimized.

Copy link

@sjboege sjboege commented May 15, 2018

Note that if you want to adapt this for your own data put x values in data[:, 0], y values in data[:, 1] and z values in data[:, 2].
For example, ZZ values can be calculated using
C[0]=0.0200290632278191
C[1] = 0.0500726580695476
C[2] = 10.0145316139095
for XX and YY.
X,Y = np.meshgrid(np.arange(2, 11, 2), np.arange(1, 6, 1))
XX = X.flatten()
YY = Y.flatten()
ZZ = np.array([-10.1046624, -10.14472052, -10.18477865, -10.22483678, -10.2648949,
-10.15473506, -10.19479318, -10.23485131, -10.27490944, -10.31496756,
-10.20480771, -10.24486584, -10.28492397, -10.32498209, -10.36504022,
-10.25488037, -10.2949385, -10.33499663, -10.37505475, -10.41511288,
-10.30495303, -10.34501116, -10.38506928, -10.42512741, -10.46518554])
data = np.column_stack((XX, YY, ZZ))

The first order fit returns the correct results.

@DennisScuba

This comment has been minimized.

Copy link

@DennisScuba DennisScuba commented Jun 28, 2018

Hi, thank you very much for the code, it's very helpful! Just I have the problem, that I have to use a function like the following:
h= acos(wt) + bsin(wt) + ct² + dt + e
where a,b,c,d,e and w is unknown. How can I apply this to the A matrix? Should I apply a taylor polynom and suggest values for frequency w?
Thank you very much for your help!

@madgrizzle

This comment has been minimized.

Copy link

@madgrizzle madgrizzle commented Dec 3, 2018

How would you fit to 3rd degree (cubic) equation?

@ramiromena

This comment has been minimized.

Copy link

@ramiromena ramiromena commented Jan 11, 2019

Hi all,
First of all, thanks for the code. It is really useful.
As @madgrizzle stated before, I am trying to implement the cubic option.
For the quadratic function, the code generates 6 coefficients ( XX, YY, XX*YY, XX**2, YY**2 and the constant)
Then, for the cubic function, 8 coefficients are expected.
However, if I use the Pascal's triangle, 10 coefficients should be required (the 6 of the quadratic case + XX**2*YY, XX*YY**2, XX**3, YY**3)
Then, there are two terms that are dropped and I do not why.
I tried to read the documentation for scipy.linalg.lstsq, but I couldn't find any explanation.
Any suggestion or reference will be appreciated.
Thanks in advance.

@PatrickTanta

This comment has been minimized.

Copy link

@PatrickTanta PatrickTanta commented Jan 17, 2019

To answer my own question, the 2nd-order equation, using the coefficients returned in C:
Z = C[4]*X**2. + C[5]*Y**2. + C[3]*X*Y + C[1]*X + C[2]*Y + C[0]

did you solve the problem ? i have the same dude u.u

@Lukxl

This comment has been minimized.

Copy link

@Lukxl Lukxl commented Jan 16, 2020

Thanks for this good code.
I do have one problem, I don't see the plane when I run this code. I only get my data points. Any Idea?

@amroamroamro

This comment has been minimized.

Copy link
Owner Author

@amroamroamro amroamroamro commented Jan 16, 2020

@Lukxl

maybe you need to adjust the stride/count params in the surface plot function to fit your data range:

ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.2, linewidth=0.5, edgecolor='b')

Refer to docs:

@Lukxl

This comment has been minimized.

Copy link

@Lukxl Lukxl commented Jan 16, 2020

Thank you for this idea. I'll check and try this

Edit: Still I don't see the surface. I only get one line in my 3d plot.

Is it possible that the fitted surface results in a 2D line?

@amroamroamro

This comment has been minimized.

Copy link
Owner Author

@amroamroamro amroamroamro commented Jan 16, 2020

@Lukxl

I'm assuming you adjusted the grid in the code to cover your actual data range?

# regular grid covering the domain of the data
mn = np.min(data, axis=0)
mx = np.max(data, axis=0)
X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))

You should also make sure that the data is well-behaved (i.e look at the scatter plot) and that the fit didn't result in a degenerate model (manually inspect the coefficient values in C)

@Lukxl

This comment has been minimized.

Copy link

@Lukxl Lukxl commented Jan 17, 2020

I did. I figured out how I can do it.
I needed another fit function as you used.
Thanks for the well support.

@MSchmidt99

This comment has been minimized.

Copy link

@MSchmidt99 MSchmidt99 commented Mar 8, 2020

@madgrizzle

I took the time to linearize this for up to the nth degree over the afternoon. You will want to make sure that all your data has 64 bit datatype (ex: np.int64(x) or np.float64(z)) because at higher degree calculations things will begin to break when overflow warnings come up.
This is what I threw together:

# functions

# in: [meshgrid], [meshgrid], [np list of coordinate pair np lists. ex: [[x1,y1,z1], [x2,y2,z2], etc.] ], [degree]
# out: [Z]
def curve(X, Y, coord, n):
    XX = X.flatten()
    YY = Y.flatten()

    # best-fit curve
    A = XYchooseN(coord[:,0], coord[:,1], n)
    C,_,_,_ = scipy.linalg.lstsq(A, coord[:,2])
    print("Calculating C in curve() done")
    # evaluate it on a grid
    Z = np.dot(XYchooseN(XX, YY, n), C).reshape(X.shape)
    return Z

# in: [array], [array], [int]
# out: sum from k=0 to k=n of n choose k for x^n-k * y^k (coefficients ignored)
def XYchooseN(x,y,n):
    XYchooseN = []
    n = n+1
    for j in range(len(x)):
        I = x[j]
        J = y[j]
        matrix = []
        Is = []
        Js = []
        for i in range(0,n):
            Is.append(I**i)
            Js.append(J**i)
            matrix.append(np.concatenate((np.ones(n-i),np.zeros(i))))
        Is = np.array(Is)
        Js = np.array(Js)[np.newaxis]
        IsJs0s = matrix * Is * Js.T
        IsJs = []
        for i in range(0,n):
            IsJs = np.concatenate((IsJs,IsJs0s[i,:n-i]))
        XYchooseN.append(IsJs)
    return np.array(XYchooseN)

X,Y = np.meshgrid(x_data,y_data) # surface meshgrid

Z = curve(X, Y, coord, 3)

# todo: cmap by avg from (x1,y1) to (x2,y2) of |Z height - scatter height|

ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.5, color='r')
@13Ducks

This comment has been minimized.

Copy link

@13Ducks 13Ducks commented Mar 17, 2020

@MSchmidt99

Thanks for the code. I am confused about what the "coord" variable represents.

Thanks.

@MSchmidt99

This comment has been minimized.

Copy link

@MSchmidt99 MSchmidt99 commented Mar 18, 2020

@13Ducks
Sorry about that, in my comment I made it look like a tuple. coord is just a numpy array of shape (n, 3) where n is the number of coordinates, and 3 is the graph dimensions with index 0 as X, 1 as Y, and 2 as Z. I have edited the function description in my earlier comment to make it less misleading.

@karlmolina

This comment has been minimized.

Copy link

@karlmolina karlmolina commented Apr 2, 2020

@MSchmidt99

What is x_data and y_data?

@GrljAles

This comment has been minimized.

Copy link

@GrljAles GrljAles commented Apr 9, 2020

Would it be possible to use this to detrend my data, and how would I approach this?

@EmanuelNazareth

This comment has been minimized.

Copy link

@EmanuelNazareth EmanuelNazareth commented May 2, 2020

Excellent code, very handy. For those interested in R2 and R2 adjusted here's the code:

from sklearn.metrics import r2_score

z_pred= C[0]*np.array(x) + C[1]*np.array(y) + C[2]

#print R2
r2_score(z, z_pred)

def Adj_r2(Rsquared,n,p):
#n = number of data points, p = independent variables
output = 1-(1-Rsquared)*(n-1)/(n-p-1)
return output

#print R2_adjusted
Adj_r2(r2_score(z, z_calc),102,3)

@gabrilmf92

This comment has been minimized.

Copy link

@gabrilmf92 gabrilmf92 commented Jul 9, 2020

Hi everyone,

Dear @amroamroamro,
In relation to your original code, I have x,y,z-axis data stored in 3 lists named x, y and z. The structure of my table data is the next:

0 | 2 | 4
13.7 | 4436.95646 | 4774.21409
14.3 | 4311.55363 | 4688.34413
14.8 | 4299.82329 | 4687.26375
15.3 | 4164.97164 | 4587.78392
where the values of 0 column are the values of x axis, 2 and 4 the values of y axis and the rest of values the z axis

I would like to get an equation like this:
a·x^2·y^2 +b·x^2·y + c·x^2 + d·x·y^2 + e·x·y +f·x + g·y^2 + h·y + i
How can I modify the code to get the equation?
A little bit more explanation would be very helpful.
Thanks.

@amroamroamro

This comment has been minimized.

Copy link
Owner Author

@amroamroamro amroamroamro commented Jul 9, 2020

@gabrilmf92

I believe I already answered something similar, see above comments:

# your 3-dim data
x = [13.7, 14.3, 14.8, 15.3]
y = [4436.95646, 4311.55363, 4299.82329, 4164.97164]
z = [4774.21409, 4688.34413, 4687.26375, 4587.78392]
data = np.c_[x,y,z]

# regular grid covering the domain of the data
mn = np.min(data, axis=0)
mx = np.max(data, axis=0)
X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))
XX = X.flatten()
YY = Y.flatten()

order = 2  # quadratic fit

PS: your data consists of merely 4 points looking almost perfectly co-planar, so a linear fit would suffice.

@dnoon20

This comment has been minimized.

Copy link

@dnoon20 dnoon20 commented Aug 23, 2020

How can I modify my surface so that the bounds are from a polygonal domain (in that instead of a standard x/y domain, the area within the 2d polygon represents the limits of the surface)?

@alexlib

This comment has been minimized.

Copy link

@alexlib alexlib commented Aug 27, 2020

Here is a cubic version

elif order == 3:
    # M = [ones(size(x)), x, y, x.^2, x.*y, y.^2, x.^3, x.^2.*y, x.*y.^2, y.^3]
    A = np.c_[np.ones(data.shape[0]), data[:,:2], data[:,0]**2, np.prod(data[:,:2], axis=1), \
              data[:,1]**2, data[:,0]**3, np.prod(np.c_[data[:,0]**2,data[:,1]],axis=1), \
              np.prod(np.c_[data[:,0],data[:,1]**2],axis=1), data[:,2]**3]
    C,_,_,_ = scipy.linalg.lstsq(A, data[:,2])
    
    Z = np.dot(np.c_[np.ones(XX.shape), XX, YY, XX**2, XX*YY, YY**2, XX**3, XX**2*YY, XX*YY**2, YY**3], C).reshape(X.shape)

@mirgilani

This comment has been minimized.

Copy link

@mirgilani mirgilani commented Sep 7, 2020

Dear @alexlib
Thanks for code for cubic oder. I have print the coefficient . unfortunately I cannot understand which coefficient belong to which one.
If I am not mistaken : print(c) give coefficient like : c[6]x^3 + c[9]y^3 + c[7]x^2 y + c[8] x y^2 + c[3] x^2 + c[5] y^2 + c[4] xy + c[1]x + c[2] y + c[0]. but the result is really odd.
I will appriciate for any help

@alexlib

This comment has been minimized.

Copy link

@alexlib alexlib commented Sep 16, 2020

Dear @alexlib
Thanks for code for cubic oder. I have print the coefficient . unfortunately I cannot understand which coefficient belong to which one.
If I am not mistaken : print(c) give coefficient like : c[6]x^3 + c[9]y^3 + c[7]x^2 y + c[8] x y^2 + c[3] x^2 + c[5] y^2 + c[4] xy + c[1]x + c[2] y + c[0]. but the result is really odd.
I will appriciate for any help

please see the fork
https://gist.github.com/alexlib/05abf4e42a0341047b1da12e69c2f3a3

@PhuDoTuong

This comment has been minimized.

Copy link

@PhuDoTuong PhuDoTuong commented Jun 30, 2021

Does anyone try to use Ceres Solver ?

@lukaszprawdziwy

This comment has been minimized.

Copy link

@lukaszprawdziwy lukaszprawdziwy commented Sep 16, 2021

Hello,
Thank you for the handy code.
Is there a way to fit a surface with an exponential function?
I think of something like:
c[0] + c[1] * x + c[2] * y+ c[3] * np.exp(c[4] * x) + c[5] * (np.exp(c[6] * y)

how to feed it to scipy.linalg.lstsq?

thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment