{{ message }}

Instantly share code, notes, and snippets.

Last active Sep 16, 2021
[Python] Fitting plane/surface to a set of data points

Python version of the MATLAB code in this Stack Overflow post: http://stackoverflow.com/a/18648210/97160

The example shows how to determine the best-fit plane/surface (1st or higher order polynomial) over a set of three-dimensional points.

Implemented in Python + NumPy + SciPy + matplotlib.

 #!/usr/bin/evn python import numpy as np import scipy.linalg from mpl_toolkits.mplot3d import Axes3D import matplotlib.pyplot as plt # some 3-dim points mean = np.array([0.0,0.0,0.0]) cov = np.array([[1.0,-0.5,0.8], [-0.5,1.1,0.0], [0.8,0.0,1.0]]) data = np.random.multivariate_normal(mean, cov, 50) # regular grid covering the domain of the data X,Y = np.meshgrid(np.arange(-3.0, 3.0, 0.5), np.arange(-3.0, 3.0, 0.5)) XX = X.flatten() YY = Y.flatten() order = 1 # 1: linear, 2: quadratic if order == 1: # best-fit linear plane A = np.c_[data[:,0], data[:,1], np.ones(data.shape[0])] C,_,_,_ = scipy.linalg.lstsq(A, data[:,2]) # coefficients # evaluate it on grid Z = C[0]*X + C[1]*Y + C[2] # or expressed using matrix/vector product #Z = np.dot(np.c_[XX, YY, np.ones(XX.shape)], C).reshape(X.shape) elif order == 2: # best-fit quadratic curve A = np.c_[np.ones(data.shape[0]), data[:,:2], np.prod(data[:,:2], axis=1), data[:,:2]**2] C,_,_,_ = scipy.linalg.lstsq(A, data[:,2]) # evaluate it on a grid Z = np.dot(np.c_[np.ones(XX.shape), XX, YY, XX*YY, XX**2, YY**2], C).reshape(X.shape) # plot points and fitted surface fig = plt.figure() ax = fig.gca(projection='3d') ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.2) ax.scatter(data[:,0], data[:,1], data[:,2], c='r', s=50) plt.xlabel('X') plt.ylabel('Y') ax.set_zlabel('Z') ax.axis('equal') ax.axis('tight') plt.show()

### diffracteD commented Feb 23, 2015

 @amroamroamro mean = np.array([0.0,0.0,0.0]) cov = np.array([[1.0,-0.5,0.8], [-0.5,1.1,0.0], [0.8,0.0,1.0]]) data = np.random.multivariate_normal(mean, cov, 50) This part i don't understand clearly. I have x,y,z axis data stored in 3 lists named x, y and z. for example, x=[1.2, 1.3, 1.6, 2.5, 2,3, 2.8] y=[167.0, 180.3, 177.8,160.4,179.6, 154.3] z=[-0.3, -0.8, -0.75, -1.21, -1.65, -0.68] How can i imply this code with such data ? a little bit more explanation would be very helpful. thanks.

### amroamroamro commented Feb 25, 2015

 @diffracteD: For the sake of the example, I was just using random data. Obviously if you already have the points, used them instead... So replace that part with: ```x = [1.2, 1.3, 1.6, 2.5, 2.3, 2.8] y = [167.0, 180.3, 177.8,160.4,179.6, 154.3] z = [-0.3, -0.8, -0.75, -1.21, -1.65, -0.68] data = np.c_[x,y,z]``` Then `data` will be a 6x3 matrix of points (each row is a point). Keep in mind that this sort of surface-fitting works better if you have a bit more than just 6 data points. Also you'll have to adjust the range of the grid created to that of the data. Something like: ```# regular grid covering the domain of the data mn = np.min(data, axis=0) mx = np.max(data, axis=0) X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))```

### ghost commented Aug 14, 2015

 I have st_patterns,tx_patterns,optimized_matrix axis data stored in 3 lists. st_patterns =[1,2,3,4,5,6,8,10,12,14,17,20,23,26,29,32,36,40,45,51,57,64,71,78,85,91,96,106,116,122,128,141,151,160,177,192,209,224,241,256,273,288,305,320,337,352,384,416,448,480,512,544,576,608,640,672,704,736,768,800,832,864,896,928,960,992,1024,1056,1088,1120,1152,1184,1216,1248,1280,1312,1344,1376,1408,1440,1472,1504,1536,1568,1600,1632,1664,1696,1728,1760,1792,1824,1856,1888,1920,1952,1984,2016,2048,2080,2112,2144,2176,2208,2240,2272,2304,2336,2368,2400,2432,2464,2496,2528,2560,2592,2624,2656,2688,2720,2752,2784,2816,2848,2880,2912,2944,2976,3008,3040,3072,3104,3136,3168,3200,3232,3264,3296,3328,3360,3392,3424,3456,3488,3520,3552,3584,3616,3648,3680,3712,3744,3776,3808,3840,3872,3904,3936,3968,4000,4032,4064,4096,4128,4160,4192,4224,4256,4288,4320,4352,4384,4416,4448,4480,4512,4544,4576,4608,4640,4672,4704,4736,4768,4800,4832,4864,4896,4928,4940] tx_patterns =[2,3,4,5,6,7,8,10,12,14,17,20,23,26,29,32,36,40,45,51,57,64,71,78,85,91,96,100,106,116,122,128,141,151,160,177,192,200,209,224,241,256,273,288, 300,337,371, 400,434,467, 500,551, 600, 651, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200, 3300, 3400, 3500, 3600, 3700, 3800, 3900, 4000, 4100, 4200, 4300, 4400, 4500, 4600, 4700, 4800, 4900, 5000, 5100, 5200, 5300, 5400, 5500, 5600, 5700, 5800, 5900, 6000, 6100, 6200, 6300, 6400, 6500, 6600, 6700, 6800, 6900, 7000, 7100, 7200, 7300, 7400, 7500, 7600, 7700, 7800, 7900, 8000, 8100, 8200, 8300, 8400, 8500, 8600, 8700, 8800, 8900, 9000, 9100, 9200, 9300, 9400, 9500, 9600, 9700, 9800, 9900, 10000, 10100, 10200, 10300, 10400, 10500, 10600, 10700, 10800, 10900, 11000, 11100, 11200, 11300, 11400, 11500, 11600, 11700, 11800, 11842] And my final list is a matrix which is a list of list as you guess. [167][190] So how I fit my data into your script? When I put my data into your script it gives me an error : ValueError: all the input array dimensions except for the concatenation axis must match exactly

### romanrdgz commented Feb 11, 2016

 Any possibility of using order 3? How? Tried having a look at numpy's Chebysev fit implementations, but no examples are available

### patrickjwright commented Feb 12, 2016

 This is a great method, thanks! What is the easiest way to get the equation for the 2nd-order (quadratic) plane?

### patrickjwright commented Feb 15, 2016

 To answer my own question, the 2nd-order equation, using the coefficients returned in C: `Z = C[4]*X**2. + C[5]*Y**2. + C[3]*X*Y + C[1]*X + C[2]*Y + C[0]`

### vahidm commented Jul 12, 2016

 Hi, thanks for the code. Can it easily be generalized for higher dimensions and higher orders? for example fit 2D plane for any n-dimensional dataset.

### sirgogo commented Sep 19, 2016 • edited

 Hi, thanks for the post. Just to clarify, you are stuffing all the errors into the Z variable, correct? This might be helpful to state, because otherwise the term "best-fit plane/surface" is misleading.

### reox commented Nov 11, 2016 • edited

 @romanrdgz: sure! I think this is how it should work: ```import numpy as np from itertools import combinations import scipy.linalg x = [1.2, 1.3, 1.6, 2.5, 2.3, 2.8] y = [167.0, 180.3, 177.8, 160.4, 179.6, 154.3] z = [-0.3, -0.8, -0.75, -1.21, -1.65, -0.68] f = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]).transpose() G = np.c_[x, y, z] A = np.concatenate((G, np.ones((G.shape[0],1))), axis=1) C, _, _, _ = scipy.linalg.lstsq(A, f) # C will have now the coefficients for: # f(x, y, z) = ax + by + cz + d # quadratic eq. dim = G.shape[1] A = np.concatenate((G**2, np.array([np.prod(G[:, k], axis=1) for k in combinations(range(dim), dim-1)]).transpose(), G, np.ones((G.shape[0], 1))), axis=1) C, _, _, _ = scipy.linalg.lstsq(A, f) # C will have now the coefficients for: # f(x, y, z) = ax**2 + by**2 + cz**2 + dxy+ exz + fyz + gx + hy + iz + j # This can be used then: def quadratic(a): dim = a.shape[0] A = np.concatenate((a**2, np.array([np.prod(a[k,]) for k in combinations(range(dim), dim-1)]), a, [1])) return np.sum(np.dot(A, C)) for i in range(G.shape[0]): print(quadratic(G[i,:]), f[i])``` You can easily extend to `n` dimensions.

### 8556732 commented Mar 1, 2017

 Hey, python newb here but learning fast. First of all thanks for the code and method, I've adapted some of it to my PhD work. I was wondering how you would go about projecting the contours of the resultant surface onto a 2D plot. Normally I would do this by plotting polylines for various values of X, Y and Z. Is there a simpler way of just projecting the surface onto one of the axes in 2d?

### alessandropfn commented Mar 31, 2017

 Everything works, thanks! Now, what I have to do if I want to get the equation of the plane?

### 8556732 commented May 5, 2017

 In my mind, to get the equation of the plane one just uses the coefficients to evaluate the original equation used to plot the surface. So in the case of the linear plot, the form is: ``````Z = C[0]*X + C[1]*Y + C[2] `````` With your A, B and C constants respectively from the line "print C", wherein C was defined as the constants for the surface in the function

### keerthanamanivannan commented Nov 3, 2017

 @sirgogo what do you mean stuffing all the errors in Z variable? Can you explain a little bit please?

### wiki-yu commented Feb 25, 2018

 Thanks for the code. By using the linear part, what we got is minimizing the vertical offsets, not perpendicular offsets. I am wondering do you have any idea how to do implementation for perpendicular offsets?

### sjboege commented May 15, 2018 • edited

 Note that if you want to adapt this for your own data put x values in data[:, 0], y values in data[:, 1] and z values in data[:, 2]. For example, ZZ values can be calculated using C[0]=0.0200290632278191 C[1] = 0.0500726580695476 C[2] = 10.0145316139095 for XX and YY. X,Y = np.meshgrid(np.arange(2, 11, 2), np.arange(1, 6, 1)) XX = X.flatten() YY = Y.flatten() ZZ = np.array([-10.1046624, -10.14472052, -10.18477865, -10.22483678, -10.2648949, -10.15473506, -10.19479318, -10.23485131, -10.27490944, -10.31496756, -10.20480771, -10.24486584, -10.28492397, -10.32498209, -10.36504022, -10.25488037, -10.2949385, -10.33499663, -10.37505475, -10.41511288, -10.30495303, -10.34501116, -10.38506928, -10.42512741, -10.46518554]) data = np.column_stack((XX, YY, ZZ)) The first order fit returns the correct results.

### DennisScuba commented Jun 28, 2018

 Hi, thank you very much for the code, it's very helpful! Just I have the problem, that I have to use a function like the following: h= acos(wt) + bsin(wt) + ct² + dt + e where a,b,c,d,e and w is unknown. How can I apply this to the A matrix? Should I apply a taylor polynom and suggest values for frequency w? Thank you very much for your help!

### madgrizzle commented Dec 3, 2018

 How would you fit to 3rd degree (cubic) equation?

### ramiromena commented Jan 11, 2019 • edited

 Hi all, First of all, thanks for the code. It is really useful. As @madgrizzle stated before, I am trying to implement the cubic option. For the quadratic function, the code generates 6 coefficients ( `XX, YY, XX*YY, XX**2, YY**2` and the constant) Then, for the cubic function, 8 coefficients are expected. However, if I use the Pascal's triangle, 10 coefficients should be required (the 6 of the quadratic case + `XX**2*YY, XX*YY**2, XX**3, YY**3`) Then, there are two terms that are dropped and I do not why. I tried to read the documentation for scipy.linalg.lstsq, but I couldn't find any explanation. Any suggestion or reference will be appreciated. Thanks in advance.

### PatrickTanta commented Jan 17, 2019

 To answer my own question, the 2nd-order equation, using the coefficients returned in C: `Z = C[4]*X**2. + C[5]*Y**2. + C[3]*X*Y + C[1]*X + C[2]*Y + C[0]` did you solve the problem ? i have the same dude u.u

### Lukxl commented Jan 16, 2020

 Thanks for this good code. I do have one problem, I don't see the plane when I run this code. I only get my data points. Any Idea?

### amroamroamro commented Jan 16, 2020

 @Lukxl maybe you need to adjust the stride/count params in the surface plot function to fit your data range: ``````ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.2, linewidth=0.5, edgecolor='b') `````` Refer to docs:

### Lukxl commented Jan 16, 2020 • edited

 Thank you for this idea. I'll check and try this Edit: Still I don't see the surface. I only get one line in my 3d plot. Is it possible that the fitted surface results in a 2D line?

### amroamroamro commented Jan 16, 2020

 @Lukxl I'm assuming you adjusted the grid in the code to cover your actual data range? ```# regular grid covering the domain of the data mn = np.min(data, axis=0) mx = np.max(data, axis=0) X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20))``` You should also make sure that the data is well-behaved (i.e look at the scatter plot) and that the fit didn't result in a degenerate model (manually inspect the coefficient values in `C`)

### Lukxl commented Jan 17, 2020

 I did. I figured out how I can do it. I needed another fit function as you used. Thanks for the well support.

### MSchmidt99 commented Mar 8, 2020 • edited

 @madgrizzle I took the time to linearize this for up to the nth degree over the afternoon. You will want to make sure that all your data has 64 bit datatype (ex: np.int64(x) or np.float64(z)) because at higher degree calculations things will begin to break when overflow warnings come up. This is what I threw together: ``````# functions # in: [meshgrid], [meshgrid], [np list of coordinate pair np lists. ex: [[x1,y1,z1], [x2,y2,z2], etc.] ], [degree] # out: [Z] def curve(X, Y, coord, n): XX = X.flatten() YY = Y.flatten() # best-fit curve A = XYchooseN(coord[:,0], coord[:,1], n) C,_,_,_ = scipy.linalg.lstsq(A, coord[:,2]) print("Calculating C in curve() done") # evaluate it on a grid Z = np.dot(XYchooseN(XX, YY, n), C).reshape(X.shape) return Z # in: [array], [array], [int] # out: sum from k=0 to k=n of n choose k for x^n-k * y^k (coefficients ignored) def XYchooseN(x,y,n): XYchooseN = [] n = n+1 for j in range(len(x)): I = x[j] J = y[j] matrix = [] Is = [] Js = [] for i in range(0,n): Is.append(I**i) Js.append(J**i) matrix.append(np.concatenate((np.ones(n-i),np.zeros(i)))) Is = np.array(Is) Js = np.array(Js)[np.newaxis] IsJs0s = matrix * Is * Js.T IsJs = [] for i in range(0,n): IsJs = np.concatenate((IsJs,IsJs0s[i,:n-i])) XYchooseN.append(IsJs) return np.array(XYchooseN) X,Y = np.meshgrid(x_data,y_data) # surface meshgrid Z = curve(X, Y, coord, 3) # todo: cmap by avg from (x1,y1) to (x2,y2) of |Z height - scatter height| ax.plot_surface(X, Y, Z, rstride=1, cstride=1, alpha=0.5, color='r') ``````

### 13Ducks commented Mar 17, 2020

 @MSchmidt99 Thanks for the code. I am confused about what the "coord" variable represents. Thanks.

### MSchmidt99 commented Mar 18, 2020 • edited

 @13Ducks Sorry about that, in my comment I made it look like a tuple. coord is just a numpy array of shape (n, 3) where n is the number of coordinates, and 3 is the graph dimensions with index 0 as X, 1 as Y, and 2 as Z. I have edited the function description in my earlier comment to make it less misleading.

### karlmolina commented Apr 2, 2020

 @MSchmidt99 What is x_data and y_data?

### GrljAles commented Apr 9, 2020

 Would it be possible to use this to detrend my data, and how would I approach this?

### EmanuelNazareth commented May 2, 2020 • edited

 Excellent code, very handy. For those interested in R2 and R2 adjusted here's the code: from sklearn.metrics import r2_score z_pred= C[0]*np.array(x) + C[1]*np.array(y) + C[2] #print R2 r2_score(z, z_pred) def Adj_r2(Rsquared,n,p): #n = number of data points, p = independent variables output = 1-(1-Rsquared)*(n-1)/(n-p-1) return output #print R2_adjusted Adj_r2(r2_score(z, z_calc),102,3)

### gabrilmf92 commented Jul 9, 2020 • edited

 Hi everyone, Dear @amroamroamro, In relation to your original code, I have x,y,z-axis data stored in 3 lists named x, y and z. The structure of my table data is the next: 0 | 2 | 4 13.7 | 4436.95646 | 4774.21409 14.3 | 4311.55363 | 4688.34413 14.8 | 4299.82329 | 4687.26375 15.3 | 4164.97164 | 4587.78392 where the values of 0 column are the values of x axis, 2 and 4 the values of y axis and the rest of values the z axis I would like to get an equation like this: a·x^2·y^2 +b·x^2·y + c·x^2 + d·x·y^2 + e·x·y +f·x + g·y^2 + h·y + i How can I modify the code to get the equation? A little bit more explanation would be very helpful. Thanks.

### amroamroamro commented Jul 9, 2020

 @gabrilmf92 I believe I already answered something similar, see above comments: ```# your 3-dim data x = [13.7, 14.3, 14.8, 15.3] y = [4436.95646, 4311.55363, 4299.82329, 4164.97164] z = [4774.21409, 4688.34413, 4687.26375, 4587.78392] data = np.c_[x,y,z] # regular grid covering the domain of the data mn = np.min(data, axis=0) mx = np.max(data, axis=0) X,Y = np.meshgrid(np.linspace(mn[0], mx[0], 20), np.linspace(mn[1], mx[1], 20)) XX = X.flatten() YY = Y.flatten() order = 2 # quadratic fit``` PS: your data consists of merely 4 points looking almost perfectly co-planar, so a linear fit would suffice.

### dnoon20 commented Aug 23, 2020

 How can I modify my surface so that the bounds are from a polygonal domain (in that instead of a standard x/y domain, the area within the 2d polygon represents the limits of the surface)?

### alexlib commented Aug 27, 2020 • edited

 Here is a cubic version ``````elif order == 3: # M = [ones(size(x)), x, y, x.^2, x.*y, y.^2, x.^3, x.^2.*y, x.*y.^2, y.^3] A = np.c_[np.ones(data.shape[0]), data[:,:2], data[:,0]**2, np.prod(data[:,:2], axis=1), \ data[:,1]**2, data[:,0]**3, np.prod(np.c_[data[:,0]**2,data[:,1]],axis=1), \ np.prod(np.c_[data[:,0],data[:,1]**2],axis=1), data[:,2]**3] C,_,_,_ = scipy.linalg.lstsq(A, data[:,2]) Z = np.dot(np.c_[np.ones(XX.shape), XX, YY, XX**2, XX*YY, YY**2, XX**3, XX**2*YY, XX*YY**2, YY**3], C).reshape(X.shape) ``````

### mirgilani commented Sep 7, 2020 • edited

 Dear @alexlib Thanks for code for cubic oder. I have print the coefficient . unfortunately I cannot understand which coefficient belong to which one. If I am not mistaken : print(c) give coefficient like : c[6]x^3 + c[9]y^3 + c[7]x^2 y + c[8] x y^2 + c[3] x^2 + c[5] y^2 + c[4] xy + c[1]x + c[2] y + c[0]. but the result is really odd. I will appriciate for any help

### alexlib commented Sep 16, 2020

 Dear @alexlib Thanks for code for cubic oder. I have print the coefficient . unfortunately I cannot understand which coefficient belong to which one. If I am not mistaken : print(c) give coefficient like : c[6]x^3 + c[9]y^3 + c[7]x^2 y + c[8] x y^2 + c[3] x^2 + c[5] y^2 + c[4] xy + c[1]x + c[2] y + c[0]. but the result is really odd. I will appriciate for any help please see the fork https://gist.github.com/alexlib/05abf4e42a0341047b1da12e69c2f3a3

### PhuDoTuong commented Jun 30, 2021

 Does anyone try to use Ceres Solver ?

### lukaszprawdziwy commented Sep 16, 2021 • edited

 Hello, Thank you for the handy code. Is there a way to fit a surface with an exponential function? I think of something like: c[0] + c[1] * x + c[2] * y+ c[3] * np.exp(c[4] * x) + c[5] * (np.exp(c[6] * y) how to feed it to scipy.linalg.lstsq? thanks