{{ message }}

Instantly share code, notes, and snippets.

# agramfort/lowess.py

Last active Oct 16, 2020
LOWESS : Locally weighted regression
 """ This module implements the Lowess function for nonparametric regression. Functions: lowess Fit a smooth nonparametric regression curve to a scatterplot. For more information, see William S. Cleveland: "Robust locally weighted regression and smoothing scatterplots", Journal of the American Statistical Association, December 1979, volume 74, number 368, pp. 829-836. William S. Cleveland and Susan J. Devlin: "Locally weighted regression: An approach to regression analysis by local fitting", Journal of the American Statistical Association, September 1988, volume 83, number 403, pp. 596-610. """ # Authors: Alexandre Gramfort # # License: BSD (3-clause) from math import ceil import numpy as np from scipy import linalg def lowess(x, y, f=2. / 3., iter=3): """lowess(x, y, f=2./3., iter=3) -> yest Lowess smoother: Robust locally weighted regression. The lowess function fits a nonparametric regression curve to a scatterplot. The arrays x and y contain an equal number of elements; each pair (x[i], y[i]) defines a data point in the scatterplot. The function returns the estimated (smooth) values of y. The smoothing span is given by f. A larger value for f will result in a smoother curve. The number of robustifying iterations is given by iter. The function will run faster with a smaller number of iterations. """ n = len(x) r = int(ceil(f * n)) h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)] w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0) w = (1 - w ** 3) ** 3 yest = np.zeros(n) delta = np.ones(n) for iteration in range(iter): for i in range(n): weights = delta * w[:, i] b = np.array([np.sum(weights * y), np.sum(weights * y * x)]) A = np.array([[np.sum(weights), np.sum(weights * x)], [np.sum(weights * x), np.sum(weights * x * x)]]) beta = linalg.solve(A, b) yest[i] = beta + beta * x[i] residuals = y - yest s = np.median(np.abs(residuals)) delta = np.clip(residuals / (6.0 * s), -1, 1) delta = (1 - delta ** 2) ** 2 return yest if __name__ == '__main__': import math n = 100 x = np.linspace(0, 2 * math.pi, n) y = np.sin(x) + 0.3 * np.random.randn(n) f = 0.25 yest = lowess(x, y, f=f, iter=3) import pylab as pl pl.clf() pl.plot(x, y, label='y noisy') pl.plot(x, yest, label='y pred') pl.legend() pl.show()

### nickv2002 commented Nov 6, 2013

 Thanks for this... it's very useful and easier than installing the statsmodels packages. Note that numpy has a value for pi so you can remove `import math` on line 59 and change `math.pi` to `np.pi`. I think you can also use `np.ceil` instead of importing `math.ceil` in the same respect.

### Zujio commented Aug 20, 2014

 Wonderful! Thank you for this.

### davidliwei commented May 15, 2015

 Thanks very much! Really useful.

### neale commented Jul 9, 2015

 This is great! Just as effective as anything else I've used and it avoids heavy dependencies.

### mzhao4 commented Sep 8, 2015

 Good code. But a minor suggestion that might take into consideration is that if input x is integers, the w matrix might have problem and result in singular matrix in solving linear systems. I would add a conversion sentence such as x = x.astype('float') before implementing w = np.clip(np.abs((x[:,None] - x[None,:]) / h), 0.0, 1.0). Or simply force h to be float.

### NelleV commented Oct 14, 2015

 Hi Alex, Do you mind adding a licence? Cheers, N

 done

### Xunius commented Mar 9, 2017

 Thanks for the code. Is it possible to extend this to 2D? I'm looking for a solution that works on data in the format of maps/images. Is it possible to allow the local subset to be non-square, e.g. 10 grid wide x 6 grid height?

### bbartley commented May 13, 2017

 This is really cool, but it doesn't quite do what I need. I guess what I'm looking for is a gist that estimates beta and the standard error for beta. If I were a little smarter I could probably figure out how to modify this.

### mridolfi commented Feb 7, 2018

 Thanks, this is very nice. However, I have an issue. When I change f I often get this error: `numpy.linalg.linalg.LinAlgError: Singular matrix` What can I do about it?

### rrealrangel commented Aug 9, 2019 • edited

 Thanks for the script. Just a comment: according to Cleveland (1979), "h_i is the rth smallest number among | x_i - x_j |, for j = 1, ..., n". So I think it would be necessary to add np.unique() in line 42, in the computation of h, as follows: h = [np.unique(np.sort(np.abs(x - x[i])))[r] for i in range(n)] to consider the case of equally separated values in x. Am I wrong? Thanks again.

### agramfort commented Aug 10, 2019

 I don't know and I don't have much time to look. Feel free to fork and update the code.

### kwhkim commented Aug 15, 2020

 Nice implemented. It would be great to have the functionality that deals with unequally weighted data points, so called weighted regression

### ericpre commented Aug 31, 2020

 For what it is worth with @tjof2, we have tweaked this version to avoid the `numpy.linalg.linalg.LinAlgError: Singular matrix` and speed it up with numba; see https://gist.github.com/ericpre/7a4dfba660bc8bb7499e7d96b8bdd4bb

### Anshuman-02905 commented Sep 1, 2020

 THis is very nice