{{ message }}

Instantly share code, notes, and snippets.

# agramfort/lowess.py

Last active Apr 3, 2022
LOWESS : Locally weighted regression
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
 """ This module implements the Lowess function for nonparametric regression. Functions: lowess Fit a smooth nonparametric regression curve to a scatterplot. For more information, see William S. Cleveland: "Robust locally weighted regression and smoothing scatterplots", Journal of the American Statistical Association, December 1979, volume 74, number 368, pp. 829-836. William S. Cleveland and Susan J. Devlin: "Locally weighted regression: An approach to regression analysis by local fitting", Journal of the American Statistical Association, September 1988, volume 83, number 403, pp. 596-610. """ # Authors: Alexandre Gramfort # # License: BSD (3-clause) from math import ceil import numpy as np from scipy import linalg def lowess(x, y, f=2. / 3., iter=3): """lowess(x, y, f=2./3., iter=3) -> yest Lowess smoother: Robust locally weighted regression. The lowess function fits a nonparametric regression curve to a scatterplot. The arrays x and y contain an equal number of elements; each pair (x[i], y[i]) defines a data point in the scatterplot. The function returns the estimated (smooth) values of y. The smoothing span is given by f. A larger value for f will result in a smoother curve. The number of robustifying iterations is given by iter. The function will run faster with a smaller number of iterations. """ n = len(x) r = int(ceil(f * n)) h = [np.sort(np.abs(x - x[i]))[r] for i in range(n)] w = np.clip(np.abs((x[:, None] - x[None, :]) / h), 0.0, 1.0) w = (1 - w ** 3) ** 3 yest = np.zeros(n) delta = np.ones(n) for iteration in range(iter): for i in range(n): weights = delta * w[:, i] b = np.array([np.sum(weights * y), np.sum(weights * y * x)]) A = np.array([[np.sum(weights), np.sum(weights * x)], [np.sum(weights * x), np.sum(weights * x * x)]]) beta = linalg.solve(A, b) yest[i] = beta[0] + beta[1] * x[i] residuals = y - yest s = np.median(np.abs(residuals)) delta = np.clip(residuals / (6.0 * s), -1, 1) delta = (1 - delta ** 2) ** 2 return yest if __name__ == '__main__': import math n = 100 x = np.linspace(0, 2 * math.pi, n) y = np.sin(x) + 0.3 * np.random.randn(n) f = 0.25 yest = lowess(x, y, f=f, iter=3) import pylab as pl pl.clf() pl.plot(x, y, label='y noisy') pl.plot(x, yest, label='y pred') pl.legend() pl.show()

### nickv2002 commented Nov 6, 2013

Thanks for this... it's very useful and easier than installing the statsmodels packages.

Note that numpy has a value for pi so you can remove `import math` on line 59 and change `math.pi` to `np.pi`. I think you can also use `np.ceil` instead of importing `math.ceil` in the same respect.

### Zujio commented Aug 20, 2014

Wonderful! Thank you for this.

### davidliwei commented May 15, 2015

Thanks very much! Really useful.

### neale commented Jul 9, 2015

This is great! Just as effective as anything else I've used and it avoids heavy dependencies.

### mzhao4 commented Sep 8, 2015

Good code. But a minor suggestion that might take into consideration is that if input x is integers, the w matrix might have problem and result in singular matrix in solving linear systems. I would add a conversion sentence such as x = x.astype('float') before implementing w = np.clip(np.abs((x[:,None] - x[None,:]) / h), 0.0, 1.0). Or simply force h to be float.

### NelleV commented Oct 14, 2015

Hi Alex,
Do you mind adding a licence?
Cheers,
N

done

### Xunius commented Mar 9, 2017

Thanks for the code.
Is it possible to extend this to 2D? I'm looking for a solution that works on data in the format of maps/images. Is it possible to allow the local subset to be non-square, e.g. 10 grid wide x 6 grid height?

### bbartley commented May 13, 2017

This is really cool, but it doesn't quite do what I need. I guess what I'm looking for is a gist that estimates beta[1] and the standard error for beta[1]. If I were a little smarter I could probably figure out how to modify this.

### mridolfi commented Feb 7, 2018

Thanks, this is very nice. However, I have an issue. When I change f I often get this error:
`numpy.linalg.linalg.LinAlgError: Singular matrix`

What can I do about it?

### rrealrangel commented Aug 9, 2019 • edited

Thanks for the script. Just a comment: according to Cleveland (1979), "h_i is the rth smallest number among | x_i - x_j |, for j = 1, ..., n". So I think it would be necessary to add np.unique() in line 42, in the computation of h, as follows:

h = [np.unique(np.sort(np.abs(x - x[i])))[r] for i in range(n)]

to consider the case of equally separated values in x. Am I wrong?

Thanks again.

### kwhkim commented Aug 15, 2020

Nice implemented.
It would be great to have the functionality that deals with unequally weighted data points, so called weighted regression

### ericpre commented Aug 31, 2020

For what it is worth with @tjof2, we have tweaked this version to avoid the `numpy.linalg.linalg.LinAlgError: Singular matrix` and speed it up with numba; see https://gist.github.com/ericpre/7a4dfba660bc8bb7499e7d96b8bdd4bb

### Anshuman-02905 commented Sep 1, 2020

THis is very nice

### AyrtonB commented Mar 27, 2021

Thanks @agramfort, this was really handy in my research.

@ericpre, to remove the `Singular matrix` error is the only change required: `beta = np.linalg.lstsq(A, b)[0]`?

### ericpre commented Mar 28, 2021

@AyrtonB, I don't remember, you will need to figure it out yourself, sorry!

### AyrtonB commented Mar 28, 2021

No worries @ericpre, think I've managed to work it out.

I've implemented an sklearn compatible version of this that also allows quantile predictions and confidence intervals to be calculated. I've also added the option to specify the locations at which the local regressions are calculated which significantly reduces the run-time.

Details can be found here