Is LOESS a special case of a Gaussian process?
library(pdist)

#simulate noiseless data from a sine curve on (0,10)
n<-200
X<-matrix(10*runif(n),ncol=1)
y<-sin(X[,1])#+rnorm(n,sd=.2) #noise term disabled (see discussion below)
#plot(X[,1],y)
#scratch values for interactive testing of my_loess:
#xnew<-3
#span<-1
my_loess<-function(xnew,X,y,span=.75){
  #local linear (degree 1) regression at a single query point
  #xnew is a vector with length=ncol(X)
  #nn=number of nearest neighbors to consider
  nn<-ceiling(length(y)*span)
  d<-sqrt(colSums((xnew-t(X))^2)) #euclidean distances from xnew to each row of X
  rk<-rank(d,ties.method="random")
  ii<- rk<=nn #indicator of the nn nearest neighbors
  d[ii]<-d[ii]/max(d[ii]) #rescale neighbor distances to [0,1]
  w<-0*d #points outside the neighborhood get weight zero
  w[ii]<-(1-abs(d[ii])^3)^3 #tricube weights
  X<-cbind(1,X) #add intercept column
  #weighted least squares fit, evaluated at xnew
  c(1,xnew)%*%solve(crossprod(X,w*X),crossprod(X,w*y))
}
#apply my_loess to each row of a matrix of query points
my_loess_vec<-function(Xnew,X,y,span=.75){
  apply(Xnew,1,my_loess,X,y,span)
}
#query points extend beyond the range of X to show extrapolation behavior
Xnew<-matrix(seq(from=-10,to=20,length.out=100),ncol=1)
ynew<-my_loess_vec(Xnew,X,y,span=1)
plot(X,y,xlim=c(-5,15))
lines(Xnew,ynew) #black: custom loess
#compare against built-in loess (returns NA outside the range of the training X)
fit<-loess(y~X,span=1,degree=1)
ydefault<-predict(fit,Xnew)
lines(Xnew,ydefault,col="red") #red: default loess
#noiseless GP prediction (posterior mean) with a tricube "kernel"
my_gp<-function(Xnew,X,y){
  #the intercept columns don't change pairwise distances; kept to mirror my_loess
  Xnew<-cbind(1,Xnew)
  X<-cbind(1,X)
  D<-as.matrix(dist(X)) #pairwise distances among training points
  M<-max(D) #bandwidth = max pairwise training distance, so D/M<=1 within K
  K<-(1-(D/M)^3)^3 #tricube gram matrix
  Dnew<-as.matrix(pdist(Xnew,X)) #distances from query points to training points
  #note: Dnew/M can exceed 1 for extrapolation points, giving negative values
  Knew<-(1-(Dnew/M)^3)^3
  Knew%*%solve(K,y) #posterior mean of a zero-noise GP
}
ynew2<-my_gp(Xnew,X,y)
lines(Xnew,ynew2,col="blue") #blue: GP
legend("bottomleft",c("loess_custom","loess_default","GP"),lty=rep(1,3),col=c("black","red","blue"))
#possible extension of tricube kernel to the whole real line
tricube<-function(d){
  u<-2/(1+exp(-d))-1 #map reals to (-1,1)
  70/81*(1-abs(u)^3)^3 #tricube with its usual normalizing constant
}
curve(tricube,from=-10,to=10)
@willtownes (Author)

TL;DR: yes, it does appear to be a special case. Discussion thread: https://twitter.com/thebasepoint/status/1263680394689310720

@willtownes (Author)

Thanks to @andrewcharlesjones who identified a critical bug (I was plotting ynew instead of ynew2). Now that the bug is fixed, it's clear that the GP is in fact NOT the same as loess. Andy also points out that the tricube kernel is not positive semidefinite on an unbounded domain and hence not a great choice for a GP kernel. I also removed the noise from the simulation because I only implemented the noiseless GP and fitting it to noisy data actually led to very unstable predictions.

[Plot: simulated data with the custom loess (black), default loess (red), and GP (blue) fits overlaid.]
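
A rough way to check Andy's point about positive semidefiniteness numerically (this sketch is not part of the original gist): fix the tricube bandwidth and build a Gram matrix on evenly spaced points whose spacing is small relative to that bandwidth; the smallest eigenvalue can come out negative, which a valid covariance kernel never allows.

#rough numerical check (illustration only, not from the original gist):
#with a fixed bandwidth, a tricube gram matrix can have negative eigenvalues
tricube_gram<-function(x,bw=1){
  D<-as.matrix(dist(x))
  ifelse(D<bw,(1-(D/bw)^3)^3,0) #zero outside the support
}
xg<-seq(from=0,to=30,by=0.3) #evenly spaced points, spacing small relative to bw
Kg<-tricube_gram(xg,bw=1)
min(eigen(Kg,symmetric=TRUE,only.values=TRUE)$values) #negative => not PSD

Regarding the noise: the standard GP remedy for noisy observations is to add a noise (nugget) variance to the diagonal of K before solving, so the predictor smooths rather than interpolates. A minimal sketch reusing the tricube setup above (the sigma2 value is illustrative, not fitted):

#minimal sketch (not in the original gist): same GP predictor with a nugget term
#sigma2=0.04 is just the square of the commented-out noise sd (0.2)
my_gp_noisy<-function(Xnew,X,y,sigma2=0.04){
  D<-as.matrix(dist(X))
  M<-max(D)
  K<-(1-(D/M)^3)^3
  Dnew<-as.matrix(pdist(Xnew,X))
  Knew<-(1-(Dnew/M)^3)^3
  Knew%*%solve(K+sigma2*diag(nrow(K)),y) #noise on the diagonal stabilizes the solve
}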

@andrewcharlesjones

Thanks Will! This is so interesting
