-
-
Save agramfort/557193 to your computer and use it in GitHub Desktop.
from scipy import stats | |
class GaussianKernelDensityEstimation(object): | |
"""docstring for GaussianKernelDensityEstimation""" | |
def __init__(self): | |
self.gkde = None | |
def fit(self, X): | |
"""docstring for fit""" | |
self.gkde = stats.gaussian_kde(X.T) | |
return self | |
def predict(self, X): | |
return self.gkde.evaluate(X.T) | |
if __name__ == '__main__': | |
import numpy as np | |
# Create some dummy data | |
X = np.random.randn(400,2) | |
X[200:,:] += 3.5 | |
# Regular grid to evaluate kde upon | |
n_grid_points = 128 | |
xmin, ymin = X.min(axis=0) | |
xmax, ymax = X.max(axis=0) | |
xx = np.linspace(xmin - 0.5, xmax + 0.5, n_grid_points) | |
yy = np.linspace(ymin - 0.5, ymax + 0.5, n_grid_points) | |
xg, yg = np.meshgrid(xx, yy) | |
grid_coords = np.c_[xg.ravel(), yg.ravel()] | |
# Compute density estimation | |
kde = GaussianKernelDensityEstimation() | |
kde.fit(X) # Fit | |
zz = kde.predict(grid_coords) # Evaluate density on grid points | |
zz = zz.reshape(*xg.shape) | |
import matplotlib.pylab as pl | |
pl.close('all') | |
pl.set_cmap(pl.cm.Paired) | |
pl.figure() | |
pl.contourf(xx, yy, zz, label='density') | |
pl.scatter(X[:,0], X[:,1], color='k', label='samples') | |
pl.axis('tight') | |
pl.legend() | |
pl.title('Kernel Density Estimation') | |
pl.show() |
Yes we can (TM). We can use our ball tree to select only the data points from the training set that contribute significantly to a given test point. The problem is that with this strategy, it is harder to vectorize the 'predict' on the test data. Whether scipy's default behavior or the one I am suggesting is the best depends the 'shape' of you data: number of training point, number of testing points, and number of dimensions.
That means that we need a heuristic to decide which strategy to apply.
Also, grouping the test points that are close together to select a common 'active set' for them, and thus vectorizing on that bunch the predict would help.
There is some work to do to get a decent KDE :)
ok so we'll say that this is work in progress... At least the discussion was constructive :)
argh ! I didn't know. Any chance we can do better?