AndreiZaitcev / canopy.py
Created April 10, 2016 12:41 — forked from gdbassett/canopy.py
Efficient Python implementation of canopy clustering. (A method for efficiently generating centroids and clusters, most commonly as input to a more robust clustering algorithm.)
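As a rough illustration of that workflow, the sketch below picks canopy centers on a toy dataset and uses them to seed a plain k-means loop. It is not the code from this gist: `canopy_centers`, `kmeans`, `include_radius`, and `remove_radius` are hypothetical names, the data is made up, and it uses only the standard library (`math.dist`, Python 3.8+) instead of NumPy or scikit-learn.

```python
import math


def canopy_centers(points, include_radius, remove_radius):
    """Greedy canopy pass (illustrative sketch, hypothetical names).

    Points are visited in input order so the result is deterministic.
    """
    if remove_radius <= 0:
        raise ValueError("remove_radius must be positive or the loop never ends")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within include_radius of the center joins this canopy.
        members = [i for i in range(len(points))
                   if math.dist(points[center], points[i]) < include_radius]
        canopies.append({"center": center, "members": members})
        # Points within remove_radius can no longer become centers themselves.
        remaining = [i for i in remaining
                     if math.dist(points[center], points[i]) >= remove_radius]
    return canopies


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations, standing in for the downstream algorithm."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


# Made-up data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

canopies = canopy_centers(points, include_radius=3, remove_radius=2)
seeds = [points[c["center"]] for c in canopies]
refined = kmeans(points, seeds)
print(len(canopies), refined)
```

The canopy pass only needs distances to one center at a time, so it is cheap. Here its role is to suggest how many clusters there are and where to start the more expensive refinement.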
from sklearn.metrics.pairwise import pairwise_distances
import numpy as np
# X should be a numpy matrix, most likely a sparse matrix: http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csr_matrix.html#scipy.sparse.csr_matrix
# T1 = distance from a centroid within which a point is not included in other clusters
# T2 = distance from a centroid within which a point is included in that centroid's cluster
# T1 < T2 gives overlapping clusters (points between T1 and T2 join this cluster and stay available to others)
# T1 > T2 can leave points which reside in no clusters (points between T2 and T1 are excluded but never included)
# T1 == T2 will cause all points to reside in mutually exclusive clusters