@jaganadhg
Last active December 21, 2021 06:33
KMeans cosine
from sklearn.cluster import k_means_
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances
from sklearn.preprocessing import StandardScaler


def create_cluster(sparse_data, nclust = 10):

    # Manually override euclidean
    def euc_dist(X, Y = None, Y_norm_squared = None, squared = False):
        #return pairwise_distances(X, Y, metric = 'cosine', n_jobs = 10)
        return cosine_similarity(X, Y)
    k_means_.euclidean_distances = euc_dist

    scaler = StandardScaler(with_mean=False)
    sparse_data = scaler.fit_transform(sparse_data)
    kmeans = k_means_.KMeans(n_clusters = nclust, n_jobs = 20, random_state = 3425)
    _ = kmeans.fit(sparse_data)
    return kmeans.labels_
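
The snippet above patches the private `k_means_` module, which exists only in older scikit-learn releases, and the patched function returns cosine similarities rather than distances. A minimal sketch of an alternative that needs no patching, assuming a current scikit-learn release: L2-normalize each row and run the stock `KMeans`. For unit-length vectors, squared Euclidean distance equals 2 * (1 - cosine similarity), so Euclidean k-means on normalized rows groups points by direction. The centroids are not re-normalized, so this only approximates spherical k-means. The name `create_cluster_normalized` and the toy data are hypothetical, not part of the gist.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize


def create_cluster_normalized(sparse_data, nclust=10, seed=3425):
    # Hypothetical helper: scale every row to unit L2 norm so that
    # Euclidean distance between rows reflects cosine similarity.
    unit_rows = normalize(sparse_data, norm="l2", axis=1)
    kmeans = KMeans(n_clusters=nclust, n_init=10, random_state=seed)
    return kmeans.fit_predict(unit_rows)


# Toy data (assumption): three rows pointing roughly along the first axis
# and three along the third axis, with very different magnitudes.
toy = sparse.csr_matrix(np.array([
    [1.0, 0.1, 0.0],
    [5.0, 0.0, 0.5],
    [20.0, 1.0, 1.0],
    [0.0, 0.2, 2.0],
    [0.5, 0.0, 8.0],
    [1.0, 1.0, 30.0],
]))
labels = create_cluster_normalized(toy, nclust=2)
print(labels)
```

Rows that point in similar directions should share a label regardless of their magnitude, which is the behaviour the cosine override is after.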
@andrewaziz

Hi! I've been testing out this code, but I get the following error:
IndexError: index N is out of bounds for axis 0 with size N.

Have you come across this error while testing your code? Any help would be greatly appreciated!

Regards,
Andrew

@jiangchao123

This doesn't work!

@2017alan

I ran this code with sklearn version 0.20.3, and before passing the data to sklearn I converted it to np.float64.
This issue may help with the index out-of-bounds bug:
scikit-learn/scikit-learn#7705
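
The float64 conversion mentioned above might look like the following. This is a minimal sketch assuming the input is a SciPy sparse matrix; the helper name `to_float64` and the sample matrix are hypothetical. Whether the conversion avoids the IndexError probably depends on the scikit-learn version in use, so treat it as something to try rather than a confirmed fix.

```python
import numpy as np
from scipy import sparse


def to_float64(data):
    # Hypothetical helper: astype returns a converted copy for both
    # NumPy arrays and SciPy sparse matrices.
    return data.astype(np.float64)


# Sample integer count matrix (assumption, for illustration only).
counts = sparse.csr_matrix(np.array([[1, 0, 2], [0, 3, 0]], dtype=np.int32))
converted = to_float64(counts)
print(converted.dtype)
```

The converted matrix can then be passed to the clustering function in place of the original data.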
