Last active December 21, 2021 06:33
Gist: jaganadhg/b3f6af86ad99bf6e9bb7be21e5baa1b5
KMeans cosine
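from sklearn.cluster import k_means_  # old private module, removed in scikit-learn 0.24
from sklearn.metrics.pairwise import pairwise_distances
from sklearn.preprocessing import StandardScaler

def create_cluster(sparse_data, nclust=10):
    # Manually override the Euclidean distance that k_means_ looks up by name.
    # KMeans minimizes distances, so return cosine *distance* (1 - similarity);
    # returning cosine similarity would favour the least similar centroid.
    def cosine_dist(X, Y=None, Y_norm_squared=None, squared=False):
        dist = pairwise_distances(X, Y, metric='cosine')
        # Callers that ask for squared distances (e.g. k-means++ seeding) get d**2.
        return dist ** 2 if squared else dist

    # Only Python code inside k_means_ that calls euclidean_distances is affected;
    # compiled assignment routines may still use Euclidean distance.
    k_means_.euclidean_distances = cosine_dist
    # Scale features without centering so sparse input stays sparse.
    scaler = StandardScaler(with_mean=False)
    sparse_data = scaler.fit_transform(sparse_data)
    kmeans = k_means_.KMeans(n_clusters=nclust, n_jobs=20, random_state=3425)
    kmeans.fit(sparse_data)
    return kmeans.labels_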
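A minimal sketch of an alternative that avoids patching scikit-learn internals: spherical k-means, which normalises every row to unit length, assigns each point to the centroid with the highest cosine similarity, and renormalises each centroid after averaging. This is a swapped-in technique, not the gist author's method. The function name spherical_kmeans and its parameters are hypothetical, and the sketch assumes a dense NumPy array with no all-zero rows (a SciPy sparse matrix would need .toarray() or a sparse-aware normalisation first).

```python
import numpy as np


def spherical_kmeans(X, n_clusters, n_iter=100, seed=0):
    """Hypothetical helper: cluster the rows of X by cosine similarity.

    Assumes X is a dense (n_samples, n_features) array with no all-zero rows.
    Returns (labels, centers); every center has unit length.
    """
    rng = np.random.default_rng(seed)
    # Project every row onto the unit sphere so dot products equal cosines.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Simple random initialisation from distinct rows (not k-means++).
    centers = Xn[rng.choice(len(Xn), size=n_clusters, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Highest cosine similarity == smallest cosine distance (1 - cos).
        new_labels = np.argmax(Xn @ centers.T, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in range(n_clusters):
            members = Xn[labels == k]
            if len(members) == 0:
                continue  # keep the previous center for an empty cluster
            # Mean direction of the cluster, renormalised to unit length.
            mean = members.sum(axis=0)
            norm = np.linalg.norm(mean)
            if norm > 0:
                centers[k] = mean / norm
    return labels, centers


# Two rows point roughly along the x-axis, two roughly along the y-axis.
X = np.array([[1.0, 0.1], [2.0, 0.3], [0.1, 1.0], [0.2, 3.0]])
labels, centers = spherical_kmeans(X, n_clusters=2)
print(labels)  # rows 0-1 share one label, rows 2-3 share the other
```

If you would rather stay with scikit-learn, L2-normalising the rows first (sklearn.preprocessing.normalize) and then running the stock KMeans approximates this: for unit vectors the squared Euclidean distance equals 2 - 2 cos, but KMeans centroids are plain means, which are generally shorter than unit length, so assignments can differ slightly.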
Can't get it to work!
I ran this code with scikit-learn 0.20.3, and I converted the data to np.float64 before passing it to scikit-learn.
This may help you: it is the bug report for the index-out-of-bounds error.
scikit-learn/scikit-learn#7705
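A hedged guess at where an "index N is out of bounds for axis 0 with size N" error can come from, not checked against the 0.20.3 source: k-means++ seeding appears to pick candidate rows with np.searchsorted over a cumulative sum of weights, and searchsorted returns len(a) whenever the search value is larger than every element. That can happen if the total used to scale the random draw comes out slightly larger than the last cumulative value (rounding, which would fit the float64 advice above), or if the patched "distances" are negative similarities, which breaks the assumption that the cumulative sum is non-decreasing. The arrays below are made-up values chosen only to trigger the off-by-one.

```python
import numpy as np

# Four sample rows, so the valid row indices are 0..3.
X = np.arange(8.0).reshape(4, 2)

# Cumulative sampling weights, as in a D^2-style k-means++ sampling step.
cumulative = np.array([0.1, 0.4, 0.7, 1.0])

# A search value slightly above the final cumulative weight, e.g. because the
# total used for scaling was computed with a different rounding error.
target = 1.0 + 1e-7

# searchsorted returns len(cumulative) when target exceeds every element.
idx = int(np.searchsorted(cumulative, target))
print(idx)  # 4

message = ""
try:
    X[idx]  # one past the last row
except IndexError as err:
    message = str(err)
print(message)  # index 4 is out of bounds for axis 0 with size 4
```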
Hi! I've been testing out this code, but I get the following error:
IndexError: index N is out of bounds for axis 0 with size N.
Have you come across this error while testing your code? Any help is greatly appreciated!
Regards,
Andrew