@fbrundu
Last active May 21, 2024 23:44
Get k clusters from a pandas DataFrame using fastcluster: build a hierarchical clustering of the rows (or columns) and cut the resulting dendrogram into k flat clusters.
import fastcluster as fc
import pandas as pd
import scipy.cluster.hierarchy as sch
# define total number of clusters to obtain
k = 5
# define matrix path
mat_path = 'matrix.txt'
# load matrix
mat = pd.read_table(mat_path, index_col=0)
# cluster columns instead of rows? (linkage clusters rows, so transpose first)
clust_columns = True
if clust_columns:
    mat = mat.T
# define fastcluster method and metric
method = 'complete'
metric = 'cosine'
# run fastcluster; the result is a linkage matrix in the format SciPy expects
clust_total = fc.linkage(mat, method=method, metric=metric)
# cut the dendrogram into at most k flat clusters (labels start at 1)
clust = sch.fcluster(clust_total, k, criterion='maxclust')
# clust to pandas Series
clust = pd.Series(clust, index=mat.index)
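# write output to file as two tab-separated columns: label, cluster id
# (header=False keeps recent pandas versions from adding a header row)
output_path = 'clust.txt'
clust.to_csv(output_path, sep='\t', header=False)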
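
The script above needs a `matrix.txt` on disk, so here is a minimal sketch of the same steps on a small in-memory matrix. Several things in it are assumptions made for illustration: the helper name `k_clusters`, the toy data, and the use of `scipy.cluster.hierarchy.linkage` in place of `fc.linkage` (fastcluster is documented as a drop-in replacement for it, so the arguments and the output layout should match).

```python
import pandas as pd
import scipy.cluster.hierarchy as sch


def k_clusters(mat, k, method="complete", metric="cosine", by_columns=True):
    """Hypothetical helper: return a Series mapping each label to a cluster id."""
    # linkage clusters rows, so transpose when the columns are the observations
    data = mat.T if by_columns else mat
    # sch.linkage stands in for fc.linkage here (assumption: same call, same output)
    tree = sch.linkage(data.values, method=method, metric=metric)
    # cut the dendrogram into at most k flat clusters
    labels = sch.fcluster(tree, k, criterion="maxclust")
    return pd.Series(labels, index=data.index, name="cluster")


# toy matrix (made up): rows are features, columns are samples;
# s1/s2 point in one direction, s3/s4 in another
toy = pd.DataFrame(
    {
        "s1": [1.0, 0.9, 0.0, 0.1],
        "s2": [0.9, 1.0, 0.1, 0.0],
        "s3": [0.0, 0.1, 1.0, 0.9],
        "s4": [0.1, 0.0, 0.9, 1.0],
    },
    index=["f1", "f2", "f3", "f4"],
)

toy_clusters = k_clusters(toy, k=2)
print(toy_clusters)
```

With `k=2`, `s1` and `s2` should land in one cluster and `s3` and `s4` in the other. The numeric ids themselves are arbitrary labels, so compare memberships rather than specific id values.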