Marcos Castro de Souza marcoscastro
@rpgove
rpgove / README.md
Last active April 15, 2024 10:12
Using the elbow method to determine the optimal number of clusters for k-means clustering

K-means is a simple unsupervised machine learning algorithm that groups a dataset into a user-specified number (k) of clusters. The algorithm is somewhat naive--it clusters the data into k clusters, even if k is not the right number of clusters to use. Therefore, when using k-means clustering, users need some way to determine whether they are using the right number of clusters.

One method to validate the number of clusters is the elbow method. The idea is to run k-means clustering on the dataset for a range of values of k (say, k from 1 to 10), and for each value of k calculate the sum of squared errors (SSE), that is, the sum of the squared distances between each data point and the mean of its cluster. Like this:

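var sse = {};
for (var k = 1; k <= maxK; ++k) {
    sse[k] = 0;
    var clusters = kmeans(dataset, k);
    clusters.forEach(function(cluster) {
        var mean = clusterMean(cluster);
        // Add each point's squared distance from its cluster mean
        // (assumes scalar data points).
        cluster.forEach(function(datapoint) {
            sse[k] += Math.pow(datapoint - mean, 2);
        });
    });
}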
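
The loop above relies on `kmeans` and `clusterMean`, which are not shown here. As a rough, self-contained sketch of the same idea, the following assumes one-dimensional numeric data and a deliberately simple, deterministic k-means. The names `kmeans1d`, `sumSquaredError`, `elbowCurve`, and the `sample` array are hypothetical and not part of the original gist.

```javascript
// Hypothetical helper: Lloyd's k-means on 1-D numbers. Starting centroids are
// taken at evenly spaced positions in the sorted data so runs are repeatable.
function kmeans1d(data, k, maxIter = 100) {
    const sorted = [...data].sort((a, b) => a - b);
    let centroids = [];
    for (let i = 0; i < k; i++) {
        centroids.push(sorted[Math.floor(((i + 0.5) * sorted.length) / k)]);
    }
    let clusters = [];
    for (let iter = 0; iter < maxIter; iter++) {
        // Assignment step: each point joins its nearest centroid.
        clusters = centroids.map(() => []);
        for (const x of data) {
            let best = 0;
            for (let c = 1; c < k; c++) {
                if (Math.abs(x - centroids[c]) < Math.abs(x - centroids[best])) best = c;
            }
            clusters[best].push(x);
        }
        // Update step: move each centroid to the mean of its cluster
        // (an empty cluster keeps its previous centroid).
        const updated = clusters.map((cl, c) =>
            cl.length ? cl.reduce((s, v) => s + v, 0) / cl.length : centroids[c]);
        const converged = updated.every((m, c) => m === centroids[c]);
        centroids = updated;
        if (converged) break;
    }
    return clusters.filter(cl => cl.length > 0);
}

// Sum of squared distances between each point and its cluster mean.
function sumSquaredError(clusters) {
    let sse = 0;
    for (const cluster of clusters) {
        const mean = cluster.reduce((s, v) => s + v, 0) / cluster.length;
        for (const x of cluster) sse += (x - mean) ** 2;
    }
    return sse;
}

// SSE for every k from 1 to maxK, keyed by k.
function elbowCurve(data, maxK) {
    const sse = {};
    for (let k = 1; k <= maxK; k++) sse[k] = sumSquaredError(kmeans1d(data, k));
    return sse;
}

// Made-up sample with three well-separated groups.
const sample = [1, 2, 3, 11, 12, 13, 21, 22, 23];
const curve = elbowCurve(sample, 5);
console.log(curve);
```

For this sample the SSE falls steeply up to k = 3 and only slightly afterwards, so the bend in the curve sits at k = 3. Real datasets usually need multi-dimensional distances and several random restarts, which this sketch leaves out.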

@LiorZ
LiorZ / kmeans.py
Last active September 25, 2023 01:22
KMeans clustering python script for biological sequences
#!/usr/bin/python
### Created by Lior Zimmerman (http://www.github.com/LiorZ) ###
### Distributed under MIT License (http://opensource.org/licenses/MIT) ###
import sys, getopt
from Bio import SeqIO,pairwise2
import Bio.SubsMat.MatrixInfo as matrices
import sklearn.cluster as cluster
@mejibyte
mejibyte / gist:1268157
Created October 6, 2011 18:15
Implementation of Ukkonen's algorithm to build a suffix tree in O(n) time
using namespace std;
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <fstream>
#include <cassert>
#include <climits>
#include <cstdlib>
#include <cstring>