@michaelguia
Created June 7, 2018 02:10

SUMMARY: Hierarchical cluster analysis

  1. Hierarchical cluster analysis of n objects is defined by a stepwise algorithm that starts with each object in its own cluster and, at each step, merges the two clusters that have the least dissimilarity.

  2. Dissimilarities between clusters of objects can be defined in several ways; for example, the maximum dissimilarity (complete linkage), minimum dissimilarity (single linkage) or average dissimilarity (average linkage).

  3. Either the rows or the columns of a matrix can be clustered; in each case we choose a dissimilarity measure appropriate to what is being clustered.

  4. The result of a cluster analysis is a binary tree, or dendrogram, with n–1 nodes. The branches of this tree are cut at a level where there is a lot of 'space' to cut them, that is, where the jump in levels between two consecutive nodes is large.

  5. A permutation test can be used to validate the chosen number of clusters, that is, to see whether there really is a non-random tendency for the objects to group together.
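
Points 1, 2 and 4 can be illustrated with a minimal sketch in plain Python. It is not taken from the original notes: the function names (cluster_dissimilarity, agglomerate, cut_at_largest_jump) and the five-point example data are made up for illustration, and the cut rule simply stops before the single largest jump in node levels, which is only one way of reading "a lot of space". It assumes a symmetric dissimilarity matrix for at least three objects.

```python
from itertools import combinations

def cluster_dissimilarity(a, b, d, linkage):
    """Dissimilarity between clusters a and b (lists of object indices)."""
    pairs = [d[i][j] for i in a for j in b]
    if linkage == "complete":   # maximum dissimilarity
        return max(pairs)
    if linkage == "single":     # minimum dissimilarity
        return min(pairs)
    if linkage == "average":    # average dissimilarity
        return sum(pairs) / len(pairs)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(d, linkage="complete"):
    """Return the n-1 merges as (cluster_a, cluster_b, level) tuples."""
    clusters = [[i] for i in range(len(d))]  # each object starts alone
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the least dissimilarity.
        a, b = min(
            combinations(clusters, 2),
            key=lambda p: cluster_dissimilarity(p[0], p[1], d, linkage),
        )
        level = cluster_dissimilarity(a, b, d, linkage)
        merges.append((a, b, level))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return merges

def cut_at_largest_jump(d, linkage="complete"):
    """Cut the tree just below the largest jump between consecutive levels."""
    merges = agglomerate(d, linkage)
    levels = [level for _, _, level in merges]
    jumps = [levels[k + 1] - levels[k] for k in range(len(levels) - 1)]
    last = jumps.index(max(jumps))  # last merge kept before the big jump
    clusters = [[i] for i in range(len(d))]
    for a, b, _ in merges[: last + 1]:
        clusters = [c for c in clusters if c != a and c != b] + [a + b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical example: five points on a line, absolute difference
# as the dissimilarity measure.
points = [0, 1, 2, 10, 11]
d = [[abs(p - q) for q in points] for p in points]
merges = agglomerate(d, "complete")
groups = cut_at_largest_jump(d, "complete")
```

For n = 5 objects the merge list has n–1 = 4 entries, one per node of the dendrogram. This brute-force version recomputes every cluster dissimilarity at each step, so it is only meant for small examples.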
