Skip to content

Instantly share code, notes, and snippets.

@michaelChein
Last active January 19, 2019 18:10
Show Gist options
  • Save michaelChein/9e03a20675708e40cdfa496cce5545ca to your computer and use it in GitHub Desktop.
Save michaelChein/9e03a20675708e40cdfa496cce5545ca to your computer and use it in GitHub Desktop.
Generating a Euclidean matrix using numpy's broadcasting

Generating a Euclidean matrix using numpy's broadcasting

This is an algorithm for calculating a euclidean distance matrix between each point to each other point in a data set, written as part of a agglomerative clustering exercise, using numpy's broadcasting, but could be applied to any other calculation done on multidimensional data sets.

The basic concept is that, when adding or multiplying two vectors of sizes (m,1) and (1,m), numpy will broadcast (duplicate the vector) so that it allows the calculation. For example multiplying a vector [1,2,3,4,...10] with a transposed version of itself, will yield the multiplication table. For Example:

drawing

The same technique can be used for matrices. In this case, I was looking to generate a Euclidean distance matrix for the iris data set. The formula is:

Obtaining the table could obviously be done using two nested for loops, but can also be performed using matrix operations.

We take the data set (represented as 'A' in the illustration),

drawing

create a 3D copy of it, and rotate it to the orientation shown as 'B' in the illustration.

drawing

We then create another copy and rotate it as represented by 'C'.
drawing

B-C will generate a 3D cube ('D'), sized (m,m,n) which represents the calculation for each dimension of the original dataset vectors (n=4 in the iris data set).

drawing

Then, we simply raise the entire cube to the power of 2, sum the result in the axis shown in the illustration, and square.

The result is the multiplication matrix ('E').

drawing

Note- It's critical not to get the orientations confused when constructing A and B, obviously... :)

code:

import numpy as np
from sklearn.datasets import load_iris

Load data:
d, _ = load_iris(return_X_y=True)

def euc_matrix(A):
	# generate distance matrix:
	B = np.rot90(A[:,:,None],1,(1,2)) #[:,:,None] is needed to add a dimension
	C = np.rot90(B,1,(0,1))
	return np.sqrt(np.sum((B-C)**2, axis=2))

mat = euc_matrix(A)  



Michael Chein 04/01/2019

import numpy as np
from sklearn.datasets import load_iris
Load data:
d, _ = load_iris(return_X_y=True)
def euc_matrix(A):
# generate distance matrix:
B = np.rot90(A[:,:,None],1,(1,2)) #[:,:,None] is needed to add a dimension
C = np.rot90(B,1,(0,1))
return np.sqrt(np.sum((B-C)**2, axis=2))
mat = euc_matrix(A)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment