@rth
Created November 8, 2016 12:02
A quick benchmark for computing a metric based on the cosine distance
from math import sqrt, acos
import numpy as np

def make_test_arrays(N, norm=True):
    """Return two random vectors of length N, L2-normalized if norm is True"""
    X = np.random.rand(N)
    Y = np.random.rand(N)
    if norm:
        X /= np.linalg.norm(X)
        Y /= np.linalg.norm(Y)
    return X, Y

def euclidean_dist(X, Y):
    """Euclidean (L2) distance between X and Y"""
    XY = X - Y
    return sqrt(XY.dot(XY))

def cosine_dist(X, Y):
    """Not a metric in general"""
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    return 1 - X.dot(Y)
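
def cosine_dist_spherical(X, Y):
    """Assumes that X, Y are L2 normalized.

    For unit vectors this equals euclidean_dist(X, Y)**2 / 2, so it ranks
    pairs in the same order as euclidean_dist; the value itself is a squared
    distance and does not satisfy the triangle inequality.
    """
    return 1 - X.dot(Y)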

def arccosine_dist_spherical(X, Y):
    """Assumes that X, Y are L2 normalized, just to check
    the acos overhead"""
    return acos(X.dot(Y))

# %timeit is an IPython magic: run the lines below in IPython or Jupyter
X, Y = make_test_arrays(10000, norm=True)
%timeit euclidean_dist(X, Y)
%timeit cosine_dist(X, Y)
%timeit cosine_dist_spherical(X, Y)
%timeit arccosine_dist_spherical(X, Y)

## Returns (one result per %timeit call, in the order above)
# euclidean_dist:           100000 loops, best of 3: 11 µs per loop
# The slowest run took 5.40 times longer than the fastest. This could mean that an intermediate result is being cached.
# cosine_dist:              10000 loops, best of 3: 43.6 µs per loop
# The slowest run took 18.30 times longer than the fastest. This could mean that an intermediate result is being cached.
# cosine_dist_spherical:    100000 loops, best of 3: 4.73 µs per loop
# The slowest run took 5.81 times longer than the fastest. This could mean that an intermediate result is being cached.
# arccosine_dist_spherical: 100000 loops, best of 3: 4.41 µs per loop
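
The relation between `1 - X.dot(Y)` and the Euclidean distance on L2-normalized vectors can be checked numerically. The sketch below is not part of the original benchmark: the helper names `unit`, `on_circle`, `d_cos` and `d_chord`, the fixed seed, and the angle `t = 0.1` are all illustrative assumptions. It checks the identity |X - Y|^2 = 2 (1 - X·Y) for unit vectors, then tests the triangle inequality on three points of the unit circle, both for `1 - cos` and for its rescaled square root (the chord length).

```python
from math import sqrt

import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed, for repeatability only


def unit(v):
    """Return v scaled to unit L2 norm (illustrative helper)."""
    return v / np.linalg.norm(v)


X = unit(rng.random(10000))
Y = unit(rng.random(10000))

cos_dist = 1 - X.dot(Y)
diff = X - Y
sq_euclid = diff.dot(diff)

# For unit vectors: |X - Y|^2 = |X|^2 + |Y|^2 - 2 X.Y = 2 * (1 - X.Y)
identity_holds = np.isclose(cos_dist, sq_euclid / 2)


def on_circle(angle):
    """Unit vector in the plane at the given angle (illustrative helper)."""
    return np.array([np.cos(angle), np.sin(angle)])


def d_cos(U, V):
    """1 - cosine similarity, assuming U and V are unit vectors."""
    return 1 - U.dot(V)


def d_chord(U, V):
    """Chord length sqrt(2 * (1 - U.V)); equals the Euclidean distance."""
    return sqrt(2 * max(d_cos(U, V), 0.0))


# Three unit vectors at angles 0, t, 2t
t = 0.1
A, B, C = on_circle(0.0), on_circle(t), on_circle(2 * t)

# True when d(A, C) exceeds d(A, B) + d(B, C), i.e. the inequality is violated
cos_violates = d_cos(A, C) > d_cos(A, B) + d_cos(B, C)
# True when the chord length satisfies the triangle inequality on the same points
chord_ok = d_chord(A, C) <= d_chord(A, B) + d_chord(B, C)

print(identity_holds, cos_violates, chord_ok)
```

If a quantity that obeys the triangle inequality is needed (for example for tree-based neighbor searches), taking `sqrt(2 * (1 - X.dot(Y)))` or `acos(X.dot(Y))` on normalized inputs are two candidates; the timings above suggest the extra scalar operation costs little next to the dot product.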