@kashifulhaque
Created June 18, 2024 17:22
Some commonly used distance metrics in ML

To judge how similar two vectors are, we can measure the distance (or the angle) between them. A few metrics are commonly used for this.

import numpy as np

1. Cosine similarity

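Measures the cosine of the angle $\theta$ between 2 non-zero vectors, so it depends only on their direction and not on their magnitude. It ranges over $[-1, 1]$: $1$ for vectors pointing the same way, $0$ for orthogonal vectors and $-1$ for vectors pointing in opposite directions. $$\cos(\theta) = \frac{\vec{p} \cdot \vec{q}}{\|\vec{p}\| \, \|\vec{q}\|}$$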

def cosine_similarity(p, q):
  dot_prod = np.dot(p, q)
  norm1 = np.linalg.norm(p)
  norm2 = np.linalg.norm(q)
  
  return dot_prod / (norm1 * norm2)
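As a quick sanity check of the range described above, here is a small sketch that evaluates the similarity for three hand-picked pairs. The function is repeated so the snippet runs on its own, and the example vectors are made up for illustration.

```python
import numpy as np

def cosine_similarity(p, q):
    # Same definition as above, repeated so this snippet is self-contained.
    return np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

# Illustrative vectors (not from any dataset).
p = np.array([1.0, 0.0])
same_direction = np.array([3.0, 0.0])  # parallel to p, different length
orthogonal = np.array([0.0, 2.0])      # at 90 degrees to p
opposite = np.array([-5.0, 0.0])       # points the other way

sim_same = cosine_similarity(p, same_direction)  # 1.0
sim_orth = cosine_similarity(p, orthogonal)      # 0.0
sim_opp = cosine_similarity(p, opposite)         # -1.0
```

Note that the lengths of the vectors do not affect the result, only their directions. If either input is the zero vector, the denominator is 0 and the result is not meaningful, which is why the definition assumes non-zero vectors.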

2. Euclidean distance

Measures the straight-line distance between 2 vectors, i.e. the L2 norm of their difference. $$d_{\text{euclidean}}(\vec{p}, \vec{q}) = \| \vec{p} - \vec{q} \|_2 = \sqrt{\sum_i (p_i - q_i)^2}$$

def euclidean_distance(p, q):
  return np.linalg.norm(p - q)
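A minimal worked example, assuming two made-up 2D points that form a 3-4-5 right triangle. It also checks that `np.linalg.norm` agrees with the square root of the summed squared differences written out by hand.

```python
import numpy as np

# Illustrative points: the legs of the triangle are 3 and 4.
p = np.array([0.0, 0.0])
q = np.array([3.0, 4.0])

# Default np.linalg.norm on a 1-D array is the L2 norm.
d_norm = np.linalg.norm(p - q)

# The same quantity written out explicitly.
d_manual = np.sqrt(np.sum((p - q) ** 2))

# Both evaluate to 5.0 for these points.
```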

3. Manhattan distance (L1 distance)

Measures the distance between two points on a grid as the sum of the absolute differences of their coordinates. $$d_{\text{manhattan}}(\vec{p}, \vec{q}) = \sum_i |p_i - q_i|$$

def manhattan_distance(p, q):
  return np.sum(np.abs(p - q))
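For comparison with the Euclidean case, a short sketch using the same kind of made-up points. It also shows that the result should match `np.linalg.norm` with `ord=1`, which computes the L1 norm of a 1-D array.

```python
import numpy as np

# Illustrative points: 3 steps along one axis, 4 along the other.
p = np.array([0.0, 0.0])
q = np.array([3.0, 4.0])

# Sum of absolute coordinate differences: |0 - 3| + |0 - 4|.
d_sum = np.sum(np.abs(p - q))

# Equivalent formulation via the L1 norm.
d_l1 = np.linalg.norm(p - q, ord=1)

# Both evaluate to 7.0, compared with a straight-line distance of 5.0.
```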

4. Cosine distance

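Cosine distance is derived from cosine similarity and is defined as $1 - \cos(\theta)$, so it ranges over $[0, 2]$. Despite the name, it is not a true metric, since it does not satisfy the triangle inequality. $$d_{\text{cosine}}(\vec{p}, \vec{q}) = 1 - \frac{\vec{p} \cdot \vec{q}}{\|\vec{p}\| \, \|\vec{q}\|}$$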

def cosine_distance(p, q):
  return 1 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))
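To illustrate how cosine distance differs from Euclidean distance, the sketch below compares a vector with a scaled copy of itself. The function is repeated so the snippet runs on its own, and the vectors are arbitrary examples. It also checks one known relationship: for unit-length vectors, the squared Euclidean distance equals twice the cosine distance.

```python
import numpy as np

def cosine_distance(p, q):
    # Same definition as above, repeated so this snippet is self-contained.
    return 1 - np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))

# Illustrative vectors: q points the same way as p but is 10x longer.
p = np.array([1.0, 2.0, 3.0])
q = 10 * p

d_cos = cosine_distance(p, q)   # approximately 0: same direction
d_euc = np.linalg.norm(p - q)   # large: the magnitudes differ a lot

# For unit vectors u and v, ||u - v||^2 = 2 * (1 - cos(theta)).
r = np.array([3.0, -1.0, 2.0])  # another arbitrary vector
u = p / np.linalg.norm(p)
v = r / np.linalg.norm(r)
sq_euclidean = np.linalg.norm(u - v) ** 2
twice_cosine = 2 * cosine_distance(u, v)
```

This is one reason cosine distance is often preferred when only the direction of the vectors matters, while Euclidean or Manhattan distance is the natural choice when magnitude is meaningful too.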