@JDWarner
Last active April 20, 2024 01:38
Jaccard coefficient between two boolean NumPy arrays or array-like data. This is commonly used as a set similarity measure; its complement, the Jaccard distance (1 - J), is a true metric. The inputs may have any number of dimensions, but `im1.shape` and `im2.shape` must be equal. This Gist is licensed under the modified BSD license, otherwise known as the 3-clause BSD.
"""
_jaccard.py : Jaccard metric for comparing set similarity.
"""
import numpy as np


def jaccard(im1, im2):
    """
    Computes the Jaccard metric, a measure of set similarity.

    Parameters
    ----------
    im1 : array-like, bool
        Any array of arbitrary size. If not boolean, will be converted.
    im2 : array-like, bool
        Any other array of identical size. If not boolean, will be converted.

    Returns
    -------
    jaccard : float
        Jaccard metric returned is a float on range [0,1].
        Maximum similarity = 1
        No similarity = 0

    Notes
    -----
    The order of inputs for `jaccard` is irrelevant. The result will be
    identical if `im1` and `im2` are switched.
    """
    # The `np.bool` alias was removed in NumPy 1.24; the builtin `bool` works everywhere.
    im1 = np.asarray(im1).astype(bool)
    im2 = np.asarray(im2).astype(bool)

    if im1.shape != im2.shape:
        raise ValueError("Shape mismatch: im1 and im2 must have the same shape.")

    # Elementwise set operations on the boolean masks.
    intersection = np.logical_and(im1, im2)
    union = np.logical_or(im1, im2)

    return intersection.sum() / float(union.sum())
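
A short usage sketch may help here. It also covers one case the function above does not address: when both inputs are entirely False, the union is empty and the ratio is 0 / 0, which is undefined. The variant below is not part of the original Gist; `jaccard_safe` and its `empty_value` parameter are hypothetical names, and returning 1.0 for two empty sets is an assumed convention (some callers may prefer 0.0 or an exception).

```python
import numpy as np


def jaccard_safe(im1, im2, empty_value=1.0):
    """Jaccard coefficient with an explicit result for two empty inputs.

    `jaccard_safe` and `empty_value` are hypothetical names for this sketch.
    """
    im1 = np.asarray(im1).astype(bool)
    im2 = np.asarray(im2).astype(bool)

    if im1.shape != im2.shape:
        raise ValueError("Shape mismatch: im1 and im2 must have the same shape.")

    union = np.logical_or(im1, im2).sum()
    if union == 0:
        # Both inputs are all False, so the ratio would be 0 / 0.
        return float(empty_value)

    intersection = np.logical_and(im1, im2).sum()
    return float(intersection) / float(union)


# Two small 2-D masks; non-boolean input is converted to bool.
a = np.array([[1, 1, 0],
              [0, 1, 0]])
b = np.array([[1, 0, 0],
              [0, 1, 1]])

# 2 positions are set in both masks, 4 are set in at least one: 2 / 4.
score = jaccard_safe(a, b)

# The Jaccard distance is the complement of the coefficient.
distance = 1.0 - score
```

With these inputs the coefficient is 0.5, and swapping the arguments gives the same value.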