Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Hellinger distance for discrete probability distributions in Python
"""
Three ways of computing the Hellinger distance between two discrete
probability distributions using NumPy and SciPy.
"""
import numpy as np
from scipy.linalg import norm
from scipy.spatial.distance import euclidean
_SQRT2 = np.sqrt(2) # sqrt(2) with default precision np.float64
def hellinger1(p, q):
return norm(np.sqrt(p) - np.sqrt(q)) / _SQRT2
def hellinger2(p, q):
return euclidean(np.sqrt(p), np.sqrt(q)) / _SQRT2
def hellinger3(p, q):
return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / _SQRT2
@EvgeniDubov
Copy link

EvgeniDubov commented Jul 9, 2018

In case anyone is interested, I've implemented Hellinger Distance in Cython as a split criterion for sklearn DecisionTreeClassifier and RandomForestClassifier.
It performs great in my use cases of imbalanced data classification, beats RandomForestClassifier with gini and XGBClassifier.
You are welcome to check it out on https://github.com/EvgeniDubov/hellinger-distance-criterion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment