Skip to content

Instantly share code, notes, and snippets.

# larsmans/hellinger.py

Created Jul 15, 2012
Hellinger distance for discrete probability distributions in Python
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
 """ Three ways of computing the Hellinger distance between two discrete probability distributions using NumPy and SciPy. """ import numpy as np from scipy.linalg import norm from scipy.spatial.distance import euclidean _SQRT2 = np.sqrt(2) # sqrt(2) with default precision np.float64 def hellinger1(p, q): return norm(np.sqrt(p) - np.sqrt(q)) / _SQRT2 def hellinger2(p, q): return euclidean(np.sqrt(p), np.sqrt(q)) / _SQRT2 def hellinger3(p, q): return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / _SQRT2

### EvgeniDubov commented Jul 9, 2018

In case anyone is interested, I've implemented Hellinger Distance in Cython as a split criterion for sklearn DecisionTreeClassifier and RandomForestClassifier.
It performs great in my use cases of imbalanced data classification, beats RandomForestClassifier with gini and XGBClassifier.
You are welcome to check it out on https://github.com/EvgeniDubov/hellinger-distance-criterion

to join this conversation on GitHub. Already have an account? Sign in to comment