# larsmans/hellinger.py

Created Jul 15, 2012
Hellinger distance for discrete probability distributions in Python
```python
"""
Three ways of computing the Hellinger distance between two discrete
probability distributions using NumPy and SciPy.
"""

import numpy as np
from scipy.linalg import norm
from scipy.spatial.distance import euclidean


_SQRT2 = np.sqrt(2)     # sqrt(2) with default precision np.float64


def hellinger1(p, q):
    # L2 norm of the difference of square roots, via scipy.linalg.norm
    return norm(np.sqrt(p) - np.sqrt(q)) / _SQRT2


def hellinger2(p, q):
    # Same quantity, via SciPy's Euclidean distance between two vectors
    return euclidean(np.sqrt(p), np.sqrt(q)) / _SQRT2


def hellinger3(p, q):
    # Same quantity, written out explicitly with NumPy
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / _SQRT2
```
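All three functions compute H(P, Q) = ||sqrt(P) - sqrt(Q)||_2 / sqrt(2). When p and q each sum to 1, expanding the square gives the equivalent form H(P, Q) = sqrt(1 - BC(P, Q)), where BC(P, Q) = sum_i sqrt(p_i * q_i) is the Bhattacharyya coefficient. The sketch below is a fourth variant based on that identity, not part of the original gist; the name `hellinger_bc` is hypothetical, and it assumes normalized, non-negative inputs.

```python
import numpy as np


def hellinger_bc(p, q):
    """Hellinger distance via the Bhattacharyya coefficient.

    Assumes p and q are 1-D arrays of non-negative weights that each sum to 1.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q))                 # Bhattacharyya coefficient, in [0, 1]
    return float(np.sqrt(max(0.0, 1.0 - bc)))   # clip tiny negative rounding error


# Identical distributions are at distance 0, disjoint ones at distance 1.
print(hellinger_bc([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(hellinger_bc([1, 0], [0, 1]))          # 1.0

# Agrees with the scaled L2-norm form used in the gist.
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])
l2 = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)
print(np.isclose(hellinger_bc(p, q), l2))    # True
```

The clip matters for (nearly) identical inputs, where rounding can push 1 - BC slightly below zero and `np.sqrt` would otherwise return `nan`.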

### EvgeniDubov commented Jul 9, 2018

In case anyone is interested, I've implemented the Hellinger distance in Cython as a split criterion for sklearn's DecisionTreeClassifier and RandomForestClassifier.
It performs well in my use cases of imbalanced-data classification, where it beats both RandomForestClassifier with the gini criterion and XGBClassifier.
You are welcome to check it out at https://github.com/EvgeniDubov/hellinger-distance-criterion
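To illustrate what a Hellinger split criterion measures, here is a minimal pure-NumPy sketch. It is an assumption, not code from the linked repository: it uses one common formulation for binary labels (0/1) and binary splits, comparing how the positive class and the negative class are each distributed over the two child nodes. The names `hellinger_split_score` and `best_threshold` are hypothetical. Unlike the functions in the gist, it omits the 1/sqrt(2) factor, so a perfect split scores sqrt(2).

```python
import numpy as np


def hellinger_split_score(y_left, y_right):
    """Hypothetical helper: score a binary split of 0/1 labels (higher is better)."""
    y_left = np.asarray(y_left)
    y_right = np.asarray(y_right)
    # Count each class in the (left, right) children
    pos = np.array([np.sum(y_left == 1), np.sum(y_right == 1)], dtype=float)
    neg = np.array([np.sum(y_left == 0), np.sum(y_right == 0)], dtype=float)
    if pos.sum() == 0 or neg.sum() == 0:
        return 0.0  # only one class present: nothing to separate
    p = pos / pos.sum()  # how positives are distributed over (left, right)
    q = neg / neg.sum()  # how negatives are distributed over (left, right)
    # Hellinger-style distance between the two distributions, without 1/sqrt(2)
    return float(np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def best_threshold(x, y):
    """Hypothetical helper: pick the threshold on feature x with the highest score."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    best_t, best_score = None, -1.0
    for t in np.unique(x)[:-1]:  # split x <= t vs x > t; both sides non-empty
        score = hellinger_split_score(y[x <= t], y[x > t])
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# A heavily imbalanced toy feature: the single positive sits at the top.
print(best_threshold([1, 2, 3, 4, 5, 6], [0, 0, 0, 0, 0, 1]))
```

Because each class's counts are normalized separately, the score does not depend on the overall class ratio (a perfect split of 1 positive vs. 100 negatives still scores sqrt(2)), which is one reason this kind of criterion is often suggested for imbalanced data.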

