Skip to content

Instantly share code, notes, and snippets.

@brentp
Created September 8, 2010 21:45
Show Gist options
  • Star 6 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save brentp/570896 to your computer and use it in GitHub Desktop.
Save brentp/570896 to your computer and use it in GitHub Desktop.
import numpy as np
from scipy.stats import chisqprob, chisquare
def gtest(f_obs, f_exp=None, ddof=0):
"""
http://en.wikipedia.org/wiki/G-test
The G test can test for goodness of fit to a distribution
Parameters
----------
f_obs : array
observed frequencies in each category
f_exp : array, optional
expected frequencies in each category. By default the categories are
assumed to be equally likely.
ddof : int, optional
adjustment to the degrees of freedom for the p-value
Returns
-------
chisquare statistic : float
The chisquare test statistic
p : float
The p-value of the test.
Notes
-----
The p-value indicates the probability that the observed distribution is
drawn from a distribution given frequencies in expected.
So a low p-value inidcates the distributions are different.
Examples
--------
>>> gtest([9.0, 8.1, 2, 1, 0.1, 20.0], [10, 5.01, 6, 4, 2, 1])
(117.94955444335938, 8.5298516190930345e-24)
>>> gtest([1.01, 1.01, 4.01], [1.00, 1.00, 4.00])
(0.060224734246730804, 0.97033649350189344)
>>> gtest([2, 1, 6], [4, 3, 2])
(8.2135343551635742, 0.016460903780063787)
References
----------
http://en.wikipedia.org/wiki/G-test
"""
f_obs = np.asarray(f_obs, 'f')
k = f_obs.shape[0]
f_exp = np.array([np.sum(f_obs, axis=0) / float(k)] * k, 'f') \
if f_exp is None \
else np.asarray(f_exp, 'f')
g = 2 * np.add.reduce(f_obs * np.log(f_obs / f_exp))
return g, chisqprob(g, k - 1 - ddof)
if __name__ == "__main__":
import doctest
doctest.testmod()
@IgorFobia
Copy link

Thank you for this script. However, it works only if each category has the same sample size. I hope I'll make a version for contingency tables.

@wiso
Copy link

wiso commented Jan 25, 2016

the sum should be done only for non-empty cells

@user799595
Copy link

This is now possible with scipy:

scipy.stats.power_divergence(f_obs, f_exp, lambda_="log-likelihood")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment