@StefRe
Last active November 1, 2023 20:23
Testing Coefficients of Variation from multiple samples
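import numpy as np
from scipy import stats


def feltz_miller(*samples):
    """Feltz and Miller asymptotic test for equal coefficients of variation."""
    k = len(samples)  # number of samples
    # degrees of freedom of each sample
    m_j = np.array([len(sample) - 1 for sample in samples])
    # coefficient of variation of each sample (sample standard deviation / mean)
    cv_j = np.array([np.std(sample, ddof=1) / np.mean(sample) for sample in samples])
    # pooled coefficient of variation, weighted by degrees of freedom
    d = np.sum(m_j * cv_j) / np.sum(m_j)
    # test statistic, asymptotically chi-squared with k - 1 degrees of freedom
    d_ad = np.sum(m_j * (cv_j - d) ** 2) / (d**2 * (0.5 + d**2))
    return d_ad, stats.chi2.sf(d_ad, k - 1)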

StefRe commented Nov 1, 2023

This is a straightforward translation of Ben Marwick's R code into Python. For details see the cvequality vignette.
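
For reference, the quantities computed by `feltz_miller` can be written out as below. This is only a transcription of the Python code in this gist, not a quote from the vignette; the symbols $n_j$, $s_j$ and $\bar{x}_j$ (size, sample standard deviation and mean of sample $j$) are names picked here for illustration.

```latex
m_j = n_j - 1, \qquad
\widehat{cv}_j = \frac{s_j}{\bar{x}_j}, \qquad
D = \frac{\sum_{j=1}^{k} m_j\,\widehat{cv}_j}{\sum_{j=1}^{k} m_j}

D_{AD} = \frac{\sum_{j=1}^{k} m_j\,(\widehat{cv}_j - D)^2}{D^2\,(0.5 + D^2)},
\qquad
p = P\!\left(\chi^2_{k-1} > D_{AD}\right)
```

Here $D$ corresponds to `d`, $D_{AD}$ to `d_ad`, and the p-value to `chi2.sf(d_ad, k - 1)`.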

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/benmarwick/cvequality/master/vignettes/GaltonFamilies.csv")
male = df.loc[df.gender.eq("male"), "childHeight"]
female = df.loc[df.gender.eq("female"), "childHeight"]
d_ad, p = feltz_miller(male, female)
print(d_ad, p)  # 0.44163996742937056 0.5063319732170684

df = pd.read_csv("https://raw.githubusercontent.com/benmarwick/cvequality/master/vignettes/Handaxes.csv")
d_ad, p = feltz_miller(df.L, df.L1, df.B, df.B1, df.B2, df["T"], df.T1)
print(d_ad, p)  # 309.191946847004 8.766100959546317e-64
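
If the CSV files are not reachable, a quick offline sanity check might look like the sketch below. It is not part of the original comment: the data are made up, the function is repeated only so the block runs on its own, and the variable names (`base`, `narrow`, `wide`, ...) are hypothetical. It relies on the fact that rescaling a sample does not change its coefficient of variation.

```python
import numpy as np
from scipy import stats


def feltz_miller(*samples):
    # Same computation as the gist's function, repeated so this sketch is self-contained.
    k = len(samples)
    m_j = np.array([len(sample) - 1 for sample in samples])
    cv_j = np.array([np.std(sample, ddof=1) / np.mean(sample) for sample in samples])
    d = np.sum(m_j * cv_j) / np.sum(m_j)
    d_ad = np.sum(m_j * (cv_j - d) ** 2) / (d**2 * (0.5 + d**2))
    return d_ad, stats.chi2.sf(d_ad, k - 1)


# Made-up data: rescaled copies of one sample share the same CV,
# so the statistic should be numerically zero and the p-value close to 1.
base = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
stat_equal, p_equal = feltz_miller(base, 10 * base, 0.5 * base)

# Made-up data: same mean but very different spread (CV about 0.05 vs. 0.20),
# so the test should reject equality of the CVs.
rng = np.random.default_rng(0)
narrow = rng.normal(100, 5, size=500)
wide = rng.normal(100, 20, size=500)
stat_diff, p_diff = feltz_miller(narrow, wide)

print(stat_equal, p_equal)
print(stat_diff, p_diff)
```

The first call should print a statistic of essentially zero with a p-value of about 1, the second a large statistic with a very small p-value; the exact numbers for the random samples depend on the NumPy version and seed.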
