@timm
Created May 22, 2013 20:07
Python version of non-parametric hypothesis testing using Vargha and Delaney's A12 statistic.
class Rx:
    "has the nums of a treatment, its name and rank"
    def __init__(i,lst):
        i.rx, i.lst = lst[0], lst[1:]
        # float() keeps the mean exact under Python 2 integer division
        i.mean = sum(i.lst)/float(len(i.lst))
        i.rank = 0
    def __repr__(i):
        return 'rank #%s %s at %s'%(i.rank,i.rx,i.mean)

def a12s(lst,rev=True,enough=0.66):
    "sees if lst[i+1] has rank higher than lst[i]"
    lst = [Rx(one) for one in lst]
    # rev=True puts the largest mean first; rev=False the smallest
    lst = sorted(lst,key=lambda x:x.mean,reverse=rev)
    one = lst[0]
    rank = one.rank = 1
    for two in lst[1:]:
        # only bump the rank when neighbours differ by more than "enough"
        if a12(one.lst,two.lst,rev) > enough: rank += 1
        two.rank = rank
        one = two
    return lst

def a12(lst1,lst2,rev=True):
    "how often is x in lst1 more than y in lst2?"
    more = same = 0.0
    for x in lst1:
        for y in lst2:
            if x==y : same += 1
            elif rev and x > y : more += 1
            elif not rev and x < y : more += 1
    return (more + 0.5*same) / (len(lst1)*len(lst2))

def fromFile(f="a12.dat",rev=True,enough=0.66):
    "utility for reading sample data from disk"
    import re
    cache = {}
    num, space = r'^\+?-?[0-9]', r'[ \t\n]+'
    with open(f) as lines:
        for line in lines:
            line = line.strip()
            if line:
                for word in re.split(space,line):
                    # match the whole word (not just word[0]) so that
                    # signed numbers such as -3.2 are read as numbers
                    if re.match(num,word):
                        cache[now] += [float(word)]
                    else:
                        now = word
                        cache[now] = [now]
    return a12s(cache.values(),rev,enough)

timm commented May 22, 2013

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; The A12 non-parametric test
; Tim Menzies, (c) 2013, tim@menzies.us
; (c) http://creativecommons.org/licenses/by/3.0/
;
; Vargha and Delaney's A12 statistic is a non-parametric effect
; size measure. Reference: A. Vargha and H. D. Delaney. A critique
; and improvement of the CL common language effect size statistics of
; McGraw and Wong. Journal of Educational and Behavioral Statistics,
; 25(2):101-132, 2000.
;
; Given a performance measure M seen in m measures of X and n measures
; of Y, the A12 statistic measures the probability that running
; algorithm X yields higher M values than running another algorithm Y.
;
; A12 = #(X > Y)/mn + 0.5*#(X=Y)/mn
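;
; As a worked check of the formula above, here is a minimal sketch that
; counts the wins and ties over every (x, y) pair. The helper name
; a12_pairs and the sample numbers are made up for illustration; they
; are not part of the gist.

```python
# Sketch only: count wins and ties over all m*n pairs, then normalise.
def a12_pairs(xs, ys):
    "return (wins, ties, A12) for samples xs and ys (hypothetical helper)"
    wins = sum(1 for x in xs for y in ys if x > y)    # #(X > Y)
    ties = sum(1 for x in xs for y in ys if x == y)   # #(X = Y)
    a12 = (wins + 0.5 * ties) / (len(xs) * len(ys))   # divide by mn
    return wins, ties, a12

xs = [3, 4, 5, 6]   # illustrative data, m = 4
ys = [1, 2, 3, 4]   # illustrative data, n = 4
print(a12_pairs(xs, ys))   # (13, 2, 0.875)
```

; Of the 16 pairs, 13 are wins and 2 are ties, so A12 = (13 + 1)/16.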
;
; Vargha and Delaney define small, medium, and large differences
; between two populations as follows:
;
; + Big is A12 over 0.71
; + Medium is A12 over 0.64
; + Small is A12 over 0.56
;
; In my view, this seems gratuitously different to...
;
; + Big is A12 over three-quarters (0.75)
; + Medium is A12 over two-thirds (0.66)
; + Small is A12 over half (0.5)
;
; Whatever, the code parameterizes that magic number (the "enough"
; argument) so you can use the standard values if you want to.
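;
; To see how the two sets of cut-offs compare, here is a rough sketch
; that names the effect size for a given A12 value. The names effect,
; STANDARD and ROUNDED, and the "negligible" label, are assumptions
; made for this example; the gist itself only takes a single "enough"
; threshold.

```python
# Hypothetical helper, not part of the gist.
STANDARD = (("big", 0.71), ("medium", 0.64), ("small", 0.56))  # Vargha and Delaney
ROUNDED  = (("big", 0.75), ("medium", 0.66), ("small", 0.50))  # the simpler cut-offs

def effect(a12, cuts=STANDARD):
    "return the first label whose cut-off a12 exceeds, else 'negligible'"
    for label, cut in cuts:
        if a12 > cut:
            return label
    return "negligible"   # assumed name for 'below every cut-off'

print(effect(0.73))            # big
print(effect(0.73, ROUNDED))   # medium
print(effect(0.70))            # medium
print(effect(0.52))            # negligible
```

; Like the gist, this only looks in one direction: an A12 well below
; 0.5 means the second population tends to win, which this sketch
; does not report.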
;
; While A12 studies two treatments, LA12 (a12s in the Python code)
; handles multiple treatments. Samples from each population are sorted
; by their mean. Then b4= sample[i] and after= sample[i+1] and
; rank(after) = 1+rank(b4) if a12 reports that the two populations are
; different; otherwise rank(after) = rank(b4).
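;
; The ranking rule can be traced on a tiny example. This is a compact
; sketch, assuming larger values are better (rev=True in the gist);
; the function name rank, the dict input and the demo numbers are
; invented for illustration.

```python
def a12(xs, ys):
    "how often is x in xs more than y in ys? (ties count half)"
    more = sum(1.0 for x in xs for y in ys if x > y)
    same = sum(1.0 for x in xs for y in ys if x == y)
    return (more + 0.5 * same) / (len(xs) * len(ys))

def rank(populations, enough=0.66):
    "populations: dict of name -> numbers; returns [(rank, name)], best mean first"
    ordered = sorted(populations.items(),
                     key=lambda kv: sum(kv[1]) / float(len(kv[1])),
                     reverse=True)
    out, r = [], 1
    for i, (name, nums) in enumerate(ordered):
        # compare each population with the one just before it
        if i > 0 and a12(ordered[i - 1][1], nums) > enough:
            r += 1
        out.append((r, name))
    return out

demo = {"a": [10, 11, 12], "b": [10, 11, 13], "c": [1, 2, 3]}
print(rank(demo))   # [(1, 'b'), (1, 'a'), (2, 'c')]
```

; Here "b" and "a" share rank 1 because a12(b, a) = 5/9, which is under
; 0.66, while "c" drops to rank 2 because a12(a, c) = 1.0.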
;
; To simplify that process, I offer the following syntax. A population
; is a list of numbers, which may be unsorted, and starts with some
; symbol or string describing the population. A12s expects a list of
; such populations. For examples of that syntax, see the following use cases:

from a12 import *

rxs = [["x1", 0.34, 0.49, 0.51, 0.60],
       ["x2", 0.9,  0.7,  0.8,  0.60],
       ["x3", 0.15, 0.25, 0.4,  0.35],
       ["x4", 0.6,  0.7,  0.8,  0.90],
       ["x5", 0.1,  0.2,  0.3,  0.40]]
for rx in a12s(rxs,rev=False,enough=0.75): print(rx)

print("")
rxs = [["y1", 101, 100, 99,   101, 99.5],
       ["y2", 101, 100, 99,   101, 100.0],
       ["y3", 101, 100, 99.5, 101, 99.0],
       ["y4", 101, 100, 99,   101, 100.0]]
for rx in a12s(rxs): print(rx)


timm commented May 22, 2013

; Also, "fromFile" reads the populations from a file (by default, "a12.dat"). For example, if the file contains this...

x1 0.34 0.49 0.51 0.60
x2 0.9 0.7 0.8 0.60
x3 0.15 0.25 0.4 0.35
x4 0.6 0.7 0.8 0.90
x5 0.1 0.2 0.3 0.40

; then this call will print the stats:
; for rx in fromFile(): print(rx)
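;
; For readers who want to check the file format without touching the
; disk, here is a small stand-in parser. It is a sketch, not the gist's
; fromFile: the names parse and is_num are hypothetical, it tests for
; numbers with float() rather than a regular expression, and it returns
; the populations as plain lists instead of ranking them.

```python
import io

def is_num(word):
    "true if float() accepts the word"
    try:
        float(word)
        return True
    except ValueError:
        return False

def parse(lines):
    "return [[name, n1, n2, ...], ...]; a non-number starts a new population"
    rows = []
    for line in lines:
        for word in line.split():
            if is_num(word) and rows:
                rows[-1].append(float(word))   # number: add to current population
            else:
                rows.append([word])            # name: start a new population
    return rows

# an in-memory file; note that x2's numbers continue on the next line
sample = io.StringIO(u"x1 0.34 0.49 0.51 0.60\nx2 0.9 0.7\n0.8 0.60\n")
for row in parse(sample):
    print(row)
```

; The rows this returns have the same shape as the rxs lists in the
; use cases, so they could be passed to a12s directly.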
