Skip to content

Instantly share code, notes, and snippets.

@timm
Last active March 16, 2018 14:31
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save timm/a6e759eb7d9b5f05b468 to your computer and use it in GitHub Desktop.
Save timm/a6e759eb7d9b5f05b468 to your computer and use it in GitHub Desktop.
Fast method for computing Cliffs Delta
from __future__ import division
def cliffsDelta(lst1,lst2,
dull = [0.147, # small
0.33, # medium
0.474 # large
][0] ):
"Returns true if there are more than 'dull' differences"
m, n = len(lst1), len(lst2)
lst2 = sorted(lst2)
j = more = less = 0
for repeats,x in runs(sorted(lst1)):
while j <= (n - 1) and lst2[j] < x:
j += 1
more += j*repeats
while j <= (n - 1) and lst2[j] == x:
j += 1
less += (n - j)*repeats
d= (more - less) / (m*n)
return abs(d) > dull
def runs(lst):
"Iterator, chunks repeated values"
for j,two in enumerate(lst):
if j == 0:
one,i = two,0
if one!=two:
yield j - i,one
i = j
one=two
yield j - i + 1,two
def _cliffsDelta():
"demo function"
lst1=[1,2,3,4,5,6,7]
for r in [1.01,1.1,1.21, 1.5, 2]:
lst2=map(lambda x: x*r,lst1)
print r,cliffsDelta(lst1,lst2) # should return False
@timm
Copy link
Author

timm commented May 8, 2015

If the above is fast way, the slow way is:

def cliffsDelta(lst1,lst2,
                dull = [0.147, # small
                        0.33,  # medium
                        0.474 # large
                        ][0] ): 
  n= gt = lt = 0.0
  for x in lst1:
    for y in lst2:
      n += 1
      if x > y:  gt += 1
      if x < y:  lt += 1
  return abs(lt - gt)/n > dull

This is slower than the above code since, without the sorting and the runs stuff, it takes time polynomial time to explore the lists.

@neilernst
Copy link

I added some tests and turned it into Py 3 over here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment