Skip to content

Instantly share code, notes, and snippets.

@CanoeFZH
Last active December 11, 2017 08:28
Show Gist options
  • Save CanoeFZH/39a32d463cb562723b4524f45c5b56da to your computer and use it in GitHub Desktop.
Save CanoeFZH/39a32d463cb562723b4524f45c5b56da to your computer and use it in GitHub Desktop.
mmr demo
import collections
'''
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1330&context=compsci
https://www.quora.com/Where-can-I-find-a-maximum-marginal-relevance-algorithm-in-Python-for-redundancy-removal-in-two-documents
'''
def argmax(keys, f):
return max(keys, key=f)
def mmr_sorted(docs, lambda_,):
selected = collections.OrderedDict()
while set(selected) != docs:
remaining = docs - set(selected)
mmr_score = lambda x: lambda_* len(x) - (1-lambda_)*max([len(set(x) & set(y)) for y in set(selected)-{x}] or [0])
next_selected = argmax(remaining, mmr_score)
selected[next_selected] = len(selected)
return selected
docs = set(["abc", "ab", "bc", "cd", "abcd", "efg", "higklm", "ll"])
In [8]: mmr_sorted(docs, 0.1)
Out[8]:
OrderedDict([('higklm', 0),
('abcd', 1),
('efg', 2),
('ll', 3),
('cd', 4),
('ab', 5),
('bc', 6),
('abc', 7)])
In [9]: mmr_sorted(docs, 1)
Out[9]:
OrderedDict([('higklm', 0),
('abcd', 1),
('abc', 2),
('efg', 3),
('ll', 4),
('cd', 5),
('ab', 6),
('bc', 7)])
In [10]: mmr_sorted(docs, 0)
Out[10]:
OrderedDict([('abcd', 0),
('ll', 1),
('efg', 2),
('higklm', 3),
('cd', 4),
('ab', 5),
('bc', 6),
('abc', 7)])
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment