@Noleli
Created April 9, 2012 15:00
import math

# For each word in the post, look up its entries in the BNC word list and
# record the difference in log frequency. `post` and `bnc` are assumed to be
# lists of dicts defined elsewhere: `post` entries carry 'word' and 'logf',
# `bnc` entries carry 'word' and 'count'.
overlaps = []
for postword in post:
    wordmatches = []
    for bncword in bnc:
        if postword['word'] == bncword['word']:
            bnclogf = math.log(float(bncword['count']))
            wordmatches.append({'word': postword['word'],
                                'postlogf': postword['logf'],
                                'bnclogf': bnclogf,
                                'diff': float(postword['logf']) - bnclogf
                                })
    # find the max of duplicates: keep only the match with the highest BNC log frequency
    if wordmatches:
        overlaps.append(max(wordmatches, key=lambda match: match['bnclogf']))