
@andyreagan
Created March 9, 2015 18:39
Core trie code in the new labMTsimple
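# self.data is a tuple of 2 items:
# the first is a trie for the non-regex (fixed) words,
# and the second is a trie for the regex (stem) words.
# Matching on prefix mimics the [a-z]* match for the stems.
# Also, this was pulled from an internal function, so
# wordDict is a hash (dict) of "word": count
# for the corpus being scored.
totalcount = 0
totalscore = 0.0
# .items() works on both Python 2 and 3 (.iteritems() is Python 2 only)
for word, count in wordDict.items():
    if word in self.data[0]:
        # exact match on a fixed word; the stored value's first
        # element is that word's score
        totalcount += count
        totalscore += count * self.data[0][word][0]
    else:
        # (prefix, value) pairs for every stem that is a prefix of this word
        matches = self.data[1].prefix_items(word)
        if matches:
            totalcount += count
            totalscore += count * matches[0][1][0]
# assumption: a corpus with no matched words scores 0.0
# instead of raising ZeroDivisionError
if totalcount == 0:
    return 0.0
return totalscore / totalcount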
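The two-trie lookup above can be tried out without a third-party trie library. The following is a minimal sketch, assuming a small pure-Python trie (`SimpleTrie`, hypothetical) that mimics only the calls the excerpt relies on (`in`, `[]`, `prefixes`, `prefix_items`), with `prefix_items` returning matches shortest first. The function names `score` and `stem_matches_regex`, the toy words, and their scores are illustrative only, not labMT values, and returning 0.0 when nothing matches is an assumption. `stem_matches_regex` checks the claim that prefix matching mimics `[a-z]*`.

```python
import re


class SimpleTrie:
    """Hypothetical stand-in trie: nested dicts keyed by character."""

    _END = object()  # sentinel key marking the end of a stored word

    def __init__(self, items=()):
        self._root = {}
        for key, value in items:
            self[key] = value

    def __setitem__(self, key, value):
        node = self._root
        for ch in key:
            node = node.setdefault(ch, {})
        node[self._END] = value

    def _find(self, key):
        # walk the trie along key; None if the path does not exist
        node = self._root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def __contains__(self, key):
        node = self._find(key)
        return node is not None and self._END in node

    def __getitem__(self, key):
        node = self._find(key)
        if node is None or self._END not in node:
            raise KeyError(key)
        return node[self._END]

    def prefix_items(self, key):
        """(prefix, value) for every stored key that is a prefix of key, shortest first."""
        out = []
        node = self._root
        if self._END in node:
            out.append(("", node[self._END]))
        for i, ch in enumerate(key):
            node = node.get(ch)
            if node is None:
                break
            if self._END in node:
                out.append((key[: i + 1], node[self._END]))
        return out

    def prefixes(self, key):
        return [prefix for prefix, _ in self.prefix_items(key)]


def score(data, word_counts):
    """Count-weighted mean score; data is (fixed_trie, stem_trie)."""
    fixed, stems = data
    total_count = 0
    total_score = 0.0
    for word, count in word_counts.items():
        if word in fixed:
            total_count += count
            total_score += count * fixed[word][0]
        else:
            matches = stems.prefix_items(word)
            if matches:
                total_count += count
                total_score += count * matches[0][1][0]
    # assumption: nothing matched -> 0.0 rather than ZeroDivisionError
    return total_score / total_count if total_count else 0.0


def stem_matches_regex(stem, word):
    """Regex form of a stem entry: the stem followed by [a-z]*."""
    return re.fullmatch(re.escape(stem) + "[a-z]*", word) is not None


# toy values for illustration only, not real labMT scores
fixed = SimpleTrie([("happy", (8.0,)), ("sad", (2.0,))])
stems = SimpleTrie([("laugh", (7.0,))])  # stem entry: laugh, laughing, ...
data = (fixed, stems)
counts = {"happy": 2, "sad": 1, "laughing": 1, "table": 5}

print(score(data, counts))  # → 6.25 ("table" matches nothing and is skipped)
```

Here `(2*8.0 + 1*2.0 + 1*7.0) / 4 = 6.25`. The prefix test and the regex agree for tokens made only of lowercase letters, but a prefix match also accepts a token like `laugh-out`, which the `[a-z]*` regex rejects, so the equivalence depends on the tokenizer.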