@fjavieralba
Created March 23, 2012 10:51
[PYTHON] Non-overlapping tagging of a sentence based on a dictionary of expressions
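def non_overlapping_tagging(sentence, expressions, max_key_size):
    """
    Tag a tokenized sentence with the expressions found in `expressions`.

    The result is only one of all the possible taggings. It is determined
    by these two priority rules:
    - search is made from left to right, so a match starting earlier wins;
    - among the matches starting at the same token, the longest one wins.

    `max_key_size` is the maximum number of tokens in an expression;
    pass -1 to allow matches spanning the whole sentence.
    """
    tag_sentence = []
    N = len(sentence)
    if max_key_size == -1:
        max_key_size = N
    i = 0
    while i < N:
        tagged = False
        j = min(i + max_key_size, N)  # avoid running past the end of the sentence
        # Try the longest candidate span first, then shrink it one token at a time.
        while j > i:
            literal = " ".join(sentence[i:j])
            if literal in expressions:
                tag_sentence.append("[%s]" % literal)
                i = j  # continue right after the matched expression
                tagged = True
                break
            j -= 1
        if not tagged:
            # No expression starts at token i: keep it untagged.
            tag_sentence.append(sentence[i])
            i += 1
    return tag_sentence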

expressions = {
    'Los Angeles': 1,
    'Lakers': 1,
    'Los Angeles Lakers': 1,
    #'Angeles Lakers': 1,
    'Washington': 1,
    'State': 1,
    'Washington State University': 1
}
sentence = ["Los", "Angeles", "Lakers", "visited", "Washington", "State", "last", "week"]
print(non_overlapping_tagging(sentence, expressions, -1))
# ['[Los Angeles Lakers]', 'visited', '[Washington]', '[State]', 'last', 'week']
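The two priority rules can be checked on small dictionaries. The sketch below is a hypothetical, self-contained variant (the name `tag_longest_left_to_right` is not part of the original gist): it assumes the window size can be derived from the longest key, so it drops the `max_key_size` argument, and it uses Python's `for ... else` to handle the "no expression starts here" case. The second call mirrors the commented-out `'Angeles Lakers'` entry to show the left-to-right rule.

```python
def tag_longest_left_to_right(tokens, expressions):
    """Hypothetical variant of non_overlapping_tagging (not the original code).

    Assumption: the largest window is the token length of the longest key,
    so no max_key_size argument is needed.
    """
    max_len = max((len(key.split()) for key in expressions), default=0)
    tagged, i = [], 0
    while i < len(tokens):
        # Try the longest window starting at token i first, then shrink it.
        for j in range(min(i + max_len, len(tokens)), i, -1):
            literal = " ".join(tokens[i:j])
            if literal in expressions:
                tagged.append("[%s]" % literal)
                i = j  # resume right after the match
                break
        else:
            # No expression starts at token i: keep it untagged.
            tagged.append(tokens[i])
            i += 1
    return tagged


sentence = ["Los", "Angeles", "Lakers", "visited", "Washington", "State", "last", "week"]

# Longest match wins at a given start: "Los Angeles Lakers" beats "Los Angeles".
print(tag_longest_left_to_right(sentence, {"Los Angeles", "Lakers", "Los Angeles Lakers"}))
# ['[Los Angeles Lakers]', 'visited', 'Washington', 'State', 'last', 'week']

# Left to right: "Los Angeles" starts earlier, so "Angeles Lakers" never matches.
print(tag_longest_left_to_right(sentence, {"Los Angeles", "Angeles Lakers", "Lakers"}))
# ['[Los Angeles]', '[Lakers]', 'visited', 'Washington', 'State', 'last', 'week']
```

Because the scan is greedy, "longest" only applies among matches that start at the same token: once an earlier match covers a token, no expression starting at that token is considered, even a longer one.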