Skip to content

Instantly share code, notes, and snippets.

@dbrgn
Created August 18, 2011 12:55
Show Gist options
  • Star 9 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save dbrgn/1154006 to your computer and use it in GitHub Desktop.
Save dbrgn/1154006 to your computer and use it in GitHub Desktop.
Boyer-Moore-Algorithm in Python
class last_occurrence(object):
"""Last occurrence functor."""
def __init__(self, pattern, alphabet):
"""Generate a dictionary with the last occurrence of each alphabet
letter inside the pattern.
Note: This function uses str.rfind, which already is a pattern
matching algorithm. There are more 'basic' ways to generate this
dictionary."""
self.occurrences = dict()
for letter in alphabet:
self.occurrences[letter] = pattern.rfind(letter)
def __call__(self, letter):
"""Return last position of the specified letter inside the pattern.
Return -1 if letter not found in pattern."""
return self.occurrences[letter]
def boyer_moore_match(text, pattern):
"""Find occurrence of pattern in text."""
alphabet = set(text)
last = last_occurrence(pattern, alphabet)
m = len(pattern)
n = len(text)
i = m - 1 # text index
j = m - 1 # pattern index
while i < n:
if text[i] == pattern[j]:
if j == 0:
return i
else:
i -= 1
j -= 1
else:
l = last(text[i])
i = i + m - min(j, 1+l)
j = m - 1
return -1
### TEST FUNCTION ###
if __name__ == '__main__':
def show_match(text, pattern):
print 'Text: %s' % text
p = boyer_moore_match(text, pattern)
print 'Match: %s%s' % ('.'*p, pattern)
text = 'abacaabadcabacabaabb'
pattern = 'abacab'
show_match(text, pattern)
print
text = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.'
pattern = 'dolor'
show_match(text, pattern)
show_match(text, pattern + 'e')
@dbrgn
Copy link
Author

dbrgn commented Aug 18, 2011

Output:

Text:  abacaabadcabacabaabb
Match: ..........abacab

Text:  Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
Match: ............dolor
Text:  Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
Match: .............................................................................................................dolore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment