Last active
July 15, 2016 16:46
-
-
Save awhit012/f78afc37714a4c97c789 to your computer and use it in GitHub Desktop.
Horspool string matching algorithm in Ruby
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#Horspool algorithm implements the good suffix rule. | |
# If the char is not present, it still skips ahead by the length of the pattern. | |
# If the char is not a match, but it is present in the string, it jumps forward by the appropriate ammount. | |
# pattern_length if char not in pattern is found | |
# function accepts a string and a pattern and returns the index in the string where | |
# the pattern first occurs, or "not found" | |
def brute_search_2 string, pattern | |
pattern_length = pattern.length | |
bad_match_table = Hash.new | |
# Generates hash table with keys as all chars in pattern, and values as the length the last occurence | |
# of the char is in the pattern from the end. | |
# So for "cake", the table becomes {"c"=>3, "a"=>2, "k"=>1} | |
# For "potato", the table becomes {"p"=>5, "o"=>4, "t"=>1, "a"=>2} | |
# The table is generated by iterating through the pattern and saving the char at the current index as the | |
# key, and setting the value as the pattern_length minus the current index minus 1. | |
for i in (0...pattern_length - 1) | |
bad_match_table[pattern[i]] = pattern_length - i - 1 | |
end | |
# Iterate through the string starting at string[pattern_length-1] | |
# we cant do a for loop, because we need to add pattern_length to string_index | |
# when the char is not in our bad_match_table hash | |
# therefore we must define string_index as pattern_length -1 and iterate up ourselves | |
string_index = pattern_length - 1 | |
while string_index < string.length | |
match_count = 0 | |
loop do | |
# if a non-match is found, then break. | |
break if string[string_index - match_count] != pattern[(pattern_length - 1 - match_count)] | |
# if it wasn't a non-match, it must be a match! | |
match_count += 1 | |
# if that match_count reaches the lenght of the pattern, you found the pattern! | |
# return the index in string where the pattern begins | |
return string_index - (pattern_length - 1) if match_count == pattern_length | |
end | |
if bad_match_table[string[string_index]] | |
string_index += bad_match_table[string[string_index]] | |
else | |
string_index += pattern_length | |
end | |
end | |
return "not found" | |
end | |
# tests | |
puts " " | |
puts "Brute Force Right to Left Test:" | |
puts " " | |
# match | |
test_string = "let them eat cake" | |
test_pattern = "cake" | |
should_be = 13 | |
result = brute_search_2 test_string, test_pattern | |
puts "Pattern at index #{result} " | |
raise "This is wrong" unless result == should_be | |
# no match | |
test_string2 = "let them eat cake" | |
test_pattern2 = "potato" | |
should_be2 = "not found" | |
result2 = brute_search_2 test_string2, test_pattern2 | |
puts "Pattern at index #{result2} " | |
raise "This is wrong" unless result == should_be |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Alex, thanks for this. Should your code at line 70 be:
raise "This is wrong" unless result2 == should_be2
?