Skip to content

Instantly share code, notes, and snippets.

@awhit012
Last active July 15, 2016 16:46
Show Gist options
  • Save awhit012/f78afc37714a4c97c789 to your computer and use it in GitHub Desktop.
Save awhit012/f78afc37714a4c97c789 to your computer and use it in GitHub Desktop.
Horspool string matching algorithm in Ruby
#Horspool algorithm implements the good suffix rule.
# If the char is not present, it still skips ahead by the length of the pattern.
# If the char is not a match, but it is present in the string, it jumps forward by the appropriate ammount.
# pattern_length if char not in pattern is found
# function accepts a string and a pattern and returns the index in the string where
# the pattern first occurs, or "not found"
def brute_search_2 string, pattern
pattern_length = pattern.length
bad_match_table = Hash.new
# Generates hash table with keys as all chars in pattern, and values as the length the last occurence
# of the char is in the pattern from the end.
# So for "cake", the table becomes {"c"=>3, "a"=>2, "k"=>1}
# For "potato", the table becomes {"p"=>5, "o"=>4, "t"=>1, "a"=>2}
# The table is generated by iterating through the pattern and saving the char at the current index as the
# key, and setting the value as the pattern_length minus the current index minus 1.
for i in (0...pattern_length - 1)
bad_match_table[pattern[i]] = pattern_length - i - 1
end
# Iterate through the string starting at string[pattern_length-1]
# we cant do a for loop, because we need to add pattern_length to string_index
# when the char is not in our bad_match_table hash
# therefore we must define string_index as pattern_length -1 and iterate up ourselves
string_index = pattern_length - 1
while string_index < string.length
match_count = 0
loop do
# if a non-match is found, then break.
break if string[string_index - match_count] != pattern[(pattern_length - 1 - match_count)]
# if it wasn't a non-match, it must be a match!
match_count += 1
# if that match_count reaches the lenght of the pattern, you found the pattern!
# return the index in string where the pattern begins
return string_index - (pattern_length - 1) if match_count == pattern_length
end
if bad_match_table[string[string_index]]
string_index += bad_match_table[string[string_index]]
else
string_index += pattern_length
end
end
return "not found"
end
# tests
puts " "
puts "Brute Force Right to Left Test:"
puts " "
# match
test_string = "let them eat cake"
test_pattern = "cake"
should_be = 13
result = brute_search_2 test_string, test_pattern
puts "Pattern at index #{result} "
raise "This is wrong" unless result == should_be
# no match
test_string2 = "let them eat cake"
test_pattern2 = "potato"
should_be2 = "not found"
result2 = brute_search_2 test_string2, test_pattern2
puts "Pattern at index #{result2} "
raise "This is wrong" unless result == should_be
@mnorelli
Copy link

Alex, thanks for this. Should your code at line 70 be:
raise "This is wrong" unless result2 == should_be2 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment