Last active
August 29, 2015 14:12
-
-
Save aaronjensen/f9599dab32d49c1702d6 to your computer and use it in GitHub Desktop.
Scoring algorithm for https://github.com/garybernhardt/selecta/issues/30
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# https://github.com/garybernhardt/selecta/issues/30 | |
def compute_match_length(choice, query) | |
# Check for exact match | |
return query.length if choice.index(query) | |
query_chars = query.chars | |
# Extract longest possible substring | |
first_index = choice.index(query_chars.first) | |
last_index = choice.rindex(query_chars.last) | |
# See if a match is even possible | |
return unless first_index && last_index && first_index < last_index | |
choice = choice[first_index..last_index] | |
# Keep track of lowest score for each index in our choice. | |
# It starts out at 1 for each of the 0th characters. | |
scores = {} | |
each_index_of_char(choice, query_chars[0]) do |index| | |
scores[index] = 1 | |
end | |
max = choice.length + 1 | |
query_chars.drop(1).each do |char| | |
return if scores.empty? | |
new_scores = Hash.new(max) | |
start = scores.keys.min + 1 | |
each_index_of_char(choice, char, start) do |char_index, boundary| | |
scores.each do |index, score| | |
next unless char_index > index | |
score_delta = boundary ? 1 : (char_index - index) | |
new_score = score + score_delta | |
new_scores[char_index] = new_score if new_score < new_scores[char_index] | |
end | |
end | |
scores = new_scores | |
end | |
scores.values.min | |
end | |
def each_index_of_char(str, char, start = 0) | |
index = str.index(char, start) | |
while index | |
prev = index - 1 | |
boundary = start > 0 && str.index(/\W/, prev) == prev | |
yield index, boundary | |
index = str.index(char, index + 1) | |
end | |
end |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
scoring non-matching | |
Baseline: | X-------| | |
Current: | X---- | | |
0 3.158 us | |
scoring matching exactly | |
Baseline: | X-----| | |
Current: | X | | |
0 32.625 us | |
scoring matching broken up | |
Baseline: | X- | | |
Current: | X-----| | |
0 499.000 us | |
scoring overlapping matches | |
Baseline: | X--| | |
Current: | X | | |
0 70.625 us | |
scoring almost overlapping matches | |
Baseline: | X | | |
Current: | X---| | |
0 1.567 ms | |
scoring paths, non-matching | |
Baseline: | X-----| | |
Current: | X---- | | |
0 11.801 ms | |
scoring paths, empty query | |
Baseline: | X-------| | |
Current: | X--------| | |
0 559.000 us | |
scoring paths, trivial query | |
Baseline: | X----| | |
Current: | X--- | | |
0 8.676 ms | |
search matching a large list of choices filtering with no matches | |
Baseline: | X--------| | |
Current: | X------- | | |
0 1.277 ms | |
search matching a large list of choices filtering with many matches | |
Baseline: | X------| | |
Current: | X- | | |
0 2.435 ms | |
search matching a large list of choices progressive query appending, sequential, few matches | |
Baseline: | X------| | |
Current: | X-- | | |
0 6.490 ms | |
search matching a large list of choices progressive query appending, sequential, many matches | |
Baseline: | X------ | | |
Current: | X---| | |
0 43.046 ms | |
search matching a large list of choices progressive query appending, non-sequential, few matches | |
Baseline: | X---| | |
Current: | X---- | | |
0 5.087 ms | |
search matching a large list of choices progressive query appending, non-sequential, many matches | |
Baseline: | X-- | | |
Current: | X--| | |
0 34.659 ms |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment