Skip to content

Instantly share code, notes, and snippets.

@tomstuart tomstuart/levenshtein.rb
Last active Jun 15, 2017

Embed
What would you like to do?
Approximate substring matching with Levenshtein distance
module Levenshtein
def self.substring_distance(needle, haystack)
distances = Array.new(haystack.length.succ, 0)
needle.each_char.with_index do |needle_char, needle_index|
next_distances = [needle_index.succ]
haystack.each_char.with_index do |haystack_char, haystack_index|
deletion, insertion, substitution =
distances[haystack_index.succ].succ,
next_distances[haystack_index].succ,
distances[haystack_index] + (needle_char == haystack_char ? 0 : 1)
next_distances.push([deletion, insertion, substitution].min)
end
distances = next_distances
end
distances.min
end
end
require 'levenshtein'
RSpec.describe 'Levenshtein substring edit distance' do
[
'Machu Picchu', 'Machu Picchu', 0,
'Machu Picchu', 'Where is Machu Picchu?', 0,
'Machu Picchu', 'Where is Machu Pichu?', 1,
'Machu Picchu', 'Where is Macchu Picchu?', 1,
'Stonehenge', 'How do I get to Stone Henge?', 2,
'', '', 0,
'', 'Machu Picchu', 0,
'u', 'Machu Picchu', 0,
'z', 'Machu Picchu', 1,
'Uluru', '', 5
].each_slice(3) do |needle, haystack, distance|
specify "the shortest distance between “#{needle}” and part of “#{haystack}” is #{distance}" do
expect(Levenshtein.substring_distance(needle, haystack)).to eq distance
end
end
end
@tomstuart

This comment has been minimized.

Copy link
Owner Author

commented Jun 15, 2017

Based on this information from https://en.wikipedia.org/wiki/Approximate_string_matching#Problem_formulation_and_algorithms:

Computing E(m, j) is very similar to computing the edit distance between two strings. In fact, we can use the Levenshtein distance computing algorithm for E(m, j), the only difference being that we must initialize the first row with zeros, and save the path of computation, that is, whether we used E(i − 1, j), E(i, j − 1) or E(i − 1, j − 1) in computing E(i, j).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.