Skip to content

Instantly share code, notes, and snippets.

@tomstuart tomstuart/levenshtein.rb
Last active Jun 15, 2017

What would you like to do?
Approximate substring matching with Levenshtein distance
module Levenshtein
def self.substring_distance(needle, haystack)
distances =, 0)
needle.each_char.with_index do |needle_char, needle_index|
next_distances = [needle_index.succ]
haystack.each_char.with_index do |haystack_char, haystack_index|
deletion, insertion, substitution =
distances[haystack_index] + (needle_char == haystack_char ? 0 : 1)
next_distances.push([deletion, insertion, substitution].min)
distances = next_distances
require 'levenshtein'
RSpec.describe 'Levenshtein substring edit distance' do
'Machu Picchu', 'Machu Picchu', 0,
'Machu Picchu', 'Where is Machu Picchu?', 0,
'Machu Picchu', 'Where is Machu Pichu?', 1,
'Machu Picchu', 'Where is Macchu Picchu?', 1,
'Stonehenge', 'How do I get to Stone Henge?', 2,
'', '', 0,
'', 'Machu Picchu', 0,
'u', 'Machu Picchu', 0,
'z', 'Machu Picchu', 1,
'Uluru', '', 5
].each_slice(3) do |needle, haystack, distance|
specify "the shortest distance between “#{needle}” and part of “#{haystack}” is #{distance}" do
expect(Levenshtein.substring_distance(needle, haystack)).to eq distance

This comment has been minimized.

Copy link
Owner Author

tomstuart commented Jun 15, 2017

Based on this information from

Computing E(m, j) is very similar to computing the edit distance between two strings. In fact, we can use the Levenshtein distance computing algorithm for E(m, j), the only difference being that we must initialize the first row with zeros, and save the path of computation, that is, whether we used E(i − 1, j), E(i, j − 1) or E(i − 1, j − 1) in computing E(i, j).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.