Skip to content

Instantly share code, notes, and snippets.

@plukevdh
Created November 7, 2012 03:13
Show Gist options
  • Save plukevdh/4029376 to your computer and use it in GitHub Desktop.
Save plukevdh/4029376 to your computer and use it in GitHub Desktop.
Implementations of the levenshtein algorithm
-module(levenshtein).
-export([distance/2]).
first_letter_check(OneLetter, TwoLetter) ->
if OneLetter =:= TwoLetter -> 0
; true -> 1
end.
distance(String1, String1) -> 0;
distance(String, "") -> string:len(String);
distance("", String) -> string:len(String);
distance([H1|Rest1], [H2|Rest2]) ->
Cost = first_letter_check(H1, H2),
lists:min([
distance(Rest1, [H2|Rest2]) + 1,
distance([H1|Rest1], Rest2) + 1,
distance(Rest1, Rest2) + Cost
]).
class Levenshtein
attr_reader :distance
def initialize(first, second)
@from = first
@to = second
end
def distance
@distance ||= calculate_distance(@from, @to)
end
private
def calculate_distance(from, to)
return 0 if from == to
return from.length if to.empty?
return to.length if from.empty?
[
calculate_distance(from[1..-1], to) + 1,
calculate_distance(from, to[1..-1]) + 1,
calculate_distance(from[1..-1], to[1..-1]) + (from[0] <=> to[0]).abs
].min
end
end
@plukevdh
Copy link
Author

plukevdh commented Nov 7, 2012

Ruby benchmarks

Benchmark.bmbm do |bm|
  bm.report("simple") { Levenshtein.new("kitten", "sitting").distance }
  bm.report("more complex") { Levenshtein.new("space balls", "base spalls").distance }
end
Rehearsal ------------------------------------------------
simple         0.110000   0.000000   0.110000 (  0.114116)
more complex  62.300000   0.270000  62.570000 ( 72.798922)
-------------------------------------- total: 62.680000sec

                   user     system      total        real
simple         0.100000   0.000000   0.100000 (  0.105756)
more complex  62.020000   0.270000  62.290000 ( 69.381390)

@plukevdh
Copy link
Author

plukevdh commented Nov 7, 2012

Non-legit stats for Erlang

28> levenshtein:distance("kitten", "sitting").
29> statistics(wall_clock).
{3708982,19}

34> levenshtein:distance("space balls", "base spalls").
35> statistics(wall_clock).
{3788477,11198}

simple = 19ms
complex = 11.2 sec

@plukevdh
Copy link
Author

plukevdh commented Nov 7, 2012

updated erlang bogus stats:

simple = < 10ms
complex = 1248.75ms

about a 4 sample avg for each run (wallclock).

@plukevdh
Copy link
Author

plukevdh commented Nov 7, 2012

Moar better stats:
33> timer:tc(levenshtein, distance, ["kitten", "sitting"]).
{4893,3}
34> timer:tc(levenshtein, distance, ["space balls", "base spalls"]).
{2347887,5}

microsec.

@plukevdh
Copy link
Author

plukevdh commented Nov 7, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment