Created
November 7, 2012 03:13
-
-
Save plukevdh/4029376 to your computer and use it in GitHub Desktop.
Implementations of the levenshtein algorithm
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-module(levenshtein). | |
-export([distance/2]). | |
first_letter_check(OneLetter, TwoLetter) -> | |
if OneLetter =:= TwoLetter -> 0 | |
; true -> 1 | |
end. | |
distance(String1, String1) -> 0; | |
distance(String, "") -> string:len(String); | |
distance("", String) -> string:len(String); | |
distance([H1|Rest1], [H2|Rest2]) -> | |
Cost = first_letter_check(H1, H2), | |
lists:min([ | |
distance(Rest1, [H2|Rest2]) + 1, | |
distance([H1|Rest1], Rest2) + 1, | |
distance(Rest1, Rest2) + Cost | |
]). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class Levenshtein | |
attr_reader :distance | |
def initialize(first, second) | |
@from = first | |
@to = second | |
end | |
def distance | |
@distance ||= calculate_distance(@from, @to) | |
end | |
private | |
def calculate_distance(from, to) | |
return 0 if from == to | |
return from.length if to.empty? | |
return to.length if from.empty? | |
[ | |
calculate_distance(from[1..-1], to) + 1, | |
calculate_distance(from, to[1..-1]) + 1, | |
calculate_distance(from[1..-1], to[1..-1]) + (from[0] <=> to[0]).abs | |
].min | |
end | |
end |
Non-legit stats for Erlang
28> levenshtein:distance("kitten", "sitting").
29> statistics(wall_clock).
{3708982,19}
34> levenshtein:distance("space balls", "base spalls").
35> statistics(wall_clock).
{3788477,11198}
simple = 19ms
complex = 11.2 sec
updated erlang bogus stats:
simple = < 10ms
complex = 1248.75ms
about a 4 sample avg for each run (wallclock).
Moar better stats:
33> timer:tc(levenshtein, distance, ["kitten", "sitting"]).
{4893,3}
34> timer:tc(levenshtein, distance, ["space balls", "base spalls"]).
{2347887,5}
microsec.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Ruby benchmarks