Skip to content

Instantly share code, notes, and snippets.

@jramb
Last active December 30, 2015 11:39
Show Gist options
  • Save jramb/7824266 to your computer and use it in GitHub Desktop.
Save jramb/7824266 to your computer and use it in GitHub Desktop.
String distance metric - Levenshtein distance Optimized with some efford, still, performance depends heavily on the length of the input. Example: select xx_tools_pkg.levenshtein_dist('Hejsan', 'hejsan'), xx_tools_pkg.levenshtein_dist('kitten', 'sitting') from dual; Returns distances of 1 and 3. Probably you would want to apply upper/lower on the…
-- Levenshtein is a string distance metric
-- https://en.wikipedia.org/wiki/Levenshtein_distance
function levenshtein_dist(p_s varchar2, p_t varchar2)
return number deterministic
is
type int_t is table of pls_integer index by pls_integer;
type char_t is table of char(2) index by pls_integer;
v0 int_t;
v1 int_t;
t char_t; -- copy of p_t for performance
si char(2); -- need more space for some non-ascii chars
m pls_integer := length(p_s);
n pls_integer := length(p_t);
begin
if m is null or n is null then
return null;
end if;
if m=0 or n=0 then
return case when m>n then m else n end;
end if;
for j in 1..n loop
t(j) := substr(p_t, j, 1);
end loop;
for i in 0..n loop v0(i) := i; end loop;
for i in 1..m loop
v1(0) := i;
si := substr(p_s, i, 1);
for j in 1..n loop
v1(j) := least
( v1(j-1)+1
, v0(j)+1
, v0(j-1) + case when si=t(j) then 0 else 1 end);
end loop;
v0 := v1;
end loop;
return v1(n);
end levenshtein_dist;
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment