Created
December 12, 2018 14:48
-
-
Save TutorialDoctor/fc3a42a609a120e52018d898625cba76 to your computer and use it in GitHub Desktop.
Similarity of two strings using Levenstein distance. Edited slightly from code found online.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Problem Statement | |
================= | |
Given two strings str1 and str2, find the minimum number of edits (edit one character to another, delete char from str1 | |
or delete char from str2) to change str1 to str2. | |
Video | |
----- | |
* https://youtu.be/We3YDTzNXEk | |
Analysis | |
-------- | |
* DP Runtime : O(len(str1) * len(str2)) | |
* Recursive Solution: Exponential (O(3^(m+n-1))) | |
Reference | |
--------- | |
* https://www.clear.rice.edu/comp130/12spring/editdist/ | |
""" | |
def min_edit_distance(str1, str2): | |
rows = len(str2) + 1 | |
cols = len(str1) + 1 | |
T = [[0 for _ in range(cols)] for _ in range(rows)] | |
for j in range(cols): | |
T[0][j] = j | |
for i in range(rows): | |
T[i][0] = i | |
for i in range(1, rows): | |
for j in range(1, cols): | |
if str2[i - 1] == str1[j - 1]: | |
T[i][j] = T[i - 1][j - 1] | |
else: | |
T[i][j] = 1 + min(T[i - 1][j - 1], T[i - 1][j], T[i][j - 1]) | |
return T[rows - 1][cols - 1] | |
def similarity(s1,s2): | |
return (1-(min_edit_distance(s1, s2)/max(len(s1),len(s2)))) | |
s1 = 'benolive@gmail.com' | |
s2 ='benolive@gmail.com' | |
print(similarity(s1,s2)) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment