@abijith-kp
Created July 17, 2014 08:37
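import jellyfish


def fuzzy_set(string_list, similarity_ratio):
    '''
    Input  : List of strings and a similarity threshold between 0 and 1
    Output : Set of strings. All similar kinds of strings are merged together:
             a string is dropped whenever another remaining string is at
             least similarity_ratio similar to it.
    '''
    # Drop exact duplicates and sort so the result is deterministic.
    candidates = sorted(set(string_list))
    kept = list(candidates)
    # Loop over a separate copy: removing items from the list being
    # iterated makes the loop skip elements.
    for i in candidates:
        # jaro_winkler_similarity is the name used by recent jellyfish
        # releases; older releases call the same function jaro_winkler.
        if any(j != i and
               jellyfish.jaro_winkler_similarity(i, j) >= similarity_ratio
               for j in kept):
            kept.remove(i)
    return set(kept)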
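
A small usage sketch may help show what the merging does in practice. To keep it runnable without third-party packages, it swaps Jaro-Winkler for `difflib.SequenceMatcher` from the standard library, so the scores differ from what `jellyfish` would return. The names `ratio` and `fuzzy_set_with` are hypothetical and not part of the gist.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # Stand-in similarity score in [0, 1]; this is NOT Jaro-Winkler.
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_set_with(strings, threshold, similarity=ratio):
    # Hypothetical variant of fuzzy_set with a pluggable similarity function.
    kept = sorted(set(strings))
    for s in list(kept):  # iterate over a copy while removing from kept
        if any(t != s and similarity(s, t) >= threshold for t in kept):
            kept.remove(s)
    return set(kept)


names = ["apple", "apples", "banana", "bananna", "cherry"]
result = fuzzy_set_with(names, 0.8)
print(sorted(result))  # ['apples', 'bananna', 'cherry']
```

Because the strings are visited in sorted order and each one is dropped as soon as a similar string is still present, the survivor of each group is the one that sorts last, not necessarily the most common or the shortest spelling.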