@Hammad-hab
Last active July 14, 2024 16:30
A general and simple implementation of the Hamming Neighbourhood Distance.
def hn_distance(x1: str, x2: str, j: int = 0) -> int:
    """
    Hamming-Neighbourhood Distance
    See more here https://medium.com/p/73997d14706e/

    x1: The first comparison string
    x2: The second comparison string
    j: The ignorance threshold. A position counts as different only
       if the absolute difference between the two character codes
       exceeds j. It is best to leave this at 0 unless you know
       what you're doing.
    """
    # Convert each string to its vector of character codes (ASCII codes for ASCII text)
    a = [ord(c) for c in x1]
    b = [ord(c) for c in x2]
    # Delta array: one entry per position
    z = []
    # Iterate up to the length of the longer string so no character is skipped
    ln = max(len(a), len(b))
    for x in range(ln):
        if x >= len(a):
            # Past the end of string A: push B's character code directly
            z.append(b[x])
        elif x >= len(b):
            # Past the end of string B: push A's character code directly
            z.append(a[x])
        else:
            # Position lies within both strings: record the code difference
            z.append(b[x] - a[x])
    # Each entry of the delta array is a per-position difference.
    # Entries whose absolute value is at most j (the ignorance threshold)
    # are ignored; the rest are counted. The HN-Distance is therefore the
    # number of positions at which the two strings differ by more than j.
    # abs() keeps the result symmetric: hn_distance("a", "b") == hn_distance("b", "a").
    return sum(1 for n in z if abs(n) > j)
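The loop above can also be written more compactly with `itertools.zip_longest`. The sketch below is an alternative formulation, not the author's code: the name `hn_distance_compact` is hypothetical, and it assumes the intended rule is that a position counts when the absolute difference between the two character codes exceeds `j`. It pads the shorter string with code 0, so an unmatched character contributes its own code as the difference, just as the original loop pushes the raw code of the longer string.

```python
from itertools import zip_longest


def hn_distance_compact(x1: str, x2: str, j: int = 0) -> int:
    """Hypothetical compact variant of hn_distance (a sketch, not the original gist).

    Assumes a position counts as different when the absolute difference
    between the two character codes exceeds the ignorance threshold j.
    """
    # Pad the shorter string with code 0, so an unmatched character
    # contributes its own code as the difference (as in the original loop).
    pairs = zip_longest(map(ord, x1), map(ord, x2), fillvalue=0)
    return sum(1 for ca, cb in pairs if abs(cb - ca) > j)


# Usage
print(hn_distance_compact("kitten", "sitting"))  # 3
print(hn_distance_compact("b", "a"))             # 1
print(hn_distance_compact("abc", "abd", j=1))    # 0 ('c' and 'd' differ by only 1)
print(hn_distance_compact("abc", "abz", j=1))    # 1
```

Because the padding value is 0, a very large `j` (at least the code of a trailing unmatched character) also ignores length differences; under the assumed rule this matches the behaviour of the loop-based version.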