Skip to content

Instantly share code, notes, and snippets.

@nornagon
Created July 21, 2017 04:07
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save nornagon/6326a643fc30339ece3021013ed9b48c to your computer and use it in GitHub Desktop.
Save nornagon/6326a643fc30339ece3021013ed9b48c to your computer and use it in GitHub Desktop.
Smith-Waterman Python implementation
import numpy as np
def smith_waterman(a: str, b: str, alignment_score: float = 1, gap_cost: float = 1) -> float:
"""
Compute the Smith-Waterman alignment score for two strings.
See https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm#Algorithm
This implementation has a fixed gap cost (i.e. extending a gap is considered
free). In the terminology of the Wikipedia description, W_k = {c, c, c, ...}.
This implementation also has a fixed alignment score, awarded if the relevant
characters are equal.
Kinda slow, especially for large (50+ char) inputs.
"""
# H holds the alignment score at each point, computed incrementally
H = np.zeros((len(a) + 1, len(b) + 1))
for i in range(1, len(a) + 1):
for j in range(1, len(b) + 1):
# The score for substituting the letter a[i-1] for b[j-1]. Generally low
# for mismatch, high for match.
match = H[i-1,j-1] + (alignment_score if a[i-1] == b[j-1] else 0)
# The scores for for introducing extra letters in one of the strings (or
# by symmetry, deleting them from the other).
delete = H[1:i,j].max() - gap_cost if i > 1 else 0
insert = H[i,1:j].max() - gap_cost if j > 1 else 0
H[i,j] = max(match, delete, insert, 0)
# The highest score is the best local alignment.
# For our purposes, we don't actually care _what_ the alignment was, just how
# aligned the two strings were.
return H.max()
@Janemely
Copy link

thanks for this. helped me a lot with my project of implementing a method to prevent malicious name confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment