@MatthewRalston
Created April 29, 2016 22:27
Hamming distance match
'''
Match file names that differ by exactly one character, where that character
is the read number 1 or 2, e.g. sample_1.fastq.gz and sample_2.fastq.gz.
'''
def is_hamming_match(seq1, seq2):
    if len(seq1) == len(seq2):  # The sequences must be of equal length
        # Materialize the pairs: in Python 3, zip() returns a one-shot iterator
        zipped = list(zip(seq1, seq2))
        mismatches = [[c1, c2] for c1, c2 in zipped if c1 != c2]
        if len(mismatches) == 1:  # The sequences must have exactly one mismatch
            # The mismatch must be between the characters '1' and '2'
            if mismatches[0] == ['1', '2'] or mismatches[0] == ['2', '1']:
                return True
    return False
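
A minimal usage sketch, assuming the goal is to pair read-1 and read-2 files from a flat list of file names. The helper `pair_reads` and the sample file names are hypothetical and not part of the original gist; `is_hamming_match` is restated in compact form so the block runs on its own.

```python
from itertools import combinations

def is_hamming_match(seq1, seq2):
    """Equal length, exactly one mismatch, and that mismatch is '1' vs '2'."""
    if len(seq1) != len(seq2):
        return False
    mismatches = [(c1, c2) for c1, c2 in zip(seq1, seq2) if c1 != c2]
    return len(mismatches) == 1 and set(mismatches[0]) == {'1', '2'}

def pair_reads(filenames):
    """Hypothetical helper: return (first, second) tuples of matching names."""
    pairs = []
    # Sorting puts the '1' file before the '2' file within each pair
    for a, b in combinations(sorted(filenames), 2):
        if is_hamming_match(a, b):
            pairs.append((a, b))
    return pairs

files = [
    "sample_1.fastq.gz", "sample_2.fastq.gz",
    "control_1.fastq.gz", "control_2.fastq.gz",
    "notes.txt",
]
for read1, read2 in pair_reads(files):
    print(read1, read2)
```

Note that the check looks only at the characters, not their position, so names such as sample1_1.fastq.gz and sample2_1.fastq.gz would also be reported as a match.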