Created April 29, 2016 22:27
Hamming distance match
'''
When you want to match files that differ by only one character: sample_1.fastq.gz sample_2.fastq.gz
'''
def is_hamming_match(seq1, seq2):
    if len(seq1) == len(seq2):  # The sequences must be of equal length
        # Materialize the pairs: in Python 3, zip() is a one-shot iterator,
        # so it cannot be looped over twice
        zipped = list(zip(seq1, seq2))
        is_mismatch = [c1 != c2 for c1, c2 in zipped]
        if sum(is_mismatch) == 1:  # The sequences must have exactly one mismatch
            mismatches = [[c1, c2] for c1, c2 in zipped if c1 != c2]
            # The mismatch must be between the characters '1' and '2'
            if mismatches[0] == ['1', '2'] or mismatches[0] == ['2', '1']:
                return True
    return False
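
A minimal usage sketch, assuming the goal is to pair up paired-end FASTQ files in a directory listing. The helper `pair_mates` and the example filenames are hypothetical and not part of the original gist; the function is restated compactly so the block runs on its own.

```python
from itertools import combinations


def is_hamming_match(seq1, seq2):
    # Compact restatement of the function above so this sketch is self-contained
    if len(seq1) != len(seq2):
        return False
    mismatches = [(c1, c2) for c1, c2 in zip(seq1, seq2) if c1 != c2]
    # Exactly one differing position, and it must be '1' versus '2'
    return len(mismatches) == 1 and set(mismatches[0]) == {'1', '2'}


def pair_mates(filenames):
    # Hypothetical helper: collect every (mate1, mate2) pair of names that match.
    # Sorting first means the name containing '1' comes first in each pair.
    pairs = []
    for a, b in combinations(sorted(filenames), 2):
        if is_hamming_match(a, b):
            pairs.append((a, b))
    return pairs


# Example filenames are made up for illustration
files = [
    "sample_1.fastq.gz",
    "sample_2.fastq.gz",
    "control_1.fastq.gz",
    "control_2.fastq.gz",
    "notes.txt",
]
pairs = pair_mates(files)
for mate1, mate2 in pairs:
    print(mate1, mate2)
```

Note that the check only looks at a single differing character, so names such as sample1_1.fastq.gz and sample2_1.fastq.gz would also be reported as a match; a stricter pairing would need to check where in the name the mismatch occurs.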