Skip to content

Instantly share code, notes, and snippets.

@momvart
Last active May 13, 2020 11:25
Show Gist options
  • Save momvart/7a3674e563340e70c25b18057d8b578e to your computer and use it in GitHub Desktop.
Save momvart/7a3674e563340e70c25b18057d8b578e to your computer and use it in GitHub Desktop.
A generator for mapping of last to first column in BWT matrix. Input is a BW transform of DNA sequence.
#change this if you want to support other types of sequences
def get_char_index(c):
if c == 'A':
return 1
elif c == 'C':
return 2
elif c == 'G':
return 3
elif c == 'T':
return 4
elif c == '$':
return 0
else:
return -1
def get_last_to_first(bwt):
counts = [0, 0, 0, 0, 0]
letter_pos = list()
for c in bwt:
index = get_char_index(c)
letter_pos.append(counts[index])
counts[index] += 1
cum_counts = [sum(counts[:i]) for i in range(len(counts))]
for i in range(len(bwt)):
letter_pos[i] += cum_counts[get_char_index(bwt[i])]
return letter_pos
def get_first_col(bwt):
letter_pos = get_last_to_first(bwt)
answer = [None] * len(bwt)
for i in range(len(bwt)):
answer[letter_pos[i]] = bwt[i]
return ''.join(answer)
@momvart
Copy link
Author

momvart commented May 13, 2020

More information about BW transform: Wikipedia.
A useful resource for this algorithm: Lecture from Stanford

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment