Skip to content

Instantly share code, notes, and snippets.

@ademar111190
Last active August 29, 2015 14:25
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ademar111190/878fcbd2ae64c0d18b1a to your computer and use it in GitHub Desktop.
Save ademar111190/878fcbd2ae64c0d18b1a to your computer and use it in GitHub Desktop.
Just a study of biological sequence and how Dynamic Programming Alignment works

#Dynamic Programming

Just a study of biological sequence and how Dynamic Programming Alignment works

#How to

./dynamicProgramming.py -a ATA -b GATC > result.json

the result is a json that contains contains:

  • alphaResult that is the String alpha modified to align with the beta string
  • betaResult that is the String beta modified to align with the alpha string
  • merge the comparation between alphaResult and betaResult where 1 means gap, 2 means match and 3 means mismatch
  • path the 2d array containing the path, where 1 means the start position "▶", 2 means goes to top "↑", 3 means go to left "←" and 4 diagonal means go to top left "↖"
  • table the table of points resulting
  • score the score of the alignment

#Debug

using the -d "debug" you do not get a json as result, you'll see some logs, for example:

./dynamicProgramming.py -a ATA -b GATC -m -1 -M 1 -g -2 -G -2 -d

Options:
{'mismatch': -1, 'gapInitiation': -2, 'beta': 'GATC', 'debug': True, 'alpha': 'ATA', 'gapExtension': -2, 'match': 1}

Result table
+---+------+------+------+------+------+
|   |      |    G |    A |    T |    C |
+---+------+------+------+------+------+
|   | ▶  0 | ← -2 | ← -4 | ← -6 | ← -8 |
| A | ↑ -2 | ↖ -1 | ↖ -1 | ← -3 | ← -5 |
| T | ↑ -4 | ↖ -3 | ↖ -2 | ↖  0 | ← -2 |
| A | ↑ -6 | ↖ -5 | ↖ -2 | ↑ -2 | ↖ -1 |
+---+------+------+------+------+------+

Strings comparations
\_ATA
GATC
\_AT*

Score
Score: -1

#Substitution Matrix

To use a substitution matrix you need the matrix in json format, and send the file with -x option, for example:

./dynamicProgramming.py -a HVDDMP -b HTADLLA -x blosum50.json
{
"A": {
"A": 5,
"R": -2,
"N": -1,
"D": -2,
"C": -1,
"Q": -1,
"E": -1,
"G": 0,
"H": -2,
"I": -1,
"L": -2,
"K": -1,
"M": -1,
"F": -3,
"P": -1,
"S": 1,
"T": 0,
"W": -3,
"Y": -2,
"V": 0
},
"R": {
"A": -2,
"R": 7,
"N": -1,
"D": -2,
"C": -4,
"Q": 1,
"E": 0,
"G": -3,
"H": 0,
"I": -4,
"L": -3,
"K": 3,
"M": -2,
"F": -3,
"P": -3,
"S": -1,
"T": -1,
"W": -3,
"Y": -1,
"V": -3
},
"N": {
"A": -1,
"R": -1,
"N": 7,
"D": 2,
"C": -2,
"Q": 0,
"E": 0,
"G": 0,
"H": 1,
"I": -3,
"L": -4,
"K": 0,
"M": -2,
"F": -4,
"P": -2,
"S": 1,
"T": 0,
"W": -4,
"Y": -2,
"V": -3
},
"D": {
"A": -2,
"R": -2,
"N": 2,
"D": 8,
"C": -4,
"Q": 0,
"E": 2,
"G": -1,
"H": -1,
"I": -4,
"L": -4,
"K": -1,
"M": -4,
"F": -5,
"P": -1,
"S": 0,
"T": -1,
"W": -5,
"Y": -3,
"V": -4
},
"C": {
"A": -1,
"R": -4,
"N": -2,
"D": -4,
"C": 13,
"Q": -3,
"E": -3,
"G": -3,
"H": -3,
"I": -2,
"L": -2,
"K": -3,
"M": -2,
"F": -2,
"P": -4,
"S": -1,
"T": -1,
"W": -5,
"Y": -3,
"V": -1
},
"Q": {
"A": -1,
"R": 1,
"N": 0,
"D": 0,
"C": -3,
"Q": 7,
"E": 2,
"G": -2,
"H": 1,
"I": -3,
"L": -2,
"K": 2,
"M": 0,
"F": -4,
"P": -1,
"S": 0,
"T": -1,
"W": -1,
"Y": -1,
"V": -3
},
"E": {
"A": -1,
"R": 0,
"N": 0,
"D": 2,
"C": -3,
"Q": 2,
"E": 6,
"G": -3,
"H": 0,
"I": -4,
"L": -3,
"K": 1,
"M": -2,
"F": -3,
"P": -1,
"S": -1,
"T": -1,
"W": -3,
"Y": -2,
"V": -3
},
"G": {
"A": 0,
"R": -3,
"N": 0,
"D": -1,
"C": -3,
"Q": -2,
"E": -3,
"G": 8,
"H": -2,
"I": -4,
"L": -4,
"K": -2,
"M": -3,
"F": -4,
"P": -2,
"S": 0,
"T": -2,
"W": -3,
"Y": -3,
"V": -4
},
"H": {
"A": -2,
"R": 0,
"N": 1,
"D": -1,
"C": -3,
"Q": 1,
"E": 0,
"G": -2,
"H": 10,
"I": -4,
"L": -3,
"K": 0,
"M": -1,
"F": -1,
"P": -2,
"S": -1,
"T": -2,
"W": -3,
"Y": 2,
"V": -4
},
"I": {
"A": -1,
"R": -4,
"N": -3,
"D": -4,
"C": -2,
"Q": -3,
"E": -4,
"G": -4,
"H": -4,
"I": 5,
"L": 2,
"K": -3,
"M": 2,
"F": 0,
"P": -3,
"S": -3,
"T": -1,
"W": -3,
"Y": -1,
"V": 4
},
"L": {
"A": -2,
"R": -3,
"N": -4,
"D": -4,
"C": -2,
"Q": -2,
"E": -3,
"G": -4,
"H": -3,
"I": 2,
"L": 5,
"K": -3,
"M": 3,
"F": 1,
"P": -4,
"S": -3,
"T": -1,
"W": -2,
"Y": -1,
"V": 1
},
"K": {
"A": -1,
"R": 3,
"N": 0,
"D": -1,
"C": -3,
"Q": 2,
"E": 1,
"G": -2,
"H": 0,
"I": -3,
"L": -3,
"K": 6,
"M": -2,
"F": -4,
"P": -1,
"S": 0,
"T": -1,
"W": -3,
"Y": -2,
"V": -3
},
"M": {
"A": -1,
"R": -2,
"N": -2,
"D": -4,
"C": -2,
"Q": 0,
"E": -2,
"G": -3,
"H": -1,
"I": 2,
"L": 3,
"K": -2,
"M": 7,
"F": 0,
"P": -3,
"S": -2,
"T": -1,
"W": -1,
"Y": 0,
"V": 1
},
"F": {
"A": -3,
"R": -3,
"N": -4,
"D": -5,
"C": -2,
"Q": -4,
"E": -3,
"G": -4,
"H": -1,
"I": 0,
"L": 1,
"K": -4,
"M": 0,
"F": 8,
"P": -4,
"S": -3,
"T": -2,
"W": 1,
"Y": 4,
"V": -1
},
"P": {
"A": -1,
"R": -3,
"N": -2,
"D": -1,
"C": -4,
"Q": -1,
"E": -1,
"G": -2,
"H": -2,
"I": -3,
"L": -4,
"K": -1,
"M": -3,
"F": -4,
"P": 10,
"S": -1,
"T": -1,
"W": -4,
"Y": -3,
"V": -3
},
"S": {
"A": 1,
"R": -1,
"N": 1,
"D": 0,
"C": -1,
"Q": 0,
"E": -1,
"G": 0,
"H": -1,
"I": -3,
"L": -3,
"K": 0,
"M": -2,
"F": -3,
"P": -1,
"S": 5,
"T": 2,
"W": -4,
"Y": -2,
"V": -2
},
"T": {
"A": 0,
"R": -1,
"N": 0,
"D": -1,
"C": -1,
"Q": -1,
"E": -1,
"G": -2,
"H": -2,
"I": -1,
"L": -1,
"K": -1,
"M": -1,
"F": -2,
"P": -1,
"S": 2,
"T": 5,
"W": -3,
"Y": -2,
"V": 0
},
"W": {
"A": -3,
"R": -3,
"N": -4,
"D": -5,
"C": -5,
"Q": -1,
"E": -3,
"G": -3,
"H": -3,
"I": -3,
"L": -2,
"K": -3,
"M": -1,
"F": 1,
"P": -4,
"S": -4,
"T": -4,
"W": 15,
"Y": 2,
"V": -3
},
"Y": {
"A": -2,
"R": -1,
"N": -2,
"D": -3,
"C": -3,
"Q": -1,
"E": -2,
"G": -3,
"H": 2,
"I": -1,
"L": -1,
"K": -2,
"M": 0,
"F": 4,
"P": -3,
"S": -2,
"T": -2,
"W": 2,
"Y": 8,
"V": -1
},
"V": {
"A": 0,
"R": -3,
"N": -3,
"D": -4,
"C": -1,
"Q": -3,
"E": -3,
"G": -4,
"H": -4,
"I": 4,
"L": 1,
"K": -3,
"M": 1,
"F": -1,
"P": -3,
"S": -2,
"T": 0,
"W": -3,
"Y": -1,
"V": 5
}
}
#!/usr/bin/env python
import json
import os.path
import optparse
GAP = 1
MATCH = 2
MISMATCH = 3
START = 1
TOP = 2
LEFT = 3
TOP_LEFT = 4
parser = optparse.OptionParser()
parser.add_option("-a", "--alpha", dest="alpha", type="string", help="the first string, e.g. -a ATA")
parser.add_option("-b", "--beta", dest="beta", type="string", help="the second string, e.g. -b GATC")
parser.add_option("-M", "--match", dest="match", type="int", default=1, help="the match score, e.g. -M 1")
parser.add_option("-m", "--mismatch", dest="mismatch", type="int", default=-2, help="the mismatch score, e.g. -m 2")
parser.add_option("-G", "--gapExtension", dest="gapExtension", type="int", default=-1, help="the gap extension score, e.g. -g -1")
parser.add_option("-g", "--gapInitiation", dest="gapInitiation", type="int", default=-3, help="the gap initiation score, e.g. -g -3")
parser.add_option("-d", "--debug", dest="debug", action="store_true", default=False, help="enable debug prints and disable json result")
parser.add_option("-x", "--substitutionMatrix", dest="substitutionMatrix", help="the json with a matrix of substitutions scores", metavar="FILE")
(options, args) = parser.parse_args()
if options.alpha is None or options.beta is None or len(options.alpha) < 1 or len(options.beta) < 1:
parser.error("You need to informe the alpha and beta strings")
alphaLen = len(options.alpha) + 1
betaLen = len(options.beta) + 1
table = [[0 for b in range(betaLen)] for a in range(alphaLen)]
path = [[0 for b in range(betaLen)] for a in range(alphaLen)]
substitutionMatrix = {}
if options.substitutionMatrix is not None:
if os.path.isfile(options.substitutionMatrix):
substitutionMatrix = json.loads("".join([line for line in open(options.substitutionMatrix,"r")]).replace("\n","").replace(" ",""))
else:
parser.error('Substitution Matrix not found')
for a in range(alphaLen):
top = None
left = None
topLeft = None
hasLeft = None
hasTop = a > 0
alpha = options.alpha[a - 1]
for b in range(betaLen):
hasLeft = b > 0
if hasTop and hasLeft:
topLeft = table[a - 1][b - 1]
if (options.alpha[a - 1] in substitutionMatrix and options.beta[b - 1] in substitutionMatrix):
topLeft = topLeft + substitutionMatrix[options.alpha[a - 1]][options.beta[b - 1]]
if options.alpha[a - 1] == options.beta[b - 1]:
topLeft = topLeft + options.match
else:
topLeft = topLeft + options.mismatch
else:
topLeft = None
if hasTop:
top = table[a - 1][b]
if (hasLeft and path[a - 1][b] == TOP_LEFT) or (a == 1):
top = top + options.gapInitiation
else:
top = top + options.gapExtension
else:
top = None
if hasLeft:
left = table[a][b - 1]
if (hasTop and path[a][b - 1] == TOP_LEFT) or (b == 1):
left = left + options.gapInitiation
else:
left = left + options.gapExtension
else:
left = None
if (hasLeft and hasTop and topLeft >= top and topLeft >= left):
table[a][b] = topLeft
path[a][b] = TOP_LEFT
elif (hasTop and top >= topLeft and top >= left):
table[a][b] = top
path[a][b] = TOP
elif (hasLeft and left >= topLeft and left >= top):
table[a][b] = left
path[a][b] = LEFT
else:
table[a][b] = 0
path[a][b] = START
a = alphaLen - 1
b = betaLen - 1
result = []
merge = []
while path[a][b] != START:
if path[a][b] == TOP_LEFT:
result.append([options.alpha[a - 1], options.beta[b - 1]])
a = a - 1;
b = b - 1;
merge.append(MATCH if options.alpha[a] == options.beta[b] else MISMATCH)
elif path[a][b] == TOP:
result.append([options.alpha[a - 1], '_'])
a = a - 1;
merge.append(GAP)
elif path[a][b] == LEFT:
result.append(['_', options.beta[b - 1]])
b = b - 1;
merge.append(GAP)
alphaResult = ""
betaResult = ""
result.reverse()
merge.reverse()
for r in result:
alphaResult = alphaResult + str(r[0])
betaResult = betaResult + str(r[1])
score = table[alphaLen - 1][betaLen - 1]
if options.debug:
print ("")
print ("Options:")
print (options)
print ("")
print ("Substitution Matrix:")
print (substitutionMatrix)
print ("")
print ("Result table")
biggestDigits = 0
for i in range(len(table)):
for j in range(len(table[i])):
biggestDigits = max(biggestDigits, len(str(table[i][j])))
def printSeparator():
print ("+---" + (("+----" + ("-" * biggestDigits)) * len(table[0])) + "+")
printSeparator()
line = "| |"
base = " " * (biggestDigits + 2)
for b in " " + options.beta:
line = line + base + b + " |"
print (line)
printSeparator()
def charForPath(item):
if item == TOP_LEFT:
return '\xe2\x86\x96'
elif item == TOP:
return '\xe2\x86\x91'
elif item == LEFT:
return '\xe2\x86\x90'
else:
return '\xe2\x96\xb6'
base = " {:<1} {:>" + str(biggestDigits) + "} |"
for i in range(len(table)):
line = "| " + (" " if i == 0 else options.alpha[i - 1]) + " |"
for j in range(len(table[i])):
line = line + base.format(charForPath(path[i][j]), table[i][j])
print line
printSeparator()
print ("")
print ("Strings comparations")
print (alphaResult)
print (betaResult)
mergeString = ""
for m in range(len(merge)):
if merge[m] == GAP:
mergeString = mergeString + '_'
elif merge[m] == MISMATCH:
mergeString = mergeString + '*'
else:
mergeString = mergeString + result[m][0]
print (mergeString)
print ("")
print ("Score: " + str(score))
print ("")
else:
result = { \
"alphaResult" : alphaResult,\
"betaResult" : betaResult,\
"path" : path,\
"table" : table,\
"merge" : merge,\
"score" : score }
print json.dumps(result, separators=(',',':'))
{
"A": {
"A": 4,
"C": 1,
"T": -1,
"G": 2
},
"C": {
"A": 1,
"C": 3,
"T": 2,
"G": -3
},
"T": {
"A": -1,
"C": 2,
"T": 2,
"G": 0
},
"G": {
"A": 2,
"C": -3,
"T": 0,
"G": 4
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment