ashinzekene/middle_edge.md

## middle_edge.md

      
    Raw
  

              middle_edge.md
            
          
    Finding a Middle Edge in Alignment Graph in Linear Space Problem

Find a middle edge in the alignment graph in linear space.
Input: A match score m, a mismatch penalty ?, a gap penalty ?, and two DNA strings s and t.
Output: The maximum alignment score of s and t followed by an alignment achieving this
maximum score.
The quadratic memory required to store the entire alignment matrix can become overly massive for long DNA strings. However, keep in mind that, during the alignment algorithm, we only need two rows (i.e., linear space) at any given time in order to compute the optimal score. It
turns out that we can exploit this phenomenon to compute the entire optimal alignment in linear
space as well.
Given strings s = s₁… s_n and t = t₁… t_m define middle=n/2.
The middle column of the alignment graph between s and t is the column containing all nodes
(i,middle) for 0 ≤ i ≤ m. A longest path from source to sink in the alignment graph must cross the
middle column somewhere, and our first task is to figure out where using only O(n) memory. We
refer to a node on this longest path belonging to the middle column as a middle node. The
edge on the longest path from source to sink that starts at the middle node is referred to as the
middle edge.
Input Format: The first line of the input contains m followed by ? followed by ? (separated by
spaces), the second line of the input contains a DNA string s, and the third line of the input
contains a DNA string t.
Output Format: The middle edge in the form (i, j) (k, l), where (i, j) and (k, l) are
adjacent nodes in the alignment graph, i.e., there is an edge between these nodes.
Constraints: |s| ≤ 4,000; |t| ≤ 4,000
.
SAMPLE DATASET:

Input:
1 1 2
GAGA
GAT

Output:

(2, 2) (2, 3)
The middle edge of a blue optimal path in the alignment graph of GAGA and GAT
(shown in green) connects node (2, 2) with a node (2, 3).

TEST DATASET 1:

Input:
1 5 1
TTTT
CC

Output:
(0, 2) (0, 3)	OR	(1, 2) (1, 3)
TEST DATASET 2:

Input:
1 1 2
GAT
AT

Output:
(0, 1) (1, 2)
TEST DATASET 3:

Input:
1 1 1
TTTT
TTCTT

Output:
(2, 2) (3, 2)	OR	(3, 2) (4, 3) 
TEST DATASET 4:

Input:
1 5 1
GAACCC
G

Output:
(1, 3) (1, 4)
TEST DATASET 5:

Input:
2 3 1
ACAGT
CAT

Output:
(1, 2) (2, 3)
TEST DATASET 6:

Input:
2 5 3
T
AATCCC

Output:
(0, 0) (1, 0)	OR	(1, 0) (2, 0)	OR	(2, 0) (3, 1)