Skip to content

Instantly share code, notes, and snippets.

@ashinzekene
Created May 3, 2021 10:25
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save ashinzekene/576e9bc5cabeddcec7850a4da4e9b884 to your computer and use it in GitHub Desktop.
Save ashinzekene/576e9bc5cabeddcec7850a4da4e9b884 to your computer and use it in GitHub Desktop.

Finding a Middle Edge in Alignment Graph in Linear Space Problem

Find a middle edge in the alignment graph in linear space. Input: A match score m, a mismatch penalty ?, a gap penalty ?, and two DNA strings s and t. Output: The maximum alignment score of s and t followed by an alignment achieving this maximum score.

The quadratic memory required to store the entire alignment matrix can become overly massive for long DNA strings. However, keep in mind that, during the alignment algorithm, we only need two rows (i.e., linear space) at any given time in order to compute the optimal score. It turns out that we can exploit this phenomenon to compute the entire optimal alignment in linear space as well.

Given strings s = s1… sn and t = t1… tm define middle=n/2.

The middle column of the alignment graph between s and t is the column containing all nodes (i,middle) for 0 ≤ i ≤ m. A longest path from source to sink in the alignment graph must cross the middle column somewhere, and our first task is to figure out where using only O(n) memory. We refer to a node on this longest path belonging to the middle column as a middle node. The edge on the longest path from source to sink that starts at the middle node is referred to as the middle edge.

Input Format: The first line of the input contains m followed by ? followed by ? (separated by spaces), the second line of the input contains a DNA string s, and the third line of the input contains a DNA string t.

Output Format: The middle edge in the form (i, j) (k, l), where (i, j) and (k, l) are adjacent nodes in the alignment graph, i.e., there is an edge between these nodes.

Constraints: |s| ≤ 4,000; |t| ≤ 4,000

.

SAMPLE DATASET:
Input:

1 1 2
GAGA
GAT

Output:
(2, 2) (2, 3)

The middle edge of a blue optimal path in the alignment graph of GAGA and GAT (shown in green) connects node (2, 2) with a node (2, 3).

TEST DATASET 1:
Input:

1 5 1
TTTT
CC

Output: (0, 2) (0, 3) OR (1, 2) (1, 3)

TEST DATASET 2:
Input:

1 1 2
GAT
AT

Output: (0, 1) (1, 2)

TEST DATASET 3:
Input:

1 1 1
TTTT
TTCTT

Output: (2, 2) (3, 2) OR (3, 2) (4, 3)

TEST DATASET 4:
Input:

1 5 1
GAACCC
G

Output: (1, 3) (1, 4)

TEST DATASET 5:
Input:

2 3 1
ACAGT
CAT

Output: (1, 2) (2, 3)

TEST DATASET 6:
Input:

2 5 3
T
AATCCC

Output: (0, 0) (1, 0) OR (1, 0) (2, 0) OR (2, 0) (3, 1)

@ashinzekene
Copy link
Author

SAMPLE DATASET:
Input:

1 1 2
GAGA
GAT

Output:
(2, 2) (2, 3)

The middle edge of a blue optimal path in the alignment graph of GAGA and GAT
(shown in green) connects node (2, 2) with a node (2, 3).

Picture1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment