Skip to content

Instantly share code, notes, and snippets.

@seemaullal
Created July 14, 2015 18:45
Show Gist options
  • Save seemaullal/2d2c3493b5fe3a737264 to your computer and use it in GitHub Desktop.
Save seemaullal/2d2c3493b5fe3a737264 to your computer and use it in GitHub Desktop.
Longest Common Subsring

A dynamic programming approach to this problem is described below. This approach takes O(n*n) time since we are comparing the letters of a string using an n*n matrix (where n is the length of the longer string since empty characters are added to the shorter string).

A dynamic programming approach to this problem is described below. This approach takes O(n*n) time since we are comparing the letters of a string using an n*n matrix (where n is the length of the longer string since empty characters are added to the shorter string).


(click on the image to be taken to the video)

Now that we know how to think about this programming from a dynamic programming perspective, we can write some code to solve the problem. There are a few different approaches to this but generally they all have some things in common:

  • you will have a 2-d array that represents the matrix which compares the string values
  • generally, matrix[i][j] will represent a comparision between the i-th character of one string and the j-th character of the other
  • you fill this matrix in with
    • 0 when the two letters of the strings don't match
    • 1+matrix[i-1][j-1] if there is a match (special case: matrix[i-1][j-1] will be 0 if i=0 or j=0 as that means you are at the beginning of one of the strings and there is no previous value)
  • you should keep track of the current longest substring as you go fill out the matrix because you don't want to go back and go through it again to calculate the longest substring after filling it out (for efficiency reasons)

Here is a Javascript implementation of the dynamic programming implementation to this problem:

  function longestCommonSubstring(str1, str2) {
    if (!str1.length || !str2.length)
        return '';
    //we keep track of longest common substring length, 
    //where it starts, and the string lengths
    var lcs = 0, sequence = '', subStrStart = -1, lastSubStrStart = 0, len1 = str1.length, len2 = str2.length; 
    var diffs = [ ];
    for (var i = 0; i <= len1 ; i++) { //initialize diff table
        diffs.push([ ]);
        for (var j = 0; j <= len2 ; j++) {
            diffs[i][j] = 0;
        }
    }
    for (var k = 0; k < len1; k++) {
        for (var m = 0; m < len2; m++) {
            if (str1[k] === str2[m]) { //letters match
                if (m === 0 || k === 0)
                    diffs[k][m] = 1;
                else {
                    diffs[k][m] = 1 + diffs[k-1][m-1];
                }
                if (diffs[k][m] > lcs) {//new longest common substring
                    lcs  = diffs[k][m];
                    subStrStart =  k - diffs[k][m] + 1;
                    if (lastSubStrStart === subStrStart) {//continuing previous substring
                        sequence += str1[k];
                    } else { //new longest substring
                        lastSubStrStart = subStrStart;
                        sequence =  str1.substr(lastSubStrStart, 
                                    (k+1) - lastSubStrStart );
                    }
                }
            }
        }
    }
    return sequence;
}

If you are interested in learning about the linear solution to this problem, you can read about that solution here: https://en.wikipedia.org/wiki/Ukkonen%27s_algorithm or check out any of these videos: https://www.youtube.com/results?search_query=ukonnen+algorithm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment