Skip to content

Instantly share code, notes, and snippets.

@jasonm23
Last active August 26, 2023 04:04
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jasonm23/a6e68e9546e5a487b3498f44e8039718 to your computer and use it in GitHub Desktop.
Save jasonm23/a6e68e9546e5a487b3498f44e8039718 to your computer and use it in GitHub Desktop.
Levenshtein Distance Algorithm in plain english

Levenshtein Distance Algorithm in plain English

Inputs: Two strings s and t.

Goal: Calculate the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform string s into string t.

  1. Initialize the Matrix:

    • Create a matrix dp with dimensions (len(s) + 1) x (len(t) + 1).
    • Initialize the first row with values 0 to len(t) and the first column with values 0 to len(s).
  2. Fill in the Matrix:

    • For each cell dp[i][j] in the matrix, where i represents the current position in string s and j represents the current position in string t:
      • If the characters s[i-1] and t[j-1] are equal, set dp[i][j] to the value of the cell diagonally above-left (dp[i-1][j-1]).
      • Otherwise, set dp[i][j] to the minimum of the three adjacent cells' values (dp[i-1][j] + 1, dp[i][j-1] + 1, and dp[i-1][j-1] + 1).
  3. Result:

    • The Levenshtein Distance between strings s and t is given by the value in the bottom-right cell of the matrix (dp[len(s)][len(t)]).

Applications:

  • Levenshtein Distance is often used in spell checking, DNA sequence alignment, and natural language processing tasks.
  • It can also be used in applications where similarity between strings needs to be measured, such as text search engines and plagiarism detection.

Advantages:

  • The Levenshtein Distance algorithm provides a quantitative measure of the difference between two strings.
  • It is flexible and can handle strings of different lengths.

Limitations:

  • The algorithm has a time complexity of O(m * n) where m is the length of string s and n is the length of string t. It can be computationally expensive for very long strings.

The Levenshtein Distance algorithm is a fundamental tool for measuring the similarity and difference between strings by quantifying the number of edit operations needed to transform one string into another.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment