@tmplt
Last active February 9, 2017 00:58
C++ Levenshtein edit distance

While working on fuzzywuzzy, I needed a C++ implementation of the Levenshtein edit distance. I found several ways to implement it, so I took a bit from each and tried to simplify the result as much as I could.

Most of the credit goes to Martin Ettl for his submission on Rosetta Code.

License

This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.

In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to http://unlicense.org

#include <cstddef> // std::size_t
#include <vector> // std::vector
#include <utility> // std::pair
#include "levenshtein_edit_distance.hpp"
#include "utils.hpp"

using std::experimental::string_view;
using std::size_t;
using std::vector;

namespace algorithm {

size_t levenshtein_distance(const string_view &s1, const string_view &s2)
{
    const size_t len1 = s1.size();
    const size_t len2 = s2.size();
    size_t i, j;

    /* Base case: a string is empty. */
    if (len1 == 0) return len2;
    if (len2 == 0) return len1;

    /*
     * Initialize rows[i].first (the previous row of distances).
     * This first row represents the edit distance against an
     * empty string, so the distance is just the number of
     * characters to delete from the non-empty string.
     */
    vector<std::pair<size_t, size_t>> rows(len2 + 1);
    for (i = 0; i < rows.size(); i++)
        rows[i].first = i;

    for (i = 0; i < len1; i++) {
        /* Against an empty prefix of s2, delete the first (i + 1) chars of s1. */
        rows[0].second = i + 1;

        /* Fill in the rest of the row. */
        for (j = 0; j < len2; j++) {
            const size_t cost = (s1[i] == s2[j]) ? 0 : 1;
            rows[j + 1].second = utils::min(rows[j].second + 1,     /* insertion */
                                            rows[j + 1].first + 1,  /* deletion */
                                            rows[j].first + cost);  /* substitution */
        }

        /*
         * Copy the current row to the previous
         * one in preparation for the next iteration.
         */
        for (auto &row : rows)
            row.first = row.second;
    }

    return rows[len2].second;
}

}
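The routine above keeps both rows in a single vector of pairs and copies the current row into the previous one after every pass. As a minimal, self-contained sketch of the same two-row idea, assuming a C++17 compiler (so `std::string_view` stands in for `std::experimental::string_view`), the version below uses two separate vectors and swaps them instead of copying. The names `edit_distance_sketch` and `check_edit_distance_sketch` are made up for illustration and are not part of the gist.

```cpp
#include <algorithm>    // std::min (initializer_list overload)
#include <cassert>      // assert
#include <cstddef>      // std::size_t
#include <numeric>      // std::iota
#include <string_view>  // std::string_view (C++17, an assumption)
#include <utility>      // std::swap
#include <vector>       // std::vector

// Hypothetical name: a two-vector variant of the two-row algorithm.
std::size_t edit_distance_sketch(std::string_view s1, std::string_view s2)
{
    if (s1.empty()) return s2.size();
    if (s2.empty()) return s1.size();

    // prev[j] holds the distance between the first i chars of s1
    // and the first j chars of s2; curr is the row being filled.
    std::vector<std::size_t> prev(s2.size() + 1), curr(s2.size() + 1);
    std::iota(prev.begin(), prev.end(), std::size_t{0});

    for (std::size_t i = 0; i < s1.size(); ++i) {
        curr[0] = i + 1;  // delete the first (i + 1) chars of s1
        for (std::size_t j = 0; j < s2.size(); ++j) {
            const std::size_t cost = (s1[i] == s2[j]) ? 0 : 1;
            curr[j + 1] = std::min({curr[j] + 1,       // insertion
                                    prev[j + 1] + 1,   // deletion
                                    prev[j] + cost});  // substitution
        }
        std::swap(prev, curr);  // O(1) instead of copying the row
    }
    return prev[s2.size()];
}

// Hypothetical helper: a few well-known distances as a sanity check.
void check_edit_distance_sketch()
{
    assert(edit_distance_sketch("kitten", "sitting") == 3);
    assert(edit_distance_sketch("flaw", "lawn") == 2);
    assert(edit_distance_sketch("", "abc") == 3);
    assert(edit_distance_sketch("same", "same") == 0);
}
```

Swapping the vectors avoids the per-row copy loop; the result is read from `prev` because the final swap moves the last computed row there.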
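/* levenshtein_edit_distance.hpp */
#pragma once

#include <cstddef> // std::size_t
#include <experimental/string_view> // std::experimental::string_view

namespace algorithm {

std::size_t levenshtein_distance(const std::experimental::string_view &s1,
                                 const std::experimental::string_view &s2);

}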
/* utils.hpp */
#pragma once

#include <algorithm> // std::min_element()
#include <vector> // std::vector

namespace utils {

/*
 * An "extension" of std::min() so that more than two arguments
 * can be passed. The type of the first argument decides what
 * every other argument is cast to.
 *
 * Hopefully the compiler will complain if we pass this something stupid.
 * Not constexpr: std::vector cannot be used in a constant expression
 * before C++20.
 */
template <typename T, typename... Args>
auto min(T first, Args... args)
{
    static_assert(sizeof...(Args) > 1, "use std::min() instead, since it's only two arguments.");

    std::vector<T> vec = {first, static_cast<T>(args)...};
    auto smallest = std::min_element(vec.begin(), vec.end());

    return *smallest;
}

}
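The helper above builds a temporary `std::vector` just to find the smallest value. As a hedged alternative sketch, the `std::initializer_list` overload of `std::min` (constexpr since C++14) does the same job without a heap allocation and can be evaluated at compile time. The namespace `sketch` and the name `min_of` are hypothetical and chosen only to avoid clashing with the gist's own `min`.

```cpp
#include <algorithm>  // std::min (initializer_list overload, constexpr since C++14)

namespace sketch {

// Hypothetical name: the type of the first argument decides
// what every other argument is cast to before comparison.
template <typename T, typename... Args>
constexpr T min_of(T first, Args... args)
{
    static_assert(sizeof...(Args) > 1,
                  "use std::min() instead, since it's only two arguments.");
    return std::min({first, static_cast<T>(args)...});
}

}  // namespace sketch

// Compile-time checks: the helper is usable in constant expressions.
static_assert(sketch::min_of(3, 1, 2) == 1, "smallest of three ints");
static_assert(sketch::min_of(2.5, 4, 3) == 2.5, "ints are cast to double");
```

Because the first argument fixes the comparison type, mixing signed and unsigned values still needs care: a negative `int` cast to an unsigned `T` wraps around before the comparison.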