Skip to content

Instantly share code, notes, and snippets.

@soderlind
Last active June 27, 2023 03:28
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save soderlind/74a06f9408306cfc5de9 to your computer and use it in GitHub Desktop.
Save soderlind/74a06f9408306cfc5de9 to your computer and use it in GitHub Desktop.
Multibyte Similar Text for PHP, mb_similar_text()
<?php
/*
* solves the problem at http://stackoverflow.com/questions/31002690/how-to-use-similar-text-php-code-in-arabic, not that I know Arabic, but Norwegian also has multibyte charachters: æøåÆØÅ
*/
//from http://www.phperz.com/article/14/1029/31806.html
function mb_split_str($str) {
preg_match_all("/./u", $str, $arr);
return $arr[0];
}
//based on http://www.phperz.com/article/14/1029/31806.html, added percent
function mb_similar_text($str1, $str2, &$percent) {
$arr_1 = array_unique(mb_split_str($str1));
$arr_2 = array_unique(mb_split_str($str2));
$similarity = count($arr_2) - count(array_diff($arr_2, $arr_1));
$percent = ($similarity * 200) / (strlen($str1) + strlen($str2) );
return $similarity;
}
//demo:
//$var = mb_similar_text('عمار', 'ياسر', $per);
//output: $var = 2, $per = 25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment