@DenesKellner
Last active February 21, 2023 22:39
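Word-based text comparison that returns a similarity score between 0 and 1: 0 when the two texts share no words, 1 when they contain exactly the same set of words. Word order and repeated words are ignored. Depending on how you use it, you may need to prefilter the input first (strip punctuation, normalize case, and so on).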
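<?php
// Returns a word-level similarity score for two strings:
// 0 when they share no words, 1 when their sets of distinct words are identical.
// Differences are weighted by word length; word order and repeats are ignored.
function wordSimilarity($s1, $s2) {
    // Collect the distinct space-separated words of a string as array keys.
    $wordsOf = function ($s) {
        $a = [];
        foreach (explode(" ", $s) as $w) {
            // Compare against "" explicitly so that the word "0" is not dropped.
            if ($w !== "") $a[$w] = 1;
        }
        return $a;
    };
    $w1 = $wordsOf($s1); if (!$w1) return 0;
    $w2 = $wordsOf($s2); if (!$w2) return 0;
    // Combined length of the distinct words of both strings
    // (a word present in both strings is counted twice).
    $allWords = join("", array_keys($w1)) . join("", array_keys($w2));
    $totalLen = max(strlen($allWords), 1);
    // Add up the lengths of the words that occur in only one of the strings.
    $charDiff = 0;
    foreach ($w1 as $word => $x) if (!isset($w2[$word])) $charDiff += strlen($word);
    foreach ($w2 as $word => $x) if (!isset($w1[$word])) $charDiff += strlen($word);
    return 1 - ($charDiff / $totalLen);
}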
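
The description mentions that punctuation and case may need prefiltering before comparison. The following is a minimal sketch of one way to do that, not part of the original gist: `normalizeForComparison` is a hypothetical helper name, and the choice to treat every non-letter, non-digit character as a word separator is an assumption that may not suit every input (for example, "don't" becomes the two words "don" and "t").

```php
<?php
// Hypothetical prefilter (an assumption, not the gist author's code):
// lowercases the text and turns punctuation into spaces so that
// "FOX!" and "fox" end up as the same word.
function normalizeForComparison(string $s): string {
    // Use the multibyte-aware lowercase when the mbstring extension is loaded.
    $s = function_exists('mb_strtolower') ? mb_strtolower($s, 'UTF-8') : strtolower($s);
    // Replace runs of anything that is not a letter (\p{L}), digit (\p{N}),
    // or whitespace with a single space. preg_replace returns null on
    // invalid UTF-8, in which case the input is kept unchanged.
    $s = preg_replace('/[^\p{L}\p{N}\s]+/u', ' ', $s) ?? $s;
    // Collapse whitespace runs so the result splits cleanly on single spaces.
    $s = preg_replace('/\s+/u', ' ', $s) ?? $s;
    return trim($s);
}

echo normalizeForComparison("The quick, brown FOX!"), "\n"; // the quick brown fox
```

With such a helper in place, the comparison could be called as `wordSimilarity(normalizeForComparison($a), normalizeForComparison($b))`, so that differences in case or punctuation alone do not lower the score.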
DenesKellner commented Feb 21, 2023
