@pete-rai
Created February 7, 2018 21:00
Log-likelihood is a statistical technique that helps identify significant words in a given body of text when compared with a wider corpus. More information at: https://github.com/pete-rai/words-of-our-culture#log-likelihood
<?php

// for more info see : http://ucrel.lancs.ac.uk/llwizard.html

// $n1 = total words in corpus 1 (usually the normative corpus)
// $o1 = observed count for the word in corpus 1 (usually the normative corpus)
// $n2 = total words in corpus 2
// $o2 = observed count for the word in corpus 2

function logLikelihood ($n1, $o1, $n2, $o2)
{
    $ll = 0;

    // skip empty corpora and words that occur in neither corpus (avoids division by zero)

    if ($n1 > 0 && $n2 > 0 && ($o1 + $o2) > 0)
    {
        // calculate expected values

        $e1 = $n1 * ($o1 + $o2) / ($n1 + $n2); // expected count in corpus 1
        $e2 = $n2 * ($o1 + $o2) / ($n1 + $n2); // expected count in corpus 2

        // calculate log likelihood; a zero observed count contributes nothing,
        // since x * log (x) tends to 0 as x tends to 0

        $t1 = $o1 > 0 ? $o1 * log ($o1 / $e1) : 0;
        $t2 = $o2 > 0 ? $o2 * log ($o2 / $e2) : 0;
        $ll = 2 * ($t1 + $t2);
    }

    return $ll;
}

?>
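
A minimal usage sketch, not part of the original gist: it scores each word of a small text against a larger normative corpus and lists the words from most to least significant. The helper name `rankByLogLikelihood`, the word counts, and the corpus sizes are all hypothetical and made up for illustration. The fallback definition of `logLikelihood` is only there so the snippet runs on its own; it follows the formula above and assumes that a zero observed count contributes 0 to the sum.

```php
<?php

// Fallback so this sketch runs standalone; skipped when logLikelihood is already defined.
// Assumption: a zero observed count contributes 0 to the sum.
if (!function_exists ('logLikelihood'))
{
    function logLikelihood ($n1, $o1, $n2, $o2)
    {
        if ($n1 <= 0 || $n2 <= 0 || ($o1 + $o2) <= 0) return 0;

        $e1 = $n1 * ($o1 + $o2) / ($n1 + $n2); // expected count in corpus 1
        $e2 = $n2 * ($o1 + $o2) / ($n1 + $n2); // expected count in corpus 2

        $t1 = $o1 > 0 ? $o1 * log ($o1 / $e1) : 0;
        $t2 = $o2 > 0 ? $o2 * log ($o2 / $e2) : 0;

        return 2 * ($t1 + $t2);
    }
}

// Hypothetical helper: scores every word of a text against a normative corpus
// and returns word => score, highest score first.
function rankByLogLikelihood (array $normCounts, $normTotal, array $textCounts, $textTotal)
{
    $scores = [];

    foreach ($textCounts as $word => $count)
    {
        // words missing from the normative corpus get an observed count of 0
        $normCount = isset ($normCounts [$word]) ? $normCounts [$word] : 0;
        $scores [$word] = logLikelihood ($normTotal, $normCount, $textTotal, $count);
    }

    arsort ($scores); // sort by score, descending, keeping the word keys

    return $scores;
}

// made-up counts, for illustration only
$normCounts = ['the' => 60000, 'whale' => 50, 'sea' => 400];
$normTotal  = 1000000;
$textCounts = ['the' => 600, 'whale' => 20, 'sea' => 10];
$textTotal  = 10000;

foreach (rankByLogLikelihood ($normCounts, $normTotal, $textCounts, $textTotal) as $word => $score)
{
    printf ("%-6s %.2f\n", $word, $score);
}
```

With these counts "whale" ranks first, because its relative frequency in the text is far higher than in the normative corpus, while "the" scores 0 because it is equally frequent in both. The score measures only how strongly the two frequencies differ, not the direction; to tell overuse from underuse, compare `$o1 / $n1` with `$o2 / $n2`.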

pete-rai commented Feb 7, 2018

You can see Log-likelihood in action in my project Words of our Culture. Click here for a demo.
