@arm5077
Last active January 3, 2016 01:59
Snippet from Tomlin ngram project.
<?PHP
// For this example, $ngram = 1, meaning we're only looking at one word at a time and not phrases
$textArray = explode( " ", $formatted ); // $formatted is the text sample with most punctuation removed
// Use <= so the last n-gram in the text is counted too (with < the final word was skipped)
for ( $i = 0; $i <= count( $textArray ) - $ngram; $i++ ) {
    $chunk = "";
    for ( $j = 0; $j < $ngram; $j++ ) {
        $chunk .= $textArray[ $i + $j ] . " "; // keep adding words to chunk until ngram length is reached
    }
    $chunk = trim( $chunk ); // get rid of extra space at the end of chunk
    // isset() avoids an "undefined index" notice the first time a chunk is seen
    if ( !isset( $ngramArray[ $chunk ] ) )
        $ngramArray[ $chunk ] = 1;
    else
        $ngramArray[ $chunk ]++;
}
?>
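The snippet above assumes `$formatted` and `$ngram` already exist. As a rough, self-contained sketch of the same counting idea, the loop could be wrapped in a function. The name `countNgrams()`, the punctuation-stripping regex, the lowercasing, and the final sort are all assumptions for illustration and are not taken from the Tomlin project.

```php
<?php
// Hypothetical helper: countNgrams() is NOT part of the original gist.
function countNgrams( $text, $ngram ) {
    // Assumption: "most punctuation removed" means keeping only letters,
    // digits, apostrophes and whitespace. Lowercasing is also an assumption.
    $formatted = strtolower( preg_replace( "/[^a-z0-9'\s]+/i", "", $text ) );

    // Split on runs of whitespace so repeated spaces don't produce empty words
    $textArray = preg_split( '/\s+/', $formatted, -1, PREG_SPLIT_NO_EMPTY );

    $ngramArray = array();
    // <= so the final n-gram is included; the loop is skipped if the text is too short
    for ( $i = 0; $i <= count( $textArray ) - $ngram; $i++ ) {
        // Join $ngram consecutive words into one space-separated chunk
        $chunk = implode( " ", array_slice( $textArray, $i, $ngram ) );
        if ( !isset( $ngramArray[ $chunk ] ) )
            $ngramArray[ $chunk ] = 1;
        else
            $ngramArray[ $chunk ]++;
    }

    arsort( $ngramArray ); // most frequent chunks first
    return $ngramArray;
}

// Usage: count two-word phrases in a short sample
$bigrams = countNgrams( "The cat sat. The cat ran!", 2 );
print_r( $bigrams );
```

With this sample, "the cat" is counted twice and each of the other three bigrams once. Passing `1` as the second argument gives the single-word counts from the original example.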