@pete-rai
Created February 7, 2018 21:09
A text cleansing function that is useful for preparing strings prior to lexical analysis.
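<?php
// Cleanse a UTF-8 string prior to lexical analysis: returns lowercase ASCII words separated by single spaces.
function cleanse ($text)
{
    // accented characters to 'normal'; the exact result depends on the iconv implementation and the current locale
    $text = iconv ('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);
    if ($text === false) { return ''; } // iconv could not convert the input
    $text = preg_replace ('/[^\w\s]+/', '', $text); // remove all punctuation (underscores stay, since \w matches them)
    $text = preg_replace ('/\s+/', ' ', $text); // then normalise whitespace to one space, including gaps left by removed punctuation
    return strtolower (trim ($text)); // lowercase and trimmed
}
?>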
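As a rough illustration of how the cleansed text might feed a lexical step, here is a minimal sketch that counts word frequencies. The `word_counts` helper is hypothetical and not part of the gist. The snippet carries its own copy of `cleanse` so that it runs standalone, and that copy strips punctuation before collapsing whitespace, an ordering assumed here so that a free-standing dash does not leave an empty token.

```php
<?php
// Standalone copy of cleanse() so this sketch runs on its own.
// Assumption: punctuation is stripped before whitespace is collapsed.
function cleanse ($text)
{
    $text = iconv ('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text); // accented character to 'normal'
    $text = preg_replace ('/[^\w\s]+/', '', $text);            // remove all punctuation
    $text = preg_replace ('/\s+/', ' ', $text);                // normalise whitespace to one space
    return strtolower (trim ($text));                          // lowercase and trimmed
}

// Hypothetical helper (not in the gist): map each cleansed word to its count.
function word_counts ($text)
{
    $clean = cleanse ($text);
    return $clean === '' ? [] : array_count_values (explode (' ', $clean));
}

$counts = word_counts ("  Hello,   World!\nHello - it's a TEST. ");
print_r ($counts); // hello => 2, world => 1, its => 1, a => 1, test => 1
```

The sample input is plain ASCII on purpose: how accented characters are transliterated varies with the iconv implementation and locale, so no output is claimed for them here.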