@jaywilliams
Created May 28, 2009 19:25
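This simple function converts a few common typographic characters (curly quotes, dashes, bullets, ellipsis) to ASCII equivalents and then removes any remaining non-ASCII characters. Feel free to fork and extend!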
<?php
/**
 * Remove any non-ASCII characters and convert known non-ASCII characters
 * to their ASCII equivalents, if possible.
 *
 * @param string $string UTF-8 encoded input
 * @return string
 * @author Jay Williams <myd3.com>
 * @license MIT License
 * @link http://gist.github.com/119517
 */
function convert_ascii($string)
{
    $search = array();
    $replace = array();

    // Replace single curly quotes (U+2018, U+2019)
    $search[] = chr(226).chr(128).chr(152);
    $replace[] = "'";
    $search[] = chr(226).chr(128).chr(153);
    $replace[] = "'";

    // Replace double curly quotes (U+201C, U+201D)
    $search[] = chr(226).chr(128).chr(156);
    $replace[] = '"';
    $search[] = chr(226).chr(128).chr(157);
    $replace[] = '"';

    // Replace en dash (U+2013)
    $search[] = chr(226).chr(128).chr(147);
    $replace[] = '--';

    // Replace em dash (U+2014)
    $search[] = chr(226).chr(128).chr(148);
    $replace[] = '---';

    // Replace bullet (U+2022)
    $search[] = chr(226).chr(128).chr(162);
    $replace[] = '*';

    // Replace middle dot (U+00B7)
    $search[] = chr(194).chr(183);
    $replace[] = '*';

    // Replace ellipsis (U+2026) with three consecutive dots
    $search[] = chr(226).chr(128).chr(166);
    $replace[] = '...';

    // Apply the replacements
    $string = str_replace($search, $replace, $string);

    // Remove any remaining non-ASCII bytes (0x80-0xFF), as well as NUL (0x00)
    $string = preg_replace('/[^\x01-\x7F]/', '', $string);

    return $string;
}
?>
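For comparison, here is a minimal sketch of the same replacements written as a single lookup map passed to strtr(). The name convert_ascii_strtr() is hypothetical; the sketch assumes UTF-8 input and PHP 7 or later (for the scalar type declarations), and the keys are the same byte sequences the chr() calls spell out, written as hex escapes.

```php
<?php
/**
 * Hypothetical variant of convert_ascii() using strtr() and one lookup map.
 * Assumes UTF-8 input; the map mirrors the replacements in convert_ascii().
 */
function convert_ascii_strtr(string $string): string
{
    // UTF-8 byte sequences mapped to their ASCII stand-ins
    $map = [
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote
        "\xE2\x80\x9C" => '"',   // U+201C left double quote
        "\xE2\x80\x9D" => '"',   // U+201D right double quote
        "\xE2\x80\x93" => '--',  // U+2013 en dash
        "\xE2\x80\x94" => '---', // U+2014 em dash
        "\xE2\x80\xA2" => '*',   // U+2022 bullet
        "\xC2\xB7"     => '*',   // U+00B7 middle dot
        "\xE2\x80\xA6" => '...', // U+2026 ellipsis
    ];

    // strtr() with an array makes one pass and tries the longest matching key first
    $string = strtr($string, $map);

    // Drop every remaining byte outside 0x01-0x7F (NUL included), as the original does
    return preg_replace('/[^\x01-\x7F]/', '', $string);
}

// Curly quotes, an em dash, an accented letter and an ellipsis
echo convert_ascii_strtr("\xE2\x80\x9CHello\xE2\x80\x9D \xE2\x80\x94 caf\xC3\xA9\xE2\x80\xA6"), "\n";
// prints: "Hello" --- caf...
```

Because strtr() never rescans text it has already substituted, it is a little safer if the map grows; with these non-overlapping keys it gives the same result as the str_replace() call in the original. Note that the accented "é" is dropped rather than turned into "e".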
@mattweg-zz

Jay,

Thanks for this. I have spent the last couple of days looking for a clean way to sanitize some data before sending it up to a web service that triggers an error every time it encounters a non-UTF-8 character. I'll fork it if I need to add any extra characters.

-Matt

@eeertekin

Thanx dude, it saved my day :)

@optionsninja

YOU DA MAN!!!!

@Rudis1261

Jay, you are the man! I have been struggling with content I have been pulling from a third-party vendor, and this sorted it out. THANK YOU!

@kubulai

kubulai commented Nov 20, 2013

Nice.

@Techbrunch

Thanks.

@sunylyons

Very nice and thank you

@borantula

that was helpful, thank you.

@lordgiotto

Thank you very much, so useful! Exactly what I needed :)

@vunguyen-it

Finally found what I needed to solve my problem. Very nice, thank you so much!

@HassanKrayem

Thanks for your sharing.

@Rudis1261

:-D when you re-discover the only good solution the internet has to offer, ty

@paulintrognon

paulintrognon commented Feb 20, 2020

Another great solution that covers every edge case (é will be turned into e, etc.):

iconv('UTF-8', 'ASCII//TRANSLIT', $string);

(from https://stackoverflow.com/a/3542748/1822742 )

@jaywilliams (author)

@paulintrognon, that's correct; however, if the input contains invalid UTF-8, iconv() will fail and return false. So that approach only works with valid Unicode input.
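The two suggestions can be combined. Below is a sketch that checks for valid UTF-8 first, hands the string to iconv() only in that case, and otherwise falls back to stripping non-ASCII bytes the way convert_ascii() does. The helper names is_valid_utf8() and to_ascii() are hypothetical. The exact transliteration depends on the iconv implementation and, with glibc, on the current LC_CTYPE locale (in the "C" locale, accented letters may come out as "?"). Some builds (musl-based ones, for example) may reject //TRANSLIT altogether, which is why a false return also triggers the fallback.

```php
<?php
/**
 * Hypothetical helper: true when $string is well-formed UTF-8.
 */
function is_valid_utf8(string $string): bool
{
    // An empty pattern with the /u modifier returns false on invalid UTF-8
    return preg_match('//u', $string) === 1;
}

/**
 * Hypothetical helper: transliterate valid UTF-8 with iconv(),
 * otherwise strip every non-ASCII byte.
 */
function to_ascii(string $string): string
{
    if (is_valid_utf8($string) && function_exists('iconv')) {
        // Silence the diagnostic some iconv builds emit; false signals failure
        $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $string);
        if ($result !== false) {
            return $result;
        }
    }
    // Fallback for invalid input or a failed conversion: drop bytes outside 0x01-0x7F
    return preg_replace('/[^\x01-\x7F]/', '', $string);
}

var_dump(to_ascii("abc\xFF"));      // invalid UTF-8, fallback path: string(3) "abc"
var_dump(to_ascii("caf\xC3\xA9")); // valid UTF-8; result depends on the iconv build and locale
```

The fallback trades quality for robustness: invalid input never raises an error, but accented letters in it are dropped rather than transliterated.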
