[WordPress] Occasionally, I've been parsing data provided to me by third parties, and it has contained hidden (non-printing) characters that can muck up programmatically inserting the data into the database. Here's a simple regex for stripping out everything *except* alphanumeric characters and whitespace.
// Remove anything that is not 'a-z', 'A-Z', '0-9', or whitespace from the given $value.
$value = preg_replace( "/[^a-zA-Z0-9\s]/", "", $value );

/**
 * Read the comments below to see some of the functions WordPress provides
 * for checking the validity of the characters in the input string.
 */
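To see what the regex actually removes, here is a minimal sketch that wraps it in a hypothetical helper, strip_to_alnum_and_whitespace() (the name is made up for illustration, it is not a WordPress function), and runs it on an invented string containing a UTF-8 byte order mark, a NUL byte, and a zero-width space. Because the pattern has no /u modifier, PCRE works byte by byte, so every byte of a multi-byte character is removed.

```php
<?php
// Hypothetical helper (not part of WordPress) wrapping the regex from the gist.
function strip_to_alnum_and_whitespace( string $value ): string {
    // Without the /u modifier PCRE matches single bytes, so each byte of a
    // multi-byte UTF-8 sequence (BOM, zero-width space, accented letter)
    // falls outside [a-zA-Z0-9\s] and is deleted.
    return preg_replace( '/[^a-zA-Z0-9\s]/', '', $value );
}

// Invented input: UTF-8 BOM, a NUL byte, and a zero-width space (U+200B).
$raw = "\xEF\xBB\xBFOrder 42\x00 shipped\xE2\x80\x8B today";
echo strip_to_alnum_and_whitespace( $raw ), PHP_EOL; // Order 42 shipped today

// "Café, Inc." with the é written as its two UTF-8 bytes.
echo strip_to_alnum_and_whitespace( "Caf\xC3\xA9, Inc." ), PHP_EOL; // Caf Inc
```

The second call shows the trade-off raised in the comments: the é in "Café" disappears along with the punctuation, which is acceptable only for narrowly defined, ASCII-only input.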
KingYes commented Apr 2, 2013

And what about Hebrew characters? Or Arabic, etc.?

thefuxia commented Apr 2, 2013

You should really fix the parser and leave the file content as it is.

@KingYes: You're right, international characters shouldn't be ignored, but this particular regex was for a very simple, narrowly defined text file.

@Tascho: The thing is, I'm not sold that the file content being handed over was accurate.
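If Hebrew, Arabic, or other non-Latin text does need to survive, one option is to swap the ASCII ranges for Unicode character properties. This is a sketch, not the gist's original approach: \p{L} matches a letter in any script, \p{N} a number in any script, and the /u modifier makes PCRE treat the pattern and subject as UTF-8. The helper name strip_to_letters_numbers_whitespace() is hypothetical, and the input is assumed to already be UTF-8.

```php
<?php
// Hypothetical helper (not part of the original gist): Unicode-aware filter.
//   \p{L}  any letter in any script (Latin, Hebrew, Arabic, ...)
//   \p{N}  any number in any script
//   /u     treat pattern and subject as UTF-8
function strip_to_letters_numbers_whitespace( string $value ): ?string {
    // preg_replace() returns null when $value is not valid UTF-8, so callers
    // should check for null rather than insert an empty value.
    return preg_replace( '/[^\p{L}\p{N}\s]/u', '', $value );
}

// "Café", a comma, the Hebrew word "shalom", "!", "42", and a zero-width space.
$sample = "Caf\u{00E9}, \u{05E9}\u{05DC}\u{05D5}\u{05DD}! 42\u{200B}";
echo strip_to_letters_numbers_whitespace( $sample ), PHP_EOL;

// Latin-1 encoded "Café" is not valid UTF-8, so the result is null.
var_dump( strip_to_letters_numbers_whitespace( "Caf\xE9" ) );
```

The accented and Hebrew letters are kept while the punctuation and the zero-width space (a format character, not a letter, number, or whitespace) are removed.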

For those who are looking for a WordPress-based solution (which is what this particular gist was used for), there's a nice function that someone pointed out in another comment.

Specifically, wp_check_invalid_utf8, which can be found in the WordPress source on Trac.
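A minimal sketch of how that check might sit in front of a database insert. wp_check_invalid_utf8( $text, $strip = false ) is WordPress's function; the wrapper is_valid_utf8_text() is a hypothetical name, and the preg_match() fallback is an assumption added only so the snippet also runs outside WordPress.

```php
<?php
// Hypothetical helper (not part of WordPress): decide whether a third-party
// string is valid UTF-8, preferring WordPress's own check when it is loaded.
function is_valid_utf8_text( string $text ): bool {
    if ( '' === $text ) {
        return true; // Nothing to validate.
    }
    if ( function_exists( 'wp_check_invalid_utf8' ) ) {
        // With $strip left at false, wp_check_invalid_utf8() returns '' for
        // invalid UTF-8 and the unchanged text otherwise. It skips the check
        // (returns the text) in some setups, e.g. a non-UTF-8 site charset.
        return '' !== wp_check_invalid_utf8( $text );
    }
    // Fallback outside WordPress: preg_match() with the /u modifier returns
    // false when the subject is malformed UTF-8.
    return 1 === preg_match( '//u', $text );
}

var_dump( is_valid_utf8_text( "Caf\xC3\xA9" ) ); // bool(true): "Café" as UTF-8
var_dump( is_valid_utf8_text( "Caf\xE9" ) );     // bool(false): "Café" as Latin-1
```

Rejecting (or logging) rows that fail this check keeps the original data intact, which is closer to the "fix the parser" advice above than silently deleting characters.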
