[WordPress] When parsing data supplied by third parties, I've occasionally run into hidden ASCII characters that break programmatic inserts into the database. Here's a simple regex that strips out everything *except* alphanumeric characters.
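As a rough illustration of that kind of pattern, here is a minimal sketch in PHP. The function name `strip_non_alphanumeric` and the sample string are hypothetical and not taken from the gist itself; the character class assumes "alphanumeric" means ASCII letters and digits only.

```php
<?php
// Hypothetical helper for illustration; the name is not from the original gist.
function strip_non_alphanumeric( $value ) {
    // Remove every byte that is not an ASCII letter (A-Z, a-z) or digit (0-9).
    // This also drops spaces, punctuation, control characters, and any
    // multi-byte (non-ASCII) characters.
    return preg_replace( '/[^A-Za-z0-9]/', '', $value );
}

// Sample input containing a NUL byte, a tab, and a UTF-8 non-breaking space.
$raw = "Order\x00 #12\t34\xC2\xA0";

echo strip_non_alphanumeric( $raw ); // Order1234
```

Because the class is limited to ASCII, accented and other international letters are removed as well, so a pattern like this only suits input known to be plain ASCII text.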
For those who are looking for a WordPress-based solution (which is what this particular gist was used for), there's a nice function that someone mentioned in this comment.
Specifically, `wp_check_invalid_utf8`, which can be found [in the source in Trac](http://core.trac.wordpress.org/browser/tags/3.5.1/wp-includes/formatting.php#L499).
@KingYes: You're right - international characters shouldn't be ignored - but this particular regex was for a very simple, narrowly defined text file.
@Tascho: The thing is, I'm not sold that the file content being handed over was accurate.