@tommcfarlin
Last active August 2, 2019 22:37
[WordPress] When parsing data supplied by third parties, I've occasionally run into hidden ASCII characters that break programmatic inserts into the database. Here's a simple regex that strips out everything *except* alphanumeric characters and whitespace.
<?php
// Remove every character from $value that is not 'a-z', 'A-Z', '0-9', or whitespace
$value = preg_replace( "/[^a-zA-Z0-9\s]/", "", $value );
/*
 * Read the comments below to see some of the functions WordPress provides for checking the validity of the characters in the input string.
 */
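
As a quick illustration of what the regex above keeps and drops, here is a minimal sketch. The function name `strip_non_alphanumeric` and the sample string are made up for this example; they are not part of the original gist.

```php
<?php
// Hypothetical helper wrapping the gist's regex; the name is an assumption.
function strip_non_alphanumeric( $value ) {
	// Keep ASCII letters, digits, and whitespace; drop everything else,
	// including control characters such as NUL (\x00) and BEL (\x07).
	return preg_replace( '/[^a-zA-Z0-9\s]/', '', $value );
}

// Sample input with two hidden control characters and some punctuation.
$dirty = "Hello,\x00 World\x07 2013!";

echo strip_non_alphanumeric( $dirty ); // prints "Hello World 2013"
```

Punctuation and accented letters are removed along with the control characters, so this only suits input that is known to be plain ASCII text.
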
@thefuxia

thefuxia commented Apr 2, 2013

You should really fix the parser and leave the file content as it is.

@tommcfarlin
Author

@KingYes: You're right - international characters shouldn't be ignored - but this particular regex was for a very simple, narrowly defined text file.

@Tascho: The thing is, I'm not sold that the file content being handed over was accurate.
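
For anyone who does need to keep international characters, one option is a Unicode-aware variant of the same pattern. This is only a sketch: the function name is hypothetical, it assumes the input is valid UTF-8 and PHP 7.0 or later (for the `\u{...}` escapes in the sample string), and it is not what the gist itself used.

```php
<?php
// Hypothetical helper; the name is an assumption, not part of the gist.
function strip_non_alphanumeric_unicode( $value ) {
	// \p{L} matches any Unicode letter, \p{N} any Unicode number.
	// The /u modifier makes PCRE treat the pattern and subject as UTF-8.
	$clean = preg_replace( '/[^\p{L}\p{N}\s]/u', '', $value );

	// preg_replace() returns null on error, e.g. when $value is not valid UTF-8.
	return null === $clean ? '' : $clean;
}

// "Café", a hidden BEL character, " Zürich", and an exclamation mark.
$dirty = "Caf\u{00E9}\x07 Z\u{00FC}rich!";

echo strip_non_alphanumeric_unicode( $dirty ); // prints "Café Zürich"
```

Letters written as a base character plus a separate combining accent would lose the accent here; adding `\p{M}` to the character class is one way to keep those.
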

@tommcfarlin
Author

For those who are looking for a WordPress-based solution (which is what this particular gist was used for), there's a nice function that someone mentioned in this comment.

Specifically, wp_check_invalid_utf8, which can be found [in the source in Trac](http://core.trac.wordpress.org/browser/tags/3.5.1/wp-includes/formatting.php#L499).
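
A rough sketch of how that function might be used when importing data. The wrapper name `sanitize_imported_value` is hypothetical, and the fallback branch exists only so the snippet also runs outside WordPress; it approximates the core check but is not the core implementation.

```php
<?php
// Hypothetical wrapper; only wp_check_invalid_utf8() is a real WordPress function.
function sanitize_imported_value( $value ) {
	if ( function_exists( 'wp_check_invalid_utf8' ) ) {
		// With the default $strip = false, WordPress returns an empty
		// string when it detects invalid UTF-8 in $value.
		return wp_check_invalid_utf8( $value );
	}

	// Assumed stand-in for use outside WordPress: with the /u modifier,
	// preg_match() only returns 1 when the subject is valid UTF-8.
	return 1 === preg_match( '//u', $value ) ? $value : '';
}

$clean = sanitize_imported_value( "plain text" );   // valid, returned unchanged
$empty = sanitize_imported_value( "bad \xC3\x28" ); // invalid UTF-8; '' in the fallback branch
```

In the linked 3.5.1 source the check appears to apply only when the blog charset is UTF-8, so on sites with another charset the string may come back unchanged. Passing `true` as the second argument asks WordPress to strip the invalid bytes instead of returning an empty string.
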
