@jchristopher
Created January 23, 2015 16:46
The default SearchWP Regex Whitelist
<?php
// THE DEFAULT SEARCHWP REGEX WHITELIST
private $term_pattern_whitelist = array(
    // these should go from most strict to most loose
    // functions
    "/(\\w+?)?\\(|[\\s\\n]\\(/is",
    // Date formats
    "/([0-9]{4}-[0-9]{1,2}-[0-9]{1,2})/is", // date: YYYY-MM-DD
    "/([0-9]{1,2}-[0-9]{1,2}-[0-9]{4})/is", // date: MM-DD-YYYY
    "/([0-9]{4}\\/[0-9]{1,2}\\/[0-9]{1,2})/is", // date: YYYY/MM/DD
    "/([0-9]{1,2}\\/[0-9]{1,2}\\/[0-9]{4})/is", // date: MM/DD/YYYY
    // IP
    "/(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})/is", // IPv4
    // initials
    "/\\b((?:[A-Za-z]\\.\\s{0,1})+)/isu",
    // version numbers: 1.0 or 1.0.4 or 1.0.5b1
    "/([a-z0-9]+(?:\\.[a-z0-9]+)+)/is",
    // serial numbers
    "/(\\b[-_]?[0-9a-zA-Z]+(?:[-_]+[0-9a-zA-Z]+)+[-_]?)/isu", // hyphen/underscore separator
    // strings of digits
    "/\\b(\\d{1,})\\b/is",
    // e.g. M&M, M & M
    "/\\b([[:alnum:]]+\\s?(?:&\\s?[[:alnum:]]+)+)/isu",
);
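
The "most strict to most loose" comment suggests that order matters when a query could satisfy more than one pattern. The following is a minimal sketch of that idea, not SearchWP's actual matching logic: the array keys and the `first_whitelist_match()` helper are hypothetical names introduced for illustration, and only a subset of the patterns above is copied in.

```php
<?php
// A subset of the whitelist above, copied verbatim and kept in the same
// strict-to-loose order. The string keys are hypothetical labels.
$patterns = array(
    'date'    => "/([0-9]{4}-[0-9]{1,2}-[0-9]{1,2})/is",
    'ipv4'    => "/(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})/is",
    'version' => "/([a-z0-9]+(?:\\.[a-z0-9]+)+)/is",
    'digits'  => "/\\b(\\d{1,})\\b/is",
);

// Hypothetical helper: return the label of the first pattern that matches
// the query, or null when none of them match.
function first_whitelist_match( array $patterns, $query ) {
    foreach ( $patterns as $label => $pattern ) {
        if ( 1 === preg_match( $pattern, $query ) ) {
            return $label;
        }
    }
    return null;
}

echo first_whitelist_match( $patterns, '192.168.0.1' ); // ipv4
```

An IP address such as `192.168.0.1` also satisfies the looser version-number pattern, so checking the stricter IPv4 pattern first is what lets it be classified as an IP in this sketch.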
@guyinpv

guyinpv commented Mar 30, 2020

The pattern under "// functions" looks problematic: it has two opening parentheses that don't close. I assume the parenthesis just before the "/is" is a typo. Shouldn't it be a closing parenthesis there?

Further, why do some patterns begin with an escaped backslash followed by a 'b'? For example, the serial number pattern starts with "/(\b". As I read it, this entire pattern doesn't match unless a person types a literal "\b" into the query. So "122-333-444" doesn't match, but "\b122-333-444" does. That doesn't make sense. If you want to whitelist people typing a serial number, why do they have to type "\b" first?

If I add my own custom regex to this whitelist, do I have to include the "\b" as well? Does it have some special meaning? It's not used in other patterns, like the dates, so I don't know.

@jchristopher
Author

They are escaped backslashes e.g. https://regex101.com/r/tDjIzy/1
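
To make the escaping concrete, here is a small sketch that can be run outside the plugin. It assumes the patterns are used as PHP double-quoted strings exactly as they appear in the gist; the variable names are made up for the example. In a double-quoted string PHP turns each `\\` into a single backslash, so the regex engine receives `\b` (a word boundary) and `\(` (a literal parenthesis).

```php
<?php
// Two patterns copied verbatim from the gist (PHP double-quoted strings).
$serial_pattern   = "/(\\b[-_]?[0-9a-zA-Z]+(?:[-_]+[0-9a-zA-Z]+)+[-_]?)/isu";
$function_pattern = "/(\\w+?)?\\(|[\\s\\n]\\(/is";

// After PHP processes the string, the pattern starts with: / ( \ b
// (in the single-quoted string below, \b is a literal backslash plus "b").
$starts_with_boundary = ( 0 === strpos( $serial_pattern, '/(\b' ) );

// A plain serial number matches; nobody has to type "\b" into the query.
$serial_matched = preg_match( $serial_pattern, '122-333-444', $serial );

// "\\(" reaches the regex engine as "\(", a literal opening parenthesis,
// so the pattern compiles and the parentheses are balanced.
$function_matched = preg_match( $function_pattern, 'get_posts(', $func );

var_dump( $starts_with_boundary ); // bool(true)
echo $serial[1], "\n";             // 122-333-444
echo $func[0], "\n";               // get_posts(
```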

@guyinpv

guyinpv commented Mar 31, 2020

I looked up this file in the actual WordPress plugin. It doesn't use escaped backslashes; it just uses \b, which matches a word boundary. This gist isn't really accurate compared to the source file in the WP plugin.

In any case, no big deal. I was just trying to make my own pattern and couldn't figure out why anybody would ever type "\b" into a search bar, or why that would be in the pattern in the first place. And it's not: this just represents a word boundary, so it's supposed to be one backslash, not two.
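
For anyone writing a custom pattern, a short sketch of what the word boundary changes in practice may help. The first pattern is the gist's "strings of digits" entry; the second is a hypothetical variant with the `\b` assertions removed, written here only for comparison.

```php
<?php
// From the gist: digits must sit between word boundaries.
$with_boundary = "/\\b(\\d{1,})\\b/is";
// Hypothetical variant without the boundaries, for comparison only.
$without_boundary = "/(\\d{1,})/is";

// A standalone number is found by the bounded pattern.
$standalone = preg_match( $with_boundary, 'order 42 shipped', $a );

// Digits embedded inside a longer word are skipped by the bounded pattern...
$embedded_bounded = preg_match( $with_boundary, 'abc123def' );

// ...but are picked up once the boundaries are removed.
$embedded_unbounded = preg_match( $without_boundary, 'abc123def', $b );

echo $a[1], "\n";              // 42
echo $embedded_bounded, "\n";  // 0
echo $b[1], "\n";              // 123
```

So `\b` is optional in a custom pattern: include it when the match should not start or end in the middle of a longer word, and leave it out otherwise.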
