The default SearchWP Regex Whitelist
<?php
// THE DEFAULT SEARCHWP REGEX WHITELIST
private $term_pattern_whitelist = array(
	// these should go from most strict to most loose
	// functions
	"/(\\w+?)?\\(|[\\s\\n]\\(/is",
	// Date formats
	"/([0-9]{4}-[0-9]{1,2}-[0-9]{1,2})/is", // date: YYYY-MM-DD
	"/([0-9]{1,2}-[0-9]{1,2}-[0-9]{4})/is", // date: MM-DD-YYYY
	"/([0-9]{4}\\/[0-9]{1,2}\\/[0-9]{1,2})/is", // date: YYYY/MM/DD
	"/([0-9]{1,2}\\/[0-9]{1,2}\\/[0-9]{4})/is", // date: MM/DD/YYYY
	// IP
	"/(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})/is", // IPv4
	// initials
	"/\\b((?:[A-Za-z]\\.\\s{0,1})+)/isu",
	// version numbers: 1.0 or 1.0.4 or 1.0.5b1
	"/([a-z0-9]+(?:\\.[a-z0-9]+)+)/is",
	// serial numbers
	"/(\\b[-_]?[0-9a-zA-Z]+(?:[-_]+[0-9a-zA-Z]+)+[-_]?)/isu", // hyphen/underscore separator
	// strings of digits
	"/\\b(\\d{1,})\\b/is",
	// e.g. M&M, M & M
	"/\\b([[:alnum:]]+\\s?(?:&\\s?[[:alnum:]]+)+)/isu",
);
They are escaped backslashes, e.g. https://regex101.com/r/tDjIzy/1
I looked up this file in the actual WordPress plugin. It doesn't use escaped backslashes; it just uses \b, which matches a word boundary. This gist isn't really accurate compared to the source file in the WP plugin.
In any case, no big deal. I was just trying to make my own pattern and couldn't figure out why anybody would ever type "\b" into a search bar, or why that would be in the pattern in the first place. And it isn't: this just represents a word boundary, and it's meant to be one backslash, not two.
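For anyone who wants to check this, here is a minimal sketch in plain PHP (no SearchWP involved; the variable names are mine, not the plugin's). It assumes only standard PHP string rules: inside a double-quoted string, "\\b" collapses to a single backslash followed by "b", so the regex engine sees the word-boundary assertion \b either way.

```php
<?php
// The serial number pattern as written in the gist (double-quoted, doubled backslash).
$escaped = "/(\\b[-_]?[0-9a-zA-Z]+(?:[-_]+[0-9a-zA-Z]+)+[-_]?)/isu";

// The same pattern written with a single backslash in a single-quoted string.
$unescaped = '/(\b[-_]?[0-9a-zA-Z]+(?:[-_]+[0-9a-zA-Z]+)+[-_]?)/isu';

// Both spellings yield the identical pattern string.
$same_pattern = ($escaped === $unescaped);

// The serial number matches as typed; no literal "\b" is needed in the query.
$serial_matches = preg_match($escaped, '122-333-444', $m);
$matched_text   = $m[1] ?? null;

var_dump($same_pattern);   // true
var_dump($serial_matches); // 1
var_dump($matched_text);   // "122-333-444"
```

So the doubled backslash is only a string-escaping detail of the PHP source, not something the searcher has to type.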
The pattern under "// functions" is problematic: it has two opening parentheses that don't close. I assume the parenthesis just before the "/is" is a typo; shouldn't it be a closing parenthesis there?
Further, why do some patterns begin with an escaped backslash followed by a 'b'? For example, the serial number pattern starts with "/(\b". This entire pattern doesn't match unless a person types a literal "\b" into the query. So this doesn't match: "122-333-444", but this does: "\b122-333-444". That doesn't make sense. If you want to whitelist people typing a serial number, why do they have to type "\b" first?
If I add my own custom regex to this whitelist, do I have to include the "\b" as well? Does it have a special meaning somehow? It's not used for other patterns like the dates, so I don't know.
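Both questions can be tested outside the plugin. The sketch below is plain PHP and makes no claim about how SearchWP applies the whitelist internally. The `$loose` and `$strict` patterns and the sample strings are hypothetical, made up for illustration. It shows that "\\(" in the functions pattern is an escaped literal parenthesis, not an unclosed group, and that \b is optional in a custom pattern: it only changes whether a match may start or end in the middle of a word.

```php
<?php
// The functions pattern exactly as written in the gist (double-quoted string).
$functions = "/(\\w+?)?\\(|[\\s\\n]\\(/is";

// After PHP parses the string, the regex engine receives:
//   /(\w+?)?\(|[\s\n]\(/is
// "\(" is a literal "(", so there is no unclosed group and the pattern compiles.
$compiles     = (preg_match($functions, '') !== false);
$found        = preg_match($functions, 'call my_function() here', $m);
$function_hit = $m[0] ?? null;

// Hypothetical custom patterns: the same digits rule without and with \b.
$loose  = '/(\d+)/';
$strict = '/\b(\d+)\b/';

$loose_inside_word  = preg_match($loose, 'abc123');   // digits inside a word
$strict_inside_word = preg_match($strict, 'abc123');  // no boundary before "1"
$strict_whole_word  = preg_match($strict, 'abc 123'); // "123" stands alone

var_dump($compiles);           // true
var_dump($function_hit);       // "my_function("
var_dump($loose_inside_word);  // 1
var_dump($strict_inside_word); // 0
var_dump($strict_whole_word);  // 1
```

Whether to include \b in a custom pattern therefore depends on whether the term should only count when it stands on its own; the date patterns in the list simply do not make that restriction.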