The default SearchWP Regex Whitelist
<?php
// THE DEFAULT SEARCHWP REGEX WHITELIST
// (excerpt from a class body; `private` is only valid inside a class)
private $term_pattern_whitelist = array(
	// these should go from most strict to most loose

	// functions
	"/(\\w+?)?\\(|[\\s\\n]\\(/is",

	// Date formats
	"/([0-9]{4}-[0-9]{1,2}-[0-9]{1,2})/is", // date: YYYY-MM-DD
	"/([0-9]{1,2}-[0-9]{1,2}-[0-9]{4})/is", // date: MM-DD-YYYY
	"/([0-9]{4}\\/[0-9]{1,2}\\/[0-9]{1,2})/is", // date: YYYY/MM/DD
	"/([0-9]{1,2}\\/[0-9]{1,2}\\/[0-9]{4})/is", // date: MM/DD/YYYY

	// IP
	"/(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})/is", // IPv4

	// initials
	"/\\b((?:[A-Za-z]\\.\\s{0,1})+)/isu",

	// version numbers: 1.0 or 1.0.4 or 1.0.5b1
	"/([a-z0-9]+(?:\\.[a-z0-9]+)+)/is",

	// serial numbers
	"/(\\b[-_]?[0-9a-zA-Z]+(?:[-_]+[0-9a-zA-Z]+)+[-_]?)/isu", // hyphen/underscore separator

	// strings of digits
	"/\\b(\\d{1,})\\b/is",

	// e.g. M&M, M & M
	"/\\b([[:alnum:]]+\\s?(?:&\\s?[[:alnum:]]+)+)/isu",
);
@guyinpv


guyinpv commented Mar 30, 2020

The pattern under "// functions" looks problematic: it has two opening parentheses that are never closed. I assume the paren just before the "/is" is a typo. Shouldn't that be a closing paren?
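For readers with the same question, here is a minimal sketch (not taken from the plugin) that checks the "functions" pattern exactly as it appears in the gist. Inside a double-quoted PHP string, "\\(" reaches the regex engine as \(, an escaped literal parenthesis rather than the start of a group, so the only real group is (\w+?) and it is closed. The sample input 'get_posts(' and the variable names are illustrative assumptions.

```php
<?php
// The "functions" pattern, copied verbatim from the gist.
$functions_pattern = "/(\\w+?)?\\(|[\\s\\n]\\(/is";

// What the regex engine actually receives after PHP parses the string literal.
echo $functions_pattern, PHP_EOL; // prints: /(\w+?)?\(|[\s\n]\(/is

// preg_match() returns false when a pattern fails to compile,
// 0 when it compiles but does not match, and 1 on a match.
$compiles = preg_match($functions_pattern, '') !== false;

// Illustrative input: a function name followed by an opening parenthesis.
$matched = preg_match($functions_pattern, 'get_posts(', $m);

var_dump($compiles, $matched, $m);
```

If the trailing \( were an unclosed group, preg_match() would emit a compilation warning and return false instead of 0 or 1.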

Further, why do some patterns begin with an escaped backslash followed by a 'b'? The serial number pattern, for example, starts with "/(\b". As I read it, this pattern doesn't match unless a person types a literal "\b" into the query, so "122-333-444" doesn't match but "\b122-333-444" does. That doesn't make sense. If you want to whitelist people typing a serial number, why do they have to type "\b" first?

If I add my own custom regex to this whitelist, do I have to include the "\b" as well? Does it have some special meaning? It's not used for other patterns, like the dates, so I don't know.
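On the custom-pattern question, a small sketch may help. In PCRE, \b is a zero-width word-boundary assertion: it consumes no characters and only constrains where a match may start or end, so it is optional in a custom pattern. The SKU-style pattern below (two letters followed by four digits) is purely hypothetical and is not part of SearchWP.

```php
<?php
// Hypothetical custom pattern, written with and without word boundaries.
// The doubled backslashes follow the gist's double-quoted string style.
$with_boundary    = "/\\b([A-Z]{2}\\d{4})\\b/";
$without_boundary = "/([A-Z]{2}\\d{4})/";

// A standalone token matches either way.
$standalone = preg_match($with_boundary, 'part AB1234 here', $m);

// Embedded in a longer run of word characters, only the unanchored
// pattern matches, because \b refuses to start or end mid-word.
$embedded_with    = preg_match($with_boundary, 'XAB12345');
$embedded_without = preg_match($without_boundary, 'XAB12345', $loose);

var_dump($standalone, $m[1], $embedded_with, $embedded_without, $loose[1]);
```

Whether a custom pattern should use \b therefore depends on whether matches inside longer tokens are wanted. The date patterns in the gist omit it and so can match inside longer strings.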

@jchristopher

Owner Author

jchristopher commented Mar 30, 2020

They are escaped backslashes, e.g. https://regex101.com/r/tDjIzy/1
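To illustrate this locally, a minimal sketch: the serial number pattern is copied verbatim from the gist and run against the plain query from the question above. The variable names are assumptions for illustration only.

```php
<?php
// Serial number pattern, copied verbatim from the gist.
$serial_pattern = "/(\\b[-_]?[0-9a-zA-Z]+(?:[-_]+[0-9a-zA-Z]+)+[-_]?)/isu";

// "\\b" in the source is an escaped backslash plus "b": two characters
// once PHP has parsed the literal, which PCRE then reads as \b.
$token_length = strlen("\\b");

// The plain query, with no literal backslash typed by the user.
$matched = preg_match($serial_pattern, '122-333-444', $m);

var_dump($token_length, $matched, $m[1]);
```

The query matches as typed because \b asserts a position and does not require any character in the input.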

@guyinpv


guyinpv commented Mar 31, 2020

I looked up this file in the actual WordPress plugin. It doesn't use escaped backslashes; it just uses \b, which matches a word boundary. This gist isn't really accurate compared to the source file in the WP plugin.

In any case, no big deal. I was just trying to make my own pattern and couldn't figure out why anybody would ever type "\b" into a search bar, or why that would be in the pattern in the first place. And it isn't: this just represents a word boundary, and it's supposed to be one backslash, not two.
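A short sketch suggests the two spellings may not actually conflict. PHP does not treat \b as a string escape sequence, so in PHP source code "\\b" and "\b" (and their single-quoted forms) all produce the same two-character string, which PCRE reads as a word boundary. This assumes the plugin stores its patterns as ordinary PHP string literals, which the thread does not confirm.

```php
<?php
// Four ways to spell the same two characters (backslash, b) in PHP source.
$spellings = array("\\b", "\b", '\\b', '\b');

// All four collapse to one distinct value.
$distinct = array_unique($spellings);

// The doubling does matter for sequences PHP itself recognizes:
// "\n" is a single newline character, while "\\n" is backslash + n.
$newline_len = strlen("\n");
$escaped_len = strlen("\\n");

var_dump(count($distinct), $spellings[0], $newline_len, $escaped_len);
```

Doubling the backslash is the more defensive style, since it keeps working for sequences such as \n that double-quoted strings would otherwise interpret.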
