Here you will find a brief explanation of how to write a regex that matches parameters in a URL, regardless of their order.
This is useful in SEO to identify what users are doing and where they are going.
Let's say you have a URL and want it to match the following rule: it must have a parameter called PARAM1 with the value VALUE1. Other parameters can be present. This is the regexp:
http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b).*
|<------ Block1 ----->|
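To try the rule above, here is a minimal sketch in Python using the standard `re` module. The sample URLs and the `matches` helper are made up for illustration; they are not part of the original rule.

```python
import re

# Pattern from the text: PARAM1=VALUE1 must appear somewhere after the "?".
PATTERN = re.compile(r"http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b).*")

def matches(url: str) -> bool:
    """Return True when the whole URL satisfies the pattern."""
    return PATTERN.fullmatch(url) is not None

print(matches("https://www.example.com?PARAM1=VALUE1"))             # True
print(matches("http://www.example.com?foo=bar&PARAM1=VALUE1&x=1"))  # True
print(matches("https://www.example.com?PARAM1=OTHER"))              # False
print(matches("https://www.example.com?XPARAM1=VALUE1"))            # False
```

The last case fails because `\b` requires a word boundary right before `PARAM1`, and there is none between `X` and `P`.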
If you want it to have at least PARAM1=VALUE1 and PARAM2=VALUE2:
http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2=VALUE2\b).*
|<------ Block1 ----->||<------ Block2 ----->|
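A quick way to see that the order of the parameters does not matter is to run the two-block pattern against a few URLs. This is only a sketch in Python; the URLs (and the extra `utm` parameter) are invented for the example.

```python
import re

# Two lookahead blocks, both anchored right after the "?".
PATTERN = re.compile(
    r"http[s]?:\/\/www\.example\.com\?"
    r"(?=.*\bPARAM1=VALUE1\b)"
    r"(?=.*\bPARAM2=VALUE2\b)"
    r".*"
)

urls = [
    "https://www.example.com?PARAM1=VALUE1&PARAM2=VALUE2",        # same order as the regex
    "https://www.example.com?PARAM2=VALUE2&utm=x&PARAM1=VALUE1",  # reversed, extra param
    "https://www.example.com?PARAM1=VALUE1",                      # PARAM2 missing
]
results = [PATTERN.fullmatch(u) is not None for u in urls]
print(results)  # [True, True, False]
```

Each lookahead scans the rest of the URL from the same starting position, which is why the blocks can be satisfied in any order.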
So, each block is basically defined as (?=.*\bPARAM=VALUE\b)
If you want to check that a parameter is present, regardless of its value, just use (?=.*\bPARAM\b), as done for the second parameter in the example below:
http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2\b).*
|<------ Block1 ----->||<-- Block2 -->|
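The presence check can be tried the same way. The sketch below is in Python with invented URLs, and it also shows one caveat worth knowing: the block only looks for the word PARAM2 somewhere after the `?`, so it does not distinguish a parameter name from a value.

```python
import re

PATTERN = re.compile(
    r"http[s]?:\/\/www\.example\.com\?"
    r"(?=.*\bPARAM1=VALUE1\b)"
    r"(?=.*\bPARAM2\b)"
    r".*"
)

def matches(url: str) -> bool:
    """Return True when the whole URL satisfies the pattern."""
    return PATTERN.fullmatch(url) is not None

print(matches("https://www.example.com?PARAM1=VALUE1&PARAM2=anything"))  # True
print(matches("https://www.example.com?PARAM2&PARAM1=VALUE1"))           # True (no value at all)
print(matches("https://www.example.com?PARAM1=VALUE1&PARAM22=x"))        # False (different name)
print(matches("https://www.example.com?PARAM1=VALUE1&x=PARAM2"))         # True (caveat: PARAM2 is a value here)
```

If the last case is a problem for your URLs, the block would need to be tightened, for example by requiring `?` or `&` before the name; that variation is not covered here.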
If you want to restrict a parameter not to a single value but to a group of known options, wrap the values in parentheses and separate them with pipes, like this: (?=.*\bPARAM3=(VALUE3|VALUE4|VALUE5)\b)
Putting all of the above together, we could write a URL regexp like this:
http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2\b)(?=.*\bPARAM3=(VALUE3|VALUE4|VALUE5)\b).*
|<------ Block1 ----->||<-- Block2 -->||<------------- Block3 -------------->|
Which means:
- Block1 = (?=.*\bPARAM1=VALUE1\b) - there must be a parameter PARAM1, equal to VALUE1
- Block2 = (?=.*\bPARAM2\b) - there must be a parameter PARAM2, regardless of its value. It must be present, though.
- Block3 = (?=.*\bPARAM3=(VALUE3|VALUE4|VALUE5)\b) - there must be a parameter PARAM3, and its value can only be one of VALUE3, VALUE4 or VALUE5
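Since every block follows the same shape, the blocks above could also be generated instead of typed by hand. The following is a rough sketch in Python; `lookahead_block`, `build_pattern` and `PREFIX` are hypothetical helper names made up for this example, and single values are wrapped in parentheses too, which does not change what is matched.

```python
import re

# Fixed part of the URL, taken from the examples in the text.
PREFIX = r"http[s]?:\/\/www\.example\.com\?"

def lookahead_block(name, values=None):
    """Build one (?=...) block; values=None means 'present, any value'."""
    if values is None:
        return rf"(?=.*\b{re.escape(name)}\b)"
    alternatives = "|".join(re.escape(v) for v in values)
    return rf"(?=.*\b{re.escape(name)}=({alternatives})\b)"

def build_pattern(requirements):
    """Concatenate the blocks (AND relation) and consume the rest with .*"""
    blocks = "".join(lookahead_block(n, v) for n, v in requirements)
    return re.compile(PREFIX + blocks + ".*")

pattern = build_pattern([
    ("PARAM1", ["VALUE1"]),
    ("PARAM2", None),
    ("PARAM3", ["VALUE3", "VALUE4", "VALUE5"]),
])

ok = "https://www.example.com?PARAM3=VALUE4&PARAM2=x&PARAM1=VALUE1"
bad = "https://www.example.com?PARAM3=VALUE9&PARAM2=x&PARAM1=VALUE1"
print(pattern.fullmatch(ok) is not None)   # True
print(pattern.fullmatch(bad) is not None)  # False
```

`re.escape` is used so that names or values containing regex metacharacters are treated literally.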
So far, all blocks are mandatory (the relation between them is an AND). All of them must be satisfied for the URL to be accepted.
It is possible, though, to define OR relations between blocks using parentheses () and the pipe |, the same way we did above for values:
http[s]?:\/\/www\.example\.com\?((?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2\b)|(?=.*\bPARAM3=BLABLABLA\b)).*
| || | | |
|<------ Block1 ----->||<-- Block2 -->| |<-------- Block3 ------>|
So, here:
- Block1 = (?=.*\bPARAM1=VALUE1\b)
- Block2 = (?=.*\bPARAM2\b)
- Block3 = (?=.*\bPARAM3=BLABLABLA\b)
Notice that what is written is pretty much: http[s]?:\/\/www\.example\.com\?( Block1 Block2 | Block3 ).*
Which in essence means: ( Block1 AND Block2 ) OR ( Block3 )
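The ( Block1 AND Block2 ) OR ( Block3 ) reading can be checked with a handful of URLs. Again this is just a sketch in Python, and the test URLs are invented for the example.

```python
import re

PATTERN = re.compile(
    r"http[s]?:\/\/www\.example\.com\?"
    r"((?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2\b)|(?=.*\bPARAM3=BLABLABLA\b))"
    r".*"
)

def matches(url: str) -> bool:
    """Return True when the whole URL satisfies the pattern."""
    return PATTERN.fullmatch(url) is not None

print(matches("https://www.example.com?PARAM2=x&PARAM1=VALUE1"))  # True  (Block1 AND Block2)
print(matches("https://www.example.com?PARAM3=BLABLABLA"))        # True  (Block3 alone)
print(matches("https://www.example.com?PARAM1=VALUE1"))           # False (Block2 missing, no Block3)
print(matches("https://www.example.com?PARAM2=x&PARAM3=OTHER"))   # False (neither side satisfied)
```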
(?=abc): positive lookahead
\b: word boundary
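To make these two building blocks a bit more concrete, here is a small Python sketch showing each one in isolation. The query strings are arbitrary examples.

```python
import re

# A positive lookahead checks that something follows, but consumes no characters.
m = re.match(r"(?=.*PARAM1)", "x=1&PARAM1=2")
lookahead_end = m.end()
print(lookahead_end)  # 0: the match position has not moved

# \b matches only between a word character (letter, digit, underscore)
# and a non-word character or the edge of the string.
exact = bool(re.search(r"\bPARAM1\b", "x=1&PARAM1=2"))
prefixed = bool(re.search(r"\bPARAM1\b", "x=1&XPARAM1=2"))
suffixed = bool(re.search(r"\bPARAM1\b", "x=1&PARAM10=2"))
print(exact, prefixed, suffixed)  # True False False
```

Because the lookahead leaves the position untouched, several of them can be stacked at the same spot, which is what makes the order-independent blocks possible.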