@paulera
Last active November 16, 2020 11:14
How to write a regex to match and validate URL params, in no specific order.

URL Regex for SEO funnel

Here you will find a brief explanation of how to write a regex to match parameters in a URL, without a specific order.

This is useful in SEO for identifying what users are doing and where they are going.

Filtering by URL parameters

One param with expected value

Let's say you have a URL and want it to match the following rule: it must have a param called PARAM1 with the value VALUE1. Other params can be present. This is the regexp:

http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b).*
                                |<------ Block1 ----->|
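
To try the pattern above outside of an analytics tool, here is a minimal Python sketch. The pattern is copied from the text; the helper name matches_one_param and the sample URLs are made up for illustration.

```python
import re

# Pattern copied from the text above. PARAM1, VALUE1 and www.example.com
# are the placeholders used throughout this document.
ONE_PARAM = re.compile(
    r"http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b).*"
)


def matches_one_param(url):
    """Return True if the whole URL is accepted by the pattern."""
    return ONE_PARAM.fullmatch(url) is not None


if __name__ == "__main__":
    samples = [
        "https://www.example.com?PARAM1=VALUE1",
        "http://www.example.com?foo=bar&PARAM1=VALUE1&x=1",
        "https://www.example.com?PARAM1=OTHER",
        "https://www.example.com?PARAM1=VALUE12",
    ]
    for url in samples:
        print(matches_one_param(url), url)
```

The first two sample URLs are accepted and the last two are rejected: \b after VALUE1 prevents VALUE12 from counting as a match.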

Two params, with expected values

If you want it to have at least PARAM1=VALUE1 and PARAM2=VALUE2:

http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2=VALUE2\b).*
                                |<------ Block1 ----->||<------ Block2 ----->|

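So, each block is basically defined as (?=.*\bPARAM=VALUE\b). The (?=...) part is a lookahead: it checks the rest of the URL without consuming any of it, so every block starts looking from the same position and the parameters can appear in any order.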
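
Since every block has the same shape, the pattern can also be assembled from a list of parameters. The sketch below is one possible way to do that in Python; the helpers block and require_all are hypothetical names, not part of any tool mentioned here.

```python
import re

# Fixed part of the pattern, copied from the text above.
PREFIX = r"http[s]?:\/\/www\.example\.com\?"


def block(param, value):
    # Build one lookahead block of the form (?=.*\bPARAM=VALUE\b).
    # re.escape keeps special characters in names or values literal.
    return rf"(?=.*\b{re.escape(param)}={re.escape(value)}\b)"


def require_all(pairs):
    # Concatenating blocks means all of them must hold (AND).
    blocks = "".join(block(p, v) for p, v in pairs.items())
    return re.compile(PREFIX + blocks + ".*")


if __name__ == "__main__":
    pattern = require_all({"PARAM1": "VALUE1", "PARAM2": "VALUE2"})
    print(pattern.pattern)
    # Parameter order in the URL does not matter.
    for url in (
        "https://www.example.com?PARAM1=VALUE1&PARAM2=VALUE2",
        "https://www.example.com?PARAM2=VALUE2&PARAM1=VALUE1",
        "https://www.example.com?PARAM1=VALUE1",
    ):
        print(pattern.fullmatch(url) is not None, url)
```

For the two pairs shown, the generated pattern is identical to the one written by hand above.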

Require a parameter, regardless of the value

If you want to check that a parameter is present, regardless of its value, use (?=.*\bPARAM\b), as done for the second parameter in the example below:

http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2\b).*
                                |<------ Block1 ----->||<-- Block2 -->|
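
One thing to be aware of: \bPARAM2\b only asks for the word PARAM2 somewhere in the query string, so it is also satisfied when PARAM2 shows up as a value (for example x=PARAM2). The Python sketch below shows this with the pattern from the text (LOOSE) next to a stricter variant (STRICT). The stricter block is an assumption of this example, not part of the original recipe, and has_params is a made-up helper name.

```python
import re

# Pattern copied from the text above.
LOOSE = re.compile(
    r"http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2\b).*"
)

# Hypothetical stricter variant: PARAM2 must be a key, meaning it sits at the
# start of the query string or right after "&", and is followed by "=", "&"
# or the end of the URL.
STRICT = re.compile(
    r"http[s]?:\/\/www\.example\.com\?"
    r"(?=.*\bPARAM1=VALUE1\b)"
    r"(?=(?:.*&)?PARAM2(?:=|&|$))"
    r".*"
)


def has_params(pattern, url):
    """Return True if the whole URL is accepted by the given pattern."""
    return pattern.fullmatch(url) is not None


if __name__ == "__main__":
    url = "https://www.example.com?PARAM1=VALUE1&x=PARAM2"
    print(has_params(LOOSE, url))   # PARAM2 appears only as a value
    print(has_params(STRICT, url))
```

Whether the looser behaviour matters depends on the URLs you expect; for most funnels the pattern from the text is likely enough.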

Multiple accepted values for a param

If you want to accept not just a single value for a parameter but any of a group of known options, wrap the values in parentheses and separate them with a pipe, like this: (?=.*\bPARAM3=(VALUE3|VALUE4|VALUE5)\b)

Putting all above together, we could write a URL regexp like this:

http[s]?:\/\/www\.example\.com\?(?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2\b)(?=.*\bPARAM3=(VALUE3|VALUE4|VALUE5)\b).*
                                |<------ Block1 ----->||<-- Block2 -->||<------------- Block3 -------------->|

Which means:

  • Block1 = (?=.*\bPARAM1=VALUE1\b) - there must be a parameter PARAM1, equal to VALUE1
  • Block2 = (?=.*\bPARAM2\b) - there must be a parameter PARAM2, regardless of its value. It must be present, though.
  • Block3 = (?=.*\bPARAM3=(VALUE3|VALUE4|VALUE5)\b) - there must be a parameter PARAM3, and its value can only be one of VALUE3, VALUE4 or VALUE5

So far, all blocks are mandatory (the condition between them is like an AND). All of them must be present for the URL to be accepted.
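
The combined pattern above can be checked with a short Python sketch. The pattern is split across lines only for readability, and the function name accepted and the sample URLs are illustrative.

```python
import re

# Same pattern as in the text; adjacent raw strings are joined by Python.
COMBINED = re.compile(
    r"http[s]?:\/\/www\.example\.com\?"
    r"(?=.*\bPARAM1=VALUE1\b)"                  # Block1
    r"(?=.*\bPARAM2\b)"                         # Block2
    r"(?=.*\bPARAM3=(VALUE3|VALUE4|VALUE5)\b)"  # Block3
    r".*"
)


def accepted(url):
    """Return True only if all three blocks are satisfied."""
    return COMBINED.fullmatch(url) is not None


if __name__ == "__main__":
    for url in (
        "https://www.example.com?PARAM3=VALUE4&PARAM2=x&PARAM1=VALUE1",
        "https://www.example.com?PARAM1=VALUE1&PARAM2=x&PARAM3=VALUE9",
        "https://www.example.com?PARAM1=VALUE1&PARAM3=VALUE3",
    ):
        print(accepted(url), url)
```

Only the first URL is accepted: the second has a PARAM3 value outside the allowed list, and the third is missing PARAM2.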

This OR that parameter

It is possible, though, to define OR relations between blocks using parentheses () and a pipe |, the same way we did above for values:

http[s]?:\/\/www\.example\.com\?((?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2\b)|(?=.*\bPARAM3=BLABLABLA\b)).*
                                 |                     ||              | |                        |
                                 |<------ Block1 ----->||<-- Block2 -->| |<------- Block3 ------->|

So, here

  • Block1 = (?=.*\bPARAM1=VALUE1\b)
  • Block2 = (?=.*\bPARAM2\b)
  • Block3 = (?=.*\bPARAM3=BLABLABLA\b)

Notice that what is written is pretty much:

http[s]?:\/\/www\.example\.com\?( Block1 Block2 | Block3 ).*

Which in essence means: ( Block1 AND Block2 ) OR ( Block3 )
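
The ( Block1 AND Block2 ) OR ( Block3 ) logic can be verified with a small Python sketch. The pattern is the one from this section, split across lines for readability; the name either_rule and the sample URLs are made up for illustration.

```python
import re

# Same pattern as in the text; adjacent raw strings are joined by Python.
EITHER = re.compile(
    r"http[s]?:\/\/www\.example\.com\?"
    r"("
    r"(?=.*\bPARAM1=VALUE1\b)(?=.*\bPARAM2\b)"  # Block1 AND Block2
    r"|"
    r"(?=.*\bPARAM3=BLABLABLA\b)"               # OR Block3
    r")"
    r".*"
)


def either_rule(url):
    """Return True if (Block1 AND Block2) OR Block3 holds for the URL."""
    return EITHER.fullmatch(url) is not None


if __name__ == "__main__":
    for url in (
        "https://www.example.com?PARAM1=VALUE1&PARAM2=x",
        "https://www.example.com?PARAM3=BLABLABLA",
        "https://www.example.com?PARAM1=VALUE1",
        "https://www.example.com?PARAM2=x&PARAM3=OTHER",
    ):
        print(either_rule(url), url)
```

The first two URLs are accepted, one through each side of the pipe. The last two are rejected because neither side is fully satisfied.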
