I'm going ultra basic here, as regex tends to be quite daunting when you're not familiar with it. My apologies if you already know most of this.
NOTE: a great tool for working with regexes is regex101. It contains a test suite, regex analysis and basic documentation of regex special chars.
Do you know the exact keyword beforehand? And is the URL the full string (so no leading/trailing text)?
If it's always the same keyword (e.g. filter), you can use String.prototype.match and a few capture groups to neatly prepare it:
A basic regex could look like /^(.*)(\/filter\/.*)$/
where:
^ -> an anchor for the start of the string (so the match MUST start at index 0)
(.*) -> the first capturing group (anything before /filter/)
  . -> match any non-linebreak char
  * -> repeat 0 or more times
(\/<your keyword>\/.*) -> second capturing group (match anything from /filter/ until end of string)
  \/filter\/ -> it's important you escape forward slashes (/) inside a regexp literal, otherwise they will terminate the expression and probably fail to compile
  .* -> like above (just matches anything)
$ -> anchor for the end of string (so the whole match MUST include the entire string)
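To make the breakdown above a bit more concrete, here is a minimal sketch that runs the pattern against a few strings with RegExp.prototype.test. The second and third URLs are made up for illustration; only the first one is the example URL used in this answer.

```javascript
const pattern = /^(.*)(\/filter\/.*)$/

// The keyword is present, so the anchored pattern matches the whole string.
console.log(pattern.test('http://example.com/category/subcat/filter/size/1/')) // true

// No "/filter/" anywhere, so the second group can never match.
console.log(pattern.test('http://example.com/category/subcat/')) // false

// "." does not match line breaks, so "(.*)" cannot cross the "\n"
// to reach the keyword and the match fails.
console.log(pattern.test('http://example.com/\n/filter/x')) // false
```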
A basic function could look like this:
// NOTE: if you have to do this very often, you should declare
// the regexp outside the function and reuse it for a bit better performance
const matcher = /^(.*)(\/filter\/.*)$/
/**
* @param {string} url the url to process
* @returns {{ main: string, stripped: string}}
*/
function splitURL (url) {
const match = url.match(matcher)
return { main: match[1], stripped: match[2] }
}
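One thing to watch out for: String.prototype.match returns null when the pattern doesn't match, so indexing into the result throws a TypeError for URLs without the keyword. A minimal sketch of a guarded variant follows; the names safeMatcher and splitURLOrNull are made up for this example, and returning null is just one possible way to signal "no match".

```javascript
const safeMatcher = /^(.*)(\/filter\/.*)$/

/**
 * @param {string} url the url to process
 * @returns {{ main: string, stripped: string } | null} null when the keyword is absent
 */
function splitURLOrNull (url) {
  const match = url.match(safeMatcher)
  // match is null when the pattern does not match at all
  if (match === null) return null
  return { main: match[1], stripped: match[2] }
}

console.log(splitURLOrNull('http://example.com/category/subcat/filter/size/1/'))
// { main: 'http://example.com/category/subcat', stripped: '/filter/size/1/' }
console.log(splitURLOrNull('http://example.com/category/subcat/')) // null
```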
String.prototype.match(regexp: RegExp) can be a bit confusing if you're not used to it, but it's not that complicated. Using the example URL and regex:
('http://example.com/category/subcat/filter/size/1/').match(/^(.*)(\/filter\/.*)$/)
Will return a RegExpMatchArray like this:
[
'http://example.com/category/subcat/filter/size/1/', // <-- index 0, the full match (in this example it's the entire string)
'http://example.com/category/subcat', // <- index 1, the first capture group (`(.*)`)
'/filter/size/1/', // <- index 2, the second group (`(\/filter\/.*)`)
index: 0, // <-- key `index`, the starting index of the match (in this case 0, the start of the string)
input: 'http://example.com/category/subcat/filter/size/1/', // <- key `input`, the string on which `String.prototype.match` was called
groups: undefined // <- key `groups`, an object that stores the named capture groups and their value. (here undefined since we didn't have any named groups)
]
The way your average console.log displays it is a little odd, so to clarify:
- we have a normal Array with 3 items:
[
'http://example.com/category/subcat/filter/size/1/',
'http://example.com/category/subcat',
'/filter/size/1/'
]
with 3 additional properties added to it:
index: 0
input: 'http://example.com/category/subcat/filter/size/1/'
groups: undefined
so as a regular object it would be displayed as:
{
length: 3,
0: 'http://example.com/category/subcat/filter/size/1/',
1: 'http://example.com/category/subcat',
2: '/filter/size/1/',
index: 0,
input: 'http://example.com/category/subcat/filter/size/1/',
groups: undefined
}
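If you want to convince yourself that the result really is an ordinary Array with a few extra properties, a quick sketch like the following may help. It also shows array destructuring as an alternative way to pick out the two groups; the variable names are only illustrative.

```javascript
const result = 'http://example.com/category/subcat/filter/size/1/'
  .match(/^(.*)(\/filter\/.*)$/)

console.log(Array.isArray(result)) // true
console.log(result.length) // 3
console.log(result.index) // 0

// Array destructuring: the leading comma skips index 0 (the full match).
const [, main, stripped] = result
console.log(main) // http://example.com/category/subcat
console.log(stripped) // /filter/size/1/
```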
In the above function we just pick index 1 and 2 from the match array and return them in an object.
We could use named capture groups too: /^(?<main>.*)(?<stripped>\/filter\/.*)$/
This way we can just do:
const matcher = /^(?<main>.*)(?<stripped>\/filter\/.*)$/
function splitURL (url) {
return url.match(matcher).groups
}
Using that on the example URL will return basically the same array, but now with a populated groups property, which we can then return:
[
'http://example.com/category/subcat/filter/size/1/',
'http://example.com/category/subcat', // <- named groups are still indexed the same way they were without a name
'/filter/size/1/',
index: 0,
input: 'http://example.com/category/subcat/filter/size/1/',
groups: [Object: null prototype] { // <- we can return this object and save the picking we did before
main: 'http://example.com/category/subcat',
stripped: '/filter/size/1/'
}
]
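The [Object: null prototype] label in the output above means the groups object doesn't inherit from Object.prototype. If the calling code expects a regular object, a rough sketch of two ways to deal with that is shown below: borrowing the method from Object.prototype, or copying the entries into a plain object with spread syntax. Which one fits depends on your use case.

```javascript
const { groups } = 'http://example.com/category/subcat/filter/size/1/'
  .match(/^(?<main>.*)(?<stripped>\/filter\/.*)$/)

console.log(Object.getPrototypeOf(groups)) // null
console.log(typeof groups.hasOwnProperty) // undefined

// Option 1: borrow the method from Object.prototype.
console.log(Object.prototype.hasOwnProperty.call(groups, 'main')) // true

// Option 2: copy the entries into an ordinary object.
const plain = { ...groups }
console.log(typeof plain.toString) // function
console.log(plain.main) // http://example.com/category/subcat
```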
- the groups object has a prototype of null, so it doesn't have any of the methods a normal object would (e.g. toString or hasOwnProperty). Trying to call one of those will throw an error along the lines of undefined is not a function
- if the keyword isn't static, but you know it by the time you get the URL, you can always use the RegExp constructor and a template literal, e.g.
const matcher = new RegExp(`^(?<main>.*)(?<stripped>\/${yourKeywordVariable}\/.*)$`)
- the example regex here works for "best case" scenarios. It will fail when, for example, the URL has the keyword twice, e.g. 'http://example.com/category/subcat/filter/size/1/filter/'. In this case the above functions would return:
{
main: 'http://example.com/category/subcat/filter/size/1',
stripped: '/filter/'
}
This could be fixed with assertions in the regex like lookahead/lookbehind, but the exact form will depend on what the exact use case is. It's usually not worth it to make a catch-all regex unless it's actually needed.
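As one possible direction for the notes above, here is a sketch that combines two ideas: escaping a keyword that is only known at runtime before putting it into the RegExp constructor, and using a lazy quantifier (.*?) so the split happens at the first occurrence of the keyword rather than the last. Both escapeRegExp and buildMatcher are hypothetical helpers written for this example, and splitting at the first occurrence is an assumption about what you want.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex,
// so a keyword like "a.b" is matched literally.
function escapeRegExp (text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: build the matcher for a keyword known only at runtime.
function buildMatcher (keyword) {
  // (.*?) is lazy: it stops at the FIRST "/keyword/" instead of the last one.
  // Forward slashes need no escaping in a string passed to the constructor.
  return new RegExp(`^(?<main>.*?)(?<stripped>/${escapeRegExp(keyword)}/.*)$`)
}

const firstSplit = 'http://example.com/category/subcat/filter/size/1/filter/'
  .match(buildMatcher('filter')).groups

console.log(firstSplit.main) // http://example.com/category/subcat
console.log(firstSplit.stripped) // /filter/size/1/filter/
```

Note that the chained .groups access still assumes the keyword is present; for URLs without it, match returns null.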