KilianKilmister/stackoverflow.regex-or-substring-operation-to-strip-out-a-url-from-a-keyword-onwards.md

## stackoverflow.regex-or-substring-operation-to-strip-out-a-url-from-a-keyword-onwards.md

      
Beforehand

I'm keeping this ultra basic, as regex tends to be quite daunting when you're not familiar with it. My apologies if you already know most of this.
NOTE: a great tool for working with regexes is regex101. It contains a test suite, regex analysis and basic documentation of regex special chars.
Circumstance

Do you know the exact keyword beforehand, and is the URL the full string (so no leading/trailing text)?
If it's always the same keyword (e.g. filter), you can use String.prototype.match and a few capture groups to neatly prepare it:
Basic Regex

A basic regex could look like /^(.*)(\/filter\/.*)$/ where:

^ -> an anchor for the start of the string (so the match MUST start at index 0)
(.*) -> the first capturing group (anything before /filter/)

. -> match any non-linebreak char
* -> repeat 0 or more times


(\/<your keyword>\/.*) -> second capturing group (match anything from /filter/ until end of string)

\/filter\/ -> it's important to escape forward slashes (/) inside a regexp literal, otherwise they will terminate the expression and it will probably fail to compile.
.* -> like above (just matches anything)


$ -> anchor for the end of string (so the whole match MUST include the entire string)
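To see the pieces above working together, here is a minimal sketch that only checks whether a string matches at all. The variable names are made up for illustration, and `RegExp.prototype.test` is used simply because it returns a plain boolean:

```javascript
// The same pattern as described above, stored once so it can be reused.
const basicPattern = /^(.*)(\/filter\/.*)$/

// A URL that contains `/filter/` matches the pattern...
const hasKeyword = basicPattern.test('http://example.com/category/subcat/filter/size/1/')

// ...while a URL without `/filter/` does not.
const lacksKeyword = basicPattern.test('http://example.com/category/subcat/')

console.log(hasKeyword, lacksKeyword) // true false
```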

Basic matcher function

A basic function could look like this:
// NOTE: if you have to do this very often, you should declare
// the regexp outside the function and reuse it for a bit better performance
const matcher = /^(.*)(\/filter\/.*)$/

/**
 * @param {string} url the url to process
 * @returns {{ main: string, stripped: string}}
 */
function splitURL (url) {
  const match = url.match(matcher)
  return { main: match[1], stripped: match[2] }
}
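One thing to keep in mind: `String.prototype.match` returns `null` when the pattern doesn't match, so indexing into the result would throw for a URL without the keyword. A possible null-safe variant could look like the sketch below. The name `splitURLSafe` and the choice to return `null` are just assumptions for this example:

```javascript
const filterPattern = /^(.*)(\/filter\/.*)$/

/**
 * Hypothetical null-safe variant of the function above.
 * @param {string} url the url to process
 * @returns {{ main: string, stripped: string } | null} null if the keyword is absent
 */
function splitURLSafe (url) {
  const match = url.match(filterPattern)
  // `match` is null when the pattern does not match at all
  if (match === null) return null
  return { main: match[1], stripped: match[2] }
}

console.log(splitURLSafe('http://example.com/category/subcat/filter/size/1/'))
// { main: 'http://example.com/category/subcat', stripped: '/filter/size/1/' }
console.log(splitURLSafe('http://example.com/category/subcat/'))
// null
```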
Explanation

String.prototype.match(regexp: RegExp) can be a bit confusing if you're not used to it, but it's not that complicated. Using the example url and regex:
('http://example.com/category/subcat/filter/size/1/').match(/^(.*)(\/filter\/.*)$/)
Will return a RegExpMatchArray like this:
[
  'http://example.com/category/subcat/filter/size/1/', // <-- index 0, the full match (in this example it's the entire string)
  'http://example.com/category/subcat', // <- index 1, the first capture group (`(.*)`)
  '/filter/size/1/', // <- index 2, the second group (`(\/filter\/.*)`)
  index: 0, // <-- key `index`, the starting index of the match (in this case 0, the start of the string)
  input: 'http://example.com/category/subcat/filter/size/1/', // <- key `input`, the string on which `String.prototype.match` was called
  groups: undefined // <- key `groups`, an object that stores the named capture groups and their value. (here undefined since we didn't have any named groups)
]
The way your average console.log displays it is a little odd, so to clarify:

We have a normal Array with 3 items:

[
  'http://example.com/category/subcat/filter/size/1/',
  'http://example.com/category/subcat',
  '/filter/size/1/'
]
with 3 additional properties added to it:

index: 0
input: 'http://example.com/category/subcat/filter/size/1/'
groups: undefined

So as a regular object it would be displayed as:
{
  length: 3,
  0: 'http://example.com/category/subcat/filter/size/1/',
  1: 'http://example.com/category/subcat',
  2: '/filter/size/1/',
  index: 0,
  input: 'http://example.com/category/subcat/filter/size/1/',
  groups: undefined
}
In the above function we just pick index 1 and 2 from the Match Array and return them in an object.
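Since the match result is a regular array with a few extra properties, you can also pick the groups out with array destructuring. A small sketch, with variable names chosen just for this example:

```javascript
const exampleUrl = 'http://example.com/category/subcat/filter/size/1/'
const matchResult = exampleUrl.match(/^(.*)(\/filter\/.*)$/)

// Array destructuring: skip index 0 (the full match), name index 1 and 2
const [, mainPart, strippedPart] = matchResult

console.log(mainPart)     // 'http://example.com/category/subcat'
console.log(strippedPart) // '/filter/size/1/'

// The extra properties are read like any other object property
console.log(matchResult.index, matchResult.length) // 0 3
```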
Named Capture Groups

We could use named capture groups too: /^(?<main>.*)(?<stripped>\/filter\/.*)$/
This way we can just do:
const matcher = /^(?<main>.*)(?<stripped>\/filter\/.*)$/
function splitURL (url) {
  return url.match(matcher).groups
}
Using that on the example url, match will return basically the same array, but now with a populated groups property, which the function then returns:
[
  'http://example.com/category/subcat/filter/size/1/',
  'http://example.com/category/subcat', // <- named groups are still indexed the same way they were without a name
  '/filter/size/1/',
  index: 0,
  input: 'http://example.com/category/subcat/filter/size/1/',
  groups: [Object: null prototype] { // <- we can return this object and save the picking we did before
    main: 'http://example.com/category/subcat',
    stripped: '/filter/size/1/'
  }
]
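The groups object also plays nicely with object destructuring, which is handy if you want the parts as separate variables. A minimal sketch (the variable names are only for illustration):

```javascript
const namedPattern = /^(?<main>.*)(?<stripped>\/filter\/.*)$/
const namedMatch = 'http://example.com/category/subcat/filter/size/1/'.match(namedPattern)

// Object destructuring on `groups`, renaming the keys while we're at it
const { main: namedMain, stripped: namedStripped } = namedMatch.groups

console.log(namedMain)     // 'http://example.com/category/subcat'
console.log(namedStripped) // '/filter/size/1/'

// Named groups are still available by index as well
console.log(namedMain === namedMatch[1], namedStripped === namedMatch[2]) // true true
```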
A few Notes


The groups object has a prototype of null, so it doesn't have any of the methods a normal object would (e.g. toString or hasOwnProperty). Trying to call one of those will throw a TypeError along the lines of groups.hasOwnProperty is not a function.
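If you do need those methods, two possible workarounds are borrowing them from Object.prototype or copying the entries into a regular object. A short sketch of both, with variable names picked for this example only:

```javascript
const { groups } = 'http://example.com/category/subcat/filter/size/1/'
  .match(/^(?<main>.*)(?<stripped>\/filter\/.*)$/)

// No prototype means no inherited methods
console.log(Object.getPrototypeOf(groups)) // null
console.log(typeof groups.hasOwnProperty)  // 'undefined'

// Option 1: borrow the method from Object.prototype
const ownsMain = Object.prototype.hasOwnProperty.call(groups, 'main')

// Option 2: spread the entries into a regular object, which has the usual prototype
const plainGroups = { ...groups }
const ownsStripped = plainGroups.hasOwnProperty('stripped')

console.log(ownsMain, ownsStripped) // true true
```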


If the keyword isn't static, but you know it by the time you get the url, you can always use the RegExp constructor and a template literal, e.g.


const matcher = new RegExp(`^(?<main>.*)(?<stripped>\/${yourKeywordVariable}\/.*)$`)
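One caveat with building a regex from a variable: if the keyword can contain characters that are special in a regex (like `.` or `+`), they should be escaped first. Below is a rough sketch of how that could look. `escapeRegExp` and `buildMatcher` are hypothetical helpers written for this example, not built-ins:

```javascript
// Hypothetical helper: prefix every regex special character with a backslash
function escapeRegExp (text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: build the matcher for any keyword
function buildMatcher (keyword) {
  return new RegExp(`^(?<main>.*)(?<stripped>/${escapeRegExp(keyword)}/.*)$`)
}

const sizeGroups = 'http://example.com/category/subcat/filter/size/1/'
  .match(buildMatcher('size')).groups
console.log(sizeGroups.main)     // 'http://example.com/category/subcat/filter'
console.log(sizeGroups.stripped) // '/size/1/'

// Without escaping, the `.` in 'v1.0' would also match the `x` in 'v1x0'
console.log(buildMatcher('v1.0').test('http://example.com/v1x0/a')) // false
console.log(buildMatcher('v1.0').test('http://example.com/v1.0/a')) // true
```

Note that forward slashes don't need escaping when the pattern is passed as a string, since nothing terminates the expression there.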

The example regex here works for "best case" scenarios. It will fail when, for example, the url contains the keyword twice, e.g. 'http://example.com/category/subcat/filter/size/1/filter/'. Because .* is greedy, the split happens at the last occurrence of the keyword, so in this case the above functions would return:

{
  main: 'http://example.com/category/subcat/filter/size/1',
  stripped: '/filter/'
}
This could be fixed with lookahead/lookbehind assertions in the regex, but the exact form will depend on the exact use case. It's usually not worth making a catch-all regex unless it's actually needed.
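For the specific double-keyword case, one simple option (a different technique than lookarounds) is to make the first group lazy with `.*?`, so the split happens at the first occurrence of the keyword instead of the last. This is only a sketch and assumes that splitting at the first occurrence is what you want:

```javascript
// `.*?` is lazy: it matches as little as possible before the keyword
const firstKeywordPattern = /^(?<main>.*?)(?<stripped>\/filter\/.*)$/

const lazyGroups = 'http://example.com/category/subcat/filter/size/1/filter/'
  .match(firstKeywordPattern).groups

console.log(lazyGroups.main)     // 'http://example.com/category/subcat'
console.log(lazyGroups.stripped) // '/filter/size/1/filter/'
```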