@jwilferd10
Last active October 25, 2022 01:03
Let's Talk REGEX

Introducing REGEX

A regular expression (regex) is a sequence of characters that defines a search pattern. When included in code or search algorithms, regular expressions can be used to find certain patterns of characters within a string, or to find and replace a character or sequence of characters within a string. They are also frequently used to validate input. Regular expressions can also be used from the command line and in text editors to find text within a file.

Before we dive in, here are some things to note about Regular Expressions:

  • Regular expressions are almost universal
  • The core syntax carries across most languages (Note: Slight variations exist between regex flavors, but they are largely the same)
  • Invaluable for helping computers sort through data quickly
  • May appear intimidating, but thankfully easy to understand.

Let's take a deeper look into how Regular Expressions work!

Summary

This is the regex example that we will be working with:

/^#?([a-f0-9]{6}|[a-f0-9]{3})$/

The above regular expression is for Matching a Hex Value, such as the color codes #ff8800 or #f80. I'll be trying my best to explain how its different parts work by walking through the Regex Components.
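
A quick way to see what the pattern accepts is to run it against a few strings. This is only a minimal sketch: the variable names and sample strings are made up for illustration and are not part of the original example.

```javascript
// The pattern from this section, stored in a variable (the name is ours).
const hexPattern = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;

// Sample inputs chosen for illustration only.
const samples = ["#ff8800", "ff8800", "#f80", "#ff88", "#gg0000", "user@example.com"];

// test() returns true when the string fits the pattern.
const results = samples.map((s) => hexPattern.test(s));

samples.forEach((s, i) => console.log(s, "=>", results[i]));
```

Only the six-character and three-character hex values pass, with or without the leading #.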

Regex Components

Anchors

Something to note about Anchors is that they specify a position in the string where a match must occur. Also note that an Anchor doesn't advance through the string or consume characters. It only checks whether the match sits at the specified position.

Taking that into consideration, here's a look at our Regex example again. Right off the bat we can see a couple of Anchors being used. Can you spot them? (Hint: That would be the ^ and $)

/^#?([a-f0-9]{6}|[a-f0-9]{3})$/

  • ^: The caret anchor matches at the start of the string that the regex pattern is applied to by default. (In multiline mode, it also matches at the beginning of each line.)

  • $: As seen in our example above, the dollar anchor matches at the end of the string. (In multiline mode, it also matches at the end of each line, just before \n.)

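These Anchors are being used in conjunction with Quantifiers. In this example the ^ comes right after the opening / and the $ comes right before the closing / (the slashes only mark where the pattern begins and ends). Because everything else sits in between the ^ and $, the whole string has to fit the pattern from start to finish, not just a part of it!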
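
To see what the Anchors contribute, here is a small sketch that compares our pattern with a copy that has the ^ and $ removed. The unanchored copy and the sample text are assumptions made up for this comparison.

```javascript
// Our example, with both anchors in place.
const anchored = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;

// Hypothetical copy of the same pattern without ^ and $.
const unanchored = /#?([a-f0-9]{6}|[a-f0-9]{3})/;

// Sample text where the hex value is only part of the string.
const text = "color: #ff8800;";

const anchoredOnText = anchored.test(text);       // false: extra text around the value
const unanchoredOnText = unanchored.test(text);   // true: a match anywhere is enough
const foundPiece = unanchored.exec(text)[0];      // "#ff8800"
const anchoredOnValue = anchored.test("#ff8800"); // true: the whole string fits

console.log(anchoredOnText, unanchoredOnText, foundPiece, anchoredOnValue);
```

With the Anchors in place the pattern works as a validator for a whole string, while the unanchored copy behaves more like a search.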

Quantifiers

On the topic of Quantifiers, these specify how many instances of the prior element must be present in the input string for a match to be found. These are what Quantifiers look like: *, +, ?, and {}. Below we'll bring out our example and explain where the Quantifiers are and what they are doing.

/^#?([a-f0-9]{6}|[a-f0-9]{3})$/

At the beginning of the example, the ? matches the previous element zero or one time. The previous element is the #, so the hex value may start with a # or leave it out. The {6} we see in [a-f0-9]{6} matches the previous element, the bracket expression [a-f0-9], exactly 6 times. The same goes for [a-f0-9]{3}, which matches it exactly 3 times.
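
Here is a short sketch of the two Quantifiers from our example working on their own. The smaller patterns below are simplified pieces written for illustration, not patterns from the original example.

```javascript
// ? : the # may appear zero or one time.
const optionalHash = /^#?abc$/;

// {6} : the bracket expression must repeat exactly six times.
const exactlySix = /^[a-f0-9]{6}$/;

const quantifierChecks = {
  noHash: optionalHash.test("abc"),      // true  (zero #)
  oneHash: optionalHash.test("#abc"),    // true  (one #)
  twoHashes: optionalHash.test("##abc"), // false (two is too many)
  sixChars: exactlySix.test("ff8800"),   // true
  fiveChars: exactlySix.test("ff880"),   // false (one short)
  sevenChars: exactlySix.test("ff88000") // false (one too many)
};

console.log(quantifierChecks);
```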

OR Operator

  • [a-f0-9]
  • |
  • [a-f0-9]

Above are the pieces of our example that work like an OR. Only | is the actual OR operator, but a bracket expression behaves the same way for a single character.

For further explanation let's dive in and look at [a-f0-9]! These bracket expressions [] are a set of one or more items, and any one item in the set can match a single character. The a-f0-9 uses the range operator - to accept any character from a through f or from 0 through 9.

The | operator matches a string that has either [a-f0-9]{6} or [a-f0-9]{3}, so six hex characters or three.
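
As a rough sketch of the | at work, the snippet below keeps only the grouped part of our example (plus the Anchors) and tries it on strings of different lengths. The sample strings are made up for illustration.

```javascript
// Just the alternation from our example, anchored so the whole string must fit.
const sixOrThree = /^([a-f0-9]{6}|[a-f0-9]{3})$/;

// Lengths 6, 3, 4 and 9.
const candidates = ["abcdef", "abc", "abcd", "abcdefabc"];

const alternationResults = candidates.map((s) => sixOrThree.test(s));

candidates.forEach((s, i) => console.log(s, "=>", alternationResults[i]));
```

The nine-character string fails, which shows that | picks one side or the other rather than chaining the two sides together.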

Character Classes

Something to note about these: a character class matches any one of a set of characters. Some examples of Character Classes would be:

  • \d: Matches a single character that is a digit
  • \w: Matches a word character (a letter, digit, or underscore)
  • \s: Matches a whitespace character
  • \S: Matches any non-whitespace character
  • .: Matches any character except a line break

Take a look at our example again: /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

These [] can also be considered a Positive Character Group. They specify a list of characters, any one of which can appear in the input string for it to match. It's also worth noting that this list of characters may be specified individually, as a range, or both.
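
The Character Classes listed above can be tried out one character at a time. This is a small sketch with sample characters picked for illustration.

```javascript
const classChecks = {
  digit: /\d/.test("7"),             // true:  7 is a digit
  digitOnLetter: /\d/.test("x"),     // false: x is not a digit
  word: /\w/.test("_"),              // true:  underscore counts as a word character
  whitespace: /\s/.test("\t"),       // true:  a tab is whitespace
  nonWhitespace: /\S/.test(" "),     // false: a space is whitespace
  dot: /./.test("!"),                // true:  . accepts almost anything
  dotOnNewline: /./.test("\n"),      // false: . skips line breaks by default
  hexGroup: /[a-f0-9]/.test("c"),    // true:  c is inside a-f
  hexGroupMiss: /[a-f0-9]/.test("g") // false: g is outside a-f and 0-9
};

console.log(classChecks);
```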

Flags

Regex may have flags that affect the search. While there are no flags in our example, they are still included here as a fundamental concept. A regex literal is a search pattern bounded by two /, and flags go right after the closing /. You can use a flag on its own or combine several. JavaScript has a handful of flags and here are four of them:

  • i: This flag indicates the search is case-insensitive. So something like /HaPpY/i would still match HappY
  • g: This flag indicates the search is looking for all matches (Note: Without it only the first match is returned)
  • m: This flag indicates the search is multiline, so ^ and $ match the start and the end of each line.
  • s: This flag enables 'dotall' mode, which allows the . Character Class to match newline characters such as \n
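
Here is a minimal sketch of each of the four flags, comparing the same pattern with and without the flag. The sample strings are made up, and the last line adds i to our hex example as one possible way to accept uppercase values.

```javascript
// i: case-insensitive
const withI = /HaPpY/i.test("HappY");   // true
const withoutI = /HaPpY/.test("HappY"); // false

// g: all matches instead of only the first
const allDigits = "a1b2c3".match(/\d/g);    // ["1", "2", "3"]
const firstDigit = "a1b2c3".match(/\d/)[0]; // "1"

// m: ^ and $ also match at line boundaries
const lines = "red\n#fff\nblue";
const withM = /^#fff$/m.test(lines);   // true
const withoutM = /^#fff$/.test(lines); // false

// s: the dot also matches a newline
const withS = /a.b/s.test("a\nb");   // true
const withoutS = /a.b/.test("a\nb"); // false

// Our example with the i flag added (an assumption, not in the original).
const upperHex = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i.test("#FF8800"); // true

console.log(withI, withoutI, allDigits, firstDigit, withM, withoutM, withS, withoutS, upperHex);
```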

Bracket Expressions

We touched on Bracket Expressions a few times earlier. Here's a quick run down on what these expressions do again.

  • [fghi]: In this example the expression matches a single character that is either f, g, h, or i. This works the same as f|g|h|i or [f-i]

And if you take a look at our example you'll see that happening in two places: [a-f0-9] and [a-f0-9]. Each one matches a single character from a-f or 0-9. (Only lowercase letters are listed, so uppercase values like #FF8800 need the i flag.)
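
The three spellings mentioned above can be checked against each other. This is just a sketch: the Anchors and the group around the alternation were added so that each pattern tests exactly one character.

```javascript
const listed = /^[fghi]$/;       // characters listed one by one
const ranged = /^[f-i]$/;        // the same set written as a range
const alternated = /^(f|g|h|i)$/; // the same set written with |

// Sample characters, including neighbors of the range and an uppercase F.
const letters = ["e", "f", "g", "h", "i", "j", "F"];

// true only if all three patterns give the same answer for every letter.
const allAgree = letters.every(
  (ch) => listed.test(ch) === ranged.test(ch) && ranged.test(ch) === alternated.test(ch)
);

// The letters that the bracket expression accepts.
const accepted = letters.filter((ch) => listed.test(ch));

console.log(allAgree, accepted);
```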

Greedy and Lazy Match

Here's some context on what Greedy and Lazy mean:

  • Greedy means it will match the longest possible string
  • Lazy means it will match the shortest possible string.

The quantifiers *, +, ? and {} are considered greedy operators, so they will expand the match as far as they can through the provided text. Adding a ? after one of them makes it lazy. In an example such as [a-f0-9]{6}, though, the {6} is an exact count, so there is nothing to expand: it grabs characters from a-f or 0-9 until exactly six are matched.
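
To make the difference visible, here is a small sketch of a greedy quantifier next to its lazy version (a quantifier followed by ?). The sample text and the {3,6} range are assumptions made up for this illustration.

```javascript
const markup = "<b>one</b><b>two</b>";

// Greedy: .+ runs as far as it can, so the match ends at the last </b>.
const greedyMatch = markup.match(/<b>.+<\/b>/)[0];

// Lazy: .+? stops as soon as it can, so the match ends at the first </b>.
const lazyMatch = markup.match(/<b>.+?<\/b>/)[0];

const hexText = "abcdef123";

// An exact count like {6} always takes exactly six characters.
const exactSix = hexText.match(/[a-f0-9]{6}/)[0];

// A range like {3,6} is where greedy and lazy differ.
const greedyRange = hexText.match(/[a-f0-9]{3,6}/)[0];
const lazyRange = hexText.match(/[a-f0-9]{3,6}?/)[0];

console.log(greedyMatch, lazyMatch, exactSix, greedyRange, lazyRange);
```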

Author

That's a look into Regular Expressions and some of their components. This has been a very interesting journey while learning more about search algorithms. Thanks for having a read and I hope this has been informative!

Github: jwilferd10
