@vcristian1
Created August 8, 2022 18:33
Regex Tutorial

Hello!

This is a Regular Expressions (regex) tutorial. It summarizes and breaks down the syntax of one regex pattern in order to understand how that pattern is used to search through text.

Summary

The regular expression dissected in this tutorial is /\d{3}[ -]?\d{3}[ -]?\d{4}/gm, which searches text for 10-digit phone numbers. It is built from three runs of digits written with "\d", a metacharacter that matches any single digit from 0 to 9. The first two runs, \d{3}, each match exactly 3 digits in a row, and the last run, \d{4}, matches exactly 4 digits in a row. Between the runs, "[ -]?" matches an optional single space or dash, because a user may type a phone number with dashes, with spaces, or with no separator at all between the groups of digits. Because the pattern has no anchors, it finds phone numbers anywhere inside a larger text rather than validating that an entire string is a phone number.
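A minimal JavaScript sketch of the pattern at work. The sample text is hypothetical; String.prototype.match returns every match here because the pattern carries the g flag.

```javascript
// The tutorial's phone-number pattern with its g and m flags.
const phoneRegex = /\d{3}[ -]?\d{3}[ -]?\d{4}/gm;

// Hypothetical sample text with numbers typed in three different styles.
const sampleText = "Call 312-555-0199 or 312 555 0123.\nFax: 3125550100";

// With the g flag, match() returns an array of every match (or null if none).
const phoneMatches = sampleText.match(phoneRegex);
console.log(phoneMatches);
// → ["312-555-0199", "312 555 0123", "3125550100"]
```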

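Table of Contents

  • Anchors
  • Quantifiers
  • Grouping Constructs
  • Bracket Expressions
  • Character Classes
  • The OR Operator
  • Flags
  • Character Escapes
  • Author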

Regex Components

Anchors

Anchors are special sequences that match a position (an empty substring) rather than a character. The regular expression example above does not use any anchors, but the two most common anchor atoms are defined below.

  • ^ matches at the beginning of the target string (or of each line when the m flag is set)
  • $ matches at the end of the target string (or of each line when the m flag is set)
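To show what anchors would change, here is a sketch comparing the tutorial's unanchored pattern with a hypothetical anchored variant (the anchored version is an illustration, not part of the original regex).

```javascript
// Unanchored: the pattern may match anywhere inside the string.
const unanchoredPhone = /\d{3}[ -]?\d{3}[ -]?\d{4}/;
// Hypothetical anchored variant: ^ and $ require the whole string to be the number.
const anchoredPhone = /^\d{3}[ -]?\d{3}[ -]?\d{4}$/;

const anchorResults = [
  unanchoredPhone.test("Call 312-555-0199"), // true: a number appears somewhere
  anchoredPhone.test("Call 312-555-0199"),   // false: "Call " precedes the number
  anchoredPhone.test("312-555-0199"),        // true: the string is only the number
  anchoredPhone.test("312-555-01999"),       // false: an extra digit sits before $
];
console.log(anchorResults); // [true, false, true, false]
```

The anchored form suits validating a single form field, while the unanchored form suits searching free text.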

Quantifiers

Quantifiers specify how many times the preceding atom (a character, character class, or group) must occur for a match. The regex example uses two kinds. The {3} and {4} quantifiers require exactly 3 or 4 digits, and the "?" quantifier appears after each [ -] so that a space or dash is matched once or not at all. This covers our bases: a phone number is still found whether the user types it with dashes, with spaces, with no separators, or even with a mix such as 312-555 0199.

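  • * represents 0 or more occurrences of the atom
  • + represents 1 or more occurrences of the atom
  • ? represents 0 or 1 occurrences of the atom
  • {n} represents exactly n occurrences of the atom, as in \d{3}
  • {n,m} represents between n and m occurrences of the atom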
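A small sketch applying each quantifier above to the single character "a". The anchors are added only so that each test must match the whole string.

```javascript
// Each quantifier applied to the single-character atom "a".
// ^ and $ anchor the pattern so the whole string must match.
const quantifierChecks = {
  zeroOrMore: [/^a*$/.test(""), /^a*$/.test("aaa")],          // [true, true]
  oneOrMore: [/^a+$/.test(""), /^a+$/.test("aaa")],           // [false, true]
  zeroOrOne: [/^a?$/.test(""), /^a?$/.test("aa")],            // [true, false]
  exactlyThree: [/^a{3}$/.test("aa"), /^a{3}$/.test("aaa")],  // [false, true]
};
console.log(quantifierChecks);
```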

Grouping Constructs

Grouping constructs in regex are similar to parentheses in math operations. Wrapping part of a pattern in ( ) binds it together so that a quantifier or an OR operator applies to the whole group, and by default the group also captures the text it matched so that it can be read back afterwards. The regex example above does not actually use any grouping constructs: the [ -]? parts after the first and second \d{3} are bracket expressions, not groups. Groups could be added, for example /(\d{3})[ -]?(\d{3})[ -]?(\d{4})/, to capture the three runs of digits separately.
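A sketch of what capture groups add, using a hypothetical grouped variant of the tutorial's pattern. RegExp.prototype.exec returns the full match at index 0 and each captured group at the following indexes.

```javascript
// Hypothetical variant of the tutorial's pattern with three capture groups added.
const groupedPhone = /(\d{3})[ -]?(\d{3})[ -]?(\d{4})/;

// exec() returns an array: index 0 is the full match, 1-3 are the groups.
const groupMatch = groupedPhone.exec("Call 312-555-0199 today");
if (groupMatch !== null) {
  const [fullMatch, areaCode, exchange, lineNumber] = groupMatch;
  console.log(fullMatch, areaCode, exchange, lineNumber); // 312-555-0199 312 555 0199
}

// Grouping also lets a quantifier apply to several characters at once.
const repeatedAb = /^(ab)+$/;
console.log(repeatedAb.test("ababab"), repeatedAb.test("abba")); // true false
```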

Bracket Expressions

A bracket expression [ ] matches any one character listed inside it. In the regex example, [ -] appears after the first and second \d{3}; followed by the ? quantifier, [ -]? looks for an optional single space or dash in the text after each run of three digits. Because the dash is the last character inside the brackets, it is treated as a literal dash rather than as a range like the one in [0-9].
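A sketch of which separators [ -]? accepts. It uses an anchored copy of the pattern (an assumption added so that each sample string must be a complete number); the sample strings are hypothetical.

```javascript
// [ -]? matches at most one space or dash at each separator position.
const separatorRegex = /^\d{3}[ -]?\d{3}[ -]?\d{4}$/;

const separatorInputs = ["312-555-0199", "312 555 0199", "3125550199", "312-555 0199", "312.555.0199"];
const separatorResults = separatorInputs.map((s) => separatorRegex.test(s));
console.log(separatorResults); // [true, true, true, true, false]
```

Note the fourth result: since each [ -]? is checked independently, a mix of a dash and a space is also accepted, while a period is not in the brackets and is rejected.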

Character Classes

A character class is a set of characters that matches any single character from that set. Brackets create a custom class, and shorthand escapes such as \d (equivalent to [0-9]) are predefined classes. The regex example uses both: \d for each digit, and [ -], a class containing a space and a dash, to account for the separator the user may type after each run of 3 digits.
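A sketch comparing the shorthand class \d with its bracket equivalent, plus a negated class. The digit-stripping step at the end is a hypothetical use, not part of the tutorial's regex.

```javascript
// \d is a predefined class; in JavaScript it matches only the ASCII digits 0-9.
const shorthandDigits = /^\d{3}$/;
// The same class written out as a bracket range.
const explicitDigits = /^[0-9]{3}$/;
// A ^ right after [ negates a class: [^0-9] matches any non-digit character.
const nonDigits = /[^0-9]/g;

console.log(shorthandDigits.test("312"), explicitDigits.test("312")); // true true
console.log(shorthandDigits.test("31a"), explicitDigits.test("31a")); // false false

// Hypothetical normalization step: strip everything except digits.
const digitsOnly = "(312) 555-0199".replace(nonDigits, "");
console.log(digitsOnly); // "3125550199"
```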

The OR Operator

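The OR operator " | " (also called alternation) works much like the logical OR in a JavaScript conditional: the pattern matches if either the expression on its left or the expression on its right matches. Each side can be a single character or a longer subexpression, and because | has the lowest precedence in a regex, parentheses are needed to limit its scope. There are no OR operators in the regular expression example above.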

Ex: /t|T/g Search for t or T in the text.
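A sketch running the /t|T/g example on a hypothetical sentence, followed by a case showing why alternation usually needs parentheses.

```javascript
// /t|T/g from the example: every lowercase t or uppercase T.
const tMatches = "The cat sat".match(/t|T/g);
console.log(tMatches); // ["T", "t", "t"]

// | has the lowest precedence, so /^cat|dog$/ means (^cat) OR (dog$).
const loose = /^cat|dog$/;
// Parentheses limit the alternation so the whole string must be cat or dog.
const strict = /^(cat|dog)$/;
console.log(loose.test("catfish"), strict.test("catfish")); // true false
```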

Flags

Flags are optional letters added after the closing slash of a regular expression: g, i, m, u, s, and y. A regex works without any flags; each flag changes how the pattern is applied. The regex example uses two of them:

  • g (global) returns every match in the text; without it, only the first match is returned.
  • m (multiline) makes ^ and $ match at the start and end of each line instead of only at the start and end of the whole string. Because the example has no anchors, m does not change which phone numbers it finds.

The remaining flags are i (case-insensitive matching), s (lets . match newline characters), u (Unicode mode), and y (sticky matching, which must start exactly at the regex's lastIndex).
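A sketch of the g and m flags on a hypothetical two-line string. The m comparison uses an anchored variant of the pattern, since m only affects ^ and $.

```javascript
const twoLines = "312-555-0199\n312 555 0123";

// Without g, match() returns only the first match (with index details attached).
const firstOnly = twoLines.match(/\d{3}[ -]?\d{3}[ -]?\d{4}/);
// With g, match() returns an array of every match.
const everyMatch = twoLines.match(/\d{3}[ -]?\d{3}[ -]?\d{4}/g);

// m only affects ^ and $, so it is shown here on an anchored variant.
const wholeStringOnly = twoLines.match(/^\d{3}[ -]?\d{3}[ -]?\d{4}$/g);
const eachLine = twoLines.match(/^\d{3}[ -]?\d{3}[ -]?\d{4}$/gm);

console.log(firstOnly[0]);    // "312-555-0199"
console.log(everyMatch);      // ["312-555-0199", "312 555 0123"]
console.log(wholeStringOnly); // null
console.log(eachLine);        // ["312-555-0199", "312 555 0123"]
```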

Character Escapes

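Character escapes use a backslash ( \ ) to give a character a special meaning or to take one away. Placing a backslash before certain letters creates a shorthand for a commonly used character set:

  • \d matches a digit and \D a non-digit
  • \w matches a word character (letter, digit, or underscore) and \W a non-word character
  • \s matches a whitespace character and \S a non-whitespace character

Placing a backslash before a special regex character such as . ? * + ( ) [ ] or | makes it match literally; for example, \. matches a period. Leaving out a needed backslash changes what the pattern matches.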

Ex: /\d{3}/g Match three digits in a row.

Ex: /d{3}/g Match the letter "d" three times in a row (ddd).

Here we see that without the backslash, d is not an escape, so the second regex only looks for the literal text ddd.
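A sketch running both examples on a hypothetical string, plus an escaped special character for comparison.

```javascript
const escapeSample = "Room 123, address ddd.example";

// \d is an escape meaning "any digit"; a plain d is just the letter d.
const digitRun = escapeSample.match(/\d{3}/g);
const letterRun = escapeSample.match(/d{3}/g);
console.log(digitRun);  // ["123"]
console.log(letterRun); // ["ddd"]

// Escaping a special character: . matches any character, \. only a period.
const anyChar = /^\d{3}.\d{4}$/;
const literalDot = /^\d{3}\.\d{4}$/;
console.log(anyChar.test("555x0199"), literalDot.test("555x0199")); // true false
console.log(literalDot.test("555.0199"));                          // true
```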

Author

Chicago native who hopes to live somewhere other than Illinois! Aside from building web applications that can make our lives easier, I love to eat great food, spend time with my family, and travel with my fiancée! GitHub
