
@Am0031
Last active August 2, 2022 00:18
regex-tutorial.md

Regex tutorial

This gist explains how to understand the structure of a regex.

What Is a Regex?

A regex, which is short for regular expression, is a sequence of characters that defines a specific search pattern. When included in code or search algorithms, regular expressions can be used to find certain patterns of characters within a string, or to find and replace a character or sequence of characters within a string. They are also frequently used to validate input.

Table of Contents

Regex Components

Anchors

Anchors do not match characters themselves; they match positions in the input, such as its start or its end, and so define which part of the input the rest of the pattern must match.

^ When used as the first character of the regex, this matches the beginning of the input, so the characters that follow it must appear at the very start. For example, ^abc checks that the input starts with "abc".

$ When used as the last character of the regex, this matches the end of the input, so the characters that precede it must appear at the very end. For example, end$ checks that the input ends with "end".
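A minimal JavaScript sketch of both anchors, using RegExp.prototype.test(), which returns true when the pattern matches somewhere in the string. The sample strings are made up for illustration.

```javascript
// ^ ties the pattern to the start of the input, $ ties it to the end.
const startsWithAbc = /^abc/;
const endsWithEnd = /end$/;

console.log(startsWithAbc.test("abcdef")); // true
console.log(startsWithAbc.test("xabc"));   // false: "abc" is not at the start
console.log(endsWithEnd.test("the end"));  // true
console.log(endsWithEnd.test("ending"));   // false: "end" is not at the end

// Using both anchors means the whole input must match the pattern.
console.log(/^abc$/.test("abc"));  // true
console.log(/^abc$/.test("abcd")); // false
```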

Quantifiers

Quantifiers are symbols that define how many times the character (or group) just before them may be repeated in a row.

* The preceding character may appear zero or more times. For example, the regex a* validates both "cat" and "dog": since zero occurrences are allowed, it matches any string.

+ The preceding character must appear one or more times. For example, the regex a+ validates the string "cat" but not the string "dog".

{3} When placed after a character, this means the character must appear exactly 3 times in a row. For example, the regex a{3} validates "baaad" but not "paragraph": that word contains three a's, but they are not next to each other. Because a{3} only needs to be found somewhere in the input, it also validates "baaaad"; put ^ and $ around the pattern to reject extra repetitions.

{2,4} When placed after a character, this means the character must appear between 2 and 4 times in a row (2 and 4 included). For example, the regex a{2,4} validates "baad" and "baaaad", but not "parallel", whose a's are not consecutive.

{2,} When placed after a character, this means the character must appear 2 or more times in a row. For example, the regex a{2,} validates "baad" and "baaaaaad", but not "paragraph", "wagamama" or "mammal".

? The preceding character is optional. For example, the regex plurals? validates both "plural" and "plurals", as the "s" is accepted when present but not required.
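The sketch below runs each quantifier against a sample string with test(). It assumes plain JavaScript (browser or Node.js); the regexes and strings are illustrative, chosen to show that quantifiers count consecutive repetitions.

```javascript
// Each quantifier applies to the character (or group) immediately before it
// and counts consecutive repetitions. Without ^ and $, test() returns true
// as soon as any part of the input matches.
const examples = [
  [/a*/, "dog"],          // true: zero a's is still a match
  [/a+/, "cat"],          // true: at least one "a"
  [/a+/, "dog"],          // false: no "a" at all
  [/a{3}/, "baaad"],      // true: three a's in a row
  [/a{3}/, "paragraph"],  // false: three a's, but not next to each other
  [/^a{2,4}$/, "aaaaa"],  // false: the anchors reject a fifth "a"
  [/a{2,}/, "baaaaah"],   // true: two or more a's in a row
  [/plurals?/, "plural"], // true: the "s" is optional
];

const results = examples.map(([regex, input]) => regex.test(input));
examples.forEach(([regex, input], i) => {
  console.log(`${regex} on "${input}": ${results[i]}`);
});
```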

Grouping Constructs

Grouping constructs define a subexpression within a regex, which is then matched against a substring of the input. They can be used to match a subexpression on its own or to apply a quantifier to a subexpression that contains several characters.

(...) Parentheses define a group. The pattern inside the parentheses is applied to the corresponding part of the input string. Groups can be used to apply different subexpressions to different parts of the input, or to apply a quantifier to several characters at once. For example, when validating an input of the form xxxxx-xxxxxx, a structure such as (...)-(...) (where each ... stands for a subexpression) applies the first group to the part of the input before the "-" and the second group to the part after it. By default, a group also captures the text it matched, so that text can be read back after the match or referred to later in the same pattern with a back-reference such as \1.

Some of the possible structures inside a group are:

(a|b) The substring is validated if it matches either criterion a or criterion b.

(?:x) This is a non-capturing group: the input is compared to the criterion x as usual, but the matched text is not stored. Skipping that record can make the regex slightly faster, and the group does not affect the numbering of the capturing groups.
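A short sketch of capturing groups, alternation and non-capturing groups using String.prototype.match(), which returns the full match at index 0 followed by one entry per capturing group. The "5 digits, dash, 6 digits" input format is an assumption made for the example.

```javascript
// Capturing groups: the text matched by each (...) is stored in the result.
const twoParts = /^(\d{5})-(\d{6})$/;
const pieces = "12345-678901".match(twoParts);
console.log(pieces[1]); // "12345"
console.log(pieces[2]); // "678901"

// Alternation inside a group: "gray" or "grey".
const colour = /^gr(a|e)y$/;
console.log(colour.test("grey")); // true
console.log(colour.test("griy")); // false

// (?:...) groups "ab" so + can repeat it, but does not store it.
const capturing = "ababab".match(/^(ab)+$/);
const nonCapturing = "ababab".match(/^(?:ab)+$/);
console.log(capturing.length);    // 2: the full match plus group 1
console.log(nonCapturing.length); // 1: the full match only
```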

Bracket Expressions

Bracket expressions use square brackets [] to define a set of characters. Each bracket expression matches exactly one character from that set.

[abc] The set contains only these 3 letters: a, b or c.

[a-z] The set contains all lowercase letters from a to z.

[A-Z] The set contains all uppercase letters from A to Z.

[0-9] The set contains all digits from 0 to 9.

[a-z0-9] The set contains the lowercase letters from a to z and the digits from 0 to 9.
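A minimal sketch showing that a bracket expression matches a single character unless a quantifier follows it. The test strings are illustrative.

```javascript
// A bracket expression matches exactly ONE character from its set;
// add a quantifier to match several.
console.log(/^[abc]$/.test("b"));   // true
console.log(/^[abc]$/.test("bc"));  // false: only one character allowed
console.log(/^[abc]+$/.test("bc")); // true: + repeats the set

const lowerOrDigit = /^[a-z0-9]+$/;
console.log(lowerOrDigit.test("abc123"));  // true
console.log(lowerOrDigit.test("abc-123")); // false: "-" is not in the set
console.log(lowerOrDigit.test("ABC"));     // false: [a-z] is lowercase only
```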

Character Classes

Character classes distinguish between the different types of characters that can be used: letters, digits or special characters.

[a-z] This compares the input to the character range indicated, here lowercase letters. Other examples are [A-Z] for uppercase letters and [0-9] for digits.

[^a-c] A ^ right after the opening bracket negates the set: this matches any single character that is not in the range. For example, [^a-c] matches any character except the letters a, b and c.

[.] This matches a literal dot character: inside brackets, . loses its special meaning.

Additional shorthand character classes are:

. This matches any single character except line terminators such as \n and \r (unless the s flag is set).

\d This is equivalent to [0-9] and matches any digit.

\D This is equivalent to [^0-9] and matches any character that is not a digit.

\w This is equivalent to [a-zA-Z0-9_] and matches any alphanumeric character from the Latin alphabet, plus the underscore.

\W This is equivalent to [^a-zA-Z0-9_] and matches any character that is neither an alphanumeric character from the Latin alphabet nor the underscore.

\s This matches a single whitespace character, such as a space, a tab or a line break.

\S This matches a single character that is not whitespace.
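The sketch below applies the shorthand classes and a negated set to a made-up sample string. With the g flag, match() returns an array of every matched substring.

```javascript
const sample = "Order #42 shipped\nto user_7";

const numbers = sample.match(/\d+/g); // \d: digits
const words = sample.match(/\w+/g);   // \w: letters, digits and "_"
console.log(numbers); // [ '42', '7' ]
console.log(words);   // [ 'Order', '42', 'shipped', 'to', 'user_7' ]

// "." matches any character except line terminators, so it stops at "\n".
console.log(/^.+$/.test(sample)); // false

console.log(/\s/.test("no-space"));           // false: no whitespace
console.log("a.b".replace(/[.]/g, "-"));      // a-b: [.] is a literal dot
console.log("abcdef".replace(/[^a-c]/g, "")); // abc: removes everything except a, b, c
```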

Flags

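JavaScript regexes support several flags; the six covered here are i, g, m, s, u and y. (Newer engines also support d and v.)

i This flag makes the search case-insensitive.

g This flag makes the search look for all matches, not just the first one.

m This flag enables multiline mode: ^ and $ match at the start and end of each line instead of only at the start and end of the whole input.

s This flag ("dotAll") allows the dot . to match line terminators such as \n.

u This flag enables Unicode mode, which among other things treats a surrogate pair as a single character.

y This flag ("sticky") makes the search match only at the exact position given by the regex's lastIndex property.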
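A sketch of each of the six flags in action. The inputs are illustrative; the emoji example relies on the fact that characters outside the Basic Multilingual Plane are stored as two UTF-16 code units.

```javascript
console.log(/hello/i.test("HELLO"));      // true: i ignores case
console.log("a1b2c3".match(/\d/g));       // [ '1', '2', '3' ]: g returns every match
console.log("one\ntwo".match(/^\w+$/gm)); // [ 'one', 'two' ]: m applies ^ and $ per line
console.log(/a.b/.test("a\nb"));          // false: "." skips "\n" by default
console.log(/a.b/s.test("a\nb"));         // true: s lets "." match "\n"

// U+1F600 (an emoji) is stored as two UTF-16 code units (a surrogate pair).
const emoji = "\u{1F600}";
console.log(emoji.match(/./)[0].length);  // 1: only half of the pair
console.log(emoji.match(/./u)[0].length); // 2: u treats the pair as one character

// y ("sticky") only matches starting exactly at lastIndex.
const sticky = /foo/y;
console.log(sticky.test("barfoo")); // false: lastIndex is 0, where "b" sits
sticky.lastIndex = 3;
console.log(sticky.test("barfoo")); // true: "foo" starts at index 3
```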

Character Escapes

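Escaping changes how the following character is treated, which is especially useful for characters that already have a special meaning in a regex.

\ A backslash makes the following special character match literally: for example, \. matches a dot and \+ matches a plus sign.

\Q This begins a literal sequence, in which every character is matched literally.

\E This ends a literal sequence. Note that \Q...\E is supported by some regex flavors (such as Perl, PCRE and Java) but not by JavaScript.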
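Since JavaScript has no \Q...\E, a common workaround is to escape the special characters of a string before passing it to the RegExp constructor. The helper escapeRegExp below is a hypothetical name written for this sketch, not a built-in function.

```javascript
// Hypothetical helper (not a built-in): put a backslash before every character
// that has a special meaning in a JavaScript regex, as a stand-in for \Q...\E.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

console.log(/1+1=2/.test("1+1=2"));  // false: "+" acts as a quantifier here
console.log(/1\+1=2/.test("1+1=2")); // true: "\+" matches a literal "+"

const exact = new RegExp("^" + escapeRegExp("1+1=2?") + "$");
console.log(exact.source);         // ^1\+1=2\?$
console.log(exact.test("1+1=2?")); // true
console.log(exact.test("1+1=2"));  // false: the "?" is now required
```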

Four Regexes Explained

Regex syntax

A regex can be created with either of these two syntaxes:

  • the "long" syntax, using the RegExp constructor: regex = new RegExp("pattern", "flags");
  • the "short" syntax, using a regex literal: regex = /pattern/; or regex = /pattern/flags;

The short syntax suits a regex that is static, while the long syntax is useful when part of the pattern has to be built at runtime, for example with a template literal (${} in template strings). Because the constructor takes a string, backslashes must be doubled: new RegExp("\\d+") is equivalent to /\d+/.
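A sketch comparing the two syntaxes. The variable digitCount is a hypothetical value used to show a pattern built at runtime from a template literal.

```javascript
// The same pattern written both ways: exactly three digits.
const literal = /^\d{3}$/;
// In a string, "\\d" is needed to put a single backslash before the d.
const constructed = new RegExp("^\\d{3}$");

// The constructor can build a pattern at runtime, e.g. from a template literal.
const digitCount = 3; // hypothetical value
const dynamic = new RegExp(`^\\d{${digitCount}}$`);

// Flags go in the second argument: this is the same as /abc/i.
const caseless = new RegExp("abc", "i");

console.log(literal.source === constructed.source); // true
console.log(dynamic.source === literal.source);     // true
console.log(dynamic.test("123"));  // true
console.log(dynamic.test("1234")); // false
console.log(caseless.test("ABC")); // true
```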

The examples below use the short syntax, so they are enclosed in /.../. Each one is a possible way to write the regex, not the only one.

Matching an email

A regex that checks whether an input matches an email address is: /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

Below is a diagram explaining each part of the structure: [diagram image]
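As the diagram is not reproduced here, the sketch below runs the same email regex on a few made-up addresses and prints what each capturing group holds. Note that, without the i flag, uppercase letters are rejected.

```javascript
const email = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Group 1: user name, group 2: domain name, group 3: ending after the dot.
const emailParts = "jane.doe@example.com".match(email);
console.log(emailParts[1]); // "jane.doe"
console.log(emailParts[2]); // "example"
console.log(emailParts[3]); // "com"

console.log(email.test("jane@example.com")); // true
console.log(email.test("Jane@Example.com")); // false: uppercase letters are not in the sets
console.log(email.test("jane@example"));     // false: no dot followed by an ending
```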

Matching a URL

A regex that checks whether an input matches a URL is: /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Below is a diagram explaining each part of the structure: [diagram image]
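The sketch below applies the same URL regex to a few sample inputs and prints its capturing groups. One caution, offered as general regex advice rather than something from the original: the nested quantifier ([\/\w \.-]*)* matches the same strings as ([\/\w \.-]*) but can make long inputs that fail to match very slow to reject (catastrophic backtracking).

```javascript
const url = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

const urlMatch = "https://www.example.com/docs/index.html".match(url);
console.log(urlMatch[1]); // "https://": group 1, the optional scheme
console.log(urlMatch[2]); // "www.example": group 2, the domain name
console.log(urlMatch[3]); // "com": group 3, the ending

console.log(url.test("example.com"));       // true: the scheme is optional
console.log("example.com".match(url)[1]);   // undefined: group 1 did not take part
console.log(url.test("ftp://example.com")); // false: only http and https are accepted
```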

Matching an HTML tag

A regex that checks whether an input matches an HTML tag is: /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/

Below is a diagram explaining each part of the structure: [diagram image]
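A sketch applying the HTML tag regex to a few sample tags. Group 1 captures the tag name, and the back-reference \1 requires the closing tag to repeat that same name. The samples are kept short because the nested quantifier ([^<]+)* can backtrack heavily on long inputs that do not match.

```javascript
const htmlTag = /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/;

const linkMatch = '<a href="#top">top</a>'.match(htmlTag);
console.log(linkMatch[1]); // "a": group 1, the tag name
console.log(linkMatch[3]); // "top": group 3, the content between the tags

console.log(htmlTag.test("<br />"));      // true: self-closing tag
console.log(htmlTag.test("<b>bold</i>")); // false: \1 requires the closing tag to be </b>
console.log(htmlTag.test("<B>bold</B>")); // false: only lowercase tag names are accepted
```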

Matching a Hex value

A regex that checks whether an input matches a hex colour value is: /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

Below is a diagram explaining each part of the structure: [diagram image]
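A minimal sketch checking some sample colour values against the hex regex. Adding the i flag, as in the last line, is a suggested variation, not part of the original pattern.

```javascript
const hexColour = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;

console.log(hexColour.test("#a3c113")); // true: six hex digits
console.log(hexColour.test("fff"));     // true: "#" is optional, three digits
console.log(hexColour.test("#abcd"));   // false: four digits is neither 3 nor 6
console.log(hexColour.test("#ABCDEF")); // false: no i flag, so uppercase is rejected

// Suggested variation: the same pattern with the i flag accepts uppercase too.
console.log(/^#?([a-f0-9]{6}|[a-f0-9]{3})$/i.test("#ABCDEF")); // true
```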

More on Regex

For more information and practice on how regexes are built, here are a few useful links:

  • regex cheat sheet
  • regex explained by MDN
  • try to build regex with this website's interactive activities: regexone

Contact me

You can see more of my work on my GitHub page. If you have any questions, you can contact me by email using the link below.
