@amymgardiner
Last active September 12, 2022 03:36
Module 17 Challenge - Computer Science for JavaScript

Regex Tutorial - Matching an Email

This tutorial explains what each part of a specific regex (short for regular expression) does. It is a challenge project for a bootcamp course I am taking, and the specific regex was given as part of that challenge.

Summary

A regex is a sequence of characters (literal and meta) that defines a search pattern, which developers can use to find or validate text. In this tutorial, I will break down each part of the expression below and describe what it does.

In simple terms, this complicated-looking expression looks for the five parts of an email address: the username, an @ symbol, the email server name, a dot, and the domain.

Matching an Email Regex:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
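
To make the summary concrete, here is a minimal sketch that runs the regex against a few strings with JavaScript's `test` method. The sample addresses and the variable names (`emailRegex`, `samples`, `results`) are made up for illustration and are not part of the original challenge.

```javascript
// The tutorial's regex, written with every dot escaped.
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical sample inputs, chosen to show passing and failing cases.
const samples = [
  "amy@example.com",             // plain lowercase address
  "first.last@mail.example.org", // dots in the username and server name
  "Amy@example.com",             // uppercase letter is not in [a-z]
  "amy@example",                 // no dot followed by a domain
  "amy example.com",             // no @ symbol
];

// test() returns true when the whole string fits the pattern.
const results = samples.map((sample) => emailRegex.test(sample));

console.log(results); // [ true, true, false, false, false ]
```

The third sample fails only because the regex has no flags and lists lowercase letters, so it is case-sensitive as written.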

Regex Components

Anchors

Anchors have special meaning in regular expressions. They do not match any character. Instead, they match a position before or after characters.

In this regex, we see the following:
^ – The caret anchor matches the beginning of the text.
$ – The dollar anchor matches the end of the text.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
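
One way to see what the anchors contribute is to compare the regex with a copy that has `^` and `$` removed. This is only a sketch: the `unanchored` variant and the sample sentence are assumptions added for illustration.

```javascript
// The tutorial's regex, anchored at both ends.
const anchored = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical variant with the anchors removed, for comparison only.
const unanchored = /([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/;

// Made-up text that contains an address surrounded by other characters.
const text = "contact: amy@example.com, thanks";

// With anchors, the entire string must be an email address.
console.log(anchored.test(text)); // false

// Without anchors, the pattern may match anywhere inside the string.
const found = unanchored.exec(text);
console.log(found[0]); // "amy@example.com"
```

The anchored form suits validating a single input value, while the unanchored form behaves more like a search.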

Quantifiers

Quantifiers specify how many times the preceding character or expression must match.

In this regex, we see the following:
+ – Matches the preceding item 1 or more times.
{2,6} – Matches at least 2 and at most 6 occurrences of the preceding item.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
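
The two quantifiers can be tried in isolation. The small patterns below (`oneOrMore`, `twoToSix`) are simplified stand-ins invented for this sketch, not pieces copied verbatim from the regex.

```javascript
// "+" requires at least one character from the class.
const oneOrMore = /^[a-z]+$/;
console.log(oneOrMore.test(""));    // false (zero characters)
console.log(oneOrMore.test("a"));   // true
console.log(oneOrMore.test("abc")); // true

// "{2,6}" requires between 2 and 6 characters from the class.
const twoToSix = /^[a-z]{2,6}$/;
console.log(twoToSix.test("c"));       // false (1 character)
console.log(twoToSix.test("co"));      // true  (2 characters)
console.log(twoToSix.test("museum"));  // true  (6 characters)
console.log(twoToSix.test("company")); // false (7 characters)
```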

Character Classes

Character classes distinguish between kinds of characters, such as letters and digits.

In this regex, we see the following:
\d – Matches a single character that is a digit between 0 and 9.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
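
As a quick illustration, `\d` can be checked on its own and inside a bracket expression like the one used for the email server name. The pattern names and sample strings here are assumptions made for the sketch.

```javascript
// \d on its own matches exactly one digit.
const singleDigit = /^\d$/;
console.log(singleDigit.test("7")); // true
console.log(singleDigit.test("a")); // false

// In JavaScript, \d matches the same characters as [0-9].
console.log(/^[0-9]$/.test("7")); // true

// Inside brackets, \d combines with the other allowed characters,
// as in the email server name part of the tutorial's regex.
const serverChars = /^[\da-z\.-]+$/;
console.log(serverChars.test("mail2go"));  // true
console.log(serverChars.test("mail_2go")); // false (underscore is not allowed here)
```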

Grouping and Capturing

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses.

That has two effects:
It lets us get a part of the match as a separate item in the result array.
If we put a quantifier after the parentheses, it applies to the parentheses as a whole.

In this regex, we see the following groups:
([a-z0-9_\.-]+) – Username
([\da-z\.-]+) – Email Server Name
([a-z\.]{2,6}) – Domain
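
Both effects of grouping can be sketched with JavaScript's `exec` method. The sample address and the variable names (`match`, `username`, `server`, `domain`) are hypothetical and only serve the illustration.

```javascript
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// exec() returns an array: index 0 is the full match,
// and indexes 1 to 3 hold the three capturing groups.
const match = emailRegex.exec("amy.gardiner@mail.example.com");

// Skip index 0 and name the three groups.
const [, username, server, domain] = match;

console.log(username); // "amy.gardiner"
console.log(server);   // "mail.example"
console.log(domain);   // "com"

// Second effect: a quantifier after parentheses repeats the whole group.
console.log(/^(ab)+$/.test("abab")); // true  ("ab" repeated twice)
console.log(/^ab+$/.test("abab"));   // false (only "b" is repeated)
```

If the string does not match, `exec` returns `null`, so real code would check for that before destructuring.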

Bracket Expressions

A bracket expression (an expression enclosed in square brackets) matches any single character in the set.

In this regex, we see the following:
[a-z0-9_\.-] – Matches any one character that is in the range "a-z", in the range "0-9", a "_" character, a "." character, or a "-" character. Outside a bracket expression, a dot matches any single character, which is why a backslash is placed before a dot to read it as a literal dot. Inside brackets the dot is already literal, so the backslash here is optional but harmless. The "-" is read as a literal hyphen because it comes last in the brackets.
[\da-z\.-] – Matches any one digit character (as seen previously), a character in the "a-z" range, a "." character, or a "-" character.
[a-z\.] – Matches any one character in the "a-z" range or a "." character.
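
The behavior of a bracket expression can be checked one character at a time. This is a rough sketch: `usernameChar` wraps the first bracket expression in anchors so that exactly one character is tested, which is an assumption made for the demonstration.

```javascript
// Exactly one character from the username bracket expression.
const usernameChar = /^[a-z0-9_\.-]$/;

console.log(usernameChar.test("a")); // true  (in a-z)
console.log(usernameChar.test("5")); // true  (in 0-9)
console.log(usernameChar.test("_")); // true
console.log(usernameChar.test(".")); // true
console.log(usernameChar.test("-")); // true
console.log(usernameChar.test("A")); // false (uppercase is not listed)
console.log(usernameChar.test("!")); // false

// Inside brackets a dot is a literal dot, escaped or not.
console.log(/^[.]$/.test("x"));  // false
console.log(/^[\.]$/.test("x")); // false

// Outside brackets an unescaped dot matches any single character.
console.log(/^.$/.test("x")); // true
```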

Greedy and Lazy Match

The quantifiers "*", "+", and "{}" are greedy by default, so they expand the match as far as they can through the provided text. Lazy mode is the opposite of greedy mode: the quantifier repeats the minimal number of times. It is enabled by putting a question mark after the quantifier.

In this regex:
The + quantifier matches one or more characters from the bracket expression it follows. So it can match a username or email server name of any length, made up of letters (lowercase only in this regex), digits, and the other allowed characters ("_", ".", or "-" in the username; "." or "-" in the server name) used any number of times.

The {2,6} quantifier, in this regex, matches letters or the "." character 2 to 6 times in a row. For example, "com" uses a character in the "a-z" range three times. Most domains are between two and six characters long, which is why that is the quantifier.
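
The difference between greedy and lazy matching shows up when the same pattern is run with and without the trailing question mark. The lazy variants below do not appear in the tutorial's regex; they are added here only for comparison, and the sample text is made up.

```javascript
const text = "example.com";

// Greedy "+" takes as many characters as it can.
const greedyPlus = text.match(/[a-z\.]+/)[0];
console.log(greedyPlus); // "example.com"

// Lazy "+?" stops as soon as the minimum (one character) is met.
const lazyPlus = text.match(/[a-z\.]+?/)[0];
console.log(lazyPlus); // "e"

// Greedy "{2,6}" takes up to six characters.
const greedyRange = text.match(/[a-z\.]{2,6}/)[0];
console.log(greedyRange); // "exampl"

// Lazy "{2,6}?" stops at the minimum of two characters.
const lazyRange = text.match(/[a-z\.]{2,6}?/)[0];
console.log(lazyRange); // "ex"
```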

Extra Explanations

-The forward slash is used twice: the opening slash indicates the start of the regular expression, and the closing slash indicates its end and the start of any expression flags.
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

-The literal @ symbol in emails is shown here between the username grouping and the email server name grouping, and the dot before the domain is shown here as an escaped character.
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
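
Both points can be sketched in code. The `unescapedDot` variant shows what would happen if the dot before the domain were not escaped, and the `caseInsensitive` variant adds an `i` flag after the closing slash. Neither variant is part of the original regex, and the sample strings are made up.

```javascript
// The tutorial's regex: the dot before the domain is escaped.
const escapedDot = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical variant where that dot is left unescaped.
const unescapedDot = /^([a-z0-9_\.-]+)@([\da-z\.-]+).([a-z\.]{2,6})$/;

// This string has no dot at all after the @ symbol.
console.log(escapedDot.test("amy@examplexcom"));   // false
console.log(unescapedDot.test("amy@examplexcom")); // true (the bare dot matched a letter)

// Hypothetical variant with the "i" flag placed after the closing slash.
const caseInsensitive = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/i;
console.log(escapedDot.test("Amy@Example.COM"));      // false
console.log(caseInsensitive.test("Amy@Example.COM")); // true
```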

Author

Amy Gardiner is currently a student enrolled in Washington University's Coding Boot Camp. She is the manager for a local coffee company in St. Louis, MO and is interested in learning a new skill set and starting down a new career path with coding.
