@BeKind-Rewind
Last active September 21, 2022 23:35
Matching Email Regex Tutorial

Regex Tutorial for Matching Email

In this tutorial, you will learn the regular expression syntax used to match or extract email addresses from text. Regular expressions are a way to describe patterns in string data.

Summary

The regular expression, or "regex", for matching emails can take on various forms. It includes metacharacters that guide the matching process. The expression below covers the vast majority of email formats. Regular expressions (regex or regexp) are useful for extracting information from text by searching for one or more matches of a specific pattern (that is, a specific sequence of ASCII or Unicode characters). Matching an Email:

`/^([A-Za-z0-9_\.-]+)@([\dA-Za-z\.-]+)\.([A-Za-z\.]{2,6})(\.[A-Za-z]{2,6})?$/` 

First, looking at the code provided, we'll break down the bits and discuss the responsibility of each.

The slashes /.../ tell JavaScript that we are writing a regular expression. The regex begins with the starting anchor ^, followed by 4 segments, each wrapped in (). The first three segments are separated by the literal characters @ and . (written \. in the pattern), while the 4th segment begins with its own literal dot. The 4th () segment is followed by ?, which makes it optional. Finally, the expression ends with the $ anchor.
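As a quick check of the structure described above, the pattern can be run against a few strings. This is a minimal sketch: the sample addresses and the names `emailPattern`, `samples`, and `results` are made up for illustration and do not come from the original tutorial.

```javascript
// The pattern from this tutorial, stored as a regex literal.
const emailPattern = /^([A-Za-z0-9_\.-]+)@([\dA-Za-z\.-]+)\.([A-Za-z\.]{2,6})(\.[A-Za-z]{2,6})?$/;

// Hypothetical inputs, chosen only for illustration.
const samples = [
  "jane.doe@example.com", // ordinary address
  "jane@example.co.uk",   // two-part suffix
  "not-an-email",         // no @ at all
  "jane@example",         // no dot after the domain
];

// RegExp.prototype.test returns true when the pattern matches the string.
const results = samples.map((text) => emailPattern.test(text));
console.log(results); // [ true, true, false, false ]
```

Because the pattern is anchored with ^ and $, the whole string has to look like an email, not just part of it.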

An easier way to visualize this is to start with an email, for example:

  • -----1---- @ ----2---- . ----3---- --------------------4--------------------
  • (yourname) @ (domain) . (suffix) (.optional suffix that starts with a dot)

Where we are expecting each section to contain:

  1. any letter (upper or lower case), digit, underscore, period, or hyphen -- one or more characters
  2. any letter (upper or lower case), digit, period, or hyphen -- one or more characters
  3. any letter (upper or lower case) or period -- 2 to 6 characters
  4. a dot followed by 2 to 6 letters -- the whole segment is optional
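To see which text lands in each of the four segments, the capture groups can be read from the array returned by `String.prototype.match`. The sketch below uses a made-up address, and the variable names are hypothetical.

```javascript
const emailRegex = /^([A-Za-z0-9_\.-]+)@([\dA-Za-z\.-]+)\.([A-Za-z\.]{2,6})(\.[A-Za-z]{2,6})?$/;

// Index 0 holds the whole match; indexes 1 to 4 hold the four () segments.
const [whole, name, domain, suffix, optionalSuffix] =
  "jane@example.co.uk".match(emailRegex);

console.log(name);           // "jane"
console.log(domain);         // "example.co"
console.log(suffix);         // "uk"
console.log(optionalSuffix); // undefined

// When nothing matches, match() returns null instead of an array.
const noMatch = "no-at-sign.example".match(emailRegex);
console.log(noMatch); // null
```

In this run the optional 4th segment stays undefined: segment 2 is greedy and also accepts periods, so it absorbs "example.co" and leaves only "uk" for segment 3.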

Writing out the expectations of each segment helps you understand what the code is doing here and how you can customize it to fit your needs in the future.


Regex Components

Anchors

Anchors match a position before or after characters rather than any characters themselves.

  • ^ - matches at the beginning of the string (or of each line when the m flag is set)
  • $ - matches at the end of the string (or of each line when the m flag is set)
  • \b - matches at a word boundary, before the first letter or after the last letter of a word (see Boundaries below)
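A short sketch of how ^ and $ change what counts as a match. The word "cat" and the test strings are arbitrary examples, not part of the email pattern.

```javascript
// Without anchors, the pattern may match anywhere inside the text.
const unanchored = /cat/.test("concatenate"); // true

// ^ pins the match to the start, $ pins it to the end.
const startOnly = /^cat/.test("catalog");     // true
const endOnly = /cat$/.test("bobcat");        // true

// With both anchors, the entire string has to be "cat".
const bothFail = /^cat$/.test("concatenate"); // false
const bothPass = /^cat$/.test("cat");         // true

console.log(unanchored, startOnly, endOnly, bothFail, bothPass);
```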

Quantifiers

Quantifiers specify how many times in a row the preceding character, class, or group must appear to match. Some basic quantifiers are:

  • * - "0 or more"
  • + - "1 or more"
  • ? - "0 or 1 optionally included"
  • {min,max} - "at least min and at most max"

The simplest quantifier is a number in curly braces: {n}. This indicates how many of a character or group () we want in succession to match. For example: 12345, 23415, and 11211 can all be matched by using /\d{5}/, which represents "any digit 0-9, 5 in a row".
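The quantifiers listed above can be compared side by side. This is only an illustrative sketch; the patterns and strings are invented for the comparison, and each pattern is anchored so the whole string is tested.

```javascript
// * allows zero or more of the preceding item.
const starZero = /^ab*$/.test("a");       // true ("b" appears 0 times)
const starMany = /^ab*$/.test("abbb");    // true

// + requires at least one.
const plusZero = /^ab+$/.test("a");       // false
const plusMany = /^ab+$/.test("abbb");    // true

// ? allows zero or one.
const optNone = /^colou?r$/.test("color");   // true
const optOne = /^colou?r$/.test("colour");   // true
const optTwo = /^colou?r$/.test("colouur");  // false

// {n} and {min,max} give exact counts and ranges.
const exactFive = /^\d{5}$/.test("12345");   // true
const tooShort = /^\d{5}$/.test("1234");     // false
const inRange = /^\d{2,4}$/.test("123");     // true
const outOfRange = /^\d{2,4}$/.test("12345"); // false

console.log(starZero, plusZero, optTwo, exactFive, inRange);
```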

OR Operator

| or []

Here, the contents of the brackets [] act as a chain of OR operators. For example: [abc] matches "a or b or c". In this case, section 1 (yourname), ([A-Za-z0-9_\.-]+), looks for any uppercase character A-Z OR any lowercase character a-z OR any digit 0-9 OR underscore _ OR period \. OR hyphen -.

Alternatively, instead of listing individual characters or ranges of characters, we can use () with | to look for specific groupings. For example, in the 3rd section (suffix) we could choose (net|com) to search for "net" OR "com" and NOT "n or e or t".
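The difference between a bracketed set and a grouped alternation can be sketched as follows. The patterns are simplified stand-ins, not pieces of the full email regex.

```javascript
// [net] matches ONE character that is n, e, or t.
const classSingle = /^[net]$/.test("n");      // true
const classWord = /^[net]$/.test("net");      // false (three characters)

// (net|com) matches the whole sequence "net" or the whole sequence "com".
const altNet = /^(net|com)$/.test("net");     // true
const altCom = /^(net|com)$/.test("com");     // true
const altScrambled = /^(net|com)$/.test("ten"); // false (right letters, wrong order)

console.log(classSingle, classWord, altNet, altCom, altScrambled);
```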

Character Classes

Anything within the brackets [] is considered a character class, which matches any one of the enclosed characters. On its own, a class matches exactly one character; add a quantifier such as + to match one or more. For example: [abc] would include in the matching search "a or b or c." Outside the brackets, the metacharacter . refers to "any character" (except a line break) and must be escaped as \. to mean the "literal character '.'". Inside the brackets, . is already literal, so the \. in this tutorial's pattern behaves the same as a plain . there.
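A small sketch of the two points above: a class matches one character unless it is quantified, and the period behaves differently inside and outside brackets. All strings here are made-up examples.

```javascript
// A class alone matches exactly one character.
const oneChar = /^[abc]$/.test("b");     // true
const twoChars = /^[abc]$/.test("ab");   // false
const quantified = /^[abc]+$/.test("cab"); // true

// Outside brackets, an unescaped . matches any character.
const dotAny = /^a.c$/.test("abc");      // true
// Escaped, it only matches a literal period.
const dotEscaped = /^a\.c$/.test("abc"); // false
const dotLiteral = /^a\.c$/.test("a.c"); // true

// Inside brackets, . is literal even without the backslash.
const classDotLetter = /^a[.]c$/.test("abc"); // false
const classDotPeriod = /^a[.]c$/.test("a.c"); // true

console.log(oneChar, twoChars, quantified, dotAny, dotEscaped, classDotPeriod);
```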

Flags

The slashes /.../ tell JavaScript we are creating a regular expression. They act in the same manner as quotes for strings.

In any regular expression, we can use the following flags:

  • g: global; finds every match instead of stopping after the first
  • i: makes the regex case insensitive
  • m: enables multi-line mode, where ^ and $ match the start and end of each line. Without this, they match only the start and end of the entire string.
  • u: enables full Unicode support
  • s: short for single line (also called dotAll); it causes the . to also match newline characters

Flags may also be combined in a single regular expression, and the flag order doesn’t matter. In regex literals, they are added after the closing slash.
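The effect of each flag can be observed on a small multi-line string. This is a sketch with invented sample text; `String.prototype.match` is used instead of `test` so the matched pieces are visible.

```javascript
const text = "Cat\ncat\nCAT";

// g: every match instead of just the first. Only the lowercase line matches here.
const globalOnly = text.match(/cat/g);        // ["cat"]

// g + i: all matches, ignoring case.
const globalInsensitive = text.match(/cat/gi); // ["Cat", "cat", "CAT"]

// Without m, ^ and $ refer to the whole string, so no single line matches.
const wholeString = text.match(/^cat$/gi);     // null

// With m, ^ and $ match at the start and end of each line.
const perLine = text.match(/^cat$/gim);        // ["Cat", "cat", "CAT"]

// s: lets . match a newline character.
const dotNoFlag = /a.b/.test("a\nb");          // false
const dotAll = /a.b/s.test("a\nb");            // true

// u: treats a character outside the basic plane as one unit.
const emoji = "\u{1F600}";
const withoutU = /^.$/.test(emoji);            // false (two UTF-16 code units)
const withU = /^.$/u.test(emoji);              // true

console.log(globalOnly, globalInsensitive, wholeString, perLine);
```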

Bracket Expressions

A bracket expression is the [] syntax that defines a character class. It matches any single character listed inside. For example: [abc] would include in the matching search "a or b or c." Inside the brackets, most metacharacters are treated as literals, so . there means the literal character ".". Outside the brackets, the metacharacter . would refer to "any character."

You can specify a range of characters by using a hyphen [a-z] meaning "all lowercase chars between and including a and z", but if the hyphen appears as the first or last character enclosed in the square brackets, it is taken as a literal hyphen to be included in the character class as a normal character.

Additionally, placing the metacharacter ^ as the first character inside the brackets negates the class, so it searches for everything ELSE. For example: [^abcdefg] would match each of the chars in "nuts", starting with the first one, "n", by default. To match all of them rather than only the first, use the match-all setting in your code editor or the g flag. With "donuts", however, it would skip "d" and match "o" first, then each of "o", "n", "u", "t", and "s" if flagged for all.
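Ranges, literal hyphens, and negation inside brackets can be tried out as below. This sketch assumes a negated class written as [^abcdefg] and uses short sample strings; the variable names are hypothetical.

```javascript
// A negated class matches any character NOT listed after the ^.
const firstInNuts = "nuts".match(/[^abcdefg]/)[0];     // "n"
const firstInDonuts = "donuts".match(/[^abcdefg]/)[0]; // "o" ("d" is excluded)

// With the g flag, every non-excluded character is returned.
const allInDonuts = "donuts".match(/[^abcdefg]/g);     // ["o", "n", "u", "t", "s"]

// A hyphen between two characters forms a range...
const asRange = "a-z".match(/[a-z]/g);                 // ["a", "z"]
// ...but a hyphen placed last in the brackets is a literal hyphen.
const asLiteral = "a-z".match(/[az-]/g);               // ["a", "-", "z"]

console.log(firstInNuts, firstInDonuts, allInDonuts, asRange, asLiteral);
```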

Boundaries

\b matches a position that is called a “word boundary”. The word boundary match is zero-length, so it consumes no characters.

Setting boundaries can be helpful in finding whole words or groupings that could otherwise exist within a larger string of characters.

A word boundary occurs at 3 positions, where a word character is a letter, digit, or underscore:

  1. Before the first character in a string if the first character is a word character.
  2. After the last character in a string if the last character is a word character.
  3. Between two characters in a string if one is a word character and the other is not.
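The three positions above can be made visible by inserting a marker at every boundary, and the whole-word use case can be compared with a plain search. The sample sentences are invented for this sketch.

```javascript
// Replacing every zero-length \b match with "|" shows where boundaries fall.
const marked = "hi there".replace(/\b/g, "|");
console.log(marked); // "|hi| |there|"

const sentence = "cat, bobcat, catalog";

// Without boundaries, "cat" is found inside the longer words too.
const anywhere = sentence.match(/cat/g);
console.log(anywhere.length); // 3

// With \b on both sides, only the standalone word matches.
const wholeWord = sentence.match(/\bcat\b/g);
console.log(wholeWord); // ["cat"]

const insideWord = /\bcat\b/.test("concatenate");
console.log(insideWord); // false
```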

Author

Amanda Perry, Full-Stack Developer, making Regex tutorials. Contact me with questions and comments: challenge641@gmail.com GitHub: BeKind-Rewind
