@chris6661
Last active April 21, 2021 02:50

Email Validation Regex Tutorial

Summary

The following is a tutorial on validating an email address with a regular expression (regex) in JavaScript. We will first explain what a regex is and then break down a sample email-validation regex section by section. The breakdown is neither exhaustive nor all-inclusive: it covers one common pattern, not every valid email address. But first, what is a regex?

Regex is short for regular expression (also abbreviated regexp, and sometimes called a rational expression). A regex is a sequence of characters that defines a pattern used to match, locate, and manage text; in this tutorial, the pattern is used to match an email address.

Table of Contents

  • Anchors
  • Quantifiers
  • OR Operator
  • Character Classes
  • Flags
  • Grouping and Capturing
  • Bracket Expressions
  • Boundaries
  • Back-references
  • Author

Regex Components

Email regex components are as follows: /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.([a-zA-Z.]{2,6})$/. What does each section mean?

    • ([a-zA-Z0-9_.-]+) (NAME) verifies that each character before the @ symbol is an uppercase or lowercase letter, a digit from 0-9, or one of the special characters shown in the brackets (underscore, period, or hyphen). The '+' after the closing bracket means the section must contain one or more of these characters.
    • @([\da-zA-Z.-]+) (DOMAIN) does the same for the domain name after the @ symbol, allowing digits, letters, periods, and hyphens. It checks only that the domain is formatted correctly, not that it actually exists.
    • \.([a-zA-Z.]{2,6}) (EXTENSION) validates the extension that follows the escaped period (\.), usually com or org. The extension must be at least two characters, which allows two-letter endings such as 'co' and country codes such as 'us' for the USA, 'ca' for Canada, and 'de' for Germany, and at most six. Because the brackets also include a period, an address can add a second extension after the first, as in 'co.uk'.
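As a minimal sketch of how this pattern might be used in JavaScript, the snippet below wraps it in a helper. It assumes the period between the DOMAIN and EXTENSION groups is escaped as \., because an unescaped . matches any character. The helper name isValidEmail and the LOOSE_REGEX comparison are illustrative only, not part of any library.

```javascript
// Tutorial pattern with the period before EXTENSION escaped as \.
const EMAIL_REGEX = /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.([a-zA-Z.]{2,6})$/;

// Same pattern with an unescaped period, where . matches ANY character.
const LOOSE_REGEX = /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+).([a-zA-Z.]{2,6})$/;

// Hypothetical helper: true only when the whole string fits the pattern.
function isValidEmail(address) {
  return EMAIL_REGEX.test(address);
}

console.log(isValidEmail("chris@email.com"));    // true
console.log(isValidEmail("chris@email"));        // false: no period before an EXTENSION
console.log(isValidEmail("chris@emailcom"));     // false: \. requires a real period
console.log(LOOSE_REGEX.test("chris@emailcom")); // true: the unescaped . matched the "c"
```

The last line is why the backslash matters: without it, an address with no period at all can slip through.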

Anchors

The anchors in this email validation regex are the ^ and the $ at the beginning and end of the pattern:

  • /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.([a-zA-Z.]{2,6})$/

The ^ anchor requires the match to start at the first character of the string, so nothing (not even a space) can come before the NAME section. Together with the + quantifier on NAME, which requires at least one character before the @ symbol, this rejects input such as '@(DOMAIN).com' that is missing the NAME section.

The $ anchor requires the match to finish at the last character of the string, so the string must end with the EXTENSION group that the dollar sign follows. Any characters after the extension cause the match to fail.

Examples:

  1. Not matched: @yahoo.com; there is no NAME.

  2. Not matched: chris@email.c; the extension has fewer than two letters, the minimum the regex requires.

  3. Matched: chris@email.co.uk; this email address has a name, a domain name, an extension, and an acceptable country code extension.
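The three examples can be checked directly in JavaScript. This sketch assumes the escaped-period version of the regex and compares it with a hypothetical copy that drops the ^ and $ anchors, to show what the anchors add.

```javascript
// Tutorial pattern (period before EXTENSION escaped as \.), anchored with ^ and $.
const anchored = /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.([a-zA-Z.]{2,6})$/;
// Same pattern without anchors: it may match a substring anywhere in the input.
const unanchored = /([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.([a-zA-Z.]{2,6})/;

for (const input of ["@yahoo.com", "chris@email.c", "chris@email.co.uk"]) {
  console.log(input, anchored.test(input));
}
// @yahoo.com false
// chris@email.c false
// chris@email.co.uk true

// Extra characters around an address are rejected only when the anchors are present.
console.log(anchored.test("<chris@email.com>"));   // false
console.log(unanchored.test("<chris@email.com>")); // true
```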

Quantifiers

The quantifiers in this regex are the + and the {2,6} in the expression below:

  • /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.([a-zA-Z.]{2,6})$/

Quantifiers indicate how many times the preceding character, bracket expression, or group must repeat for the regex to match.

The + quantifier matches the preceding item one or more times. The * quantifier matches it zero or more times. The ? quantifier matches it zero or one time. The {n} quantifier matches it exactly n times. The {a,b} quantifier matches it at least a times, but no more than b times.
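To see each quantifier on its own, here is a small sketch that applies it to the single letter a and lists which test strings it accepts. The patterns and test strings are made up for illustration; each pattern is anchored with ^ and $ so the quantifier has to cover the whole string.

```javascript
// Anchored patterns, so each quantifier must account for the entire string.
const quantifiers = {
  "+": /^a+$/,         // one or more
  "*": /^a*$/,         // zero or more
  "?": /^a?$/,         // zero or one
  "{3}": /^a{3}$/,     // exactly three
  "{2,4}": /^a{2,4}$/, // at least two, at most four
};
const inputs = ["", "a", "aa", "aaa", "aaaaa"];

// Returns the test strings that the given pattern accepts.
function accepted(re) {
  return inputs.filter((s) => re.test(s));
}

for (const [name, re] of Object.entries(quantifiers)) {
  console.log(name.padEnd(6), JSON.stringify(accepted(re)));
}
// +      ["a","aa","aaa","aaaaa"]
// *      ["","a","aa","aaa","aaaaa"]
// ?      ["","a"]
// {3}    ["aaa"]
// {2,4}  ["aa","aaa"]
```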

The quantifier often follows a character class or bracket expression that defines what characters the regex will match, as in the example below from the regex above:

  • [a-zA-Z.]{2,6}

In this example, taken from the EXTENSION group, a matching string is anywhere from 2 to 6 characters long, and each character must come from the bracket expression that precedes the {} quantifier.

The characters within the square brackets define which characters the regex will match, while the quantifier indicates how many of them.
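The EXTENSION group can be tested on its own to see the {2,6} limits in action. In this sketch the group is anchored so it must match the whole test string, and the sample extensions are illustrative only. Note that the upper bound of 6 rejects a seven-letter extension such as 'company'; writing {2,} instead removes the upper limit.

```javascript
// EXTENSION group from the tutorial, anchored so it is tested on its own.
const extension = /^[a-zA-Z.]{2,6}$/;

for (const ext of ["c", "co", "com", "co.uk", "museum", "company"]) {
  console.log(ext, extension.test(ext));
}
// c false        (1 character: below the minimum of 2)
// co true
// com true
// co.uk true     (5 characters; the period is in the bracket expression)
// museum true    (6 characters: the maximum)
// company false  (7 characters: above the maximum of 6)
```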

OR Operator

| acts like a boolean OR. It matches either the expression before the | or the one after it, and it can operate within a group or on a whole expression. The alternatives are tried in order from left to right, and JavaScript uses the first one that matches at the current position.
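The email regex in this tutorial does not use |, so the sketch below shows it in two small, hypothetical patterns: one demonstrating that alternatives are tried in order, and a variant of the email regex (not the tutorial's own) that accepts only the extensions com or org.

```javascript
// Alternatives are tried left to right; the first one that matches is used.
console.log("com".match(/co|com/)[0]); // co
console.log("com".match(/com|co/)[0]); // com

// Hypothetical variant of the tutorial regex that accepts only .com or .org addresses.
const comOrOrg = /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.(com|org)$/;
console.log(comOrOrg.test("chris@email.com")); // true
console.log(comOrOrg.test("chris@email.org")); // true
console.log(comOrOrg.test("chris@email.net")); // false
```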

Character Classes

The character class in the regex below is the \d inside the bracket expression of the second set of parentheses (the DOMAIN group).

  • /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.([a-zA-Z.]{2,6})$/

The \d character class matches a single digit. The \w character class matches a single alphanumeric character or underscore. The \s character class matches a single white space such as a space or tab.

Capitalizing the letter will negate the character class, so case is important when using these in your regex. For example, \D would match any character that is NOT a digit.
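A quick way to compare these character classes is to run each one with the g flag over a short sample string and look at what it returns. The sample string is arbitrary; the last lines check that \d inside the DOMAIN brackets behaves the same as the range 0-9.

```javascript
const sample = "a1_ b";

console.log(sample.match(/\d/g)); // ["1"]
console.log(sample.match(/\w/g)); // ["a", "1", "_", "b"]
console.log(sample.match(/\s/g)); // [" "]
console.log(sample.match(/\D/g)); // ["a", "_", " ", "b"]  (negated: everything but digits)

// In the DOMAIN group, \d inside the brackets works the same as writing 0-9.
const withD = /^[\da-zA-Z.-]+$/;
const with09 = /^[0-9a-zA-Z.-]+$/;
console.log(withD.test("mail-2.example"), with09.test("mail-2.example")); // true true
```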

Flags

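Flags change how a regex searches and are written after the closing slash, as in /pattern/i. The email regex in this tutorial uses no flags: the ^ at the beginning and the $ at the end are anchors, not flags. The multiline flag, m, changes how those anchors behave; with m, ^ and $ match at the start and end of each line instead of only at the start and end of the whole string. Other common flags are g (global, find every match) and i (ignore case).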
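The sketch below compares no flags, the g and m flags together, and the i flag, using the tutorial's regex with the period escaped. The two-line input string and the lowercase-only pattern used with i are hypothetical examples.

```javascript
const email = /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.([a-zA-Z.]{2,6})$/;
const twoLines = "chris@email.com\nsupport@example.org";

// No flags: ^ and $ refer to the start and end of the whole string.
console.log(email.test(twoLines)); // false

// g (global) + m (multiline): ^ and $ match at the start and end of every line.
const emailPerLine = new RegExp(email.source, "gm");
console.log(twoLines.match(emailPerLine)); // ["chris@email.com", "support@example.org"]

// i (ignore case): [a-z] also matches uppercase letters, so A-Z can be dropped.
const caseInsensitive = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/i;
console.log(caseInsensitive.test("Chris@Email.COM")); // true
```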

Grouping and Capturing

Parentheses are used for grouping in the regex below. In this regex, the groups divide the pattern into the three sections of an email address; grouping and capturing can also be used to isolate part of a string so that it can be back-referenced or replaced.

  • /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.([a-zA-Z.]{2,6})$/

  • ([a-zA-Z0-9_.-]+) and ([\da-zA-Z.-]+) and ([a-zA-Z.]{2,6}) are all groups found in the regex above; the groups are for the NAME, DOMAIN, and EXTENSION of the given email address.
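In JavaScript, RegExp.prototype.exec returns the full match followed by one entry per capturing group, and String.prototype.replace can refer to the groups as $1, $2, and $3. This sketch assumes the escaped-period version of the regex; the variable names are illustrative.

```javascript
const emailRegex = /^([a-zA-Z0-9_.-]+)@([\da-zA-Z.-]+)\.([a-zA-Z.]{2,6})$/;

// exec returns the full match at index 0 and each captured group after it.
const [, nameSection, domainSection, extensionSection] = emailRegex.exec("chris@email.co.uk");
console.log(nameSection, domainSection, extensionSection); // chris email.co uk

// Captured groups can be reused in a replacement string as $1, $2, $3.
console.log("chris@email.com".replace(emailRegex, "$1 at $2 dot $3")); // chris at email dot com
```

Because the + on the DOMAIN group is greedy (it takes as many characters as it can while still letting the rest of the pattern match), DOMAIN captures 'email.co' and only 'uk' is left for EXTENSION.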

Bracket Expressions

Bracket expressions define which characters the regex will match. An example from the regex used in this tutorial is:

  • [a-zA-Z0-9_.-]

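The characters that will match this bracket include the letters a-z and A-Z, the digits 0-9, an underscore, a period, and a hyphen (dash). Inside a bracket expression the period is a literal period rather than "any character", and the hyphen is literal because it comes last; between two characters it would define a range, as in a-z.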
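This sketch tests the NAME bracket expression one character at a time, then contrasts a period inside brackets with a bare period. The sample characters are arbitrary.

```javascript
// NAME bracket expression, anchored to test exactly one character.
const nameChar = /^[a-zA-Z0-9_.-]$/;
for (const ch of ["a", "Z", "7", "_", ".", "-", "!", "+", " "]) {
  console.log(JSON.stringify(ch), nameChar.test(ch));
}
// true for "a", "Z", "7", "_", ".", "-"; false for "!", "+", " "

// A period inside brackets is literal; outside brackets it matches any character.
console.log(/^[.]$/.test("x")); // false
console.log(/^.$/.test("x"));   // true
```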

Boundaries

Boundaries are not really needed for email validation. The '@' symbol already acts as a boundary within an email address, separating the NAME from the DOMAIN and EXTENSION (see above), and because the email address needs to be matched as a whole, the ^ and $ anchors already mark where it begins and ends.
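For comparison, the \b word boundary matches the position between a word character (\w) and a non-word character. This sketch, with a hypothetical words helper, shows that '@' and '.' do act as boundaries, but so does a period inside the NAME, so \b marks words rather than the NAME, DOMAIN, and EXTENSION sections.

```javascript
// \b matches the position between a word character (\w) and a non-word character.
// Hypothetical helper: returns every run of word characters bounded by \b.
const words = (text) => text.match(/\b\w+\b/g);

console.log(words("chris@email.com"));   // ["chris", "email", "com"]
console.log(words("chris.b@email.com")); // ["chris", "b", "email", "com"]
```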

Back-references

In my opinion, back-references are not really necessary for email validation. A back-reference reuses text that an earlier group already captured, which makes it more useful for tasks such as matching an HTML element, for example finding the text between an opening tag and its matching closing tag in an HTML document.
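A back-reference such as \1 matches the exact text captured by the first group. This sketch uses a hypothetical pattern for paired HTML tags, the kind of task described above, rather than email validation.

```javascript
// (\w+) captures a tag name; \1 requires the closing tag to repeat that exact text.
const pairedTag = /<(\w+)>.*<\/\1>/;

console.log(pairedTag.test("<b>bold</b>")); // true
console.log(pairedTag.test("<b>bold</i>")); // false: \1 is "b", not "i"
```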


Author

I am currently a student attending the UTA Full Stack Web Development Coding Boot Camp. Find me on GitHub! https://github.com/chris6661
