@brianbixby
Last active May 24, 2022 11:36
Regular Expression Tutorial: Validating an Email Address

Validating user input and matching search strings has become increasingly important in today's digital age. One of the most valuable and (at least at first) indecipherable tools for this is the regular expression, or RegEx for short. A regular expression is a series of characters that defines a search pattern in a body of text. In this tutorial we will thoroughly decipher the following regular expression, which we will call "Matching an Email":

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

Summary

This series of characters might look like your cat was sitting on your keyboard, but I can assure you it's much more than random characters. It's actually a search pattern meant for email validation. This string checks to see if a user entered the basic requirements of an email address. Here is a brief explanation of how it works (we’ll explore it in more detail below):

  • The pattern is divided into 3 groups. Group 1 can contain any lowercase letters a-z, numbers 0-9, underscores, periods and hyphens. This group has a minimum length of 1 and no maximum length.
  • Group 1 is followed by a single "@" symbol and then group 2.
  • Group 2 can contain any digits (\d), any lowercase letters a-z, periods and hyphens. This group has a minimum length of 1 and no maximum length.
  • Group 2 is followed by a single "." and then group 3.
  • Group 3 can contain any lowercase letters a-z and periods, and must be between 2 and 6 characters long.
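
To make the summary concrete, here is a small sketch that runs the pattern against a few inputs. The sample addresses and the names `emailPattern`, `samples` and `results` are made up for illustration; only the pattern itself comes from this tutorial.

```javascript
// The pattern from this tutorial, unchanged.
const emailPattern = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical sample inputs, chosen to exercise each rule in the summary.
const samples = [
  "jane.doe@example.com",        // fits all three groups
  "jane_doe-1@mail.example.org", // underscores, hyphens and digits are allowed
  "Jane@example.com",            // uppercase "J" is outside a-z
  "jane@example",                // no "." followed by a 2-6 character ending
  "jane@example.c",              // group 3 would be only 1 character long
];

// test() returns true when the whole string fits the pattern.
const results = samples.map((s) => emailPattern.test(s));
console.log(results); // [ true, true, false, false, false ]
```

Note that the uppercase address is rejected only because the pattern has no case-insensitive flag; the Flags section covers that.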

Regex Components

There are two ways to create a RegEx object: literal notation and a constructor.

  • In literal notation, the pattern is enclosed between slashes and does not use quotation marks.
  • With the constructor function, the pattern is not enclosed between slashes; it is passed as a string in quotation marks.

If we examine the “Matching an Email” RegEx, you'll see that it uses literal notation: the matching pattern is surrounded by slashes.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
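
As a rough sketch of the two notations side by side, the snippet below builds the same pattern both ways in JavaScript. The variable names are hypothetical. The one thing to watch with the constructor is that the pattern is an ordinary string, so every backslash has to be doubled.

```javascript
// Literal notation: pattern between slashes, no quotation marks.
const literal = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Constructor: pattern passed as a string. Each backslash is written twice,
// because a single backslash would be consumed by the string itself.
const constructed = new RegExp(
  "^([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6})$"
);

// Both objects hold the same pattern text and behave the same way.
const sameSource = literal.source === constructed.source;
const sameVerdict =
  literal.test("jane@example.com") === constructed.test("jane@example.com");

console.log(sameSource, sameVerdict); // true true
```

The constructor is mostly useful when the pattern has to be assembled at runtime; for a fixed pattern like this one, the literal is easier to read.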

Anchors

Right after our opening slash and right before our closing slash we have two anchors, "^" and "$" respectively. The caret "^" matches the position before the first character in the string. Similarly, "$" matches the position right after the last character in the string. Together they require the entire input, not just part of it, to fit the pattern.
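
A quick way to see what the anchors do is to compare the pattern with a copy that has them removed. This is only a sketch; the `unanchored` variant and the sample input are assumptions for illustration and are not part of the original RegEx.

```javascript
// The tutorial's pattern, with both anchors.
const anchored = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical variant with "^" and "$" removed.
const unanchored = /([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/;

// An email address with extra text around it.
const input = "Contact: jane@example.com!";

const anchoredResult = anchored.test(input);     // false: the whole string must fit
const unanchoredResult = unanchored.test(input); // true: a fitting part is enough
const found = unanchored.exec(input)[0];         // the part that matched

console.log(anchoredResult, unanchoredResult, found);
// false true jane@example.com
```

For validation you want the anchored behavior, since extra characters around a valid address should cause a rejection.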

Quantifiers

Quantifiers specify how many instances of a character, group, or character class must be present in the input for a match to be found.

Our “Matching an Email” RegEx has 3 quantifiers, 1 for each group. Group 1 and group 2 have the "+" quantifier, meaning that the length of the matching string must be at least one, with no maximum length.

([a-z0-9_\.-]+)  &&  ([\da-z\.-]+)

Group 3 uses the curly bracket quantifier: two numbers inside a set of curly brackets, separated by a comma, which represent a minimum and a maximum length. The matching string of group 3 has a minimum length of 2 and a maximum length of 6.

([a-z\.]{2,6})

Unless otherwise specified, quantifiers are greedy, meaning don't invite them over for dinner. Just kidding, it means they match as many occurrences of a particular pattern as possible. Appending the "?" character to a quantifier makes it lazy; it causes the regular expression engine to match as few occurrences as possible. In our case, all three quantifiers are greedy, so we won't be inviting them over for dinner.
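
The difference between greedy and lazy is easiest to see on a short string. The sketch below reuses group 3's bracket expression without the anchors; the input "museum" and the variable names are made up for illustration.

```javascript
const input = "museum"; // 6 lowercase letters

// Greedy {2,6}: takes as many characters as allowed (up to 6).
const greedyMatch = input.match(/[a-z\.]{2,6}/)[0];

// Lazy {2,6}?: stops as soon as the minimum of 2 is satisfied.
const lazyMatch = input.match(/[a-z\.]{2,6}?/)[0];

// The same idea with "+": greedy takes everything, lazy takes one character.
const greedyPlus = input.match(/[a-z]+/)[0];
const lazyPlus = input.match(/[a-z]+?/)[0];

console.log(greedyMatch, lazyMatch, greedyPlus, lazyPlus);
// museum mu museum m
```

In the full "Matching an Email" pattern the anchors force the whole string to be consumed, so greedy and lazy quantifiers would accept the same inputs there; the choice matters more when searching inside longer text.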

Grouping Constructs

In the "Matching an Email" RegEx, we have 3 different parts of the string that are separated by literal characters ("@" and "."), and each of these groups has specific requirements. The groupings in our RegEx are enclosed by parentheses. Each grouping is capturing, meaning the text it matches is remembered, but the captured text is never used. I would explain more, but that is beyond the scope of this specific example.

([a-z0-9_\.-]+) && ([\da-z\.-]+) && ([a-z\.]{2,6})
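
Although the tutorial's example never reads its captures, a short sketch shows what they would hold. The sample address and variable names here are hypothetical.

```javascript
const emailPattern = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// exec() returns an array: index 0 is the whole match,
// indexes 1-3 are the text captured by each pair of parentheses.
const match = emailPattern.exec("jane.doe@mail.example.com");

const whole = match[0];
const group1 = match[1]; // everything before the "@"
const group2 = match[2]; // between the "@" and the last "."
const group3 = match[3]; // the ending after the last "."

console.log(group1, group2, group3);
// jane.doe mail.example com
```

Group 2 ends up as "mail.example" because its greedy "+" first grabs as much as it can and then gives back just enough for the literal "." and group 3 to match.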

Bracket Expressions

Bracket expressions are used in RegEx to match any single character from a set of characters. Our "Matching an Email" example uses bracket expressions 3 times, once for each grouping. The hyphen represents a range of characters when it sits between 2 alphanumeric characters; at the very end of a bracket expression, as in groups 1 and 2, it is just a literal hyphen. If the caret "^" is the first character inside of a bracket expression, then it means "anything but" what the expression evaluates to. Most characters that are special elsewhere, such as the period, lose their special meaning inside of bracket expressions. The exceptions are the hyphen and caret in the positions just described, the closing bracket "]", and the backslash, which still escapes the next character and still introduces character classes like "\d". That means the "\." in our example matches a plain period, and the backslash in front of it is optional.

[a-z0-9_\.-] && [\da-z\.-] && [a-z\.]
  • Group 1 contains: any lowercase letters a-z, numbers 0-9, underscores, periods and hyphens.
  • Group 2 contains: any digits, any lowercase letters a-z, periods and hyphens.
  • Group 3 contains: any lowercase letters a-z and periods.
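
To check which single characters a bracket expression accepts, it can be tested on its own between anchors. The following is a minimal sketch using group 1's bracket expression; the negated variant and the test characters are assumptions added for illustration.

```javascript
// Group 1's bracket expression, anchored so it must match exactly one character.
const group1Char = /^[a-z0-9_\.-]$/;

// Characters inside the set: letter, digit, underscore, period, hyphen.
const accepted = ["a", "7", "_", ".", "-"].map((c) => group1Char.test(c));

// Characters outside the set. "\\" is a single backslash character:
// the backslash in the pattern escapes the period and is not matched itself.
const rejected = ["A", "\\", "@", " "].map((c) => group1Char.test(c));

// Hypothetical negated version: a leading caret flips the set.
const negated = /^[^a-z0-9_\.-]$/;
const negatedResults = ["a", "A"].map((c) => negated.test(c));

console.log(accepted);       // [ true, true, true, true, true ]
console.log(rejected);       // [ false, false, false, false ]
console.log(negatedResults); // [ false, true ]
```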

Character Classes

Character classes distinguish types of characters from each other (ex. letters and digits). I already discussed some character classes above in bracket expressions. Here is a small list:

  • \d -> Match any digit 0-9
  • \w -> Match any word character: A-Z, a-z, 0-9 and underscore
  • \s -> Match whitespace: spaces, tabs, line breaks
  • . -> Match any character except a line break
  • '*' -> Not a character class but a quantifier: match 0 or more of whatever comes before it, so ".*" matches any number of any characters (wildcard)
  • CAPITAL LETTERS -> Mean the opposite, ex. \D means anything that's not a digit 0-9
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

Our example uses the digit "\d" character class in the second grouping.
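
Here is a small sketch that tries each entry from the list on a single character. The object name `checks` and the test inputs are made up for illustration; the behavior shown is for JavaScript with no flags set.

```javascript
const checks = {
  digit: /^\d$/.test("7"),           // \d matches a digit
  notDigit: /^\D$/.test("7"),        // \D is the opposite of \d
  wordUnderscore: /^\w$/.test("_"),  // \w includes the underscore
  whitespaceTab: /^\s$/.test("\t"),  // \s matches a tab
  dotLetter: /^.$/.test("x"),        // "." matches an ordinary character
  dotNewline: /^.$/.test("\n"),      // but not a line break (without the s flag)
  starEmpty: /^\d*$/.test(""),       // "*" allows zero occurrences
};

// In group 2, \d stands in for the range 0-9.
const withClass = /^[\da-z\.-]+$/.test("mail-2.example");
const withRange = /^[0-9a-z\.-]+$/.test("mail-2.example");

console.log(checks);
console.log(withClass, withRange); // true true
```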

The OR Operator

Once again, the section on bracket expressions touched on this briefly. Any bracket expression has the OR operator implied between its characters. However, there is another way of doing this as well: wrap the acceptable options inside of a grouping, aka parentheses, and separate the options with a vertical line. Do not put spaces around the vertical line, because a space inside the pattern is matched literally.

(com|org|edu|gov)

The above example shows that the extensions com, org, edu or gov would all be acceptable. However, there are many more valid top level domain names and this is why this version of the OR operator is not used in this particular example.
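
To illustrate the trade-off, the sketch below swaps group 3 for a fixed list of endings. The `limitedTld` pattern is a hypothetical variant, not the tutorial's RegEx, and the sample addresses are made up.

```javascript
// The tutorial's pattern: any 2-6 characters from [a-z\.] as the ending.
const original = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical variant: only four endings are accepted.
const limitedTld = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.(com|org|edu|gov)$/;

const orgLimited = limitedTld.test("jane@example.org"); // true
const ioLimited = limitedTld.test("jane@example.io");   // false: "io" is not listed
const ioOriginal = original.test("jane@example.io");    // true

// Spaces inside the parentheses are matched literally,
// so this only accepts "com " or " org", not "org".
const spacedOrg = /^(com | org)$/.test("org");          // false

console.log(orgLimited, ioLimited, ioOriginal, spacedOrg);
// true false true false
```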

Flags

A flag is an optional parameter that modifies how the expression searches and is denoted by a single lowercase letter after the closing slash. There are several flags and none are present in this example, but the two most common are:

  • i (Ignore Casing) -> Makes the expression search case-insensitively.
  • g (Global) -> Makes the expression search for all occurrences.
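
As a sketch of what these two flags would change, the snippet below adds "i" to the tutorial's pattern and uses "g" on a copy without anchors. Both flagged patterns and the sample text are assumptions for illustration; the original RegEx uses no flags.

```javascript
// The tutorial's pattern, with and without the "i" flag.
const strict = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;
const relaxed = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/i;

const strictUpper = strict.test("Jane@Example.COM");   // false: uppercase rejected
const relaxedUpper = relaxed.test("Jane@Example.COM"); // true: case is ignored

// A hypothetical copy without anchors, for searching inside longer text.
const text = "a@b.co, c@d.org";
const first = text.match(/[a-z0-9_\.-]+@[\da-z\.-]+\.[a-z\.]{2,6}/)[0]; // first hit only
const all = text.match(/[a-z0-9_\.-]+@[\da-z\.-]+\.[a-z\.]{2,6}/g);     // every hit

console.log(strictUpper, relaxedUpper); // false true
console.log(first);                     // a@b.co
console.log(all);                       // [ 'a@b.co', 'c@d.org' ]
```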

Character Escapes

The backslash "\" in a regex escapes a character that would otherwise be interpreted as special, so that it is matched literally. For example, if you recall from our character classes section, "." means match any character, but "\." means a period. This is a very big difference. In our "Matching an Email" example we see "\." many times, especially inside of bracket expressions. Inside of a bracket expression the period is already literal, so escaping it there is harmless but not required; the one place the escape really matters is the "\." between group 2 and group 3.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
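
The effect of the escape can be sketched by removing it from the one spot outside the brackets. The `unescaped` pattern below is a deliberately broken, hypothetical variant, and the sample inputs are made up.

```javascript
// "." matches any character; "\." matches only a period.
const anyChar = /^a.c$/.test("abc");      // true
const literalDot = /^a\.c$/.test("abc");  // false
const literalDot2 = /^a\.c$/.test("a.c"); // true

// The tutorial's pattern requires a real period before group 3.
const original = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical broken variant: the period between group 2 and group 3
// is left unescaped, so any character is accepted in that position.
const unescaped = /^([a-z0-9_\.-]+)@([\da-z\.-]+).([a-z\.]{2,6})$/;

const noDotOriginal = original.test("jane@examplexcom");   // false
const noDotUnescaped = unescaped.test("jane@examplexcom"); // true

console.log(anyChar, literalDot, literalDot2); // true false true
console.log(noDotOriginal, noDotUnescaped);    // false true
```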

Author

My name is Brian Bixby. I am an avid builder with an eye for design and an affinity for efficiency. I am a Front-End Developer and enjoy working in challenging and collaborative environments. Please check out my GitHub to see my recent projects or to reach out. My updated contact information is listed on my profile.

GitHub
