@fdeaquino
Last active December 8, 2022 17:50
A regex tutorial for matching an email

Regex Tutorial: Matching an Email

In simple terms, regular expressions (regex) are search patterns that developers use to find information in a string of text. Developers may use regex to validate input, to find patterns of characters, or to find and replace a sequence of characters. Some developers also use regex to extract content that matches their search patterns.

Summary

This tutorial covers the regex /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/, which will match emails with this particular pattern. The tutorial will explain the syntax and components of the regex, and includes examples of emails that match the requirements.

Regex Components

If you want to understand regex, you've got to start with the syntax. The regex components applicable to our example include anchors, quantifiers, bracket expressions, character classes, flags, and grouping constructs such as capturing groups. In addition to these components, our example's syntax includes forward slash / characters at the start and end of the expression. This is called literal notation, and it is how JavaScript marks where a pattern begins and ends. The only things that may follow the closing forward slash are flags, which are covered later in this tutorial. A regex can also be created from a string with the RegExp constructor, which does not use the forward slashes. RegExp constructors are not discussed in this tutorial.
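As a quick illustration of literal notation, the sketch below stores the tutorial's pattern in a variable and checks two strings against it. The variable name emailRegex and the sample strings are only illustrative choices, not part of the original example.

```javascript
// The tutorial's pattern in literal notation: /pattern/ with no flags after the closing slash.
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// RegExp.prototype.test returns true when the string matches the pattern.
console.log(emailRegex.test("grumpy-cat@cats.com")); // true
console.log(emailRegex.test("not an email"));        // false

// Flags would appear after the closing slash; this literal has none.
console.log(emailRegex.flags === ""); // true
```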

Anchors

Anchors match a position in the string rather than a character, so they do not consume any text. The two most common anchors are ^ and $.

  • The ^ anchor matches the position at the start of the string, so the pattern that follows it must appear at the very beginning.
  • The $ anchor matches the position at the end of the string, so the pattern that comes before it must appear at the very end.

In our example /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/, we can see the ^ near the start and the $ near the end of the expression.

  • This ^([a-z0-9_\.-]+) means the string must START with text that fits that pattern.
  • This ([a-z\.]{2,6})$ means the string must END with text that fits that pattern.
  • Together, the two anchors require the whole string to be an email, not just to contain one.
  • Note: Patterns that follow and precede the anchors will be explained later in the tutorial.
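To see what the anchors contribute, here is a small sketch that compares the tutorial's regex with a copy that has the ^ and $ removed. The unanchored variant and the sample sentence are hypothetical and exist only for this comparison.

```javascript
// The tutorial's regex, anchored at both ends.
const anchored = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;
// Hypothetical variant without anchors, for comparison only.
const unanchored = /([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/;

const sentence = "Contact: grumpy-cat@cats.com today";

// With anchors, the whole string has to be an email.
console.log(anchored.test(sentence));              // false
console.log(anchored.test("grumpy-cat@cats.com")); // true

// Without anchors, an email anywhere inside the string is enough.
console.log(unanchored.test(sentence));            // true
```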

Quantifiers

Quantifiers set limits on how many times part of a pattern may repeat. In other words, they indicate that the preceding item (a single character, a bracket expression, or a group) must be matched a certain number of times. There are many different quantifiers; the ones in our example are curly brackets { } and the plus sign +.

In our example /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/, we see the plus sign twice, both times just before a closing parenthesis. The curly brackets are found toward the end of the expression.

  • Inside these parentheses ([a-z0-9_\.-]+) we see a plus sign. The + means that we want to match the preceding bracket expression 1 or more times.
    • Here we are looking for a string at least 1 character long in which every character is one of those defined in the bracket expression [].
  • Inside these parentheses ([a-z\.]{2,6}) we see a pair of curly brackets with two values between them. These two values set the minimum and maximum number of times the preceding bracket expression must match.
    • Here we are looking for a string between 2 and 6 characters long in which every character is one of those defined in the bracket expression [].
    • Matches: "gov", "com", "uk"
    • Not a match: "network" (7 characters)
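The two quantifiers can be tried in isolation. The sketch below pulls each bracket expression out of the full regex and wraps it in ^ and $ so that the quantifier alone decides the result; the names oneOrMore and twoToSix are illustrative.

```javascript
// + requires at least one character from the bracket expression.
const oneOrMore = /^[a-z0-9_\.-]+$/;
console.log(oneOrMore.test("a"));           // true
console.log(oneOrMore.test("i_love_dogs")); // true
console.log(oneOrMore.test(""));            // false (zero characters)

// {2,6} requires between 2 and 6 characters from the bracket expression.
const twoToSix = /^[a-z\.]{2,6}$/;
for (const ending of ["uk", "gov", "com", "museum", "network"]) {
  // "network" has 7 characters, so it is the only one that fails.
  console.log(ending, twoToSix.test(ending));
}
```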

Bracket Expressions

Bracket expressions contain characters inside of square brackets []. Bracket expressions can be positive character groups or negative character groups, and outline the characters we want to include or exclude.

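  • Positive character groups, such as [0-9], match any single character listed in the brackets, in this case any digit.
  • Negative character groups are written similarly to positive character groups, but they begin with the ^ symbol. Negative character groups, such as [^A-Z], match any single character that is not listed, in this case any character other than a capital letter. Inside square brackets, ^ is not an anchor.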

In our example /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/, we see three bracket expressions - all of which are positive character groups. They contain characters we do want to include.

  • [a-z0-9_\.-] matches one character that is a lowercase letter, a digit 0-9, an underscore, a period, or a hyphen.
  • [\da-z\.-] matches one character that is a digit 0-9, a lowercase letter, a period, or a hyphen.
    • Note: \d is a predefined Character Class discussed in the next section.
  • [a-z\.] matches one character that is a lowercase letter or a period.
  • Note: Inside a bracket expression a period is already literal, so the backslash in \. is optional there.
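A bracket expression always stands for a single character, which is easy to check directly. The sketch below uses the [0-9] and [^A-Z] groups from this section plus the last bracket expression of our example; the variable names and sample strings are illustrative.

```javascript
// Positive group: is there at least one digit somewhere in the string?
const anyDigit = /[0-9]/;
console.log(anyDigit.test("abc1")); // true
console.log(anyDigit.test("abc"));  // false

// Negative group: is there at least one character that is NOT a capital letter?
const notCapital = /[^A-Z]/;
console.log(notCapital.test("ABC")); // false (every character is a capital)
console.log(notCapital.test("ABc")); // true  (the "c" qualifies)

// A period inside brackets is literal, with or without the backslash.
console.log(/^[a-z\.]+$/.test("co.uk")); // true
console.log(/^[a-z.]+$/.test("co.uk"));  // true
console.log(/^[a-z.]+$/.test("co!uk"));  // false ("!" is not in the set)
```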

Character Classes

Character classes define a set of characters that we want to match (or not) in a string. The bracket expressions mentioned above, including positive and negative character groups, are considered character classes. Character classes can also be predefined such as ., \w, \W, \d, \D, and \s, among others.

In our example /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/, we see the character class \d in the middle of the expression. This character class matches any digit character and means the same as [0-9].
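To check that \d and [0-9] behave the same way here, the sketch below runs a few sample strings through both spellings of the second bracket expression. The sample strings and variable names are illustrative.

```javascript
// The same bracket expression written with \d and with an explicit 0-9 range.
const withShorthand = /^[\da-z\.-]+$/;
const withRange = /^[0-9a-z\.-]+$/;

const samples = ["cats", "my-email2", "go.green", "Nature", "caf\u00e9"];

// Both spellings should agree on every sample.
const agree = samples.every(
  (sample) => withShorthand.test(sample) === withRange.test(sample)
);
console.log(agree); // true

console.log(withShorthand.test("my-email2")); // true
console.log(withShorthand.test("Nature"));    // false (capital N is not in the set)
```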

Flags

Earlier, we noted that a regular expression is wrapped in forward slashes / and that flags are an exception. Flags follow the closing forward slash and change how the whole pattern is applied. They can be one or a mix of the following: i, g, m, u, y, and s. This tutorial won't cover all of them in detail, but the flags g, i, and m deserve a special mention.

  • g means global search
  • i means ignore case
  • m means multiline

Our example /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/ DOES NOT include any flags. IF we were to add the gim flags after the closing forward slash, our regex would:

  • retain the index of the last match, allowing iterative searches → g
  • make the whole expression case insensitive → i
  • let ^ and $ match at the start and end of each line, not only at the start and end of the whole string → m
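The effect of each flag can be sketched with flagged copies of our regex. These variants are hypothetical, since the tutorial's regex has no flags, and the multi-line sample text is made up for the demonstration.

```javascript
const base = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// i: capital letters are now accepted.
const insensitive = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/i;
console.log(base.test("GruMPy-caT@cats.com"));        // false
console.log(insensitive.test("GruMPy-caT@cats.com")); // true

// m: ^ and $ match at every line break; g: collect every match.
const lines = "grumpy-cat@cats.com\nnot an email\nnational.parks@nature.gov";
const perLine = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/gm;
console.log(base.test(lines)); // false (the string as a whole is not one email)
const found = lines.match(perLine);
console.log(found); // [ 'grumpy-cat@cats.com', 'national.parks@nature.gov' ]

// g also makes exec remember where the last match ended.
const stepper = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/gm;
const first = stepper.exec(lines);
const indexAfterFirst = stepper.lastIndex; // 19, the length of the first email
const second = stepper.exec(lines);
console.log(first[0], indexAfterFirst, second[0]);
```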

Grouping and Capturing

Grouping allows developers to build more complex regex by checking multiple parts of a string so that different sections meet different requirements. We can group a sequence of characters by placing them between opening and closing parentheses (). Groups are either capturing or non-capturing. The details of capturing and non-capturing groups won't be discussed in this tutorial, but it is important to note that capturing groups save the matched character sequences for possible re-use.

In our example /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/, we see three capturing groups: ([a-z0-9_\.-]+), ([\da-z\.-]+), and ([a-z\.]{2,6}).

  • ([a-z0-9_\.-]+) is a capturing group that will match a string made up of any combination of lowercase letters a-z, digits 0-9, and the special characters underscore, period, and hyphen. Additionally, this group contains the quantifier +, which requires 1 or more of those characters.
  • ([\da-z\.-]+) is a capturing group very similar to the one above. The key differences are the replacement of 0-9 with \d (which conveys the same meaning) and the exclusion of the underscore _ special character.
  • ([a-z\.]{2,6}) is a capturing group that will match a string made up of lowercase letters and periods. This group is different from the ones above because the quantifier {2,6} indicates the string must be a minimum of 2 and a maximum of 6 characters long.
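As a sketch of the "possible re-use" mentioned above, String.prototype.match returns the text captured by each group alongside the full match. The names localPart, domain, and topLevel are illustrative labels for the three groups, not terms used by the tutorial.

```javascript
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Index 0 holds the whole match; indexes 1 to 3 hold the three capturing groups.
const match = "national.parks@nature.gov".match(emailRegex);
const [whole, localPart, domain, topLevel] = match;
console.log(whole);     // national.parks@nature.gov
console.log(localPart); // national.parks
console.log(domain);    // nature
console.log(topLevel);  // gov

// With several periods after the @, the second group keeps all but the last segment.
const nested = "a@b.co.uk".match(emailRegex);
console.log(nested[2], nested[3]); // b.co uk

// When the string does not match, match returns null instead of an array.
const noMatch = "I_LOVE_DOGS@myemail.com".match(emailRegex);
console.log(noMatch); // null
```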

Regex Explanation and Examples

Putting it all together, our regex /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/ will match strings that:

  • start with one or more lowercase letters a-z, digits 0-9, or the special characters _, ., and - in the first group before the @ symbol
  • continue with one or more digits 0-9, lowercase letters a-z, or the characters . and - in the second group between the @ symbol and the .
  • end with a minimum of 2 and maximum of 6 lowercase letters or . characters in the last group after the .

Here are some example emails that match our regex:

  • i_love_dogs@myemail.com
  • grumpy-cat@cats.com
  • national.parks@nature.gov
  • reuse.reduce_recycle-2022@gogreen.org

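The following example emails do not match, because they contain capital letters, characters outside the bracket expressions, or an ending longer than 6 characters: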

  • I_LOVE_DOGS@myemail.com
  • GruMPy-caT@cats.com
  • nat!onal.parks@Nature.gov
  • reu$e.Reduce~recycle-2022@goGreen.network
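The two lists above can be checked in one go. The sketch below runs every example through the regex; the array names shouldMatch and shouldNotMatch are illustrative.

```javascript
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

const shouldMatch = [
  "i_love_dogs@myemail.com",
  "grumpy-cat@cats.com",
  "national.parks@nature.gov",
  "reuse.reduce_recycle-2022@gogreen.org",
];

const shouldNotMatch = [
  "I_LOVE_DOGS@myemail.com",                   // capital letters
  "GruMPy-caT@cats.com",                       // capital letters
  "nat!onal.parks@Nature.gov",                 // "!" and a capital N
  "reu$e.Reduce~recycle-2022@goGreen.network", // "$", "~", capitals, 7-character ending
];

// Every address in the first list passes, and none in the second list does.
const allMatch = shouldMatch.every((email) => emailRegex.test(email));
const anyWrongMatch = shouldNotMatch.some((email) => emailRegex.test(email));
console.log(allMatch);      // true
console.log(anyWrongMatch); // false
```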

Author

Thank you for using my regex tutorial! My name is Fidel Deaquino. I'm a junior full stack web developer diving into new technologies to expand and strengthen my skills. If you notice any mistakes in my tutorial, please feel free to send your feedback via email. Visit my GitHub profile to view my recent projects.
