@akayer19
Last active April 21, 2024 20:03

Understanding Email Regular Expressions in Depth

Regular expressions, or regex, are patterns that describe the shape of the text you want to find, such as an email address. They work like precise search queries and are often used on websites to check that input such as an email address is correctly formatted. In this tutorial, we'll dissect one particular regex designed for validating email addresses. By unraveling its components and seeing how they interact, you'll understand how regex can be applied in web development tasks. Let's embark on this exploration of a powerful tool together!

Summary

The regex pattern '/^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/' is designed to match and validate email addresses. The dot between the domain and the TLD is escaped as '\.' so that it matches a literal dot; an unescaped '.' would match any single character. In this tutorial, we'll break down each component of the regex pattern, explaining how it works and what it matches. We'll discuss anchors, character classes, quantifiers, grouping, boundaries and the overall structure of the pattern to deepen understanding of regex usage in email validation.
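Before going through the parts one by one, it may help to see the whole pattern applied to a few strings. The following is a minimal sketch, not a definitive validator: the sample addresses are made up for illustration, and the separator between the domain and the TLD is written as '\.' on the assumption that a literal dot is intended.

```javascript
// Pattern from the Summary, with the domain/TLD separator written as \.
// (assumption: the dot there is meant to be a literal dot).
const emailPattern = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/;

// Hypothetical sample inputs, chosen only for illustration.
const samples = [
  "john_doe123@example.com",
  "mary.smith@my-domain.org",
  "John@Example.com",        // uppercase letters are outside the character classes
  "no-at-sign.example.com",  // there is no '@' to separate username and domain
];

// test() returns true when the whole string fits the pattern.
const results = samples.map((address) => emailPattern.test(address));
console.log(results); // [ true, true, false, false ]
```

The first two strings fit every part of the pattern. The last two fail because of an uppercase letter and a missing '@', respectively.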

Regex Components

Anchors

  • The '^' anchor matches the position at the start of the string and the '$' anchor matches the position at the end of it. Anchors do not consume any characters themselves. Together, they ensure that the entire string is matched by the regex pattern, from beginning to end, without any additional characters before or after it.
  • For example, in the regex '/^hello$/', the '^' requires the match to begin at the start of the string, and the '$' requires it to finish at the end. So, this regex will only match strings that consist solely of the word "hello".
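The effect of the anchors is easiest to see by comparing the same word with and without them. This is a small sketch; the variable names are chosen only for illustration.

```javascript
const anchored = /^hello$/;  // must cover the whole string
const unanchored = /hello/;  // may match anywhere inside the string

const exactWord = anchored.test("hello");               // true
const extraText = anchored.test("hello world");         // false: text follows "hello"
const foundInside = unanchored.test("say hello world"); // true: "hello" appears inside

console.log(exactWord, extraText, foundInside); // true false true
```

Without '^' and '$', the email pattern would accept any string that merely contains something shaped like an email address.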

Quantifiers

The '+' quantifier appears twice in the regex pattern, applied to the character classes '[a-z0-9_.-]' and '[\da-z.-]'. It means that one or more characters from the preceding character class must be matched. Additionally, the '{2,6}' quantifier is used in '[a-z.]{2,6}'. It requires the TLD part of the email address to be between 2 and 6 characters long.
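The two quantifiers can be tried on their own by wrapping each character class in anchors. A minimal sketch, with test strings picked only to show the length limits:

```javascript
const oneOrMore = /^[a-z0-9_.-]+$/; // '+' means at least one character
const twoToSix = /^[a-z.]{2,6}$/;   // '{2,6}' means between 2 and 6 characters

const checks = {
  singleChar: oneOrMore.test("a"),   // true: one character is enough for '+'
  emptyString: oneOrMore.test(""),   // false: '+' needs at least one character
  tooShort: twoToSix.test("c"),      // false: only 1 character
  threeChars: twoToSix.test("com"),  // true: 3 characters
  sixChars: twoToSix.test("museum"), // true: exactly 6 characters
  sevenChars: twoToSix.test("example"), // false: 7 characters
};

console.log(checks);
```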

Character Classes

By using character classes, we can create regex patterns that match specific types of characters or combinations of characters within text data. This allows for accurate pattern matching and validation, making character classes a fundamental piece in regular expressions.

  • The '[a-z0-9_.-]+' character class is utilized to match the username part of an email address, permitting lowercase letters (a-z), numbers (0-9), underscores (_), dots (.), and hyphens (-). The (+) quantifier ensures that one or more of these characters are matched, adapting to usernames of varying lengths.

    • 'john_doe123'
    • 'mary.smith'
    • 'jane_doe-123'
  • The '[\da-z.-]+' character class is utilized to match the domain part of an email address, allowing digits (\d), lowercase letters (a-z), dots (.), and hyphens (-). The '+' quantifier ensures that one or more of these characters are matched, adapting to domains of varying lengths. Note: uppercase letters and underscores are not included and therefore will not match.

    • 'example'
    • 'test.domain'
    • 'my-domain'
  • The '[a-z.]{2,6}' character class is utilized to match the top-level domain (TLD) part of an email address, permitting lowercase letters (a-z) and dots (.). The '{2,6}' quantifier specifies that the TLD should have a length between 2 and 6 characters, aligning with common TLDs such as "net", "com", or "org". The dot in front of the TLD is matched by the separator that precedes this class, not by the class itself. Longer TLDs such as "technology" will not match.

    • 'net'
    • 'com'
    • 'org'
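The character classes described above can be checked one at a time. The sketch below wraps the username and domain classes in anchors and tries a few strings. The 'i' flag in the last line is not part of the original pattern; it is shown only as one possible way to accept uppercase letters.

```javascript
const usernameClass = /^[a-z0-9_.-]+$/;
const domainClass = /^[\da-z.-]+$/;

const usernameOk = usernameClass.test("jane_doe-123"); // true: every character is in the class
const domainOk = domainClass.test("my-domain");         // true: hyphens are allowed
const domainUnderscore = domainClass.test("test_domain"); // false: '_' is not in the domain class
const domainUppercase = domainClass.test("Example");    // false: only lowercase a-z is listed

// Assumption for illustration: adding the 'i' flag makes a-z match A-Z as well.
const domainIgnoreCase = /^[\da-z.-]+$/i.test("Example"); // true

console.log(usernameOk, domainOk, domainUnderscore, domainUppercase, domainIgnoreCase);
// true true false false true
```

Note that the underscore is accepted in the username class but not in the domain class, so the same string can pass one and fail the other.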

Grouping and Capturing

In the regex pattern '/^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/', there are three sets of parentheses used for grouping and capturing:

  • '([a-z0-9_.-]+)': This group captures the username part of the email address. It consists of lowercase letters (a-z), numbers (0-9), underscores (_), dots (.), and hyphens (-). The captured username can be read from the match result in the program, or referenced later in the regex pattern with a back-reference such as '\1'.
  • '([\da-z.-]+)': This group captures the domain part of the email address. It allows digits (\d), lowercase letters (a-z), dots (.), and hyphens (-). Similar to the username group, the captured domain can be referenced later if needed.
  • '([a-z.]{2,6})': This group captures the top-level domain (TLD) part of the email address. It permits lowercase letters (a-z) and dots (.). The length of the TLD is restricted to between 2 and 6 characters. The captured TLD can be used for validation or further processing.
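The three groups can be read back from the match result. Here is a minimal sketch using String.prototype.match; parseEmail is a hypothetical helper written for this example, and the separator dot is escaped on the assumption that a literal dot is intended.

```javascript
// Pattern from this tutorial; the separator dot is escaped here (assumption).
const emailPattern = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/;

// parseEmail is a hypothetical helper, not part of any library.
function parseEmail(input) {
  const match = input.match(emailPattern);
  if (match === null) {
    return null; // the string did not fit the pattern
  }
  // match[0] is the whole match; match[1] to match[3] are the three groups.
  const [, username, domain, tld] = match;
  return { username, domain, tld };
}

const parsed = parseEmail("john_doe123@example.com");
console.log(parsed); // { username: 'john_doe123', domain: 'example', tld: 'com' }

const notParsed = parseEmail("not an email");
console.log(notParsed); // null
```

Returning null for a failed match keeps the caller from reading groups that do not exist.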

Bracket Expressions

  • '[a-z0-9_.-]': This character class matches any lowercase letter (a-z), digit (0-9), underscore (_), dot (.), or hyphen (-). It allows for a variety of characters to be matched in the username part of the email address.
  • '[\da-z.-]': This character class matches any digit (\d), lowercase letter (a-z), dot (.), or hyphen (-). It allows for a combination of characters to be matched in the domain part of the email address.
  • '[a-z.]': This character class matches any lowercase letter (a-z) or dot (.). It allows for lowercase letters and dots in the top-level domain (TLD) part of the email address.

By using bracket expressions, you can define specific sets of characters that are valid for different parts of the email address, allowing for accurate pattern matching and validation.
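Two details of bracket expressions are worth trying out: a dot inside brackets is just a dot, and a hyphen placed last in the brackets is a literal hyphen rather than a range. A small sketch with illustrative variable names:

```javascript
const dotInsideBrackets = /^[a-z.]$/; // '.' listed inside brackets is a literal dot
const dotOutsideBrackets = /^.$/;     // a bare '.' outside brackets matches any character

const literalDot = dotInsideBrackets.test(".");   // true
const digitInside = dotInsideBrackets.test("7");  // false: '7' is not a-z or '.'
const digitOutside = dotOutsideBrackets.test("7"); // true: bare '.' accepts it

// The hyphen comes last in the class, so it stands for itself.
const usernameChar = /^[a-z0-9_.-]$/;
const literalHyphen = usernameChar.test("-"); // true
const uppercase = usernameChar.test("A");     // false: only lowercase letters are listed

console.log(literalDot, digitInside, digitOutside, literalHyphen, uppercase);
// true false true true false
```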

Boundaries

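  • '@' is used to separate the username part from the domain part in an email address. It is a literal character in the pattern and acts as a boundary between these two parts. (It is not a regex boundary assertion such as '\b'.)
  • '.' is used to separate the domain part from the top-level domain (TLD) part in an email address. It also acts as a boundary between these two parts. Outside a bracket expression it must be written as '\.' to match a literal dot, because an unescaped '.' matches any single character.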
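The difference between an escaped and an unescaped separator dot shows up on a string that has no dot at all after the '@'. This sketch compares the two spellings; the test address is made up for illustration.

```javascript
// Separator written as a bare '.', which matches any single character.
const unescapedDot = /^([a-z0-9_.-]+)@([\da-z.-]+).([a-z.]{2,6})$/;
// Separator written as '\.', which matches only a literal dot.
const escapedDot = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/;

const missingDot = "john@examplecom"; // no dot between domain and TLD

const looseResult = unescapedDot.test(missingDot); // true: the 'c' is taken as the "separator"
const strictResult = escapedDot.test(missingDot);  // false: a literal dot is required

// Both spellings accept a normally formatted address.
const bothAccept =
  unescapedDot.test("john@example.com") && escapedDot.test("john@example.com"); // true

console.log(looseResult, strictResult, bothAccept); // true false true
```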

Author

Alex Kaye is a web development enthusiast with a background in the military, where he uses regular expressions (regex) for writing queries in Microsoft Access and Excel. His journey into web development began with looking for simpler ways to filter through large amounts of information, sparking an interest that has grown over the years. With a few years of experience working with regex patterns, he emphasizes the importance of continuous learning and practice for beginners. Outside of web development, Alex is passionate about Python programming and data extraction from various sources, particularly for creating player projections in sports. He stays updated on the latest developments in web development and regex by following diverse developers on Twitter. When facing challenges with regex, Alex believes in thorough research and seeking solutions from the community, leveraging the collective knowledge and experiences of others.

GitHub Profile: GitHub
Gist Profile: Gist

@akayer19

This is my initial template for a GIST!
