@brownj47
Last active August 17, 2022 06:13
A gist that breaks down how to parse a regex that finds emails

Parsing an Email with Regex

Regular expressions are an important part of software development. They allow us to search text for specific patterns so that we can create more robust queries.

Summary

In this gist, I will break down how the regex below works. It matches a string only when the whole string is a single email address written in a simple, lowercase form.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
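
As a quick sketch of the regex in use, the snippet below assumes JavaScript (the gist does not name a language, but the /.../ delimiters fit it). The sample addresses are made up for illustration.

```javascript
// The regex from this gist, written as a JavaScript regular expression literal.
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical sample inputs, chosen only for illustration.
const samples = [
  "jane.doe@example.com",   // simple lowercase address
  "no-at-sign.example.com", // missing the @ separator
  "Jane@Example.com",       // uppercase letters are not in the bracket expressions
];

// test() returns true when the whole string matches the pattern.
const results = samples.map((s) => emailRegex.test(s));
console.log(results); // [ true, false, false ]
```

Note that the third sample fails only because of its capital letters. The pattern as written accepts lowercase input only.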

Regex Components

To start off, the regex is wrapped in forward slashes (/), which mark it as a regular expression literal in languages such as JavaScript.

Anchors

  • The ^ anchor says that the search target begins with the characters that follow.
  • The $ anchor says that the search target ends with the characters that precede it.
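
To see what the two anchors do, here is a small sketch (assuming JavaScript) that compares the gist's pattern with a copy that has ^ and $ removed. The unanchored copy and the sample sentence are my own additions for comparison, not part of the original regex.

```javascript
// The pattern from this gist, anchored at both ends.
const anchored = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// The same pattern with ^ and $ removed (a variant for comparison only).
const unanchored = /([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/;

// Hypothetical input: an email address surrounded by other text.
const text = "contact: jane@example.com today";

// With anchors, the entire string must be an email, so this fails.
const anchoredResult = anchored.test(text); // false

// Without anchors, the email is found inside the longer string.
const found = unanchored.exec(text);
const foundEmail = found ? found[0] : null; // "jane@example.com"

console.log(anchoredResult, foundEmail);
```

In other words, the anchors turn the pattern from a "find an email somewhere" search into an "is this whole string an email" check.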

Quantifiers

Our expression has 3 quantifiers:

  • The {2,6} after the last bracket expression specifies that the pattern must match that bracket expression between 2 and 6 times. In other words, the final section must have 2-6 characters.
  • The + signs after each of the first two bracket expressions indicate that the pattern denoted in those bracket expressions needs to be matched one or more times. This means that those sections of the searched string cannot be empty.
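
The quantifiers can be checked with a few edge cases. This is a minimal sketch assuming JavaScript, and the addresses are invented for the test.

```javascript
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

const checks = {
  // + requires at least one character before the @, so this fails.
  emptyLocalPart: emailRegex.test("@example.com"),         // false
  // {2,6} requires at least two characters after the last dot.
  oneCharEnding: emailRegex.test("jane@example.c"),        // false
  // Two characters is the minimum that {2,6} accepts.
  twoCharEnding: emailRegex.test("jane@example.io"),       // true
  // Seven characters is one more than {2,6} allows.
  sevenCharEnding: emailRegex.test("jane@example.network"), // false
};

console.log(checks);
```

The last case shows a practical limit of this pattern: real endings longer than six characters are rejected.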

Grouping Constructs

  • Following the opening ^ we have 3 subexpressions (capturing groups). Each one is wrapped in a pair of parentheses ().
  • The first two are separated by an @ symbol, and the last two are separated by an escaped . symbol (\.).
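
Because each subexpression is in parentheses, the text it matched can be read back out. The sketch below assumes JavaScript and its exec() method. The variable names (localPart, domain, ending) are labels I chose for illustration, not names used by the regex itself.

```javascript
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// exec() returns null on failure, or an array:
// index 0 is the whole match, indexes 1-3 are the three groups.
const match = emailRegex.exec("jane.doe@example.com");

let localPart = null;
let domain = null;
let ending = null;

if (match !== null) {
  // Skip index 0 (the whole match) and unpack the three groups.
  [, localPart, domain, ending] = match;
}

console.log(localPart); // jane.doe
console.log(domain);    // example
console.log(ending);    // com
```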

Bracket Expressions

Our expression has 3 bracket expressions:

  • [a-z0-9_\.-] This expression matches a single character that is a lowercase letter a to z, a digit 0 to 9, an underscore, a period, or a hyphen.
  • [\da-z\.-] This expression matches a single character that is a lowercase letter a to z, a digit 0 to 9, a period, or a hyphen.
  • [a-z\.] This expression matches a single character that is a lowercase letter a to z or a period.
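
The bracket expressions decide which characters are allowed in each section. Here is a short sketch, assuming JavaScript, with made-up addresses. The case-insensitive copy with the i flag is a variant I added for comparison and is not part of the gist's regex.

```javascript
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Underscore is listed in the first bracket expression, so this passes.
const underscoreBeforeAt = emailRegex.test("jane_doe@example.com"); // true

// Underscore is NOT listed in the second bracket expression.
const underscoreAfterAt = emailRegex.test("jane@exa_mple.com"); // false

// A plus sign is not listed in any of the bracket expressions.
const plusSign = emailRegex.test("jane+tag@example.com"); // false

// Uppercase letters are not listed either...
const upperCase = emailRegex.test("JANE@example.com"); // false

// ...unless the i (ignore case) flag is added. This is a variant, not the original.
const caseInsensitive = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/i;
const upperCaseWithFlag = caseInsensitive.test("JANE@example.com"); // true

console.log(underscoreBeforeAt, underscoreAfterAt, plusSign, upperCase, upperCaseWithFlag);
```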

Character Classes

We have one character class in our expression:

  • The class \d matches any single digit, which is the same as writing 0-9.
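
A tiny sketch of \d on its own and inside the full pattern, assuming JavaScript. The one-character test pattern and the sample address are hypothetical.

```javascript
// A throwaway pattern that accepts exactly one digit.
const digitOnly = /^\d$/;
const isDigit = digitOnly.test("7");  // true
const isLetter = digitOnly.test("a"); // false

// In the gist's regex, \d lets digits appear in the section after the @.
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;
const withDigit = emailRegex.exec("jane@mail2.example.com");
const domainGroup = withDigit ? withDigit[2] : null;

console.log(isDigit, isLetter, domainGroup); // true false mail2.example
```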

Character Escapes

We have one character escape in our expression:

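  • The escape \. tells the regex to match a literal period instead of treating . as the wildcard that matches any character except a newline (\n). Inside a bracket expression the . is already literal, so the backslash there is optional.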
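
The difference between . and \. is easiest to see on a very small pattern. This sketch assumes JavaScript. The two short patterns are examples of my own, separate from the email regex.

```javascript
// Unescaped: . matches any single character except a newline.
const wildcardDot = /^a.c$/;
// Escaped: \. matches only a literal period.
const literalDot = /^a\.c$/;

const wildcardMatchesLetter = wildcardDot.test("abc"); // true
const literalMatchesLetter = literalDot.test("abc");   // false
const literalMatchesPeriod = literalDot.test("a.c");   // true

// In the gist's regex, the \. between groups 2 and 3 demands a real period,
// so an address with no dot after the @ is rejected.
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;
const missingDot = emailRegex.test("jane@examplecom"); // false

console.log(wildcardMatchesLetter, literalMatchesLetter, literalMatchesPeriod, missingDot);
```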

Author

Hello, my name is Justus! I am a recent graduate of the University of Washington's Biology program, and I am looking for a way to combine technology and programming skills with medicine/biology.
