@cgsdesign
Last active January 11, 2021 06:20
Regex email breakdown

Regex Email Formula Breakdown

Summary

Regular expressions (regex) are an immensely helpful tool for matching and validating sequences of characters. At their base, they define a search pattern, and they are supported in most programming languages. Ensuring user inputs are provided in the correct format, preventing incorrect data from undermining database integrity, and searching a document are just a few of the valuable uses for regular expressions.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

One of the most common and useful regular expressions is the one above, which describes the format of an email address.
Like all regex, this expression can be broken down into parts. This gist will break down the code above.
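
As a rough illustration, the pattern can be tried out in Python. This is only a sketch: the pattern in this gist is written as a JavaScript-style literal, so the surrounding / delimiters are dropped here, and the names EMAIL_PATTERN and is_email are made up for the example.

```python
import re

# The email pattern from this gist, without the JavaScript-style / delimiters.
EMAIL_PATTERN = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")


def is_email(text: str) -> bool:
    """Return True if the string matches the email pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(text) is not None


print(is_email("amber@example.com"))                    # True
print(is_email("Amber's email is amber@example.com"))   # False
print(is_email("@gmail.com"))                           # False
```

Note that the pattern only accepts lower case letters, so an address typed with capitals would need to be lowercased first.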

Table of Contents

Regex Components


Anchors

Anchors are the characters ^ and $. They mark the beginning and end of a string. Take a look at them in the email code shown.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

The ^ character tells the regex engine that the pattern must start at the beginning of the string. The string cannot start with "Amber's email is....". It must contain only the email. Likewise, the $ character tells the engine that the string must end where the email ends. Thus the string entered cannot end with "...is her email."

Additional Ex.

  1. ^love$ would match exactly love. "love me", "lovin'" or "in love" would not match because the string must be exactly love from start to end.
  2. ^Eat would match any string that starts with Eat (capital E), e.g. "Eat me!"
  3. otion$ would match any string that ends with otion, e.g. "motion" or "potion"
  4. Only a pattern with no anchors can match in the middle of a body of text, e.g. /love/ allows the love in "I love you" to be a match.
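
The anchor examples above can be checked with a small Python sketch. The helper name found is made up for this example; it uses re.search, which looks for the pattern anywhere in the string, so any restriction on position comes only from the anchors.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None


print(found(r"^love$", "love"))       # True
print(found(r"^love$", "in love"))    # False
print(found(r"^love$", "love me"))    # False
print(found(r"^Eat", "Eat me!"))      # True
print(found(r"otion$", "potion"))     # True
print(found(r"love", "I love you"))   # True
```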

Quantifiers

Quantifiers are characters that specify how many of a character or group of characters must belong in a match. Take a look at them in the email code shown.

([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})

Above, the + indicates that the preceding bracket expression must match one or more times. Thus an email of @gmail.com is not valid because [a-z0-9_\.-] does not match even once.
The {2,6} indicates that the last group of characters must be between two and six characters in length. Thus cat@io.com is valid, but cat@io.reallylongtext does not match the pattern.

Additional Ex. * + ? and {}

  1. (goal!){3,4}$ matches three or four repetitions of goal!: goal!goal!goal! but not goal!goal! or goal!
  2. goal(!!)*$ matches goal plus zero or more repetitions of !!: goal , goal!! , or goal!!!!
  3. goal(!!)+$ matches goal plus one or more !!: goal!! goal!!!! but not goal
  4. goal(!!)?$ matches goal plus zero or one !!: goal, goal!! but not goal!!!!
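
A quick way to see the quantifiers at work is to run a few strings through each pattern. The sketch below uses Python's re module; the helper name found is made up for the example.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches somewhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None


# {3,4}: the group must repeat three or four times before the end.
print(found(r"(goal!){3,4}$", "goal!goal!goal!"))  # True
print(found(r"(goal!){3,4}$", "goal!"))            # False

# *: zero or more repetitions of !!
print(found(r"goal(!!)*$", "goal"))                # True
print(found(r"goal(!!)*$", "goal!!!"))             # False (odd number of !)

# +: one or more repetitions of !!
print(found(r"goal(!!)+$", "goal!!!!"))            # True
print(found(r"goal(!!)+$", "goal"))                # False

# ?: zero or one repetition of !!
print(found(r"goal(!!)?$", "goal!!"))              # True
print(found(r"goal(!!)?$", "goal!!!!"))            # False
```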


Character Classes

Character Classes or Character Sets are prebuilt sets of characters that fit a match, such as \d, \w, \s and the dot (.).

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

\d above designates that all digits 0-9 are viable characters.

Additional Ex.

  1. \d includes all digits [0-9]
  2. \w includes all letters, digits, and the underscore [a-zA-Z0-9_]
  3. \s includes all whitespace characters [\r\n\t\f\v ]
  4. . includes everything but line terminators
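
To see what each class picks out, here is a small Python sketch that lists the matches in a sample string (the sample text is made up). One caveat: in Python 3, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is passed, so the bracket equivalents in the list above describe the ASCII case.

```python
import re

# Hypothetical sample text containing letters, digits, an underscore and a tab.
sample = "Room 42_b\tis free."

print(re.findall(r"\d", sample))   # ['4', '2']
print(re.findall(r"\s", sample))   # [' ', '\t', ' ']
print(re.findall(r"\w+", sample))  # ['Room', '42_b', 'is', 'free']

# The dot matches any character except a line terminator.
print(re.findall(r".", "a\nb"))    # ['a', 'b']
```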

Grouping and Capturing

Capture Groups are delineated by (). They are basically units of one or more characters with defined characteristics that work as a set.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

In the case of the email, we use capture groups to specify the qualities of the text before the @, the text after the @, and the text after the final period. Each of these groups has its own requirements that the characters within it must match. They usually contain bracket expressions and quantifiers.
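
Capture groups also let the matched pieces be read back individually. The Python sketch below pulls the three groups out of an address; the function name split_email and the sample address are made up for the example.

```python
import re
from typing import Optional, Tuple

# The email pattern from this gist, without the JavaScript-style / delimiters.
EMAIL_PATTERN = re.compile(r"^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$")


def split_email(text: str) -> Optional[Tuple[str, str, str]]:
    """Return the three captured groups, or None if text does not match."""
    match = EMAIL_PATTERN.match(text)
    if match is None:
        return None
    # group(1): text before the @, group(2): text after the @,
    # group(3): text after the final period.
    return match.group(1), match.group(2), match.group(3)


print(split_email("amber@example.com"))  # ('amber', 'example', 'com')
print(split_email("not an email"))       # None
```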

Bracket Expressions

Brackets are used to delineate what qualifying characters fit or don't fit as a match. They are tied closely to Character Classes but allow for more specificity.

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

In the above, the brackets delineate what characters are viable. In the first brackets, all lower case letters a-z, numbers 0-9, and special characters _.- are viable characters. In the second case, all lower case letters a-z, numbers 0-9, and special characters .- are viable. In the last case, only lower case letters a-z and the period are viable.

Note: adding ^ at the start of the brackets, like [^0-9], inverts what is allowed. So in the example shown, anything EXCEPT 0-9 is viable.
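
Both the plain and the inverted form can be tried in a short Python sketch. The helper name strip_non_digits and the sample strings are made up for the example. Inside brackets a period is already literal, so [a-z0-9_.-] and [a-z0-9_\.-] accept the same characters.

```python
import re

# Runs of characters allowed by the first bracket expression of the email pattern.
# Capital letters and @ are not in the set, so they split the runs.
print(re.findall(r"[a-z0-9_.-]+", "Amber_99@Example.com"))  # ['mber_99', 'xample.com']


def strip_non_digits(text: str) -> str:
    """Remove every character matched by the inverted set [^0-9]."""
    return re.sub(r"[^0-9]", "", text)


print(strip_non_digits("tel: 555-0199"))  # 5550199
```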

Author

I am a front end developer with full stack experience. I enjoy coding, design, and project management. To contact me or see my work, check out my github @ https://github.com/cgsdesign/
