@BradsW90
Last active August 21, 2022 18:29

REGEX Tutorial

This tutorial is broken down into sections to simplify and solidify one's understanding of how to use regular expressions efficiently.

Summary

The example used in this tutorial is an email validator. The regex we will be looking at is: /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
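As a minimal sketch, this is how the expression might be applied in JavaScript. The helper name isValidEmail and the sample addresses are hypothetical and only serve as illustration.

```javascript
// The expression from the summary, stored as a regex literal.
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical helper: returns true when the whole string matches.
function isValidEmail(input) {
  return emailRegex.test(input);
}

console.log(isValidEmail("jane_doe@example.com")); // true
console.log(isValidEmail("jane_doe@example"));     // false, there is no ".xx" ending
console.log(isValidEmail("Jane@Example.com"));     // false, uppercase is not in the sets
```

Note that the pattern only accepts lowercase letters as written, which is why the last address is rejected.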

Regex Components

The expression in the summary is built from the following components, each of which is covered in more detail below.

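  • / REGEX EXPRESSION / (the delimiters of a regex literal)

  • ^ string condition here $ (anchors)

  • ( Group )

  • [ Character Set ]

  • \ (escape character)

  • + {Range} (quantifiers)

  • \d (digit character class)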

Anchors

In the expression in the summary, the whole pattern is wrapped in ^ and $. Together these anchors require the entire string, from its first character to its last, to match the pattern in between. A string that matches returns true (or becomes highlighted in a regex tester). The anchors can also be used separately. When ^ is used by itself, the pattern that follows it must match at the beginning of the string. Likewise, when $ is used by itself at the end of an expression, the pattern preceding it must match at the end of the string.

There are 2 other anchors not covered in the above expression: \b and \B. \b matches at a word boundary, so a\b finds an a at the end of a word and \ba finds an a at the beginning of a word. \B is the opposite and matches anywhere that is not a word boundary, so a\B finds an a that is not at the end of a word.
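The anchors can be tried out with a short JavaScript sketch. The sample strings and variable names are made up for illustration.

```javascript
// ^ requires the pattern to match at the start of the string.
const startsWithCat = /^cat/.test("catalog");  // true
const bobcatStart = /^cat/.test("bobcat");     // false

// $ requires the pattern to match at the end of the string.
const endsWithCat = /cat$/.test("bobcat");     // true

// ^ and $ together require the whole string to match.
const exactlyCat = /^cat$/.test("cat");        // true
const catsIsNotCat = /^cat$/.test("cats");     // false

// Uppercasing each matched "a" shows where \b and \B apply.
const sample = "data area";
const endOfWord = sample.replace(/a\b/g, "A");   // "datA areA"
const startOfWord = sample.replace(/\ba/g, "A"); // "data Area"
const notAtEnd = sample.replace(/a\B/g, "A");    // "dAta Area"

console.log(endOfWord, "|", startOfWord, "|", notAtEnd);
```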

Quantifiers

In the expression in the summary you will find + used several times and {} used once. Quantifiers set how many times the preceding character, set, or group must repeat. + means that one or more of the preceding item must match. {} sets a custom count: {2,6} means the preceding item must repeat 2 to 6 times, {3} means exactly three times, and {4,} means 4 or more times.

There are 2 other quantifiers not used in the above example: * and ?. For example, a* matches 0 or more a's. The ? is similar, but instead of 0 or more it matches 0 or 1 of the preceding item.
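To compare the quantifiers side by side, here is a small JavaScript sketch. Each pattern is wrapped in ^ and $ so the count applies to the whole string. The object quantifierExamples and the helper matchingQuantifiers are hypothetical names used only here.

```javascript
// Each quantifier applied to the letter a, anchored to the whole string.
const quantifierExamples = {
  "a+": /^a+$/,         // one or more
  "a*": /^a*$/,         // zero or more
  "a?": /^a?$/,         // zero or one
  "a{3}": /^a{3}$/,     // exactly three
  "a{2,6}": /^a{2,6}$/, // two to six
  "a{4,}": /^a{4,}$/,   // four or more
};

// Hypothetical helper: lists the quantifiers that accept the input.
function matchingQuantifiers(input) {
  return Object.keys(quantifierExamples).filter((name) =>
    quantifierExamples[name].test(input)
  );
}

console.log(matchingQuantifiers(""));        // [ 'a*', 'a?' ]
console.log(matchingQuantifiers("a"));       // [ 'a+', 'a*', 'a?' ]
console.log(matchingQuantifiers("aaa"));     // [ 'a+', 'a*', 'a{3}', 'a{2,6}' ]
console.log(matchingQuantifiers("aaaaaaa")); // [ 'a+', 'a*', 'a{4,}' ]
```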

OR Operator

The | symbol acts as the OR operator. Let's review the following example: ab(c|d|e). This expression matches ab followed by c, d, or e, so any string containing abc, abd, or abe will be highlighted.
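A quick JavaScript sketch of the OR example, using made-up test strings:

```javascript
const abThenCDE = /ab(c|d|e)/;

const matchesAbc = abThenCDE.test("abc");        // true
const matchesInside = abThenCDE.test("xxabdxx"); // true, no anchors so it can match mid-string
const matchesAbf = abThenCDE.test("abf");        // false, f is not one of the alternatives

// The group records which alternative was chosen.
const chosen = "abe".match(abThenCDE)[1];        // "e"

console.log(matchesAbc, matchesInside, matchesAbf, chosen);
```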

Character Classes

Using [] creates a character set, which can be used in many different ways, but first we will refer back to the example expression. That expression has 3 character sets: [a-z0-9_\.-], [\da-z\.-], and [a-z\.]. A character set matches any single character listed inside it. Let's break down the first one: it matches any letter a-z, any digit 0-9, an underscore _, a period ., or a hyphen -.

Another character class used in the above example is \d, which has the same effect as [0-9]. \D is the opposite and matches any character that is not a digit. Other variations are \w, \W, \s, and \S. The lowercase form includes and the uppercase form excludes: \w matches word characters (letters, digits, and underscore) and \s matches whitespace.

There is another variation of [] that was not used in the main example: [^123]. A ^ at the start of a set negates it, so this matches any single character that is not 1, 2, or 3.

The last character class not covered in the above example is the period (.). It matches any character except line breaks.
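The character classes can be compared on one sample string. This is only a sketch, and the string "a1 b2_c3" and the variable names are invented for the example.

```javascript
const sample = "a1 b2_c3";

const digits = sample.match(/\d/g);                 // [ '1', '2', '3' ]
const nonDigits = sample.match(/\D/g).join("");     // "a b_c"
const words = sample.match(/\w+/g);                 // [ 'a1', 'b2_c3' ]
const withoutSpaces = sample.replace(/\s/g, "");    // "a1b2_c3"
const not123 = sample.match(/[^123]/g).join("");    // "a b_c"

// The period matches any character except a line break.
const dotMatchesLetter = /a.c/.test("abc");         // true
const dotMatchesNewline = /a.c/.test("a\nc");       // false

console.log(digits, nonDigits, words, withoutSpaces, not123);
```

Notice that \w+ keeps b2_c3 together because the underscore counts as a word character.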

Flags

There are several flags that can be used in regular expressions; none were used in the main example. Flags are written after the closing /, for example /abc/gi.

  • i This flag makes the search case-insensitive.

  • g This flag makes the search return all matches instead of stopping at the first one.

  • m This flag enables multiline mode: ^ and $ match at the start and end of each line, not only at the start and end of the whole string.

  • s This flag enables dotall mode, in which . also matches line break characters.

  • u This flag enables Unicode support.

  • y This flag enables sticky mode: the match must start exactly at the index stored in the regex's lastIndex property.
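The flags can be sketched one by one in JavaScript. All sample strings and variable names are assumptions made for this illustration.

```javascript
// i: case-insensitive
const ignoreCase = /hello/i.test("HELLO");            // true

// g: all matches instead of only the first
const firstDigit = "a1b2c3".match(/\d/)[0];           // "1"
const allDigits = "a1b2c3".match(/\d/g);              // [ '1', '2', '3' ]

// m: ^ and $ also match at the start and end of each line
const withoutM = "one\ntwo".match(/^\w+/g);           // [ 'one' ]
const withM = "one\ntwo".match(/^\w+/gm);             // [ 'one', 'two' ]

// s: the period also matches line breaks
const dotAll = /a.c/s.test("a\nc");                   // true

// u: treats an emoji (two UTF-16 code units) as one character
const emoji = "\u{1F600}";
const withoutU = /^.$/.test(emoji);                   // false
const withU = /^.$/u.test(emoji);                     // true

// y: the match must start exactly at lastIndex
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyAt3 = sticky.test("barfoo");              // true, "foo" starts at index 3
sticky.lastIndex = 0;
const stickyAt0 = sticky.test("barfoo");              // false, index 0 holds "bar"

console.log(ignoreCase, allDigits, withM, dotAll, withU, stickyAt3, stickyAt0);
```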

Grouping and Capturing

The main example has 3 capture groups, which are the parts in between (). Let's review the example: /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/. Here ([a-z0-9_\.-]+) is unnamed group 1, ([\da-z\.-]+) is unnamed group 2, and ([a-z\.]{2,6}) is unnamed group 3. Writing them this way captures (creates a reference to) whatever text matched in between the (). If you ever need to refer back to an unnamed capture group inside the expression, use \ followed by the corresponding group number, for example \1.

Another way to make a capture group is to name it: (?<userEmail>[a-z0-9_\.-]+). In this expression group one is named userEmail, and to reference that group we use \k<userEmail>. In JavaScript the name must be a valid identifier, so a hyphenated name such as user-email is not allowed.

There is also a way to group characters together without creating a capture group: (?:dog). This expression groups d, o, and g together without assigning them a group number or name.
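The following JavaScript sketch shows how the groups can be read back after a match. The sample address and the group names userEmail and word are assumptions for illustration. JavaScript requires group names to be valid identifiers, so the sketch avoids hyphens in names.

```javascript
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;
const match = "jane_doe@example.com".match(emailRegex);

// Index 0 holds the whole match; 1 to 3 hold the unnamed groups.
const user = match[1];    // "jane_doe"
const domain = match[2];  // "example"
const tld = match[3];     // "com"

// A named group is read from the groups property of the match.
const namedRegex = /^(?<userEmail>[a-z0-9_\.-]+)@/;
const namedUser = "jane_doe@example.com".match(namedRegex).groups.userEmail; // "jane_doe"

// Backreferences: \1 and \k<word> must repeat the text the group captured.
const repeatedByNumber = /(\w+) \1/.test("hello hello");            // true
const notRepeated = /(\w+) \1/.test("hello world");                 // false
const repeatedByName = /(?<word>\w+) \k<word>/.test("hello hello"); // true

// A non-capturing group adds no entry to the match array.
const nonCapturing = "dogdog".match(/(?:dog)+/);
const nonCapturingLength = nonCapturing.length;                      // 1, only the whole match

console.log(user, domain, tld, namedUser, nonCapturing[0]);
```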

Look-ahead and Look-behind

Look-ahead and look-behind each come in 2 forms, positive and negative. A look-ahead checks what comes immediately after the current position, and a look-behind checks what comes immediately before it. In both cases the checked text is not included in the match. Positive means the match succeeds only if the specified pattern is there. Negative is the opposite: the match succeeds only if the specified pattern is not there.

Example of look-ahead: Let's say your example string is 'house'. If you use h(?=o), it selects the h because it is followed by o. On the other hand, h(?!o) will not match your string, because it is looking for an h that is not immediately followed by o.

Example of look-behind: We are going to use the same example string 'house'. If you use (?<=s)e, it selects the e because s comes right before it. However, using (?<!s)e will not match the house string, because it is looking for an e that does not have an s right before it.
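The 'house' examples can be checked with a short JavaScript sketch. The extra string "hat" and the variable names are added here only for contrast.

```javascript
const word = "house";

// Look-ahead: checks what follows the h without including it in the match.
const positiveAhead = word.match(/h(?=o)/)[0];  // "h", the o is checked but not captured
const negativeAhead = /h(?!o)/.test(word);      // false, the h is followed by o
const negativeAheadHat = /h(?!o)/.test("hat");  // true, the h is followed by a

// Look-behind: checks what precedes the e without including it in the match.
const positiveBehind = word.match(/(?<=s)e/)[0]; // "e"
const negativeBehind = /(?<!s)e/.test(word);     // false, the only e follows an s

console.log(positiveAhead, negativeAhead, negativeAheadHat, positiveBehind, negativeBehind);
```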

Author

Bradley Woodle (Full-Stack Developer)

GitHub:Bradsw90
