
@jsramraj
Forked from Abduler21/Tutorial.md
Created March 8, 2022 14:05
REGEX TUTORIAL


Regular Expression Tutorial

This tutorial is meant to be a foundational reference guide for anyone learning Regular Expressions. By the end of this tutorial you will know what a regular expression is, when to use one, what its main features do, and a few tips for improving your own regex writing. Throughout this tutorial we'll be referencing a specific regular expression, breaking down each component and learning what each part does.

Bonus: You can click here to navigate to an online regex editor so you can practice your regex scripting as you learn!

Summary

Regular expression for email: /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

The goal of each regex is to return a match for a continuous series of characters. The type of characters, the number of characters, and the order of characters can all be specified in the regex. The regex we'll be referencing in this tutorial searches for an email address such as "bob@gmail.com" and returns a match if the structure of that email meets the criteria of our regular expression. As written, it accepts only lowercase letters unless the case-insensitive flag is added.
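As a rough illustration of how this regex might be used from JavaScript, the sketch below runs it against a few inputs and reads back its three capture groups. The helper name `parseEmail` and the sample inputs are assumptions made for this example and are not part of the original regex.

```javascript
// The email regex from this tutorial, written as a JavaScript literal.
const emailRegex = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical helper: returns the captured parts, or null when there is no match.
function parseEmail(input) {
  const match = emailRegex.exec(input);
  if (match === null) return null;
  const [, user, domain, tld] = match;
  return { user, domain, tld };
}

console.log(parseEmail("bob@gmail.com")); // { user: 'bob', domain: 'gmail', tld: 'com' }
console.log(parseEmail("not an email")); // null
console.log(parseEmail("Bob@gmail.com")); // null, because no i flag is set
```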

Table of Contents

Regex Components



Anchors


  • Description:

    Anchors match a position rather than a character: they tie a pattern to the beginning or end of a string. With the multiline flag, they also apply at the beginning and end of each line, where lines are separated by line-break characters.

  • Syntax: The ^ "caret" references the beginning of a line and the $ "dollar sign" references the end of a line.

  • Example:

    I wanted to eat, so I ate a cheeseburger at McDonald's
    
  • Demo:

    Let's say we wanted to find a match for the characters located at the beginning and end of our example string. In regex, the . "period" matches any single character except a line break.

    Regex: ^.

    Match:

    I (the first character of the string)

    Explanation:

    The regex ^. is basically saying, "Find an instance of any character . located at the beginning of a string ^."


    Regex: .$

    Match:

    s (the last character of the string)

    Explanation:

    The regex .$ is basically saying, "Find an instance of any character . located at the end of a string $."
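The two anchor demos can be tried out with a short JavaScript sketch like the one below. The variable names are chosen only for this illustration.

```javascript
const sentence = "I wanted to eat, so I ate a cheeseburger at McDonald's";

// ^. matches the first character; .$ matches the last character.
const first = sentence.match(/^./)[0];
const last = sentence.match(/.$/)[0];
console.log(first, last); // I s

// Anchors can also pin a longer pattern to either end of the string.
const startsWithIWanted = /^I wanted/.test(sentence);
const endsWithCheeseburger = /cheeseburger$/.test(sentence);
console.log(startsWithIWanted, endsWithCheeseburger); // true false
```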



Quantifiers


  • Description:

    Quantifiers are used when you want to match a certain number of repetitions of the character or group that precedes them.


  • All RegEx Quantifiers:

    • * 0 or More
    • + 1 or More
    • ? 0 or One
    • {5} Exact Number
    • {5,6} Min and Max range of numbers

  • Example:

    I am very veryy veryyy veryyyy hungry
    

  • Demo:

    Each regex below is applied to the example string above with the global flag.

    Regex: very *

    Match:
    "very " once (including its trailing space), plus the "very" at the start of "veryy", "veryyy", and "veryyyy"

    Explanation:

    The regex very * is basically saying, "Find all instances of 'very' that are followed by 0 or more " " space characters."


    Regex: very +

    Match:
    Only the first "very " (including its trailing space)

    Explanation:

    The regex very + is basically saying, "Find all instances of 'very' that are followed by one or more " " space characters."


    Regex: very?

    Match:
    "very" four times, once in each of the four words

    Explanation:

    The regex very? is basically saying, "Find all instances of 'ver' that are followed by 0 or 1 "y" characters."


    Regex: very{3}

    Match:
    "veryyy" twice: the whole third word and the first six characters of "veryyyy"

    Explanation:

    The regex very{3} is basically saying, "Find all instances of 'ver' that are followed by exactly 3 "y" characters."


    Regex: very{2,4}

    Match:
    "veryy", "veryyy", and "veryyyy"

    Explanation:

    The regex very{2,4} is basically saying, "Find all instances of 'ver' that are followed by 2-4 "y" characters."
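To check the quantifier demos yourself, a sketch along these lines could be run in any JavaScript console. The helper name `findAll` is an assumption made for this example.

```javascript
const text = "I am very veryy veryyy veryyyy hungry";

// Hypothetical helper: returns every match of a global regex, or an empty array.
const findAll = (regex) => text.match(regex) || [];

const zeroOrMoreSpaces = findAll(/very */g); // ["very ", "very", "very", "very"]
const oneOrMoreSpaces = findAll(/very +/g); // ["very "]
const optionalY = findAll(/very?/g); // ["very", "very", "very", "very"]
const exactlyThreeY = findAll(/very{3}/g); // ["veryyy", "veryyy"]
const twoToFourY = findAll(/very{2,4}/g); // ["veryy", "veryyy", "veryyyy"]

console.log(zeroOrMoreSpaces, oneOrMoreSpaces, optionalY, exactlyThreeY, twoToFourY);
```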



OR Operator


The OR operator is used to find a match for one pattern or another. The OR operator is invoked with the `|` "pipe" character.

  • Example String:

    “I like chocolate icecream. I like vanilla icecream.”

If we wanted to match both sentences in the example above, we could do so with the following expression:

/I like (chocolate|vanilla) icecream\./g
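A small JavaScript sketch of the OR operator is shown below. It assumes the example string uses "I like" and the spelling "vanilla", and it escapes the final period so that it matches only a literal dot; the variable names are illustrative.

```javascript
const flavors = "I like chocolate icecream. I like vanilla icecream.";
const pattern = /I like (chocolate|vanilla) icecream\./g;

// Every full sentence that matches either alternative.
const sentences = flavors.match(pattern) || [];

// The alternative that was chosen inside the parentheses for each match.
const picked = [...flavors.matchAll(pattern)].map((m) => m[1]);

console.log(sentences.length); // 2
console.log(picked); // [ 'chocolate', 'vanilla' ]
```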



Character Classes


Character classes are used to match any one character from a specific set and are invoked by the `[]` brackets. You can also join multiple character sets together by simply adding the next set immediately after the previous set inside the same brackets.

For example, if we wanted to find all lower-case alpha characters, we could do so with the following expression:

/[a-z]/g

If we wanted to find lower-case alpha characters, upper-case alpha characters, and numeric characters, we could do so with the following expression:

/[a-zA-Z0-9]/g
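The two character classes above can be compared with a sketch like this one. The sample string is an assumption chosen for illustration.

```javascript
const sample = "Regex 101!";

// Lowercase letters only.
const lower = sample.match(/[a-z]/g);

// Lowercase letters, uppercase letters, and digits joined in one class.
const alphanumeric = sample.match(/[a-zA-Z0-9]/g);

console.log(lower.join("")); // egex
console.log(alphanumeric.join("")); // Regex101
```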



Flags


Flags in Regex are placed at the end of an expression and they define different criteria for the searching behavior.

- All RegEx Flags:
  • /g Global
    The Global flag returns all matches in the entire string instead of only returning the first match.

  • /i Case Insensitive
    The Case Insensitive flag returns matches regardless of upper or lowercase alpha characters.

  • /m Multiline
    The Multiline flag is used in conjunction with the ^ and $ anchors. By default, ^ and $ only match at the beginning and end of the entire string. When the /m flag is added, they also match at the beginning and end of every line.

  • /s Single Line
    The Single Line flag (also called "dotAll") lets the . "period" match line-break characters as well, so a pattern can span multiple lines.

  • /u Unicode
    The Unicode flag treats the pattern and the input as Unicode code points instead of 16-bit code units, and enables escapes such as \u{...} and \p{...}.

  • /y Sticky
    The Sticky flag returns a match only if it starts exactly at the regex's lastIndex position, rather than searching ahead for a later match.
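The sketch below shows one way each JavaScript flag could be observed in practice. The sample strings and variable names are assumptions made for this example, and the behavior shown is that of JavaScript's built-in RegExp.

```javascript
const lines = "Cat\ncat\nCAT";

// g and i: all matches, with or without case sensitivity.
const caseSensitive = lines.match(/cat/g); // ["cat"]
const anyCase = lines.match(/cat/gi); // ["Cat", "cat", "CAT"]

// m: ^ and $ apply to every line instead of only the whole string.
const wholeString = lines.match(/^cat$/gi); // null
const perLine = lines.match(/^cat$/gim); // ["Cat", "cat", "CAT"]

// s: the dot is allowed to match a line break.
const dotDefault = /Cat.cat/.test(lines); // false
const dotAll = /Cat.cat/s.test(lines); // true

// u: an emoji is one code point but two 16-bit code units.
const face = "\u{1F600}";
const asCodeUnits = /^.$/.test(face); // false
const asCodePoint = /^.$/u.test(face); // true

// y: the match must start exactly at lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4;
const atFour = sticky.test(lines); // true
sticky.lastIndex = 5;
const atFive = sticky.test(lines); // false

console.log(caseSensitive, anyCase, wholeString, perLine);
console.log(dotDefault, dotAll, asCodeUnits, asCodePoint, atFour, atFive);
```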



Grouping and Capturing


Grouping is useful if we want to find a specific character or phrase within another phrase we're searching for. Groups are invoked with the `()` parentheses.

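- Example String: Peter piper picked a patch of pickled peppers

For example, if we wrote /p(i|e|a)/g as our expression, we would match every lowercase "p" that is followed by "i", "e", or "a": "pi", "pe", "pi", "pa", "pi", "pe", and "pe". The capital "P" in "Peter" is skipped because the expression is case sensitive. The parentheses also capture the vowel that matched, so it can be read back separately from the full match.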
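As a minimal sketch of the difference between the full match and the captured group, the example below applies the same expression in JavaScript. The variable names are illustrative only.

```javascript
const rhyme = "Peter piper picked a patch of pickled peppers";
const pattern = /p(i|e|a)/g;

// Full matches: the "p" together with the vowel that follows it.
const pairs = rhyme.match(pattern);

// Captured group 1: only the vowel inside the parentheses.
const vowels = [...rhyme.matchAll(pattern)].map((m) => m[1]);

console.log(pairs.join(" ")); // pi pe pi pa pi pe pe
console.log(vowels.join("")); // ieiaiee
```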



Bracket Expressions


Bracket expressions are very similar to character classes in that they are invoked by the same `[]` brackets except they are primarily used for matching specific special characters.

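For example, the regex `[.[{()\\+*\]^$|?]` would match any single one of the characters . [ { ( ) \ + * ] ^ $ | ? wherever it appears. Inside the brackets most special characters lose their special meaning; only the backslash and the closing bracket need to be escaped here.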
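One possible use of a bracket expression like this is finding or escaping special characters in ordinary text. The sketch below assumes the sample strings shown, and it only handles the characters listed inside the brackets, so it should not be read as a complete escaping routine.

```javascript
const special = /[.[{()\\+*\]^$|?]/g;

// Collect every special character that appears in a piece of text.
const found = "price: $5.00 (approx.)".match(special);

// Prefix each special character with a backslash ($& is the matched text).
const escaped = "1+1=2?".replace(special, "\\$&");

console.log(found.join(" ")); // $ . ( . )
console.log(escaped); // 1\+1=2\?
```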



Greedy and Lazy Match


  • Description:

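    By default, quantifiers are greedy: they match as many characters as they can while still letting the rest of the expression succeed. Adding a ? after a quantifier makes it lazy, so it matches as few characters as possible.

  • Syntax: *? +? ?? {2,4}? are the lazy forms of * + ? {2,4}

  • Example:

    I am very veryy veryyy veryyyy hungry
    
  • Demo:

    The greedy regex very+ matches "very", "veryy", "veryyy", and "veryyyy", taking every available "y". The lazy regex very+? matches only "very" in each of the four words, stopping after the first "y".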
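Greedy quantifiers take as much text as they can, while lazy quantifiers (a quantifier followed by ?) take as little as they can. A minimal JavaScript sketch of the difference, using sample strings chosen for illustration:

```javascript
const text = "I am very veryy veryyy veryyyy hungry";

// Greedy: y+ consumes every available "y".
const greedy = text.match(/very+/g); // ["very", "veryy", "veryyy", "veryyyy"]

// Lazy: y+? stops after the first "y".
const lazy = text.match(/very+?/g); // ["very", "very", "very", "very"]

const quoted = 'say "yes" or "no"';

// Greedy .+ runs to the last quote; lazy .+? stops at the nearest one.
const greedyQuote = quoted.match(/".+"/)[0]; // "yes" or "no"
const lazyQuotes = quoted.match(/".+?"/g); // ['"yes"', '"no"']

console.log(greedy, lazy, greedyQuote, lazyQuotes);
```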



Boundaries


  • Description:

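    Boundaries or "word boundaries" are used when we want to match one or more characters of a word but only if they are located at the beginning or the end of the word. A word boundary is the position between a word character (a letter, digit, or underscore) and a non-word character or the edge of the string.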

  • Syntax: \b references a word boundary and \B references a non-word boundary.

  • Example:

    I wanted to eat, so I ate a cheeseburger at McDonald’s
    
  • Demo:

    Let's say we wanted to find a match for "at" inside of our example string. Depending on where we place the boundary in our expression, we can match different instances of the string we're searching.


    Regex: at\b

    Match:
    The "at" at the end of "eat" and the standalone word "at"

    Explanation:

    The regex at\b is basically saying, "Find all instances of 'at' that are followed by a word boundary."


    Regex: at\B

    Match:
    The "at" at the start of "ate"

    Explanation:

    The regex at\B is basically saying, "Find all instances of 'at' that are NOT followed by a word boundary."


    Regex: \bat

    Match:
    The "at" at the start of "ate" and the standalone word "at"

    Explanation:

    The regex \bat is basically saying, "Find all instances of 'at' that are preceded by a word boundary."


    Regex: \Bat

    Match:
    The "at" at the end of "eat"

    Explanation:

    The regex \Bat is basically saying, "Find all instances of 'at' that are NOT preceded by a word boundary."
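To see which "at" each boundary expression picks out, a sketch like the following reports the position of every match in the example string. The helper name `positionsOf` is an assumption made for this example; positions are zero-based string indexes.

```javascript
const sentence = "I wanted to eat, so I ate a cheeseburger at McDonald's";

// Hypothetical helper: the start index of every match of a global regex.
const positionsOf = (regex) => [...sentence.matchAll(regex)].map((m) => m.index);

const followedByBoundary = positionsOf(/at\b/g); // [13, 41]: "eat" and "at"
const notFollowedByBoundary = positionsOf(/at\B/g); // [22]: "ate"
const precededByBoundary = positionsOf(/\bat/g); // [22, 41]: "ate" and "at"
const notPrecededByBoundary = positionsOf(/\Bat/g); // [13]: "eat"

console.log(followedByBoundary, notFollowedByBoundary, precededByBoundary, notPrecededByBoundary);
```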



Back-references


  • Description:

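    Back-references match the same text that an earlier capturing group matched, which makes them useful for finding repeated content inside a single string. Back-references are invoked with \1 for the first group, \2 for the second, and so on.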

  • Syntax: (criteria-1)\1

  • Example:

    We the People of the the United States, in Order to form a more more perfect Union, establish Justice, insure domestic Tranquility, provide for the the common defence, promote the general Welfare, and and secure the Blessings of Liberty to ourselves and our Posterity, do do ordain and establish this Constitution for the United States of of America.
    
  • Demo:

    Let's say we wanted to find all instances of repeated words in our example. We can use back-referencing to do this.

    Regex: \b(\w+)\s\1\b

    Match:

    "the the" (twice), "more more", "and and", "do do", and "of of"

    Explanation:

    Our regex is basically saying, "Find all instances of a word \w of any length + followed by a space \s that repeats \1. These repeating words must also be inside a word boundary \b."
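The repeated-word regex can be used both to find duplicates and to remove them. The sketch below works on a shortened version of the example text; the shortened string and the variable names are assumptions made for this illustration.

```javascript
const preamble = "We the People of the the United States, in Order to form a more more perfect Union";
const repeatedWord = /\b(\w+)\s\1\b/g;

// Every doubled word, such as "the the".
const repeated = preamble.match(repeatedWord);

// Replace each doubled word with a single copy ($1 is the captured word).
const cleaned = preamble.replace(repeatedWord, "$1");

console.log(repeated); // [ 'the the', 'more more' ]
console.log(cleaned);
```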



Look-ahead and Look-behind


  • Description:

    Look-ahead and Look-behind, collectively called “lookaround”, check whether one set of criteria appears directly "ahead" of or "behind" another. The lookaround itself is only a condition: the text it checks is not included in the returned match.


  • Look-around example:

    https://www.google.com
    http://www.google.com
    https://www.facebook.com
    http://www.facebook.com
    

Positive Look-behind


  • Description:

    Positive Look-behind searches for 2 criteria in sequence and returns a match for the second criteria, but only if the first criteria is "behind" it.

  • Syntax: (?<=(criteria-1))(criteria-2)

  • Demo:

    Let's say we wanted to write a regex using Positive Look-behind that returns all matches for "google.com" so long as it has "https://www." behind it. We could do this with the following:

    Regex: (?<=(https:\/\/www\.))(google\.com)

    Match:

    The "google.com" on line 1

    Explanation:

    This expression is basically saying, "Return and match 'google.com' but only if 'https://www.' is behind it." Notice the "google.com" on line 2 is not matched because the string behind it, "http://www.", doesn't match criteria-1 in our regex. The periods are escaped as \. so that they match only a literal period rather than any character.


Negative Look-behind


  • Description:

    Negative Look-behind searches the exact same way as Positive Look-behind except it matches and returns the inverse.

  • Syntax: (?<!(criteria-1))(criteria-2)

  • Demo:

    If we use the same example as above except we change our regex to have a Negative Look-behind syntax, (?<!(https:\/\/www\.))(google\.com), it will match and return all instances of "google.com" that do NOT have "https://www." behind it.

    Regex: (?<!(https:\/\/www\.))(google\.com)

    Match:

    The "google.com" on line 2

    Explanation:

    This expression will only return the "google.com" on line 2 because that's the only instance of "google.com" in our example where "https://www." is not behind it.
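Both look-behind variants can be checked line by line with a sketch like the one below. It assumes a JavaScript engine that supports look-behind (current versions of Node.js and the major browsers do), drops the optional capture groups, and escapes the periods so they match only a literal dot.

```javascript
const urls = [
  "https://www.google.com",
  "http://www.google.com",
  "https://www.facebook.com",
  "http://www.facebook.com",
];

// No g flag here, so test() keeps no state between lines.
const positiveLookBehind = /(?<=https:\/\/www\.)google\.com/;
const negativeLookBehind = /(?<!https:\/\/www\.)google\.com/;

const afterHttps = urls.filter((url) => positiveLookBehind.test(url));
const notAfterHttps = urls.filter((url) => negativeLookBehind.test(url));

console.log(afterHttps); // [ 'https://www.google.com' ]
console.log(notAfterHttps); // [ 'http://www.google.com' ]

// The text checked by the look-behind is not part of the match itself.
const matchedText = urls[0].match(positiveLookBehind)[0];
console.log(matchedText); // google.com
```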


Positive Look-ahead


  • Description:

    Positive Look-ahead searches for 2 criteria in sequence and returns a match for the first criteria but only if the second criteria is ahead of it.

  • Syntax: (criteria-1)(?=(criteria-2))

  • Demo:

    Using our example, let's say we want to return all instances of "https://www." but only if "google.com" is ahead of it. We could do this with the following:

    Regex: (https:\/\/www\.)(?=(google\.com))

    Match:

    The "https://www." on line 1

    Explanation:

    This expression will only return the "https://www." on line 1 because that's the only instance in our example where "google.com" is ahead of it.


Negative Look-ahead


  • Description:

    Just like how Negative Look-behind searches the inverse of Positive Look-behind, Negative Look-ahead searches the exact same way as Positive Look-ahead but matches and returns the inverse. Negative Look-ahead searches for 2 criteria in sequence and returns a match for the first criteria EXCEPT if the second criteria is ahead of it.

  • Syntax: (criteria-1)(?!(criteria-2))

  • Demo:

    Regex: (https:\/\/www\.)(?!(google\.com))

    Match:

    The "https://www." on line 3

    Explanation:

    This expression will only return the "https://www." on line 3 because that is the only instance of "https://www." in our example where "google.com" is NOT ahead of it.
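The two look-ahead variants can be compared in the same way. In this sketch the optional capture groups are dropped and the periods are escaped so they match only a literal dot; the variable names are illustrative.

```javascript
const links = [
  "https://www.google.com",
  "http://www.google.com",
  "https://www.facebook.com",
  "http://www.facebook.com",
];

// No g flag here, so test() keeps no state between lines.
const positiveLookAhead = /https:\/\/www\.(?=google\.com)/;
const negativeLookAhead = /https:\/\/www\.(?!google\.com)/;

const beforeGoogle = links.filter((link) => positiveLookAhead.test(link));
const notBeforeGoogle = links.filter((link) => negativeLookAhead.test(link));

console.log(beforeGoogle); // [ 'https://www.google.com' ]
console.log(notBeforeGoogle); // [ 'https://www.facebook.com' ]

// Only the part outside the look-ahead is returned as the match.
const prefix = links[0].match(positiveLookAhead)[0];
console.log(prefix); // https://www.
```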



About the Author


Abdulmelik Ersoy is a coder and web developer. He started his journey in the world of coding with the Full-Stack Web Development Bootcamp at Rutgers University. He looks forward to learning more about all aspects of web development, sharpening his coding skills, and meeting more awesome coders who are just as excited about coding as he is!
