@NuclearReid
Last active March 27, 2024 01:58
NuclearReid's Regex Tutorial

Reid's Regex Tutorial

Does this string mean anything to you? /^(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,16}$/ Before I started learning about Regular Expressions, it just looked like a bunch of gibberish to me too!

This is a short tutorial that will hopefully demystify what each part of that expression means. By the time you're done reading it, you should be able to look at most regexes and work out what pattern they're looking for.

Note: to practice, press Ctrl+F (Windows) or Command+F (Mac) in a code editor whose find box has a .* button (VS Code has one) and click it to turn on regex search. You can then try out patterns and see what they're targeting.

Summary

Here's a quick summary of each section I'll be going into:

  • Anchors: Learn how to use anchors like ^ and $ to match the start and end of a line.
  • Quantifiers: Understand how quantifiers like *, +, and ? can specify the number of repetitions of a character or group.
  • Grouping Constructs: Explore how to use parentheses () for creating groups and capturing matches.
  • Bracket Expressions: Discover the power of bracket expressions like [abc] to match any single character within the specified set.
  • Character Classes: Learn about predefined character classes like \d, \w, and \s for matching digits, word characters, and whitespace.
  • The OR Operator: Utilize the | symbol to create alternatives for matching different patterns.
  • Flags: Explore flags like i for case-insensitive matching and g for global matching.
  • Character Escapes: Understand how to escape special characters like . or * using the backslash \.

Occasionally, I will reference and break down this regex, which can be used for setting up rules for a password: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,16}$/

(a full explanation of the password regex is at the bottom)

Regex Components

Anchors

Anchors don't match characters themselves. They match positions, and they can be used before, after, or between characters.

They tell the expression where to look for the pattern. For example, ^ says the pattern must begin at the start of the string/input, and $ says it must finish at the end of the string/input.

Looking at the password regex above, the ^ and $ are telling the regex that the whole string, from start to end, has to follow the pattern between those two anchors.

They can be found under 'Assertions' on MDN
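To see the anchors in action, here is a small sketch. The patterns and sample strings are made up for illustration and are not part of the password regex.

```javascript
// Hypothetical patterns and inputs, chosen only for illustration.
const startsWithCat = /^cat/;  // "cat" must sit at the very start
const endsWithCat = /cat$/;    // "cat" must sit at the very end
const exactlyCat = /^cat$/;    // nothing is allowed before or after "cat"

const anchorResults = {
  startCatalog: startsWithCat.test("catalog"), // true: starts with "cat"
  startBobcat: startsWithCat.test("bobcat"),   // false: "cat" is not at the start
  endBobcat: endsWithCat.test("bobcat"),       // true: ends with "cat"
  exactCat: exactlyCat.test("cat"),            // true: the whole input is "cat"
  exactBobcat: exactlyCat.test("bobcat"),      // false: extra text before "cat"
};

console.log(anchorResults);
```

Using both anchors together, as the password regex does, is what turns "contains this pattern" into "is entirely this pattern".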

Quantifiers

Quantifiers are pretty basic: they tell the expression how many times the thing right before them (a character, a bracket expression, or a group) should repeat. One form is written with {} and can have a min & max, e.g. {8,16} (no space after the comma, or JavaScript won't read it as a quantifier). The min is 8 and the max is 16. This is used at the end of the password regex above, where .{8,16} means "any character, 8 to 16 times". Because that regex is anchored with ^ and $, it ends up working as a length check. Other ways {} can be used:

- {x} matches exactly x repetitions
- {x,} matches at least x repetitions
- {x,y} matches at least x repetitions but no more than y

But wait! There's more! There are also shorthand quantifiers. For example...

- * matches 0 or more repetitions
- + matches 1 or more repetitions
- ? matches zero or one repetition, which makes the preceding item optional
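Here is a quick sketch of each quantifier. The patterns and inputs are hypothetical examples; each one is anchored with ^ and $ so the count applies to the whole string.

```javascript
// Hypothetical patterns; each quantifier applies to the token right before it.
const exactlyThree = /^a{3}$/;  // "a" exactly 3 times
const atLeastTwo = /^a{2,}$/;   // "a" 2 or more times
const twoToFour = /^a{2,4}$/;   // "a" between 2 and 4 times
const zeroOrMore = /^ab*$/;     // "a", then any number of "b" (including none)
const oneOrMore = /^ab+$/;      // "a", then at least one "b"
const optionalU = /^colou?r$/;  // the "u" may appear once or not at all

const quantifierResults = {
  three: exactlyThree.test("aaa"),        // true
  four: exactlyThree.test("aaaa"),        // false: one "a" too many
  single: atLeastTwo.test("a"),           // false: fewer than 2
  five: twoToFour.test("aaaaa"),          // false: more than 4
  starNoB: zeroOrMore.test("a"),          // true: zero "b" is fine for *
  plusNoB: oneOrMore.test("a"),           // false: + needs at least one "b"
  plusManyB: oneOrMore.test("abbb"),      // true
  american: optionalU.test("color"),      // true
  british: optionalU.test("colour"),      // true
};

console.log(quantifierResults);
```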

Grouping Constructs

These are used to group parts of the regex together, kinda like the glue that binds the expression together.

  • Written with (), they let you treat several tokens as one unit, so a quantifier or | can apply to the whole group. A plain group like ([a-z]) also captures whatever it matched so you can use it later.
    • the text within the () is referred to as a subexpression
    • a group that starts with ?= is a lookahead: it checks that its subexpression can match from the current position, without capturing or using up any characters
    • these subexpressions can be grouped/chained together.
      • using the password example: (?=.*[a-z])(?=.*[A-Z]) is chaining two lookaheads, checking for both a lowercase letter and an uppercase letter
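The sketch below shows three things a group can do: capture text, take a quantifier as a unit, and (in the (?=...) form used by the password regex) check for something without consuming it. The sample strings are assumptions picked for illustration.

```javascript
// 1. Capturing: each () remembers the text it matched.
const dateMatch = "2024-03-27".match(/^(\d{4})-(\d{2})-(\d{2})$/);
const year = dateMatch[1];   // "2024"
const month = dateMatch[2];  // "03"

// 2. Grouping for a quantifier: + applies to the whole "ha", not just "a".
const laughter = /^(ha)+$/;
const laughsOk = laughter.test("hahaha"); // true
const laughsBad = laughter.test("hah");   // false: trailing "h" is not a full "ha"

// 3. Lookahead groups: (?=...) only checks, so the overall match is empty
//    and nothing is captured.
const lookaheadMatch = "Password".match(/^(?=.*[a-z])(?=.*[A-Z])/);
const matchedText = lookaheadMatch[0];       // "" (no characters consumed)
const groupCount = lookaheadMatch.length - 1; // 0 captured groups

console.log(year, month, laughsOk, laughsBad, JSON.stringify(matchedText), groupCount);
```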

Bracket Expressions

On a very basic level, [] wrap the set of characters you're looking for, and the whole bracket expression matches one character from that set. Some different uses of Bracket Expressions:

  • Character matching: used for checking individual characters [abc]. It'll match with any a, b, or c

  • Ranges: look for a range of characters

    • [a-z] looks for any lowercase letter.
    • [0-9] looks for any single digit
  • Negated Character Group: put ^ right after the opening [ to look for anything that is NOT in the set.

    • [^0-9] looks for any character that is NOT a digit
  • Combining Characters: you can put two kinds of checks inside one bracket

    • [a-z0-9] will look for any lowercase letter or any digit

Keep in mind that these bracket expressions are case sensitive: [a-z] is not the same as [A-Z]. That's why both [a-z] & [A-Z] are used in the password regex.
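A short sketch of the bracket expressions above, using made-up inputs. Each pattern is anchored so it has to describe the whole string.

```javascript
// Hypothetical inputs; every bracket expression matches ONE character from its set.
const bracketResults = {
  setHit: /^[abc]$/.test("b"),           // true: "b" is in the set
  setMiss: /^[abc]$/.test("d"),          // false: "d" is not in the set
  lower: /^[a-z]$/.test("q"),            // true
  upperVsLower: /^[a-z]$/.test("Q"),     // false: ranges are case sensitive
  digit: /^[0-9]$/.test("7"),            // true
  notDigit: /^[^0-9]$/.test("7"),        // false: ^ inside [] negates the set
  notDigitOk: /^[^0-9]$/.test("x"),      // true
  combined: /^[a-z0-9]+$/.test("abc123"),    // true: lowercase letters or digits
  combinedCaps: /^[a-z0-9]+$/.test("ABC123") // false: uppercase is not in the set
};

console.log(bracketResults);
```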

Character Classes

There are a bunch of character classes that can be used in regex. Each one is shorthand for a whole set of characters you want to match. Here's a list of a bunch of them. Most can also be used within Bracket Expressions (the exception is the word-boundary pair \b and \B, which match positions rather than characters).

  • Alphanumeric Characters

    • \w for any word character. Shorthand of [a-zA-Z0-9_]
    • \W Matches any non-word character
  • Digits

    • \d Matches any digit character. Shorthand of [0-9]
    • \D Matches any non-digit character
  • Whitespace Characters

    • \s Matches any whitespace characters (space, tab, newline)
    • \S Matches any non-whitespace character
  • Word Boundaries

    • \b Matches a word boundary (the position between a word character and a non-word character, or the start/end of the string)
      • basically ensures it only matches the whole word, so \bcat\b matches "cat" on its own but not the "cat" inside "concatenate"
    • \B Matches a non-word boundary
  • Some extras

    • [...] Matches any single character within the brackets
      • [aeiou] will match with any vowel
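    • In some regex engines these can be chained
      • [a-z&&[aeiou]] will look for any lowercase vowel in engines such as Java's. A plain JavaScript regex does not treat && as an intersection (newer JavaScript engines only offer intersection under the v flag)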
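The following sketch tries out the shorthand classes on a few invented strings. Note that the uppercase version of each class (\D, \W, \S) matches the opposite of the lowercase one.

```javascript
// Hypothetical sample text, for illustration only.
const numbers = "Room 42, floor 7".match(/\d+/g);  // runs of digits: ["42", "7"]
const digitsOnly = "abc123".replace(/\D/g, "");    // strip every non-digit: "123"
const words = "a b\tc\nd".split(/\s+/);            // split on any whitespace: ["a", "b", "c", "d"]

const classResults = {
  underscoreIsWord: /\W/.test("snake_case"),  // false: _ counts as a word character
  bangIsNonWord: /\W/.test("hello!"),         // true: ! is a non-word character
  noWhitespace: /^\S+$/.test("no_spaces"),    // true: every character is non-whitespace
  hasWhitespace: /^\S+$/.test("has space"),   // false
  wholeWord: /\bcat\b/.test("the cat sat"),   // true: "cat" stands alone
  insideWord: /\bcat\b/.test("concatenate"),  // false: no boundary around "cat"
};

console.log(numbers, digitsOnly, words, classResults);
```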

The OR Operator

The OR operator is used to check for multiple alternatives to the pattern. It is used with the | symbol.

It's used in basically the same way as || in JavaScript. For example:

  • (horse|cow) will check to see if the input matches either 'horse' or 'cow'
  • it can also be used for more than two alternatives
    • (red|blue|green) will check for either red, blue, or green.
    • It IS case sensitive on its own. To also find Red, blUe, or GreEn, add the i flag: /(red|blue|green)/i
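A small sketch of alternation with invented inputs. One thing worth checking for yourself: in JavaScript the alternatives only ignore case when the i flag is present, as the last two checks show.

```javascript
// Hypothetical patterns and inputs, for illustration only.
const animal = /^(horse|cow)$/;           // the whole input must be one of the two words
const colour = /(red|blue|green)/;        // case sensitive
const colourAnyCase = /(red|blue|green)/i; // the i flag makes it ignore case

const orResults = {
  horse: animal.test("horse"),               // true
  cow: animal.test("cow"),                   // true
  both: animal.test("horsecow"),             // false: neither alternative is the whole input
  lowerBlue: colour.test("a blue car"),      // true
  mixedBlue: colour.test("a blUe car"),      // false: case does not match
  mixedBlueWithFlag: colourAnyCase.test("a blUe car"), // true thanks to the i flag
};

console.log(orResults);
```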

Flags

Flags change how the whole regex searches. They are written after the closing / of the expression, e.g. /hello/gi. These flags can either give the expression more functionality or limit what it's looking for. Some of these flags include:

  • g Global Search: uses the regex to look for the pattern in all possible matches in the string
  • i Case-insensitive Search: Basically just ignores the case of the string. /hello/i and /hElLO/i will have the same matches
  • m Multi-Line: changes the behavior of ^ and $ to match the start and end of each line within the input string, not just the start/end of the entire string
  • s dotAll: allows the . to match newline characters as well (a newline character is written \n; try putting it in the regex search to see what it matches)
  • u Unicode: enables full Unicode support for the regex pattern
  • y Sticky: only tries to match at the exact position stored in the regex's lastIndex property, instead of searching ahead through the rest of the string
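Here is a rough sketch of each flag with made-up strings. It only uses standard JavaScript string and RegExp methods (match, test, lastIndex).

```javascript
// Hypothetical inputs, for illustration only.
const text = "Hello hello HELLO";
const firstOnly = text.match(/hello/);    // no flags: first case-sensitive match only
const everyMatch = text.match(/hello/gi); // g = all matches, i = ignore case

const lines = "first\nsecond";
const noMultiline = /^second$/.test(lines);  // false: ^ only matches the start of the input
const multiline = /^second$/m.test(lines);   // true: m lets ^ and $ match at line breaks
const dotNoS = /first.second/.test(lines);   // false: . does not match \n by default
const dotWithS = /first.second/s.test(lines); // true: s lets . match \n

const emoji = "\u{1F600}";                // one character stored as two UTF-16 code units
const withoutU = /^.$/.test(emoji);       // false: . sees only one code unit
const withU = /^.$/u.test(emoji);         // true: u treats the pair as one character

const sticky = /foo/y;
const stickyAtZero = sticky.test("barfoo");  // false: "foo" is not at index 0
sticky.lastIndex = 3;                        // move the starting position to index 3
const stickyAtThree = sticky.test("barfoo"); // true: "foo" starts exactly at index 3

console.log(firstOnly[0], firstOnly.index, everyMatch);
```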

Character Escapes

These are used to represent characters that are difficult to type directly, or to match characters that normally have a special meaning. The backslash \ is the escape character: it either gives special meaning to the character that follows it (\n is a newline) or takes the special meaning away (\. is a literal dot, \* is a literal asterisk).

Here's a list of some common Character Escapes:

  • \n: Represents a newline character
  • \t: Represents a tab character
  • \d: Represents any digit character
  • \w: Represents any word character (alphanumeric characters and underscore)
  • \s: Represents any whitespace character
  • \\: Represents a backslash character itself
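The sketch below shows why escaping matters, using invented inputs. The last line covers a common gotcha: when a pattern is built from a string with the RegExp constructor, each backslash has to be doubled.

```javascript
// Hypothetical inputs, for illustration only.
const escapeResults = {
  unescapedDot: /^3.14$/.test("3x14"),   // true: a bare . matches any character
  escapedDot: /^3\.14$/.test("3x14"),    // false: \. only matches a literal dot
  literalDot: /^3\.14$/.test("3.14"),    // true
  literalStar: /^a\*b$/.test("a*b"),     // true: \* is a literal asterisk
  tab: /^a\tb$/.test("a\tb"),            // true: \t matches a tab character
  backslash: /\\/.test("C:\\temp"),      // true: \\ matches one literal backslash
  fromString: new RegExp("^\\d+$").test("123"), // true: "\\d" in a string becomes \d
};

console.log(escapeResults);
```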

The Full Password Check Breakdown:

/^(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,16}$/

  • ^ signifies the start of the string
  • (?=.*X) is a lookahead. The .* skips over any characters, so the group checks that X shows up at least once somewhere in the string, without using up any characters
    • (?=.*[a-z]) checks there is at least one lowercase letter
    • (?=.*[A-Z]) checks there is at least one uppercase letter
    • (?=.*\W) checks to ensure there is at least one non-word character (ie: !@#$%)
  • .{8,16} makes sure there are between 8 and 16 characters in the password
  • $ signifies the end of the string

In the end, this is making sure the password has:

  • at least one lowercase letter
  • at least one uppercase letter
  • at least one non-word character
  • between 8 and 16 characters
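To check the breakdown, here is the password regex run against a handful of made-up passwords. The sample strings and the variable names are assumptions for illustration only.

```javascript
const passwordPattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\W).{8,16}$/;

// Hypothetical sample passwords, one per rule.
const passwordResults = {
  valid: passwordPattern.test("Passw0rd!"),           // true: meets every rule
  noUpper: passwordPattern.test("password!"),         // false: no uppercase letter
  noLower: passwordPattern.test("PASSWORD!"),         // false: no lowercase letter
  noSymbol: passwordPattern.test("Password1"),        // false: no non-word character
  underscore: passwordPattern.test("Pass_word"),      // false: _ is a word character, so \W fails
  tooShort: passwordPattern.test("Pa!"),              // false: fewer than 8 characters
  tooLong: passwordPattern.test("Password!Password!") // false: 18 characters, over the max of 16
};

console.log(passwordResults);
```

The underscore case is easy to miss: \W means "not a letter, digit, or underscore", so a password whose only symbol is _ does not pass.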

Author

Thanks for giving this tutorial a read through! I'm a fledgling web developer currently enrolled in a full stack web dev bootcamp through edX and the University of Sydney. If you have any questions, advice, or just want to chat, I can be contacted via my GitHub profile, or you can email me at Reid.backcnmt@gmail.com

I used Microsoft Copilot for help with creating this tutorial
