@drph4nt0m
Created June 24, 2024 09:16
Regex Guide

RegEx Instructions

Ever need to make sure some input given by a user follows a certain format? Need to specify exactly what the user can input? Well, RegEx (or Regular Expressions) is what you're looking for! That's right! With this simple-to-understand concept, you (yes, you!) can enforce your desired format in your code!

Regex Components

Anchors

Denoted using a caret (^) or a dollar ($), anchors ensure that a pattern matches at a specific position in the string. The caret anchors the match to the start of the string, while the dollar anchors it to the end of the string.
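
As a minimal sketch of how anchors behave, the patterns and sample strings below are hypothetical and chosen only for illustration:

```javascript
// Hypothetical patterns, not taken from the guide.
const startsWithHello = /^Hello/; // "Hello" must sit at the start of the string
const endsWithWorld = /World$/;   // "World" must sit at the end of the string
const exactlyHello = /^Hello$/;   // the whole string must be exactly "Hello"

console.log(startsWithHello.test("Hello World")); // true
console.log(startsWithHello.test("Say Hello"));   // false
console.log(endsWithWorld.test("Hello World"));   // true
console.log(exactlyHello.test("Hello World"));    // false
```

Using both anchors together is the usual way to validate an entire input rather than just find a match somewhere inside it.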

Quantifiers

Operators that determine how many times the preceding character or group may repeat, using +, *, or ?.

  • + = Appears once or more.
  • * = Appears zero or more times.
  • ? = Appears once or not at all.
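
The three quantifiers can be compared side by side. This is a small sketch using made-up patterns in which the quantifier applies to the letter b:

```javascript
// Hypothetical patterns: each quantifier applies to the "b" before it.
const oneOrMore = /^ab+c$/;  // "b" must appear one or more times
const zeroOrMore = /^ab*c$/; // "b" may appear any number of times, including zero
const zeroOrOne = /^ab?c$/;  // "b" may appear once or not at all

console.log(oneOrMore.test("ac"));     // false
console.log(oneOrMore.test("abbbc"));  // true
console.log(zeroOrMore.test("ac"));    // true
console.log(zeroOrMore.test("abbbc")); // true
console.log(zeroOrOne.test("abc"));    // true
console.log(zeroOrOne.test("abbc"));   // false
```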

Grouping Constructs

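A subsection of a Regular Expression, enclosed in parentheses, in which a substring must match a certain pattern. Groups are often combined with shorthand character classes such as \w, \s, or \W.

  • (\w+) = Match and capture one or more word characters (letters, digits, or underscore).
  • \s = Match whitespace.
  • \W = Match non-word characters, such as spaces and punctuation (digits and the underscore count as word characters, so they are not matched).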
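
To see what a group captures, here is a short sketch. The pattern and the sample string are assumptions made for illustration only:

```javascript
// Hypothetical pattern: two captured words, separated by whitespace,
// followed by a single non-word character.
const greetingPattern = /^(\w+)\s(\w+)\W$/;
const greetingMatch = "Hello World!".match(greetingPattern);

console.log(greetingMatch[1]); // "Hello"
console.log(greetingMatch[2]); // "World"

// \W finds nothing here: letters, digits, and underscore are all word characters.
console.log(/\W/.test("abc_123")); // false
```

Index 0 of the match array holds the full match, while indexes 1 and up hold the captured groups in the order their opening parentheses appear.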

Bracket Expressions

A set of characters, listed within square brackets ([]), any one of which is accepted at that position.

  • [abc] = Matches a single a, b, or c (lowercase only).
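
A bracket expression matches one character at a time, so it is usually paired with a quantifier. A minimal sketch with a hypothetical pattern:

```javascript
// Hypothetical pattern: the whole string may contain only a, b, or c.
const abcOnly = /^[abc]+$/;

console.log(abcOnly.test("cab")); // true
console.log(abcOnly.test("cad")); // false ("d" is not in the set)
console.log(abcOnly.test("ABC")); // false (the set is lowercase only)
```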

Character Classes

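Limits on which characters are accepted, written inside bracket expressions. They can be given as ranges or lists, and combined with the caret (^) and dollar ($) anchors to constrain a whole string, as in ^[a-z]+$. A caret placed first inside the brackets instead negates the class, so [^a-z] matches any character that is not a lowercase letter.

  • [a-z] = Limits to only lowercase letters from a to z.
  • [A-Z0-9] = Limits to only uppercase letters from A to Z or numbers from 0 to 9.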
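
The ranges above can be tried out directly. In this sketch the anchors and the + quantifier are additions made for illustration, and the negated class uses a caret placed first inside the brackets:

```javascript
// Hypothetical patterns built from the ranges above.
const lowercaseOnly = /^[a-z]+$/;   // only lowercase letters
const upperOrDigit = /^[A-Z0-9]+$/; // only uppercase letters or digits
const noDigits = /^[^0-9]+$/;       // leading ^ inside [] negates: anything but digits

console.log(lowercaseOnly.test("regex")); // true
console.log(lowercaseOnly.test("RegEx")); // false
console.log(upperOrDigit.test("AB12"));   // true
console.log(noDigits.test("abc"));        // true
console.log(noDigits.test("abc1"));       // false
```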

The OR Operator

Alternatives separated by the OR operator, |. The expression matches when any one of the alternatives matches.

  • HTML|CSS|JavaScript = Matches HTML, CSS, or JavaScript.
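
One detail worth sketching is how | interacts with anchors. Wrapping the alternatives in a group is an assumption made here so that the anchors apply to every option; without the group, ^ belongs only to the first alternative and $ only to the last:

```javascript
// Grouped: the anchors apply to all three alternatives.
const languageExact = /^(HTML|CSS|JavaScript)$/;
// Ungrouped: read as "^HTML" OR "CSS" OR "JavaScript$".
const languageLoose = /^HTML|CSS|JavaScript$/;

console.log(languageExact.test("CSS"));         // true
console.log(languageExact.test("Python"));      // false
console.log(languageExact.test("my CSS file")); // false
console.log(languageLoose.test("my CSS file")); // true ("CSS" matches anywhere)
```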

Flags

Additional options appended to a Regular Expression to alter how it is applied. Six flags are covered here: i, g, m, s, u, and y.

  • i = Case insensitive, removes the need to specify both a-z and A-Z.
  • g = Global, looks for all matches rather than just the first match.
  • m = Multiline mode, makes the beginning and end anchors (^ and $) match at the start and end of each line, rather than only at the very beginning and very end of the string.
  • s = Dotall mode, allows the dot . to also match a new line \n.
  • u = Enables Unicode support, so that surrogate pairs are treated as single characters.
  • y = Sticky mode, matches only at the exact position given by the lastIndex property.
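
Each flag can be observed with a one-line experiment. The following is a sketch using hypothetical sample strings; the emoji is written as the escape \u{1F600}, a character stored as a surrogate pair:

```javascript
// i: case-insensitive matching
const iResult = /hello/i.test("HELLO"); // true

// g: replace every match instead of only the first
const firstOnly = "a-b-c".replace(/-/, "+");  // "a+b-c"
const allDashes = "a-b-c".replace(/-/g, "+"); // "a+b+c"

// m: ^ and $ match at the start and end of each line
const perLine = "one\ntwo".match(/^\w+$/gm); // ["one", "two"]

// s: the dot also matches a newline
const dotDefault = /a.b/.test("a\nb"); // false
const dotAll = /a.b/s.test("a\nb");    // true

// u: a surrogate pair counts as a single character
const emoji = "\u{1F600}";
const pairDefault = /^.$/.test(emoji);  // false (seen as two code units)
const pairUnicode = /^.$/u.test(emoji); // true

// y: match only at the position stored in lastIndex
const sticky = /foo/y;
sticky.lastIndex = 3;
const stickyAt3 = sticky.test("barfoo"); // true ("foo" starts at index 3)
sticky.lastIndex = 0;
const stickyAt0 = sticky.test("barfoo"); // false ("bar" is at index 0)

console.log(iResult, firstOnly, allDashes, perLine, dotDefault, dotAll,
            pairDefault, pairUnicode, stickyAt3, stickyAt0);
```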

Character Escapes

Sequences beginning with a backslash that stand in for characters which cannot be typed literally in RegEx syntax, or which would otherwise have a special meaning. Common character escapes include \n, \\, and \t.

  • \n = New line.
  • \\ = Back slash, as \ is often used in escapes.
  • \t = Tab indent.
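
A brief sketch of the three escapes in use. The sample text is hypothetical, and note that the string literal needs its own escaping as well, so "\\" in the string is a single backslash:

```javascript
// Hypothetical text: a tab, a newline, and one literal backslash.
const sampleText = "col1\tcol2\nC:\\temp";

const hasTab = /\t/.test(sampleText);       // true
const hasBackslash = /\\/.test(sampleText); // true
const textLines = sampleText.split(/\n/);   // split on the newline

console.log(hasTab, hasBackslash); // true true
console.log(textLines.length);     // 2
console.log(textLines[1]);         // C:\temp
```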

Example

When given the example RegEx for a URL:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

  • ^(https?:\/\/) = Matches http:// or https:// at the start of the string (due to the ^ anchor); the ? after the s makes the s optional.
  • ? = Immediately following the (https?:\/\/) group, states that the whole protocol section can appear either once or not at all.
  • ([\da-z\.-]+) = Matches one or more digits (\d is the metacharacter for any digit), lowercase letters (a-z), dots, or hyphens, covering the name of the site.
  • \.([a-z\.]{2,6}) = Matches a literal dot . followed by between 2 and 6 lowercase letters or dots, covering the domain of the site.
  • ([\/\w \.-]*)*\/?$ = Matches directories from the main site (after the domain): \w is the metacharacter for any letter (both cases), digit, or underscore, alongside slashes, spaces, dots, and hyphens. An optional trailing slash may follow, and $ anchors the end of the string.
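
The URL expression above can be exercised directly. This is a sketch with hypothetical sample inputs; the name urlPattern is introduced here only for illustration:

```javascript
// The URL pattern from the example, unchanged.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

console.log(urlPattern.test("https://example.com"));  // true
console.log(urlPattern.test("example.com/docs/"));    // true (protocol is optional)
console.log(urlPattern.test("not a url"));            // false
console.log(urlPattern.test("HTTPS://EXAMPLE.COM"));  // false (no i flag, so uppercase fails)

// Capture groups expose the individual pieces.
const urlParts = "http://example.com/path/to/page".match(urlPattern);
console.log(urlParts[1]); // "http://"
console.log(urlParts[2]); // "example"
console.log(urlParts[3]); // "com"
```

A pattern like this is a format check, not a guarantee that the URL exists. The nested quantifier in ([\/\w \.-]*)* can also backtrack heavily on long inputs that almost match, so it may be worth testing against realistic data before relying on it.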