@anitachengalva
Last active July 26, 2022 04:14
Regex Tutorial

🌈 Regex Tutorial

This tutorial provides a brief outline of what regex is and how it is applied in code. Hopefully this explanation provides a better understanding of how these expressions work!

Summary

So, what even is regex? 'Regex' is shorthand for 'regular expression': a pattern used to match character combinations in strings. Regular expressions power narrow, pattern-based searches in functions such as FIND or FIND AND REPLACE. Oftentimes, regex is utilized to validate input as well.

An example of a regular expression is this:

/^#?([a-f0-9]{6}|[a-f0-9]{3})$/

Though at first glance this may appear as a string of incoherent, random characters, it is actually an expression utilized to match a Hex Value.

In this tutorial, we will focus specifically on breaking down this regex to understand exactly how it matches a Hex Value.

Provided below is a quick briefing on Hex Values, if a refresher is needed.

Hexadecimal (also known as base 16 or hex) is a positional numeral system that uses 16 distinct characters. The characters utilized are "0"-"9" to represent values 0 to 9, and "A"-"F", or "a"-"f", to represent values 10 to 15.

Hexadecimal has many uses, but in this example we will be focusing on its applicability as a representation of a color in RGB format. RGB format defines a color by the amount of red, green and blue it contains.

In some cases we can define a hex color code using only three characters versus the typical six-character combination. This is only applicable when the same character is used twice to represent each value in the RGB combination. For example: #9911bb can be rewritten as #91b.
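The shorthand rule can be sketched in JavaScript. This is only an illustration: `expandShortHex` is a hypothetical helper name, not something from the original tutorial, and it assumes its input is already a valid hex color code.

```javascript
// Hypothetical helper: expands a three-character hex code to six characters
// by doubling each character, e.g. "#91b" becomes "#9911bb".
function expandShortHex(code) {
  const hasHash = code.startsWith("#");
  const digits = hasHash ? code.slice(1) : code;
  // Only three-character codes need expanding; leave anything else untouched.
  if (digits.length !== 3) return code;
  const doubled = digits.split("").map((ch) => ch + ch).join("");
  return (hasHash ? "#" : "") + doubled;
}

console.log(expandShortHex("#91b"));    // "#9911bb"
console.log(expandShortHex("#9911bb")); // "#9911bb"
```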

/^#?([a-f0-9]{6}|[a-f0-9]{3})$/

This particular regex allows us to match a hex color code, whether it begins with a number sign (#) or not, and whether it contains 6 or 3 characters (digits 0-9 and lowercase letters a-f). As written, it does not match the uppercase letters A-F.
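To see this behavior in practice, here is a small sketch that runs the regex against a few strings. The sample inputs are made up for illustration.

```javascript
// The pattern under discussion, copied from the tutorial.
const hexPattern = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;

// Hypothetical sample inputs chosen for illustration.
const samples = ["#9911bb", "91b", "#9911b", "#9911BB", "#ggg"];

// test() returns true when the whole string matches the pattern.
const results = samples.map((s) => hexPattern.test(s));

console.log(results); // [true, true, false, false, false]
```

The third sample fails because it has five characters, the fourth because the pattern only lists lowercase letters, and the fifth because g is not a hexadecimal character.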

Regex Components

Anchors

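Anchors define where a match must begin and end. Rather than matching a character, ^ (caret) matches the position at the start of the string and $ (dollar) matches the position at the end. When both are used, everything between them must account for the whole string. These are the two most common anchors in JavaScript; the word boundary \b is another.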

Examples of Anchors:

  • ^ - matches a string that starts with the indicated pattern, meaning the characters following ^ within the regex.
  • $ - matches a string that ends with the indicated pattern, meaning the characters preceding $ within the regex.

In our particular case, both the ^ and $ anchors are utilized: ^ ensures the match starts at the very beginning of the string (where the optional # may appear), and $ ensures the match runs to the very end of the string. Together they require the entire string to be a Hex Value, with nothing before or after it.
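The effect of the anchors is easiest to see by comparing the regex with a copy that has them removed. The unanchored variant and the sample input below are assumptions made for illustration only.

```javascript
// The tutorial's pattern, anchored at both ends.
const anchored = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;

// Same pattern with both anchors removed, for comparison only.
const unanchored = /#?([a-f0-9]{6}|[a-f0-9]{3})/;

// Hypothetical input: a hex code surrounded by other text.
const input = "color: #9911bb;";

const anchoredOnText = anchored.test(input);       // false: extra text around the code
const unanchoredOnText = unanchored.test(input);   // true: a hex code appears somewhere inside
const anchoredOnCode = anchored.test("#9911bb");   // true: the whole string is a hex code

console.log(anchoredOnText, unanchoredOnText, anchoredOnCode);
```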

Quantifiers

Quantifiers are used in regex to determine how many times a character, group, or character class must appear in the input for a match to be found.

Examples of Quantifiers:

  • * - matches zero or more of the preceding character, group, or character class
  • + - matches one or more of the preceding character, group, or character class
  • ? - matches zero or one of the preceding character, group, or character class
  • {n} - matches exactly n of the preceding item; {n,} matches n or more, and {n,m} matches between n and m

This regex utilizes two types of quantifiers: ? and {n}. The ? makes sure the # appears either zero or one time, and {6} and {3} ensure the Hex Value is either 6 or 3 characters in length.
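Here is a rough sketch of each quantifier in isolation. The tiny patterns built around the letters a and b are invented for this example and are not part of the Hex Value regex.

```javascript
// Each check anchors the pattern so the quantifier must cover the whole string.
const quantifierChecks = {
  star: /^ab*$/.test("a"),            // true: zero "b" characters is fine for *
  plus: /^ab+$/.test("a"),            // false: + needs at least one "b"
  optional: /^ab?$/.test("abb"),      // false: ? allows at most one "b"
  exact: /^ab{3}$/.test("abbb"),      // true: exactly three "b" characters
  range: /^ab{2,3}$/.test("abbbb"),   // false: four "b" characters is more than three
};

console.log(quantifierChecks);
```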

Grouping Constructs

Grouping unifies a pattern so that it is matched as a complete block.

Examples of Grouping:

  • () - parentheses create a capture group, utilized to combine multiple components together and remember what they matched
  • (?:) - using ?: allows multiple components to be combined without creating a capture group
  • (?<name>) - using ?<name> gives the capture group a name

The regex we are breaking down, /^#?([a-f0-9]{6}|[a-f0-9]{3})$/, contains a single capture group made up of multiple components: two alternatives that each pair a bracket expression with a quantifier.
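As a sketch of what the capture group gives us, the example below reads the captured characters back out of a match. The named and non-capturing variants are hypothetical rewrites of the tutorial's regex, shown only for comparison; the group name `digits` is made up.

```javascript
// The tutorial's pattern: one capture group around the hex characters.
const hexPattern = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;
const match = "#9911bb".match(hexPattern);
const fullMatch = match[0]; // "#9911bb": the entire matched text
const captured = match[1];  // "9911bb": only what the group matched, without the #

// Hypothetical variant with a named group called "digits".
const namedPattern = /^#?(?<digits>[a-f0-9]{6}|[a-f0-9]{3})$/;
const namedDigits = "#91b".match(namedPattern).groups.digits; // "91b"

// Hypothetical variant with a non-capturing group: nothing is remembered,
// so the match array holds only the full match.
const nonCapturing = /^#?(?:[a-f0-9]{6}|[a-f0-9]{3})$/;
const nonCapturingLength = "#91b".match(nonCapturing).length; // 1

console.log(fullMatch, captured, namedDigits, nonCapturingLength);
```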

Bracket Expressions

A bracket expression is defined by a set of characters enclosed by [ and ] and matches any single character in that list.

Examples of Bracket Expressions:

  • [] - matches any single character enclosed by the brackets
  • [^] - matches any single character that is not enclosed by the brackets

In this regex, the bracket expression [a-f0-9] is utilized twice. It matches any single character in the range a-f or 0-9 (ranges are further explained in the next section).
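A minimal sketch of a bracket expression and its negated form, using the same set of characters as the Hex Value regex. The variable names are made up for illustration.

```javascript
// Matches a string made of exactly one character from the set a-f or 0-9.
const hexChar = /^[a-f0-9]$/;

// The leading ^ inside the brackets negates the set:
// exactly one character that is NOT in a-f or 0-9.
const notHexChar = /^[^a-f0-9]$/;

const bracketChecks = {
  cIsHex: hexChar.test("c"),       // true
  gIsHex: hexChar.test("g"),       // false
  gIsNotHex: notHexChar.test("g"), // true
  sevenIsNotHex: notHexChar.test("7"), // false
};

console.log(bracketChecks);
```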

Character Classes

A character class matches a character from a specifically defined set. It is utilized to broaden the search criteria by generating a range to match from.

Examples of Character Classes:

  • \d - matches a single character digit (0-9)
  • \w - matches a word character (any alphanumeric character and underscores)
  • \s - matches a whitespace character (including tabs and newlines)
  • . - matches any character (except a newline)

Something special to note is that using an uppercase letter in place of the lowercase letters above (\D, \W, \S) will invert the match.

  • [] - matches any single character within the set (bracket expression)
  • [^] - matches any single character not within the set (bracket expression)
  • [A-Z] - matches any character included within a range

In the regex we are analyzing, the ranges a-f and 0-9 are utilized inside a bracket expression to match the letters a through f and the digits 0 through 9.
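The classes above can be tried out with a quick sketch. The sample strings are invented, and the last pattern is a hypothetical rewrite of the tutorial's regex that swaps the 0-9 range for \d, which matches the same digits in JavaScript.

```javascript
const classChecks = {
  digit: /\d/.test("a1"),           // true: the string contains a digit
  word: /^\w+$/.test("hex_code9"),  // true: letters, digits and underscore only
  space: /\s/.test("no-spaces"),    // false: no whitespace character present
  nonDigit: /^\D+$/.test("abc"),    // true: \D is the inverse of \d
  dotNewline: /^.$/.test("\n"),     // false: . does not match a newline by default
};

// Hypothetical rewrite for comparison: \d used in place of the 0-9 range.
const withDigitClass = /^#?([a-f\d]{6}|[a-f\d]{3})$/;
const rewriteMatches = withDigitClass.test("#9911bb"); // true

console.log(classChecks, rewriteMatches);
```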

The OR Operator

The use of | in regex acts as a Boolean OR. It matches either what is defined to the left of |, or what is defined to the right.

Example of the OR Operator:

  • (\.com|\.org) - matches a string containing either .com OR .org (the dots are escaped so that they match a literal period rather than any character)

In our regex: /^#?([a-f0-9]{6}|[a-f0-9]{3})$/, the OR Operator is utilized when matching Hex Values either {6} OR {3} characters in length.
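A short sketch of the OR operator at work, first in the Hex Value regex and then in a domain-suffix pattern. The second pattern is an assumption written for this example, with the dots escaped so they match a literal period.

```javascript
const hexPattern = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;

const orChecks = {
  three: hexPattern.test("abc"),    // true: matched by the {3} alternative
  six: hexPattern.test("abcdef"),   // true: matched by the {6} alternative
  four: hexPattern.test("abcd"),    // false: neither alternative fits four characters
};

// Hypothetical pattern: a string ending in either ".com" or ".org".
const suffixPattern = /(\.com|\.org)$/;
const comMatches = suffixPattern.test("example.com"); // true
const netMatches = suffixPattern.test("example.net"); // false

console.log(orChecks, comMatches, netMatches);
```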

Flags

Flags are added to a plain expression to change how it is interpreted. They are placed after the closing forward slash.

Examples of Flags:

  • i - "insensitive" - makes the expression insensitive to case
  • g - "global" - makes the expression global, searching for all occurrences versus only the first occurrence
  • m - "multiline" - makes ^ and $ match the start and end of each line, versus of the whole string
  • u - "Unicode" - enables full Unicode support
  • s - "dotall" - enables a dot . (character class) to match any character, including a newline character

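The regex provided to match a Hex Value does not utilize any flags. The insensitive flag i would be useful for also matching the uppercase letters A-F. The global flag g would be useful for matching multiple Hex Values within a longer text, though the ^ and $ anchors would have to be removed first, since they require the whole string to be a single Hex Value.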
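Below is a sketch of how flags could change the behavior of the Hex Value regex. Both variants are hypothetical: the first only adds the i flag, and the second is a looser pattern (no anchors, a required #, a trailing word boundary \b) written just to show the g flag collecting several matches from made-up CSS text.

```javascript
// Hypothetical variant: the tutorial's pattern with the i flag added.
const caseInsensitive = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;
const upperMatches = caseInsensitive.test("#9911BB"); // true

// Hypothetical variant for searching inside longer text:
// anchors removed, # required, \b so a code must end at a word boundary.
const findAll = /#(?:[a-f0-9]{6}|[a-f0-9]{3})\b/gi;

// Made-up sample text.
const css = "a { color: #9911bb; background: #FFF; }";

// With the g flag, match() returns every full match as an array.
const found = css.match(findAll);

console.log(upperMatches, found); // true [ '#9911bb', '#FFF' ]
```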

Character Escapes

A character escape means utilizing a backslash \ before a special character such as [ ] ( ) { } / . * + ? | ^ $, literally "escaping" it from its typical special functionality in a regex and instead making it a searchable character.

Another important note is that in order to match a literal ^ or a literal $ outside of a bracket expression, the character must be escaped.

Examples of Character Escapes:

  • \. - matches the dot . character

In our regex, character escapes are not utilized. This is due to the fact that Hex Values do not contain any characters with special functionality (# has no special meaning in a JavaScript regex), therefore we have no need to escape anything in the pattern.
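Although the Hex Value regex needs no escapes, a quick sketch shows the difference they make. The patterns and inputs here are invented for illustration.

```javascript
// Unescaped dot: matches ANY single character before "com".
const looseDot = /^.com$/;

// Escaped dot: matches only a literal period before "com".
const literalDot = /^\.com$/;

// Escaped $ and . so both are treated as plain characters (a price like "$19.99").
const price = /^\$\d+\.\d{2}$/;

const escapeChecks = {
  looseOnXcom: looseDot.test("xcom"),       // true: . accepted the "x"
  literalOnXcom: literalDot.test("xcom"),   // false: "x" is not a period
  literalOnDotCom: literalDot.test(".com"), // true
  priceMatches: price.test("$19.99"),       // true
};

console.log(escapeChecks);
```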

Conclusion

Regular expressions are an indispensable tool in creating search operations for any programmer or developer.

I hope that by having broken down each individual component that contributes to the search for a Hex Value, you have gained not only the ability to decode the following regex with clarity, but also the ability to go forth with confidence and write your own.

/^#?([a-f0-9]{6}|[a-f0-9]{3})$/

Thank you for reading!

Author

Thank you for checking out my tutorial! If you would like to see more of my work, please take a peek at my GitHub and portfolio.
