Last active August 24, 2021 19:35
A Regular Expression Tutorial

Regex Tutorial

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." -Jamie Zawinski


A regex, or regular expression, defines a sequence or pattern of characters that can be used to search a text.

There are typically two ways to approach dissecting and understanding a new function. You can reverse engineer a complete function and try to break down each part, or you can build it up from scratch, analyzing each step as you go. I took the latter approach in order to see the function in its simplest steps to inform this tutorial.

I have created a regular expression that checks if a string is a valid postal code for the Netherlands. I chose the Dutch code because it included an interesting pattern: four digits followed by two uppercase letters, excluding a few combinations of negative historical significance.

Regex Components


Quantifiers

Quantifiers, as the name implies, represent a quantity. More specifically, a quantifier specifies how many times the character, group, or character class just before it must appear.

Character    Pattern Match
*            appears 0 or more times
+            appears 1 or more times
?            appears 0 or 1 time (optional character)
{n}          appears exactly n times
{min,max}    appears at least min and at most max times (a range, written without a space after the comma)

A number placed in curly braces specifies how many of the preceding type of character are required. In the Dutch Postal Code Regex, the pattern to match is exactly 4 digits, \d{4}, followed by exactly 2 uppercase letters, [A-Z]{2}.
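The two quantified pieces can be tried out on their own. The following is a minimal sketch in Python, not the author's final regex; the names `DIGITS_THEN_LETTERS` and `has_four_digits_two_letters` are made up for illustration.

```python
import re

# Hypothetical pattern for illustration: exactly 4 digits, then exactly 2 uppercase letters.
DIGITS_THEN_LETTERS = re.compile(r"\d{4}[A-Z]{2}")


def has_four_digits_two_letters(text: str) -> bool:
    """Return True when the whole string is 4 digits followed by 2 uppercase letters."""
    return DIGITS_THEN_LETTERS.fullmatch(text) is not None


print(has_four_digits_two_letters("1234AB"))   # True
print(has_four_digits_two_letters("123AB"))    # False: only 3 digits
print(has_four_digits_two_letters("1234ABC"))  # False: 3 letters
```

`fullmatch` is used here so that the quantifiers have to account for the entire string; with `search`, a longer string that merely contains the pattern would also succeed.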

Grouping Constructs

Grouping constructs allow the matching of a specific section of a string. This section is indicated with parentheses and known as a subexpression.

The parentheses surrounding the letter combinations in (?!SA|SD|SS) apply a pattern requirement to just those letters. In this case the requirement is an exclusion: the match is rejected if the letters are SA, SD, or SS.
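To see what parentheses do, here is a small Python sketch under the assumption that we want to pull the digits and the letters out separately. The pattern and the helper `split_code` are hypothetical and only illustrate grouping.

```python
import re

# Two capturing groups: one for the digits, one for the letters.
CODE_PARTS = re.compile(r"(\d{4})([A-Z]{2})")

# A lookahead group such as (?!...) uses parentheses but captures nothing.
LOOKAHEAD_ONLY = re.compile(r"(?!SA|SD|SS)[A-Z]{2}")


def split_code(code: str):
    """Return (digits, letters) when the string matches, otherwise None."""
    match = CODE_PARTS.fullmatch(code)
    return match.groups() if match else None


print(split_code("1234AB"))        # ('1234', 'AB')
print(split_code("12AB"))          # None
print(CODE_PARTS.groups)           # 2 capturing groups
print(LOOKAHEAD_ONLY.groups)       # 0 capturing groups
```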


Bracket Expressions

Bracket Expressions indicate which characters to match. This is also known as a positive character group. A string that contains any character inside the brackets will return a positive match.

For example, the strings 'a', 'b', 'c', 'ac', 'cat', 'big', 'bridges', and '00c00' all match the pattern [abc] because each contains at least one of the characters 'a', 'b', or 'c'. The string 'dog' does not match because it contains none of the three. Note that regular expressions are case-sensitive by default, so the string 'ABC' does not match this bracket expression either.

Ranges and single characters can be combined inside the brackets to include any desired character. For example, [a-zA-C4-6+] matches any string that contains a lowercase letter, OR an uppercase A, B, or C, OR a 4, 5, or 6, OR a +.

In an early version of the Dutch Postal Code Regex, brackets were used to match all numbers and letters. [0-9] matches any string containing a digit from 0 through 9 and [A-Z] matches any string containing an uppercase letter A through Z.
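The bracket examples above can be checked with a short Python sketch. The helper names are hypothetical; `re.search` is used because a bracket expression only needs to find one matching character anywhere in the string.

```python
import re


def contains_abc(text: str) -> bool:
    """True when the string contains at least one of a, b, or c."""
    return re.search(r"[abc]", text) is not None


def contains_combined(text: str) -> bool:
    """True when the string contains a-z, A-C, 4-6, or a literal +."""
    return re.search(r"[a-zA-C4-6+]", text) is not None


for sample in ["a", "cat", "bridges", "00c00", "dog", "ABC"]:
    print(sample, contains_abc(sample))
# a True, cat True, bridges True, 00c00 True, dog False, ABC False

print(contains_combined("5"))   # True
print(contains_combined("+"))   # True: + is a literal inside brackets
print(contains_combined("D7"))  # False: D and 7 fall outside every range
```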


Character Classes

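Bracket expressions are members of a broader category of regex components called Character Classes. A character class matches any one character from a set of characters. It can be indicated by brackets [ ], by a backslash shorthand such as \d (a digit), \w (a word character), or \s (whitespace), or by the dot ., which matches any character except a line break.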
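As a rough illustration in Python, the shorthand \d and the bracket expression [0-9] can be compared side by side. The function names are hypothetical, and the `re.ASCII` flag is an assumption added here so that \d is limited to 0 through 9 (Python's \d otherwise also accepts other Unicode digits).

```python
import re


def is_four_digits_shorthand(text: str) -> bool:
    # \d is the shorthand digit class; re.ASCII limits it to 0-9.
    return re.fullmatch(r"\d{4}", text, flags=re.ASCII) is not None


def is_four_digits_bracket(text: str) -> bool:
    # The same requirement written as a bracket expression.
    return re.fullmatch(r"[0-9]{4}", text) is not None


def is_single_any_char(text: str) -> bool:
    # The dot matches any one character except a newline (without re.DOTALL).
    return re.fullmatch(r".", text) is not None


print(is_four_digits_shorthand("2021"), is_four_digits_bracket("2021"))  # True True
print(is_four_digits_shorthand("20a1"), is_four_digits_bracket("20a1"))  # False False
print(is_single_any_char("#"))   # True
print(is_single_any_char("\n"))  # False
```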


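Lookahead and Lookbehind

A lookaround is an example of an assertion: it checks whether a pattern is present at a position without including that pattern in the match. A positive lookaround succeeds when its pattern is found and a negative lookaround succeeds when it is not. A lookahead checks the text that follows the current position and a lookbehind checks the text that precedes it.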

Regex       Lookaround
x(?=SA)     positive lookahead: matches x only when it is followed by SA
x(?!SA)     negative lookahead: matches x only when it is not followed by SA
(?<=SA)x    positive lookbehind: matches x only when it is preceded by SA
(?<!SA)x    negative lookbehind: matches x only when it is not preceded by SA

In the Dutch Postal Code Regex, a negative lookahead is used to exclude the three disallowed 2-letter combinations from the inclusive [A-Z]. The regex (?!SA|SD|SS) excludes SA, SD, or SS.
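The four lookarounds can be tried in Python with a small helper. This is a sketch for illustration: `first_match` and `allowed_letters` are hypothetical names, and the lookbehinds are written in front of the x they qualify, since a lookbehind inspects the text before the current position.

```python
import re


def first_match(pattern: str, text: str):
    """Return the matched text, or None when there is no match."""
    match = re.search(pattern, text)
    return match.group() if match else None


# Lookaheads: the SA that follows is checked but is not part of the match.
print(first_match(r"x(?=SA)", "xSA"))   # x
print(first_match(r"x(?!SA)", "xSA"))   # None
print(first_match(r"x(?!SA)", "xSB"))   # x

# Lookbehinds: the SA that precedes is checked but is not part of the match.
print(first_match(r"(?<=SA)x", "SAx"))  # x
print(first_match(r"(?<!SA)x", "SAx"))  # None
print(first_match(r"(?<!SA)x", "SBx"))  # x


def allowed_letters(letters: str) -> bool:
    """Two uppercase letters, rejecting SA, SD, and SS via a negative lookahead."""
    return re.fullmatch(r"(?!SA|SD|SS)[A-Z]{2}", letters) is not None


print(allowed_letters("AB"))  # True
print(allowed_letters("SA"))  # False
print(allowed_letters("AS"))  # True: only the exact pairs are excluded
```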


The OR Operator

The OR operator matches any one of several alternatives without using a bracket expression, and unlike a bracket expression, each alternative can be more than one character long. The OR is indicated with a pipe character, |. Using the example for bracket expressions above, the strings 'a', 'b', 'c', 'ac', 'cat', 'big', 'bridges', and '00c00' will all match the pattern (a|b|c).

The lookahead example above used the OR operator to exclude any of the three disallowed letter combinations, (SA|SD|SS). The OR operator is also used to include either a space OR a hyphen between the numbers and letters of the postal code.
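A short Python sketch of the OR operator follows. The function names are hypothetical, and the separator pattern assumes, for illustration only, that exactly one space or hyphen is required between the digits and the letters; the author's regex may treat the separator differently.

```python
import re


def matches_a_b_or_c(text: str) -> bool:
    """True when the string contains a, b, or c, written with alternation."""
    return re.search(r"(a|b|c)", text) is not None


def has_space_or_hyphen_separator(code: str) -> bool:
    """4 digits, then a space OR a hyphen, then 2 uppercase letters."""
    return re.fullmatch(r"\d{4}( |-)[A-Z]{2}", code) is not None


print(matches_a_b_or_c("bridges"))  # True
print(matches_a_b_or_c("dog"))      # False

print(has_space_or_hyphen_separator("1234 AB"))  # True
print(has_space_or_hyphen_separator("1234-AB"))  # True
print(has_space_or_hyphen_separator("1234_AB"))  # False

# Alternatives may be longer than one character, which brackets cannot express.
print(re.search(r"(cat|dog)", "hotdog").group())  # dog
```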


Character Escapes

Character escapes are used to indicate a literal character versus a key regex character. The combination of an asterisk preceded by a backslash, \*, will be interpreted as the actual * character and not as the "0 or more times" quantifier.

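A boundary escape, \b, is used in the Dutch Postal Code Regex to ensure that the 4 digits start at a word boundary, such as the beginning of the string or the position right after a space. If it were not used, the last 4 digits of any longer run of digits would also be matched.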
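Both kinds of escape can be seen in a brief Python sketch. The patterns below are simplified stand-ins rather than the author's full regex, and `find_code` is a hypothetical helper.

```python
import re

# An escaped asterisk is a literal character; an unescaped one is a quantifier.
print(re.fullmatch(r"a\*", "a*") is not None)   # True: a followed by a literal *
print(re.fullmatch(r"a\*", "aaa") is not None)  # False
print(re.fullmatch(r"a*", "aaa") is not None)   # True: a repeated 0 or more times


def find_code(pattern: str, text: str):
    """Return the first match of the pattern in the text, or None."""
    match = re.search(pattern, text)
    return match.group() if match else None


# Without \b, the tail of a 5-digit number is accepted as a 4-digit code.
print(find_code(r"\d{4}[A-Z]{2}", "12345AB"))    # 2345AB
# With \b, the digits must begin at a word boundary, so nothing matches.
print(find_code(r"\b\d{4}[A-Z]{2}", "12345AB"))  # None
print(find_code(r"\b\d{4}[A-Z]{2}", "to 1234AB"))  # 1234AB
```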




Shelley McHardy is a student in the Georgia Tech Coding Bootcamp looking forward to her Full Stack Web Developer Certificate in October.
