@ahmadelgamal
Last active December 19, 2023 14:56
Regex Tutorial

This tutorial is a quick reference for common regular expression (regex) components and how to use them.

Summary

Regular expressions are used to validate input by searching for (matching) specific patterns in a given string. This is helpful in many applications, for example to check whether an email input or a URL entered by a user is in the correct format.

Although they may seem confusing at first, with practice they become easy to work with and can be very powerful in making your code less susceptible to user error.
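
As a rough illustration of input validation, here is a minimal sketch of an email check. The pattern and the name looksLikeEmail are assumptions made for this example; the pattern is deliberately simplified and accepts or rejects far less precisely than a production validator would.

```javascript
// Deliberately simplified, hypothetical pattern:
// some text, an @, some text, a dot, some text (no spaces or extra @).
const SIMPLE_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical helper: true when the whole input fits the pattern.
function looksLikeEmail(input) {
  return SIMPLE_EMAIL.test(input);
}

console.log(looksLikeEmail("user@example.com")); // true
console.log(looksLikeEmail("not an email"));     // false
```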

Anchors

| Operator | Pattern | Matches a string that | Match Example |
| --- | --- | --- | --- |
| plain text | `abc` | has abc in it | xyzabcdef |
| `^` | `^abc` | starts with abc | abcdefg |
| `$` | `abc$` | ends with abc | xyzabc |
| `^ $` | `^abc$` | is an exact match | abc |
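
The four anchor patterns above can be tried out with a small sketch like the following. The helper name anchorReport and the sample strings are only for illustration.

```javascript
// Hypothetical helper: reports how each anchor pattern treats the text.
function anchorReport(text) {
  return {
    contains: /abc/.test(text),    // abc anywhere in the string
    startsWith: /^abc/.test(text), // abc at the start
    endsWith: /abc$/.test(text),   // abc at the end
    exact: /^abc$/.test(text),     // nothing but abc
  };
}

for (const sample of ["xyzabcdef", "abcdefg", "xyzabc", "abc"]) {
  console.log(sample, anchorReport(sample));
}
```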

Quantifiers

| Operator | Pattern | Matches a string that has | Match Examples |
| --- | --- | --- | --- |
| `*` | `abc*` | ab followed by 0 or more c | ab, abc, etc. |
| `+` | `abc+` | ab followed by 1 or more c | abc, abcc, etc. |
| `?` | `abc?` | ab followed by 0 or 1 c | ab and abc |
| `{n}` | `abc{2}` | ab followed by exactly 2 c | abcc |
| `{n,}` | `abc{2,}` | ab followed by 2 or more c | abcc, abccc, etc. |
| `{n,m}` | `abc{2,5}` | ab followed by 2 to 5 c | abcc ... abccccc |
| `()*` | `a(bc)*` | a followed by 0 or more bc | a, abc, abcbc, etc. |
| `(){n,m}` | `a(bc){2,5}` | a followed by 2 to 5 bc | abcbc ... abcbcbcbcbc |
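
To see exactly how many repetitions each quantifier allows, the sketch below anchors every pattern so that the whole string has to fit. The helper matchesWhole is an assumption made for this example, not part of any library.

```javascript
// Hypothetical helper: wraps the pattern in ^(?:...)$ so the whole string
// must fit it, not just a part of the string.
function matchesWhole(pattern, text) {
  return new RegExp("^(?:" + pattern.source + ")$").test(text);
}

console.log(matchesWhole(/abc*/, "ab"));            // true: zero c is allowed
console.log(matchesWhole(/abc+/, "ab"));            // false: at least one c is required
console.log(matchesWhole(/abc?/, "abcc"));          // false: at most one c
console.log(matchesWhole(/abc{2}/, "abcc"));        // true: exactly two c
console.log(matchesWhole(/abc{2,5}/, "abcccccc"));  // false: six c is too many
console.log(matchesWhole(/a(bc){2,5}/, "abc"));     // false: bc appears only once
console.log(matchesWhole(/a(bc){2,5}/, "abcbc"));   // true: bc appears twice
```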

OR Operator

| Operator | Pattern | Matches a string that has | Match Example |
| --- | --- | --- | --- |
| `\|` | `a(b\|c)` | a followed by b or c, and captures b or c | xyzabxyz, xyzacxyz, etc. |
| `[]` | `a[bc]` | (same as above) but does not capture b or c | (same as above) |
  • Capturing means saving the collected value in order to use it later.
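
The difference between the two forms shows up in the array returned by match(). A minimal sketch, using one of the sample strings from the table:

```javascript
const withGroup = "xyzabxyz".match(/a(b|c)/); // parentheses capture b or c
const withClass = "xyzabxyz".match(/a[bc]/);  // brackets match but do not capture

console.log(withGroup[0]);     // "ab": the whole match
console.log(withGroup[1]);     // "b": the captured value, saved for later use
console.log(withClass[0]);     // "ab": the same whole match
console.log(withClass.length); // 1: there is no captured value
```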

Character Classes

| Operator | Description | Match Example |
| --- | --- | --- |
| `\d` | matches a single digit character | 0...9 |
| `\D` | matches a single non-digit character | |
| `\w` | matches a single word character | a...z, A...Z, 0...9, _ |
| `\W` | matches a single non-word character | |
| `\s` | matches a single whitespace character (includes tabs and line breaks) | |
| `\S` | matches a single non-whitespace character | |
| `.` | matches any single character except a line break | |
  • The backslash (\) is used to escape special characters so that they match literally. For example, \. matches a period.
  • Tabs are symbolized with \t
  • Line breaks are symbolized with \n (new line) on most operating systems. However, old Mac OS expects \r (carriage return) and Windows expects \r\n.
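
A short sketch of the character classes in use. The sample string and the helper splitLines are made up for illustration; splitLines covers the three line-break styles mentioned above.

```javascript
const sample = "Room 42_b\tis open.";

const digits = sample.match(/\d/g);  // every single digit
const words = sample.match(/\w+/g);  // runs of word characters
const dots = sample.match(/\./g);    // escaped dot: literal periods only

console.log(digits); // [ '4', '2' ]
console.log(words);  // [ 'Room', '42_b', 'is', 'open' ]
console.log(dots);   // [ '.' ]

// Hypothetical helper: splits on \r\n (Windows), \r (old Mac OS), or \n.
// \r\n is listed first so that it is not split into two separate breaks.
function splitLines(text) {
  return text.split(/\r\n|\r|\n/);
}

console.log(splitLines("a\r\nb\rc\nd")); // [ 'a', 'b', 'c', 'd' ]
```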

Flags

| Flag | Meaning | Pattern | Description | Match Example |
| --- | --- | --- | --- | --- |
| `g` | global | `/abc/g` | Continues searching from the end of the previous match, instead of returning after the first match | abcdefabcghiabc, etc. |
| `m` | multi-line | `/^abc/m` | Makes ^ and $ match the start and end of a line, instead of the whole string | abcdef |
| `i` | insensitive | `/aBc/i` | Makes the whole expression case-insensitive | AbC |
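
The effect of each flag is easiest to see by running the same search with and without it. A minimal sketch over a made-up three-line string:

```javascript
const text = "abc one\nABC two\nabc three";

const first = text.match(/abc/);           // no g: stops after the first match
const all = text.match(/abc/g);            // g: every lowercase abc
const noMultiline = text.match(/^abc/g);   // no m: ^ is the start of the string only
const lineStarts = text.match(/^abc/gm);   // m: ^ is the start of every line
const anyCase = text.match(/abc/gi);       // i: ABC counts as well

console.log(first[0], first.index); // abc 0
console.log(all);                   // [ 'abc', 'abc' ]
console.log(noMultiline);           // [ 'abc' ]
console.log(lineStarts);            // [ 'abc', 'abc' ]
console.log(anyCase);               // [ 'abc', 'ABC', 'abc' ]
```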

Grouping and Capturing

| Operator | Pattern | Description |
| --- | --- | --- |
| `()` | `a(bc)` | Creates a capturing group with value bc |
| `(?:)` | `a(?:bc)*` | Groups without capturing (disables the capturing group) |
| `(?<foo>)` | `a(?<foo>bc)` | Names the group. In this case it names it foo |
  • Capturing groups allow us to collect the data in an array and access each value using the index of its group.
  • The names of the groups are the keys under which the named groups are stored.
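
The sketch below shows where each kind of group ends up in the result of match(). It assumes a JavaScript engine that supports named groups (ES2018 or later).

```javascript
const captured = "xabcx".match(/a(bc)/);       // capturing group
const uncaptured = "xabcx".match(/a(?:bc)/);   // non-capturing group
const named = "xabcx".match(/a(?<foo>bc)/);    // named capturing group

console.log(captured[0]);       // "abc": the whole match
console.log(captured[1]);       // "bc": group 1, accessed by index
console.log(uncaptured.length); // 1: only the whole match is stored
console.log(named.groups.foo);  // "bc": accessed by name
console.log(named[1]);          // "bc": a named group still has an index
```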

Bracket Expressions

| Operator | Pattern | Description | Match Example |
| --- | --- | --- | --- |
| `[]` | `[abc]` | same as `a\|b\|c` | a, b, c |
| same | `[a-c]` | same as above | same as above |
| same | `[a-zA-Z0-9]` | any alphanumeric character | a...z, A...Z, 0...9 |
| same | `[^a-zA-Z]` | ^ negates the expression. Matches any single character that is not a letter from a to z or from A to Z | |
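  • Most special characters, such as . * + and ?, lose their special meaning inside brackets and match literally. The exceptions are \ (still escapes), ] (closes the brackets), ^ (negates when it comes first), and - (forms a range between two characters).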
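
A few bracket expressions applied to made-up strings, as a minimal sketch:

```javascript
// Keep only the alphanumeric characters.
const alnum = "a-b_c9!".match(/[a-zA-Z0-9]/g);

// Remove every character that is not a letter.
const lettersOnly = "ab1 c".replace(/[^a-zA-Z]/g, "");

// Inside brackets the dot is a literal period, not "any character".
const dotInAbc = /[.]/.test("abc");
const dotInAc = /[.]/.test("a.c");

console.log(alnum);       // [ 'a', 'b', 'c', '9' ]
console.log(lettersOnly); // "abc"
console.log(dotInAbc);    // false
console.log(dotInAc);     // true
```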

Greedy and Lazy Match

| Operator | Description |
| --- | --- |
| `<.+?>` | matches any character, as few times as possible, between < and > |
| `<[^<>]+>` | matches any character, except < or >, as many times as possible, between < and > |
  • Greedy quantifiers (* + {}) expand the match as far as possible.
  • Adding ? after a quantifier (*? +? {}?) makes it lazy, so it matches as few characters as possible.
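
The difference is easiest to see on a string with more than one pair of angle brackets. A minimal sketch with a made-up HTML fragment:

```javascript
const html = "<b>bold</b>";

const greedy = html.match(/<.+>/)[0];      // runs to the last >
const lazy = html.match(/<.+?>/)[0];       // stops at the first >
const negated = html.match(/<[^<>]+>/g);   // each tag, no ? needed

console.log(greedy);  // "<b>bold</b>"
console.log(lazy);    // "<b>"
console.log(negated); // [ '<b>', '</b>' ]
```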

Boundaries

| Operator | Pattern | Description |
| --- | --- | --- |
| `\b` | `\babc\b` | Matches whole words only (in this example, the word is abc). Each side of the word must be a non-word character, such as a space, or the start or end of the string or line |
| `\B` | `\Babc\B` | Negation of above. It matches the pattern only when it is surrounded by word characters |
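
A minimal sketch of both boundary operators, using made-up test strings:

```javascript
const wholeWord = /\babc\b/;
const insideWord = /\Babc\B/;

console.log(wholeWord.test("say abc now")); // true: spaces on both sides
console.log(wholeWord.test("abc"));         // true: start and end of the string
console.log(wholeWord.test("xabcx"));       // false: abc is part of a longer word
console.log(insideWord.test("xabcx"));      // true: word characters on both sides
console.log(insideWord.test("say abc now")); // false: abc stands alone
```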

Back-references

| Operator | Pattern | Description |
| --- | --- | --- |
| `\1` | `([abc])\1` | matches the same text that was matched by the first capturing group |
| `\2` | `([abc])([de])\2\1` | \2 matches the same text that was matched by the second capturing group, then \1 matches the text of the first |
| `\k<foo>` | `(?<foo>[abc])\k<foo>` | names the group foo, then references it later. Same as \1 |
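
A back-reference matches the text that the group actually captured, not the pattern of the group. The sketch below checks this with made-up strings and assumes an engine that supports named groups (ES2018 or later).

```javascript
const doubled = /([abc])\1/;
const mirrored = /([abc])([de])\2\1/;
const namedDoubled = /(?<foo>[abc])\k<foo>/;

console.log(doubled.test("aa"));      // true: a, then the same a again
console.log(doubled.test("ab"));      // false: b fits [abc] but is not the captured a
console.log(mirrored.test("adda"));   // true: a, d, then d (\2), then a (\1)
console.log(mirrored.test("adad"));   // false: after ad, \2 requires d but finds a
console.log(namedDoubled.test("cc")); // true: same idea, referenced by name
```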

Look-ahead and Look-behind

| Operator | Pattern | Description |
| --- | --- | --- |
| `(?=)` | `a(?=b)` | matches an a only if it is followed by a b |
| `(?!)` | `a(?!b)` | (negation of above) matches an a only if it is not followed by a b |
| `(?<=)` | `(?<=b)a` | matches an a only if it follows a b |
| `(?<!)` | `(?<!b)a` | (negation of above) matches an a only if it does not follow a b |
  • In all 4 scenarios above, the b is only checked for; it is not included in the regex match.
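
Replacing each matched a with X makes it visible that the b is checked but left out of the match. This is a minimal sketch with made-up strings; it assumes an engine that supports look-behind (ES2018 or later).

```javascript
const followedByB = "ab ac".replace(/a(?=b)/g, "X");     // look-ahead
const notFollowedByB = "ab ac".replace(/a(?!b)/g, "X");  // negative look-ahead
const afterB = "ba ca".replace(/(?<=b)a/g, "X");         // look-behind
const notAfterB = "ba ca".replace(/(?<!b)a/g, "X");      // negative look-behind

console.log(followedByB);    // "Xb ac": the b stays in place
console.log(notFollowedByB); // "ab Xc"
console.log(afterB);         // "bX ca"
console.log(notAfterB);      // "ba cX"
```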

Notes

  • In JavaScript, regex are often used with the match() method, such as string.match(/[0-9]/).
  • In JavaScript, regex search patterns are written between two forward slash (/) characters.
  • Operators can be combined together.
  • Flags can also be combined together and can be combined with operators.
  • You can turn any pattern being matched into a token by enclosing the pattern in parentheses within the expression. For example, to create a token for a dollar amount, you could use (\$\d+). Each token in the expression is assigned a number, starting at 1, going from left to right.
  • To make a reference to a token later in the expression, refer to it using a backslash followed by the token number. For example, when referencing a token generated by the third set of parentheses in the expression, use \3.
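
The dollar-amount token from the notes above can be sketched as follows. The sample strings are made up; the last line uses $1, which is how String.prototype.replace refers to token 1 in a replacement string.

```javascript
// Token 1 is the dollar amount.
const amount = "Total: $45 due".match(/(\$\d+)/);
console.log(amount[1]); // "$45"

// \1 refers back to token 1 inside the same expression.
const sameTwice = /(\$\d+) = \1/;
console.log(sameTwice.test("$45 = $45")); // true
console.log(sameTwice.test("$45 = $50")); // false

// In a replacement string, $1 inserts the text of token 1.
const bracketed = "Total: $45".replace(/(\$\d+)/, "[$1]");
console.log(bracketed); // "Total: [$45]"
```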

Credits

The main source for this tutorial is Johnny Fox's article, "Regex tutorial - A quick cheatsheet by examples", published at: https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285.

Author

Please send your questions and / or comments to Ahmad El Gamal at ahmadelgamal@gmail.com, or contact me on GitHub.
