@Bluekev22
Last active September 21, 2021 00:41
Regex Tutorial: Matching a phone number

In this tutorial we will take a dive into regex. Regular expressions, or regex, are patterns used to search for character combinations in strings. They are used by string-searching algorithms and for input validation.

Summary

Today we are going to take a look at how to build a regex that checks whether a phone number is valid:

/(?:(\+1)[ -]?)?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/

It may look daunting at first, so let's break down what's going on here to get a better understanding.

Regex Components

Character Classes

A character class, or character set, tells the regex engine to match exactly one character out of a set of characters, such as digits, word characters, or whitespace.

  • \d - matches a single character that is a digit
  • \w - matches a word character (any alphanumeric character, including underscore)
  • \s - matches a whitespace character (including tabs and line breaks)
  • . - matches any character except a line break (wildcard)
  • the uppercase form of \d, \w, or \s (\D, \W, \S) inverts the match
  • Examples:
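\d    matches any single digit 0-9
\w    matches any single letter a-z or A-Z, digit 0-9, or underscore
\s    matches a single whitespace character such as ` ` or a tab
.     matches any single character except a line break
\D    matches any single non-digit character
\W    matches any single non-word character
\S    matches any single non-whitespace character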

So, to search for any single digit we would create a regex like so

/\d/
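As a quick way to see these classes in action, here is a minimal sketch in JavaScript, which uses the same /.../ literal syntax as this tutorial. The variable names are only illustrative.

```javascript
// Sketch: probing the shorthand character classes with test() and match().
const digit = /\d/;

console.log(digit.test("abc"));   // false: no digit anywhere in the string
console.log(digit.test("abc7"));  // true: "7" is a digit

// Without the g flag, match() returns only the first match.
const firstDigit = "call 555 now".match(/\d/)[0];
console.log(firstDigit);          // "5"

// The uppercase form inverts the class: \D finds the first non-digit.
const firstNonDigit = "123-456".match(/\D/)[0];
console.log(firstNonDigit);       // "-"

// \w includes the underscore; \s includes tabs as well as spaces.
const underscoreIsWord = /\w/.test("_");
const tabIsSpace = /\s/.test("a\tb");
console.log(underscoreIsWord, tabIsSpace); // true true
```

Note that /\d/ only asks whether a digit appears somewhere in the string; it says nothing about the characters around it.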

Quantifiers

Quantifiers are characters within the regex that specify how many times the character, group, or character class that precedes them must appear in the input to be matched.

  • * - matches the pattern zero or more times

  • + - matches the pattern one or more times

  • ? - matches the pattern zero or one time

  • {n} - matches the pattern exactly n times

  • {n,} - matches the pattern at least n times

  • {n,x} - matches the pattern at least n times and at most x times

  • ()* - matches zero or more copies of the sequence within the parentheses

  • Examples:

xyz*        matches a string that has xy followed by zero or more z
xyz+        matches a string that has xy followed by one or more z
xyz?        matches a string that has xy followed by zero or one z
xyz{2}      matches a string that has xy followed by 2 z
xyz{2,}     matches a string that has xy followed by 2 or more z
xyz{2,5}    matches a string that has xy followed by 2 up to 5 z
x(yz)*      matches a string that has x followed by zero or more copies of the sequence yz
x(yz){2,5}  matches a string that has x followed by 2 up to 5 copies of the sequence yz
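The examples above can be checked with a short JavaScript sketch. One assumption to flag: the ^ and $ anchors used here are not covered in this tutorial. They pin the pattern to the start and end of the string, so that the counts are tested exactly rather than against any substring.

```javascript
// Each row: [pattern, input, expected result of pattern.test(input)].
const cases = [
  [/^xyz*$/, "xy", true],             // zero z's is allowed by *
  [/^xyz+$/, "xy", false],            // + needs at least one z
  [/^xyz+$/, "xyzzz", true],
  [/^xyz?$/, "xyzz", false],          // ? allows at most one z
  [/^xyz{2}$/, "xyzz", true],         // exactly two z's
  [/^xyz{2,}$/, "xyz", false],        // needs two or more z's
  [/^xyz{2,5}$/, "xyzzzzzz", false],  // six z's exceeds the maximum of five
  [/^x(yz)*$/, "xyzyz", true],        // the group (yz) repeats as a unit
  [/^x(yz){2,5}$/, "xyz", false],     // only one copy of yz
];

const results = cases.map(([pattern, input]) => pattern.test(input));

cases.forEach(([pattern, input, expected], i) => {
  console.log(`${pattern} on "${input}": ${results[i]} (expected ${expected})`);
});
```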

Let's take a look at a simple phone number:

1234567890

To match this, we specify that we want exactly ten digits, like this

/\d{10}/

But phone numbers can be written in many valid ways. What if we wanted to allow hyphens?

123-456-7890

We would just add a hyphen - followed by the quantifier ? to make it optional, placing one between the groups of 3, 3, and 4 digits

/\d{3}-?\d{3}-?\d{4}/
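A small JavaScript sketch of this step, with illustrative variable names, shows that each hyphen is optional independently of the other and that spaces are not yet accepted:

```javascript
const hyphenated = /\d{3}-?\d{3}-?\d{4}/;

const plain = hyphenated.test("1234567890");     // no hyphens at all
const dashed = hyphenated.test("123-456-7890");  // both hyphens present
const mixed = hyphenated.test("123-4567890");    // only the first hyphen
const spaced = hyphenated.test("123 456 7890");  // spaces are not handled yet

console.log(plain, dashed, mixed, spaced); // true true true false
```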

But sometimes phone numbers are written with spaces like this

123 456 7890

We can also account for that space, but for that we need to learn about bracket expressions first!

Bracket Expressions

A bracket expression is a list of characters enclosed in brackets [] that matches any single character from the list. Note: if the first character within the brackets is a ^, the expression matches any single character not in the list, and it is unspecified whether it matches an encoding error.

Examples of Bracket Expressions are as follows:

  • [] - matches any single character within the brackets
  • []% - matches a single character from within the brackets followed by a %
  • [^] - matches any single character that is not within the brackets (negation of the expression)
  • Examples:
[xyz]         matches a single x, y, or z (same as x|y|z)
[x-z]         same as above, written as a range
[a-fA-F0-9]   matches a single hexadecimal digit, case insensitively
[0-9]%        matches a digit from 0-9 followed by a %
[^a-zA-Z]     matches a single character that is not a letter from a to z or from A to Z

With that in mind, we can enclose the space and the hyphen within brackets. Because the hyphen comes last, it is treated as a literal hyphen rather than as a range

[ -]?

And then add that to what we have so far

/\d{3}[ -]?\d{3}[ -]?\d{4}/
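Here is a minimal JavaScript sketch of the bracket expression at work. The sample strings are made up for illustration, and the g flag in the second half (replace every match, not just the first) is a JavaScript feature this tutorial does not cover.

```javascript
const separated = /\d{3}[ -]?\d{3}[ -]?\d{4}/;

const samples = ["1234567890", "123-456-7890", "123 456 7890", "123.456.7890"];
const matches = samples.map((s) => separated.test(s));
console.log(matches); // [ true, true, true, false ]: dots are not in [ -]

// A negated bracket expression is handy for stripping separators:
// [^0-9] matches any single character that is not a digit.
const digitsOnly = "123-456 7890".replace(/[^0-9]/g, "");
console.log(digitsOnly); // "1234567890"
```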

Grouping and Capturing

Grouping treats part of a pattern as a single unit, so that a quantifier applies to the whole group and the text it matches can be captured.

Examples of Grouping are as follows:

  • () - parentheses create a capturing group
  • (?:) - using ?: makes the group non-capturing
  • (?<name>) - using ?<name> gives the group a name
  • Examples:
x(yz)           parentheses create a capturing group with value yz
x(?:yz)*        using ?: we disable the capturing group
x(?<bar>yz)     using ?<bar> we put a name to the group
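To make the difference between the three kinds of group concrete, here is a short JavaScript sketch. It assumes a runtime with named capture groups (any recent Node.js or browser); the variable names are illustrative.

```javascript
// Capturing group: the result holds the full match plus one capture.
const captured = "xyz".match(/x(yz)/);
console.log(captured[0], captured[1]); // "xyz" "yz"

// Non-capturing group: the same text matches, but nothing is captured,
// so the result holds only the full match.
const nonCaptured = "xyz".match(/x(?:yz)/);
console.log(nonCaptured.length);       // 1

// Named group: the capture is also available by name under .groups.
const named = "xyz".match(/x(?<bar>yz)/);
console.log(named.groups.bar);         // "yz"
```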

Say we want to match not only the whole phone number but also the individual groups of digits, so that we can format the number exactly the way we want. That's where we place parentheses around each group of digits

/(\d{3})[ -]?(\d{3})[ -]?(\d{4})/
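One way the captured groups could be used for formatting is sketched below in JavaScript. The function name formatPhone and the output layout are assumptions made for this example, not part of the tutorial's regex.

```javascript
const phone = /(\d{3})[ -]?(\d{3})[ -]?(\d{4})/;

// Hypothetical helper: rewrites any accepted spelling as "(123) 456-7890".
function formatPhone(input) {
  const match = input.match(phone);
  if (match === null) {
    return null; // no phone number found
  }
  // match[0] is the whole match; the captures start at index 1.
  const [, area, prefix, line] = match;
  return `(${area}) ${prefix}-${line}`;
}

console.log(formatPhone("123 456-7890")); // "(123) 456-7890"
console.log(formatPhone("1234567890"));   // "(123) 456-7890"
console.log(formatPhone("12345"));        // null
```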

But what if the area code has parentheses around it like this?

(123) 456-7890

To match special characters, we need to use character escapes

Character Escapes

A backslash \ is used in regular expressions to escape the next character, so that a character that would otherwise be interpreted as special syntax is matched literally. This allows us to include reserved characters such as { } [ ] ( ) / \ + * . $ ^ | ? as matching characters. To use one of these special characters as a matching character, prepend it with \.

So in our example, we place a backslash before each parenthesis around the area code, followed by ? to make each one optional

/\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/
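The effect of escaping can be sketched in JavaScript as follows. The variable names and sample strings are illustrative.

```javascript
// Unescaped, + is a quantifier: /1+1/ means "one or more 1s, then a 1".
const unescapedOnSum = /1+1/.test("1+1");   // false: there is no "11" in "1+1"
const unescapedOnOnes = /1+1/.test("11");   // true
// Escaped, \+ is a literal plus sign.
const escapedOnSum = /1\+1/.test("1+1");    // true
console.log(unescapedOnSum, unescapedOnOnes, escapedOnSum); // false true true

const withParens = /\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/;
const areaCode = "(123) 456-7890".match(withParens)[1];
console.log(areaCode); // "123": the escaped parentheses are not part of the capture

// Each parenthesis is optional on its own, so an unbalanced one still matches.
const unbalanced = withParens.test("(123 456-7890");
console.log(unbalanced); // true
```

The last check is worth keeping in mind: the pattern accepts an opening parenthesis without a closing one, which may or may not be acceptable depending on how strict the validation needs to be.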

There's one more thing to account for. Sometimes the international calling code appears like this

+1 123 456 7890

In this case we escape the plus sign to get \+1 and capture it as (\+1). We then allow an optional space or hyphen after it and make the whole prefix optional ((\+1)[ -]?)?

The outer parentheses exist only to hold the prefix and its separator together, so we disable capturing on them with ?:, like so

(?:(\+1)[ -]?)?

After that, we're all done and can bring it all together!

/(?:(\+1)[ -]?)?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/
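As written, the pattern finds a phone number anywhere inside a string. To validate that the entire input is a phone number, one option is to anchor it with ^ and $, which are not covered in this tutorial. The JavaScript sketch below does that; the names strictPhone and isValidPhone are assumptions made for this example.

```javascript
const phonePattern = /(?:(\+1)[ -]?)?\(?(\d{3})\)?[ -]?(\d{3})[ -]?(\d{4})/;

// Build an anchored copy so the whole input must be a phone number.
const strictPhone = new RegExp(`^(?:${phonePattern.source})$`);

// Hypothetical helper for input validation.
function isValidPhone(input) {
  return strictPhone.test(input);
}

// Unanchored, the pattern matches a number embedded in other text.
console.log(phonePattern.test("call 1234567890 now")); // true
console.log(isValidPhone("call 1234567890 now"));      // false

console.log(isValidPhone("1234567890"));      // true
console.log(isValidPhone("(123) 456-7890"));  // true
console.log(isValidPhone("+1 123 456 7890")); // true
console.log(isValidPhone("123456789012"));    // false: too many digits

// The captures give the calling code, area code, prefix, and line number.
const parts = "+1 (123) 456-7890".match(strictPhone).slice(1);
console.log(parts); // [ '+1', '123', '456', '7890' ]

// When the calling code is absent, its capture is undefined.
const noCode = "123-456-7890".match(strictPhone)[1];
console.log(noCode); // undefined
```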

Resources

This was a simple example, and there are many other regex components to learn. So check out this website for more:

Learn Regex

You can also test and build on what you've learned with this regex testing website:

Regex Testing

Author

Kevin Shank is a web developer enrolled in Michigan State University's full stack coding bootcamp.

Feel free to check out his GitHub repo for all of his projects:

GitHub Profile
