@janimuhlestein
Created November 24, 2020 07:56
An analysis of the regular expression for verifying a social security number.

Defining a search pattern within a block of text.

Regular expressions are sequences of characters that define a pattern used to search for specific strings within text. They have several uses, such as finding particular kinds of text within a string and verifying user-entered data such as card numbers, phone numbers, and email addresses.

Summary

The regex I have chosen to explain is one for verifying the format of a social security number. The regex itself is ^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$.
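
As a quick way to see the pattern at work, here is a minimal sketch in Python. The choice of Python, the helper name is_valid_ssn, and the sample values are assumptions made for illustration; only the pattern itself comes from this write-up.

```python
import re

# The pattern is copied verbatim from the summary above.
SSN_PATTERN = re.compile(r"^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")


def is_valid_ssn(text: str) -> bool:
    """Return True when text has the shape the pattern accepts."""
    return SSN_PATTERN.match(text) is not None


# Made-up sample values: one accepted, the rest rejected for different reasons.
samples = [
    "123-45-6789",  # accepted
    "666-45-6789",  # first section is 666
    "000-45-6789",  # first section is 000
    "923-45-6789",  # first section is in 900-999
    "123-00-6789",  # second section is 00
    "123-45-0000",  # third section is 0000
    "123456789",    # dashes are missing
]
for sample in samples:
    print(sample, is_valid_ssn(sample))
```

The pattern only checks the shape of the string. It cannot tell whether a number was ever actually issued.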

Regex Components

Anchors

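The pattern starts with ^, which anchors the match to the beginning of the string, and ends with $, which anchors it to the end. Together they require the entire input to be a social security number, with nothing before or after it.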
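
To see what the anchors contribute, the sketch below compares the pattern with and without ^ and $ on a longer piece of text. The names ANCHORED and UNANCHORED and the sample sentence are made up for this illustration.

```python
import re

# Same pattern, with and without the ^ and $ anchors.
ANCHORED = re.compile(r"^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")
UNANCHORED = re.compile(r"(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}")

text = "SSN: 123-45-6789 (on file)"

# With anchors, the whole string must be the number, so this finds nothing.
anchored_hit = ANCHORED.search(text) is not None

# Without anchors, the number is found in the middle of the text.
unanchored_hit = UNANCHORED.search(text) is not None

print(anchored_hit, unanchored_hit)
```

For validating a form field, the anchored version is the one you want. The unanchored version is closer to what you would use to find numbers inside a document.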

Quantifiers

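The quantifiers in this regex are the counts in curly braces: {2}, {3}, and {4}. Each one requires exactly that many repetitions of the token before it.

After the opening check we have \d{3}, requiring exactly three digits to start. Then come \d{2} and \d{4}, requiring two and four digits after the dashes.

Quantifiers also appear inside the checks themselves. In (?!666|000|9\d{2}), the 9\d{2} stands for a 9 followed by exactly two digits, which covers 900 through 999, so the first three digits may not be 666, 000, or anything from 900 to 999. In (?!0{4}), the 0{4} stands for four zeros in a row, which guarantees that the last four digits are not 0000.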
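
A small sketch of how a fixed-count quantifier behaves, using Python. The variable names and test strings are assumptions for illustration, and the anchors are added only so that each string is checked as a whole.

```python
import re

# \d{3} means exactly three digits; the anchors make the count strict.
three_digits = re.compile(r"^\d{3}$")
for candidate in ["12", "123", "1234"]:
    print(candidate, three_digits.match(candidate) is not None)

# 0{4} means exactly four zeros in a row.
four_zeros = re.compile(r"^0{4}$")
for candidate in ["000", "0000", "0001"]:
    print(candidate, four_zeros.match(candidate) is not None)
```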

OR Operator

The OR operator | is used in the opening check (?!666|000|9\d{2}) to list the alternatives that the first three digits may not match: 666, OR 000, OR 900 through 999. If any one of them matches, the check fails.
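
The alternation can be tried on its own, outside the full pattern. In this sketch the names EXCLUDED_AREA and area_is_excluded are hypothetical, and fullmatch is used so that each three-digit string is compared as a whole.

```python
import re

# The three alternatives from the opening check, separated by the OR operator.
EXCLUDED_AREA = re.compile(r"666|000|9\d{2}")


def area_is_excluded(area: str) -> bool:
    """Return True when the three digits match any one of the alternatives."""
    return EXCLUDED_AREA.fullmatch(area) is not None


for area in ["666", "000", "900", "999", "123", "899"]:
    print(area, area_is_excluded(area))
```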

Character Classes

The \d character class matches any single digit, so only digits are accepted in each numeric section.
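
One detail depends on the regex engine. In Python 3, \d applied to ordinary text strings also matches digits from other scripts unless the re.ASCII flag is set. The sketch below illustrates that Python-specific behavior; other engines may differ, and the variable names are made up for the example.

```python
import re

unicode_digits = re.compile(r"^\d{3}$")          # Python 3 default for str patterns
ascii_digits = re.compile(r"^\d{3}$", re.ASCII)  # restrict \d to 0-9

# Three Arabic-Indic digits, written with escapes to keep the source ASCII.
arabic_indic = "\u0661\u0662\u0663"

unicode_hit = unicode_digits.match(arabic_indic) is not None
ascii_hit = ascii_digits.match(arabic_indic) is not None

print(unicode_hit, ascii_hit)
```

If only the characters 0 through 9 should be accepted, passing re.ASCII (or writing [0-9] instead of \d) is one way to get that in Python.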

Flags

No flags are required or used.

Grouping and Capturing

The parentheses in this regex group conditions, but none of them capture text. Each group begins with ?!, which makes it a negative look-ahead rather than a capturing group. (?!666|000|9\d{2}) groups the values that the first three digits must not be. (?!00) groups the requirement that the second set of digits cannot be 00. Then we have another group, (?!0{4}), to say that the last set of digits cannot be 0000.
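
Groups that start with (?! check a condition without capturing any text. The sketch below counts the capturing groups in the pattern and then shows a variant with plain parentheses added around each digit section. That variant is an assumption made for illustration and is not part of the regex being analyzed.

```python
import re

ORIGINAL = re.compile(r"^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")

# Hypothetical variant: plain parentheses added so each section is captured.
CAPTURING = re.compile(r"^(?!666|000|9\d{2})(\d{3})-(?!00)(\d{2})-(?!0{4})(\d{4})$")

# Pattern.groups is the number of capturing groups in a compiled pattern.
print(ORIGINAL.groups)   # look-ahead groups are not counted
print(CAPTURING.groups)

match = CAPTURING.match("123-45-6789")
sections = match.groups() if match else ()
print(sections)
```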

Bracket Expressions

There are none required.

Greedy and Lazy Match

Greedy and lazy matching only matter for quantifiers that allow a variable number of repetitions, such as *, +, or {2,4}. Every quantifier in this regex is a fixed count, so there is nothing to be greedy or lazy about: \d{3} matches exactly three digits (except the ones already excluded), \d{2} matches exactly two digits (except 00, which is excluded by the check before it), and \d{4} matches exactly four digits (except 0000). The last section therefore matches anything from 0001 to 9999.
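
Greedy and lazy behavior only shows up when a quantifier allows a range of counts. As a rough illustration, the sketch below uses {2,4}, a range quantifier that does not appear in the social security regex, next to the fixed {4} that does.

```python
import re

text = "123456"

# Greedy: take as many digits as the range allows.
greedy = re.match(r"\d{2,4}", text).group()

# Lazy: the trailing ? makes the quantifier take as few as the range allows.
lazy = re.match(r"\d{2,4}?", text).group()

# Fixed count: there is only one possible length, so there is no choice to make.
fixed = re.match(r"\d{4}", text).group()

print(greedy, lazy, fixed)
```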

Boundaries

None used.

Back-references

None

Look-ahead and Look-behind

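This regex uses three negative look-aheads: (?!666|000|9\d{2}), (?!00), and (?!0{4}). A negative look-ahead checks that the text that follows does not match the pattern inside it, without consuming any characters. No look-behinds are used.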
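
The (?!...) groups in the pattern are negative look-aheads. A minimal Python sketch of how one behaves is shown here. The variable names and sample strings are made up, and the key point is that the check itself consumes no characters.

```python
import re

# On its own, a negative look-ahead matches an empty span when the check passes.
passed = re.match(r"(?!666)", "123-45-6789")
print(passed.span())  # start and end are the same position

# When the forbidden text is next, the match fails.
failed = re.match(r"(?!666)", "666-45-6789")
print(failed)

# Because nothing was consumed, \d{2} still reads the same two characters
# that the look-ahead inspected.
group_ok = re.match(r"(?!00)\d{2}", "07")
group_bad = re.match(r"(?!00)\d{2}", "00")
print(group_ok.group(), group_bad)
```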

Final Analysis

We start and end with the anchors: ^ and $. The first section uses a negative look-ahead to state that the first three digits cannot be 666, 000, or 900-999: (?!666|000|9\d{2}). Then, we require any other three digits, with \d{3}.

Then, we match against a dash with -.

The second group of numbers cannot be 00. We use the following to state that: (?!00). We add \d{2} after it to require any two digits from 01 to 99.

Then another dash is required.

Lastly, we verify that the last four digits are not 0000 with (?!0{4}). We then require four digits with \d{4}, which will match anything from 0001 to 9999.
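
The walkthrough above can also be written out as ordinary code, one check per step. The function check_ssn_by_hand below is a hypothetical illustration, not part of the regex. It is only meant to agree with the pattern on plain ASCII input such as the made-up samples shown.

```python
import re

SSN_PATTERN = re.compile(r"^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")
ASCII_DIGITS = set("0123456789")


def check_ssn_by_hand(text: str) -> bool:
    """Apply the same rules as the regex, one step at a time."""
    parts = text.split("-")
    if len(parts) != 3:                      # two dashes, three sections
        return False
    area, group, serial = parts
    if (len(area), len(group), len(serial)) != (3, 2, 4):
        return False                         # \d{3}, \d{2}, \d{4}
    if not all(ch in ASCII_DIGITS for ch in area + group + serial):
        return False                         # digits only
    if area in ("666", "000") or area.startswith("9"):
        return False                         # (?!666|000|9\d{2})
    if group == "00":
        return False                         # (?!00)
    if serial == "0000":
        return False                         # (?!0{4})
    return True


samples = ["123-45-6789", "666-45-6789", "900-45-6789",
           "123-00-6789", "123-45-0000", "12-345-6789", "123-45-678a"]
for sample in samples:
    by_regex = SSN_PATTERN.match(sample) is not None
    print(sample, by_regex, check_ssn_by_hand(sample))
```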

Author

Jani Muhlestein is a software test engineer who is always interested in, and fascinated by, the software industry and the continuously changing technology that drives it. GitHub profile: https://github.com/janimuhlestein.
