@janimuhlestein
Last active November 24, 2020 08:03

Using a regular expression to verify a social security number.

Regular expressions are sequences of characters that define a pattern used to search for specific strings within text. They have several uses, such as finding particular kinds of text within a string or verifying user-entered data such as social security numbers.

Summary

The regex I have chosen to explain is one for verifying a social security number. The regex itself is ^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$.
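As a quick check, here is a minimal Python sketch that uses the pattern in the form the sections below break it down. Python's re module is an assumption (the original names no language), and SSN_PATTERN and is_valid_ssn are illustrative names, not part of the original.

```python
import re

# The pattern under discussion, written as a raw string so that
# each backslash reaches the regex engine unchanged.
SSN_PATTERN = re.compile(r"^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")


def is_valid_ssn(text: str) -> bool:
    """Return True when text has the form DDD-DD-DDDD and passes the exclusions."""
    return SSN_PATTERN.match(text) is not None


print(is_valid_ssn("123-45-6789"))  # True: well-formed
print(is_valid_ssn("666-45-6789"))  # False: excluded first group
print(is_valid_ssn("123456789"))    # False: dashes are required
```

This only checks the format. It says nothing about whether a number was ever issued.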

Table of Contents

Regex Components

Anchors

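The pattern starts with ^, which anchors the match to the beginning of the line, and ends with $, which anchors it to the end, so nothing may appear before or after the number.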
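To see what the anchors contribute, the sketch below compares an anchored and an unanchored pattern in Python (an assumed language). CORE is a simplified, hypothetical pattern with the lookaheads left out so that only the anchors differ.

```python
import re

CORE = r"\d{3}-\d{2}-\d{4}"  # simplified: digit groups and dashes only

unanchored = re.compile(CORE)
anchored = re.compile("^" + CORE + "$")

text = "id: 123-45-6789 (on file)"

print(unanchored.search(text) is not None)         # True: found inside a longer string
print(anchored.search(text) is not None)           # False: extra text before and after
print(anchored.search("123-45-6789") is not None)  # True: the whole string is the number

# In Python, $ also matches just before a single trailing newline.
print(anchored.search("123-45-6789\n") is not None)     # True
# re.fullmatch is stricter and rejects that case.
print(re.fullmatch(CORE, "123-45-6789\n") is not None)  # False
```

If a trailing newline should be rejected, re.fullmatch is one way to tighten the check in Python.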

Quantifiers

The quantifiers in this pattern are the counts in braces: {2}, {3}, and {4}. Each one requires exactly that many repetitions of the token before it. The first quantifier appears inside (?!666|000|9\d{2}), which means that the first three characters may not be 666 or 000. The 9\d{2} states that the first three digits cannot be between 900 and 999.

Shortly thereafter we have a \d{3}, requiring exactly three digits to start. Then, another \d{2} and a \d{4}, requiring exactly two and four digits after the dashes.

We also use a quantifier to guarantee that the last four digits are not all 0: (?!0{4}).
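The exact-count behaviour of the brace quantifiers can be sketched in Python (an assumed language). The names three_digits and last_group are illustrative, and each pattern isolates one piece of the full regex.

```python
import re

# {3} means exactly three repetitions of \d, no fewer and no more.
three_digits = re.compile(r"^\d{3}$")
for candidate in ["12", "123", "1234"]:
    print(candidate, three_digits.match(candidate) is not None)
# 12 False
# 123 True
# 1234 False

# {4} applied to the literal 0 inside the lookahead: only "0000" is rejected.
last_group = re.compile(r"^(?!0{4})\d{4}$")
for candidate in ["0000", "0001", "9999"]:
    print(candidate, last_group.match(candidate) is not None)
# 0000 False
# 0001 True
# 9999 True
```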

OR Operator

The OR operator, |, is used in the beginning section (?!666|000|9\d{2}) to list the alternatives, none of which may match the first three characters (no 666, OR 000, OR 900-999).
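A small Python sketch (assumed language) shows each alternative taking effect on its own. AREA and area_allowed are hypothetical names, and the pattern is only the first section of the full regex.

```python
import re

# First section only: three digits that are not 666, 000, or 900-999.
AREA = re.compile(r"^(?!666|000|9\d{2})\d{3}$")


def area_allowed(digits: str) -> bool:
    """Return True when none of the three alternatives matches."""
    return AREA.match(digits) is not None


for digits in ["666", "000", "900", "999", "667", "001", "899"]:
    print(digits, area_allowed(digits))
# 666 False
# 000 False
# 900 False
# 999 False
# 667 True
# 001 True
# 899 True
```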

Character Classes

The \d is used to designate that only digits are accepted.
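One detail worth checking in whichever language runs the pattern is what \d counts as a digit. The sketch below assumes Python, where \d on text strings also matches digits from other scripts unless the re.ASCII flag is set. The variable names are illustrative.

```python
import re

unicode_digits = re.compile(r"^\d{3}$")          # Python default for str patterns
ascii_digits = re.compile(r"^\d{3}$", re.ASCII)  # restricts \d to 0-9
explicit_range = re.compile(r"^[0-9]{3}$")       # same restriction, spelled out

arabic_indic = "\u0661\u0662\u0663"  # Arabic-Indic digits one, two, three

print(unicode_digits.match(arabic_indic) is not None)  # True
print(ascii_digits.match(arabic_indic) is not None)    # False
print(explicit_range.match(arabic_indic) is not None)  # False
print(unicode_digits.match("12a") is not None)         # False: letters never match \d
```

For a social security number, restricting the class to 0-9 is probably the safer choice.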

Grouping and Capturing

We use () to group the conditions on the three digit sections. Each of these groups begins with ?!, which makes it a negative lookahead: it checks the text that follows without consuming or capturing anything. (?!666|000|9\d{2}) groups the values that the first three digits must not match. (?!00) groups the requirement for the second set of digits and states that they cannot be equal to 00. Then, we have another grouping, (?!0{4}), to say that the last set of digits cannot be 0000.
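The difference between a lookahead group and a capturing group can be sketched in Python (an assumed language). The second pattern, with plain parentheses added around each digit section, is a hypothetical variant and not the regex being explained.

```python
import re

# The pattern as discussed: every parenthesised group is a lookahead.
lookahead_only = re.compile(r"^(?!666|000|9\d{2})\d{3}-(?!00)\d{2}-(?!0{4})\d{4}$")
match = lookahead_only.match("123-45-6789")
print(match.groups())  # (): lookaheads capture nothing

# Hypothetical variant: plain parentheses around each digit section capture it.
capturing = re.compile(r"^(?!666|000|9\d{2})(\d{3})-(?!00)(\d{2})-(?!0{4})(\d{4})$")
match = capturing.match("123-45-6789")
print(match.groups())  # ('123', '45', '6789')
```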

Greedy and Lazy Match

Greedy and lazy matching only differ when a quantifier allows a variable number of repetitions, as + and * do. Every quantifier in this pattern is an exact count, so the distinction has no effect here. The 9\d{2} in (?!666|000|9\d{2}) matches exactly two digits after the 9. The \d{3} matches exactly three digits (except the ones already excluded). Then we have another match for \d{2} for exactly two digits (except 00, which is excluded by the lookahead before it). Then the last set of digits is matched by \d{4} (except for 0000), so it matches anything from 0001 to 9999.
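To illustrate when greedy and lazy matching actually diverge, here is a short Python sketch (assumed language). A fixed count such as {3} matches the same text either way, while a variable quantifier such as + does not.

```python
import re

text = "123456"

# Fixed count: the greedy and lazy forms match the same three digits.
print(re.match(r"\d{3}", text).group())   # 123
print(re.match(r"\d{3}?", text).group())  # 123

# Variable count: greedy takes as much as it can, lazy as little as it can.
print(re.match(r"\d+", text).group())     # 123456
print(re.match(r"\d+?", text).group())    # 1
```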

Final Analysis

We start and end with the anchors: ^ and $. The first section uses a negative lookahead to state that the first three digits cannot be 666, 000, or 900-999: (?!666|000|9\d{2}). Then, we require any other three digits, with \d{3}.

Then, we match against a dash with -.

The second group of numbers cannot be 00. We use the following to state that: (?!00). We add \d{2} after to require any two digits from 01 to 99.

Then another dash is required.

Lastly, we verify that the last four digits are not 0000 with (?!0{4}). We still require that there are four digits with \d{4}, which will match anything between 0001 and 9999.
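The walkthrough above can be written out piece by piece. The sketch below assumes Python and its re.VERBOSE flag, which lets each part of the pattern sit on its own line with a comment. SSN_VERBOSE and the sample values are illustrative.

```python
import re

SSN_VERBOSE = re.compile(
    r"""
    ^                    # start of the string
    (?!666|000|9\d{2})   # first group is not 666, 000, or 900-999
    \d{3}                # three digits
    -                    # dash
    (?!00)               # second group is not 00
    \d{2}                # two digits
    -                    # dash
    (?!0{4})             # last group is not 0000
    \d{4}                # four digits
    $                    # end of the string
    """,
    re.VERBOSE,
)

samples = [
    "123-45-6789",  # passes every check
    "000-45-6789",  # first group is 000
    "900-45-6789",  # first group is in 900-999
    "123-00-6789",  # second group is 00
    "123-45-0000",  # last group is 0000
    "123-45-678",   # last group is too short
]

for sample in samples:
    print(sample, SSN_VERBOSE.match(sample) is not None)
```

Only the first sample should match. Each of the others trips exactly one of the parts listed in the pattern.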

Author

Jani Muhlestein is a software test engineer who is always interested in, and fascinated by, the software industry and the continuously changing technology that drives it. My GitHub is: https://github.com/janimuhlestein.
