A regular expression (regex) is a sequence of characters that defines a search pattern for a body of text. Regular expressions are not specific to any one programming language. A regex consists of metacharacters and literals: metacharacters are characters with a special meaning in the pattern, and literals are all other characters, which match themselves.
In JavaScript, a regex can be created either with literal notation, which wraps the pattern in forward slashes (/), or with the RegExp() constructor.
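As a rough illustration of the two notations, the sketch below builds the same small pattern both ways. The pattern and the variable names are made up for this example and are not part of the email regex.

```javascript
// Literal notation: the pattern sits between two forward slashes.
const literalPattern = /[a-z]+\.com/;

// Constructor notation: the pattern is passed as a string, so each
// backslash has to be doubled to survive string escaping.
const constructedPattern = new RegExp("[a-z]+\\.com");

// Both objects end up holding the same pattern source.
console.log(literalPattern.source === constructedPattern.source); // true
console.log(literalPattern.test("example.com"));     // true
console.log(constructedPattern.test("example.com")); // true
```

Literal notation is usually easier to read when the pattern is fixed, while the constructor is handy when the pattern has to be assembled from a string at run time.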
In this tutorial, I will describe the regex used for matching an email address.
/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/
- Regex Components
- Anchors
- Quantifiers
- Character Classes
- Grouping and Capturing
- Bracket Expressions
- Greedy and Lazy Match
- Resources
Anchors do not match any characters themselves; instead, they assert something about the position in the string. The caret (^) anchor asserts that the match begins at the start of the string, and the dollar sign ($) anchor asserts that the match ends at the end of the string. In the email regex, the whole pattern sits between ^ and $, so the entire string must match the pattern, not just part of it.
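The effect of each anchor can be seen with a small sketch. The word "cat" and the variable names are only illustrative assumptions.

```javascript
const unanchored = /cat/;   // "cat" may appear anywhere in the string
const startsWith = /^cat/;  // "cat" must be at the start
const endsWith = /cat$/;    // "cat" must be at the end
const exact = /^cat$/;      // the whole string must be "cat"

console.log(unanchored.test("concatenate")); // true
console.log(startsWith.test("catalog"));     // true
console.log(startsWith.test("concatenate")); // false
console.log(endsWith.test("bobcat"));        // true
console.log(exact.test("cat"));              // true
console.log(exact.test("concatenate"));      // false
```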
Quantifiers set how many times the preceding part of a regex may repeat. In the email regex, the plus sign (+) and curly brackets ( { } ) are quantifiers.
The plus sign (+) denotes that the preceding pattern can be matched one or more times. In the email regex, the bracket expressions in ([a-z0-9_\.-]+) and ([\da-z\.-]+) can each be matched one or more times.
Curly brackets ( { } ) denote a minimum and maximum number of repetitions. In the email regex, {2,6} denotes that the preceding bracket expression must match a minimum of two (2) characters and a maximum of six (6) characters.
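A minimal sketch of both quantifiers follows. The patterns are simplified stand-ins that use only lowercase letters, and the sample strings are assumptions chosen for illustration.

```javascript
// + means "one or more" of the preceding bracket expression.
const oneOrMore = /^[a-z]+$/;
console.log(oneOrMore.test(""));    // false (zero characters)
console.log(oneOrMore.test("a"));   // true
console.log(oneOrMore.test("abc")); // true

// {2,6} means "at least two and at most six" repetitions.
const twoToSix = /^[a-z]{2,6}$/;
console.log(twoToSix.test("c"));       // false (1 character)
console.log(twoToSix.test("com"));     // true  (3 characters)
console.log(twoToSix.test("museum"));  // true  (6 characters)
console.log(twoToSix.test("company")); // false (7 characters)
```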
Character classes are shorthand for common sets of characters, which makes a regex more compact. In the email regex, \d matches any single digit (Arabic numeral) and is equivalent to [0-9].
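To show that the shorthand and the explicit range behave alike in JavaScript, here is a small sketch; the sample inputs are arbitrary.

```javascript
const digitClass = /^\d+$/;     // shorthand character class
const digitRange = /^[0-9]+$/;  // equivalent bracket expression

const samples = ["2024", "20a4", ""];
// Each entry is true when both patterns agree on the sample.
const agreement = samples.map(
  (s) => digitClass.test(s) === digitRange.test(s)
);

console.log(digitClass.test("2024")); // true
console.log(digitClass.test("20a4")); // false
console.log(agreement);               // [ true, true, true ]
```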
A regex can be divided into groups by wrapping parts of the pattern in parentheses ( ( ) ). Each group is also captured, so the text it matched can be retrieved separately. The email regex has three groups, separated by two literal characters: an at sign (@) and an escaped period (\.). Together they form the pattern for an email address (e.g., name@emailaddress.com). The backslash ( \ ) escapes the character that follows it: an unescaped period is a wildcard that matches almost any character, so the backslash is needed here to match a literal period.
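The three groups can be inspected with String.prototype.match, as in the sketch below. It uses the regex from this tutorial and the example address from the paragraph above; the variable names are assumptions for illustration.

```javascript
const emailPattern = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// On success, match() returns an array: index 0 is the whole match and
// indexes 1 to 3 hold the text captured by each group.
const result = "name@emailaddress.com".match(emailPattern);
const [whole, user, domain, tld] = result;

console.log(whole);  // "name@emailaddress.com"
console.log(user);   // "name"
console.log(domain); // "emailaddress"
console.log(tld);    // "com"

// When the string does not fit the pattern, match() returns null.
const noMatch = "not-an-email".match(emailPattern);
console.log(noMatch); // null
```

Because match() can return null, real code would normally check the result before destructuring it.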
The square brackets ( [ ] ) signify a bracket expression, which lists the set or range of characters that a single position in the string may match. There are three bracket expressions in the email regex.
[a-z0-9_\.-] denotes that the string can contain any lowercase letter from a to z, any digit from 0 to 9, an underscore ( _ ), a period (.), or a hyphen (-).
[\da-z\.-] denotes that the string can contain any digit (\d), any lowercase letter from a to z, a period (.), or a hyphen (-).
[a-z\.] denotes that the string can contain any lowercase letter from a to z or a period (.).
In each expression, the backslash ( \ ) escapes the period so that it is treated as a literal period. Inside square brackets a period is already literal, so the backslash is optional there. The hyphen is placed last in the brackets so that it is read as a literal hyphen rather than as part of a range.
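The sketch below tries the first bracket expression against a few made-up strings and compares an escaped and an unescaped period inside brackets. The variable names and inputs are illustrative assumptions.

```javascript
// Every character in the string must come from the bracket expression.
const localPart = /^[a-z0-9_\.-]+$/;
console.log(localPart.test("first.last_01-x")); // true
console.log(localPart.test("first last"));      // false (space is not listed)
console.log(localPart.test("First"));           // false (uppercase is not listed)

// Inside brackets, "\." and "." both mean a literal period.
const escapedDot = /^[a-z\.]+$/;
const plainDot = /^[a-z.]+$/;
console.log(escapedDot.test("co.uk")); // true
console.log(plainDot.test("co.uk"));   // true
console.log(plainDot.test("co-uk"));   // false (the period is not a wildcard here)
```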
Quantifiers can be greedy or lazy. A quantifier such as + is greedy by default: it repeats as many times as possible while still allowing the rest of the pattern to match. Adding a question mark after the quantifier (for example, +?) makes it lazy, so it repeats as few times as possible. In the email regex, the plus sign appears twice with no question mark, so the patterns ([a-z0-9_\.-]+) and ([\da-z\.-]+) are both matched greedily, one or more times.
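The difference is easiest to see on a string with more than one possible stopping point. The sketch below uses a made-up HTML-like string rather than the email regex, since the email regex contains no lazy quantifiers.

```javascript
const sample = "<b>bold</b>";

// Greedy: .+ grabs as much as it can, so the match runs to the last ">".
const greedyMatch = sample.match(/<.+>/)[0];

// Lazy: .+? grabs as little as it can, so the match stops at the first ">".
const lazyMatch = sample.match(/<.+?>/)[0];

console.log(greedyMatch); // "<b>bold</b>"
console.log(lazyMatch);   // "<b>"
```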
- A Very Long Regular Expression Tutorial by Claudia Davis
- Regular Expressions - Programming with Text by The Coding Train
- Regular Expression Tutorial by The Full-Stack Blog
- The Only Guide You Need To Master Regular Expression by Sara A. Metwalli