AUTOPSY FILES dissects a variety of REGEX expressions to help you understand, and break down, each component. A REGEX expression, or Regular Expression, is a sequence of characters that specifies a match pattern in text.
The expression accepts the set of strings that match the pattern and rejects the rest.
Every REGEX expression is made up of several parts. We will cover each part in detail for the expressions below:
const regex = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;
const regex = /^(\w[!@#$%\^&*)(+=./-])*$/;
const regex = /^(?:\d{3}|\(\d{3}\))([-.])\d{3}\1\d{4}$/;
Anchors are used at the start and end of a REGEX expression and describe the position of the match in a line of text. The anchors are the caret `^` and dollar `$` symbols.
The `^` symbol designates match start & the `$` symbol designates match end.
Each REGEX expression below is wrapped in both the caret `^` and dollar `$` symbols, marking the beginning and end of each match string.
/ ^ #?([a-f0-9]{6}|[a-f0-9]{3}) $ /i
/ ^ (\w[!@#$%\^&*)(+=./-])* $ /
/ ^ (?:\d{3}|\(\d{3}\))([-.])\d{3}\1\d{4} $ /
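To see what the anchors contribute, the sketch below runs the hex color expression with and without `^` and `$`. The names `anchored` and `unanchored` are made up for this illustration.

```javascript
// Same pattern, with and without the ^ and $ anchors.
const anchored = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;
const unanchored = /#?([a-f0-9]{6}|[a-f0-9]{3})/i;

// With anchors, the whole string must be the hex color.
console.log(anchored.test("#ff0033"));           // true
console.log(anchored.test("color: #ff0033;"));   // false

// Without anchors, a match anywhere in the string is enough.
console.log(unanchored.test("color: #ff0033;")); // true
```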
Quantifiers are used within the REGEX expression to dictate how many times the preceding character(s) must be present for match.
- The optional symbol `?` indicates that the preceding character may, or may not, be present in the string for match.
- The curly braces `{..}` order a match of the preceding character(s) for as many times as defined inside the braces.
- The asterisk symbol `*` orders a match of the preceding character(s) for 0 or more times (until infinity & beyond). This symbol is considered a repeater.
`?` :: the preceding component, `#`, can match 0 or 1 time ahead of `([a-f0-9]{6}|[a-f0-9]{3})`.
`{6}` & `{3}` :: the component preceding these quantifiers should match either 6 (Hex Triplet Format) or 3 (Shorthand Hex Format) characters.
/^# ? ([a-f0-9] {6} |[a-f0-9] {3} )$/i
`*` :: the characters within the preceding subexpression `(\w[!@#$%\^&*)(+=./-])` can match 0 or more times.
/^(\w[!@#$%\^&*)(+=./-]) * $/
`{3}` & `{4}` :: the components preceding these quantifiers should match ###-###-#### (three digits `\d{3}`, three digits `\d{3}`, four digits `\d{4}`).
/^(?:\d {3} |\(\d {3} \))([-.])\d {3} \1\d {4} $/
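As a quick check of the three quantifiers, the sketch below runs each expression against strings that have too few, enough, or too many repetitions. The variable names `hex`, `pairs`, and `phone` are chosen here for illustration only.

```javascript
const hex = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;
const pairs = /^(\w[!@#$%\^&*)(+=./-])*$/;
const phone = /^(?:\d{3}|\(\d{3}\))([-.])\d{3}\1\d{4}$/;

// ? : the leading # may appear 0 or 1 time, never twice.
console.log(hex.test("fff"));    // true
console.log(hex.test("#fff"));   // true
console.log(hex.test("##fff"));  // false

// {6} and {3} : exactly six or exactly three hex digits.
console.log(hex.test("#ffff"));  // false (four digits)

// * : the group may repeat 0 or more times, so an empty string matches.
console.log(pairs.test(""));     // true
console.log(pairs.test("a!b@")); // true  (two repetitions)
console.log(pairs.test("a!b"));  // false (the last pair is incomplete)

// {3} {3} {4} : three digits, three digits, four digits.
console.log(phone.test("555-123-4567")); // true
console.log(phone.test("555-12-4567"));  // false
```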
Grouping Constructs, or subexpressions, are used to break up the string into sections to fulfill different requirements. Subexpressions are segments inside parentheses `()`, and have two primary categories: capturing and non-capturing.
- capturing subexpressions capture the matched character sequence for possible re-use.
- non-capturing subexpressions do not capture the matched character sequence. This is done by adding `?:` at the beginning of the expression string inside the `()`.
`(..)` :: capture the part of the input string matched by the `(subexpression)`: the six or three hex digits.
/^#? ([a-f0-9]{6}|[a-f0-9]{3}) $/i
`(..)` :: match the character pair in the `(subexpression)`, a word character followed by one symbol, 0 or more times.
/^ (\w[!@#$%\^&*)(+=./-]) *$/
`?:` :: match the `(?:subexpression)` & do not assign the match to a captured group (non-capturing).
`(..)` :: capture the character matched by the `[]` inside the subexpression.
/^ (?:\d{3}|\(\d{3}\)) ([-.]) \d{3}\1\d{4}$/
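The difference between capturing and non-capturing groups shows up in the array returned by `String.prototype.match`. The sketch below is a minimal illustration; `hexMatch` and `phoneMatch` are names introduced here, not part of the original expressions.

```javascript
const hex = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;
const phone = /^(?:\d{3}|\(\d{3}\))([-.])\d{3}\1\d{4}$/;

// Index 0 holds the full match; index 1 holds the first captured group.
const hexMatch = "#1a2b3c".match(hex);
console.log(hexMatch[0]); // "#1a2b3c"
console.log(hexMatch[1]); // "1a2b3c"

// The area code sits in a non-capturing group (?:...), so the first
// captured group is the separator, not the area code.
const phoneMatch = "(555)-123-4567".match(phone);
console.log(phoneMatch[1]);     // "-"
console.log(phoneMatch.length); // 2 (full match plus one captured group)
```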
Bracket Expressions, or positive character groups, are used to signify the set of characters accepted for match. These expressions reside within square brackets `[]`.
- bracket expressions can be turned into negative character groups by adding the `^` symbol to the beginning of the expression string inside the `[]`.
These expressions match any one character from the set; they do not require the string to contain all characters in the pattern.
`[..]` :: match one character from the set (for both expressions); the quantifier that follows repeats it.
/^#?( [a-f0-9] {6}| [a-f0-9] {3})$/i
`[..]` :: match one character from the set.
/^(\w [!@#$%\^&*)(+=./-] )*$/
`[..]` :: match one character from the set.
/^(?:\d{3}|\(\d{3}\))( [-.] )\d{3}\1\d{4}$/
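To contrast a positive character group with its negative form, the sketch below uses the separator set `[-.]` from the phone expression. The names `separator` and `notSeparator` are hypothetical.

```javascript
const separator = /^[-.]$/;     // positive group: a hyphen or a period
const notSeparator = /^[^-.]$/; // negative group: any one character except those two

console.log(separator.test("-"));    // true
console.log(separator.test("."));    // true
console.log(separator.test("x"));    // false

console.log(notSeparator.test("x")); // true
console.log(notSeparator.test("-")); // false

// A bracket expression matches one character at a time, so the string
// does not have to contain every character listed in the set.
console.log(/^[a-f0-9]{3}$/i.test("abc")); // true
```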
Character Classes define a set of characters, any one of which fulfils a match to the REGEX expression.
- characters within `[..]` are accepted as a match.
- characters within a range expression `[.-.]` are accepted as a match.
- the `\d` symbol matches any arabic numeral digit and is equivalent to the range expression `[0-9]`.
- if the `^` is included at the start of the bracket expression, then the listed characters are not a match, i.e. `[^0-9]` means any character that is not a digit.
`[a-f0-9]` :: match to character values `a-f` & `0-9`.
/^#?( [a-f0-9] {6}| [a-f0-9] {3})$/i
`\w` :: match to any word character - `[a-zA-Z0-9_]`.
`[..]` :: match to any character value in `[!@#$%\^&*)(+=./-]`.
/^ (\w[!@#$%\^&*)(+=./-]) *$/
`[..]` :: match one character value - `-` or `.`.
`\d` :: match any digit character value - `[0-9]`.
/^(?: \d {3}|\( \d {3}\))( [-.] ) \d {3}\1 \d {4}$/
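The equivalences listed above can be spot-checked by comparing each shorthand class with the bracket expression it stands for. This is a small sketch over a handful of hand-picked sample characters, not an exhaustive proof; `samples` and `agree` are names invented for the example.

```javascript
// Compare each shorthand class with its bracket-expression equivalent.
const samples = ["7", "a", "Z", "_", "-", " "];

const agree = samples.every((ch) =>
  /^\d$/.test(ch) === /^[0-9]$/.test(ch) &&
  /^\w$/.test(ch) === /^[a-zA-Z0-9_]$/.test(ch)
);
console.log(agree); // true

// [^0-9] is the negated form: it accepts what \d rejects.
console.log(/^[^0-9]$/.test("a")); // true
console.log(/^[^0-9]$/.test("7")); // false
```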
The OR operator matches either the element preceding or the element succeeding the vertical bar `|` character.
- match can be either `[a-f0-9]{6}` or `[a-f0-9]{3}`.
/^#?([a-f0-9]{6} | [a-f0-9]{3})$/i
- match can be either `###` or `(###)`.
/^(?: \d {3} | \( \d {3}\))( [-.] ) \d {3}\1 \d {4}$/
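To isolate the alternation, the sketch below pulls the area-code group out of the phone expression and anchors it on its own. The name `areaCode` is an assumption made for this example.

```javascript
// Area code only: three digits, OR three digits wrapped in parentheses.
const areaCode = /^(?:\d{3}|\(\d{3}\))$/;

console.log(areaCode.test("555"));   // true  (left side of |)
console.log(areaCode.test("(555)")); // true  (right side of |)
console.log(areaCode.test("(555"));  // false (neither side matches)
console.log(areaCode.test("555)"));  // false
```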
Flags are used at the end of the REGEX expression to define additional functionality or limits for match. A typical expression is wrapped in slash `/` symbols, which mark the start and end of the `/regex/`. There are several optional flags, but the three listed below are most frequently used:
- global search `g` - expression tested against all possible matches in a string.
- case-insensitive search `i` - case should be ignored while attempting a match.
- multi-line search `m` - multi-line input treated as multiple lines, so `^` and `$` match at the start and end of each line.
`i` :: match search is case-insensitive - can use `ffffff` or `FFFFFF`.
/^#?([a-f0-9]{6}|[a-f0-9]{3})$/ i
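A short sketch of the three flags in action follows. Only the `i` flag belongs to the expressions dissected here; the `g` and `m` patterns, and the names `withFlag`, `withoutFlag`, and `allColors`, are made up to illustrate the other two flags.

```javascript
// i : case is ignored.
const withFlag = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;
const withoutFlag = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;
console.log(withFlag.test("#FFFFFF"));    // true
console.log(withoutFlag.test("#FFFFFF")); // false
console.log(withoutFlag.test("#ffffff")); // true

// g : every match in the string is returned, not just the first.
const allColors = "#fff and #000".match(/#[a-f0-9]{3}/gi);
console.log(allColors); // [ '#fff', '#000' ]

// m : ^ and $ also match at the start and end of each line.
console.log(/^#fff$/.test("x\n#fff"));  // false
console.log(/^#fff$/m.test("x\n#fff")); // true
```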
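Character Escapes are used to escape special characters by using the backslash symbol `\`, making the character literal and considered for match.
Most special characters lose their significance inside bracket expressions `[]`. In JavaScript the backslash `\` still acts as an escape there, which is why `\^` can be written inside the bracket expression `[!@#$%\^&*)(+=./-]`.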
`\w` :: match search uses character class escape to include any word character - `[a-zA-Z0-9_]`.
/^( \w [!@#$%\^&*)(+=./-])*$/
`\(` & `\)` :: match uses `(` & `)` in their literal form - `(###)`.
`\d` :: match search uses character class escape to include only digits - `[0-9]`.
`\1` :: match repeats the text remembered from the first captured group - `[-.]`.
/^(?: \d{3}| \( \d{3} \) )([-.]) \d{3} \1 \d{4}$/
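The escapes in the phone expression can be exercised as in the sketch below. The backreference `\1` is the part worth watching: it forces the second separator to repeat whatever the first one captured. The name `phone` and the sample numbers are illustrative only.

```javascript
const phone = /^(?:\d{3}|\(\d{3}\))([-.])\d{3}\1\d{4}$/;

// \1 repeats the separator captured by ([-.]).
console.log(phone.test("555-123-4567"));   // true
console.log(phone.test("555.123.4567"));   // true
console.log(phone.test("555-123.4567"));   // false (mixed separators)

// \( and \) match literal parentheses around the area code.
console.log(phone.test("(555)-123-4567")); // true
console.log(phone.test("(555-123-4567"));  // false (unbalanced parenthesis)

// \d accepts digits only.
console.log(phone.test("55a-123-4567"));   // false
```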