@christiecamp
Last active December 16, 2023 22:25
REGEX :: AUTOPSY FILES

overview

AUTOPSY FILES is here to dissect a variety of REGEX expressions to help you understand, and break down, each component. A REGEX expression - or Regular Expression - is a sequence of characters that specifies a match pattern in text.

The expression will accept the set of strings that match the pattern, and reject the rest.

regex

There are a variety of parts to every REGEX expression. We will be covering each portion in detail for the expressions below:

REGEX expression that checks for hex values (not case-sensitive)
const regex = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;
REGEX expression that checks for word characters, each followed by one of the defined special characters
const regex = /^(\w[!@#$%\^&*)(+=./-])*$/;
REGEX expression that checks the validity of a phone number
const regex = /^(?:\d{3}|\(\d{3}\))([-.])\d{3}\1\d{4}$/;
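
As a quick check before the dissection, the three expressions can be run with `RegExp.prototype.test()`. This is only a minimal sketch: the constant names (`hexRegex`, `charRegex`, `phoneRegex`) and the sample strings are made up for illustration and are not part of the original expressions.

```javascript
// The three expressions from above, renamed (hypothetically) so they can coexist.
const hexRegex = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;
const charRegex = /^(\w[!@#$%\^&*)(+=./-])*$/;
const phoneRegex = /^(?:\d{3}|\(\d{3}\))([-.])\d{3}\1\d{4}$/;

// test() returns true when the string matches the pattern.
console.log(hexRegex.test("#FFAA00")); // true
console.log(hexRegex.test("#FFAA0")); // false: five characters
console.log(charRegex.test("a!b@")); // true
console.log(charRegex.test("abc")); // false: no special character after each word character
console.log(phoneRegex.test("(555)-123-4567")); // true
console.log(phoneRegex.test("555-123.4567")); // false: the two separators differ
```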

anchors

Anchors are used at the start and end of a REGEX expression, and tie the match to a position in the text. The anchors are the caret ^ and dollar $ symbols.

The ^ symbol designates match start (the beginning of the input) & the $ symbol designates match end (the end of the input).

Each REGEX expression below begins with the caret ^ and ends with the dollar $ symbol, so the pattern has to match the entire string rather than a portion of it.

hex value
/  ^  #?([a-f0-9]{6}|[a-f0-9]{3})  $  /i
character value
/  ^  (\w[!@#$%\^&*)(+=./-])*  $  /
phone number
/  ^  (?:\d{3}|\(\d{3}\))([-.])\d{3}\1\d{4}  $  /
We inspect the innards of each REGEX expression in the coming sections!
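
To see what the anchors contribute, the hex expression can be compared with a copy that has the ^ and $ removed. A rough sketch; the names `anchored`, `unanchored`, and `input` are hypothetical and the unanchored variant is not one of the original expressions.

```javascript
// Same hex pattern, with and without anchors (the unanchored copy is illustrative only).
const anchored = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;
const unanchored = /#?([a-f0-9]{6}|[a-f0-9]{3})/i;

const input = "color: #fff;";

// Without anchors the pattern may match anywhere inside the input.
console.log(unanchored.test(input)); // true
// With ^ and $ the entire input has to be a hex value.
console.log(anchored.test(input)); // false
console.log(anchored.test("#fff")); // true
```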

quantifiers

Quantifiers are used within the REGEX expression to dictate how many times the preceding character, bracket expression, or subexpression must appear for a match.

  • The optional symbol ? informs that the preceding character may, or may not, be present in the string for match.
  • The curly braces {..} order a match of the preceding character(s) for exactly as many times as defined inside the braces.
  • The asterisk symbol * orders a match of the preceding character(s) for 0 or more times (until infinity & beyond). This symbol is considered a repeater.
hex value quantifiers include: ?, {6}, {3}
  • ? :: the preceding # can match 0 or 1 time - the leading # is optional.
  • {6} & {3} :: the bracket expression preceding these quantifiers should match - either 6 (Hex Triplet Format) or 3 (Shorthand Hex Format) characters.
/^#  ?  ([a-f0-9]  {6}  |[a-f0-9]  {3}  )$/i
character value quantifier: *
  • * :: the preceding subexpression can match 0 or more times - (\w[!@#$%\^&*)(+=./-]).
/^(\w[!@#$%\^&*)(+=./-])  *  $/
phone number quantifiers include: {3}, {4}
  • {3} & {4} :: the \d preceding these quantifiers should match that many digits - ###-###-#### (three digits \d{3}, three digits \d{3}, four digits \d{4}).
/^(?:\d {3} |\(\d {3} \))([-.])\d {3} \1\d  {4}  $/
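
Each quantifier can also be tried in isolation. The small patterns below are hypothetical stand-ins (not taken from the three expressions), chosen only to show one quantifier at a time.

```javascript
// ? : the preceding element is optional (0 or 1 time).
const optionalHash = /^#?abc$/;
console.log(optionalHash.test("abc")); // true
console.log(optionalHash.test("#abc")); // true
console.log(optionalHash.test("##abc")); // false

// {n} : the preceding element repeats exactly n times.
const exactlyThree = /^\d{3}$/;
console.log(exactlyThree.test("123")); // true
console.log(exactlyThree.test("1234")); // false

// * : the preceding element repeats 0 or more times, so an empty string also matches.
const zeroOrMore = /^\d*$/;
console.log(zeroOrMore.test("")); // true
console.log(zeroOrMore.test("2023")); // true
```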

grouping

Grouping Constructs, or subexpressions, are used to break up the string into sections to fulfill different requirements. Subexpressions are segments inside parentheses (), and have two primary categories: capturing and non-capturing.

  • capturing subexpressions capture the matched character sequence for possible re-use.
  • non-capturing subexpressions do not capture the matched character sequence. This is done by adding ?: at the beginning of the subexpression, right after the opening (.
hex value subexpression: ([a-f0-9]{6}|[a-f0-9]{3})
  • (..):: match either six or three hex characters & capture them as group 1.
/^#?  ([a-f0-9]{6}|[a-f0-9]{3})  $/i
character value subexpression: (\w[!@#$%\^&*)(+=./-])
  • (..):: match one word character followed by one of the defined special characters; the * repeats the whole (subexpression) 0 or more times.
/^  (\w[!@#$%\^&*)(+=./-])  *$/
phone number subexpressions include: (?:\d{3}|\(\d{3}\)) & ([-.])
  • ?: :: match the area code in the (?:subexpression), with or without parentheses, & do not assign the match to a captured group (non-capturing).
  • (..):: match the separator within the [] & capture it as group 1, so \1 can re-use it.
/^  (?:\d{3}|\(\d{3}\)) ([-.])  \d{3}\1\d{4}$/
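
The difference between capturing and non-capturing groups shows up in the array returned by `String.prototype.match()`. A minimal sketch, assuming hypothetical names (`phonePattern`, `hexPattern`, `phoneMatch`, `hexMatch`) and made-up sample inputs.

```javascript
const phonePattern = /^(?:\d{3}|\(\d{3}\))([-.])\d{3}\1\d{4}$/;
const hexPattern = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;

// Index 0 holds the full match; index 1 holds the first capturing group.
const phoneMatch = "(555)-123-4567".match(phonePattern);
console.log(phoneMatch[0]); // "(555)-123-4567"
console.log(phoneMatch[1]); // "-"
// The area code group is non-capturing, so only one group is stored.
console.log(phoneMatch.length); // 2

const hexMatch = "#1a2b3c".match(hexPattern);
console.log(hexMatch[1]); // "1a2b3c" (the # sits outside the group)
```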

bracket

Bracket Expressions, or positive character groups, are used to signify the set of characters accepted for match. These expressions reside within square brackets [].

  • bracket expressions can be turned into negative character groups by adding the ^ symbol to the beginning of the expression string inside the [].

a bracket expression matches exactly one character from the set - the string does not need to contain every character listed in the pattern.

hex value bracket expressions: [a-f0-9] & [a-f0-9]
  • [..]:: match one character in the outline (for both expressions) - the {6} or {3} that follows repeats it.
/^#?(  [a-f0-9] {6}| [a-f0-9] {3})$/i
character value bracket expression: [!@#$%\^&*)(+=./-]
  • [..]:: match one character in the outline.
/^(\w  [!@#$%\^&*)(+=./-]  )*$/
phone number bracket expression: [-.]
  • [..]:: match one character in the outline.
/^(?:\d{3}|\(\d{3}\))( [-.] )\d{3}\1\d{4}$/
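
A short sketch of a positive and a negative bracket expression, built around the phone number separator. The names `separator` and `notSeparator` are hypothetical, and the negated variant does not appear in the original expressions.

```javascript
// Positive group: exactly one character, either - or .
const separator = /^[-.]$/;
console.log(separator.test("-")); // true
console.log(separator.test(".")); // true
console.log(separator.test("-.")); // false: two characters, the set matches only one

// Negative group: exactly one character that is neither - nor .
const notSeparator = /^[^-.]$/;
console.log(notSeparator.test("x")); // true
console.log(notSeparator.test(".")); // false
```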

classes

Character Classes define a set of characters, within a string, that fulfils a match to the REGEX expression.

  • characters within [..] are accepted as a match.
  • characters within range expression [.-.] are accepted as a match.
  • the \d symbol matches any arabic numeral digit - it is the equivalent of the range expression [0-9].
  • if the ^ is the first character inside the [], then the listed characters are not a match - ie [^0-9] means any character that is not a digit.
hex value character classes: [a-f0-9] & [a-f0-9]
  • [a-f0-9] :: match to character values a-f & 0-9.
/^#?(  [a-f0-9]  {6}|  [a-f0-9]  {3})$/i
character value character classes: \w & [!@#$%\^&*)(+=./-]
  • \w :: match to any word character - [a-zA-Z0-9_].
  • [..] :: match to any character value [!@#$%\^&*)(+=./-].
/^  (\w[!@#$%\^&*)(+=./-])  *$/
phone number character classes: [-.] & \d
  • [..] :: match one character value - a hyphen - or a period .
  • \d :: match any digit character value - [0-9].
/^(?: \d {3}|\( \d {3}\))( [-.] ) \d {3}\1 \d {4}$/
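
The shorthand classes can be checked against their range equivalents. A minimal sketch; the constant names are hypothetical and each pattern is anchored so that it tests a single character.

```javascript
const digit = /^\d$/; // shorthand class
const digitRange = /^[0-9]$/; // equivalent range expression
const nonDigit = /^[^0-9]$/; // negated range
const wordChar = /^\w$/; // [a-zA-Z0-9_]

console.log(digit.test("7"), digitRange.test("7")); // true true
console.log(nonDigit.test("7")); // false
console.log(nonDigit.test("a")); // true
console.log(wordChar.test("_")); // true
console.log(wordChar.test("-")); // false
```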

operator

The OR operator matches either the alternative preceding or the alternative succeeding the vertical bar | character.

hex value OR operator | separates two quantified bracket expressions: [a-f0-9]{6} & [a-f0-9]{3}
  • match can be either [a-f0-9]{6} or [a-f0-9]{3}.
/^#?([a-f0-9]{6}  |  [a-f0-9]{3})$/i
character value does not contain an OR operator
phone number OR operator | separates two alternatives: \d{3} & \(\d{3}\)
  • match can be either ### or (###)
/^(?: \d {3} | \( \d {3}\))( [-.] ) \d {3}\1 \d {4}$/
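
The area code alternation can be pulled out and tested on its own. The sketch below also shows why the alternatives sit inside a group: without one, the | splits the entire pattern, anchors included. The names `areaCode`, `ungrouped`, and `grouped` are hypothetical, and the a|b patterns are illustrative only.

```javascript
// The area code portion of the phone expression, anchored so it can stand alone.
const areaCode = /^(?:\d{3}|\(\d{3}\))$/;
console.log(areaCode.test("555")); // true
console.log(areaCode.test("(555)")); // true
console.log(areaCode.test("(555")); // false: the closing parenthesis is missing

// Without a group, | separates "^a" from "b$".
const ungrouped = /^a|b$/;
const grouped = /^(?:a|b)$/;
console.log(ungrouped.test("ab")); // true: "^a" matches on its own
console.log(grouped.test("ab")); // false: the input must be exactly "a" or "b"
```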

flags

Flags are used at the end of the REGEX expression to define additional functionality or limits for match. A typical expression is wrapped in slash / symbols, which mark the start and end of the /regex/. There are several optional flags, but the three listed below are most frequently used:

  • global search g - expression tested against all possible matches in a string.
  • case-insensitive search i - case should be ignored while attempting a match.
  • multi-line search m - multi-line input treated as multiple lines, so ^ and $ match at the start and end of each line.
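
The effect of each flag is easiest to see by running a pattern with and without it. A hedged sketch: apart from the hex expression, the patterns and sample strings are made up for illustration, as are the names `strictHex`, `relaxedHex`, `firstDigit`, and `allDigits`.

```javascript
// i : case is ignored.
const strictHex = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/;
const relaxedHex = /^#?([a-f0-9]{6}|[a-f0-9]{3})$/i;
console.log(strictHex.test("#FFF")); // false
console.log(relaxedHex.test("#FFF")); // true

// g : match() returns every match instead of only the first.
const firstDigit = "a1b2c3".match(/\d/);
const allDigits = "a1b2c3".match(/\d/g);
console.log(firstDigit[0]); // "1"
console.log(allDigits); // [ '1', '2', '3' ]

// m : ^ and $ also match at the start and end of each line.
console.log(/^b$/.test("a\nb")); // false
console.log(/^b$/m.test("a\nb")); // true
```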
hex value flag: i
  • i:: match search is case-insensitive - can use ffffff or FFFFFF.
/^#?([a-f0-9]{6}|[a-f0-9]{3})$/  i
character value does not contain a flag
phone number does not contain a flag

escapes

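Character Escapes use the backslash symbol \ to change how the next character is read: a special character becomes literal and is considered for match, while certain letters (such as d or w) become character classes.

most special characters lose their significance inside bracket expressions []. In JavaScript the backslash \ still escapes inside the [], and the ^ (when first), - (between two characters) and ] keep their special meaning.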

hex value does not contain an escape character
character value escape character \ defines: w, ^
  • \w:: match search uses character class escape to include any word character - [a-zA-Z0-9_].
  • \^:: match uses ^ in its literal form inside the [].
/^(  \w  [!@#$%\^&*)(+=./-])*$/
phone number escape character \ defines: (, ), d, 1
  • \( & \):: match uses ( & ) in its literal form - (###).
  • \d :: match search uses character class escape to include only digits - [0-9].
  • \1 :: backreference - match the same text remembered from the first captured group ([-.]), so both separators must be identical.
/^(?:  \d{3}| \(  \d{3}  \) )([-.])  \d{3}  \1  \d{4}$/
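
The escapes from the phone number expression can be exercised piece by piece. A minimal sketch; `literalParens`, `sameSeparator`, and the a.c patterns are hypothetical fragments written for this example.

```javascript
// \( and \) match literal parentheses instead of opening a group.
const literalParens = /^\(\d{3}\)$/;
console.log(literalParens.test("(555)")); // true
console.log(literalParens.test("555")); // false

// \1 must repeat exactly what group 1 captured.
const sameSeparator = /^\d{3}([-.])\d{3}\1\d{4}$/;
console.log(sameSeparator.test("555-123-4567")); // true
console.log(sameSeparator.test("555.123.4567")); // true
console.log(sameSeparator.test("555-123.4567")); // false

// Outside brackets, an unescaped . matches any character; \. matches only a period.
console.log(/^a.c$/.test("abc")); // true
console.log(/^a\.c$/.test("abc")); // false
console.log(/^a\.c$/.test("a.c")); // true
```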

sources

Dissect the following sources to expand your knowledge further!
  1. MDN Web Docs
  2. Regular-Expressions
  3. Geeks for Geeks
  4. Full-Stack Blog
  5. Wikipedia

connect

reach out with feedback! 🧫

github
