CS Regex Reference Guide for JavaScript

Check out this Gist if you're interested in learning more about Regex, aka
Regular Expressions.

It's not a complete guide, but I did cover the basics to help you get started.
I've also included my references and contact information at the bottom if you
want to learn more.


Summary

This is a general reference guide on how to understand and use some basic regular
expressions. I've broken down specific functions below with simple examples to
help you learn the concepts quickly.

Publish Notes
The content on this Gist was created on this GitHub Repo, published to this GitHub Website, and published on this GitHub Gist.


Repo Stats

 
Table of Contents


CS Regex Reference Guide for JavaScript

Summary
Repo Stats
Table of Contents
1. What is Regex?

What are some other ways to explain Regular Expressions?


2. Starting with Examples

Understanding Regex -> Regular Expressions Are not Easy to Understand
Example - Phone Number
Example - Email Address


The Syntax

1. Regex Components

1.1 Literal Characters
1.2 Meta Characters


2. Anchors / Positions
3. Quantifiers - Greedy and Lazy Match
4. OR Operators

4.1 Character Classes / Bracket Expressions
4.2 Alternation / Grouping and Capturing


5. Boundaries
6. Flags
7. Back-references
8. Look-ahead and Look-behind


Author
Contact Me
Resources and References


1. What is Regex?

... aka regular expression, is a widely supported pattern syntax used to simplify
advanced searching/filtering of content based on a user-specified search pattern.
Most programming languages and text editors support it, though each has its own
"flavor" with small syntax differences.

You define what you are searching for at the level of precision you need.

What makes a regex search/filter different from others is that it searches for
patterns in ASCII or Unicode characters.

You're not just looking for a specific character value, you're looking for all
instances of a pattern within the content. For example, all phone numbers,
email addresses, websites, or really any type of content that follows a
consistent pattern.
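To make "all instances of a pattern" concrete, here is a minimal sketch using JavaScript's built-in `String.prototype.match` with the `g` flag. The sample text and the simple 3-4 digit pattern are made up for illustration.

```javascript
// Sample text with two numbers in the same 3-4 digit shape (made up for illustration)
const text = "Call 555-1234 or 555-9876 after 5pm.";

// \d{3}-\d{4} describes a *shape*: 3 digits, a hyphen, then 4 digits.
// The g flag asks for every match instead of stopping at the first one.
const matches = text.match(/\d{3}-\d{4}/g);

console.log(matches); // [ '555-1234', '555-9876' ]
```

Neither number was typed into the pattern. Both are found because they share the same shape.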


What are some other ways to explain Regular Expressions?

Great question!
The MDN team said,

"Regular expressions are patterns used to match character combinations in strings."

Wikipedia says,

"A regular expression is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation."


2. Starting with Examples

Understanding Regex -> Regular Expressions Are not Easy to Understand

To understand a regex pattern (the search or filter you're creating), you'll need
to learn some syntax. But first, let's start with some examples.

Example - Phone Number

Including the area code, US phone numbers are 10 digits, usually grouped 3-3-4 and
separated by a space or a hyphen.

We can look for 10 digits like this: `/\d\d\d-\d\d\d-\d\d\d\d/`
Here, we're using the meta character `\d` for each digit we're searching for,
with the digit groups separated by hyphens.
But what if the same phone number is formatted differently throughout the data?
For example, our regex expression run on the data below would return only 1
result, `123-456-7890`, even though ALL of them are the same phone number.

(123)-456-7890, 123-456-7890, 123.456.7890, and 123 456 7890
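Here's a quick way to check that claim in JavaScript. This is a sketch: the `g` flag is added so `match` returns every result rather than only the first.

```javascript
const data = "(123)-456-7890, 123-456-7890, 123.456.7890, and 123 456 7890";

// Hyphens only: 3 digits, "-", 3 digits, "-", 4 digits
const strict = /\d\d\d-\d\d\d-\d\d\d\d/g;

const found = data.match(strict);
console.log(found); // [ '123-456-7890' ]
```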


So how could we improve our regex expression?
Looking at the same phone number written 4 different ways, we need to account for
`(`, `)`, `.`, spaces, and `-`.

(123)-456-7890, 123-456-7890, 123.456.7890, and 123 456 7890

1. Add a FEW OR operators to account for spaces, periods, and hyphens
> To do this we'll use `[]`, the character class syntax,
> where each literal character inside the square
> brackets is treated as one alternative.
2. Add the ability to match `(` and `)` if they exist.
> For optional characters, we'll use the `?` quantifier,
> which matches a value whether or not it is present.
What does the fully fleshed-out regex expression look like?
`/(\(?)+(\d{3})+[-.) ]+(\s?)(\d{3})+[-. ]+(\s?)+(\d{4})/`
Let's break it down 👇🏼


| Syntax | Description |
| --- | --- |
| `/` | Starting the regex expression |
| `(\(?)+` | A group holding an optional left-parenthesis `(`; the `+` lets the group repeat one or more times |
| `(\d{3})+` | A group of exactly 3 digits, repeated one or more times |
| `[-.) ]+` | One or more of: hyphen `-`, period `.`, right-parenthesis `)`, or a space |
| `(\s?)` | A white-space character, if it exists |
| `(\d{3})+` | Another group of exactly 3 digits, repeated one or more times |
| `[-. ]+` | One or more of: hyphen `-`, period `.`, or a space |
| `(\s?)+` | A white-space character, if it exists |
| `(\d{4})` | A group of exactly 4 digits |
| `/` | Ending the regex expression |
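To see the improved expression at work, here's a minimal sketch that runs it over the same sample data. The pattern is copied from above, and the only addition is the `g` flag so every match is returned.

```javascript
const data = "(123)-456-7890, 123-456-7890, 123.456.7890, and 123 456 7890";

// Same pattern as above, plus the g flag to collect every match
const phone = /(\(?)+(\d{3})+[-.) ]+(\s?)(\d{3})+[-. ]+(\s?)+(\d{4})/g;

const found = data.match(phone);
console.log(found);
// [ '(123)-456-7890', '123-456-7890', '123.456.7890', '123 456 7890' ]
```

Because `(\d{3})+` lets each 3-digit group repeat, this pattern also accepts longer digit runs in those positions. Treat it as a starting point for searching rather than as strict validation.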


Example - Email Address

Now that we've covered the basics, let's look at a regex search pattern built to
search for email addresses.

You'll notice I've covered fewer details here.

Do you see a pattern in this regex search pattern?
`/([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/`
Let's break it apart into smaller chunks based on the high-level patterns we see!


| Symbol | Description |
| --- | --- |
| `/` | Starting the regex expression |
| `(` | Starting a sub-expression (capturing group) |
| `[a-z0-9_\.-]` | Any lower-case letter or digit, along with `_`, `.`, and `-` |
| `+)` | One or more of the previous characters, then ending the sub-expression |
| `@` | MUST include the `@` character |
| `([\da-z\.-]+)` | One or more digits, lower-case letters, `.`, or `-` (the domain name) |
| `\.` | A literal period `.` |
| `([a-z\.]{2,6})` | 2-6 lower-case letters or periods (the top-level domain, such as `com`) |
| `/` | Ending the regex expression |
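Here's a minimal sketch of the email pattern in use, with a made-up address. `RegExp.prototype.exec` returns the full match at index 0, followed by one entry per capturing group.

```javascript
const email = /([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})/;

// exec returns the full match plus one entry per capturing group
const result = email.exec("contact: jane.doe@example.com");

console.log(result[0]); // 'jane.doe@example.com'
console.log(result[1]); // 'jane.doe'  (before the @)
console.log(result[2]); // 'example'   (domain name)
console.log(result[3]); // 'com'       (top-level domain)
```

The classes only list lower-case letters, so an address like `Jane@Example.com` would also need the `i` (case-insensitive) flag.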


The Syntax

In this section, I've included the syntax used in the above examples. The goal
is for it to serve as a point of reference for the above, and to help you
develop the confidence and knowledge to create your own regex patterns.

Not all sections contain examples by design.


Note: If you're looking for a resource that makes it easy to learn regex while
following along with this guide, check out this website: https://regexr.com/.

It allows you to create regex search patterns and get live feedback.


1. Regex Components

1.1 Literal Characters

Any ASCII or Unicode characters you want to search for or filter out.

Each literal character matches itself, one character at a time.


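| Syntax | Description |
| --- | --- |
| `a-z` | Any lower-case letter (as a range inside a character class, such as `[a-z]`) |
| `A-Z` | Any upper-case letter (as a range inside a character class, such as `[A-Z]`) |
| `0-9` | Any digit (as a range inside a character class, such as `[0-9]`) |
| `\.` | A literal period character |
| Unicode characters | There are a lot; here's an index -> Microsoft - Insert ASCII or Unicode Latin-based symbols and characters |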
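The difference between an escaped period and a bare `.` is easy to miss. Here's a short sketch using `RegExp.prototype.test`, with strings made up for illustration:

```javascript
// \. is a literal period, so only "3.14" qualifies
const escaped = /3\.14/;
// An unescaped . is a meta character that matches (almost) any single character
const unescaped = /3.14/;

const results = {
  escapedDot: escaped.test("3.14"),   // true
  escapedX: escaped.test("3x14"),     // false
  unescapedX: unescaped.test("3x14"), // true
  // Ranges such as a-z only act as ranges inside square brackets
  lowerQ: /[a-z]/.test("q"),          // true
  upperQ: /[a-z]/.test("Q"),          // false, upper-case is a different range
};
console.log(results);
```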


1.2 Meta Characters

Regex operators that represent specific types of characters within the ASCII or Unicode
character sets.


| Syntax | Description | Note | Example |
| --- | --- | --- | --- |
| `\` | Escapes the next character: turns qualifying letters into meta characters (`d` -> `\d`) and meta characters into literal characters (`.` -> `\.`) | WARNING: Without it, a character like `d` is treated as a literal character! | |
| `^` | The start of the input; with the `m` flag, the start of any new line | | |
| `.` | Any single character except line terminators such as `\n` (unless the `s` flag is set) | WARNING: Within a character class, a `.` does not need to be escaped to be read as a literal character | |
| `\d` | Any digit 0-9 | | |
| `\w` | Anything that is a word character | `A-Z`, `a-z`, `0-9`, and `_` | `\w\w` -> Matches any sequence of two word characters. |
| `\W` | Anything that is NOT a word character | | |
| `\s` | Any white-space character | `Space`, `Tab`, `new-line`, and other Unicode white-space | `\s\s` -> Matches any sequence of two white-space characters. |
| `\S` | Anything that is NOT white-space | Excludes `Space`, `Tab`, `new-line`, and other white-space | |
| `[a-z]` | Any character from a to z | Inside a class, the `-` acts as an operator that returns the range of characters from a to z. See the character classes section for more details | |
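A short sketch of the most common meta characters, run against a made-up string with `match` and `test`:

```javascript
const text = "Order #42: ship_to Bob\tnow";

const firstNumber = text.match(/\d+/)[0];          // '42'
const words = text.match(/\w+/g);                   // [ 'Order', '42', 'ship_to', 'Bob', 'now' ]
const whitespaceCount = text.match(/\s/g).length;   // 4: three spaces and one tab
// . matches any character except line terminators such as \n
const dotCrossesNewline = /a.b/.test("a\nb");      // false

console.log(firstNumber, words, whitespaceCount, dotCrossesNewline);
```

Note that `ship_to` comes back as one word, because `_` counts as a word character.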


2. Anchors / Positions

... are used to match a position within your content, such as the start or end
of the input, rather than a character.


| Syntax | Description | Example |
| --- | --- | --- |
| `^` | Matches the start of the input (or of any line, with the `m` flag) | `/^test/` looks for all strings that start with the literal characters `t`, `e`, `s`, and `t`. |
| `$` | Matches the end of the input (or of any line, with the `m` flag) | `/test$/` looks for all strings that end with the literal characters `t`, `e`, `s`, and `t`. |
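A minimal sketch of both anchors with `RegExp.prototype.test`, using made-up strings:

```javascript
const startsWithTest = /^test/;
const endsWithTest = /test$/;

const checks = [
  startsWithTest.test("testing"), // true
  startsWithTest.test("a test"),  // false, "test" isn't at the start
  endsWithTest.test("unit test"), // true
  endsWithTest.test("tests"),     // false, "s" comes after "test"
];
console.log(checks);
```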


3. Quantifiers - Greedy and Lazy Match

... are meta characters that modify the previous character, class, or group in a regular expression.

Based on your regex search parameters, how many times do you want it to
match in a row?


| Syntax | Description | Example |
| --- | --- | --- |
| `*` | 0 or more | `/\d*/` -> matches a run of zero or more digits (so it can also match an empty string). |
| `?` | 0 or 1 | `/test?t/` -> matches both `test` and `testt`, because the `t` before the final `t` is optional. |
| `+` | 1 or more | `/\d+/` -> matches runs of digits of length 1 or more. |
| `{min,max}` | Range of the number of times the former argument must repeat to qualify as a result. | `\w{1,5}` -> All runs of 1-5 word characters. |
| `{n}` | Exact number of times the former argument must repeat to qualify as a result. | `\w{5}\s` -> All runs of 5 word characters followed by white-space. |
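Here's a minimal sketch of each quantifier, plus the greedy vs lazy difference. By default, quantifiers are greedy and match as much as they can. Adding `?` after a quantifier (for example `+?`) makes it lazy, so it matches as little as it can. The sample strings are made up.

```javascript
// * : zero or more, so it can succeed without consuming anything
const star = "abc".match(/\d*/)[0];                 // ''
// ? : the t before the final t is optional
const optional = ["test", "testt", "tet"].filter((s) => /test?t/.test(s)); // [ 'test', 'testt' ]
// + : one or more
const plus = "a 1 22 333".match(/\d+/g);            // [ '1', '22', '333' ]
// {min,max} : at most 5 per match, so a 7-letter run is split
const range = "abcdefg".match(/\w{1,5}/g);          // [ 'abcde', 'fg' ]
// {n} : exactly 5 word characters, then white-space
const exact = "seven words here".match(/\w{5}\s/g); // [ 'seven ', 'words ' ]

// Greedy (default) vs lazy (quantifier followed by ?)
const greedy = "<b>bold</b>".match(/<.+>/)[0];     // '<b>bold</b>'
const lazy = "<b>bold</b>".match(/<.+?>/)[0];      // '<b>'

console.log({ star, optional, plus, range, exact, greedy, lazy });
```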


4. OR Operators

How to use OR arguments within a regex statement.
4.1 Character Classes / Bracket Expressions

... is one of the two OR operators. The alternatives are placed inside square brackets `[ ]`, and the class matches any ONE of them.


| Syntax | Description | Notes | Example |
| --- | --- | --- | --- |
| `[^argument]` | NOT OR operator, returns anything except for the arguments in the class | A caret, `^`, becomes a meta character when used at the start of a class. Anywhere else it is a literal character | `[^0-5]` -> anything not 0-5. `[^a-c]` -> anything that is not between the letters a-c. |
| `[.]` | All literal character instances of a period, `.` | Within a class, a `.` does not need to be escaped to be read as a literal character. | `[-.]` -> looks for the literal characters `-` OR `.` |
| `a[bc]de` | All cases of `abde` or `acde`. | | |
| `[letter-letter]` | Any letter in the range, based on character case. | A hyphen, `-`, becomes a meta character when used between two literal characters of the same family within a class. | `[a-c]` -> returns all characters between `a` and `c`. `/\b[A-Za-z]{4}\b/` -> matches any 4-letter word made only of letters. `/\b[A-Z][a-z]*\b/` -> matches any word made of a capital letter followed by zero or more lower-case letters. `\b[\w]{4}\b` -> all 4-character words made of word characters. |
| `[number-number]` | A range between two digits | A hyphen, `-`, becomes a meta character when used between two literal characters of the same family within a class. | `[0-5]{3}` -> all combinations of 3 digits where each digit is between 0 and 5 |
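A minimal sketch of the character-class examples above, using `match` and `replace` on made-up strings:

```javascript
const bracket = "abde acde adde".match(/a[bc]de/g);          // [ 'abde', 'acde' ]
const dashOrDot = "x-y.z".match(/[-.]/g);                     // [ '-', '.' ]
// [^0-5] with g: remove every character that is NOT 0-5
const onlyLowDigits = "7350".replace(/[^0-5]/g, "");         // '350'
const capitalized = "Regex is Fun".match(/\b[A-Z][a-z]*\b/g); // [ 'Regex', 'Fun' ]
const lowTriples = "315 672 045".match(/[0-5]{3}/g);          // [ '315', '045' ]

console.log({ bracket, dashOrDot, onlyLowDigits, capitalized, lowTriples });
```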


4.2 Alternation / Grouping and Capturing

... is the second OR operator. It looks for one of several groups of literal
characters, written within parentheses and separated by a vertical bar:
`(arg1|arg2)`.


| Syntax | Description | Notes | Example |
| --- | --- | --- | --- |
| `(arg1\|arg2)` | Returns all instances where arg1 or arg2 exist. | This is how you search for very specific groups of literal characters. | `/[\w.]+@\w+\.(com\|net\|edu)/` -> returns all email addresses with a `.com`, `.net`, or `.edu` domain |
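A minimal sketch of the alternation example against some made-up addresses. It is written here with a single `\.` for the period before the extension. `String.prototype.matchAll` (which requires the `g` flag) exposes the capturing group, so you can see which alternative matched.

```javascript
const text = "me@site.com, you@school.edu, them@corp.org";
const email = /[\w.]+@\w+\.(com|net|edu)/g;

const addresses = text.match(email);            // [ 'me@site.com', 'you@school.edu' ]
// Group 1 holds whichever alternative matched
const extensions = [...text.matchAll(email)].map((m) => m[1]); // [ 'com', 'edu' ]

console.log(addresses, extensions);
```

`them@corp.org` is skipped because `org` is not one of the listed alternatives.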


5. Boundaries


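| Syntax | Description | Example |
| --- | --- | --- |
| `\b` | A word boundary: the position between a word character and a non-word character (or the start/end of the input) | All 4-letter words -> `/\b\w{4}\b/` -> matches each run of exactly 4 word characters with a boundary on both sides. `/\btest\b/` -> returns a whole-word search |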
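This sketch shows why the closing `\b` matters, using a made-up sentence. Without it, the first 4 characters of longer words also match.

```javascript
const text = "this is a test of testing";

const fourLetterWords = text.match(/\b\w{4}\b/g); // [ 'this', 'test' ]
const fourLetterStarts = text.match(/\b\w{4}/g);  // [ 'this', 'test', 'test' ] (also the start of "testing")
const partOfWord = /\btest\b/.test("testing");    // false
const wholeWord = /\btest\b/.test("a test");      // true

console.log({ fourLetterWords, fourLetterStarts, partOfWord, wholeWord });
```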


6. Flags

... are used to change how your regex expression searches.
They can be combined or used individually as needed, and are added to the end
of your regex expression: `/regex-pattern/flags`


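| Syntax | Description | Example |
| --- | --- | --- |
| `g` | Global search: finds all matches, not just the first. | `/[a-z]/g` -> returns every lower-case letter within the content. |
| `i` | Case-insensitive search | `/[a-z]/gi` -> returns ALL letters from A-Z and a-z, regardless of case |
| `m` | Multi-line search: `^` and `$` match at the start and end of each line. | `/^\d/gm` -> returns the digit at the start of every line that begins with one |
| `s` | `Dotall mode`: `.` also matches line terminators such as `\n`. | |
| `u` | Enables full Unicode support, such as treating an emoji as a single character | |
| `y` | Sticky mode: only matches at the exact position stored in the regex's `lastIndex` property | |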
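Here's a minimal sketch of each flag on made-up strings. Note that `test` with the `y` flag reads and updates the regex's `lastIndex`, so the example resets it before each check.

```javascript
// g: every match, not just the first
const lowerOnly = "Hi Bob".match(/[a-z]/g);   // [ 'i', 'o', 'b' ]
// i: ignore case
const anyCase = "Hi Bob".match(/[a-z]/gi);    // [ 'H', 'i', 'B', 'o', 'b' ]

// m: ^ matches at the start of every line, not only the start of the string
const lines = "1st\n2nd\nthird";
const withoutM = lines.match(/^\d/g);         // [ '1' ]
const withM = lines.match(/^\d/gm);           // [ '1', '2' ]

// s: . also matches the new-line character
const dotDefault = /a.b/.test("a\nb");        // false
const dotAll = /a.b/s.test("a\nb");           // true

// u: this emoji is two UTF-16 code units, but one character with u
const emoji = "\u{1F600}";
const noUnicode = /^.$/.test(emoji);          // false
const withUnicode = /^.$/u.test(emoji);       // true

// y: only try to match at exactly lastIndex, never scan ahead
const sticky = /\d/y;
sticky.lastIndex = 1;
const stickyAt1 = sticky.test("a1");          // true
sticky.lastIndex = 0;
const stickyAt0 = sticky.test("a1");          // false

console.log({ lowerOnly, anyCase, withoutM, withM, dotDefault, dotAll, noUnicode, withUnicode, stickyAt1, stickyAt0 });
```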


7. Back-references

... are used to match the same text that an earlier capturing group (pattern group) already matched, based on the group number you specify.


| Syntax | Description | Example |
| --- | --- | --- |
| `\n` | Matches the same text captured by pattern group number n (for example, `\1` for the first group). | `(['"])(.*?)\1` run on the content `Testing: "my regex expression..."` -> returns `"my regex expression..."`. |
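A minimal sketch of the back-reference example. Group 1 captures the opening quote, and `\1` then requires the *same* quote character to close it.

```javascript
const quoted = /(['"])(.*?)\1/;

const match = quoted.exec('Testing: "my regex expression..."');
console.log(match[0]); // '"my regex expression..."' (quotes included)
console.log(match[2]); // 'my regex expression...'   (just the text between them)

// \1 must repeat the same quote character, so mismatched quotes don't match
const mismatched = quoted.test(`"mixed'`);
console.log(mismatched); // false
```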


8. Look-ahead and Look-behind


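| Syntax | Description |
| --- | --- |
| `(?=arg)` | Lookahead: matches only if `arg` comes immediately after the current position, without including `arg` in the result. `X(?=arg)` returns the `X` immediately before `arg`. |
| `(?<=arg)` | Lookbehind: matches only if `arg` comes immediately before the current position, without including `arg` in the result. `(?<=arg)X` returns the `X` immediately after `arg`. |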
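A minimal sketch of both lookarounds on a made-up price string. Lookbehind was added to JavaScript in ES2018, so very old engines may not support it.

```javascript
const text = "Price: $42, 17 USD";

// Lookbehind: digits only if a "$" comes right before them; "$" isn't returned
const dollars = text.match(/(?<=\$)\d+/)[0];
// Lookahead: digits only if " USD" comes right after them; " USD" isn't returned
const usd = text.match(/\d+(?= USD)/)[0];

console.log(dollars); // '42'
console.log(usd);     // '17'
```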


Author

Erik Plachta
Thanks for taking the time to read this!

If you want to check out more of my work, head on over to my GitHub Page.

Contact Me

Do you want to get in touch?

Feel free to connect with me on my Twitter @ErikPlachta or LinkedIn @ErikPlachta


Resources and References

A collection of resources I used to learn about Regex.

- Wikipedia - Regular Expressions
- MDN - Regular Expressions
- Microsoft - Regular Expression Language - Quick Reference
- MDN - Regular expression syntax cheat sheet
- Regular-Expressions.info
- RexEgg.com
- YouTube - The Coding Train - Introduction to Regular Expressions - Programming with Text
- Regex tutorial - A quick cheat sheet by examples
- zone.ni.com - Regular Expressions Components
- javascript.info - Backreferences in pattern
- rexegg.com - Mastering Lookahead and Lookbehind