@kaylaanngrace
Last active May 18, 2022 17:53
Regular Expression, URL, breakdown
# urlRegexTutorial-ByMakWils
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
This gist describes the components of a regular expression that matches a URL. A regular expression, or regex for short, is a sequence of characters that defines a specific search pattern.
## Summary
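A regular expression (regex) is a sequence of characters that defines a specific search pattern. A regex can be used to check whether a string looks like a URL.
The following regex matches many common HTTP and HTTPS URLs, though not every valid one. It accepts only lowercase letters in the domain, and it rejects query strings, fragments, and port numbers: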
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
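
To get a feel for what the pattern accepts, here is a minimal sketch that runs it against a few strings. The sample URLs are hypothetical and chosen only for illustration; the variable names are not part of the original gist.

```javascript
// The URL regex from this gist, written as a JavaScript regex literal.
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical sample inputs (assumptions for illustration).
const samples = [
  "https://github.com/kaylaanngrace",   // scheme, domain, TLD, and path
  "example.com",                        // the scheme is optional
  "http://example.com/docs/index.html", // dots and slashes are allowed in the path
  "https://example.com/search?q=regex", // "?" and "=" are not in the path set
  "https://Example.com",                // the sets only list lowercase a-z
];

for (const sample of samples) {
  console.log(sample, "->", urlRegex.test(sample));
}
// The first three samples print true; the last two print false.
```

The last two samples are valid URLs that the pattern still rejects, which shows that the regex is a practical approximation and not a full URL validator.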
## Table of Contents
- [Anchors](#anchors)
- [Quantifiers](#quantifiers)
- [Character Classes](#character-classes)
- [Grouping and Capturing](#grouping-and-capturing)
- [Bracket Expressions / OR Operator](#bracket-expressions--or-operator)
- [Greedy and Lazy Match](#greedy-and-lazy-match)
## Regex Components
The expression may seem obscure at first. This tutorial breaks it down piece by piece to make the regex easier to understand.
### Anchors
Anchors match a position within a string, not a character.

- '^' - This anchor matches the beginning of a string.
- '$' - This anchor matches the end of a string.
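
The effect of the anchors is easiest to see on a smaller pattern. The sketch below uses a hypothetical pattern for the word "example" (not part of the URL regex) with and without anchors.

```javascript
// Hypothetical simplified patterns, used only to illustrate anchors.
const unanchored = /example/;  // may match anywhere inside the string
const anchored = /^example$/;  // must match the entire string

console.log(unanchored.test("an example here")); // true: "example" appears inside the string
console.log(anchored.test("an example here"));   // false: there is extra text before and after
console.log(anchored.test("example"));           // true: the whole string is "example"
```

Because the URL regex starts with '^' and ends with '$', the whole input has to look like a URL, not just part of it.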
### Quantifiers
Quantifiers indicate that the preceding token must be matched a certain number of times. By default, quantifiers are greedy, meaning they will match as many characters as possible.
This regex uses 4 kinds of quantifiers, each written immediately after the token or group it applies to:

- '?' - This quantifier matches 0 or 1 of the preceding token, making it optional.
- '*' - This quantifier matches 0 or more of the preceding token.
- '{2,6}' - This quantifier matches between 2 and 6 of the preceding token.
- '+' - This quantifier matches 1 or more of the preceding token.
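
As a rough illustration, each quantifier can be tried on its own. The small patterns below are hypothetical and are not taken from the URL regex, apart from 'https?', which appears in its first group.

```javascript
// Each pattern is anchored with ^ and $ so the quantifier must account for the whole string.
const optionalS = /^https?$/;    // "?": 0 or 1 "s"
const zeroOrMore = /^ab*$/;      // "*": 0 or more "b"
const twoToSix = /^[a-z]{2,6}$/; // "{2,6}": between 2 and 6 lowercase letters
const oneOrMore = /^\d+$/;       // "+": 1 or more digits

console.log(optionalS.test("http"), optionalS.test("https"), optionalS.test("httpss")); // true true false
console.log(zeroOrMore.test("a"), zeroOrMore.test("abbb"));                             // true true
console.log(twoToSix.test("io"), twoToSix.test("museum"), twoToSix.test("website"));    // true true false
console.log(oneOrMore.test(""), oneOrMore.test("2022"));                                // false true
```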
### Character Classes
A character class matches any one character from a defined set of characters.

- '\d' - This is a digit token and will match any digit character (0-9).
- '\w' - This is a word token, which will match any word character: a letter, a digit, or an underscore.
- 'a-z' - Inside brackets, this is a range that matches any character from "a" to "z". It is case sensitive, so uppercase letters do not match.
- '\.' - This is an escaped character, which will match a literal "." character.
- '\/' - This is an escaped character, which will match a literal "/" character.
- '-' - This is a literal character. It is placed last inside the brackets, so it matches a "-" character and does not form a range.
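
The following sketch checks each token against a single character. The variable names are assumptions made for this example, and each pattern is anchored so that it tests exactly one character.

```javascript
// One small pattern per token, anchored so it must match exactly one character.
const digit = /^\d$/;         // any digit 0-9
const wordChar = /^\w$/;      // a letter, a digit, or an underscore
const lowercase = /^[a-z]$/;  // a lowercase letter only
const literalDot = /^\.$/;    // an escaped dot matches only "."
const anyChar = /^.$/;        // an unescaped dot matches almost any character
const literalSlash = /^\/$/;  // an escaped slash matches "/"

console.log(digit.test("7"), digit.test("x"));           // true false
console.log(wordChar.test("_"), wordChar.test("-"));     // true false
console.log(lowercase.test("q"), lowercase.test("Q"));   // true false
console.log(literalDot.test("."), literalDot.test("a")); // true false
console.log(anyChar.test("a"));                          // true
console.log(literalSlash.test("/"));                     // true
```

The comparison between '\.' and '.' shows why the dot is escaped in the URL regex: without the backslash it would match almost any character instead of only a period.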
### Grouping and Capturing
Groups allow you to combine a sequence of tokens to handle them together.
'()' - Parentheses group multiple tokens together and create a capture group for extracting a substring.
This regex has 4 capturing groups.
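1. (https?:\/\/) - this is the scheme. The characters "h", "t", "t", "p", ":" and the two escaped slashes are literals, and the "s" is optional, so the group matches "http://" or "https://". The '?' after the group makes the whole scheme optional.
2. ([\da-z\.-]+) - this is the domain name, e.g. "github"
3. ([a-z\.]{2,6}) - this is the top-level domain, e.g. "com" or "gov". The dot before it is matched by the '\.' between groups 2 and 3, so it is not part of the captured text.
4. ([\/\w \.-]*) - this is the file path

Each part of the capturing groups is described further in the other sections of this gist.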
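
To see what each group captures, here is a minimal sketch using JavaScript's `String.prototype.match`. The sample URL and the variable names are assumptions made for this example.

```javascript
// The URL regex from this gist.
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Without the "g" flag, match() returns the full match followed by the captured groups.
const match = "https://github.com/kaylaanngrace".match(urlRegex);
const [whole, scheme, domain, tld, path] = match;

console.log(scheme); // "https://"
console.log(domain); // "github"
console.log(tld);    // "com"
console.log(path);   // "/kaylaanngrace"
```

Note that `match` is `null` when the input does not match, so real code should check for that before reading the groups.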
### Bracket Expressions / OR Operator
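'[]' - Brackets define a character set, which matches any single character listed inside it. The set works like an OR between its members. For example, [\da-z\.-] matches one digit, or one lowercase letter, or a ".", or a "-".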
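
As a small sketch, the set from the domain group can be tested one character at a time. The anchors and the variable name are additions for this example, so that the pattern checks exactly one character.

```javascript
// The character set from group 2 of the URL regex, anchored to test a single character.
const domainChar = /^[\da-z\.-]$/;

for (const ch of ["a", "7", ".", "-", "_", "A"]) {
  console.log(JSON.stringify(ch), domainChar.test(ch));
}
// "a", "7", "." and "-" print true; "_" and "A" print false.
```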
### Greedy and Lazy Match
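The quantifiers in this regex ('?', '*', '+' and '{2,6}') are greedy operators, so they expand the match as far as they can while still allowing the rest of the pattern to match.
Adding a '?' directly after another quantifier (for example '*?' or '+?') makes it a lazy operator, which matches as few characters as possible. This regex does not use any lazy quantifiers; every '?' in it is the optional quantifier described in the Quantifiers section.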
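
The difference is easiest to see side by side. The patterns and the sample text below are hypothetical and are not part of the URL regex.

```javascript
// Hypothetical sample text (an assumption for illustration).
const text = "example.com/a/b/c";

// Greedy: ".*" grabs as much as possible, then backs off to the last "/".
const greedy = text.match(/\/.*\//)[0];
// Lazy: ".*?" grabs as little as possible, stopping at the next "/".
const lazy = text.match(/\/.*?\//)[0];

console.log(greedy); // "/a/b/"
console.log(lazy);   // "/a/"

// The optional quantifier "?" is greedy as well; "??" is its lazy form.
console.log("https".match(/https?/)[0]);  // "https"
console.log("https".match(/https??/)[0]); // "http"
```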
## Author
Makayla Wilson
https://github.com/kaylaanngrace