Last active
May 18, 2022 17:53
Regular Expression, URL, breakdown
# urlRegexTutorial-ByMakWils
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
This gist describes the components of a regular expression that matches a URL.
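## Summary
A regular expression, or regex for short, is a sequence of characters that defines a specific search pattern. The regex covered here matches a URL.
The following regex matches many common URLs, such as "example.com" or "https://github.com/kaylaanngrace". It does not cover every valid URL: uppercase letters, ports, query strings, and fragments all cause the match to fail.
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/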
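To see what the pattern accepts and rejects, here is a minimal sketch that runs it against a few inputs. The sample strings are hypothetical and chosen only for illustration; the regex itself is copied unchanged from this gist.

```javascript
// The regex from this gist, unchanged.
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical sample inputs (not from the original gist).
const samples = [
  "https://github.com/kaylaanngrace",       // scheme + domain + path
  "example.com",                            // the scheme is optional
  "http://sub.example.org/docs/index.html", // subdomain and file path
  "https://example.com/search?q=regex",     // "?" is not allowed in the path set
  "https://example.com:8080/",              // ":" is not allowed after the scheme
  "not a url",                              // no dot-separated domain at all
];

for (const sample of samples) {
  // test() returns true when the whole string matches the pattern.
  console.log(urlRegex.test(sample), sample);
}
```

The first three samples print `true` and the last three print `false`.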
## Table of Contents
- [Anchors](#anchors)
- [Quantifiers](#quantifiers)
- [Character Classes](#character-classes)
- [Grouping and Capturing](#grouping-and-capturing)
- [Bracket Expressions](#bracket-expressions)
- [Greedy and Lazy Match](#greedy-and-lazy-match)
## Regex Components
The expression may seem obscure at first, so this tutorial breaks it down piece by piece.
### Anchors
Anchors match a position within a string, not a character.
'^' - This anchor matches the beginning of a string.
'$' - This anchor matches the end of a string.
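The effect of the anchors is easiest to see by comparing a pattern with and without them. The sketch below uses deliberately simplified, hypothetical patterns (not the full regex from this gist) and a made-up input string.

```javascript
// Simplified patterns for illustration only.
const anchoredStart = /^https?:\/\//; // scheme must be at the very start
const unanchored = /https?:\/\//;     // scheme may appear anywhere
const anchoredEnd = /\.com$/;         // ".com" must be at the very end

const text = "see https://example.com";

console.log(unanchored.test(text));    // true: the scheme occurs somewhere in the string
console.log(anchoredStart.test(text)); // false: the string starts with "see "
console.log(anchoredEnd.test(text));   // true: the string ends with ".com"
console.log(anchoredEnd.test("example.com/path")); // false: "/path" follows ".com"
```

Because the gist's regex starts with '^' and ends with '$', the entire input has to be a URL; a URL embedded in a longer sentence will not match.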
### Quantifiers
Quantifiers indicate that the preceding token must be matched a certain number of times. By default, quantifiers are greedy, meaning they match as many characters as possible.
This regex uses four kinds of quantifiers, each placed directly after the token or group it applies to:
'?' - This quantifier matches 0 or 1 of the preceding token, making it optional.
'*' - This quantifier matches 0 or more of the preceding token.
'{2,6}' - This quantifier matches between 2 and 6 of the preceding token.
'+' - This quantifier matches 1 or more of the preceding token.
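As a rough illustration, each quantifier can be tried on its own. The small patterns below are hypothetical stand-ins built from pieces of the gist's regex, anchored so that the whole input must match.

```javascript
// One small pattern per quantifier (illustrative, not the full URL regex).
const optionalS = /^https?$/;       // "?": the "s" may appear 0 or 1 times
const zeroOrMore = /^\d*$/;         // "*": any number of digits, including none
const oneOrMore = /^\d+$/;          // "+": at least one digit
const twoToSix = /^[a-z]{2,6}$/;    // "{2,6}": between 2 and 6 lowercase letters

console.log(optionalS.test("http"));    // true
console.log(optionalS.test("https"));   // true
console.log(optionalS.test("httpss"));  // false: two "s" characters

console.log(zeroOrMore.test(""));       // true: zero digits is allowed
console.log(oneOrMore.test(""));        // false: at least one digit is required

console.log(twoToSix.test("c"));        // false: too short
console.log(twoToSix.test("com"));      // true
console.log(twoToSix.test("abcdefg"));  // false: seven letters is too long
```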
### Character Classes
'\d' - This is a digit token and will match any digit character (0-9).
'\w' - This is a word token, which will match any word character: alphanumeric characters and the underscore.
'a-z' - This is a range that matches any single lowercase letter from "a" to "z". It is case sensitive, so uppercase letters do not match.
'\.' - This is an escaped character, which will match a "." character.
'\/' - This is an escaped character, which will match a "/" character.
'-' - This is a literal character that matches a "-" character. Inside a bracket expression it is literal only where it cannot form a range, as here at the end of the set.
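The tokens above can be checked one at a time. This is a small sketch with hypothetical single-purpose patterns; the variable names are made up for the example.

```javascript
// Each pattern isolates one character class or escape used in the URL regex.
const isDigit = /^\d$/;
const isWordChar = /^\w$/;
const isLowercase = /^[a-z]$/;
const literalDot = /^a\.c$/; // escaped dot: only a real "." matches
const anyChar = /^a.c$/;     // unescaped dot: any single character matches

console.log(isDigit.test("7"));      // true
console.log(isDigit.test("x"));      // false
console.log(isWordChar.test("_"));   // true: underscore counts as a word character
console.log(isWordChar.test("-"));   // false: hyphen does not
console.log(isLowercase.test("q"));  // true
console.log(isLowercase.test("Q"));  // false: the range is case sensitive
console.log(literalDot.test("a.c")); // true
console.log(literalDot.test("abc")); // false
console.log(anyChar.test("abc"));    // true
```

The last three lines show why the dot is escaped in the URL regex: an unescaped "." would accept any character where a literal dot is expected.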
### Grouping and Capturing
Groups allow you to combine a sequence of tokens to handle them together.
'()' - Parentheses group multiple tokens together and create a capture group for extracting a substring.
This regex has 4 capturing groups.
1. (https?:\/\/) - this is the scheme: the literal characters "http", an optional "s", and then "://". The '?' after the group makes the whole scheme optional.
2. ([\da-z\.-]+) - this is the domain name, including any subdomains
3. ([a-z\.]{2,6}) - this is the top-level domain, e.g. com or gov; the dot in front of it is matched by the '\.' between groups 2 and 3
4. ([\/\w \.-]*) - this is the file path
Each part of these capturing groups is described in the other sections of this gist.
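The four groups can be pulled out of a match with `String.prototype.match`, which returns an array holding the full match followed by one entry per capturing group. The following is a minimal sketch; the input strings and variable names are assumptions made for the example.

```javascript
// The regex from this gist, unchanged.
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// match() returns null when the input does not match; this sample does match.
const match = "https://github.com/kaylaanngrace".match(urlRegex);
const [whole, scheme, domain, tld, path] = match;

console.log(whole);  // the full matched string
console.log(scheme); // "https://"
console.log(domain); // "github"
console.log(tld);    // "com"
console.log(path);   // whatever group 4 captured for the path portion

// When the optional scheme is absent, group 1 did not take part in the match.
const bare = "example.org".match(urlRegex);
console.log(bare[1]); // undefined
```

Group 4 is itself repeated by the '*' that follows it, so it only holds the text from one repetition, and exactly which one can differ between regex engines. If the path matters, it is safer to verify the captured value in the engine you actually use.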
### Bracket Expressions
'[]' - Brackets define a character set, which matches any one of the characters, ranges, or character classes listed inside it. In that sense a set works like an OR between its members.
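As an example, the set from group 2 of the URL regex can be tested on its own. The sketch below anchors it and adds '+' so that every character of the input must come from the set; the sample strings are hypothetical.

```javascript
// The bracket expression from group 2: digits, lowercase letters, ".", or "-".
const domainChars = /^[\da-z\.-]+$/;

console.log(domainChars.test("my-site.2024")); // true: every character is in the set
console.log(domainChars.test("my_site"));      // false: "_" is not in the set
console.log(domainChars.test("MySite"));       // false: uppercase letters are not in the set
```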
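### Greedy and Lazy Match
The quantifiers '*', '+', '?', and '{}' are greedy by default: each one expands its match as far as it can while still allowing the overall pattern to succeed.
Adding a '?' directly after a quantifier (for example '*?' or '+?') makes it lazy, so it matches as few characters as possible. This regex contains no lazy quantifiers; every '?' in it is the ordinary 0-or-1 quantifier.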
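The difference shows up when a pattern could stop at more than one place. This sketch uses two hypothetical patterns (not part of the URL regex) that match from one "/" to another in a made-up path.

```javascript
const samplePath = "/docs/api/index.html";

// Greedy: ".*" grabs as much as possible, then backs off to the LAST "/".
const greedy = samplePath.match(/\/.*\//)[0];

// Lazy: ".*?" grabs as little as possible, stopping at the NEXT "/".
const lazy = samplePath.match(/\/.*?\//)[0];

console.log(greedy); // "/docs/api/"
console.log(lazy);   // "/docs/"
```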
## Author
Makayla Wilson
https://github.com/kaylaanngrace