In this regex tutorial, I'll cover how to match a URL with a regular expression. I'll break down each part of the expression and explain what it does.
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
This is a regex used to match a URL. The set of characters may look meaningless, but it is in fact a search pattern that describes what a URL looks like.
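As a quick sanity check, here is a minimal sketch that runs the regex against a few sample strings, assuming a JavaScript environment such as Node.js or a browser console. The sample strings are made up for illustration.

```javascript
// The URL regex from this tutorial as a JavaScript regex literal.
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Illustrative inputs only; not taken from any real data set.
const samples = [
  "https://github.com/Bodheim", // protocol, domain, extension and path
  "http://example.org",         // no path
  "example.co.uk",              // the protocol group is optional
  "ftp://example.com",          // only http:// and https:// are allowed
  "not a url",                  // no dot followed by an extension
];

for (const s of samples) {
  // test() returns true when the whole string fits the pattern.
  console.log(`${s} -> ${urlRegex.test(s)}`);
}
// Logs true, true, true, false, false.
```

Because the bracket expressions only list lowercase letters, a hostname such as GitHub.com is rejected; adding the i flag (/.../i) would make the match case-insensitive.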
^ and $ are both anchors. They define where the matched string starts and ends: ^ signifies the start of the string and $ signifies the end. What sits between the anchors can be one of two things:
- Exact string match
- Range of possible matches
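In this example, (https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/? is what is contained between our anchors. Because the pattern is anchored at both ends, it has to match the whole string, not just a piece of it.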
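To see what the anchors change, the sketch below compares the tutorial's regex with a hypothetical copy that has ^ and $ removed, using an invented sentence as input.

```javascript
// The tutorial's regex, anchored at both ends.
const anchored = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;
// Hypothetical variant with the ^ and $ anchors removed.
const unanchored = /(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/;

const sentence = "Visit https://example.com for details!";

// Without anchors, the pattern may match any substring, so the embedded URL is found.
console.log(unanchored.test(sentence)); // true
// With anchors, the whole sentence would have to be a URL, so this fails.
console.log(anchored.test(sentence)); // false
```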
Quantifiers set how many times a character, set, or group may occur in a regex. In our example, {2,6} is a quantifier: it means that section of the string must be between 2 and 6 characters long. This is the part of the URL after the dot, such as com, org, or co. The regex also uses the quantifiers ? (zero or one), + (one or more) and * (zero or more).
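The effect of {2,6} is easiest to see in isolation. This sketch pulls out just the extension part of the regex and anchors it (a hypothetical standalone pattern, not the full URL regex) so that the whole test string must satisfy the quantifier.

```javascript
// Just the extension part of the URL regex, anchored on both sides.
const extension = /^[a-z\.]{2,6}$/;

for (const ext of ["c", "co", "com", "co.uk", "museums"]) {
  // {2,6} accepts 2 to 6 characters from the set [a-z.].
  console.log(`${ext}: ${extension.test(ext)}`);
}
// c: false (1 character), co: true, com: true, co.uk: true, museums: false (7 characters)
```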
Parentheses () are grouping constructs used to separate a regex into groups, or subexpressions. In our example we have four groups:
(https?:\/\/)
([\da-z\.-]+)
([a-z\.]{2,6})
([\/\w \.-]*)
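The groups are joined by ?, \. and *. The ? after the first group makes the protocol (http:// or https://) optional, the \. between the second and third groups matches the literal dot before the domain extension, and the * after the fourth group lets the path section repeat. Finally, \/? allows an optional trailing slash. Together these describe the pattern of a URL.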
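Each pair of parentheses is also a capturing group, so JavaScript's RegExp.prototype.exec() reports what every group matched. A minimal sketch, using the GitHub URL from the end of this tutorial as sample input:

```javascript
const urlRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// exec() returns an array: index 0 is the whole match, 1-4 are the four groups.
const match = urlRegex.exec("https://github.com/Bodheim");

console.log(match[1]); // https://  (protocol)
console.log(match[2]); // github    (domain name)
console.log(match[3]); // com       (extension)
console.log(match[4]); // /Bodheim  (path)
```

Because the fourth group sits inside an outer *, it only keeps the text from its last repetition; here the path is consumed in one go, so it holds the whole path. For a URL with no protocol or path, such as example.com, match[1] and match[4] are undefined.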
Square brackets [] create a bracket expression: anything inside them represents a set or range of characters, any one of which can match. Our URL regex uses one exactly like this:
[a-z\.]
This matches any single lowercase letter from a to z, or a literal dot. It does not include digits or uppercase letters.
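To check what [a-z\.] actually accepts, this sketch anchors it so it must match exactly one character. The second pattern, [a-z.] without the backslash, is added only for comparison: inside square brackets a dot is already literal, so both behave the same.

```javascript
const escaped = /^[a-z\.]$/;  // the bracket expression from the URL regex
const unescaped = /^[a-z.]$/; // same set; a dot inside [] is already literal

for (const ch of ["a", "z", ".", "A", "5", "-"]) {
  console.log(`${ch}: ${escaped.test(ch)} ${unescaped.test(ch)}`);
}
// a, z and . match; A, 5 and - do not, with either pattern.
```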
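A character class in a regex defines a set of characters, any one of which can occur in an input string to fulfill a match. In our URL example, these include:
\d
which matches any Arabic numeral digit (this is equivalent to the bracket expression [0-9]).
\w
which matches any letter, digit, or underscore (equivalent to [A-Za-z0-9_]).
The wildcard . is also a character class: it matches any character except the newline character \n. Our URL regex doesn't use it as a wildcard, though; every dot in it is escaped as \., so it only matches a literal period.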
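A few one-character checks make these classes concrete. This is a minimal sketch; the inputs are arbitrary examples, and each pattern is anchored so it must match exactly one character.

```javascript
// Each entry pairs a pattern with an input string.
const checks = [
  [/^\d$/, "7"],    // true: a digit
  [/^\d$/, "a"],    // false: a letter is not a digit
  [/^[0-9]$/, "7"], // true: same set as \d
  [/^\w$/, "_"],    // true: \w includes the underscore
  [/^\w$/, "-"],    // false: a hyphen is not a word character
  [/^.$/, "x"],     // true: . matches any character...
  [/^.$/, "\n"],    // false: ...except a newline
];

for (const [pattern, input] of checks) {
  console.log(pattern, JSON.stringify(input), pattern.test(input));
}
```

The hyphen result explains why the URL regex lists - separately inside its brackets, alongside \w and \d.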
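The backslash \ in a regex is an escape character: it makes the special character that follows it be interpreted literally. For example, this URL regex puts \ in front of periods so that they match an actual dot instead of any character, and in front of forward slashes (\/) so that they don't end the /.../ regex literal. In front of certain letters a backslash works the other way round, turning them into character classes such as \d and \w.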
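The difference an escape makes is easy to demonstrate. In this sketch, the hypothetical pattern /^example.com$/ drops the backslash on purpose to show the wildcard behavior, and the last pattern shows escaped slashes inside a regex literal.

```javascript
// \. matches only a literal period; an unescaped . matches any character.
const escapedDot = /^example\.com$/;
const wildcardDot = /^example.com$/;

console.log(escapedDot.test("example.com"));  // true
console.log(escapedDot.test("exampleXcom"));  // false: \. needs a real dot
console.log(wildcardDot.test("exampleXcom")); // true: . accepts the X

// Inside a /.../ literal, each forward slash is escaped as \/.
const protocol = /^https?:\/\//;
console.log(protocol.test("https://github.com")); // true
```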
Abby Castelow
https://github.com/Bodheim