Given a string of characters, we want to determine whether it is a valid URL. Doing this in plain JavaScript would take many lines of code and many conditional statements. Thankfully, we can get the job done with Regular Expressions. Regular Expressions, or Regex for short, are sequences of characters that define a search pattern.
This is the Regex code snippet that is used to validate a URL:
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/
At first glance, this may look overwhelming, so let's break the code down into smaller chunks to better understand what is happening.
- Anchors
- Quantifiers
- Grouping Constructs
- Bracket Expressions
- Character Classes
- The OR Operator
- Flags
- Character Escapes
The first thing you may notice about our snippet is that it is wrapped in forward slashes (/), front and back. This way of creating a Regex is known as literal notation.
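To make literal notation concrete, here is a small sketch that compares it with the RegExp constructor. The pattern abc and the variable names are made up for illustration only.

```javascript
// Literal notation: the pattern sits between two forward slashes.
const literal = /abc/;

// Constructor notation: the same pattern passed in as a string.
const constructed = new RegExp("abc");

// Both objects hold the same pattern and behave the same way.
console.log(literal.source === constructed.source); // true
console.log(literal.test("xxabcxx"));               // true
console.log(constructed.test("xxabcxx"));           // true
```

Literal notation is usually the more readable choice when the pattern is known ahead of time, which is the case for our URL snippet.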
In our snippet, the characters ^ and $ are both anchors. Anchors match a position rather than a character. The ^ anchor matches the beginning of the text, so in our case the match must start at /^(https?:\/\/)?, and similarly the $ anchor matches the end of the text, so the match must finish with whatever precedes it, i.e. ([\/\w \.-]*)*\/?$/ (we will come to that later). Our broken-down Regex should look like this so far: /^$/. The rest of our Regex will sit between the ^ and the $.
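The effect of the two anchors can be seen in a short sketch. The word http and the variable names are only illustrative assumptions, not part of the final snippet.

```javascript
// Without anchors the pattern may match anywhere in the text.
const unanchored = /http/;

// With ^ and $ the whole text must be exactly "http".
const anchored = /^http$/;

console.log(unanchored.test("see http here")); // true
console.log(anchored.test("see http here"));   // false
console.log(anchored.test("http"));            // true

// Our broken-down Regex so far only matches an empty string.
const emptyOnly = /^$/;
console.log(emptyOnly.test(""));  // true
console.log(emptyOnly.test("a")); // false
```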
Quantifiers set how many times the preceding character or group may repeat in a match. They are as follows:
- * - 0 or more
- + - 1 or more
- ? - 0 or 1
- {n} - Exact amount
- {min,max} - Range of amounts (written without a space after the comma)
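Each quantifier in the list above can be tried out with a tiny pattern. This is a sketch using the made-up letters a and b; the last pattern illustrates that, in JavaScript without the u flag, a space inside the braces turns the quantifier into literal text.

```javascript
const zeroOrMore = /^ab*$/;    // "a", "ab", "abbb", ...
const oneOrMore  = /^ab+$/;    // "ab", "abbb", ... but not "a"
const zeroOrOne  = /^ab?$/;    // "a" or "ab" only
const exactly    = /^ab{2}$/;  // "abb" only
const range      = /^ab{2,3}$/; // "abb" or "abbb"

console.log(zeroOrMore.test("a"));  // true
console.log(oneOrMore.test("a"));   // false
console.log(zeroOrOne.test("abb")); // false
console.log(exactly.test("abb"));   // true
console.log(range.test("abbbb"));   // false

// With a space after the comma the braces are no longer a quantifier.
const spaced = /^ab{2, 3}$/;
console.log(spaced.test("abb"));    // false
```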
In the first () of our Regex breakdown, we want to validate that a URL's protocol is either https:// or http://. To accomplish this, we can put the question mark quantifier after the s, /^(https?)$/, so that the pattern matches whether the s is absent or appears exactly once.
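As a quick check of the optional s, the intermediate pattern from this step can be tested on its own. The variable name protocolWord is an assumption for illustration.

```javascript
// The ? makes the preceding s optional: zero or one occurrence.
const protocolWord = /^(https?)$/;

console.log(protocolWord.test("http"));   // true
console.log(protocolWord.test("https"));  // true
console.log(protocolWord.test("httpss")); // false
console.log(protocolWord.test("htt"));    // false
```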
We haven't discussed this yet, but as you may have guessed, we can group patterns together to further break down our URL string. The way we do this is with parentheses (), so let's put our code snippet into groups in the same way that we break down a URL string, (https://)(www.somename.co.us).(com)(/stuff): /^(https?://)?()()()$/. Note: Just as the ? after the s made the s optional, the ? outside of the grouped pattern (https?://)? makes the whole pattern in parentheses optional.
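The optional group can be sketched as follows. The domain example.com is a placeholder, and the slashes and period are written with a backslash so that the literal parses and matches them literally; escaping is covered later in this document.

```javascript
// The first group is optional as a whole because of the ? after the ).
const optionalProtocol = /^(https?:\/\/)?example\.com$/;

console.log(optionalProtocol.test("example.com"));         // true
console.log(optionalProtocol.test("http://example.com"));  // true
console.log(optionalProtocol.test("https://example.com")); // true

// Groups also capture what they matched; index 1 is the first group.
const withProtocol = "https://example.com".match(optionalProtocol);
console.log(withProtocol[1]); // "https://"

// When the optional group does not take part, its capture is undefined.
const withoutProtocol = "example.com".match(optionalProtocol);
console.log(withoutProtocol[1]); // undefined
```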
Similar to how we can group our pattern in parentheses, we can also use bracket expressions to specify a set or range of characters that we want to match. For example, [a-zA-Z] matches any upper or lower case letter. Let's add some bracket expressions to our code snippet: /^(https?://)?([a-z.-]+).([a-z.]{2,6})([/.-]*)*/?$/.
Let's briefly discuss what each group is doing while thinking about our URL breakdown. In the first group, the URL protocol, /^(https?://)?, we validate that the string starts with an optional https:// or http://. Next, the domain group, ([a-z.-]+)., uses a bracket expression to match lower case letters a-z, a period, and a hyphen, while the + quantifier requires one or more of those characters to be present; the period after the group is meant to match the literal . that separates the domain from the top-level domain. Moving on to the top-level-domain group, ([a-z.]{2,6}), we set a bracket expression that matches the letters a-z and a period, along with a min/max quantifier that requires at least 2 characters but no more than 6. Finally, in the path group, ([/.-]*)*/?$, the bracket expression matches a slash, period, or hyphen 0 or more times, the /? allows an optional trailing forward slash, and the dollar sign ($) anchor requires the string to end there.
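The domain and top-level-domain bracket expressions can be tried in isolation. This is a sketch with hypothetical variable names and sample inputs, anchored with ^ and $ so that the whole input has to fit the bracket expression.

```javascript
// One or more lower case letters, periods, or hyphens.
const domainChars = /^[a-z.-]+$/;
console.log(domainChars.test("my-site.example")); // true
console.log(domainChars.test("My_Site"));         // false

// Between 2 and 6 lower case letters or periods.
const topLevelDomain = /^[a-z.]{2,6}$/;
console.log(topLevelDomain.test("com"));     // true
console.log(topLevelDomain.test("co.uk"));   // true
console.log(topLevelDomain.test("c"));       // false
console.log(topLevelDomain.test("museums")); // false
```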
With our breakdown snippet starting to look similar to our URL validation snippet, let's discuss another useful tool in Regex called Character Classes. A Character Class defines a set of characters, any one of which can occur in an input string to fulfill a match. We had some exposure to character classes when we discussed bracket expressions; now let's look at some common Character Classes:
- . - matches any character except the newline character (\n)
- \d - matches a digit (equal to [0-9])
- \w - matches any word character (equal to [a-zA-Z0-9_])
- \s - matches a single whitespace character, including tabs and line breaks
Note: Each of the last three character classes can be changed to perform an inverse match by capitalizing the letter character. For example, \D matches a non-digit character.
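The character classes above, including an inverse one, can be checked with a few one-line tests. The sample strings and variable names are made up for illustration.

```javascript
const anySingleChar = /^.$/;
const hasDigit = /\d/;
const onlyWordChars = /^\w+$/;
const hasWhitespace = /\s/;
const onlyNonDigits = /^\D+$/;

console.log(anySingleChar.test("x"));        // true
console.log(anySingleChar.test("\n"));       // false, . skips the newline
console.log(hasDigit.test("a1"));            // true
console.log(onlyWordChars.test("user_42"));  // true
console.log(onlyWordChars.test("user-42"));  // false, - is not a word character
console.log(hasWhitespace.test("a b"));      // true
console.log(onlyNonDigits.test("abc"));      // true
console.log(onlyNonDigits.test("ab1"));      // false
```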
Now we update our snippet breakdown with character classes, adding \d to the domain group and \w plus a space to the path group: /^(https?://)?([\da-z.-]+).([a-z.]{2,6})([/\w .-]*)*/?$/
By now you have probably noticed some things that are ambiguous in our breakdown snippet, like the / or the . characters. How do we know whether a . should match any character or just a literal period? In Regex, we can escape special characters with a backslash (\), so if we want to match a literal . we just put the backslash before it (\.). The / is escaped too (\/), because it would otherwise be read as the end of a Regex written in literal notation. Now that we are familiar with escaping special characters, let's escape them in our snippet to conclude our URL validation code snippet: /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/.
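To tie everything together, here is a minimal sketch that first shows the difference an escape makes and then wraps the finished snippet in a helper. The function name isValidUrl and the sample inputs are assumptions for illustration and are not part of the original snippet.

```javascript
// An unescaped . matches any character; an escaped \. matches only a period.
const anyMiddle = /a.c/;
const literalDot = /a\.c/;
console.log(anyMiddle.test("abc"));  // true
console.log(literalDot.test("abc")); // false
console.log(literalDot.test("a.c")); // true

// The finished URL validation pattern from this document.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical helper: returns true when the whole input fits the pattern.
function isValidUrl(input) {
  return urlPattern.test(input);
}

console.log(isValidUrl("https://www.example.com/path/page.html")); // true
console.log(isValidUrl("example.com"));                            // true
console.log(isValidUrl("http://example"));                         // false
console.log(isValidUrl("not a url"));                              // false
```

Keep in mind that the pattern only checks the shape of the string; it does not confirm that the address actually exists.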
This document was created by Demetri Dillard, a Jr. Developer and graduate of Trilogy Schools (University of Minnesota).
Check out my GitHub!