Meechlouch/gist-template.md

## gist-template.md

      
URL Validation using Regex.

Given a string of characters, we want to determine whether it is a valid URL. Doing this by hand in JavaScript
would take many lines of code along with many conditional statements. Thankfully, we can get the job done with
a little something called Regular Expressions. Regular Expressions, or Regex for short, are sequences of
characters that define a search pattern.
Summary

This is the Regex code snippet that is used to validate a URL:
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/. At first glance, this may look overwhelming, so
let's break down the code into smaller chunks to better understand what is happening.
Table of Contents


Anchors
Quantifiers
Grouping Constructs
Bracket Expressions
Character Classes
The OR Operator
Flags
Character Escapes

Regex Components

The first thing you may notice about our snippet is that it is wrapped in forward slashes (/), front and back. This
way of creating a Regex is known as literal notation.
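
As a small illustration of literal notation, the sketch below builds the same pattern twice, once between forward slashes and once with the built-in RegExp constructor. The variable names and the sample strings are made up for this example.

```javascript
// Literal notation: the pattern sits between two forward slashes.
const literal = /^https?/;

// Constructor notation: the same pattern passed in as a string.
const constructed = new RegExp("^https?");

// Both objects hold the same pattern text.
console.log(literal.source === constructed.source); // true

// Both behave the same way when tested against a string.
console.log(literal.test("https://example.com"));   // true
console.log(constructed.test("ftp://example.com")); // false
```

Literal notation is usually the shorter choice when the pattern is known ahead of time, which is the case for our URL snippet.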
Anchors

In our snippet, the characters ^ and $ are both anchors. Anchors match a position in the text rather than a
character. The ^ anchor matches the beginning of the text, in our case /^(https?, and similarly the $ anchor matches
the end of the text, after the last part of the pattern, i.e. ([\/\w \.-]*)*\/?$/, which we will come to later. So
far, our broken-down Regex looks like this: /^$/. The rest of our Regex will sit between the ^ and the $.
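
To see what the anchors change, here is a minimal sketch comparing an unanchored pattern with an anchored one. The patterns and sample strings are illustrative only and are not part of the URL snippet.

```javascript
// Without anchors, the pattern may match anywhere inside the string.
const unanchored = /http/;

// With ^ and $, the whole string must be exactly "http".
const anchored = /^http$/;

console.log(unanchored.test("see http link")); // true
console.log(anchored.test("see http link"));   // false
console.log(anchored.test("http"));            // true
```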
Quantifiers

Quantifiers specify how many times the preceding character or group may repeat, and they are as follows:

* - 0 or more
+ - 1 or more
? - 0 or 1
{n} - Exact Amount
{min,max} - Range of Amounts (written without a space after the comma)
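
The list above can be tried out with a short sketch. It applies each quantifier to the letter a and reports which sample strings are accepted. The samples array and the matchesFor helper are hypothetical names used only for this demonstration.

```javascript
// Strings made of zero to four "a" characters.
const samples = ["", "a", "aa", "aaa", "aaaa"];

// Return the samples that a fully anchored pattern accepts.
function matchesFor(pattern) {
  return samples.filter((s) => pattern.test(s));
}

console.log(matchesFor(/^a*$/));     // all five samples (0 or more)
console.log(matchesFor(/^a+$/));     // every sample except the empty string (1 or more)
console.log(matchesFor(/^a?$/));     // the empty string and "a" (0 or 1)
console.log(matchesFor(/^a{2}$/));   // only "aa" (exactly 2)
console.log(matchesFor(/^a{2,3}$/)); // "aa" and "aaa" (between 2 and 3)

// A space inside the braces stops JavaScript from reading them as a
// quantifier, so this pattern no longer means "2 to 3 times".
console.log(/^a{2, 3}$/.test("aa")); // false
```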

In the first () of our Regex breakdown, we want to validate that a URL's protocol begins with https:// or
http://. To accomplish this, we place the question mark quantifier after the s, /^(https?)$/, so that the pattern
matches with no s or with exactly one.
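
Here is a quick sketch of that optional s in action, using the pattern from the paragraph above. The variable name and the sample strings are assumptions made for the example.

```javascript
// The ? applies only to the character right before it, the "s".
const protocolName = /^(https?)$/;

console.log(protocolName.test("http"));   // true  (no "s")
console.log(protocolName.test("https"));  // true  (one "s")
console.log(protocolName.test("httpss")); // false (two "s" characters)
console.log(protocolName.test("htt"));    // false (the "p" is still required)
```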
Grouping Constructs

We haven't discussed this yet, but as you may have guessed, we can group patterns together to further break down our
URL string. We do this with parentheses (), so let's split our code snippet into groups in the same way we would
break down a URL string: (https://)(www.somename.co.us.)(com)(/stuff). This gives us /^(https?://)?()()()$/. Note:
Just as the ? after the s made the s optional, the ? placed after the grouped pattern (https?://)? makes everything
inside the parentheses optional.
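
Groups also capture what they matched, which makes the optional protocol group easy to inspect. The sketch below is only an illustration: the forward slashes are written as \/ so that the literal is valid JavaScript, and the sample URLs are invented.

```javascript
// The whole protocol group is optional because of the trailing ?.
const protocolGroup = /^(https?:\/\/)?/;

const withProtocol = "https://example.com".match(protocolGroup);
const withoutProtocol = "example.com".match(protocolGroup);

// Index 1 of the result holds the text captured by the first group.
console.log(withProtocol[1]);    // "https://"

// The group did not take part in the match, so nothing was captured.
console.log(withoutProtocol[1]); // undefined
```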
Bracket Expressions

Similar to how we can group our pattern in parentheses, we can also use bracket expressions to specify a set or range
of characters that we want to match. For example, [a-zA-Z] matches any upper- or lower-case letter. Let's add some
bracket expressions to our code snippet: /^(https?://)?([a-z.-]+).([a-z.]{2,6})([/.-]*)*/?$/. Let's briefly discuss
what each group is doing while thinking about our URL breakdown.

In the first group, the URL protocol, /^(https?://)?, we validate that the pattern starts with an optional https://
or http://. Next, the domain group, ([a-z.-]+)., uses a bracket expression to match lower-case letters a-z, a period,
or a hyphen, while the + quantifier requires one or more of those characters. The period after the group is meant to
match the literal . that separates the domain from the top-level domain. Moving on to the top-level-domain group,
([a-z.]{2,6}), the bracket expression matches letters a-z or a period, and the {2,6} quantifier requires at least 2
characters but no more than 6. Finally, in the path group, ([/.-]*)*/?$, the bracket expression matches a slash,
period, or hyphen 0 or more times, the /? allows an optional trailing forward-slash, and the dollar sign ($) anchor
requires the pattern to end there.
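
The domain and top-level-domain bracket expressions can be checked on their own with a short sketch. Each one is anchored here so that it is tested in isolation, and the range quantifier is written without spaces. The variable names and sample strings are hypothetical.

```javascript
// One or more lower-case letters, periods, or hyphens.
const domainChars = /^[a-z.-]+$/;

// Between 2 and 6 lower-case letters or periods.
const topLevelDomain = /^[a-z.]{2,6}$/;

console.log(domainChars.test("my-site.example")); // true
console.log(domainChars.test("My_Site"));         // false (upper case and underscore)

console.log(topLevelDomain.test("com"));     // true
console.log(topLevelDomain.test("co.uk"));   // true
console.log(topLevelDomain.test("c"));       // false (too short)
console.log(topLevelDomain.test("website")); // false (7 characters)
```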
Character Classes

With our breakdown snippet starting to look somewhat similar to our URL validation code snippet, let's discuss
another useful tool in Regex called Character Classes. A Character Class defines a set of characters, any one of which
can occur in an input string to fulfill a match. We've had some exposure to character classes when we discussed bracket
expressions, so now let's show some common Character Classes:

. - matches any character except the newline character (\n).
\d - matches a digit (equal to [0-9])
\w - matches any word character (equal to [a-zA-Z0-9_])
\s - matches a single whitespace character, including tabs and line breaks

Note: Each of the last three character classes can be inverted by capitalizing the letter. For example, \D matches a
non-digit character.
Now we update our snippet breakdown with character classes: /^(https?://)?([\da-z.-]+).([a-z.]{2,6})([/\w .-]*)*/?$/
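
Each character class listed above can be exercised with a one-line test. This is only a sketch with made-up sample strings, and every pattern is anchored so that the whole string must fit the class.

```javascript
// \d matches a single digit.
console.log(/^\d$/.test("7")); // true
console.log(/^\d$/.test("x")); // false

// \w matches letters, digits, and the underscore, but not a hyphen.
console.log(/^\w+$/.test("user_42")); // true
console.log(/^\w+$/.test("user-42")); // false

// \s matches whitespace such as a tab.
console.log(/^\s$/.test("\t")); // true

// The capitalized form inverts the class: \D is any non-digit.
console.log(/^\D$/.test("x")); // true

// The period matches any single character except a newline.
console.log(/^.$/.test("\n")); // false
```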
Character Escapes

By now you have probably noticed some things that look contradictory in our breakdown snippet, like the (/) or the
(.). How do we know whether a (.) matches any character or just a literal period? In Regex, we escape a special
character with a back-slash (\), so to match a literal (.) we put the back-slash before it (\.). The same applies to
(/), which marks the end of a Regex written in literal notation, so it becomes (\/). Now that we are familiar with
escaping special characters, let's escape them in our snippet to conclude our URL validation code snippet:
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/.
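
To tie everything together, here is a minimal sketch that first shows the effect of escaping a period and then wraps the pattern from the Summary in a helper. The isValidUrl function name and the sample URLs are assumptions made for this example, not part of the original snippet.

```javascript
// An unescaped period matches any character; an escaped one matches only ".".
console.log(/^a.c$/.test("abc"));  // true
console.log(/^a\.c$/.test("abc")); // false
console.log(/^a\.c$/.test("a.c")); // true

// The URL pattern as given in the Summary.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical helper: returns true when the whole string fits the pattern.
function isValidUrl(candidate) {
  return urlPattern.test(candidate);
}

console.log(isValidUrl("https://www.example.com"));         // true
console.log(isValidUrl("http://example.com/path/to/page")); // true
console.log(isValidUrl("example.org/"));                    // true
console.log(isValidUrl("htp://example.com"));               // false
console.log(isValidUrl("https://example"));                 // false
```

Keep in mind that this pattern only checks the overall shape of a URL. It does not confirm that the address exists, and because the path group nests one quantifier inside another, very long inputs that do not match can be slow to reject.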
Author

Document was created by Demetri Dillard, a Jr. Developer and graduate of Trilogy Schools (University of Minnesota).
Check out my GitHub!