Last active October 1, 2021 18:24
url tutorial

URL Search Tutorial

In this regex tutorial, I'll cover how to use a regular expression to search for a URL. I'll break the expression down and explain what each part does.


/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

This is a regex used to match a URL. It may look like a meaningless run of characters, but it is in fact a search pattern that describes what a URL looks like.
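To see the pattern in action, here is a minimal sketch that runs it against a few strings. The sample strings and the name urlPattern are made up for illustration; they are not part of the original tutorial.

```javascript
// The URL regex from this tutorial, stored as a JavaScript regex literal.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical sample inputs, chosen only to illustrate matches and misses.
const samples = [
  "https://example.com",
  "http://example.org/docs/index.html",
  "example.co",
  "not a url",
  "https://example.com/search?q=regex",
];

for (const s of samples) {
  // test() returns true when the whole string fits the pattern.
  console.log(s, "->", urlPattern.test(s));
}
```

The first three samples match. The last two do not: "not a url" has no period after the first word, and the query string example contains ? and =, which none of the bracket expressions allow.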

Regex Components


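Anchors

^ and $ are both anchors. They define where the matched string has to start and end: ^ signifies the start of the string and $ signifies the end. Because both anchors are present, the whole string has to fit the pattern, not just a part of it. What sits between the anchors can be one of two things: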

  1. Exact string match
  2. Range of possible matches

In this example, (https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/? is what is contained between our anchors.
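The effect of the anchors is easiest to see by comparing the pattern with and without them. This is a small sketch; the sample sentence and the names anchored and unanchored are assumptions for illustration.

```javascript
// Same pattern twice: once with ^ and $, once without.
const anchored = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;
const unanchored = /(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?/;

// Hypothetical text with a URL embedded in it.
const text = "See (https://example.com)";

// With anchors, the entire string must be a URL, so this fails.
console.log(anchored.test(text)); // false

// Without anchors, the regex can find a URL anywhere in the string.
console.log(unanchored.test(text)); // true
console.log(text.match(unanchored)[0]); // "https://example.com"
```

So the anchored form suits validating a single value, while searching a larger block of text for a URL needs the unanchored form.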


Quantifiers

Quantifiers set limits on how many times a part of the pattern may repeat. In our example, {2,6} means that section of the string must be between 2 and 6 characters long. This is the part of the URL that holds the .com, .org, or .co ending. The regex also uses the quantifiers ? (zero or one), + (one or more), and * (zero or more).
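As a rough illustration, the quantifiers can be tried out on their own. The patterns below are isolated pieces of the URL regex wrapped in anchors, and the names tld and scheme are hypothetical.

```javascript
// The {2,6} quantifier in isolation: 2 to 6 lowercase letters or periods.
const tld = /^[a-z\.]{2,6}$/;

console.log(tld.test("c"));       // false, only 1 character
console.log(tld.test("co"));      // true, 2 characters
console.log(tld.test("museum"));  // true, 6 characters
console.log(tld.test("website")); // false, 7 characters

// The ? quantifier makes the preceding character optional.
const scheme = /^https?$/;

console.log(scheme.test("http"));   // true, the s is absent
console.log(scheme.test("https"));  // true, the s appears once
console.log(scheme.test("httpss")); // false, the s appears twice
```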

Grouping Constructs

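The parentheses () are grouping constructs, used to separate a regex into groups. In our example we have four groups:

  1. (https?:\/\/)
  2. ([\da-z\.-]+)
  3. ([a-z\.]{2,6})
  4. ([\/\w \.-]*)

Outside the groups sit ?, \. and *\/?, which tie the groups together into the pattern for a URL: the first group is optional, a literal period separates the second and third groups, and the fourth group can repeat before an optional trailing slash.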
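Each group also captures the text it matched, which makes it possible to pull a URL apart. The following sketch assumes a made-up sample URL and uses String.prototype.match to read the captures.

```javascript
const groupedPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical URL used only to show what each group captures.
const parts = "https://example.com/docs/page".match(groupedPattern);

// Index 0 is the whole match; indexes 1 to 4 are the four groups.
console.log(parts[1]); // "https://"   (protocol)
console.log(parts[2]); // "example"    (domain name)
console.log(parts[3]); // "com"        (top-level domain)
console.log(parts[4]); // "/docs/page" (path)
```

If the protocol is left out of the input, the first group did not take part in the match and parts[1] is undefined.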

Bracket Expressions

The square brackets [] form a bracket expression. Whatever is inside them represents a set or range of characters, any one of which may match. Our URL regex uses one like this:

[\da-z\.-]

This means that part of the URL can contain any digit, any lowercase letter, a period, or a hyphen.
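A short sketch of a bracket expression on its own, using the domain-name set from the URL regex. The name domainChars and the sample strings are assumptions for illustration.

```javascript
// One or more characters, each of which must be a digit, a lowercase
// letter, a period, or a hyphen.
const domainChars = /^[\da-z\.-]+$/;

console.log(domainChars.test("my-site.example1")); // true
console.log(domainChars.test("My-Site"));          // false, uppercase letters
console.log(domainChars.test("my_site"));          // false, underscore not in the set
```

The hyphen sits last inside the brackets so that it is read as a literal hyphen rather than as part of a range like a-z.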

Character Classes

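A character class in a regex defines a set of characters, any one of which can occur in an input string to fulfill a match. In our URL example, these include:
\d which matches any Arabic numeral digit (this is equivalent to the bracket expression [0-9])
\w which matches any letter, digit, or underscore (this is equivalent to the bracket expression [A-Za-z0-9_])
. which on its own matches any character except line terminators such as the newline character \n; in our regex every period is either escaped or inside a bracket expression, so it only ever matches a literal period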
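The classes can be checked one at a time. This is a minimal sketch with made-up inputs; the names digits, wordChars, and anyChar are hypothetical.

```javascript
// \d matches digits, the same as [0-9].
const digits = /^\d+$/;
console.log(digits.test("2021")); // true
console.log(digits.test("20x1")); // false

// \w matches letters, digits, and the underscore.
const wordChars = /^\w+$/;
console.log(wordChars.test("snake_case9")); // true
console.log(wordChars.test("kebab-case"));  // false, hyphen is not a word character

// An unescaped . matches any single character except a line terminator.
const anyChar = /^a.c$/;
console.log(anyChar.test("abc"));  // true
console.log(anyChar.test("a\nc")); // false, newline is excluded
```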

Character Escapes

The backslash \ is a character escape. It changes how the character after it is interpreted: a special character is taken literally, and a letter such as d or w turns into a character class. For example, this URL regex puts \ in front of periods and forward slashes so that they match a literal . and / instead of being treated as regex syntax.
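The difference an escape makes shows up when the same pattern is written with and without the backslash. The sample strings and the names unescapedDot and escapedDot are assumptions for illustration.

```javascript
// Unescaped: the . is a wildcard and matches any character.
const unescapedDot = /^example.com$/;
// Escaped: the \. matches only a literal period.
const escapedDot = /^example\.com$/;

console.log(unescapedDot.test("example.com")); // true
console.log(unescapedDot.test("exampleXcom")); // true, X satisfies the wildcard
console.log(escapedDot.test("example.com"));   // true
console.log(escapedDot.test("exampleXcom"));   // false, X is not a period
```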


Abby Castelow
