@Bodheim
Last active October 1, 2021 18:24
URL Search Tutorial

In this regex tutorial, I'll cover how to search a page for a URL. I'll break the expression down and explain what each part does.

Summary

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

This is a regex used to match a URL. This set of characters may look like it means nothing, but it is in fact a search pattern that finds a URL.
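
As a rough illustration of how the pattern might be used, here is a minimal Python sketch. It assumes the pattern is written as a Python raw string without the surrounding / delimiters; the helper name looks_like_url and the sample inputs are hypothetical and not part of the original tutorial.

```python
import re

# The tutorial's pattern as a Python raw string (no surrounding / delimiters).
URL_PATTERN = re.compile(
    r"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$"
)


def looks_like_url(text):
    """Return True when the whole string fits the tutorial's URL pattern."""
    return URL_PATTERN.match(text) is not None


# Hypothetical sample inputs, chosen only for illustration.
for sample in ["https://github.com/Bodheim", "example.org", "not a url"]:
    print(sample, "->", looks_like_url(sample))
```

The first two samples fit the pattern and the third does not, because a space cannot appear in the host part.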

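Table of Contents

  • Summary
  • Anchors
  • Quantifiers
  • Grouping Constructs
  • Bracket Expressions
  • Character Classes
  • Character Escapes
  • Author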

Regex Components

Anchors

^ and $ are both anchors. They pin the match to the start and end of the string: ^ signifies the start of the string and $ signifies the end. What sits between the anchors can be one of two things:

  1. Exact string match
  2. Range of possible matches

In this example, (https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/? is what is contained between our anchors, so the whole string has to be a URL for the regex to match.
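
To see what the anchors change, the sketch below compiles the same body with and without ^ and $ in Python. The variable names and the sample text are assumptions made for illustration only.

```python
import re

# Body of the tutorial's pattern, without the anchors.
BODY = r"(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?"

unanchored = re.compile(BODY)
anchored = re.compile("^" + BODY + "$")

text = "visit example.com now!"  # hypothetical input

# Without anchors, search() can match a substring in the middle of the text.
found_inside = unanchored.search(text) is not None

# With ^ and $, the entire string has to fit the pattern.
whole_text_fits = anchored.search(text) is not None
bare_domain_fits = anchored.search("example.com") is not None

print(found_inside, whole_text_fits, bare_domain_fits)
```

Only the unanchored version finds something in the longer sentence, while the anchored version accepts the bare domain on its own.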

Quantifiers

Quantifiers set how many times the preceding part of a regex may repeat. In our example, {2,6} means the bracket expression before it must match between 2 and 6 characters. This is the .com, .org, or .co part of the URL. The ?, + and * in our regex are quantifiers too: ? means zero or one, + means one or more, and * means zero or more.
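
A small Python sketch of the {2,6} quantifier in isolation may help. The candidate strings are made up for illustration, and fullmatch is used here so that each candidate must fit the piece entirely.

```python
import re

# The third group's contents on their own: 2 to 6 lowercase letters or periods.
TLD = re.compile(r"[a-z\.]{2,6}")

# Hypothetical candidates with lengths 1, 2, 3, 6 and 7.
candidates = ["c", "co", "com", "museum", "toolong"]
results = {c: TLD.fullmatch(c) is not None for c in candidates}

for candidate, ok in results.items():
    print(candidate, len(candidate), ok)

# The ? quantifier makes the preceding character optional.
http_ok = re.fullmatch(r"https?", "http") is not None
https_ok = re.fullmatch(r"https?", "https") is not None
print(http_ok, https_ok)
```

Only the candidates that are 2 to 6 characters long pass, and both http and https satisfy https? because the s is optional.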

Grouping Constructs

These () are grouping constructs, used to separate a regex into groups. In our example we have four groups.

(https?:\/\/)
([\da-z\.-]+)
([a-z\.]{2,6})
([\/\w \.-]*)

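The characters around the groups shape the URL pattern: the ? after the first group makes the http:// or https:// prefix optional, the \. between the second and third groups matches the literal period before the .com part, and the * after the fourth group lets the path portion repeat.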
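
The groups can be read back after a match. The Python sketch below inspects only the first three groups; the sample URLs and variable names are hypothetical. Because the second group is greedy, it takes everything up to the last period that still leaves a valid ending.

```python
import re

URL_PATTERN = re.compile(
    r"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$"
)

# Hypothetical URL used only to show what each group captures.
match = URL_PATTERN.match("https://www.example.com/docs")
scheme, host, ending = match.group(1), match.group(2), match.group(3)
print(scheme)  # https://
print(host)    # www.example
print(ending)  # com

# The first group is optional, so it is None when the prefix is missing.
bare = URL_PATTERN.match("example.com")
print(bare.group(1), bare.group(2), bare.group(3))
```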

Bracket Expressions

These [] are square brackets. A bracket expression matches any single character from the set or range listed inside it. Our URL regex uses one exactly like this.

[a-z\.]

This means that this part of the URL can contain any lowercase letter from a to z, or a period.
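
As a quick check of which characters [a-z\.] accepts, this Python sketch tests each character of a made-up string one at a time. The string and the variable names are assumptions for illustration.

```python
import re

LOWER_OR_DOT = re.compile(r"[a-z\.]")

# Hypothetical input mixing lowercase letters, a period, capitals and a digit.
sample = "co.UK9"
accepted = [ch for ch in sample if LOWER_OR_DOT.fullmatch(ch)]
print(accepted)  # ['c', 'o', '.']
```

The capital letters and the digit are rejected because they fall outside the a-z range and are not a period.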

Character Classes

A character class in a regex defines a set of characters, any one of which can occur in an input string to fulfill a match. In our URL example, these include:
\d which matches any Arabic numeral digit (this is equivalent to the bracket expression [0-9])
\w which matches any letter, digit, or underscore
An unescaped . is also a character class that matches any character except the newline character \n, but every period in our regex is escaped as \. so it only matches a literal period.
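
The shorthand classes can be tried out directly. The sketch below is in Python, where \d also matches non-ASCII digits by default, so the re.ASCII flag is passed to keep it equivalent to [0-9]. The sample text is hypothetical.

```python
import re

text = "host42.example"  # hypothetical input

# With re.ASCII, \d and [0-9] pick out the same characters.
digits_shorthand = re.findall(r"\d", text, flags=re.ASCII)
digits_bracket = re.findall(r"[0-9]", text)
print(digits_shorthand, digits_bracket)

# \w matches letters, digits and underscores, so the period splits the runs.
words = re.findall(r"\w+", text)
print(words)  # ['host42', 'example']
```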

Character Escapes

The \ in a regex is a character escape: it changes how the character following it is interpreted. For example, this URL regex puts a \ in front of each period and forward slash so that they are matched as literal characters instead of being treated as special regex syntax.
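
The difference an escape makes can be seen by comparing an escaped and an unescaped period. This is a minimal Python sketch with made-up strings.

```python
import re

# Escaped period: only a literal "." is accepted between the two words.
escaped_literal = re.fullmatch(r"example\.com", "example.com") is not None
escaped_other = re.fullmatch(r"example\.com", "exampleXcom") is not None

# Unescaped period: any single character is accepted in that position.
unescaped_other = re.fullmatch(r"example.com", "exampleXcom") is not None

print(escaped_literal, escaped_other, unescaped_other)  # True False True
```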

Author

Abby Castelow
https://github.com/Bodheim
