@LunaRossie
Last active September 21, 2022 05:38
# Regex Tutorial
Regex, or regular expressions, are patterns used to match character combinations in strings. In this tutorial, we will take a brief look at what regular expressions are and how they work. While regular expressions may seem overwhelming at first glance,
like any language they can be broken down into their simplest parts and easily understood. A regular expression is a sequence of characters that defines a specific search pattern. Regular expressions are often used to validate input data.
## Summary
Regex (short for regular expression) is a string of text that lets you create search patterns to locate, match, and manage text. Here is an example:
```
/[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4}/
```
* A regular expression used to match an e-mail address
Regular expressions can also be used from the command line (for example, with `grep`) and within text editors. The sections below break a regex into its components and explain how each one works.
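
To try the e-mail pattern out, here is a minimal sketch in Python's standard `re` module. Python is just my choice for the examples in this tutorial; regex syntax is shared by many languages. Python patterns are plain (raw) strings, so the JavaScript-style `/.../` delimiters are dropped. The sketch spells the letter range as `[a-zA-Z]`, and its last two lines show why a range written `A-z` would be a bug: it also covers the ASCII punctuation between `Z` and `a`, such as `_`. The helper name `find_emails` and the sample text are hypothetical.

```python
import re

# The summary's e-mail pattern as a Python raw string (no /.../ delimiters).
EMAIL_RE = re.compile(r"[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4}")

def find_emails(text):
    """Return every substring of `text` that matches the e-mail pattern."""
    return EMAIL_RE.findall(text)

print(find_emails("Contact jane.doe@example.com or bob+news@mail.co.uk"))
# ['jane.doe@example.com', 'bob+news@mail.co.uk']

# A-z spans ASCII 0x41-0x7A, which includes [ \ ] ^ _ and ` as well as letters.
print(re.fullmatch(r"[A-z]", "_") is not None)     # True
print(re.fullmatch(r"[a-zA-Z]", "_") is not None)  # False
```

Note that `[\w.-]+` is greedy: for `mail.co.uk` it first grabs the whole domain, then backs off just far enough to leave `.uk` for `\.[a-zA-Z]{2,4}`.
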
## Table of Contents
- [Anchors](#anchors)
- [Quantifiers](#quantifiers)
- [Character Classes](#character-classes)
- [Grouping and Capturing](#grouping-and-capturing)
- [Greedy and Lazy Match](#greedy-and-lazy-match)
- [Bracket Expressions](#bracket-expressions)
- [Boundaries](#boundaries)
## Regex Components
### Anchors
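Anchors are special characters within a regular expression that match a position (the start or end of a string) rather than a character, so you can match strings that begin with, end with, or consist exactly of certain text.
Examples of Anchors are as follows:
* `^` - matches the start of a string, so the pattern matches only strings that start with the text that follows `^`
* `$` - matches the end of a string, so the pattern matches only strings that end with the text that precedes `$`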
* Examples:
```
^Hello matches any string starting with `Hello`
World$ matches any string ending with `World`
^Hello World$ matches only the exact string `Hello World`
goodbye matches any string that has the exact text `goodbye` in it
```
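
A short Python sketch that runs the anchor examples above against a few made-up sample strings. `re.search` looks for the pattern anywhere in the string, so only the anchors restrict where a match may occur. The `matching` helper and the sample strings are hypothetical.

```python
import re

SAMPLES = ["Hello there", "Say Hello", "Hello World", "Brave New World", "goodbye for now"]

def matching(pattern, samples=SAMPLES):
    """Return the samples in which `pattern` is found (re.search scans the whole string)."""
    return [s for s in samples if re.search(pattern, s)]

print(matching(r"^Hello"))         # ['Hello there', 'Hello World']
print(matching(r"World$"))         # ['Hello World', 'Brave New World']
print(matching(r"^Hello World$"))  # ['Hello World']
print(matching(r"goodbye"))        # ['goodbye for now']
```

One Python-specific detail: `$` also matches just before a trailing newline, so `re.search(r"World$", "Hello World\n")` succeeds, while `\Z` matches only at the very end of the string.
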
### Quantifiers
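Quantifiers tell the regex engine how many times to match the character or expression to their left. These are the quantifiers used in regex:
* `?` - zero or one time
* `+` - one or more times
* `*` - zero or more times
* `{n}` - exactly `n` times
* `{n,}` - `n` or more times
* `{n,m}` - between `n` and `m` times (inclusive)

In a URL-matching regex, they are used in the following places:
* `https?` matches `https` or `http`
* `[\da-z\.-]+` matches one or more characters, each a digit, a lowercase letter (a-z), a dot (.) or a hyphen (-)
* `[a-z\.]{2,6}` matches 2 to 6 characters, each a lowercase letter or a dot
* `[\/\w \.-]*` matches zero or more characters, each a slash, a word character, a space, a dot or a hyphen, e.g. `/`, `.`, `-`, `www` or `//`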
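
The following Python sketch checks each quantifier from the lists above. It uses `re.fullmatch`, which succeeds only when the entire string matches, so the quantity limits are easy to see. `fully_matches` and the test strings are hypothetical examples.

```python
import re

def fully_matches(pattern, text):
    """True only if the *whole* of `text` matches `pattern` (re.fullmatch)."""
    return re.fullmatch(pattern, text) is not None

# ?     : zero or one of the preceding 's'
print(fully_matches(r"https?", "http"), fully_matches(r"https?", "https"))  # True True
# +     : one or more, so an empty string fails
print(fully_matches(r"[\da-z\.-]+", "ab.c-7"), fully_matches(r"[\da-z\.-]+", ""))  # True False
# {2,6} : between 2 and 6 characters
print(fully_matches(r"[a-z\.]{2,6}", "co.uk"), fully_matches(r"[a-z\.]{2,6}", "c"))  # True False
# *     : zero or more, so an empty string succeeds
print(fully_matches(r"[\/\w \.-]*", ""), fully_matches(r"[\/\w \.-]*", "/docs/page-1.html"))  # True True
# {n}   : exactly n
print(fully_matches(r"\d{3}", "123"), fully_matches(r"\d{3}", "1234"))  # True False
# {n,}  : n or more
print(fully_matches(r"\d{2,}", "1"), fully_matches(r"\d{2,}", "12345"))  # False True
```
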
### Character Classes
Character classes (character sets) tell the regex engine to match exactly one character out of a specific group, such as digits, word characters, or whitespace.
Examples of Character Classes are as follows:
* `\d` - matches a single digit character
* `\w` - matches a word character (any alphanumeric character plus underscore)
* `\s` - matches a whitespace character (including tabs and line breaks)
* `.` - matches any character except a line break (by default)
* the uppercase form of `\d`, `\w` or `\s` (`\D`, `\W`, `\S`) inverts the match
* Examples:
```
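\d matches a single digit 0-9
\w matches a single letter (a-z, A-Z), digit (0-9) or underscore
\s matches a single whitespace character, such as ` `, a tab or a line break
. matches any single character except a line break
\D matches a single non-digit character
\W matches a single character that is not a letter, digit or underscore
\S matches a single non-whitespace character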
```
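
Here is a small Python sketch that applies each class to one made-up string containing a letter, a digit, an underscore, a space, a `!` and a newline, so the inverted classes are easy to compare. One Python-specific assumption to keep in mind: for `str` patterns, `\d`, `\w` and `\s` are Unicode-aware, so `\w` also matches letters such as `é`; passing `re.ASCII` restricts them to ASCII.

```python
import re

text = "a1_ B!\n"  # letter, digit, underscore, space, letter, '!', newline

for pattern in [r"\d", r"\w", r"\s", r"\D", r"\W", r"\S", r"."]:
    print(pattern, re.findall(pattern, text))
# \d ['1']
# \w ['a', '1', '_', 'B']
# \s [' ', '\n']
# \D ['a', '_', ' ', 'B', '!', '\n']
# \W [' ', '!', '\n']
# \S ['a', '1', '_', 'B', '!']
# . ['a', '1', '_', ' ', 'B', '!']

# With re.DOTALL, . also matches the line break.
print(re.findall(r".", text, re.DOTALL)[-1] == "\n")  # True
```
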
### Grouping and Capturing
Grouping lets you extract the characters matched by one part of a pattern, for example to validate that part on its own. The text between parentheses is a group. In a URL-matching regex, the groups match text such as:
* `(https?:\/\/)` matches `https://` or `http://`
* `([\da-z\.-]+)` matches `ab.c-7` or `ab`
* `([a-z\.]{2,6})` matches `ab.` or `.ca`
* `([\/\w \.-]*)` matches `/` or `/ab.`
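
The tutorial does not spell out the full URL regex, so the pattern below is an assumption: it chains the four groups above between `^` and `$`, with a literal `\.` before the domain ending and a `?` that makes the scheme group optional. This minimal Python sketch shows what each group captures: `m.groups()` returns the text captured by every group, or `None` for a group that did not take part in the match. `URL_RE` and the sample URLs are hypothetical.

```python
import re

# Assumed URL pattern assembled from the four groups; the scheme group is optional.
URL_RE = re.compile(r"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)$")

for url in ["https://example.com/docs/page", "example.org/about", "not a url"]:
    m = URL_RE.match(url)
    print(url, "->", m.groups() if m else "no match")
# https://example.com/docs/page -> ('https://', 'example', 'com', '/docs/page')
# example.org/about -> (None, 'example', 'org', '/about')
# not a url -> no match
```

Because `[\da-z\.-]+` is greedy, a host like `www.example.co.uk` splits as `www.example.co` and `uk`: the engine backs off only as far as it must for `\.` and the last group to match.
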
### Greedy and Lazy Match
Greedy quantifiers match as much of the text as possible, while lazy quantifiers match as little as possible and expand only as needed.
Examples of Greedy and/or Lazy Matching are as follows:
* `*`, `+`, `?` and `{}` - these quantifiers are greedy by default; adding `?` after one of them (`*?`, `+?`, `??`, `{n,m}?`) makes it lazy
* Examples:
```
<.+?> lazily matches one or more characters between `<` and `>`, expanding only as far as needed to reach the first `>`.
<[^<>]+> matches one or more characters other than `<` or `>` between `<` and `>`, which avoids the need for a lazy quantifier.
```
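
To see the difference, this small Python sketch runs a greedy pattern, the lazy pattern and the negated-class pattern on a made-up HTML snippet. `re.findall` returns every non-overlapping match from left to right.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the end of the string, then backs off to the LAST '>'.
print(re.findall(r"<.+>", html))      # ['<b>bold</b> and <i>italic</i>']
# Lazy: .+? stops at the FIRST '>' that lets the match succeed.
print(re.findall(r"<.+?>", html))     # ['<b>', '</b>', '<i>', '</i>']
# Negated class: [^<>]+ can never run past a '>' in the first place.
print(re.findall(r"<[^<>]+>", html))  # ['<b>', '</b>', '<i>', '</i>']
```

The negated-class version gives the same result as the lazy one here, but it never needs to backtrack, which is why it is often preferred.
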
### Bracket Expressions
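Bracket expressions are written between square brackets `[]` and match any one of the characters listed inside them. A range such as `a-z` matches any character from `a` to `z`, and a `^` right after the opening bracket negates the set. In a URL-matching regex, we have the following bracket expressions:
* `[\da-z\.-]` - a digit, a lowercase letter, a dot or a hyphen
* `[a-z\.]` - a lowercase letter or a dot
* `[\/\w \.-]` - a slash, a word character, a space, a dot or a hyphen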
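
A quick Python sketch showing that each bracket expression consumes exactly one character per match; `re.findall` lists every single character that falls in the set. The last line also shows negation: a `^` placed right after the opening bracket matches any character not in the set. The input strings are hypothetical.

```python
import re

# Each bracket expression matches exactly ONE character from its set.
print(re.findall(r"[\da-z\.-]", "A-1.b"))  # ['-', '1', '.', 'b']  ('A' is uppercase)
print(re.findall(r"[a-z\.]", "co.UK"))     # ['c', 'o', '.']
print(re.findall(r"[\/\w \.-]", "/a b~"))  # ['/', 'a', ' ', 'b']  ('~' is not in the set)
# A ^ right after '[' negates the set: anything that is NOT a lowercase letter.
print(re.findall(r"[^a-z]", "a1b2"))       # ['1', '2']
```
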
### Boundaries
Boundaries are not characters; simply put, they are the positions between characters. Think of a boundary as a wall between two adjacent characters (or at the very start or end of a string).
There are two types of boundaries, **Word** and **Non-Word**, each denoted by its own escape sequence.
Examples of Boundaries are as follows:
* `\b` - a word boundary: a position where a word starts or ends, that is, between a word character and a non-word character, or at the start or end of the string when the first or last character is a word character.
* `\B` - a non-word boundary, the exact opposite (negation) of `\b`: it matches **any position a word boundary doesn't**, namely between two word characters or between two non-word characters.
* Examples:
```
`Hello World` has 12 positions in total: 4 word boundaries and 8 non-word boundaries, as seen below:
|H|e|l|l|o| |W|o|r|l|d|
^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
W N N N N W W N N N N W - W = Word Boundary (\b), N = Non-Word Boundary (\B)
\bxyz\b matches `xyz` as a whole word only
\Bxyz\B matches `xyz` only if it is surrounded by word characters on both sides; for example, it matches inside `txyzt`, because the positions on both sides of `xyz` are non-word boundaries
```
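
A quick Python check that lists the word-boundary and non-word-boundary positions in `Hello World` and tries the two whole-word examples above. Because `\b` and `\B` match positions rather than characters, every match is empty, and `re.finditer` reports where each one starts. The test sentences are hypothetical.

```python
import re

s = "Hello World"
# \b and \B match empty positions, so finditer reports where each one occurs.
word = [m.start() for m in re.finditer(r"\b", s)]
non_word = [m.start() for m in re.finditer(r"\B", s)]
print(word)      # [0, 5, 6, 11]
print(non_word)  # [1, 2, 3, 4, 7, 8, 9, 10]

# Whole-word search versus "inside a word" search.
print(bool(re.search(r"\bxyz\b", "find xyz here")))  # True
print(bool(re.search(r"\bxyz\b", "txyzt")))          # False
print(bool(re.search(r"\Bxyz\B", "txyzt")))          # True
print(bool(re.search(r"\Bxyz\B", "find xyz here")))  # False
```
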
## Author
Feel free to check out my GitHub profile:
[GitHub Profile](https://github.com/LunaRossie)