@Brondchux
Last active September 7, 2021 03:41
Regex for Beginners

Regex for Beginners 101

Do characters like (^"<?>%/-#/$) confuse you? If so, this gist was written for you. We will cover what a regular expression (regex) is, starting from the basics. Excited to learn and understand regex? Let's get started!

Summary

A regular expression (shortened as regexp; also referred to as rational expression or regex) is a sequence of characters that specifies a search pattern. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.

Regex Components

Anchors

Anchors assert that the engine's current position in the string matches a well-determined location: for instance, the beginning of the string, or the end of a line. See the following example:

let str = "Gospel";
console.log(/^G/.test(str));

Output

true;

The pattern /^G/ matches any text that starts with the letter G. Because "Gospel" starts with G, the test returns true.
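
To make the effect of an anchor easier to see, here is a minimal sketch that compares anchored and unanchored patterns on the same string. The $ anchor (end of the string) and the variable names are additions for illustration and do not appear in the example above.

```javascript
// Same sample string as in the example above.
const name = "Gospel";

// ^ anchors the pattern to the start of the string.
const startsWithG = /^G/.test(name);
// $ anchors the pattern to the end of the string.
const endsWithL = /l$/.test(name);
// Without an anchor, the pattern may match anywhere in the string.
const containsS = /s/.test(name);
// With ^, the same letter fails because "Gospel" does not start with "s".
const startsWithS = /^s/.test(name);

console.log(startsWithG, endsWithL, containsS, startsWithS);
// true true true false
```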


Quantifiers

Quantifiers match a number of instances of a character, group, or character class in a string.

Exact count {n}. A number in curly braces, {n}, is the simplest quantifier. When you append it to a character or character class, it specifies exactly how many times that character or character class must repeat.

For example, the regular expression /\d{4}/ matches four consecutive digits. It is the same as /\d\d\d\d/:

let str = "ECMAScript 2020";
let re = /\d{4}/;

let result = str.match(re);

console.log(result);

Output

["2020"];
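
As a quick check of the claim that {4} is shorthand for repeating \d four times, the sketch below runs both patterns on the same text and then tries two extra inputs. The extra sample strings and variable names are made up for illustration.

```javascript
const text = "ECMAScript 2020";

// Both patterns should find the same four digits.
const braceMatch = text.match(/\d{4}/)[0];
const repeatedMatch = text.match(/\d\d\d\d/)[0];

// On a longer run of digits, {4} stops after exactly four of them.
const firstFour = "1234567".match(/\d{4}/)[0];

// With fewer than four digits in a row there is no match at all.
const tooShort = "abc 123".match(/\d{4}/);

console.log(braceMatch, repeatedMatch, firstFour, tooShort);
// 2020 2020 1234 null
```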

Grouping Constructs

A part of a pattern can be enclosed in parentheses (...). This is called a “capturing group”.

That has two effects:

  1. It lets us get a part of the match as a separate item in the result array.
  2. If we put a quantifier after the parentheses, it applies to the parentheses as a whole.

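For example, an email address has the format name@domain. Any word can be the name, and hyphens and dots are allowed. In a regular expression that is [-.\w]+. The domain is one or more words that are each followed by a dot, then a final word: ([\w-]+\.)+[\w-]+. The parentheses let the + quantifier repeat the whole "word plus dot" unit.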

let regexp = /[-.\w]+@([\w-]+\.)+[\w-]+/g;

console.log("my@mail.com @ his@site.com.uk".match(regexp));

Output

// my@mail.com, his@site.com.uk
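
The two effects of parentheses listed above can be sketched separately. This is only an illustration: the extra parentheses around the name and the domain, the "gogogo" sample, and the variable names are additions that are not part of the original pattern. Note that the g flag is left out here, because match only reports the individual groups when it is absent.

```javascript
// Effect 1: each pair of parentheses becomes its own item in the result.
const emailPattern = /([-.\w]+)@(([\w-]+\.)+[\w-]+)/;
const result = "his@site.com.uk".match(emailPattern);

const whole = result[0];     // the full match
const namePart = result[1];  // first group: the name
const domain = result[2];    // second group: the whole domain
const lastLabel = result[3]; // a repeated group keeps only its last repetition

// Effect 2: a quantifier after parentheses repeats the whole group.
const groupRepeated = "gogogo now".match(/(go)+/)[0];
// Without parentheses, + applies to the single character "o" only.
const charRepeated = "gogogo now".match(/go+/)[0];

console.log(whole, namePart, domain, lastLabel);
// his@site.com.uk his site.com.uk com.
console.log(groupRepeated, charRepeated);
// gogogo go
```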

Bracket Expressions

Brackets indicate a set of characters to match. Any single character listed between the brackets will match, and you can use a hyphen to define a range of characters.

"elephant".match(/[abcd]/); // -> matches 'a'

You will often see ranges that cover the alphabet or all digits, such as [A-Za-z] and [0-9]. Remember that these character sets are case-sensitive unless you set the i flag.

"elephant".match(/[a-d]/); // -> matches 'a'
"elephant".match(/[A-D]/); // -> no match
"elephant".match(/[A-D]/i); // -> matches 'a'

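The examples above return only the first matching character. As a small sketch, combining a bracket expression with the g flag collects every character that belongs to the set. The "Room 42B" string and the variable names are made up for illustration.

```javascript
// Every vowel in the word, in order of appearance.
const vowels = "elephant".match(/[aeiou]/g);

// Hypothetical label used to show the digit and letter ranges.
const label = "Room 42B";
const digits = label.match(/[0-9]/g);
const letters = label.match(/[A-Za-z]/g).join("");

console.log(vowels);  // [ 'e', 'e', 'a' ]
console.log(digits);  // [ '4', '2' ]
console.log(letters); // RoomB
```
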
Character Classes

A character class allows you to match any symbol from a certain character set. A character class is also called a character set. For example, \d matches any digit from 0 to 9. Suppose that you have a phone number like this:

"+1-(408)-555-0105";

Now you can turn the phone number into a plain number by collecting every digit and joining the results:

let phone = "+1-(408)-555-0105";
let re = /\d/g;

let numbers = phone.match(re);
let phoneNo = numbers.join("");

console.log(phoneNo); // -> 14085550105
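
The same result can be reached from the opposite direction. The sketch below assumes the standard inverse class \D (any character that is not a digit) and the whitespace class \s, neither of which is introduced above, and removes the unwanted characters instead of collecting the wanted ones.

```javascript
const phone = "+1-(408)-555-0105";

// Approach from the example above: collect every digit, then join.
const collected = phone.match(/\d/g).join("");

// Alternative: delete everything that is not a digit.
const stripped = phone.replace(/\D/g, "");

// \s matches whitespace such as spaces and tabs.
const noSpaces = "a b\tc".replace(/\s/g, "");

console.log(collected, stripped, noSpaces);
// 14085550105 14085550105 abc
```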

The OR Operator

The OR operator is called alternation in regular expression terminology.

In a regular expression it is denoted with a vertical line character: |

For instance, suppose we need to find the programming languages HTML, PHP, Java, or JavaScript.

The corresponding regexp is html|php|java(script)?.

A usage example:

let regexp = /html|php|css|java(script)?/gi;

let str = "First HTML appeared, then CSS, then JavaScript";

console.log(str.match(regexp));

Output

// 'HTML', 'CSS', 'JavaScript'
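
Two details of alternation are easy to miss: it splits the whole pattern unless parentheses limit it, and the engine takes the first alternative that matches at a given position. The following sketch illustrates both, using made-up sample strings and variable names.

```javascript
// Parentheses limit the alternation to "a" or "e".
const grouped = "grey".match(/gr(a|e)y/)[0];
// Without them, the alternatives are "gra" and "ey".
const ungrouped = "grey".match(/gra|ey/)[0];

// Alternatives are tried from left to right, so "java" wins here.
const shortFirst = "JavaScript".match(/java|javascript/i)[0];
// An optional group lets the longer word match in full.
const optionalGroup = "JavaScript".match(/java(script)?/i)[0];

console.log(grouped, ungrouped);        // grey ey
console.log(shortFirst, optionalGroup); // Java JavaScript
```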

Flags

Regular expressions may have flags that affect the search. This section covers six of the flags available in JavaScript (newer versions of the language add others, such as "d" and "v"):

"i"

With this flag the search is case-insensitive: no difference between A and a.

"g"

With this flag the search looks for all matches, without it – only the first match is returned.

"m"

Multiline mode: the anchors ^ and $ match at the start and end of every line, not only at the start and end of the whole string.

"s"

Enables “dotall” mode, that allows a dot . to match newline character \n.

"u"

Enables full Unicode support. The flag enables correct processing of surrogate pairs.

"y"

“Sticky” mode: the search matches only at the exact position in the text given by the regexp's lastIndex property.
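
To see each flag in action, here is a minimal sketch that applies all six to short inputs. The sample text, the emoji code point, and the variable names are chosen only for illustration.

```javascript
// Hypothetical two-line sample text.
const text = "Cat\ncat";

// i: ignore case, so the pattern also matches the capitalized "Cat".
const firstAnyCase = text.match(/cat/i)[0];
// g: return every match instead of only the first one.
const allAnyCase = text.match(/cat/gi);
// m: ^ also matches right after a line break.
const startsSomeLine = /^cat/m.test(text);
const startsWholeString = /^cat/.test(text);
// s: the dot also matches the newline between the two words.
const dotAll = /Cat.cat/s.test(text);
const dotDefault = /Cat.cat/.test(text);
// u: a surrogate pair (here an emoji) counts as one character.
const face = "\u{1F600}";
const oneCharWithU = /^.$/u.test(face);
const oneCharWithoutU = /^.$/.test(face);
// y: match only at the position stored in lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4; // index of the lowercase "cat"
const matchAt4 = sticky.test(text);
sticky.lastIndex = 1; // no "cat" starts at index 1
const matchAt1 = sticky.test(text);

console.log(firstAnyCase, allAnyCase);          // Cat [ 'Cat', 'cat' ]
console.log(startsSomeLine, startsWholeString); // true false
console.log(dotAll, dotDefault);                // true false
console.log(oneCharWithU, oneCharWithoutU);     // true false
console.log(matchAt4, matchAt1);                // true false
```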


Character Escapes

Let’s say we want to find literally a dot. Not “any character”, but just a dot. To use a special character as a regular one, prepend it with a backslash: \.

That’s also called “escaping a character”. For example:

console.log("Chapter 5.1".match(/\d\.\d/)); // 5.1 (match!)
console.log("Chapter 511".match(/\d\.\d/)); // null (looking for a real dot \.)

Parentheses are also special characters, so if we want to match them literally, we should use \( and \). The example below looks for the string "g()":

console.log("function g()".match(/g\(\)/)); // "g()"

If we’re looking for a backslash \, it’s a special character in both regular strings and regexps, so we should double it.

console.log("1\\2".match(/\\/)); // '\'
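
Because the backslash is special in regular strings too, escaping matters even more when a pattern is built from a string with the RegExp constructor. The sketch below assumes that use case, which is not covered above, and shows what happens when the backslashes are not doubled.

```javascript
// Regex literal: one backslash per escape.
const literal = /\d\.\d/;

// RegExp built from a string: each backslash must be doubled,
// because the string itself consumes one level of escaping.
const fromString = new RegExp("\\d\\.\\d");
const samePattern = literal.source === fromString.source;
const findsVersion = fromString.test("Chapter 5.1");

// Forgetting to double: the string "\d\.\d" is just "d.d",
// so the pattern means "d, any character, d".
const broken = new RegExp("\d\.\d");
const brokenOnVersion = broken.test("Chapter 5.1");
const brokenOnWord = broken.test("dad");

console.log(samePattern, findsVersion);    // true true
console.log(broken.source);                // d.d
console.log(brokenOnVersion, brokenOnWord); // false true
```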

Closing Remark

Thank you for reading. I hope this article has improved your knowledge of regex 😃


Author Info

I'd love to meet you digitally and, some day, in person. I can be reached via:

Name: Gospel Chukwu

Email: hello@gospelchukwu.com

Portfolio: www.gospelchukwu.com
