@Ggariv
Last active January 29, 2023 20:54
Regex - Tutorial

17 Computer Science for JavaScript: Regex Tutorial

Developers write code, but they also write about code. Take a moment to search the web for tutorials about any of the subjects you’ve learned so far in this course. You’re likely to find thousands of tutorials written by developers of all skill levels, but especially by junior developers—like you!

Your Challenge this week is to create a tutorial that explains how a specific regular expression, or regex, functions by breaking down each part of the expression and describing what it does.

Before you start, clone the starter code.

User Story

AS A web development student
I WANT a tutorial explaining a specific regex
SO THAT I can understand the search pattern the regex defines

Acceptance Criteria

GIVEN a regex tutorial
WHEN I open the tutorial
THEN I see a descriptive title and introductory paragraph explaining the purpose of the tutorial, a summary describing the regex featured in the tutorial, a table of contents linking to different sections that break down each component of the regex and explain what it does, and a section about the author with a link to the author’s GitHub profile
WHEN I click on the links in the table of contents
THEN I am taken to the corresponding sections of the tutorial
WHEN I read through each section of the tutorial
THEN I find a detailed explanation of what a specific component of the regex does
WHEN I reach the end of the tutorial
THEN I find a section about the author and a link to the author’s GitHub profile

What is a Regex?

A regex, which is short for regular expression, is a sequence of characters that defines a specific search pattern. When included in code or search algorithms, regular expressions can be used to find certain patterns of characters within a string, or to find and replace a character or sequence of characters within a string. They are also frequently used to validate input.

For example, the following regular expression can be used to verify that user input is a valid email address:

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

Each component of this regex has a unique responsibility to make sure that a user enters an email address that begins with an unspecified number of characters preceding the @ symbol, followed by a domain.
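
As a rough sketch of how this pattern behaves in JavaScript (the sample addresses and the `isValidEmail` helper name are made up for illustration, not part of the Challenge):

```javascript
// The email pattern quoted above.
const emailPattern = /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/;

// Hypothetical helper: true when the whole input fits the pattern.
function isValidEmail(input) {
  return emailPattern.test(input);
}

console.log(isValidEmail("jane_doe@example.com")); // true
console.log(isValidEmail("jane_doe@example"));     // false: no dot plus 2-6 letter ending
console.log(isValidEmail("Jane@Example.com"));     // false: only lowercase letters are allowed

// The three capturing groups hold the user name, the domain name and the extension.
const [, user, domain, extension] = "jane_doe@example.com".match(emailPattern);
console.log(user, domain, extension); // jane_doe example com
```

Adding the `i` flag would be one way to accept uppercase letters as well.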

Before you get started, watch this introduction to regular expressions video and read this Regex Tutorial to learn how to identify the different components that make up a regex. If you need any additional help, there are many resources on the web. Feel free to do your own research to find one that can help you complete this Challenge.

Once you have a better understanding of what these different parts of a regular expression do, you’ll need to explain what they do for a specific regex.

You can choose one of the following regular expressions or you can choose one that you found on your own (with the exception of the one that was covered in the virtual classes, Matching a Username):

  • Matching a Hex Value – /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

  • Matching an Email – /^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$/

  • Matching a URL – /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

  • Matching an HTML Tag – /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/

Getting Started

Instead of creating a repository, you’ll publish a GitHub gist. GitHub describes a gist as a simple way to share code snippets with others. It’s also an ideal way to demonstrate a technique, teach a principle, or show off a solution. It functions just like a repository, and you’ll use Markdown to create it, just as you do with your READMEs. Gists can include code, images, links, and anything else you can include in a README.

After you’ve cloned the starter code, learn how to create a gist. You can also watch this video on how to use gists.

Note: Make sure to create a public gist.

The starter code is a template with a title, introductory paragraph, summary, and table of contents. The table of contents should link to sections of the tutorial that describe the functionality of each component in the regex. Be sure to rename the template to a unique name that describes your tutorial.

Note: The regular expression that you choose may not include all of the components outlined in the starter code. After you’ve finished your walkthrough, you can remove any sections that you didn’t use.

Each section that describes a component should include more than just one sentence explaining what it does. It should also include a code snippet of that particular component and some examples that meet the requirements of that component.

Important: Make revisions to your gist in the GitHub Gist UI. This will create a revision history that graders can use to verify that the tutorial content is yours.

Review

You are required to submit the following for review:

  • The URL of the GitHub gist. Give the gist a unique name.

© 2022 IntelSwift llc. brand. Confidential and Proprietary. All Rights Reserved.

Regex Tutorial

The following file is a brief tutorial about regex: its concept and specifics.

Summary

Regex, or regular expressions, let users check a string of characters for text that matches a search pattern. A single regex can combine more than one kind of pattern when searching a text.

For example, a regex that searches for an email address combines word characters (letters, digits and the underscore) with non-word characters such as @ and the dot.

Regex code snippet example mm/dd/yyyy -> /^(0?[1-9]|1[0-2])(\/|\.|-)(0?[1-9]|[12]\d|3[01])(\/|\.|-)[1-9]\d{3}$/

Result(s):

  • 12/01/1993
  • 12.01.1993
  • 12-01-1993
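
The sketch below tests a date pattern of this kind in JavaScript. It assumes the dot is escaped as `\.` (an unescaped dot matches any character) and that months and days are limited to 1-12 and 1-31. The variable names and the two failing samples are illustrative only.

```javascript
// Assumed date pattern: month 1-12, a "/", "." or "-" separator, day 1-31, four-digit year.
const datePattern = /^(0?[1-9]|1[0-2])(\/|\.|-)(0?[1-9]|[12]\d|3[01])(\/|\.|-)[1-9]\d{3}$/;

const samples = ["12/01/1993", "12.01.1993", "12-01-1993", "13/01/1993", "12x01x1993"];
const results = samples.map((sample) => datePattern.test(sample));

samples.forEach((sample, i) => console.log(sample, results[i]));
// The first three samples print true; "13/01/1993" and "12x01x1993" print false.
```

The pattern only checks the shape of the date. It does not know how many days each month has, so a string such as 02/31/1993 still passes.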

Table of Contents

Regex Components

Anchors

Anchors don't match any characters. Instead, they assert that the match occurs at a specific position: the beginning or end of the string, the beginning or end of a line, or a word or non-word boundary.

Syntax: Start/End of Line

  • ^abc$ - ^: start / $: end of the string. These match a position, not a character.
    • ^ Matches the beginning of the string, or the beginning of each line when the m flag is set.
    • $ Matches the end of the string, or the end of each line when the m flag is set.
    • Example: catastrophe was named wildcat
      • ^cat Searches for "cat" AT THE BEGINNING of the string. -> catastrophe
      • cat$ Searches for "cat" AT THE END of the string. -> wildcat
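
A minimal JavaScript sketch of these anchors, using the example sentence above (the `lines` string is an extra sample added here to show the `m` flag):

```javascript
const text = "catastrophe was named wildcat";

// ^cat matches "cat" only at the very start of the string.
console.log(text.match(/^cat/).index); // 0

// cat$ matches "cat" only at the very end of the string.
console.log(text.match(/cat$/).index); // 26

// Without anchors, the g flag finds both occurrences.
console.log(text.match(/cat/g).length); // 2

// With the m flag, ^ and $ also match at the start and end of each line.
const lines = "cat\ndog\ncat";
console.log(lines.match(/^cat$/gm).length); // 2
console.log(lines.match(/^cat$/g));         // null
```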

Syntax: Word Boundaries

  • \b \B - \b: word boundary / \B: non-word boundary
    • \b Matches the position where a word begins or ends, so the pattern next to it must sit at the start or end of a word.
    • \B Matches any position that is not a word boundary, so on that side the pattern must continue inside a word.
    • Example: I went to the store and bought 5 apples, 4 oranges, and 15 plums.
      • \bt Searches for a "t" AT THE BEGINNING of a word. -> to & the
      • t\b Searches for a "t" AT THE END of a word. -> went & bought
      • \Bt Searches for a "t" that is not AT THE BEGINNING of a word. -> went, store & bought
      • t\B Searches for a "t" that is not AT THE END of a word. -> to, the & store
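
The four boundary patterns can be checked with a short JavaScript sketch. The `wordsMatching` helper is hypothetical and only exists to list the words that contain a match.

```javascript
const sentence = "I went to the store and bought 5 apples, 4 oranges, and 15 plums.";

// Hypothetical helper: returns the space-separated words in which the pattern matches.
function wordsMatching(pattern) {
  return sentence.split(" ").filter((word) => pattern.test(word));
}

console.log(wordsMatching(/\bt/)); // [ 'to', 'the' ]
console.log(wordsMatching(/t\b/)); // [ 'went', 'bought' ]
console.log(wordsMatching(/\Bt/)); // [ 'went', 'store', 'bought' ]
console.log(wordsMatching(/t\B/)); // [ 'to', 'the', 'store' ]
```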

Quantifiers

Quantifiers indicate that the preceding token (a character, a group or a character class) must be matched a certain number of times.

A quantifier can be greedy or lazy.

Syntax:

  • a? Matches the previous token between 0 and 1 times, as many times as possible, giving back as needed (greedy).
  • a+ Matches the previous token between 1 and unlimited times, as many times as possible, giving back as needed (greedy).
  • a* Matches the previous token between 0 and unlimited times, as many times as possible, giving back as needed (greedy).
  • a{x} Matches exactly x consecutive a characters.
  • a{x,} Matches at least x consecutive a characters (greedy).
  • a{x,y} Matches between x and y (inclusive) consecutive a characters (greedy).
    • Example: a ba baa aaa ba b aaaa
      • ba? -> Match 1: ba / Match 2: ba (from baa) / Match 3: ba / Match 4: b
      • ba+ -> Match 1: ba / Match 2: baa / Match 3: ba
      • ba* -> Match 1: ba / Match 2: baa / Match 3: ba / Match 4: b
      • a{3} -> Match 1: aaa / Match 2: aaa (from aaaa)
      • a{2,} -> Match 1: aa (from baa) / Match 2: aaa / Match 3: aaaa
      • a{2,3} -> Match 1: aa (from baa) / Match 2: aaa / Match 3: aaa (from aaaa)
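
To see what each quantifier actually returns, the example string can be run through `String.prototype.match` with the `g` flag. This is a small sketch and the variable name is arbitrary.

```javascript
const text = "a ba baa aaa ba b aaaa";

console.log(text.match(/ba?/g));    // [ 'ba', 'ba', 'ba', 'b' ]
console.log(text.match(/ba+/g));    // [ 'ba', 'baa', 'ba' ]
console.log(text.match(/ba*/g));    // [ 'ba', 'baa', 'ba', 'b' ]
console.log(text.match(/a{3}/g));   // [ 'aaa', 'aaa' ]
console.log(text.match(/a{2,}/g));  // [ 'aa', 'aaa', 'aaaa' ]
console.log(text.match(/a{2,3}/g)); // [ 'aa', 'aaa', 'aaa' ]
```

Note that a quantifier cannot follow another quantifier directly: a pattern such as `a{3}*` is a syntax error in JavaScript.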

OR Operator

The OR operator matches either what is BEFORE the "|" OR what is AFTER it. Each side can be a single character or a longer sequence.

Syntax:

  • x|y Acts like a boolean OR. Matches the expression before (x) or after (y) the "|".
    • Example: ab cd ac bc
      • a|b -> Match 1: a (in ab) / Match 2: b (in ab) / Match 3: a (in ac) / Match 4: b (in bc)
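
A short JavaScript sketch of alternation on the same example string. The second and third patterns are extra illustrations that are not in the list above.

```javascript
const text = "ab cd ac bc";

// a|b matches a single "a" or a single "b" wherever it appears.
console.log(text.match(/a|b/g)); // [ 'a', 'b', 'a', 'b' ]

// Each side of | can be a longer sequence.
console.log(text.match(/ab|cd/g)); // [ 'ab', 'cd' ]

// A group limits how far the alternation reaches: "a" or "b", then "c".
console.log(text.match(/(a|b)c/g)); // [ 'ac', 'bc' ]
```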

Character Classes

Character classes match one character from a specific set. Some classes are predefined, and users can also define their own sets.

Group Classes' Syntax:

  • [abc] Matches either an a, b or c character.
  • [^abc] Matches ANY CHARACTER EXCEPT FOR an a, b or c.
  • [a-z] Matches any character between a and z (including them).
  • [^a-z] Matches any character except those in the range a-z.
  • [a-zA-Z] Matches any characters between a-z or A-Z. Users can combine them as they please.
  • [0-9] Matches any digit from 0 to 9.
  • [^0-9] Matches any character that is not a digit. In general, [^x-y] matches any character outside the range x to y.

Metacharacters' Syntax:

  • . Matches any character, except newline or line terminator.
  • \w Matches a word character (alphanumeric & underscore).
  • \W Matches any character that is not a word character (alphanumeric & underscore).
  • \d Matches any digit character (0-9).
  • \D Matches any non-digit character (anything except 0-9).

And so on...
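
The following JavaScript sketch tries several of these classes on a made-up sample string (the text and variable name are illustrative assumptions):

```javascript
const text = "Room 42_b: call 555-0199!";

console.log(text.match(/[abc]/g));  // [ 'b', 'c', 'a' ]
console.log(text.match(/[A-Z]/g));  // [ 'R' ]
console.log(text.match(/[0-9]+/g)); // [ '42', '555', '0199' ]
console.log(text.match(/\w+/g));    // [ 'Room', '42_b', 'call', '555', '0199' ]
console.log(text.match(/\W/g));     // [ ' ', ':', ' ', ' ', '-', '!' ]

// Negated classes are handy for stripping unwanted characters.
console.log("a1b2".replace(/\D/g, ""));     // "12"
console.log("a1b2".replace(/[^a-z]/g, "")); // "ab"
```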

Flags

Expression flags affect how the search expression is interpreted.

  • g : global flag -> finds all matches. Without it, only the first match is returned.
  • i : insensitive flag -> search is case-insensitive: no difference between A and a.
  • m : multiline flag -> ^ and $ match at the start and end of each line, not only at the start and end of the whole string.
  • u : unicode flag -> the pattern is treated as a sequence of Unicode code points instead of UTF-16 code units. It also enables Unicode escapes such as \u{...} and \p{...}.
  • s : dotAll ("single-line") flag -> the dot special character (.) also matches line terminator ("newline") characters, which it would not match otherwise.
  • y : sticky flag -> performs a "sticky" search that matches only at the current position (lastIndex) in the target string; it does not attempt to match from any later index.
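
Each flag can be tried out with a short JavaScript sketch. The sample strings and variable names below are assumptions chosen only for illustration.

```javascript
const text = "Cat\ncat";

// No flags: case-sensitive, first match only.
console.log(text.match(/cat/).index); // 4

// g + i: every match, ignoring case.
console.log(text.match(/cat/gi)); // [ 'Cat', 'cat' ]

// m: ^ and $ match on each line.
console.log(text.match(/^cat$/gim)); // [ 'Cat', 'cat' ]

// s: the dot also matches the newline between the two lines.
console.log(/Cat.cat/.test(text));  // false
console.log(/Cat.cat/s.test(text)); // true

// u: the dot matches a whole code point instead of one UTF-16 code unit.
const smiley = "\u{1F600}";
console.log(smiley.match(/./u)[0].length); // 2
console.log(smiley.match(/./)[0].length);  // 1

// y: the match must start exactly at lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4;
console.log(sticky.test(text)); // true
sticky.lastIndex = 1;
console.log(sticky.test(text)); // false
```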

Grouping and Capturing

Groups group multiple patterns as a whole, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string.

Syntax:

  • (ABC) : Capturing group -> matches "ABC" and remembers the match.
  • (?:ABC): Non-capturing group -> matches "ABC" but does not remember (capture / assign) the match.
  • (?<name>ABC) : Named capturing group -> matches "ABC" and stores it on the groups property of the returned match under the name specified by <name>.
  • \n : Backreference -> "n" is a positive integer; \n matches the same text that the nth capturing group matched.
    • Example: match and capture capture
      • match and (capture) \1 -> Match: match and capture capture / Group 1: capture
    • Example: match this match that
      • match this (?:match that) -> Match: match this match that / Group: none
    • Example: My name is Code
      • (?<name>Code) -> Match: Code / Group "name": Code
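
In JavaScript, the result of `match` exposes these groups directly. The sketch below is illustrative: the first pattern includes the backreference `\1` so that the repeated word becomes part of the match, and all variable names are arbitrary.

```javascript
// Capturing group plus the backreference \1 to its content.
const repeated = "match and capture capture".match(/match and (capture) \1/);
console.log(repeated[0]); // "match and capture capture"
console.log(repeated[1]); // "capture"

// Non-capturing group: the result holds the full match and nothing else.
const plain = "match this match that".match(/match this (?:match that)/);
console.log(plain[0]);     // "match this match that"
console.log(plain.length); // 1

// Named capturing group: available by index and by name.
const named = "My name is Code".match(/(?<name>Code)/);
console.log(named[1]);          // "Code"
console.log(named.groups.name); // "Code"
```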

Bracket Expressions

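A bracket expression, enclosed in square brackets - [] -, is a regular expression that matches a single character or collating element. The collating forms at the end of this list come from POSIX regular expressions, where they appear inside an outer pair of brackets (for example [[.ab.]]); JavaScript regular expressions do not support them.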

Syntax:

  • [abcd] : matches any character in the square brackets.
  • [a-d] : matches any character in the range of characters separated by a hyphen (-).
  • [^abcd] : matches any character except those in the square brackets.
  • [.ab.] : matches a multi-character collating element.
  • [=a=] : matches all collating elements with the same primary sort order as that element, including the element itself.

Greedy and Lazy Match

  • 'Greedy' quantifiers match as many characters as possible.
    • Example: greedy can be dangerous at times
      • /a.*a/ -> Match: an be dangerous a
  • 'Lazy' quantifiers match as few characters as possible. Adding ? after a quantifier makes it lazy.
    • Example: greedy can be dangerous at times
      • /a.+?/ -> Match: an
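
Running a greedy pattern next to its lazy counterpart makes the difference visible. This is a small JavaScript sketch; the second and fourth patterns are additions for comparison.

```javascript
const text = "greedy can be dangerous at times";

// Greedy: .* runs to the last "a" it can reach.
console.log(text.match(/a.*a/)[0]);  // "an be dangerous a"

// Lazy: .*? stops at the first "a" that completes the match.
console.log(text.match(/a.*?a/)[0]); // "an be da"

// Lazy: .+? takes just one character.
console.log(text.match(/a.+?/)[0]);  // "an"

// Greedy: .+ takes the rest of the line.
console.log(text.match(/a.+/)[0]);   // "an be dangerous at times"
```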

Boundaries

A word boundary occurs:

  • At the beginning of the string if the first character is a word character.
  • After the last character in the string if the last character is a word character.
  • Between 2 characters in the string, where one is a word character and the other is not a word character.
    • Example: word boundaries
      • /\bw/ -> Match: w (the first letter of "word")
      • /d\b/ -> Match: d (the last letter of "word")

Back-references

Back-references match the same text as previously matched by a capturing group.

Suppose you want to match a pair of opening and closing HTML tags, and the text in between. By putting the tag name into a capturing group, we can reuse it for the closing tag through a backreference.

  • Example:
    • <([A-Z][0-9]*)\b[^>]*>.*?</\1>
    • This regex contains only one pair of parentheses, which captures the string matched by [A-Z][0-9]* -> This is the name of the opening HTML tag.
    • The backreference \1 references the first capturing group. \1 matches the exact same text that was matched by the first capturing group.
    • The / before it is a literal character. It is simply the forward slash in the closing HTML tag that we are trying to match.
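
Here is a hedged JavaScript sketch of that pattern. It assumes the `i` flag so that lowercase tag names match, and it escapes the forward slash because the pattern is written as a regex literal. The sample HTML strings are invented.

```javascript
// Same pattern as above; the tag-name group only covers one letter plus optional digits (b, p, h1...).
const tagPattern = /<([A-Z][0-9]*)\b[^>]*>.*?<\/\1>/i;

const heading = '<h1 class="title">Hello</h1>'.match(tagPattern);
console.log(heading[0]); // <h1 class="title">Hello</h1>
console.log(heading[1]); // h1

// The closing tag must repeat the captured name, so mismatched tags fail.
console.log(tagPattern.test("<h1>Hello</h2>")); // false
console.log(tagPattern.test("<b>bold</b>"));    // true
```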

Look-ahead and Look-behind

Sometimes a pattern should match only when it is preceded or followed by another pattern, without that other pattern becoming part of the match.

Specific syntaxes are used to meet that goal. They are known as look-ahead and look-behind. Together they are called lookaround.

  • Positive look-ahead
    • Syntax: X(?=Y)
    • Meaning: "look for X, but match only if followed by Y"
    • Example: foobar foobaz
      • foo(?=bar) -> Match: foo (in foobar)
  • Negative look-ahead
    • Syntax: X(?!Y)
    • Meaning: "search X, but only if not followed by Y"
    • Example: foobar foobaz
      • foo(?!bar) -> Match: foo (in foobaz)
  • Positive look-behind
    • Syntax: (?<=Y)X
    • Meaning: "matches X, but only if there’s Y before it."
    • Example: foobar fuubar
      • (?<=foo)bar -> Match: bar (in foobar)
  • Negative look-behind
    • Syntax: (?<!Y)X
    • Meaning: "matches X, but only if there’s no Y before it."
    • Example: not foo but foo
      • (?<!not )foo -> Match: foo (the one after "but")
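
The four lookaround forms can be verified with a short JavaScript sketch. It prints the matched text and its position, which shows that the lookaround part is checked but never included in the match.

```javascript
// Positive look-ahead: "foo" only when "bar" follows.
const ahead = "foobar foobaz".match(/foo(?=bar)/);
console.log(ahead[0], ahead.index); // foo 0

// Negative look-ahead: "foo" only when "bar" does not follow.
const notAhead = "foobar foobaz".match(/foo(?!bar)/);
console.log(notAhead[0], notAhead.index); // foo 7

// Positive look-behind: "bar" only when "foo" comes right before it.
const behind = "foobar fuubar".match(/(?<=foo)bar/);
console.log(behind[0], behind.index); // bar 3

// Negative look-behind: "foo" only when "not " does not come right before it.
const notBehind = "not foo but foo".match(/(?<!not )foo/);
console.log(notBehind[0], notBehind.index); // foo 12
```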

Author
