@rubiocode
Created August 21, 2021 19:41
Regex tutorial

REGEX

Summary

Hello there! Did you know that much of the information we enter into web forms is validated using regex? A regex, or regular expression (also written regEx or regExp), is a sequence of characters that defines a pattern to be matched in a string or document. Validating user input with regex helps keep our accounts secure, helps make sure our Amazon packages get delivered to the correct address, and checks that our emails and passwords contain the correct symbols, letters, and numbers. If you have encountered any of these, you have most likely encountered regex.

Regex Components

Anchors

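  • Anchors: The caret or hat ^ and the dollar sign $ are anchors. ^ matches at the start of a string and $ matches at the end. For example, ^Dogs are cooler than cats$ matches only a string that begins with Dogs and ends with cats, that is, the exact string Dogs are cooler than cats with nothing before or after it.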
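
To see the anchors in action, here is a minimal sketch. It assumes Python and its built-in re module, which this tutorial does not name, so treat the language choice as an illustration only.

```python
import re

# ^ pins the match to the start of the string, $ pins it to the end.
anchored = re.compile(r"^Dogs are cooler than cats$")

# The exact sentence matches.
exact = anchored.search("Dogs are cooler than cats") is not None

# Extra text before or after the sentence makes the anchored pattern fail.
padded = anchored.search("I think Dogs are cooler than cats, honestly") is not None

# Without the anchors, the same sentence is found anywhere inside the text.
unanchored = re.search(r"Dogs are cooler than cats",
                       "I think Dogs are cooler than cats, honestly") is not None

print(exact, padded, unanchored)
```

Only the first and last checks succeed, because the anchored pattern allows nothing before Dogs or after cats.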

Quantifiers

  • Quantifiers-Fixed: Denoted with curly braces {}, these allow us to indicate the exact quantity of a character we need to match. Instead of writing \w\w\w\w\w\w\w we could simply write \w{7}, which would match the word monkeys. We can also give a range: \w{3,9} will match a minimum of 3 and a maximum of 9 word characters. A fixed quantifier can also follow a literal, for example squea{5}k matches squeaaaaak.

  • Quantifiers-Optional: Denoted with a question mark ?. Optional quantifiers allow us to indicate that part of a regex is optional, meaning it may appear 0 times or 1 time. By using grouping () we can make a whole group optional. For example, The dog played with a (broken )?toy will match The dog played with a broken toy AS WELL AS The dog played with a toy.
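
The quantifiers above can be tried out with a short sketch. It assumes Python's built-in re module (a choice made for this example, not by the tutorial), and the extra test words such as ox and rhinoceros are made up for illustration.

```python
import re

# \w{7}: exactly seven word characters.
seven = re.fullmatch(r"\w{7}", "monkeys") is not None

# \w{3,9}: between three and nine word characters (sample words are hypothetical).
words = ["ox", "cat", "elephants", "rhinoceros"]
in_range = [w for w in words if re.fullmatch(r"\w{3,9}", w)]

# a{5}: the quantifier applies only to the literal "a" right before it.
squeak = re.fullmatch(r"squea{5}k", "squeaaaaak") is not None

# (broken )?: the whole group, including its trailing space, is optional.
optional = re.compile(r"The dog played with a (broken )?toy")
with_word = optional.fullmatch("The dog played with a broken toy") is not None
without_word = optional.fullmatch("The dog played with a toy") is not None

print(seven, in_range, squeak, with_word, without_word)
```

Note that the space sits inside the group. If it stayed outside, the version without broken would need two spaces before toy.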

Grouping Constructs

  • Grouping: Allows us to group parts of a regex together and limit alternation to that part of the regex only. Grouping uses (). For example, I love (NYC|LA) matches I love NYC and I love LA.
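
To show why the parentheses matter, here is a small sketch comparing the grouped pattern with an ungrouped one. It assumes Python's re module, and the ungrouped pattern is added here only for contrast.

```python
import re

grouped = re.compile(r"I love (NYC|LA)")
ungrouped = re.compile(r"I love NYC|LA")  # alternatives: "I love NYC" OR just "LA"

# With the group, the alternation covers only the city name.
grouped_la = grouped.fullmatch("I love LA")
city = grouped_la.group(1) if grouped_la else None

# Without the group, "I love LA" is neither "I love NYC" nor "LA".
ungrouped_sentence = ungrouped.fullmatch("I love LA") is not None
ungrouped_city_only = ungrouped.fullmatch("LA") is not None

print(city, ungrouped_sentence, ungrouped_city_only)
```

The group also captures what it matched, which is why group(1) returns the city.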

Bracket Expressions

  • Character sets: Denoted with brackets [], match a single character from those listed inside the brackets. Be careful because [dog] matches one d, one o, or one g, but will not match the word dog.
  • Ranges: Allow us to specify a character range instead of typing each literal character. For example, [abcdefg] can be written [a-g].
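
Here is a quick sketch of character sets and ranges, assuming Python's re module. The sample strings are invented for the example.

```python
import re

# [dog] matches ONE character at a time: d, o, or g.
single_chars = re.findall(r"[dog]", "dig")

# A set on its own never matches the three-letter word dog as a whole.
whole_word = re.fullmatch(r"[dog]", "dog") is not None

# [a-g] is shorthand for [abcdefg].
in_range = re.findall(r"[a-g]", "cabbage hike")
same_as_listed = re.findall(r"[abcdefg]", "cabbage hike") == in_range

print(single_chars, whole_word, in_range, same_as_listed)
```

The space and the letters h, i, and k fall outside the range a-g, so they are skipped.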

Character Classes

  • Shorthand Character Classes: Sometimes writing ranges can be cumbersome. To avoid writing so many ranges we can use shorthand character classes. There are 3 useful shorthand character classes: \w Word-Character: matches any character in [A-Za-z0-9_]; \d Digit-Character: matches any digit [0-9]; \s Whitespace-Character: represents the regex range [ \t\r\n\f\v], matching a single space, tab, carriage return, line break, form feed, or vertical tab. For example, \d\s\w\w\w\w matches 3 cats, 9 dogs, and even 1 duck.

  • Negated Shorthand Character Classes: The opposite of the shorthand character classes, these match any character that is NOT in the corresponding shorthand class. \W Non-Word-Character: [^A-Za-z0-9_] will match anything that is not in \w. \D Non-Digit-Character: [^0-9] will match anything that is not in \d. Lastly, \S Non-Whitespace-Character: [^ \t\r\n\f\v] matches anything that is not in \s.
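
The shorthand classes and their negated versions can be checked with a short sketch, assuming Python's re module. In Python 3 the classes \w, \d, and \s also accept Unicode characters by default, so the re.ASCII flag is used here to keep them equal to the ranges listed above. The extra sample strings are made up.

```python
import re

# One digit, one whitespace character, then four word characters.
animal_count = re.compile(r"\d\s\w\w\w\w", re.ASCII)
samples = ["3 cats", "9 dogs", "1 duck", "x cats", "3cats"]
matched = [s for s in samples if animal_count.fullmatch(s)]

# The uppercase versions match everything the lowercase versions do not.
non_digits = re.findall(r"\D", "a1b2", flags=re.ASCII)
non_words = re.findall(r"\W", "hi, there!", flags=re.ASCII)
non_spaces = re.findall(r"\S", "a b\tc", flags=re.ASCII)

print(matched, non_digits, non_words, non_spaces)
```

The strings x cats and 3cats are rejected because the first lacks a leading digit and the second lacks the whitespace character.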

The OR Operator

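  • Alternation: Denoted with the pipe symbol |, which works as an OR. Matches either the expression before the | or the one after it. For example, vanilla|chocolate matches vanilla or chocolate.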
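
A small sketch of alternation, assuming Python's re module. The strawberry sample and the sentence are invented for illustration.

```python
import re

flavor = re.compile(r"vanilla|chocolate")

# Either alternative is accepted, anything else is not.
accepted = [f for f in ["vanilla", "chocolate", "strawberry"] if flavor.fullmatch(f)]

# findall returns every place where one of the alternatives appears.
found = flavor.findall("one vanilla and one chocolate scoop")

print(accepted, found)
```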

Literals

  • Literals: Match the exact text written in the regex. For example, the regex a matches each a in bananas.

Caret Symbol

  • Caret symbol: ^. When placed at the start of a character set, right after the opening bracket, the ^ negates the set, allowing it to match any character that IS NOT stated in the character set. For example, the character set [dog] will match ONLY d, o, or g, but [^dog] will NOT match d, o, or g and will match any other character instead.
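
To compare a set with its negated version, here is a minimal sketch that assumes Python's re module and uses a made-up sample phrase.

```python
import re

text = "good boy"

# [dog] keeps only the characters d, o, and g.
in_set = re.findall(r"[dog]", text)

# [^dog] keeps everything else, including the space.
not_in_set = re.findall(r"[^dog]", text)

print(in_set, not_in_set)
```

The negated set matches the space too, which shows that it accepts any other character and not just other letters.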

Wildcard

  • Wildcard: Represented by a dot ., matches any single character in a piece of text (by default, any character except a line break). For example, ... will match any three characters, such as cat, dog, air, eat, etc.
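
Here is a short sketch of the wildcard, assuming Python's re module. The samples at and a, line break, b are added for contrast, and re.DOTALL is Python's flag for letting the dot match a line break as well.

```python
import re

three_chars = re.compile(r"...")

# Any three characters match, as long as none of them is a line break.
samples = ["cat", "dog", "air", "eat", "at", "a\nb"]
matched = [s for s in samples if three_chars.fullmatch(s)]

# With re.DOTALL the dot also matches the line break.
with_dotall = re.fullmatch(r"...", "a\nb", flags=re.DOTALL) is not None

print(matched, with_dotall)
```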

Kleene

  • Kleene Star: Denoted with an asterisk *, this is a quantifier that allows the preceding character to be matched 0 times (meaning it doesn't have to appear), 1 time (it can appear once), or many times. For example, meo*w matches mew, meow, and meoooooooooooow.

  • Kleene Plus: Denoted with a plus sign +. Matches the preceding character 1 or more times. For example, ho+t matches hot and hoot as well as hooooot.
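
The difference between the star and the plus is easiest to see side by side. This is a minimal sketch assuming Python's re module, with mow and ht added as samples that should fail.

```python
import re

star = re.compile(r"meo*w")  # zero or more "o"
plus = re.compile(r"ho+t")   # one or more "o"

star_hits = [w for w in ["mew", "meow", "meoooooooooooow", "mow"] if star.fullmatch(w)]
plus_hits = [w for w in ["ht", "hot", "hoot", "hooooot"] if plus.fullmatch(w)]

print(star_hits, plus_hits)
```

The star accepts mew because the o may be missing, while the plus rejects ht because at least one o is required.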

Character Escapes

  • Escape character: \ A backslash placed before a special character escapes it, so the character is matched literally instead of acting as a regex symbol. For example, What time is it\? will match What time is it?
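
To see what escaping changes, here is a small sketch assuming Python's re module. It also uses re.escape, a helper from that module that adds the backslashes for us.

```python
import re

question = "What time is it?"

# Escaped: \? is a literal question mark.
escaped = re.fullmatch(r"What time is it\?", question) is not None

# Unescaped: ? makes the "t" optional, so the real question mark is never matched.
unescaped = re.fullmatch(r"What time is it?", question) is not None
missing_t = re.fullmatch(r"What time is it?", "What time is i") is not None

# re.escape builds a pattern that matches the text literally.
auto_escaped = re.fullmatch(re.escape(question), question) is not None

print(escaped, unescaped, missing_t, auto_escaped)
```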

Time to Practice

Let's practice what we learned by analyzing the following example:

Matching a Hex Value: /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

  1. First and foremost we take a deep breath and try not to panic, then proceed to break the regex down section by section.

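  2. We then find a forward slash / at the beginning and at the end. These slashes are not part of the pattern: they are delimiters that mark where the regex starts and ends, as in languages such as JavaScript, so they do not match anything in the string. /^#?([a-f0-9]{6}|[a-f0-9]{3})$/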

  3. We use the anchor hat/caret character ^ at the beginning of the regex to match the beginning of the string. Notice that at the end of the regex we also have $ to match the end of the string, so the whole string must be a hex value. /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

  4. Next: #? is a literal # followed by the optional quantifier ?, stating that the # may or may not be there. /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

  5. Next we have a group, opened with a parenthesis (, that holds the two alternatives. /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

  6. Inside the parentheses, we have a character set made of two ranges, [a-f0-9], along with a fixed quantifier {6}: [a-f0-9]{6}. This means we want exactly 6 characters, each one from a TO f or from 0 TO 9. /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

  7. After the fixed quantifier we find the alternation symbol |, meaning the group matches one alternative OR the other. /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

  8. After the | symbol we find another character set, [a-f0-9]{3}: the set is still the same, a TO f and 0 TO 9, but in this case we want exactly 3 characters instead of 6. /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

  9. Last we close the group with a closing parenthesis.
    /^#?([a-f0-9]{6}|[a-f0-9]{3})$/

  10. Voila! We just deciphered the regex without breaking a sweat! Great Job!
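
The steps above can be put to the test with a minimal sketch. It assumes Python's re module, where the pattern is written as a raw string without the surrounding / delimiters. The function name is_hex_color and the sample values are made up for this example.

```python
import re

# Same pattern as in the walk-through, minus the / delimiters.
HEX_COLOR = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")


def is_hex_color(value: str) -> bool:
    """Return True when value is a 3- or 6-digit lowercase hex value."""
    return HEX_COLOR.search(value) is not None


# Hypothetical sample inputs covering each part of the pattern.
samples = ["#ff8800", "ff8800", "#f80", "#ff88", "#FF8800", "#ggg"]
results = {s: is_hex_color(s) for s in samples}

# The group captures the digits without the optional #.
digits = HEX_COLOR.search("#f80").group(1)

for sample, ok in results.items():
    print(sample, ok)
print(digits)
```

The value #ff88 fails because it has 4 digits, which is neither 3 nor 6, and #FF8800 fails because the character set only lists lowercase a TO f. Passing re.IGNORECASE to re.compile would be one way to accept uppercase digits too.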

Want to learn more about Regular Expressions? Check out this cool page:

Author

Rubidia Rubio @rubiocode
