the-vampiire/RegExp Tutorial.md

## RegExp Tutorial.md

      
Ok, so, you might know about JavaScript regular expressions. Well, here is a tutorial about them, but written by a 13-year-old, so it isn't actually any good!
Regular expressions go between `/` characters. Here is an example: `/hi/`.
Ok, now then. Let's learn how to match the string abc. Well, that's quite simple.
`/abc/`. Yay! So putting letters next to each other makes them match one after the other.
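Here's a minimal sketch you can paste into a browser console or Node.js to check this. The sample strings are made up. `test()` is the built-in `RegExp` method that just answers "did it match?":

```javascript
// Letters next to each other must show up in exactly that order.
const abc = /abc/;

console.log(abc.test("xyzabcghi")); // true: "abc" sits in the middle
console.log(abc.test("acb"));       // false: right letters, wrong order
console.log(/hi/.test("oh, hi!"));  // true
```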
Ok, now, after the second `/` we can put a `g` to make it match globally. Without the `g`, the regex stops at the first match. With it, we get every match, so we can pull both abcs out of xyzabcghiabc with
`/abc/g`.
Great! Eh?
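Here's a quick sketch of the difference, again assuming a browser console or Node.js and a made-up string. Without `g`, `String.prototype.match()` returns only the first match along with its position. With `g`, it returns every match:

```javascript
const text = "xyzabcghiabc";

// Without g: only the first "abc", plus where it was found.
const first = text.match(/abc/);
console.log(first[0], first.index); // abc 3

// With g: every "abc" in the string.
console.log(text.match(/abc/g)); // [ 'abc', 'abc' ]
```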
### Character class

#### Basics

What if we want to match either an a, a b, or a c, but not all three one after the other? Well, if you put them into square brackets (`[]`), you create what's called a character class: a group of characters where any one of them can be matched! Great, right?
`/[abc]/g` matches every a, every b, and every c, but only one character at a time.
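Here's a small sketch showing that a character class matches one character at a time. The sample strings are made up:

```javascript
// [abc] matches ONE character, as long as it's an a, a b, or a c.
console.log("xaybzc".match(/[abc]/g)); // [ 'a', 'b', 'c' ]

// Even on "abc", each match is a single character, never the whole word.
console.log("abc".match(/[abc]/g)); // [ 'a', 'b', 'c' ]

// No a, b, or c anywhere, so no match.
console.log(/[abc]/.test("xyz")); // false
```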
#### Shortcuts

Character classes are great, aren't they?
Well, I want to match a single digit! This should be easy, we already know how to make a character class.
`/[0123456789]/g`. Done, right?
Well, yes, it works. But it's a bit long, isn't it?
I wish there was a way of saying "any digit from 0 to 9". Well, it turns out there is! Yay!
`/[0-9]/g`. Wow, that's much shorter. What if I want to match a digit or a decimal point? Well, we can do that! `/[0-9.]/g`.
Huh, that looks a bit weird. What happened? Well, remember that `[0-9]` means `[0123456789]`, so `[0-9.]` means `[0123456789.]`. (Inside square brackets, `.` just means a dot.)
That makes sense.
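Here's a minimal sketch of those two classes, run on a made-up price string:

```javascript
const price = "Price: 3.50";

// [0-9] is short for [0123456789]: any one digit.
console.log(price.match(/[0-9]/g)); // [ '3', '5', '0' ]

// Adding . inside the brackets lets the decimal point match too.
console.log(price.match(/[0-9.]/g)); // [ '3', '.', '5', '0' ]
```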
Can I do that without using all the numbers? Let's say I have a character class, `[34567]`. How can we shorten that?
Well, `[3-7]` is the answer! Yay!
What about letters, can we do the alphabet? Yes! `[a-z]` WOW! (That only covers the lowercase letters, though. `[A-Z]` is the uppercase ones.)
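Here are ranges in action, as a quick sketch with made-up strings:

```javascript
// [3-7] is short for [34567].
console.log("0123456789".match(/[3-7]/g)); // [ '3', '4', '5', '6', '7' ]

// [a-z] is lowercase only, so the capital H gets skipped.
console.log("Hi there!".match(/[a-z]/g).join("")); // ithere
```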
### Case Insensitive

So, now that we're going for letters, we might want to be able to not care about whether a letter is uppercase or lowercase.
The way we do that is by putting an `i` after the second `/`. So, let's say we have `/abc/g`, which matches abc ONLY. If we do `/abc/gi` (or `/abc/ig`, it doesn't matter), then we can match

- abc (still)
- abC
- aBc
- aBC
- Abc
- AbC
- ABc
- ABC

That's way more possibilities!
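Here's a small sketch of the `i` flag on a made-up string. The built-in `flags` property shows that `/abc/ig` and `/abc/gi` end up as the same thing:

```javascript
const text = "abc ABC aBc xyz";

// Case matters by default...
console.log(text.match(/abc/g)); // [ 'abc' ]

// ...but not with the i flag.
console.log(text.match(/abc/gi)); // [ 'abc', 'ABC', 'aBc' ]

// The order of the flags doesn't matter: both end up as "gi".
console.log(/abc/ig.flags, /abc/gi.flags); // gi gi

// i works on character classes too.
console.log(/[a-z]/i.test("Q")); // true
```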

### The Backslash `\`

#### Introduction

Never, ever underestimate the backslash. What it does is give a special meaning to characters that don't have one, and take away the special meaning from characters that do.
#### Removing special meaning

Let's do a quick example! `/abc\[/g` matches `abc[`. Usually, `[` means the beginning of a character class, but not if you put a `\` before it!
The `\` has a special meaning too, so if you want to match the string `abc\[`, you need to escape both the `\` and the `[`.
So, we get `/abc\\\[/g`: `abc` for the abc, `\\` for the `\`, and `\[` for the `[`.
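Here's a minimal sketch of both escapes. One extra gotcha comes from JavaScript strings, not from regular expressions: inside a normal string literal the backslash is special too, so `"abc\\["` is the five-character string `abc\[`:

```javascript
// \[ means "an actual [", not "start a character class".
console.log(/abc\[/.test("abc[")); // true

// In a JavaScript string, "\\" is ONE backslash, so this is the 5 characters abc\[
const target = "abc\\[";
console.log(target.length); // 5

// \\ matches the backslash and \[ matches the bracket.
console.log(/abc\\\[/.test(target)); // true
```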
#### Adding special meaning

So, we have already shortened our digit-matching code to `[0-9]`. Can we get shorter? As it turns out, if you put a `\` before a d, the d gets a special meaning! `\d` means "digit".
Let's try it out: `/abc\d/` is the same as `/abc[0-9]/`. Isn't this great? I personally think it is.
Even cooler, if you make the d a capital letter, then it negates its meaning. So, for example, `\d` means a digit, and `\D` means NOT a digit.
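Here's a quick sketch of `\d` and `\D`, using made-up strings:

```javascript
console.log(/abc\d/.test("abc7")); // true
console.log(/abc\d/.test("abcx")); // false, x isn't a digit

// \D is the opposite: any character that ISN'T a digit.
console.log("a1b2".match(/\D/g)); // [ 'a', 'b' ]
```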

- `\b`, a word boundary: the spot between a word character (see `\w` about word characters) and a non-word character, or the start or end of the string when the first or last character is a word character. **Important:** word boundaries are points of length zero where a word starts or ends, so they don't match any characters!
- `\B`, any position that isn't a word boundary (also zero-length)
- `\c...`, it's complicated, and I wouldn't worry about it 😄. (It matches a control character; for example, `\cJ` is the same as `\n`.) Note that this doesn't have a negative, and also that there are two characters after the backslash, which is unusual.
- `\d` a digit; `[0-9]` is the long way of writing it
- `\D` anything that isn't a digit
- `\f` form feed. This is a character. It doesn't have a negative.
- `\n` is a newline character. It's what separates lines on most operating systems. It doesn't have a negative.
- `\r` is a carriage return, another line-ending character. Windows ends lines with `\r\n` (a carriage return, then a newline).
- `\s` is a whitespace character, and it includes the tab character, the space character, the newline character, the carriage return character, and many more.
- `\S` is everything that isn't a whitespace character
- `\t` is the tab character, you know, the one that takes up about 4 or 8 spaces' worth of gap.
- `\v` is something called a "vertical tab". I know, right?
- `\w`, a word character! This is the same as `/[a-z0-9_]/i` (or `/[a-zA-Z0-9_]/`).
- `\W` everything that isn't a word character
- `\<number goes here>` we'll cover these later!
- `\0` is a NUL character, which you shouldn't need to worry about.

There are a couple more, but we will cover those later.
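Here's a sketch that tries out a few of the shortcuts from the list, with made-up strings. The `replace()` lines put a marker at every spot where `\b` or `\B` matches. That shows they match positions, not characters:

```javascript
const sentence = "cat concatenate cat!";

// Plain /cat/ also finds the "cat" hiding inside "concatenate".
console.log(sentence.match(/cat/g).length); // 3

// With \b on both sides, only whole-word "cat"s count.
console.log(sentence.match(/\bcat\b/g).length); // 2

// \b and \B match zero-length positions, so replace() inserts text between characters.
console.log("cat".replace(/\b/g, "|")); // |cat|
console.log("cat".replace(/\B/g, "-")); // c-a-t

// \s: here a space, a tab, and a newline.
console.log("a b\tc\nd".match(/\s/g).length); // 3

// \w is [a-zA-Z0-9_]; \W is everything else.
console.log("hi_5!?".match(/\w/g).join("")); // hi_5
console.log("hi_5!?".match(/\W/g).join("")); // !?

// \cJ is control-J, which is the newline character.
console.log(/\cJ/.test("\n")); // true
```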