@the-vampiire
Forked from joker314/RegExp Tutorial.md
Created March 24, 2017 03:09

Ok, so, you might know about JavaScript regular expressions. Well, here is a tutorial about them, but written by a 13-year-old, so it isn't actually any good!

Regular expressions go between / characters. Here is an example, /hi/.

Ok, now then. Let's learn how to match the string abc. Well, that's quite simple.

/abc/. Yey! So putting letters next to each other makes them match one after the other.

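Ok, now, after the second / we can put a g to make it match globally, that is, to find every abc in a string and not just the first one. Even without the g, the regular expression can already find abc in the middle of a longer string, like xyzabcghi.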

/abc/g.

Great! Eh?
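Here is a small sketch of what the g actually changes. The sample strings and variable names are made up for this example, and it uses the built-in match() and test() methods.

```javascript
// Sample text containing "abc" twice (made up for this example).
const text = "abc xyz abc";

// Without g, match() reports only the first match (plus its position).
const first = text.match(/abc/);
console.log(first[0], first.index); // abc 0

// With g, match() returns every match in the string.
const all = text.match(/abc/g);
console.log(all.length); // 2

// The g flag is not needed just to find abc inside a longer string.
const foundInside = /abc/.test("xyzabcghi");
console.log(foundInside); // true
```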

Character class

Basics

What if we want to match either an A, a B, or a C, but not all three one after the other? Well, if you put them into these square brackets ([]) then you create what's called a character class, which is a group of characters, where any of them could be matched! Great, right?

/[abc]/g, now that matches a, b, or c, one character at a time. Great, right?
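To see that a character class matches one character at a time, here is a quick sketch. The word "cabbage" is just a sample input picked for this example.

```javascript
// Every single a, b, or c in the sample word is its own match.
const found = "cabbage".match(/[abc]/g);
console.log(found.join(" ")); // c a b b a

// A string with none of those letters does not match at all.
const noneFound = /[abc]/.test("xyz");
console.log(noneFound); // false
```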

Shortcuts

They are great, aren't they?

Well, I want to match a single digit! This should be easy, we already know how to make a character class.

/[0123456789]/g. Done, right?

Well, yes, it works. But it's a bit long, isn't it?

I wish there was a way of saying "a number between 0 and 9". Well, it turns out there is! Yey!

/[0-9]/g. Wow, that's much shorter. What if I want to match a digit, or a decimal point? Well, we can do that! /[0-9.]/g.

Huh, that looks a bit weird. What happened? Well, remember that [0-9] means [0123456789] so [0-9.] means [0123456789.].

That makes sense.

Can I do that, but without using all the numbers? Let's say I have a regular expression, [34567]. How can we shorten that?

Well, [3-7] is the answer! Yey!

What about letters, can we do the alphabet? Yes! [a-z] WOW!
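The ranges above can be tried out like this. It is only a sketch, and the sample strings are made up for illustration.

```javascript
// [0-9.] picks out digits and decimal points.
const digitsAndDots = "Pi is 3.14".match(/[0-9.]/g).join("");
console.log(digitsAndDots); // 3.14

// [3-7] is the short way of writing [34567].
const threeToSeven = "1234567890".match(/[3-7]/g).join("");
console.log(threeToSeven); // 34567

// [a-z] only covers lowercase letters, so the capital H is skipped.
const lowercase = "Hi there".match(/[a-z]/g).join("");
console.log(lowercase); // ithere
```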

Case Insensitive

So, now that we're going for letters, we might want to be able to not care about whether a letter is uppercase or lowercase.

The way we do that is by putting an i after the second /. So, let's say we have /abc/g which matches abc ONLY. If we do /abc/gi (or /abc/ig, it doesn't matter), then we can match

  • abc (still)
  • abC
  • aBc
  • aBC
  • Abc
  • AbC
  • ABc
  • ABC

That's way more possibilities!
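A short sketch of the i flag in action. The inputs array is a made-up sample, and the g is left off here because test() with g remembers where it stopped between calls, which would get in the way of a simple check.

```javascript
// A few of the spellings from the list above (sample data).
const inputs = ["abc", "aBc", "ABC"];

// Without i, only the exact lowercase spelling matches.
const strict = inputs.filter((s) => /abc/.test(s));
console.log(strict.length); // 1

// With i, every spelling matches.
const relaxed = inputs.filter((s) => /abc/i.test(s));
console.log(relaxed.length); // 3

// The order of the flags doesn't matter: both report the same flags.
const sameFlags = /abc/gi.flags === /abc/ig.flags;
console.log(sameFlags); // true
```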

The Backslash \

Introduction

Never, ever, underestimate the backslash. What it does is give a special meaning to characters that don't have one, and take away the special meaning from characters that do.

Removing special meaning

Let's do a quick example! /abc\[/g matches "abc[". Usually, [ means the beginning of a character class, but not if you put a \ before it!

And \ itself has a special meaning, so if you want to match the string "abc\[" then you need to escape both the \ and the [.

So, we get, /abc\\\[/g. abc for the abc, \\ for the \ and \[ for the [.
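Here is a sketch of both escapes. The variable names are made up, and String.raw is used only so the sample string can be written with a single backslash.

```javascript
// The plain string abc[ (4 characters).
const withBracket = "abc[";
// The string abc\[ (5 characters: a, b, c, backslash, [).
const withBackslash = String.raw`abc\[`;

// \[ matches a literal [.
const bracketMatches = /abc\[/.test(withBracket);
console.log(bracketMatches); // true

// \\ matches a literal backslash, then \[ matches the [.
const backslashMatches = /abc\\\[/.test(withBackslash);
console.log(backslashMatches); // true

// The longer pattern needs the backslash to be there.
const missingBackslash = /abc\\\[/.test(withBracket);
console.log(missingBackslash); // false
```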

Adding special meaning

So, we have already shortened our digit-matching code to [0-9]. Can we get shorter? As it turns out, if you put \d then the d gets some special meaning! It means "digit".

Let's try it out: /abc\d/ is the same as /abc[0-9]/. Isn't this great? I, personally, think this is.

Even cooler, if you make the d a capital letter, then it negates its meaning. So, for example, \d means digit, \D means NOT a digit.
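A quick sketch of \d and \D side by side. The sample strings are invented for this example.

```javascript
// \d needs a digit after abc.
const withDigit = "abc7".match(/abc\d/)[0];
console.log(withDigit); // abc7

// The letter d is not a digit, so this does not match.
const letterAfter = /abc\d/.test("abcd");
console.log(letterAfter); // false

// \D is the opposite: removing every non-digit leaves only the digits.
const onlyDigits = "a1b22".replace(/\D/g, "");
console.log(onlyDigits); // 122

// And removing every digit leaves only the rest.
const noDigits = "a1b22".replace(/\d/g, "");
console.log(noDigits); // ab
```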

  • \b, a word boundary, that is, the point between a word-character and something that isn't one (see \w about word-characters). The start or end of the string counts too, if the character next to it is a word-character. **Important:** word boundaries are points of length zero where the change between word-characters and non-word-characters occurs, and they don't match characters!
  • \B, any point that isn't a word boundary
  • \cX (where X is a letter), a control character. It's complicated, and I wouldn't worry about it 😄. Note that this doesn't have a negative, and also that there are two characters after the backslash, which is unusual.
  • \d a digit, the same as /[0-9]/
  • \D anything that isn't a digit
  • \f form feed. This is a character. It doesn't have a negative.
  • \n is a newline character. It's what separates lines on most operating systems. Doesn't have a negative.
  • \r is a carriage return, it's a bit like the \n character.
  • \s is a space-character, and it includes the tab character, the space character, the newline character, the carriage return character, and many more.
  • \S is everything that isn't a space character
  • \t is the tab character, you know, the one that takes up about 4 spaces' worth of gap.
  • \v something called a "vertical tab". I know, right?
  • \w, a word-character! This is the same as /[a-z0-9_]/i (or, /[a-zA-Z0-9_]/).
  • \W everything that isn't a word character
  • \<number goes here> we'll cover these later!
  • \0 is a NUL character, which you shouldn't need to worry about.
  • there are a couple more, but we will cover those later.
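
A few of the entries in the list above can be sketched like this. The sample text is made up, and it only covers \b, \B, \w and \s.

```javascript
// Sample text: a space and a tab separate three "words".
const text = "cat concat\tcats_2";

// \bcat\b only matches cat when it stands alone as a whole word.
const wholeWords = text.match(/\bcat\b/g);
console.log(wholeWords.length); // 1

// \Bcat matches cat only when it does NOT start a word, as in concat.
const insideWord = "concat".match(/\Bcat/).index;
console.log(insideWord); // 3

// \w+ grabs runs of letters, digits and underscores.
const words = text.match(/\w+/g).join(",");
console.log(words); // cat,concat,cats_2

// \s matches both the space and the tab, so splitting on it gives the same pieces.
const pieces = text.split(/\s/).join(",");
console.log(pieces); // cat,concat,cats_2
```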