#JavaScript - Regular Expressions (Regex)
Regular expressions are a language for describing patterns in string data. They are available in many programming languages, including JavaScript.
Regular expressions are written between slashes (/) instead of quotes, e.g. /Hello/. Regular expressions are also objects in JavaScript and have a number of methods, including test, which returns true if the pattern is found in a string and false if it is not.
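As a minimal sketch of the test method described above (the variable names are only for illustration):

```javascript
// A regular expression literal is written between slashes.
var greeting = /Hello/;

// test returns true when the pattern occurs anywhere in the string.
var found = greeting.test("Hello world");    // true

// Matching is case sensitive by default, so a lowercase h does not match.
var notFound = greeting.test("hello world"); // false

// The literal creates an object of type RegExp.
var isRegExp = greeting instanceof RegExp;   // true
```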
###Search
search returns the index at which the match starts (like indexOf) if the pattern is found. Remember that the first position (i.e. the h in the example below) is 0. A result of -1 indicates that the pattern was not found.
"hello there".search(/th/);
//6
"hello there".search(/zz/);
//-1
If you want to search for something containing a slash, you need to escape it with a backslash, or it will be taken to be the end of the regular expression. For example, if we are looking for the text "a/c", we express it as /a\/c/. Characters that have a special meaning in a regular expression (e.g. {, *, . and ?) must be escaped in the same way to be matched literally.
var searchText = /a\/c/
"The following a/c's are in the list".search(searchText);
//14
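When the text to search for is only known at run time, the escaping can be done in code. The sketch below assumes a hypothetical helper, escapeForRegex, which is not a built-in function; it puts a backslash in front of every special character and then builds the pattern with the RegExp constructor.

```javascript
// Hypothetical helper (not part of JavaScript itself): escape every
// character that has a special meaning so that it is matched literally.
function escapeForRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

var escaped = escapeForRegex("a/c");      // the text a\/c
var searchText = new RegExp(escaped);
var position = "The following a/c's are in the list".search(searchText); // 14

// Without escaping, + means "one or more of the preceding 1".
var unescapedPlus = /1+1/.test("1+1=2");                           // false
var escapedPlus = new RegExp(escapeForRegex("1+1")).test("1+1=2"); // true
```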
###Sets of Characters
If we want to match any one of several characters, we can enclose them in square brackets ([]). For example, /[JS]/ matches either J or S.
var searchText = /[JS]/
"The person called John is also called Smith".search(searchText);
//18, the first instance, i.e. the J in John
You can also use a dot (.) to match any character that is not a line break. An escaped d (\d) matches any digit. An escaped w (\w) matches any word character (i.e. a letter, digit or underscore). An escaped s (\s) matches any whitespace character (e.g. a space, tab or newline).
You can replace \d, \w and \s with their capitals to negate their meanings. For example, \S matches any character which is not a whitespace character. You can invert a set of characters by putting ^ immediately after the opening [. For example:
var searchText = /[^ABC]/
"ABCBACCBBADABC".search(searchText);
//10 because D is the first character not to be either A,B or C
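The character classes described above can be tried out in the same way. This is a small sketch; the sample string and variable names are made up for illustration.

```javascript
var sample = "Room 42, floor 7";

var firstDigit = sample.search(/\d/);    // 5, the 4 in 42
var firstSpace = sample.search(/\s/);    // 4, the space after Room
var firstNonWord = sample.search(/\W/);  // 4, a space is not a word character
var firstNonDigit = sample.search(/\D/); // 0, R is not a digit
var firstLower = sample.search(/[a-z]/); // 1, the first lowercase letter
var dots = /R..m/.test(sample);          // true, each dot matches one character
```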
###Word and String Boundaries
The character ^ matches the start of the string and $ matches the end.
/a/.test("blah");
//true
/^a$/.test("blah");
//false because a would have to be at the start and end of the string.
/^a$/.test("a");
//true
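The \b escape denotes a word boundary: a position with a word character (letter, digit or underscore) on one side and, on the other, a non-word character such as punctuation or whitespace, or the start or end of the string.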
/\bdog\b/.test("Our dog is the best doggy around");
//true
/\bdog\b/.test("Our doggy is the best doggy around");
//false because there has to be a word boundary on each side of the word dog
###Repeating Patterns
It is also possible to match repeating patterns by putting an asterisk (*) after an element, which lets it occur zero or more times. The plus sign (+) is similar but requires the element to occur at least once. The question mark (?) means that the element can appear zero times or once, which makes it optional.
var parentheticalText = /\(.*\)/;
"Its (the sloth's) claws were gigantic!".search(parentheticalText);
//4 because it accepts any number of characters between two parentheses
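The difference between the three operators is easiest to see side by side. The patterns below are illustrative examples, not taken from any particular source.

```javascript
// * allows zero or more repetitions, so "ct" (no a at all) still matches.
var starNone = /ca*t/.test("ct");       // true

// + needs at least one repetition, so "ct" fails but "caaat" matches.
var plusNone = /ca+t/.test("ct");       // false
var plusMany = /ca+t/.test("caaat");    // true

// ? makes the u optional, so both spellings match.
var british = /colou?r/.test("colour"); // true
var american = /colou?r/.test("color"); // true
```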
Remember that .* will match any number of characters (other than line breaks). You can also use curly braces to specify an exact number of repetitions, e.g. .{5} will match any five characters in a row.
/.{5}/.test("doggy");
//true
If two numbers are specified, e.g. {3,5}, then the first number is the minimum number of times the element must occur and the second number is the maximum. Similarly, {3,} means that it must occur three or more times. JavaScript has no {,5} shorthand (it is treated as literal text), so write {0,5} to allow up to five occurrences.
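A short sketch of counted repetition follows. The patterns are anchored with ^ and $ so that the whole string has to fit the count; the names are illustrative.

```javascript
var threeToFive = /^\d{3,5}$/;
var tooShort = threeToFive.test("12");     // false, only two digits
var inRange = threeToFive.test("1234");    // true
var tooLong = threeToFive.test("123456");  // false, six digits

var atLeastThree = /^\d{3,}$/.test("123456"); // true, no upper limit

// An upper limit alone is written with an explicit minimum of 0.
var upToFive = /^\d{0,5}$/.test("123");       // true

// In JavaScript (without the u flag) {,5} is not a quantifier;
// the braces and comma are matched as literal text.
var literalBraces = /^a{,5}$/.test("a{,5}");  // true
var notARange = /^a{,5}$/.test("aaa");        // false
```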
###Subexpressions
It is possible to apply operators like * and + to more than one character at a time by grouping the characters in parentheses; the operator then applies to the whole group. Note: you can add options such as i (case insensitive) after the closing slash of the regular expression.
var cartoonCrying = /boo+(hoo+)+/i;
cartoonCrying.test("Boohoooohoohooo");
//true
cartoonCrying.test("BOOooooHOoHooo");
//true because the i indicates that the pattern is case insensitive
###Or
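There are times when you will want to match one of a fixed set of alternatives. The pipe character (|) separates the alternatives, and parentheses limit how much of the pattern they cover. Note that the dots are escaped so that they only match a literal full stop.
var personName = /\b(Mr\.|Mrs\.|Ms\.) (Smith|Doe)\b/i;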
personName.test("Ms. Doe");
//true
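The effect of the parentheses on alternation can be seen in a small sketch like the following (the patterns and names are illustrative):

```javascript
// The group limits the alternatives to the animal name only.
var animalCount = /\b\d+ (pig|cow|chicken)s?\b/;
var plural = animalCount.test("15 pigs");          // true
var runOn = animalCount.test("15 pigchickens");    // false

// Without a group, | splits the whole pattern: "^cat" or "dog$".
var ungrouped = /^cat|dog$/.test("catalog");       // true, it starts with cat
// With a group, the anchors apply to both alternatives.
var grouped = /^(cat|dog)$/.test("catalog");       // false
```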
###Matching
If match finds the pattern, it returns an array whose first element is the matched text. If it cannot find the pattern, it returns null.
"foo".match(/bar/i);
//null - No match
"foobar".match(/bar/i);
//["bar"] - Match found
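The value returned by match carries a little more than the matched text. The sketch below shows what can be read from it; the variable names are illustrative.

```javascript
var result = "foobar".match(/(b)(ar)/i);

var whole = result[0];        // "bar", the full matched text
var start = result.index;     // 3, where the match begins
var firstGroup = result[1];   // "b", the first parenthesised group
var secondGroup = result[2];  // "ar", the second parenthesised group

// Because a failed match returns null, check before reading from it.
var none = "foo".match(/bar/i);
var safeText = none ? none[0] : "no match";

// With the g option, match returns every matched text instead.
var every = "Foobar is the best bar in town.".match(/bar/g); // ["bar", "bar"]
```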
###Replace
You can also use replace to replace the matched text in the string with another string. Note: the g option means global; it replaces all instances of the pattern and not just the first.
"Foobar is the best bar in town.".replace(/bar/g, "buzz");
//"Foobuzz is the best buzz in town."
"Foobar is the best bar in town.".replace(/bar/, "buzz");
//"Foobuzz is the best bar in town."
###Summary
Options: i case insensitive, g global (all matches, not just the first), m multiline (^ and $ also match at line breaks)

| Pattern | Meaning | Pattern | Meaning |
|---|---|---|---|
| [abc] | A single character of: a, b or c | . | Any single character except a line break |
| [^abc] | Any single character except: a, b, or c | \s | Any whitespace character |
| [a-z] | Any single character in the range a-z | \S | Any non-whitespace character |
| [a-zA-Z] | Any single character in the range a-z or A-Z | \d | Any digit |
| ^ | Start of string (start of line with the m option) | \D | Any non-digit |
| $ | End of string (end of line with the m option) | \w | Any word character (letter, number, underscore) |
| \A | Start of string in some other regex flavors; not supported in JavaScript, use ^ | \W | Any non-word character |
| \z | End of string in some other regex flavors; not supported in JavaScript, use $ | \b | Any word boundary |
| (...) | Capture everything enclosed | (a\|b) | a or b |
| a? | Zero or one of a | a* | Zero or more of a |
| a+ | One or more of a | a{3} | Exactly 3 of a |
| a{3,} | 3 or more of a | a{3,6} | Between 3 and 6 of a |