- Introduction
- Modes
- Literal characters
- Metacharacters
- Backreferences
- Special Characters
- Useful Expressions
Regular expressions are sequences of symbols that represent a text pattern. They are used for matching, searching and replacing text.
The goal when writing a regular expression is to match everything you want, and only what you want!
- Standard: /re/
- Global: /re/g
- Case-insensitive: /re/i
- Multiline anchors: /re/m
- Dot-matches-all: /re/s
Modes are written after the closing / of the regular expression, and they can be combined.
For example, using both global and case-insensitive modes: /re/gi
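The modes map onto flags in most programming languages. This is a minimal sketch in Python's `re` module, which is an assumption since the notes themselves are engine-neutral. Python has no `/re/flags` syntax: `i`, `m` and `s` become `re.IGNORECASE`, `re.MULTILINE` and `re.DOTALL`, and "global" corresponds roughly to calling `re.findall` instead of `re.search`.

```python
import re

text = "Car\ncar"

# Standard (non-global): re.search stops at the first match.
first = re.search(r"car", text)

# Global (g): re.findall returns every non-overlapping match.
all_matches = re.findall(r"car", text)

# Case-insensitive (i); together with findall this behaves like /car/gi.
all_ci = re.findall(r"car", text, flags=re.IGNORECASE)

# Multiline anchors (m): ^ matches at the start of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Dot-matches-all (s): . also matches the newline character.
across = re.search(r"Car.car", text, flags=re.DOTALL)

print(first.group(), all_matches, all_ci, line_starts, repr(across.group()))
```

Flags can be combined with `|`, for example `re.IGNORECASE | re.MULTILINE`.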
/car/
- matches "car"
/car/
- matches the first three letters of "carnival"
- Case-sensitive by default (best practice). For example, /car/ doesn't match anything in "Carnival".
- Standard (non-global) matching: the earliest (leftmost) match is always preferred.
Word: "pazzazz"
/zz/
- matches only the first "zz" (the one in "pazz")
/zz/g
- matches both "zz" occurrences (in "pazz" and in "azz")
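The leftmost-match rule and the effect of the global mode can be checked on the "pazzazz" example. This is a sketch using Python's `re`, which is an assumption, with `finditer` standing in for the global mode. It also confirms that literal matching is case-sensitive.

```python
import re

word = "pazzazz"

# Standard matching stops at the earliest (leftmost) match.
first_span = re.search(r"zz", word).span()
print(first_span)  # (2, 4)

# Global-style matching: finditer walks through every match, left to right.
all_spans = [m.span() for m in re.finditer(r"zz", word)]
print(all_spans)  # [(2, 4), (5, 7)]

# Literal characters are case-sensitive by default.
lower = re.search(r"car", "carnival")
upper = re.search(r"car", "Carnival")
print(lower.group(), upper)  # car None
```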
There are only a few metacharacters to learn:
\ . * + - { } [ ] ^ $ | ? ( ) : ! =
.
- Any character except newline
Examples:
/h.t/
- matches "hot", "hat" and "hit", but not "heat"
/.a.a.a/
- matches "banana", "#aga!a" and " a asa"
A common mistake:
/9.00/
- matches "9.00", "9500" and "9-00"
\
- Escapes the next character, so a metacharacter is matched literally
Note that literal characters shouldn't be escaped
Examples:
/9\.00/
- matches "9.00" but not "9500" or "9-00"
/\/home\/usr\/doc\.txt/
- matches "/home/usr/doc.txt"
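The "9.00" pitfall is easier to see side by side. This sketch uses Python's `re` as an assumed engine and runs the unescaped and escaped patterns over the same inputs. Python patterns are not wrapped in slashes, so escaping `/` is harmless but optional there. `re.escape` is Python's helper for escaping a literal string.

```python
import re

# "." matches any single character except a newline.
dot_hits = [w for w in ["hot", "hat", "hit", "heat"] if re.search(r"h.t", w)]
print(dot_hits)  # ['hot', 'hat', 'hit']

prices = ["9.00", "9500", "9-00"]
# Common mistake: an unescaped "." matches any character at that position.
loose = [p for p in prices if re.search(r"9.00", p)]
# Escaped: "\." matches only a literal period.
strict = [p for p in prices if re.search(r"9\.00", p)]
print(loose, strict)  # ['9.00', '9500', '9-00'] ['9.00']

# "\/" is accepted in Python but not required; re.escape() escapes
# the metacharacters of a literal string for you.
path = "/home/usr/doc.txt"
print(bool(re.fullmatch(r"\/home\/usr\/doc\.txt", path)))  # True
print(bool(re.fullmatch(re.escape(path), "/home/usr/docXtxt")))  # False
```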
[ and ]
- Define a character set (begin and end). The set matches only one character.
Order of characters does not matter
Note: Metacharacters don't need to be escaped inside a character set, because they are already literal there (except ], -, ^ and \).
Examples:
/gr[ea]y/
- matches both "grey" and "gray"
/gr[ea]t/
- doesn't match "great" (the set matches only one character)
/h[abc.xyz]t/
- matches "hat" and "h.t"; the . is already literal inside the set.
/var[[(][0-9][)\]]/
- matches "var(3)" and "var[4]"
/file[0\-\\_]1/
- matches "file01", "file-1", "file\1" and "file_1"
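The character-set examples above can be run directly. This sketch uses Python's `re` as an assumed engine. There is one Python-specific change: Python emits a `FutureWarning` for a set that begins with an unescaped `[`, so the `var` pattern escapes it.

```python
import re

grey_gray = [w for w in ["grey", "gray", "great"] if re.fullmatch(r"gr[ea]y", w)]
print(grey_gray)  # ['grey', 'gray']

# [ea] consumes exactly one character, so "great" ("ea") is not matched.
great = re.search(r"gr[ea]t", "great")
print(great)  # None

# Inside a set, "." is an ordinary character.
dotted = [w for w in ["hat", "h.t", "hut"] if re.search(r"h[abc.xyz]t", w)]
print(dotted)  # ['hat', 'h.t']

# Python-specific: the leading "[" inside the set is escaped to avoid a FutureWarning.
var_pattern = r"var[\[(][0-9][)\]]"
var_hits = [w for w in ["var(3)", "var[4]", "var{5}"] if re.search(var_pattern, w)]
print(var_hits)  # ['var(3)', 'var[4]']

# "-" and "\" are escaped inside the set to make them literal.
files = ["file01", "file-1", "file\\1", "file_1", "file21"]
file_hits = [f for f in files if re.search(r"file[0\-\\_]1", f)]
print(file_hits)
```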
Shorthand character sets:
\d
- all digits (same as [0-9])
\w
- word character (same as [a-zA-Z0-9_])
\s
- whitespace (same as [ \t\r\n]; most engines also include \f and \v)
\D
- not a digit (same as [^0-9])
\W
- not a word character (same as [^a-zA-Z0-9_])
\S
- not whitespace (same as [^ \t\r\n])
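A quick check of the shorthand sets, as a sketch in Python's `re` (an assumed engine). One caveat applies to Python 3: without the `re.ASCII` flag, `\d`, `\w` and `\s` also match Unicode digits, letters and spaces, so the flag is passed here to stay close to the ASCII definitions above.

```python
import re

text = "Order 42\tshipped_today!"

# re.ASCII restricts \d, \w and \s to their ASCII meanings.
digits = re.findall(r"\d", text, flags=re.ASCII)
words = re.findall(r"\w+", text, flags=re.ASCII)
spaces = re.findall(r"\s", text, flags=re.ASCII)
non_word = re.findall(r"\W", text, flags=re.ASCII)
print(digits, words, spaces, non_word)

# The uppercase versions are the negations.
non_digits = "".join(re.findall(r"\D", "a1b2c3"))
non_space = "".join(re.findall(r"\S", "a b\tc"))
print(non_digits, non_space)  # abc abc
```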
-
- Range of characters: represents all characters between two characters
Only works inside a character set
Examples:
/[0-9]/
- matches any digit
/[A-Za-z]/
- matches any letter
/[a-ek-ou-y]/
- matches any letter in the ranges a-e, k-o and u-y
Caution:
/[50-99]/
- is not all numbers from 50 to 99; it is the set of 5, 0-9 and 9, so it matches a single digit
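The [50-99] caution shows up as soon as the pattern is run. This is a sketch with Python's `re` as an assumed engine. The `[5-9][0-9]` pattern at the end is a hypothetical illustration of matching 50-99 one digit position at a time; it is not part of the original notes.

```python
import re

# [a-ek-ou-y] is three ranges in one set.
in_range = bool(re.fullmatch(r"[a-ek-ou-y]", "m"))   # m is in k-o
out_range = bool(re.fullmatch(r"[a-ek-ou-y]", "z"))
print(in_range, out_range)  # True False

# [50-99] means "5, 0 through 9, or 9": one digit, not the numbers 50-99.
single_digits = re.findall(r"[50-99]", "73")
print(single_digits)  # ['7', '3']

# Hypothetical alternative: a tens digit 5-9 followed by any digit.
candidates = ["49", "50", "75", "99", "100"]
fifty_to_99 = [n for n in candidates if re.fullmatch(r"[5-9][0-9]", n)]
print(fifty_to_99)  # ['50', '75', '99']
```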
^
- Negates a character set when it is the first character inside the set
Still represents only one character
Examples:
/see[^mn]/
- matches "seek" and "sees" but not "seem" or "seen"
Caution:
/see[^mn]/
- matches "see " but not "see" at the end of a string, because the negated set still has to match one character
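The caution about negated sets is easy to confirm. In this sketch, which uses Python's `re` as an assumed engine, `see[^mn]` still needs a character after "see", so it fails when the string ends there.

```python
import re

words = ["seek", "sees", "seem", "seen"]
negated = [w for w in words if re.search(r"see[^mn]", w)]
print(negated)  # ['seek', 'sees']

# The negated set still consumes one character:
with_space = re.search(r"see[^mn]", "see ")
at_end = re.search(r"see[^mn]", "see")
print(repr(with_space.group()), at_end)  # 'see ' None
```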
*
- Preceding item zero or more times
Examples:
/apples*/
- matches "apple", "apples" and "applessss"
/\d\d\d\d*/
- matches numbers with three or more digits
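A sketch of the `*` examples in Python's `re` (an assumed engine). `fullmatch` tests the whole string, while `findall` picks every run of three or more digits out of a longer text.

```python
import re

forms = ["apple", "apples", "applessss"]
star = [w for w in forms if re.fullmatch(r"apples*", w)]
print(star)  # ['apple', 'apples', 'applessss']

# \d\d\d\d* : three digits, then zero or more further digits.
long_numbers = re.findall(r"\d\d\d\d*", "7 42 123 2024")
print(long_numbers)  # ['123', '2024']
```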
+
- Preceding item one or more times
Examples:
/apples+/
- matches "apples" and "applessss", but not "apple"
/<[^>]+>/
- matches any HTML tag
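The `+` examples, sketched in Python's `re` (an assumed engine). The tag pattern works because `[^>]+` stops at the first `>`, so each tag is matched separately.

```python
import re

forms = ["apple", "apples", "applessss"]
plus = [w for w in forms if re.fullmatch(r"apples+", w)]
print(plus)  # ['apples', 'applessss']

# <[^>]+> : "<", one or more characters that are not ">", then ">".
tags = re.findall(r"<[^>]+>", '<p class="x">Hi</p>')
print(tags)  # ['<p class="x">', '</p>']
```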
?
- Preceding item zero or one time
Examples:
/apples?/
- matches "apple" and "apples", but not the whole of "applessss"
/colou?r/
- matches "color" and "colour"
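A sketch of `?` in Python's `re` (an assumed engine). It also shows why the wording is "not the whole of": a search anywhere in "applessss" still finds "apples" as a prefix.

```python
import re

forms = ["apple", "apples", "applessss"]
optional = [w for w in forms if re.fullmatch(r"apples?", w)]
print(optional)  # ['apple', 'apples']

# re.search looks anywhere in the string, so it still finds "apples" inside "applessss".
partial = re.search(r"apples?", "applessss").group()
print(partial)  # apples

spellings = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]
print(spellings)  # ['color', 'colour']
```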
{ and }
- Start and end a quantified repetition of the preceding item
Syntax: {min,max}
- min and max are non-negative integers. Min must always be included (it can be zero); max is optional.
Examples:
/\d{4,8}/
- matches numbers with four to eight digits
/\d{4}/
- matches numbers with exactly four digits
/\d{4,}/
- matches numbers with four or more digits (no maximum)
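The three brace forms, sketched with Python's `re` (an assumed engine). `fullmatch` is used so that each string is tested as a whole number rather than as a substring.

```python
import re

numbers = ["123", "1234", "12345678", "123456789"]

four_to_eight = [n for n in numbers if re.fullmatch(r"\d{4,8}", n)]
exactly_four = [n for n in numbers if re.fullmatch(r"\d{4}", n)]
four_or_more = [n for n in numbers if re.fullmatch(r"\d{4,}", n)]
print(four_to_eight)  # ['1234', '12345678']
print(exactly_four)   # ['1234']
print(four_or_more)   # ['1234', '12345678', '123456789']
```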
( and )
- Grouping metacharacters
Group part of an expression so that a quantifier or | applies to the whole group; groups also make expressions easier to read. Inside a character set, parentheses are literal characters.
Examples:
/(abc)+/
- matches "abc" and "abcabcabc"
/(in)?dependent/
- matches "independent" and "dependent"
/run(s)?/
- is the same as /runs?/
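A sketch of the grouping examples in Python's `re` (an assumed engine). It includes a check that `run(s)?` and `runs?` accept the same strings, and shows that parentheses inside a set are just characters.

```python
import re

repeated = re.fullmatch(r"(abc)+", "abcabcabc").group()
print(repeated)  # abcabcabc

prefixed = [w for w in ["independent", "dependent"] if re.fullmatch(r"(in)?dependent", w)]
print(prefixed)  # ['independent', 'dependent']

# A group around a single character behaves like the bare character.
same = all(
    bool(re.fullmatch(r"run(s)?", w)) == bool(re.fullmatch(r"runs?", w))
    for w in ["run", "runs", "runss"]
)
print(same)  # True

# Inside a character set, parentheses are ordinary characters.
parens = re.findall(r"[()]", "f(x)")
print(parens)  # ['(', ')']
```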
|
- Match the previous or the next expression (alternation)
Examples:
/apple|orange/
- matches "apple" and "orange"
/w(ei|ie)rd/
- matches "weird" and "wierd"
/(AA|BB|CC){6}/
- matches "AABBAACCAABB" and more
/(\d\d|[A-Z][A-Z]){3}/
- matches "112233", "AA66ZZ", "11AA44" and more
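The alternation examples, sketched in Python's `re` (an assumed engine). A quantifier after a group of alternatives, as in `{6}`, lets each repetition pick any of the alternatives independently.

```python
import re

fruits = re.findall(r"apple|orange", "apple, pear, orange")
print(fruits)  # ['apple', 'orange']

spellings = [w for w in ["weird", "wierd", "werd"] if re.fullmatch(r"w(ei|ie)rd", w)]
print(spellings)  # ['weird', 'wierd']

# {6} applies to the whole group, so any six pairs are accepted.
six_pairs = bool(re.fullmatch(r"(AA|BB|CC){6}", "AABBAACCAABB"))
print(six_pairs)  # True

codes = ["112233", "AA66ZZ", "11AA44", "1A2B3C"]
mixed = [c for c in codes if re.fullmatch(r"(\d\d|[A-Z][A-Z]){3}", c)]
print(mixed)  # ['112233', 'AA66ZZ', '11AA44']
```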
Anchor metacharacters:
Anchors refer to a position, not an actual character. They are zero-width.
^
- Start of string, or start of line in multiline (m) mode (not the same as ^ at the start of a character set)
$
- End of string, or end of line in multiline (m) mode
Examples:
/^apple/
- matches "apple" only if it is at the beginning of a string/line
/apple$/
- matches "apple" only if it is at the end of a string/line
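A sketch of the anchors in Python's `re` (an assumed engine). It shows where `^` and `$` match with and without `re.MULTILINE`, and that anchors are zero-width: substituting at `^` inserts text without removing any.

```python
import re

text = "apple\napple"

# Without re.MULTILINE, ^ and $ only match at the start and end of the string.
starts = [m.start() for m in re.finditer(r"^apple", text)]
ends = [m.start() for m in re.finditer(r"apple$", text)]
print(starts, ends)  # [0] [6]

# With re.MULTILINE (the m mode), ^ also matches after every newline.
starts_m = [m.start() for m in re.finditer(r"^apple", text, flags=re.MULTILINE)]
print(starts_m)  # [0, 6]

# Zero-width: "^" matches a position, so sub() only inserts.
quoted = re.sub(r"^", "> ", text, flags=re.MULTILINE)
print(quoted)
```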
Parentheses store (capture) the portion of the text matched inside them.
/a(p{2}l)e/
matches "apple" and stores "ppl". Capturing happens automatically by default.
Refer to the first captured group with \1.
\1 through \9
- backreferences for groups 1 to 9
Usage:
- Can be used in the same expression as the group.
- Can be accessed after the match is complete (programming language needed).
Examples:
/(apples) to \1/
- matches "apples to apples"
/(ab)(cd)(ef)\3\2\1/
- matches "abcdefefcdab"
/<(i|em)>.+?<\/\1>/
- matches "<i>Hello</i>" and "<em>Hello</em>"
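Both uses of captured groups listed above can be sketched in Python's `re` (an assumed engine): `\1` inside the same pattern, and `.group(n)` after the match. Python patterns are not slash-delimited, so the closing tag needs no `\/`.

```python
import re

# Group 1 captures "ppl"; the captured text is available after the match.
captured = re.search(r"a(p{2}l)e", "apple").group(1)
print(captured)  # ppl

# \1 must repeat exactly what group 1 captured.
same = bool(re.fullmatch(r"(apples) to \1", "apples to apples"))
different = bool(re.fullmatch(r"(apples) to \1", "apples to oranges"))
reversed_pairs = bool(re.fullmatch(r"(ab)(cd)(ef)\3\2\1", "abcdefefcdab"))
print(same, different, reversed_pairs)  # True False True

# The closing tag must repeat the tag captured in the opening one.
html = "<i>Hello</i> <em>Hello</em> <i>Bad</em>"
tags = [m.group(0) for m in re.finditer(r"<(i|em)>.+?</\1>", html)]
print(tags)  # ['<i>Hello</i>', '<em>Hello</em>']
```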
- Spaces: a space is a regular (literal) character
- Tabs: matched by \t
- Line breaks: \r, \n or \r\n, depending on the file's line-ending convention
- Non-printable characters:
- bell: \a
- escape: \e
(\a and \e work in Perl and PCRE, but not in every engine)
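A sketch of the special characters in Python's `re` (an assumed engine). `\r?\n` is one common way to accept both Windows and Unix line endings. The last part shows the engine difference: Python accepts `\a` but rejects `\e` with `re.error`.

```python
import re

data = "name\tage\r\nann\t31\nbob\t27"

# \r?\n accepts both Windows (\r\n) and Unix (\n) line endings.
lines = re.split(r"\r?\n", data)
print(lines)

# \t matches a tab character.
rows = [re.split(r"\t", line) for line in lines]
print(rows)  # [['name', 'age'], ['ann', '31'], ['bob', '27']]

# Python's re accepts \a (bell)...
bell = bool(re.search(r"\a", "ding\x07"))
# ...but rejects \e.
try:
    re.compile(r"\e")
    escape_supported = True
except re.error:
    escape_supported = False
print(bell, escape_supported)  # True False
```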
- Names:
/^\w+/
- not a very good solution
/^[A-Z][a-z.']+ [A-Z][a-z.']+/
- matches a capitalized first and last name
- Email addresses:
/^[\w.\-]+@[\w.\-]+\.[A-Za-z]{2,3}$/
- matches an email address whose top-level domain has two or three letters
- URLs:
/^(http|https):\/\/[\w.\-]+(\.[\w\-]+)+[/#?]?.*$/
- IPs:
/^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/m
- It is long, but it ensures that no number is higher than 255.
- HTML tags:
/<([^>]+)>(.*?)<\/\1>/
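Several of the expressions above can be tried out directly. This is a sketch in Python's `re` (an assumed engine, used without the `/m` mode since each test string is a single line). The IPv4 pattern is built from one repeated octet group, which is the same regex written more compactly. The sample addresses are illustrative only.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (optionally with a leading 0 or 1).
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = re.compile(rf"^{OCTET}\.{OCTET}\.{OCTET}\.{OCTET}$")
ips = ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3"]
ip_results = {ip: bool(IPV4.match(ip)) for ip in ips}
print(ip_results)

EMAIL = re.compile(r"^[\w.\-]+@[\w.\-]+\.[A-Za-z]{2,3}$")
short_tld = bool(EMAIL.match("jane.doe@example.com"))
long_tld = bool(EMAIL.match("jane@example.info"))  # "info" has four letters
print(short_tld, long_tld)  # True False

# HTML tags: group 1 is the tag, group 2 the content between the tags.
tag = re.search(r"<([^>]+)>(.*?)</\1>", "<b>bold</b>")
print(tag.group(1), tag.group(2))  # b bold
```

Because group 1 captures everything between `<` and `>`, a tag with attributes (such as `<b class="x">`) would require `</b class="x">` to close it, so this pattern only suits attribute-free tags.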