@belous
Created February 20, 2017 11:35
Regex Cheat Sheet
  • Any metacharacter can be escaped using a backslash, \. This turns it back into a literal. So the regular expression c\.t means "find a c, followed by a full stop, followed by a t".
  • The backslash is a metacharacter, which means that it too can be escaped using a backslash. So the regular expression c\\t means "find a c, followed by a backslash, followed by a t".
  • The regular expression c[aeiou]t means, "find a c followed by a vowel followed by a t". In a piece of text, this will find cat, cet, cit, cot and cut.
  • The regular expression [0123456789] means "find a digit".
  • The regular expression [a] means the same as a: "find an a".
  • The regular expression [ ] means "find a space".
  • \[a\] means "find a left square bracket followed by an a followed by a right square bracket".
  • [\[\]ab] means "find a left square bracket or a right square bracket or an a or a b".
  • [\\\[\]] means "find a backslash or a left square bracket or a right square bracket". (Urgh!)
  • [b-f] is the same as [bcdef] and means "find a b or a c or a d or an e or an f".
  • [A-Z] is the same as [ABCDEFGHIJKLMNOPQRSTUVWXYZ] and means "find an upper-case letter".
  • [1-9] is the same as [123456789] and means "find a non-zero digit".
  • [0-9.,] means "find a digit or a full stop or a comma".
  • [0-9a-fA-F] means "find a hexadecimal digit".
  • [a-zA-Z0-9\-] means "find an alphanumeric character or a hyphen".
  • [^a] means "find any character other than an a".
  • [^a-zA-Z0-9] means "find a non-alphanumeric character".
  • [\^abc] means "find a caret or an a or a b or a c".
  • [^\^] means "find any character other than a caret". (Ugh!)
  • The regular expression \d means the same as [0-9]: "find a digit". (To find a backslash followed by a d, use the regular expression \\d.)
  • \w means the same as [0-9A-Za-z_]: "find a word character".
  • \s means "find a whitespace character" (space, tab, carriage return or line feed; most engines also include form feed and vertical tab).
  • \D means [^0-9]: "find a non-digit".
  • \W means [^0-9A-Za-z_]: "find a non-word character".
  • \S means "find a non-space character".
  • The regular expression a{1} is the same as a and means "find an a".
  • a{3} means "find an a followed by an a followed by an a".
  • a{0} means "find the empty string". By itself, this appears to be useless. If you use this regular expression on any piece of text, you will immediately get a match, right at the point where you started searching. This remains true even if your text is the empty string!
  • a\{2\} means "find an a followed by a left brace followed by a 2 followed by a right brace".
  • Braces have no special meaning inside character classes. [{}] means "find a left brace or a right brace".
  • Multipliers have no memory. The regular expression [abc]{2} means "find a or b or c, followed by a or b or c". This is the same as "find aa or ab or ac or ba or bb or bc or ca or cb or cc". It does not mean "find aa or bb or cc"!
  • x{4,4} is the same as x{4}.
  • colou{0,1}r means "find colour or color".
  • a{3,5} means "find aaaaa or aaaa or aaa".
  • a{1,} means "find one or more a's in a row". Your multiplier will still be greedy, though. After finding the first a, it will try to find as many more a's as possible.
  • .{0,} means "find anything". No matter what your input text is - even the empty string - this regular expression will successfully match. In engines or modes where . also finds line breaks, it matches the entire text and returns it to you; in many engines . stops at a line break by default, so the match ends at the first line break.
  • ? means the same as {0,1}. For example, colou?r means "find colour or color".
  • * means the same as {0,}. For example, .* means "find anything", exactly as above.
  • + means the same as {1,}. For example, \w+ means "find a word". Here a "word" is a sequence of 1 or more "word characters", such as _var or AccountName1.
  • \?\*\+ means "find a question mark followed by an asterisk followed by a plus sign".
  • [?*+] means "find a question mark or an asterisk or a plus sign".
  • \d{4,5}? means "find \d\d\d\d or \d\d\d\d\d". This has exactly the same behaviour as \d{4}.
  • colou??r is colou{0,1}?r which means "find color or colour". This has the same behaviour as colou?r.
  • ".*?" means "find a double quote, followed by as few characters as possible, followed by a double quote". This, unlike the two examples above, is actually useful.
  • cat|dog means "find cat or dog".
  • red|blue| and red||blue and |red|blue all mean "find red or blue or the empty string".
  • a|b|c is the same as [abc].
  • cat|dog|\| means "find cat or dog or a pipe".
  • [cat|dog] means "find a or c or d or g or o or t or a pipe".
  • To find a day of the week, use (Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day.
  • (\w*)ility is the same as \w*ility. Both mean "find a word ending in ility". For why the first form might be useful, see later...
  • \(\) means "find a left parenthesis followed by a right parenthesis".
  • [()] means "find a left parenthesis or a right parenthesis".
  • (red|blue|) means "find red or blue or the empty string".
  • abc()def means the same as abcdef.
  • (red|blue)? means the same as (red|blue|).
  • \w+(\s+\w+)* means "find one or more words separated by whitespace".
  • The regular expression \b means "find a word boundary".
  • \b\w\w\w\b means "find a three-letter word".
  • a\ba means "find a, followed by a word boundary, followed by a". This regular expression will never successfully find a match, no matter what the input text.
  • Word boundaries are not characters. They have zero width. The following regular expressions are identical in behaviour:
    • (\bcat)\b
    • (\bcat\b)
    • \b(cat)\b
    • \b(cat\b)
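  • The regular expression ^ means "find a start-of-line". (In many engines this requires multi-line mode; without it, ^ matches only at the start of the text.)
  • The regular expression $ means "find an end-of-line". (Likewise, without multi-line mode many engines match $ only at the end of the text.)
  • ^$ means "find an empty line".
  • ^.*$ will find your entire text if . also finds line breaks, because a line break is then just another character. (In many engines . finds line breaks only in "dot-all" or "single-line" mode.) In that case, to find a single line, use a non-greedy multiplier, ^.*?$.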
  • \^\$ means "find a caret followed by a dollar sign".
  • [$] means "find a dollar sign". However, [^] is not a valid regular expression in most engines (JavaScript is an exception, where it finds any character). Remember that the caret has a different special meaning inside square brackets! To put a caret in a character class, use [\^].
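
Several of the rules above can be checked with a short script. The following is a minimal sketch that assumes Python's built-in re module; other engines differ in details such as whether . finds line breaks. The sample strings and variable names are illustrative only and do not come from the cheat sheet.

```python
import re

# Escaping: c\.t finds only a literal full stop between c and t.
escaped = re.findall(r"c\.t", "cat c.t cut")

# Character class: c, then one vowel, then t.
vowels = re.findall(r"c[aeiou]t", "cat cit cxt cut")

# Multipliers have no memory: [abc]{2} accepts "ab", not just "aa", "bb", "cc".
no_memory = re.fullmatch(r"[abc]{2}", "ab") is not None

# Greedy versus non-greedy multipliers.
text = 'say "hi" and "bye"'
greedy = re.findall(r'".*"', text)
lazy = re.findall(r'".*?"', text)

# Alternation inside a group; group(0) is the whole match, not just the group.
day_pattern = r"(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day"
days = [m.group(0) for m in re.finditer(day_pattern, "Monday, Funday, Sunday")]

# An empty alternative finds the empty string.
empty_ok = re.fullmatch(r"red|blue|", "") is not None

# Word boundaries have zero width; a\ba can never find anything.
three = re.findall(r"\b\w\w\w\b", "the quick fox ran")
never = re.search(r"a\ba", "aa a a")

# In Python, ^ and $ work per line only with re.MULTILINE,
# and . finds line breaks only with re.DOTALL.
two_lines = "one\ntwo"
per_line = re.findall(r"^.*$", two_lines, flags=re.MULTILINE)
whole = re.findall(r"^.*$", two_lines, flags=re.MULTILINE | re.DOTALL)
lazy_lines = re.findall(r"^.*?$", two_lines, flags=re.MULTILINE | re.DOTALL)

print(escaped, vowels, no_memory, greedy, lazy, days, empty_ok)
print(three, never, per_line, whole, lazy_lines)
```

Raw strings (the r"..." prefix) are used so that each backslash reaches the regex engine unchanged, rather than being interpreted first by Python's own string escaping.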