@regexyl
Last active February 25, 2022 13:43

Regex Cheatsheet

The JavaScript version.

Frequent Examples

Search for: [1]

  1. "/example/" followed by one or more letters, case-insensitive: /\/example\/[a-z]+/i
  2. Switch words in a string
let re = /(\w+)\s(\w+)/;
let str = 'John Smith';
let newstr = str.replace(re, '$2, $1');
console.log(newstr);  // Smith, John
  3. Using an inline function that modifies the matched characters
function styleHyphenFormat(propertyName) {
  function upperToHyphenLower(match, offset, string) {
    return (offset > 0 ? '-' : '') + match.toLowerCase();
  }
  return propertyName.replace(/[A-Z]/g, upperToHyphenLower);
}
console.log(styleHyphenFormat('borderTop')) // border-top
  4. Converting Fahrenheit to Celsius
function f2c(x) {
  // p1 is the first capture group: the number in front of "F"
  function convert(match, p1) {
    return ((p1 - 32) * 5 / 9) + 'C';
  }
  const s = String(x);
  const test = /(-?\d+(?:\.\d*)?)F\b/g; // (?:...) is a non-capturing group
  return s.replace(test, convert);
}
console.log(f2c('212F')); // 100C
  5. Escaping regex special characters with $& (the matched substring)
const regexChars = /[\\^$.*+?()[\]{}|]/g;
const str = 'as[b*';
console.log(str.replace(regexChars, '\\$&')); // as\[b\*
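A common use of the last example is escaping user-supplied text before handing it to the RegExp constructor. This is a minimal sketch, assuming a hypothetical helper named escapeRegExp; the helper name and the sample strings are illustrative and not part of the original examples.

```javascript
// Hypothetical helper: escape every character that has a special meaning in a regex.
// The character class is the same one used in the example above.
function escapeRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

const userInput = 'as[b*';

// Build a regex that matches the user input literally.
const literal = new RegExp(escapeRegExp(userInput), 'g');
const replaced = 'xas[b*y as[b*'.replace(literal, '#');

console.log(escapeRegExp(userInput)); // as\[b\*
console.log(replaced);                // x#y #
```

Without the escaping step, new RegExp('as[b*') throws a SyntaxError, because the [ opens a character class that is never closed.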

Possible Trip-Ups

\b and \B: Matching [non-]word boundaries

A word boundary (\b) is a zero-width assertion that matches:

  • Between a word character (\w) and a non-word character (\W) or
  • Between a word character and the start or end of the string.

\B is the inverse of \b and is also zero-width. It matches:

  • Between two word characters.
  • Between two non-word characters.
  • Between a non-word character and the start or end of the string.
  • The empty string.

Finding a non-word boundary? Find the word boundaries first; every remaining position in the string is a non-word boundary.
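The rules above can be checked by inserting a marker at every position where each assertion matches. This is a small sketch; the markPositions helper and the sample string are made up for illustration.

```javascript
// Hypothetical helper: put a "|" at every position where the zero-width pattern matches.
function markPositions(text, pattern) {
  return text.replace(pattern, '|');
}

// Word boundaries: start/end of each word.
console.log(markPositions('hi, yo', /\b/g)); // |hi|, |yo|

// Non-word boundaries: every other position.
console.log(markPositions('hi, yo', /\B/g)); // h|i,| y|o

// The empty string contains one non-word boundary and no word boundary.
console.log(markPositions('', /\B/g)); // |
console.log(markPositions('', /\b/g)); // (empty line)
```

Together the two patterns mark all seven positions of 'hi, yo' exactly once, which is the "remove the word boundaries and keep the rest" rule in action.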

[2]

Syntax

  • Metacharacters
    • .: Any one character except line terminators (\n, \r, \u2028, \u2029), unless the s (dotAll) flag is set.
    • \d, \D: Any one digit/non-digit character (where digits are [0-9]).
    • \w, \W: Any one word/non-word character. For ASCII, word characters are [a-zA-Z0-9_].
    • \s, \S: Any one whitespace/non-whitespace character. For ASCII, whitespace characters are [ \n\r\t\f\v]; in JS, \s also matches Unicode spaces such as \u00a0.
  • Occurrence Indicators
    • +: One or more, e.g. [0-9]+ matches 1 or more digits, such as "123", "0000".
    • *: Zero or more (accepts the above + empty strings).
    • ?: Zero or one (optional), e.g., [+-]? matches an optional "+", "-", or an empty string.
    • {}
      • {m,n}: m to n (both inclusive).
      • {m}: Exactly m times.
      • {m,}: m or more times (m+).
  • Position Anchors
    • ^: Start of input (start of each line with the m flag), e.g. ^[0-9]+$ matches a string made up only of digits.
    • $: End of input (end of each line with the m flag).
    • \b: Boundary of word, i.e., start-of-word or end-of-word. E.g., \bcat\b matches the word "cat" in the input string.
    • \B: Inverse of \b, i.e. non-start-of-word or non-end-of-word.
  • Parenthesized Back References (Capture Group)
    • (): Creates a capture group for extracting a substring or using a back reference.
    • In JS, use $1, $2, ... in the replacement string and \1, \2, ... inside the pattern itself to refer to the captured groups in sequential order. (Java and Perl also use $1 in replacements; Python uses \1.)
    • (?:...): A non-capturing group; groups the subpattern without capturing it, so it is omitted from the resulting list of captures. [3]
  • Character Class (or Bracket List)
    • []
      • [...]: Accepts any one of the characters within the bracket.
      • [.-.]: Accepts any one of the characters in the range, e.g. [0-9], [A-Za-z].
      • [^...]: Accepts any one character that is not in the bracket, e.g. [^0-9] matches any non-digit.
    • Only ^, -, ], and \ need escaping inside the bracket list.
  • |: OR operator, e.g. four|4 accepts "four" or "4".
  • \: Escape sequence to accept a char with special meaning in regex.
    • JavaScript regex recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage return, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code unit, and \u{h...} (with the u flag) for a full Unicode code point. \nnn (an octal number of up to 3 digits) is a legacy escape that is rejected when the u flag is set.
  • Laziness
    • *?, +?, ??, {m,n}?, {m,}?: Makes the repetition operator lazy, so it matches as few characters as possible instead of as many as possible.
  • Capturing matched pattern
    • $&: In a replacement string, represents the entire matched substring.

[4]
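A few entries from the syntax list are easier to see in action. The sketch below covers greedy versus lazy repetition, back references, capturing versus non-capturing groups in split(), and the effect of the m flag on ^ and $. All sample strings and variable names are made up for illustration.

```javascript
// Greedy vs. lazy repetition.
const html = '<b>one</b><b>two</b>';
const greedy = html.match(/<b>.*<\/b>/)[0];  // '<b>one</b><b>two</b>'
const lazy = html.match(/<b>.*?<\/b>/)[0];   // '<b>one</b>'

// Back reference inside the pattern (\1) vs. in the replacement string ($1).
const hasRepeat = /(\w+) \1/.test('hello hello');              // true
const collapsed = 'hello hello'.replace(/(\w+) \1/, '$1');     // 'hello'

// Captured groups show up in split() results; non-capturing groups do not.
const withCapture = 'a1b2c'.split(/(\d)/);      // ['a', '1', 'b', '2', 'c']
const withoutCapture = 'a1b2c'.split(/(?:\d)/); // ['a', 'b', 'c']

// ^ and $ anchor to the whole input unless the m (multiline) flag is set.
const lines = '12\nab\n34';
const wholeInput = lines.match(/^[0-9]+$/g);  // null
const perLine = lines.match(/^[0-9]+$/gm);    // ['12', '34']

console.log(greedy, lazy, hasRepeat, collapsed);
console.log(withCapture, withoutCapture, wholeInput, perLine);
```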

Awesome Resources

Footnotes

  1. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#examples

  2. https://stackoverflow.com/questions/4541573/what-are-non-word-boundary-in-regex-b-compared-to-word-boundary

  3. Lu, S. (2014, January 29). Use of capture groups in String.split(). Stack Overflow. https://stackoverflow.com/questions/21419530/use-of-capture-groups-in-string-split

  4. https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#:~:text=On%20the%20other%20hand%2C%20the,%5CD%20or%20non%2Ddigit
