regexyl/regex.md

    Regex Cheatsheet

The JavaScript version.
Frequent Examples

Search for: ¹

"/example/": /\/example\/[a-z]+/i
Switch words in a string

let re = /(\w+)\s(\w+)/;
let str = 'John Smith';
let newstr = str.replace(re, '$2, $1');
console.log(newstr);  // Smith, John

Using an inline function that modifies the matched characters

function styleHyphenFormat(propertyName) {
  function upperToHyphenLower(match, offset, string) {
    return (offset > 0 ? '-' : '') + match.toLowerCase();
  }
  return propertyName.replace(/[A-Z]/g, upperToHyphenLower);
}
console.log(styleHyphenFormat('borderTop')) // border-top

Converting Fahrenheit to Celsius

function f2c(x) {
  function convert(str, p1, offset, s) {
    return ((p1 - 32) * 5/9) + 'C';
  }
  let s = String(x);
  let test = /(-?\d+(?:\.\d*)?)F\b/g; // (?:...) is a non-capturing group
  return s.replace(test, convert);
}
console.log(f2c('212F')); // 100C

Capturing the matched pattern

const regexChars = /[\\^$.*+?()[\]{}|]/g;
const str = 'as[b*';
console.log(str.replace(regexChars, `\\$&`)); // prints as\[b\* (the string literal 'as\\[b\\*')

Possible Trip-Ups

\b and \B: Matching [non-]word boundaries

A word boundary (\b) is a zero-width match that can occur:

Between a word character (\w) and a non-word character (\W) or
Between a word character and the start or end of the string.

\B is the inverse of \b, also zero width. It can match:

Between two word characters.
Between two non-word characters.
Between a non-word character and the start or end of the string.
The empty string.

To find the non-word boundaries, find the word boundaries first: every remaining position in the string is a non-word boundary. ²
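The rules above can be made visible by replacing each zero-width match with a marker. This is a minimal sketch; the sample string and the `|` marker are arbitrary choices for illustration.

```javascript
// Mark every boundary position in a short sample string.
const sample = 'hi, yo';

// \b matches at the edges of each word.
const wordBoundaries = sample.replace(/\b/g, '|');

// \B matches at every other position: inside words and between non-word characters.
const nonWordBoundaries = sample.replace(/\B/g, '|');

// The empty string contains one \B position and no \b position.
const emptyNonWord = ''.replace(/\B/g, '|');
const emptyWord = ''.replace(/\b/g, '|');

console.log(wordBoundaries);    // |hi|, |yo|
console.log(nonWordBoundaries); // h|i,| y|o
console.log(JSON.stringify(emptyNonWord), JSON.stringify(emptyWord)); // "|" ""
```

Every position in the string is marked by exactly one of the two patterns, which is why removing the word boundaries leaves only non-word boundaries.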
Syntax


Metacharacters

.: Any one character except line terminators (\n, \r, \u2028, \u2029); with the s (dotAll) flag it matches those too.
\d, \D: Any one digit/non-digit character (where digits are [0-9]).
\w, \W: Any one word/non-word character. Word characters are [a-zA-Z0-9_].
\s, \S: Any one whitespace/non-whitespace character. ASCII whitespace is [ \n\r\t\f\v]; in JavaScript, \s also matches Unicode spaces such as \u00a0 and \u2028.
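A short sketch of the metacharacters in use. The sample text is made up for illustration.

```javascript
// A sample string containing a tab between two words.
const orderText = 'Order 66\tshipped';

const digitRuns = orderText.match(/\d+/g);          // runs of digits
const wordRuns = orderText.match(/\w+/g);           // runs of word characters
const collapsed = orderText.replace(/\s+/g, ' ');   // any whitespace run becomes one space

// "." does not cross a newline unless the s (dotAll) flag is set.
const dotPlain = /a.b/.test('a\nb');
const dotAll = /a.b/s.test('a\nb');

console.log(digitRuns);        // [ '66' ]
console.log(wordRuns);         // [ 'Order', '66', 'shipped' ]
console.log(collapsed);        // Order 66 shipped
console.log(dotPlain, dotAll); // false true
```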


Occurrence Indicators

+: One or more, e.g. [0-9]+ matches 1 or more digits, such as "123", "0000".
*: Zero or more, e.g. [0-9]* matches everything [0-9]+ does, plus the empty string.
?: Zero or one (optional), e.g., [+-]? matches an optional "+", "-", or an empty string.
{}

{m,n}: m to n (both inclusive).
{m}: Exactly m times.
{m,}: m or more times (m+).
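The occurrence indicators can be compared side by side. In this sketch each pattern is anchored with ^ and $ so that the quantifier has to account for the whole input; the variable names are illustrative only.

```javascript
// One or more vs. zero or more digits.
const plusOnEmpty = /^[0-9]+$/.test('');     // + needs at least one digit
const starOnEmpty = /^[0-9]*$/.test('');     // * also accepts the empty string

// Optional sign followed by digits.
const signedNumber = /^[+-]?[0-9]+$/.test('-42');
const unsignedNumber = /^[+-]?[0-9]+$/.test('42');

// Bounded repetition.
const exactlyThree = /^[0-9]{3}$/.test('123');
const twoToFour = /^[0-9]{2,4}$/.test('12345'); // five digits is too many
const atLeastTwo = /^[0-9]{2,}$/.test('1');     // one digit is too few

console.log(plusOnEmpty, starOnEmpty);         // false true
console.log(signedNumber, unsignedNumber);     // true true
console.log(exactlyThree, twoToFour, atLeastTwo); // true false false
```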


Position Anchors

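^: Start of the input, or start of each line with the m flag, e.g. ^[0-9]+$ matches a string made up only of digits.
$: End of the input, or end of each line with the m flag.
\b: Boundary of word, i.e., start-of-word or end-of-word. E.g., \bcat\b matches the word "cat" in the input string.
\B: Inverse of \b, i.e. any position that is neither a start-of-word nor an end-of-word.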
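A small sketch of how the anchors behave on multi-line input, with and without the m flag. The three-line sample string is an assumption made for the example.

```javascript
// Three lines: a whole word, a prefix, and a suffix.
const lines = 'cat\ncatalog\nbobcat';

// Without the m flag, ^ and $ only match at the ends of the whole input.
const wholeInput = /^cat$/.test(lines);

// With the m flag, ^ and $ also match at line breaks.
const wholeLine = /^cat$/m.test(lines);
const lineStarts = lines.match(/^cat/gm);

// \bcat\b finds "cat" as a standalone word only.
const standalone = lines.match(/\bcat\b/g);

// \Bcat\b finds "cat" at the end of a longer word ("bobcat").
const suffix = /\Bcat\b/.exec(lines);

console.log(wholeInput, wholeLine); // false true
console.log(lineStarts);            // [ 'cat', 'cat' ]
console.log(standalone);            // [ 'cat' ]
console.log(suffix.index);          // 15
```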


Parenthesized Back References (Capture Group)

(): Creates a capture group for extracting a substring or using a back reference.
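In JavaScript, use $1, $2, ... in a replacement string and \1, \2, ... inside the pattern itself to refer to the captured groups in sequential order. Java and Perl also use $1, $2, ... in replacements; Python uses \1, \2, ... in both places.
(?:...): A non-capturing group; it groups the sub-pattern without creating a capture, so it is omitted from the resulting list of captures. ³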
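The difference between capturing and non-capturing groups shows up clearly in replace() and split(). The following is a sketch with made-up inputs, not an exhaustive tour of group syntax.

```javascript
// \1 inside the pattern refers back to whatever group 1 matched.
const repeatedWord = /\b(\w+)\s\1\b/.test('the the cat');

// $1, $2, $3 in the replacement string refer to the groups in order.
const reordered = '2024-01-31'.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1');

// split() includes captured separators in its result...
const withSeparators = 'a1b2c'.split(/(\d)/);

// ...but a non-capturing group leaves them out.
const withoutSeparators = 'a1b2c'.split(/(?:\d)/);

console.log(repeatedWord);      // true
console.log(reordered);         // 31/01/2024
console.log(withSeparators);    // [ 'a', '1', 'b', '2', 'c' ]
console.log(withoutSeparators); // [ 'a', 'b', 'c' ]
```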


Character Class (or Bracket List)

[]
[...]: Accepts any one of the characters within the brackets.
[.-.]: Accepts any one of the characters in the range, e.g. [0-9], [A-Za-z].
[^...]: Accepts any one character that is NOT in the brackets, e.g. [^0-9] matches any non-digit.
Only ^, -, ], \ may need escaping inside the bracket list (^ only when it comes first, - only between two characters); other metacharacters such as . and * are literal there.
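A brief sketch of character classes, including the escaping rule. The inputs are arbitrary examples.

```javascript
// A simple list: "a" or "e" in the third position.
const spellings = 'gray grey groy'.match(/gr[ae]y/g);

// A negated range: strip everything that is not a lowercase letter.
const lettersOnly = 'a-b_c'.replace(/[^a-z]/g, '');

// Inside brackets, . * + ? are literal and need no backslash.
const literalMeta = '1+1=2?'.match(/[.*+?]/g);

// ^ - ] \ are escaped here so they are matched literally.
const escapedInClass = 'a^b-c]d\\e'.match(/[\^\-\]\\]/g);

console.log(spellings);      // [ 'gray', 'grey' ]
console.log(lettersOnly);    // abc
console.log(literalMeta);    // [ '+', '?' ]
console.log(escapedInClass); // [ '^', '-', ']', '\\' ]
```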


|: OR operator, e.g. four|4 accepts "four" or "4".
\: Escape sequence to accept a char with special meaning in regex.

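Regex recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage return, \xhh for a two-digit hex code, and \uhhhh for a 4-digit Unicode code unit. For code points that need more than 4 hex digits, JavaScript uses \u{h...} together with the u flag rather than an 8-digit \uhhhhhhhh form. Octal escapes (\nnn) are a legacy feature and are rejected when the u flag is set.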
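Alternation and escaping can be checked with a few one-liners. This is a minimal sketch; the test strings are chosen only to illustrate each rule.

```javascript
// Alternation: either spelling is accepted. The non-capturing group keeps
// the anchors applying to both alternatives.
const fourWord = /^(?:four|4)$/.test('four');
const fourDigit = /^(?:four|4)$/.test('4');

// Escaping: "\+" is a literal plus sign, while a bare "+" is a quantifier.
const escapedPlus = /1\+1/.test('1+1');
const barePlus = /1+1/.test('1+1'); // means "one or more 1s, then a 1"

// Hex and Unicode escapes all describe the letter "A" here.
const hexEscape = /\x41/.test('A');
const unicodeEscape = /\u0041/.test('A');

// Code points beyond four hex digits need \u{...} and the u flag.
const astralEscape = /^\u{1F600}$/u.test('\u{1F600}');

console.log(fourWord, fourDigit);      // true true
console.log(escapedPlus, barePlus);    // true false
console.log(hexEscape, unicodeEscape); // true true
console.log(astralEscape);             // true
```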


Laziness

*?, +?, ??, {m,n}?, {m,}?: Lazy versions of the repetition operators; they match as few characters as possible instead of as many as possible.
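The effect of laziness is easiest to see when a greedy and a lazy quantifier are run against the same input. The HTML-like sample below is illustrative only.

```javascript
const markup = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ runs to the last ">" in the input.
const greedy = markup.match(/<.+>/)[0];

// Lazy: .+? stops at the first ">" it can.
const lazy = markup.match(/<.+?>/)[0];
const allTags = markup.match(/<.+?>/g);

// The same applies to bounded repetition.
const greedyBounded = 'aaaa'.match(/a{2,3}/)[0];
const lazyBounded = 'aaaa'.match(/a{2,3}?/)[0];

console.log(greedy);  // <b>bold</b> and <i>italic</i>
console.log(lazy);    // <b>
console.log(allTags); // [ '<b>', '</b>', '<i>', '</i>' ]
console.log(greedyBounded, lazyBounded); // aaa aa
```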


Capturing matched pattern

$&: In a replacement string, stands for the entire matched substring.


⁴
Awesome Resources


https://riptutorial.com/regex

Footnotes

¹ https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#examples

² https://stackoverflow.com/questions/4541573/what-are-non-word-boundary-in-regex-b-compared-to-word-boundary

³ Lu, S. (2014, January 29). Use of capture groups in String.split(). Stack Overflow. https://stackoverflow.com/questions/21419530/use-of-capture-groups-in-string-split

⁴ https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#:~:text=On%20the%20other%20hand%2C%20the,%5CD%20or%20non%2Ddigit