The JavaScript version.
Search for: [1]
- "/example/":
/\/example\/[a-z]+/i
- Switch words in a string
let re = /(\w+)\s(\w+)/;
let str = 'John Smith';
let newstr = str.replace(re, '$2, $1');
console.log(newstr); // Smith, John
- Using an inline function that modifies the matched characters
function styleHyphenFormat(propertyName) {
  function upperToHyphenLower(match, offset, string) {
    return (offset > 0 ? '-' : '') + match.toLowerCase();
  }
  return propertyName.replace(/[A-Z]/g, upperToHyphenLower);
}
console.log(styleHyphenFormat('borderTop')); // border-top
- Converting Fahrenheit to Celsius
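function f2c(x) {
  // p1 is the captured number; the subtraction coerces it from string to number
  function convert(str, p1, offset, s) {
    return ((p1 - 32) * 5/9) + 'C';
  }
  let s = String(x);
  let test = /(-?\d+(?:\.\d*)?)F\b/g; // (?:...) is a non-capturing group
  return s.replace(test, convert);
}
console.log(f2c('212F')); // 100C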
- Capturing the matched pattern
const regexChars = /[\\^$.*+?()[\]{}|]/g;
const str = 'as[b*';
console.log(str.replace(regexChars, `\\$&`)); // prints as\[b\* (each special character gets a backslash prefix)
A word boundary (\b) is a zero-width match that can match:
- Between a word character (\w) and a non-word character (\W), or
- Between a word character and the start or end of the string.

\B is the inverse of \b, also zero-width. It can match:
- Between two word characters.
- Between two non-word characters.
- Between a non-word character and the start or end of the string.
- The empty string.
Finding the non-word boundaries? Find the word boundaries and remove them; every position that is left is a non-word boundary.
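One way to see where \b and \B match is to mark every zero-width position with a visible character. The sketch below does that with String.prototype.replace; the sample string and the variable names are made up for illustration.

```javascript
// Illustrative sample string; not from the original notes.
const text = 'cat, dog';

// Insert "|" at every word boundary (\b).
const wordBoundaries = text.replace(/\b/g, '|');

// Insert "^" at every non-word boundary (\B), i.e. all remaining positions.
const nonWordBoundaries = text.replace(/\B/g, '^');

// The empty string has a single position, and it is a non-word boundary.
const emptyMatchesNonBoundary = /\B/.test('');
const emptyMatchesBoundary = /\b/.test('');

console.log(wordBoundaries);    // |cat|, |dog|
console.log(nonWordBoundaries); // c^a^t,^ d^o^g
console.log(emptyMatchesNonBoundary, emptyMatchesBoundary); // true false
```

Every position in the string is marked by exactly one of the two patterns: 4 word boundaries plus 5 non-word boundaries cover all 9 positions of the 8-character string.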
- Metacharacters
  - .: Any one character except a newline, roughly the same as [^\n] (in JavaScript, . also excludes \r, \u2028, and \u2029 unless the s flag is set).
  - \d, \D: Any one digit/non-digit character (where digits are [0-9]).
  - \w, \W: Any one word/non-word character. For ASCII, word characters are [a-zA-Z0-9_].
  - \s, \S: Any one space/non-space character. For ASCII, whitespace characters are [ \n\r\t\f\v].
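The metacharacters above can be checked against a small input. This is a minimal sketch; the sample string and the helper name `matched` are hypothetical and only serve the illustration.

```javascript
// Illustrative input: a letter, a digit, a space, an underscore, and a newline.
const sample = 'a1 _\n';

// Hypothetical helper: join every single-character match of a pattern into one string.
const matched = (re) => (sample.match(re) || []).join('');

const anyChar = matched(/./g);       // 'a1 _'  (the newline is skipped)
const digits = matched(/\d/g);       // '1'
const nonDigits = matched(/\D/g);    // 'a _\n'
const wordChars = matched(/\w/g);    // 'a1_'
const nonWordChars = matched(/\W/g); // ' \n'
const spaces = matched(/\s/g);       // ' \n'
const nonSpaces = matched(/\S/g);    // 'a1_'

console.log(JSON.stringify({ anyChar, digits, nonDigits, wordChars, nonWordChars, spaces, nonSpaces }));
```

Note that the underscore counts as a word character, and that each uppercase class matches exactly the characters its lowercase counterpart rejects.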
- Occurrence Indicators
  - +: One or more, e.g. [0-9]+ matches 1 or more digits, such as "123", "0000".
  - *: Zero or more (accepts the above + empty strings).
  - ?: Zero or one (optional), e.g., [+-]? matches an optional "+", "-", or an empty string.
  - {}
    - {m,n}: m to n (both inclusive).
    - {m}: Exactly m times.
    - {m,}: m or more times (m+).
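A quick way to compare the occurrence indicators is to test each one against inputs that sit just inside and just outside its range. The sketch below assumes the patterns are anchored with ^ and $ so that only whole-string matches count; all names are illustrative.

```javascript
// ^ and $ anchor each pattern to the whole input so partial matches do not count.
const oneOrMore = /^[0-9]+$/;
const zeroOrMore = /^[0-9]*$/;
const optionalSign = /^[+-]?[0-9]+$/;
const twoToFour = /^[0-9]{2,4}$/;
const exactlyThree = /^[0-9]{3}$/;
const twoOrMore = /^[0-9]{2,}$/;

const results = {
  plusOnDigits: oneOrMore.test('0000'),    // true
  plusOnEmpty: oneOrMore.test(''),         // false: + needs at least one digit
  starOnEmpty: zeroOrMore.test(''),        // true: * also accepts nothing
  signedNumber: optionalSign.test('-12'),  // true
  unsignedNumber: optionalSign.test('12'), // true: the sign is optional
  doubleSign: optionalSign.test('+-12'),   // false: ? allows at most one sign
  rangeInside: twoToFour.test('123'),      // true
  rangeTooLong: twoToFour.test('12345'),   // false
  exact: exactlyThree.test('123'),         // true
  exactTooShort: exactlyThree.test('12'),  // false
  openEnded: twoOrMore.test('1234567'),    // true
};

console.log(results);
```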
- Position Anchors
  - ^: Start of line (in JavaScript, start of the input unless the m flag is set), e.g. ^[0-9]+$ matches a numeric string.
  - $: End of line (end of the input unless the m flag is set).
  - \b: Boundary of word, i.e., start-of-word or end-of-word. E.g., \bcat\b matches the word "cat" in the input string.
  - \B: Inverse of \b, i.e. non-start-of-word or non-end-of-word.
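The anchors can be exercised with a few test calls. This is a rough sketch with made-up inputs; it also shows that a fully anchored pattern needs a quantifier such as + to accept more than one digit, and that ^ and $ only match at line breaks when the m flag is set.

```javascript
// Anchored to the whole input: every character must be a digit.
const wholeNumber = /^[0-9]+$/.test('2024');      // true
// Without a quantifier, the anchored class accepts exactly one digit.
const singleDigitOnly = /^[0-9]$/.test('2024');   // false
// Unanchored: a match anywhere in the input is enough.
const unanchored = /[0-9]+/.test('year 2024');    // true

// \b requires "cat" to stand alone as a word; \B requires it to be embedded.
const catAsWord = /\bcat\b/.test('a cat sat');       // true
const catEmbedded = /\bcat\b/.test('concatenate');   // false
const catInsideWord = /\Bcat\B/.test('concatenate'); // true

// Without the m flag, ^ and $ match only at the start and end of the input.
const lines = 'one\ntwo';
const withoutMultiline = /^\w+$/.test(lines); // false
const withMultiline = lines.match(/^\w+$/gm); // ['one', 'two']

console.log(wholeNumber, singleDigitOnly, unanchored);
console.log(catAsWord, catEmbedded, catInsideWord);
console.log(withoutMultiline, withMultiline);
```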
- Parenthesized Back References (Capture Group)
  - (): Creates a capture group for extracting a substring or using a back reference.
    - Use $1, $2, ... (JS, Java, Perl), or \1, \2, ... (Python) to retrieve the back references in sequential order. Inside the pattern itself, JavaScript also uses \1, \2, ....
  - (?:...): A non-capturing group; it groups a subpattern without capturing it, so it is omitted from the resulting list of captures. [3]
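The difference between capturing and non-capturing groups shows up in three places: replacement strings, back references inside the pattern, and String.prototype.split. The following sketch uses made-up inputs and variable names.

```javascript
// $1, $2, $3 in the replacement string refer to the capture groups in order.
const reordered = '2014-01-29'.replace(/(\d{4})-(\d{2})-(\d{2})/, '$3/$2/$1');

// Inside the pattern, \1 refers back to whatever group 1 matched.
const hasDoubleLetter = /(\w)\1/.test('hello'); // true: "ll"
const noDoubleLetter = /(\w)\1/.test('world');  // false

// Only the capturing group appears in the match result; (?:...) is left out.
const captured = 'John Smith'.match(/(\w+)\s(?:\w+)/);

// split() keeps captured separators in the output and drops non-captured ones.
const splitCapturing = 'a,b,c'.split(/(,)/);
const splitNonCapturing = 'a,b,c'.split(/(?:,)/);

console.log(reordered);         // 29/01/2014
console.log(hasDoubleLetter, noDoubleLetter);
console.log(captured.length);   // 2: the full match plus one capture
console.log(splitCapturing);    // [ 'a', ',', 'b', ',', 'c' ]
console.log(splitNonCapturing); // [ 'a', 'b', 'c' ]
```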
- Character Class (or Bracket List)
  - []
    - [...]: Accepts any one of the characters within the brackets.
    - [.-.]: Accepts any one of the characters in the range, e.g. [0-9], [A-Za-z].
    - [^...]: Accepts any one character that is not in the list, e.g. [^0-9] matches any non-digit.
    - Only ^, -, ], \ require an escape sequence inside the bracket list.
  - |: OR operator, e.g. four|4 accepts "four" or "4".
  - \: Escape sequence to accept a character with special meaning in regex.
    - Regex recognizes common escape sequences such as \n for newline, \t for tab, \r for carriage return, \nnn for an octal number of up to 3 digits, \xhh for a two-digit hex code, \uhhhh for a 4-digit Unicode code unit, and, in JavaScript with the u flag, \u{h...} for code points that need more than 4 hex digits.
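Character classes, alternation, and escapes can be tried out together. The sketch below is illustrative only; the inputs and variable names are assumptions, and the alternation is wrapped in a group so that the anchors apply to both alternatives.

```javascript
// A list, a range, and a negated range.
const vowels = 'regex'.match(/[aeiou]/g);  // ['e', 'e']
const digits = 'a1B2'.match(/[0-9]/g);     // ['1', '2']
const nonDigits = 'a1B2'.match(/[^0-9]/g); // ['a', 'B']

// ^, -, ], and \ are escaped inside the bracket list to match them literally.
// The input string literal 'a^b-c]d\\e' is the text a^b-c]d\e.
const specials = 'a^b-c]d\\e'.match(/[\^\-\]\\]/g);

// Alternation: accept either spelling.
const fourOrDigit = /^(four|4)$/;
const alternation = [fourOrDigit.test('four'), fourOrDigit.test('4'), fourOrDigit.test('five')];

// An escaped dot matches only a literal dot; an unescaped dot matches any character.
const literalDot = /1\.5/.test('1x5'); // false
const anyDot = /1.5/.test('1x5');      // true

// Hex and Unicode escapes; \u{...} needs the u flag.
const hexEscape = /\x41/.test('A');                         // true
const unicodeEscape = /\u0041/.test('A');                   // true
const codePointEscape = /\u{1F600}/u.test('\u{1F600}');     // true

console.log(vowels, digits, nonDigits, specials);
console.log(alternation, literalDot, anyDot);
console.log(hexEscape, unicodeEscape, codePointEscape);
```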
- Laziness
  - *?, +?, ??, {m,n}?, {m,}?: Curbs greediness for repetition operators.
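The effect of the lazy variants is easiest to see next to their greedy counterparts. A minimal sketch, using a made-up HTML-like string:

```javascript
// Illustrative input; not from the original notes.
const html = '<b>bold</b> and <i>italic</i>';

// Greedy: .+ grabs as much as possible, so the match runs to the last ">".
const greedy = html.match(/<.+>/)[0];

// Lazy: .+? stops at the first ">" that lets the pattern succeed.
const lazy = html.match(/<.+?>/)[0];

// With the g flag, the lazy pattern finds each tag separately.
const allTags = html.match(/<.+?>/g);

// The same applies to counted repetition.
const greedyCount = 'aaaa'.match(/a{2,3}/)[0];  // takes the maximum, 3
const lazyCount = 'aaaa'.match(/a{2,3}?/)[0];   // takes the minimum, 2

console.log(greedy);  // <b>bold</b> and <i>italic</i>
console.log(lazy);    // <b>
console.log(allTags); // [ '<b>', '</b>', '<i>', '</i>' ]
console.log(greedyCount, lazyCount); // aaa aa
```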
- Capturing matched pattern
  - $&: Represents the entire matched substring when used in a replacement string.
Footnotes
1. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#examples
2. https://stackoverflow.com/questions/4541573/what-are-non-word-boundary-in-regex-b-compared-to-word-boundary
3. Lu, S. (2014, January 29). Use of capture groups in String.split(). Stack Overflow. https://stackoverflow.com/questions/21419530/use-of-capture-groups-in-string-split
4. https://www3.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html#:~:text=On%20the%20other%20hand%2C%20the,%5CD%20or%20non%2Ddigit