This is a strawperson for the addition of multiple Regular Expression features popular in various languages and parsers. The primary influences for this proposal come from prior art in the following languages and regular expression engines:
- `n` - Explicit capture mode. Unnamed groups do not capture: `(subexpression)` is treated like `(?:subexpression)`, but `(?<name>subexpression)` is treated as normal.
- `x` - Ignore pattern whitespace mode. Eliminates whitespace in a regular expression and enables "x-mode" comments, which start with `#` and run to the end of the line.
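
A minimal sketch of how the `x` mode described above could be approximated in current ECMAScript by preprocessing the pattern source. `stripXMode` is a hypothetical helper, not part of the proposal, and it assumes that whitespace and `#` inside a character class stay literal (as in Perl's `/x`); the text above does not settle that detail.

```javascript
// Hypothetical helper (not part of the proposal): approximates "x" mode by
// rewriting the pattern source before handing it to the built-in RegExp.
function stripXMode(source) {
  let out = "";
  let inClass = false; // inside [...], whitespace and # are kept (assumption)
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Copy escape sequences through untouched, including escaped whitespace.
      out += ch + (source[i + 1] ?? "");
      i++;
    } else if (inClass) {
      if (ch === "]") inClass = false;
      out += ch;
    } else if (ch === "[") {
      inClass = true;
      out += ch;
    } else if (ch === "#") {
      // x-mode comment: skip everything up to the end of the line.
      while (i + 1 < source.length && source[i + 1] !== "\n") i++;
    } else if (!/\s/.test(ch)) {
      out += ch;
    }
  }
  return out;
}

const xSource = String.raw`
  (?<year>\d{4})   # four-digit year
  -
  (?<month>\d{2})  # two-digit month
`;
const xRe = new RegExp(stripXMode(xSource));
console.log(xRe.source);                      // (?<year>\d{4})-(?<month>\d{2})
console.log(xRe.exec("2020-01").groups.year); // 2020
```

The helper only rewrites the source text, so every other feature of the pattern is still interpreted by the built-in engine.
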
- `(?#...comment...)` - Inline comments. All content between `(?#` and the next non-escaped `)` is eliminated from the pattern.
- `(?imnsx-imnsx)` - Enables or disables specific RegExp flags from this position until the end of the current group (`)`) or the end of the pattern. This is very useful when parsing regular expressions specified in other formats, such as in JSON configuration files or TextMate Language files.
- `(?imnsxu-imnsxu:subexpression)` - Non-capturing group that enables or disables specific RegExp flags for the provided `subexpression`. This is very useful when parsing regular expressions specified in other formats, such as in JSON configuration files or TextMate Language files.
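
The inline comment rule above (everything from `(?#` to the next non-escaped `)` is dropped) can be sketched as a source-level rewrite. `stripInlineComments` is a hypothetical helper for illustration only; as a simplification it does not track character classes, so a literal `(?#` inside `[...]` would be mishandled.

```javascript
// Hypothetical helper: removes (?#...) inline comments from a pattern source.
function stripInlineComments(source) {
  // The first alternative consumes escape pairs so they are copied through
  // and can never start a comment. The second alternative is a comment that
  // runs from "(?#" to the next ")" that is not escaped with a backslash.
  const escapeOrComment = /\\[\s\S]|\(\?#(?:\\[\s\S]|[^)\\])*\)/g;
  return source.replace(escapeOrComment, (m) => (m.startsWith("(?#") ? "" : m));
}

const commented = String.raw`\d{3}(?#area code)-\d{4}(?#line number)`;
const phone = new RegExp(`^${stripInlineComments(commented)}$`);
console.log(phone.source);           // ^\d{3}-\d{4}$
console.log(phone.test("555-0199")); // true
```
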
- `(?(expression)yes|no)`, `(?(name)yes|no)`, `(?(number)yes|no)` - Conditional matching based on an expression or on a named or numbered backreference. If `expression` is `DecimalDigits`, it is treated as a numeric backreference. If `expression` is the name of an existing capture group, it is treated as a named backreference. For `name` and `number`, the condition tests whether the last evaluation of the capture group was a match. For `expression`, the condition is a zero-width assertion, so the construct is treated as `(?(?=expression)yes|no)`. The `|no` part may be omitted, in which case the construct is treated as `(?(expression)yes|)`.
- `(?<name1-name2>subexpression)` - Balancing groups. Deletes a previously-named group (`name2`) and stores in the current group (`name1`) the interval between the previous group and the new group. If no `name2` group is defined, the match backtracks. Useful for matching balanced parentheses or brackets.
  - Examples:

        new RegExp(`
          ^                           # Start at beginning of string.
          [^<>]*                      # Match zero or more characters that are not angle brackets.
          (
            ((?<Open><)[^<>]*)+       # Match one or more open angle brackets followed by zero or
                                      # more non-bracket characters.
            ((?<Close-Open>>)[^<>]*)+ # Match one or more close angle brackets followed by zero
                                      # or more non-bracket characters. The substring between
                                      # Open and Close is stored in Close, and the previous Open
                                      # match is deleted.
          )*
          (?(Open)(?!))               # If any Open groups still remain, fail the entire match
                                      # using a zero-width negative lookahead.
          $                           # Stop at end of string.
        `, "x") // Ignore whitespace to improve readability

    Matches: `<abc><mno<xyz>>`. Does not match: `<`, `>`, `<<>`, `<>>`.
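
To make the stack behavior of the balancing-group example concrete, here is a rough emulation in plain JavaScript. `matchBalanced` is a hypothetical function written for this illustration: the `open` array stands in for the `Open` capture stack, and the returned array stands in for what would be stored in `Close`. It is a sketch of the idea, not of how an engine would implement the feature.

```javascript
// Hypothetical emulation of the balanced angle-bracket pattern using an
// explicit stack instead of balancing groups.
function matchBalanced(input) {
  const open = [];  // positions of "<" not yet closed (the Open stack)
  const close = []; // text enclosed by each closed pair (the Close captures)
  for (let i = 0; i < input.length; i++) {
    if (input[i] === "<") {
      open.push(i);
    } else if (input[i] === ">") {
      // (?<Close-Open>...) with no Open capture available: the match fails.
      if (open.length === 0) return null;
      const start = open.pop();
      close.push(input.slice(start + 1, i));
    }
  }
  // (?(Open)(?!)): fail if any Open capture is still on the stack.
  return open.length === 0 ? close : null;
}

console.log(matchBalanced("<abc><mno<xyz>>")); // [ 'abc', 'xyz', 'mno<xyz>' ]
console.log(matchBalanced("<<>"));             // null
```
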
- `(?>subexpression)` - Atomic groups. Non-capturing group that disables backtracking into the subexpression once it has matched.
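
The effect of an atomic group can be approximated in current ECMAScript with a lookahead followed by a backreference, because the engine does not backtrack into a lookahead once it has succeeded. This is a workaround sketch rather than the proposed syntax, and unlike `(?>...)` it introduces an extra capture group.

```javascript
// Ordinary group: \d+ can backtrack and give one digit back to the final \d.
const backtracking = /^(?:\d+)\d$/;

// Emulated atomic group: the lookahead captures the longest run of digits
// once, and \1 then consumes exactly that text with no alternatives left.
const atomicLike = /^(?=(\d+))\1\d$/;

console.log(backtracking.test("123")); // true
console.log(atomicLike.test("123"));   // false
```

The second pattern fails because the digits are consumed as a unit, which is the behavior `(?>\d+)\d` would have.
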
- `\g<name>`, `\g<number>` - Re-executes the subexpression of the named or numbered capture group at the current position. Allows reusing a capture group's subexpression without rewriting it.
  - Examples:

        // String.raw keeps the backslashes in \d and \g intact.
        new RegExp(String.raw`
          (?((?!))                 # Failing conditional to define reusable groups.
            (?<Year>\d{4})
            (?<Month>\d{2})
            (?<Day>\d{2})
            (?<WeekOfYear>W\d{2})
            (?<DayOfWeek>\d)
            (?<DayOfYear>\d{3})
            (?<CalendarDate>\g<Year>-\g<Month>-\g<Day>)         # YYYY-MM-DD
            (?<WeekDate>\g<Year>-\g<WeekOfYear>-\g<DayOfWeek>)  # YYYY-Www-D
            (?<OrdinalDate>\g<Year>-\g<DayOfYear>)              # YYYY-DDD
            (?<Date>\g<CalendarDate>|\g<WeekDate>|\g<OrdinalDate>)
          )
          \g<Date>
        `, "x") // Ignore whitespace to improve readability

    Matches: `2020-01-01`, `2020-W01-6`, `2020-200`.
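
Without `\g<name>`, the usual way to reuse a subexpression today is to compose pattern source strings. The sketch below rebuilds the date example that way. The variable names are chosen for this illustration, and the `^`/`$` anchors are an addition so that the whole input must be a date.

```javascript
// Each fragment is kept as a source string so it can be reused, which is
// roughly what \g<name> would allow inside a single pattern.
const year = String.raw`\d{4}`;
const month = String.raw`\d{2}`;
const day = String.raw`\d{2}`;
const weekOfYear = String.raw`W\d{2}`;
const dayOfWeek = String.raw`\d`;
const dayOfYear = String.raw`\d{3}`;

const calendarDate = `${year}-${month}-${day}`;        // YYYY-MM-DD
const weekDate = `${year}-${weekOfYear}-${dayOfWeek}`; // YYYY-Www-D
const ordinalDate = `${year}-${dayOfYear}`;            // YYYY-DDD

const date = new RegExp(`^(?:${calendarDate}|${weekDate}|${ordinalDate})$`);

console.log(date.test("2020-01-01")); // true
console.log(date.test("2020-W01-6")); // true
console.log(date.test("2020-200"));   // true
console.log(date.test("2020-W1-6"));  // false: the week needs two digits
```

String composition duplicates the fragment text in the final pattern, whereas `\g<name>` would keep a single definition inside the pattern itself.
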