Suppose we have a `RegExp.escape` which escapes:
- every ASCII punctuator except `_`, i.e. `` (){}[]|,.?*+-^$=<>\/#&!%:;@~'"` ``
- whitespace
- `0-9` if at the start of the string (with a hex escape)
And we make `\-` and other currently-illegal escape sequences which would be produced by this function legal in u/v-mode RegExps (to mean the unescaped char), including inside of character classes.
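The escaping rules above can be sketched roughly as follows. This is only an illustration of the rules as stated in this comment, not a spec or an existing API: the names `escapeSketch`, `hexEscape` and `PUNCTUATORS` are hypothetical, and encoding whitespace as a hex/unicode escape is an assumption, since the rules only say that whitespace is escaped.

```javascript
// Hypothetical sketch of the escaping rules described above.
// Every ASCII punctuator except "_" (31 characters).
const PUNCTUATORS = new Set("(){}[]|,.?*+-^$=<>\\/#&!%:;@~'\"`");

// Assumption: whitespace and leading digits are written as \xNN,
// or \uNNNN for code points above 0xFF.
function hexEscape(ch) {
  const code = ch.codePointAt(0);
  return code <= 0xff
    ? "\\x" + code.toString(16).padStart(2, "0")
    : "\\u" + code.toString(16).padStart(4, "0");
}

function escapeSketch(str) {
  let out = "";
  let first = true;
  for (const ch of str) {
    if (first && /[0-9]/.test(ch)) {
      out += hexEscape(ch);      // digit at the start of the string
    } else if (PUNCTUATORS.has(ch)) {
      out += "\\" + ch;          // backslash-escape the punctuator
    } else if (/\s/.test(ch)) {
      out += hexEscape(ch);      // whitespace, including line terminators
    } else {
      out += ch;                 // everything else is passed through
    }
    first = false;
  }
  return out;
}
```

For example, `escapeSketch("1+1")` returns `\x31\+1`. Some of the sequences this produces (such as `\-` or `\!` outside a character class) are exactly the ones that are currently illegal in u/v-mode, so today the output can only be tried out in a RegExp without those flags.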
And you don't put the output in a place where it would obviously mean something else, i.e. not
- immediately after `\x`, `\x0`, `\u00`, `\c`, etc
- immediately after an odd number of backslashes
- in `(?${here}:asdf)` (because of regexp modifiers)
Then `escape` is safe, i.e. it cannot lead to context escapes.
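To illustrate why the placement restrictions matter, here is a small sketch of two of the excluded positions. The escaped strings are written by hand following the rules in this comment (a letter such as `J` is left alone, `.` becomes `\.`); all variable names are hypothetical, and the patterns are compiled without the `u`/`v` flags.

```javascript
// Hand-written escaped outputs, per the rules described above:
// "J" is a letter and stays "J"; "." becomes "\.".
const escapedJ = "J";
const escapedDot = "\\.";

// Excluded position: immediately after \c.
// The pattern source is ^\cJ$, which is the control escape for LINE FEED.
const afterControl = new RegExp("^\\c" + escapedJ + "$");

// Excluded position: immediately after an odd number of backslashes.
// The pattern source is ^\\.$, i.e. a literal backslash followed by
// an unescaped "." that matches any character.
const afterBackslash = new RegExp("^\\" + escapedDot + "$");

// Allowed position: the pattern source is ^\.$ and matches only ".".
const safe = new RegExp("^" + escapedDot + "$");

console.log(afterControl.test("\n"));    // true: matches a newline, not "J"
console.log(afterBackslash.test("\\a")); // true: backslash plus any character
console.log(safe.test("."), safe.test("a")); // true false
```

In both excluded positions the escaping function did its job; it is the text in front of the insertion point that reinterprets the output.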
Specifically, we have the following contexts:
context | cannot leave context because |
---|---|
"base" context | trivial |
character class | can't output unescaped `]`, `^`, `-`, `&`, `\` (etc) |
`(...)` group | can't output unescaped `)` or `?` |
`\u{...}` | can't output unescaped `}` |
`\k<...>` | can't output unescaped `>` |
`(?<...>)` | can't output unescaped `>` |
`foo{...}` | can't output unescaped `}` or `,` |
`\p{...}` | can't output unescaped `}` or `=` |
`\q{...}` | can't output unescaped `}` or `\|` |
after `\1` | numbers at the start of strings are escaped |
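As a concrete illustration of two rows of this table, the sketch below places hand-escaped strings inside a character class and inside a group. The escaped strings follow the rules described in this comment and are not produced by a real API; the variable names are hypothetical, and the patterns are compiled without flags.

```javascript
// Hand-escaped per the rules in this comment:
// the raw input "a-z]" becomes "a\-z\]", and ")|(" becomes "\)\|\(".
const escapedClassInput = "a\\-z\\]";
const escapedGroupInput = "\\)\\|\\(";

// Character class: the escaped "-" and "]" stay literal members of the
// class, so no range is formed and the class is not closed early.
const inClass = new RegExp("^[" + escapedClassInput + "]$");

// Group: the escaped ")" and "|" cannot close the group or start a new
// alternative, so the whole input is matched literally.
const inGroup = new RegExp("^(x" + escapedGroupInput + "y)$");

console.log(inClass.test("-"), inClass.test("]")); // true true
console.log(inClass.test("b"));                    // false: "a-z" is not a range
console.log(inGroup.test("x)|(y"));                // true
console.log(inGroup.test("x"));                    // false: "|" is not an alternation
```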
And the following proposed future contexts:
context | cannot leave context because |
---|---|
`(?#...)` | can't output unescaped `)` |
`#...` line comments | can't output unescaped line terminator |
`x`-mode regexps | can't output unescaped whitespace |
`(?(...)...)` conditions | can't output unescaped `)` or `\|` |
This would be a commitment to only entering/exiting new contexts using whitespace or ASCII punctuators. That seems like it will not be a significant impediment to language evolution.