Two string values are updated as we go along:

- `RegularExpressionNode#unescaped`
- `RegularExpressionNode#source`

According to the interface, `unescaped` is supposed to be the source string. However, it does not hold up in many situations.

| Regex | CRuby 3.3.0 Source | Prism `RegularExpressionNode#unescaped` (pre-changes) |
| --- | --- | --- |
| `/\x00/` | `"\\x00"` | `"\\x00"` |
| `/\x0/` | `"\\x0"` | `"\\x0"` |
| `/\xa/` | `"\\xa"` | `"\\xa"` |
| `/\M-\C-?/` | `"\xFF"` (invalid multibyte escape) | `"\\xFF"` |
| `/\u{80}/` | `"\\u{80}"` | `"\\u{80}"` |
| `/\u0080/` | `"\\u0080"` | `"\\u0080"` |
| `/\x80\u{80}/` | `"\x80\u{80}"` (invalid multibyte escape) | `"\\x80\\u{80}"` |
| `/\\x0/` | `"\\x0"` | `"\\x0"` |

bin/prism parse -e ' /\x00/'
bin/prism parse -e ' /\x0/'
bin/prism parse -e ' /\xa/'
bin/prism parse -e ' /\M-\C-?/'
bin/prism parse -e ' /\u{80}/'
bin/prism parse -e ' /\u0080/'
bin/prism parse -e ' /\x80\u{80}/'
bin/prism parse -e ' /\\x0/'
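
The "CRuby 3.3.0 Source" column can be spot-checked with `Regexp#source`, which returns the text of the literal with its escapes left in place. The following is a minimal sketch, assuming a CRuby interpreter and covering only the rows that parse without an error; the constant name `SOURCE_EXAMPLES` is illustrative and not part of Prism.

```ruby
# Pairs of [regexp literal, expected Regexp#source], taken from the rows
# of the table that CRuby accepts. Single-quoted strings keep the
# backslashes literal, so '\x00' is the four characters \, x, 0, 0.
SOURCE_EXAMPLES = [
  [/\x00/,   '\x00'],
  [/\x0/,    '\x0'],
  [/\xa/,    '\xa'],
  [/\u{80}/, '\u{80}'],
  [/\u0080/, '\u0080']
].freeze

SOURCE_EXAMPLES.each do |regexp, expected|
  matches = regexp.source == expected
  puts "#{regexp.inspect}: source=#{regexp.source.inspect} matches_table=#{matches}"
end
```

The rows marked "invalid multibyte escape" are left out because those literals fail at parse time and would prevent the script from loading.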
After my first round of changes to better track the byte values behind the source strings in `RegularExpressionNode`, we ended up with:
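
| Regex | Prism `RegularExpressionNode#unescaped` | Prism `RegularExpressionNode#source` |
| --- | --- | --- |
| `/\x00/` | `"\u0000"` | `"\\x00"` |
| `/\x0/` | `"\u0000"` | `"\\x0"` |
| `/\xa/` | `"\n"` | `"\\xa"` |
| `/\M-\C-?/` | `"\xFF"` | `"\\xFF"` |
| `/\u{80}/` | `"\u0080"` | `"\\u{80}"` |
| `/\u0080/` | `"\u0080"` | `"\\u0080"` |
| `/\x80\u{80}/` | `"\x80\u0080"` | `"\\x80\\u{80}"` |
| `/\\x0/` | `"\u0000"` | `"\\x0"` |
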
Regexp Encoding Modifiers

| Modifier | Encoding |
| --- | --- |
| `/u` | UTF-8 |
| `/e` | EUC / EUC-JP |
| `/s` | SJIS / Windows-31J |
| `/n` | ASCII-8BIT |

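
The effect of each modifier can be observed with `Regexp#encoding`. The sketch below is illustrative only: it assumes a CRuby interpreter, reuses the `/gar\xC3\xA7on/` literal from the tables that follow, and the constant name `MODIFIER_EXAMPLES` is made up for the example.

```ruby
# Each pair is [regexp literal with an encoding modifier, expected encoding].
# The pattern uses only \x escapes, so it parses the same way regardless
# of the encoding of the file it lives in.
MODIFIER_EXAMPLES = [
  [/gar\xC3\xA7on/u, Encoding::UTF_8],
  [/gar\xC3\xA7on/e, Encoding::EUC_JP],
  [/gar\xC3\xA7on/s, Encoding::Windows_31J],
  [/gar\xC3\xA7on/n, Encoding::ASCII_8BIT]
].freeze

MODIFIER_EXAMPLES.each do |regexp, expected|
  puts "#{regexp.inspect}: #{regexp.encoding} (expected #{expected})"
end
```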
Source Encoding: US-ASCII

| Regexp | Encoding |
| --- | --- |
| `/garçon/` | invalid multibyte char (US-ASCII) (SyntaxError) |
| `/garçon/u` | invalid multibyte char (US-ASCII) (SyntaxError) |
| `/garçon/e` | invalid multibyte char (US-ASCII) (SyntaxError) |
| `/garçon/s` | invalid multibyte char (US-ASCII) (SyntaxError) |
| `/garçon/n` | invalid multibyte char (US-ASCII) (SyntaxError) |

| Regexp | Encoding |
| --- | --- |
| `/\x80/` | ASCII-8BIT |
| `/\x80/u` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/e` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/s` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/n` | ASCII-8BIT |
| `/gar\xC3\xA7on/` | ASCII-8BIT |
| `/gar\xC3\xA7on/u` | UTF-8 |
| `/gar\xC3\xA7on/e` | EUC-JP |
| `/gar\xC3\xA7on/s` | Windows-31J |
| `/gar\xC3\xA7on/n` | ASCII-8BIT |

| Regexp | Encoding |
| --- | --- |
| `/gar\u{E7}on/` | UTF-8 |
| `/gar\u{E7}on/u` | UTF-8 |
| `/gar\u{E7}on/e` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
| `/gar\u{E7}on/s` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
| `/gar\u{E7}on/n` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |

Source Encoding: ASCII-8BIT

| Regexp | Encoding |
| --- | --- |
| `/garçon/` | ASCII-8BIT |
| `/garçon/u` | regexp encoding option 'u' differs from source encoding 'ASCII-8BIT' (SyntaxError) |
| `/garçon/e` | regexp encoding option 'e' differs from source encoding 'ASCII-8BIT' (SyntaxError) |
| `/garçon/s` | regexp encoding option 's' differs from source encoding 'ASCII-8BIT' (SyntaxError) |
| `/garçon/n` | ASCII-8BIT |

| Regexp | Encoding |
| --- | --- |
| `/\x80/` | ASCII-8BIT |
| `/\x80/u` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/e` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/s` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/n` | ASCII-8BIT |
| `/gar\xC3\xA7on/` | ASCII-8BIT |
| `/gar\xC3\xA7on/u` | UTF-8 |
| `/gar\xC3\xA7on/e` | EUC-JP |
| `/gar\xC3\xA7on/s` | Windows-31J |
| `/gar\xC3\xA7on/n` | ASCII-8BIT |

| Regexp | Encoding |
| --- | --- |
| `/gar\u{E7}on/` | UTF-8 |
| `/gar\u{E7}on/u` | UTF-8 |
| `/gar\u{E7}on/e` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
| `/gar\u{E7}on/s` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
| `/gar\u{E7}on/n` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |

Source Encoding: UTF-8

| Regexp | Encoding |
| --- | --- |
| `/garçon/` | UTF-8 |
| `/garçon/u` | UTF-8 |
| `/garçon/e` | regexp encoding option 'e' differs from source encoding 'UTF-8' (SyntaxError) |
| `/garçon/s` | regexp encoding option 's' differs from source encoding 'UTF-8' (SyntaxError) |
| `/garçon/n` | regexp encoding option 'n' differs from source encoding 'UTF-8' (SyntaxError)<br>/.../n has a non escaped non ASCII character in non ASCII-8BIT script: /garçon/ |

| Regexp | Encoding |
| --- | --- |
| `/\x80/` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/u` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/e` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/s` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/n` | ASCII-8BIT |
| `/gar\xC3\xA7on/` | UTF-8 |
| `/gar\xC3\xA7on/u` | UTF-8 |
| `/gar\xC3\xA7on/e` | EUC-JP |
| `/gar\xC3\xA7on/s` | Windows-31J |
| `/gar\xC3\xA7on/n` | ASCII-8BIT |

| Regexp | Encoding |
| --- | --- |
| `/gar\u{E7}on/` | UTF-8 |
| `/gar\u{E7}on/u` | UTF-8 |
| `/gar\u{E7}on/e` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
| `/gar\u{E7}on/s` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
| `/gar\u{E7}on/n` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |

Source Encoding: EUC-JP

| Regexp | Encoding |
| --- | --- |
| `/garçon/` | EUC-JP |
| `/garçon/u` | regexp encoding option 'u' differs from source encoding 'EUC-JP' (SyntaxError) |
| `/garçon/e` | EUC-JP |
| `/garçon/s` | regexp encoding option 's' differs from source encoding 'EUC-JP' (SyntaxError) |
| `/garçon/n` | regexp encoding option 'n' differs from source encoding 'EUC-JP' (SyntaxError)<br>/.../n has a non escaped non ASCII character in non ASCII-8BIT script: /gar\x{C3A7}on/ |

| Regexp | Encoding |
| --- | --- |
| `/\x80/` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/u` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/e` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/s` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/n` | ASCII-8BIT |
| `/gar\xC3\xA7on/` | EUC-JP |
| `/gar\xC3\xA7on/u` | UTF-8 |
| `/gar\xC3\xA7on/e` | EUC-JP |
| `/gar\xC3\xA7on/s` | Windows-31J |
| `/gar\xC3\xA7on/n` | ASCII-8BIT |

| Regexp | Encoding |
| --- | --- |
| `/gar\u{E7}on/` | UTF-8 |
| `/gar\u{E7}on/u` | UTF-8 |
| `/gar\u{E7}on/e` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
| `/gar\u{E7}on/s` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
| `/gar\u{E7}on/n` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |

Source Encoding: Windows-31J

| Regexp | Encoding |
| --- | --- |
| `/garçon/` | Windows-31J |
| `/garçon/u` | regexp encoding option 'u' differs from source encoding 'Windows-31J' (SyntaxError) |
| `/garçon/e` | regexp encoding option 'e' differs from source encoding 'Windows-31J' (SyntaxError) |
| `/garçon/s` | Windows-31J |
| `/garçon/n` | regexp encoding option 'n' differs from source encoding 'Windows-31J' (SyntaxError)<br>/.../n has a non escaped non ASCII character in non ASCII-8BIT script: /gar\xC3\xA7on/ |

| Regexp | Encoding |
| --- | --- |
| `/\x80/` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/u` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/e` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/s` | invalid multibyte escape: /\x80/ (SyntaxError) |
| `/\x80/n` | ASCII-8BIT |
| `/gar\xC3\xA7on/` | Windows-31J |
| `/gar\xC3\xA7on/u` | UTF-8 |
| `/gar\xC3\xA7on/e` | EUC-JP |
| `/gar\xC3\xA7on/s` | Windows-31J |
| `/gar\xC3\xA7on/n` | ASCII-8BIT |

| Regexp | Encoding |
| --- | --- |
| `/gar\u{E7}on/` | UTF-8 |
| `/gar\u{E7}on/u` | UTF-8 |
| `/gar\u{E7}on/e` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
| `/gar\u{E7}on/s` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
| `/gar\u{E7}on/n` | incompatible character encoding: /gar\u{E7}on/ (SyntaxError) |
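
One way to reproduce rows of the per-source-encoding tables is to evaluate each literal behind a magic encoding comment. The sketch below is an assumption about how such a check could be scripted, not the method used to produce the tables: `regexp_literal_encoding` is a hypothetical helper, and it relies on `Kernel#eval` honoring a leading `# encoding:` comment. It is limited to ASCII-only literals (the `\x` and `\u` rows), since rows with a raw `ç` would also require transcoding the code string.

```ruby
# Hypothetical helper (not part of Prism or CRuby): evaluates a regexp
# literal as though it appeared in a file with the given source encoding.
# Returns the resulting Encoding, or the SyntaxError raised by the parser.
def regexp_literal_encoding(literal, source_encoding)
  # The magic comment on the first line sets the source encoding for eval.
  code = "# encoding: #{source_encoding}\n#{literal}.encoding"
  eval(code)
rescue SyntaxError => e
  # SyntaxError is not a StandardError, so it must be rescued explicitly.
  e
end

# A few rows taken from the tables; single quotes keep escapes literal.
[
  ['/\x80/n',          'US-ASCII'],
  ['/\x80/u',          'US-ASCII'],
  ['/gar\u{E7}on/',    'US-ASCII'],
  ['/gar\xC3\xA7on/e', 'UTF-8']
].each do |literal, source_encoding|
  result = regexp_literal_encoding(literal, source_encoding)
  label = result.is_a?(SyntaxError) ? 'SyntaxError' : result.name
  puts "#{source_encoding}: #{literal} => #{label}"
end
```

Error message wording may differ between Ruby versions and parsers, so the sketch reports only whether a `SyntaxError` was raised.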