elm-lang/core/#378 surfaced an interesting tension:
- An unsafe regex-compiling function like the current `regex` function can crash at runtime if given invalid syntax. This is a serious problem when you want to support arbitrary regexes coming in from end users.
- A version that returned `Result` would neatly handle that case, but would be very inconvenient in the common case where you're hardcoding the regex and know it will definitely compile. You would either have to unsafely extract the `Result` or else do a lot of unnecessary defensive programming for a case that can't come up.
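To make the tension concrete, here is a minimal TypeScript sketch against the JavaScript `RegExp` constructor. The `Result` type and the names `unsafeRegex`, `safeRegex`, and `isBlank` are hypothetical, chosen only for illustration; this is not how Elm's `regex` function is actually implemented.

```typescript
// Hypothetical Result type, loosely mirroring Elm's `Result error value`.
type Result<E, T> =
  | { tag: "Err"; error: E }
  | { tag: "Ok"; value: T };

// Unsafe: throws a SyntaxError at runtime if `source` is not valid regex syntax.
function unsafeRegex(source: string): RegExp {
  return new RegExp(source);
}

// Safe: catches the SyntaxError and returns it as a value instead.
function safeRegex(source: string): Result<string, RegExp> {
  try {
    return { tag: "Ok", value: new RegExp(source) };
  } catch (e) {
    return { tag: "Err", error: e instanceof Error ? e.message : String(e) };
  }
}

// The hardcoded case: the author knows this pattern compiles,
// yet still has to write a branch for an error that cannot occur.
const whitespace = safeRegex("^\\s+$");

const isBlank = (s: string): boolean =>
  whitespace.tag === "Ok" ? whitespace.value.test(s) : false; // dead branch
```

The safe version is what you want for user-supplied patterns; the dead branch in `isBlank` is the defensive programming described above.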
There's a third option: do what other languages do and offer regex literals. The validity of their syntax can be checked at compile time.
isWhitespace : Regex
isWhitespace =
/^\s+$/
Given the ability to easily create a `Regex` that cannot crash, there is no downside to making the `regex` function return a `Result`, which neatly solves elm-lang/core/#378.
- Emitting JavaScript RegExp literals would improve performance according to MDN.
- Hardcoded regexes can no longer throw runtime exceptions. Although those exceptions typically arrive promptly on startup, like a port error, they might not if the regular expression is instantiated deep in some nested conditionals.
- Given verified literals, using `Result` to solve elm-lang/core/#378 has no downside.
- Syntax highlighters for regex literals can improve source readability. See for example the highlighting in the above snippet.
- It's one more feature, increasing language complexity.
- This feels like a minor pain point so far, so the implementation time has a real cost: other language features would not be worked on in the meantime.
- It might be difficult to check JS regexp syntax at compile time with sufficient accuracy to guarantee that it won't fail to compile at runtime in any relevant browsers.
Regular expressions are a common enough tool in industry programming that many languages offer first-class support for them, such as:
A solution like seems sensible to me, leaving `regex` to be used to handle dynamic regexes and get error messages, and `/[abc]/` syntax for static regexes. This syntax would undoubtedly replace usage of `regex` in most cases and for most users.
If Elm continues using JavaScript's regex engine (and there's no reason not to), then the error messages should be as helpful as the current runtime error; they should not differ at all.
There are some issues with using slashes as the syntax for regexes, for example
A naive implementation may try to grab `= True && isZombie` as a RegExp, which is valid in JS, but a well-defined syntax will be able to deal with that.
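As a rough illustration of the ambiguity, the TypeScript sketch below shows a naive scanner mistaking the text between two division operators for a regex literal, next to the previous-token rule that JavaScript lexers commonly use to tell the two apart. The function names and the sample source line are hypothetical. The rule is only a starting point: Elm applies functions by juxtaposition, so something like `contains /a/ s` would need a stricter rule than this one.

```typescript
// Naive scan: anything between two slashes is assumed to be a regex literal.
function naiveRegexLiterals(line: string): string[] {
  const found: string[] = [];
  const pattern = /\/([^\/]+)\//g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(line)) !== null) {
    found.push(m[1]);
  }
  return found;
}

// Previous-token rule: a slash that directly follows an operand
// (identifier, number, closing bracket) is treated as division;
// in any other position it may start a regex literal.
function slashStartsRegex(textBeforeSlash: string): boolean {
  const prev = textBeforeSlash.replace(/\s+$/, "");
  if (prev === "") return true; // start of an expression
  return !/[A-Za-z0-9_)\]}]$/.test(prev);
}

// Hypothetical source line containing two divisions and no regex at all.
const line = "ratio = total / count + extra / 2";

// The naive scan reports " count + extra " as a regex body.
const mistaken = naiveRegexLiterals(line);

// The previous-token rule rejects the first slash (it follows `total`)...
const afterOperand = slashStartsRegex("ratio = total ");
// ...but accepts a slash that follows `=`.
const afterEquals = slashStartsRegex("isWhitespace = ");
```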