Skip to content

Instantly share code, notes, and snippets.

@tabatkins
Last active March 24, 2021 15:50
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tabatkins/5c336285c7f6d7d6522561e97f4d98fb to your computer and use it in GitHub Desktop.
Save tabatkins/5c336285c7f6d7d6522561e97f4d98fb to your computer and use it in GitHub Desktop.
Matcher clauses

JS Pattern-Matching Construct

Building on the previous work by Kat Marchán.

Comparison with Python's newly-accepted pattern-match syntax.

Core Syntax

match (<expr>) {
	when <matcher> -> <match-body>;
	when <matcher> if (<expr>) -> <match-body>;
	if(<expr>) -> <match-body>;
	else -> <match-body>;
};

The <expr> is any JS expression. It's evaluated first, and its resultant value (the "match value") is compared to each of the when statements until one is found that matches.

<matcher>s are described below. Additionally, each clause can have an if (<expr>) suffix on its matcher; if the <matcher> has succeeded, the <expr> is evaluated after the matchers have fully run (with the same bindings as described below for the <match-body>), and the <matcher> only succeeds if the expression result is truthy.

An if (<expr>) can also be used on its own, without a preceding when <matcher> clause; if its <expr> is truthy, the match succeeds.

<match-body> is an expression (possibly a do-expr, to handle multi-line bodies). The first <matcher> that successfully matches causes its corresponding <match-body> to be executed, and the result of the <match-body> evaluation is returned as the value of the case block.

Within the <match-body>, additional variable bindings from the <matcher>s are introduced, to give the <match-body> access to parts of the matched value.

If no matchers match, but there's an else clause at the end, its <match-body> is executed. If no matchers match and there is not an else clause, the statement raises an error (type TBD).

(Precise syntax of the match block is not important here, and I'm happy to alter the spelling of virtually anything here.)

Matcher Clauses

There are several types of matcher clauses.

"Leaf clauses", that just operate directly on the match value:

  1. Primitive (and near-primitive) matchers, like 1 or false or undefined or -Infinity or "foo". These match if the match value is SameValueZero with them. They do not introduce a binding.

    (The set of near-primitive matchers is predefined. It's not an arbitrary expression: -Infinity is allowed, but -1 * Infinity is not.)

    These can be followed by an as ... clause, as detailed below, to bind the value to a variable.

  2. The nil matcher, spelled _. This always matches, but does not introduce a binding.

    (This is a nice-to-have, and could be dropped from v1, but it's a feature common to every pattern-matching feature I'm aware of.)

    (I'm also open to other spellings.)

  3. Ident matchers, like foo. These always match, and bind the match value to the given name.

  4. Regex-literal matchers, taking a regex literal. The match value is stringified, and this clause matches if the regex matches. If the regex defines named capture groups, the names are automatically bound to the matched substrings.

    (This is a nice-to-have, enabling a common use-case with simple-to-understand binding ergonomics. It can be dropped from v1.)

"Nesting clauses", letting you destructure a value and match sub-parts of it with more matcher clauses:

  1. Array-literal matchers, like [a,b] or [0, ...c] or [1,2,...].

    These contain a comma-separated list of zero or more matcher clauses, possibly ending in either a ... or ...ident.

    This matcher first verifies that the match value is iterable, then obtains and consumes the entire iterator. If the result doesn't have enough values for the provided match clauses, the match fails. If the matcher doesn't end with a ... pattern and the iterator has leftover values after the provided match clauses, the match fails (so [a,b] only matches things with exactly two items). If then recursively applies the match clauses to the corresponding items from the iterator, matching only if all of the child clauses match.

    (See Potential Issues, below, for more discussion of exactly how array-literal matchers should work with iterators.)

    It accumulates the bindings from each child matcher clause, and if it ends in a ...ident, binds the remainder of the iterator's values in a fresh Array to that ident as well.

    These can be followed by an optional as <ident>, which binds the entire consumed iterator, repackaged into an array, to the <ident> like when [head, ...tail] as list -> ....

    (The naked ... is a nice-to-have, as a simple way to relax the length requirement that array-literal patterns naturally express. It could be written as ..._ instead, so this could be dropped from v1.)

    (Should as ... clauses be limited to an ident, or allow a full destructuring pattern?)

  2. Object-literal matchers, like {a, b} or {op: "+", lhs, rhs}.

    Contains a comma-separated list of <ident> or <key>: <matcher> entries. A key is either an ident, like {foo:...}, or a computed-key expression, like {[Symbol.foo]:...}.

    An <ident> key by itself (not followed by a :<matcher>), is treated as if it was followed by an ident matcher of the same name: {foo} and {foo:foo} are equivalent.

    The matcher requires the key to exist on the match value; if it's missing, the match fails. The value of the key is then matched against the <matcher> provided after the key; if that fails, the match fails.

    Like array-literal matchers, the object-literal matcher can also contain a final ...ident matcher, which creates a fresh Object containing all the keys of the match value that weren't explicitly matched, and binds it to the provided ident. (Or is this subsumed by the generic as clause, letting you still get access to the original match value?)

    It accumulates bindings from each <matcher> clause.

    Like array-literal matchers, these can be followed by an optional as ... clause, which binds the entire matched object like when {prop, ...rest} as obj -> ....

  3. Custom matchers, indicated with a ^ sigil prefix.

    Idents, dotted or bracketed exprs, and/or function calls can be nakedly prefixed: ^foo, ^foo(), ^foo.bar[0], etc. Anything more complex must be wrapped in parens and prefixed: ^(foo + 1), ^(await foo), ^(new Foo()), etc. (I don't know the correct grammar terms to express this, but they should exist.) This evalutes the prefixed expression.

    If the result is a primitive type (a string, number, etc), does a SameValueZero test against the match value, same as with a primitive literal matcher. This lets you easily test against the value of a variable:

    const LF = 0x0a;
    const CR = 0x0d;
    case(ch) {
      when ^LF | ^CR -> {...handle newlines...}
      ...
    }

    This form introduces no bindings.

    Otherwise it invokes the matcher protocol on the result: it fetches the [Symbol.matcher] key from the result, verifies it's a function, and calls it with the match value. (If it's not a function, it throws a runtime error.)

    If the result is null/undefined, fails the match. (I'm currently using null/undef as the sentinel meaning "doesn't match", on the argument that it's simple and it's unlikely that a successful match will want to return that as the match result. But I'm okay with making this a more formal thing like iteration results, where you return a {match:true|false, value:any} object, if the committee prefers. The current design makes it fairly easy to reuse existing functions as matchers without requiring them to be explicitly redesigned for this purpose, tho.) Otherwise, the match is conditionally successful.

    In either case, no bindings are introduced by the expression, tho the as and with clauses following it can generate bindings.

    Following the sigil'd expression, you can use an optional as ... clause, interpreted the same as for previous matchers, to bind the match value before it's consumed and transformed by the match protocol. (Same as how [a,b] as c binds the match value before it's destructured as a sequence.)

    Following that, you can use an optional with <matcher> clause to continue evaluating matchers against the match value after it's consumed and transformed by the match protocol. If that succeeds, the match is successful, introducing bindings as normal; if the <matcher> clause fails, the overall match fails.

    For example:

    class FirstLast {
    	[Symbol.matcher](matchVal) {
    		const pieces = matchVal.split(" ");
    		if(pieces.length == 2) {
    			return pieces;
    		}
    	}
    }
    
    match("Tab Atkins-Bittner") {
    	when ^FirstLast as x with [first, last] -> ...;
    }

    In this example, x ends up bound to "Tab Atkins-Bittner", the match value at the start of the custom matcher, and then the result of running the custom matcher (a 2-element array) is matched against an array-matcher which binds first to "Tab" and last to "Atkins-Bittner" However:

    match("Tab Atkins-Bittner") {
    	when ^FirstLast with {first, last} -> /* fails */;
    }

    would fail to match; the custom matcher protocol succeeds, returning a 2-element array, but then the with {first, last} fails because that array does not have first or last properties on it.

    As another example, regex objects could be a custom matcher, which just returns the .exec() method as its [Symbol.matcher]: when (MyCustomRegex("foo(b.+)")) with [_, bplus] -> bplus matches "foobar", binding "bar" to the bplus variable, but would fail against "fooqux".

    (Note that I also have regex literals as a leaf matcher, to address the most common case, but it's probably useful to allow getting at the match object this way, too.)

"Multi-clauses", letting you join multiple matchers together:

  1. <matcher> | <matcher> runs each child matcher in order until it finds one that succeeds; if at least one succeeds, the matcher clause as a whole is treated as succeeding.

    Bindings are accumulated from all of the child matchers, even if they're not run because an earlier one succeeded, but values are taken only from the successful matcher; the rest are bound to undefined.

  2. <matcher> & <matcher> runs each child matcher in order until it finds one that fails; if they all succeed, the matcher clause as a whole is treated as succeeding.

    Bindings are accumulated from all of the child matchers, with later matchers overriding same-named bindings from earlier ones.

Either of these can be wrapped in parentheses and optionally followed by an as ... clause, like when ["go", ("north" | "south" | "east" | "west") as direction] -> ...;.

(Without the parens, the as binds more tightly; when {first} | [first] as x -> ...; is interpreted as when {first} | ([first] as x) -> ...;, only binding the match value to x when it's a sequence, not when it's an object.)

There is no precedence relationship between | and &, so they cannot be mixed at the same expression level; when 1 | 2 & 3 -> ...; is a syntax error, and must instead be written as either when (1 | 2) & 3 -> ...; or when 1 | (2 & 3) -> ...;.

(Multi-clauses are a strong nice-to-have; they can be dropped from v1 if necessary.)

Potential Issues

  1. Currently, if is only allowed at the top-level of when, as that gives an easy answer for what bindings should be visible to it (same as for the RHS). There's reasonable use-cases to put guards on individual matchers, but I'm unsure what bindings they would see.

  2. For optional/unused bindings that nevertheless get accumulated, should they be bound to undefined or TDZ'd? I think undefined is what we want, so you can actually use the binding without risking an error, right?

  3. Should the thing after ... in array/object literal matchers be allowed to be any matcher clause, rather than just an ident matcher? Would allow for "if there's more values, destructure it like this", which could be cool, but might be weird? Like, allowing the last two items to be optional and binding them would be spelled [a, b, ...[c, ...[d]].)

  4. Related, should object-literal matchers gain the ability to mark a key as optional, like {foo?:...}? If the key exists on the match value, the <matcher> is run and can fail the match, but if not, that's okay?

    Destructuring has a closely-related ability where you can put an = <default> following the key/val pattern, and it'll do further nested destructuring based on the default value if the key was missing. That doesn't quite work here, but it's the same idea.

  5. Precisely how should the array-literal matcher semantics work for iterators? You can destructure an iterator into an array destructuring pattern, so it has to work in matchers as well, but precisely how?

    Notably, you can have multiple array patterns in a match statement: you don't want the first [a,b,c] pattern to consume the first three elements and a following [a,b] to consume the fourth and fifth. This implies we need to cache results across a match statement. Do we only do this caching for top-level patterns? Or do we somehow do it for nested patterns too? (I don't see how we could do it for nested patterns that descend from a custom matcher, since the matcher protocol can return anything.)

    Also, how much should we consume from the iterator? I think it's a requirement that [a,b] be an implied "length is exactly 2" test; this is different from destructuring, but every pattern-matching feature I know of works this way. To tell this, we need to consume one more than the length, to know whether there's more left or not.

    Also, should we ensure a constant amount of consumption? If I have when {foo} -> ...; when [a,b,c] -> ...;, and my iterator object matches the first clause, will any element be consumed from it? If so, that means the amount of consumption depends on how far it gets thru the pattern, which could be weird. Maybe it's better to pull off the maximum amount needed for any pattern, cache that, and use the cache. This would also ensure that all the consumption happens at one point, rather than observably spread out over time as we attempt to match each clause, with other computations possibly happening between them.

    This ties into the final question: when [head, ...rest] -> rest instanceof Array should always return true, right? That is, a ... item ensures we consume the entire list and stash it into an array for the binding, rather than somehow setting rest to the partially-consumed iterator? After all, I have no idea how we would do that with something like when [a,b,c,d,e] -> ...; when [head, ...rest] -> ...; - you've already consumed six items from the iterator to test the first clause, you can't "put them back" for the second.

    And building on that, should a naked ... (no binding ident) consume anything? By itself, it just relaxes the length constraint: [a,b] means "must have exactly two items", while [a,b, ...] means "must have two or more items", so technically it doesn't require us to consume a third item at all. Could be useful to mandate this, to help with controlling the precise amount consumed from iterators.

  6. Currently we have a divide between naming bindings directly with ident matchers and naming bindings with as ... clauses.

    This divide is good because it lets the array/object matcher patterns look as much like destructuring as possible. It's bad because there are two ways to do things, and it technically prevents us from ever adding new named primitives to JS. (Currently, we could add new names for predefined primitive like DoubleInfinity by making them overridable, like undefined is. This is no longer possible if the proposed matcher syntax is adopted, as the name will already be possibly poisoned by usage as an ident matcher. We'd have to exclude that new primitive from being used directly, and instead mandate that it be done as an expr-matcher like ^DoubleInfinity)

    One way to fix the divide is to follow PEP 642's (rejected) syntax of making all bindings done thru as patterns, so to bind the first value of an array to foo you'd write when [as foo, ...]. This breaks the nice symmetry with destructuring patterns, tho.

    Another way to fix it is to remove primitive matchers and just handle that use-case via expr-matchers. Instead of when {code:200} -> ..., do when {code:^200} -> .... We might want to slightly expand the grammar of what's allowable without parens, so ^-1 and even ^-Infinity are allowed.

    Or we just leave it as it is, because the set of predefined names for primitives is fixed in practice anyway and we don't expect to ever extend it.

  7. Async match clauses seem reasonable. If used in an async function, await in the match-body is already allowed, of course, but having a match(){} that evaluates to a promise, a la async-do expressions, seems great to have outside of async functions as well. (As with async-do, this is sugar over wrapping it in an IIAFE, but I'm always of the opinion that this is a worthwhile sugar, as IIFE/IIAFEs are terrible.)

    Two possibilities here: an async match(){} rule, where the match bodies can all use await if they want, and we Promise.resolve() the result; or a match(){ when ... -> async do {...};, just relying on async-do to exist and handle this for us. The former has the benefit that it allows for async match(await foo()){...}, while the latter requires another wrapping async-do to accomplish await-in-head.

  8. A top-level unconditional match without an if() clause (like when foo -> ...;) will always match, same as an else branch. Since else is limited to occurring only once and must be at the end, should we restrict the unconditional-match case the same way?

    I'm lightly opposed to this, because it means that you can't, in the course of authoring or debugging your rule, ever temporarily change a branch to be just when foo temporarily (so you can console.log() it, for example, to see what's failing the preceding branches). You'd have to remember to write it as when foo if(true) -> ...; which seems gratuitous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment