@tabatkins
Last active September 29, 2023 22:43
Pattern Matching, 2023-07-10

Pattern Matching Rewrite

Intro

This proposal has several parts:

  1. A new syntax construct, the "matcher pattern", which is an elaboration on (similar to but distinct from) destructuring patterns.

    Matcher patterns allow testing the structure of an object in various ways, and recursing those tests into parts of the structure to an unlimited depth (similar to destructuring).

    Matcher syntax intentionally resembles destructuring syntax but goes well beyond the abilities and intention of simple destructuring.

  2. A new binary boolean operator, is, which lets you test values against matchers. If the matcher establishes bindings, this also pulls those bindings out into the operator's scope.

  3. A new syntax construct, the match() expression, which lets you test a value against multiple patterns and resolve to a value based on which one passed.

Matcher Patterns

Destructuring matchers:

  • array matchers:

    • [<matcher>, <matcher>] exactly two items, matching the patterns
    • [<matcher>, <matcher>, ...] two items matching the patterns, more allowed
    • [<matcher>, <matcher>, ...let <ident>] two items matching the patterns, with remainder collected into a list bound to <ident>. (Can use const or var as well; see "binding matchers". Only binding matchers allowed in that position; not anything else.)
  • object matchers:

    • {<ident>, <ident>} has the ident keys (in its proto chain, not just own keys), and binds the value to that ident. Can have other keys. (aka {a} is identical to {a: let a})
    • {<ident>: <matcher>, <ident>: <matcher>} has the ident keys, with values matching the patterns. Can have other keys.
    • {<ident>: <matcher>, ...let <ident2>} has the ident key, with values matching the pattern. Remaining own keys collected into an object bound to <ident2>.
  • binding matchers:

    • let <ident>/const <ident>/var <ident>. Binds the matchable to the ident. (That is, [let a, let b] doesn't test the items in the array, just exposes them as a and b bindings.)
    • (To bind a matchable and apply more matchers, use and to chain them: let a and [b, c].)
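As a rough illustration of the structure checks above, here is a sketch in today's JavaScript. The helpers matchArray and matchObject are hypothetical names, not part of the proposal. The sketch assumes array matchers only accept real arrays (the proposal may accept other iterables) and that the object rest collects own enumerable string keys.

```javascript
// Hypothetical helpers emulating the structure checks; not proposal API.

// itemTests: one predicate per listed matcher.
// open = true emulates a trailing `...` (extra items allowed).
function matchArray(value, itemTests, open = false) {
  if (!Array.isArray(value)) return null; // assumption: real arrays only
  const lengthOk = open
    ? value.length >= itemTests.length
    : value.length === itemTests.length;
  if (!lengthOk) return null;
  if (!itemTests.every((test, i) => test(value[i]))) return null;
  // `rest` is what `...let <ident>` would bind.
  return { rest: value.slice(itemTests.length) };
}

// keyTests: { key: predicate }. Keys are looked up with `in`,
// so inherited keys count; extra keys are always allowed.
function matchObject(value, keyTests) {
  const objectLike =
    (typeof value === "object" && value !== null) || typeof value === "function";
  if (!objectLike) return null;
  const keys = Object.keys(keyTests);
  for (const key of keys) {
    if (!(key in value)) return null;
    if (!keyTests[key](value[key])) return null;
  }
  // Assumption: the rest object collects own enumerable string keys only.
  const rest = {};
  for (const key of Object.keys(value)) {
    if (!keys.includes(key)) rest[key] = value[key];
  }
  return { rest };
}

const any = () => true; // stands in for a bare binding matcher like `let a`
const obj = Object.assign(Object.create({ a: 1 }), { b: 2, c: 3 });

console.log(matchArray([1, 2, 3], [any, any]));       // fails: exactly two items required
console.log(matchArray([1, 2, 3], [any, any], true)); // passes, rest holds the third item
console.log(matchObject(obj, { a: any, b: any }));    // passes: `a` is found on the prototype
```

Failure is reported as null here so that a successful match with no bindings (an empty object) stays distinguishable.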

Value-testing matchers:

  • literal matchers:

    • 1,
    • "foo",
    • etc. All the primitives, plus (untagged only?) template literals.
    • also unary plus/minus
    • -0 and +0 test for the properly-signed zero, 0 just uses === equality.
    • NaN tests for NaN properly.
  • variable matchers

    • <plain-or-dotted-ident> evaluates the name.

      If the name has a custom matcher (see below), it passes the matchable to the custom matcher function and matches if that succeeds. Otherwise, it just matches based on equality. (Uses === semantics, except that NaN is matched properly.)

    • <plain-or-dotted-ident>(<matcher-list>) evaluates the ident, grabs its Symbol.matcher property, then invokes it on the matchable. (Throws if it doesn't have a Symbol.matcher property, or it's not a function.) If that succeeds, it further matches the result against the arglist, as if it was an array matcher.

       Option.Some(foo) example goes here
      
  • regex matchers:

    • /foo/ matches if the regex matches. Named capture groups establish let bindings.
    • /foo/(<matcher-list>) is identical to custom matcher - if the regex matches, then the match result (the regex match object) is further destructured by the matcher list.
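The equality rules and the custom matcher lookup above can be sketched in current JavaScript. Everything named here is hypothetical: customMatcher is a local stand-in for the proposal's well-known symbol (no engine defines it), and literalMatches, variableMatches, extract, and the Some class are made up for illustration. The sketch also assumes a custom matcher signals failure with a falsy result and success with an iterable, which the notes above do not spell out.

```javascript
// Local stand-in for the proposed well-known symbol; an assumption, not a real API.
const customMatcher = Symbol("customMatcher");

// Equality for literal and variable matchers: NaN matches NaN,
// `-0`/`+0` patterns test the signed zero, everything else is `===`.
// signedZero marks a pattern written with an explicit sign.
function literalMatches(pattern, value, { signedZero = false } = {}) {
  if (Number.isNaN(pattern)) return Number.isNaN(value);
  if (signedZero && pattern === 0) return Object.is(pattern, value);
  return pattern === value;
}

// <plain-or-dotted-ident>: use the custom matcher if present, else equality.
function variableMatches(target, value) {
  const objectLike =
    (typeof target === "object" && target !== null) || typeof target === "function";
  if (objectLike && typeof target[customMatcher] === "function") {
    return Boolean(target[customMatcher](value));
  }
  return literalMatches(target, value);
}

// <plain-or-dotted-ident>(<matcher-list>): the custom matcher is required,
// and its result is then treated like an array to be matched further.
function extract(target, value) {
  const fn = target?.[customMatcher];
  if (typeof fn !== "function") throw new TypeError("target has no custom matcher");
  const result = fn.call(target, value);
  return result ? [...result] : null;
}

// A hypothetical Option-like class with a custom matcher.
class Some {
  #value;
  constructor(value) { this.#value = value; }
  static [customMatcher](subject) {
    const isSome = typeof subject === "object" && subject !== null && #value in subject;
    return isSome ? [subject.#value] : false;
  }
}

console.log(literalMatches(0, -0));                        // plain 0 uses ===
console.log(literalMatches(-0, 0, { signedZero: true }));  // signed zero is strict
console.log(extract(Some, new Some(5)));                   // items to match against the arglist
```

With this shape, Some(let x) would amount to calling extract(Some, value) and then running an array matcher over the result.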

Boolean matcher logic:

  • <matcher> and <matcher>: Tests the matchable against both matchers (in order), succeeds only if both succeed. Accumulates bindings from both. If first fails, short-circuits.
  • <matcher> or <matcher>: Tests the matchable against both matchers (in order), succeeds if either succeeds. Accumulates bindings from both, but values only from the first successful matcher (other bindings become undefined). If first succeeds, short-circuits.
  • not <matcher>: Tests the matchable against the matcher, succeeds only if the matcher fails. No bindings.
  • Matchers can be parenthesized, and must be if you're using multiple keywords; there is no precedence relationship between the keywords, so it's a syntax error to mix them at the same level.
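A small sketch of how the binding rules for and / or / not could be modelled in plain JavaScript. The representation is an assumption made for illustration: each matcher is an object with a static list of binding names plus a run function that returns a bindings object or null. The helper names (matcher, bind, test, and, or, not) are hypothetical.

```javascript
// Hypothetical model: a matcher knows its binding names up front,
// and run(value) returns a bindings object, or null on failure.
const matcher = (names, run) => ({ names, run });

const bind = (name) => matcher([name], (value) => ({ [name]: value }));
const test = (predicate) => matcher([], (value) => (predicate(value) ? {} : null));

// and: both must pass, in order; bindings from both are kept.
const and = (left, right) =>
  matcher([...left.names, ...right.names], (value) => {
    const a = left.run(value);
    if (a === null) return null; // short-circuit: right is never run
    const b = right.run(value);
    return b === null ? null : { ...a, ...b };
  });

// or: first success wins; names from both sides exist,
// but only the winning side supplies values.
const or = (left, right) => {
  const names = [...new Set([...left.names, ...right.names])];
  return matcher(names, (value) => {
    const hit = left.run(value) ?? right.run(value); // right only runs if left failed
    if (hit === null) return null;
    const bindings = {};
    for (const name of names) bindings[name] = hit[name]; // undefined when absent
    return bindings;
  });
};

// not: inverts success and never exposes bindings.
const not = (inner) => matcher([], (value) => (inner.run(value) === null ? {} : null));

const isString = (v) => typeof v === "string";
const isNumber = (v) => typeof v === "number";
const stringOrNumber = or(
  and(test(isString), bind("s")),
  and(test(isNumber), bind("n")),
);

console.log(stringOrNumber.run(5));    // n is bound, s exists but is undefined
console.log(stringOrNumber.run(true)); // null: neither side matched
```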

Using Matchers

  • New match(){} expression:

     match(<val-expr>) { 
     	when <matcher>: <result-expr>; 
     	default: <result-expr>;
     }

    Find the first "arm" whose matcher passes, given the val. Evaluates to the corresponding result for that arm. The matcher can produce bindings that are visible within the matcher and within the result; they don't escape the arm they're established in. (Are var matchers allowed or disallowed?)

    default arm always matches. If no arm matches, throws.

  • New is operator

     <val-expr> is <matcher>

    Evaluates to true/false if val passes the matcher or not. If the matcher has binding patterns, within the matcher they behave as normal; see below for behavior outside of the matcher.

    Doing it manually with match() would be:

     let passes = match(<val-expr>) {
     	when <matcher>: true;
     	default: false;
     }
  • When is is used and the matcher establishes bindings:

    • In if(), the bindings are lifted to a scope immediately outside the if() block, encompassing the following else as well. (Likely, we define an analogous scope to what for(of) uses.) Lexical bindings are TDZ if the matcher doesn't match. var bindings simply don't set a value if the matcher doesn't match.

      (Bindings will often not be useful in the else, but will be in cases like if(!(x is <matcher>)){...}else{...}, where the matcher successfully matches but the if fails.)

    • In while() and do{}while(), same behavior. (In do{}while(), lexical bindings are TDZ on the first iteration.)

    • In for-of, the bindings exist in the current outer for scope, same as any other bindings established in the for head.

      (TODO: write an example of for-of usage; I'm not clear how it's supposed to work.)
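To make the evaluation order of match() and the if/else scoping of is more concrete, here is an emulation in current JavaScript. It is only a sketch under assumptions: a matcher is modelled as a function returning a bindings object or null, match, otherwise, describe, and firstItemOrFallback are hypothetical names, and the thrown error type is a guess since the notes only say "throws". A plain let cannot reproduce the TDZ behavior, so an unmatched binding is simply undefined here.

```javascript
// Emulates match(): the first arm whose matcher passes decides the result.
// arms: array of [matcher, result] pairs; bindings are only visible to that arm's result.
function match(value, arms) {
  for (const [matcher, result] of arms) {
    const bindings = matcher(value);
    if (bindings !== null) return result(bindings);
  }
  throw new TypeError("match: no arm matched"); // assumption: exact error is unspecified
}

const otherwise = () => ({}); // plays the role of `default`: always matches

const describe = (v) =>
  match(v, [
    [(x) => (typeof x === "number" ? { n: x } : null), ({ n }) => `number ${n}`],
    [
      (x) => (Array.isArray(x) && x.length === 2 ? { a: x[0], b: x[1] } : null),
      ({ a, b }) => `pair ${a},${b}`,
    ],
    [otherwise, () => "something else"],
  ]);

// Emulates `if (!(x is [let first, ...])) {...} else {...}`:
// the binding lives in a scope around the whole if/else,
// so it is usable in the else branch when the matcher passed.
function firstItemOrFallback(x) {
  let first; // stands in for the lifted binding
  const matched = Array.isArray(x) && x.length >= 1 && ((first = x[0]), true);
  if (!matched) {
    return "fallback";
  } else {
    return `first item: ${first}`;
  }
}

console.log(describe(3));
console.log(describe([1, 2]));
console.log(firstItemOrFallback([7, 8]));
```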

(We've lost matchers in plain let/etc statements, which I guess also means we lose matchers in function arglists. Unfortunate.)

@rbuckton

  • can probably remove the "named capture groups create bindings" from regexes, matching {groups} is easy (this would mean that there's no difference between regex literals and regex variables, which is nice)

I would feel comfortable with "named capture groups create bindings" for regexes, but only if it were somehow opt-in, i.e., via a special RegExp flag that is only valid in pattern matching (but ignored or an error in a normal RegExp literal or via the RegExp constructor). That way, refactoring the first form below into the second produces an error instead of silently changing which bindings exist:

// NOTE: using 'L' as a substitute for the flag to mean "introduce a 'let' binding"
if (x is /(?<foo>foo|bar)/L) {
  foo; // either "foo" or "bar"
}

const reFoo = /(?<foo>foo|bar)/L; // error, cannot use `L` flag
if (x is reFoo) {
  foo; // error
}

We don't want RegExp patterns to become the new with, arbitrarily changing the lexical environment based on an expression.

@rbuckton

// NOTE: using 'L' as a substitute for the flag to mean "introduce a 'let' binding"
if (x is /(?<foo>foo|bar)/L) {
  foo; // either "foo" or "bar"
}

const reFoo = /(?<foo>foo|bar)/L; // error, cannot use `L` flag
if (x is reFoo) {
  foo; // error
}

I could imagine an automatic refactoring in an editor that would refactor this via two steps:

  1. Refactor /(?<foo>foo|bar)/L into an extractor pattern:
    // from:
    // if (x is /(?<foo>foo|bar)/L) ...
    // to:
    if (x is /(?<foo>foo|bar)/({ groups: { foo: let foo } })) ...
  2. Extract the RegExp into a local variable:
    // from:
    // if (x is /(?<foo>foo|bar)/({ groups: { foo: let foo } })) ...
    // to:
    const reFoo = /(?<foo>foo|bar)/;
    if (x is reFoo({ groups: { foo: let foo } })) ...
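For comparison, this is roughly what the refactored form corresponds to in JavaScript that runs today, assuming the regex extractor hands the matcher list the same result object that RegExp.prototype.exec returns. The function names fooOf and rangeOf and the second regex are made up for the example.

```javascript
// Named group: the ({ groups: { foo: let foo } }) part becomes ordinary destructuring.
const reFoo = /(?<foo>foo|bar)/;

function fooOf(x) {
  const result = reFoo.exec(x);
  if (result === null) return undefined; // the `is` test failed
  const { groups: { foo } } = result;
  return foo;
}

// Numbered groups: the match result is array-like, so positions work too.
const reRange = /(\d+)-(\d+)/;

function rangeOf(x) {
  const result = reRange.exec(x);
  if (result === null) return undefined;
  const [, start, end] = result; // index 0 is the whole match
  return [Number(start), Number(end)];
}

console.log(fooOf("a bar b"));
console.log(rangeOf("10-20"));
```

Because the bindings come from explicit destructuring rather than from the regex itself, moving the regex into a variable does not change which names are in scope.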

@tabatkins
Author

Results from 21-08-2023 discussion:

  • talked about predicate vs the default class matcher - default class is intended to be installed on Function.prototype right now, which blocks us from invoking functions directly.

    • option 1 - have class syntax install a default matcher, like it installs a default constructor. Then functions without a Symbol.matcher method are just invoked directly.
    • option 2 - bless one of the cases directly, have a keyword that invokes the other
    • Plan is to put 1 in the spec, with 2 listed as an alternate in case of objections
  • Jordan concerned about if-bindings being visible to else, we had some pushback on previous attempts with doing bindings in if heads tc39/proposal-Declarations-in-Conditionals#3

    • Tab thinks this is the right behavior, both for matchers and for if-head bindings in general, so we should push for this.
    • Jordan is okay with this but we should be prepared to have to change things.
  • Bikeshedding name of Symbol.matcher

    • Jordan hates Symbol.unapply
    • Symbol.extract is a maybe (since this is also used by extractors), tho extractors are the one use of this that doesn't have a name showing up in the syntax
    • Symbol.is is too close to Object.is
    • Settled on Symbol.customMatcher for now - unambiguous and not too long. Will leave a bikeshedding note in the spec with other option names.
  • Inclined to drop the "named groups establish bindings" from regexes, now that they're easy to get from the extractor syntax, since the committee gave some pushback last time.

    • Jordan conditionally okay. Wants to see examples of both numbered and named groups in the spec.

@nmn

nmn commented Sep 19, 2023

I wrote a long, almost identical proposal in an issue on the original repo. Now that I've seen this, I would like to propose two small modifications and a few questions:

Let's use ... in Object matchers too?

[<matcher>, <matcher>] matches arrays with exactly two elements. [<matcher>, <matcher>, ...] can be used to match any array with at least two elements.

However, {<ident>: <?matcher>, <ident>: <?matcher>} matches any object with at least those two keys. I understand things get a bit complicated with prototypes, but I feel like this matcher should not match for objects that have additional "own" keys.

e.g. {name: 'John Doe', age: 30} should not match the matcher { name } since it has additional "owned" and "enumerable" keys. { name, ... } should be allowed to match objects that may contain extra keys.

This change makes the whole system more consistent IMO.

Syntax for matching instances of classes

We should support a matcher that looks like Person { name }, which works exactly the same as the object matcher { name } but also checks that the value being matched is an instance of Person.

All proposed changes

Here's what I would add to the proposal above:

  • object matchers:

    • {<ident>, <ident>} has the ident keys (in its proto chain, not just own keys), and binds the value to that ident. Can not have other own keys. (aka {a} is identical to {a: let a})
    • {<ident>, <ident>, ...} has the ident keys (in its proto chain, not just own keys), and binds the value to that ident. Can have other own keys
    • {<ident>: <matcher>, <ident>: <matcher>} has the ident keys, with values matching the patterns. Can not have other own keys.
      • This should also work with "getter function" keys
    • {<ident>: <matcher>, <ident>: <matcher>, ...} has the ident keys, with values matching the patterns. Can have other own keys.
    • {<ident>: <matcher>, ...let <ident2>} has the ident key, with values matching the pattern. Remaining own keys collected into an object bound to <ident2>.
  • class matchers:

    • <ident> <object-matcher> matches the <object-matcher> and is an instance of <ident>

@ljharb

ljharb commented Sep 19, 2023

Patterns are meant to mimic destructuring as much as possible; it doesn't make sense to me to ever care if an object doesn't have extra keys, especially since you can foo: not x or similar to ban a specific key.

instanceof semantics are terrible and should never be further cemented into the language; the current plan is to make class syntax create a default matcher that approximates the semantics of ensuring a private field exists on the receiver.

@nmn

nmn commented Sep 19, 2023

Patterns are meant to mimic destructuring as much as possible; it doesn't make sense to me to ever care if an object doesn't have extra keys.

I don't disagree and this was my proposal in the long issue I wrote. My concern is that I think destructuring should be consistent across Arrays and objects.

Another solution is to change Array matchers to allow extra elements by default:

  • array matchers:

    • [<matcher>, <matcher>, void] exactly two items, matching the patterns
    • [<matcher>, <matcher>] two items matching the patterns, more allowed
    • [<matcher>, <matcher>, ...let <ident>] two items matching the patterns, with remainder collected into a list bound to <ident>. (Can use const or var as well; see "binding matchers". Only binding matchers allowed in that position; not anything else.)

instanceof semantics are terrible and should never be further cemented into the language

I'm not sure I agree. Would you elaborate your reasons for essentially deprecating instanceof?

the current plan is to make class syntax create a default matcher that approximates the semantics of ensuring a private field exists on the receiver.

Even if this is the plan, I believe the Person { name } syntax should be adopted (if viable) instead of Person(let name) that I have seen above. How it works behind the scenes is less important. If it feels like instanceof it won't really matter what it really does to make things work.

@nmn

nmn commented Sep 19, 2023

Some other questions about syntax:

  1. Is there no way to re-use parts of the switch-statement syntax? Instead of match {when} could we use match { case <matcher> } instead? Does doing something like this force it to become a statement or something?

  2. I like using void to suggest the absence of something!

  3. Would { let: let let } be a valid matcher?

In if(), the bindings are lifted to a scope immediately outside the if() block,

Why are we not scoping the bindings to within the if() {...} block? You can't currently create new bindings to variables within an if condition so there's no prior art here about how a variable should be scoped. Is there a syntactic limitation?

{a} becoming {a: let a} still makes it awkward to just test for property existence.

Let's not make { a } become {a: let a} then? Let { a } simply check for the existence of a key a and require the use of {a: let a} when a binding is needed? I don't think our goal should be the most terse syntax possible. We should try to avoid confusion. Let's not have any object key punning in object matchers at all:

  • { a } checks if the key a exists. That's it.
  • { a: a } checks if the key a exists and is equal to the value of the variable a
  • { a: let a } checks if the key a exists and captures its value in a new variable a

@ljharb

ljharb commented Sep 20, 2023

instanceof can be easily faked via Symbol.hasInstance, and it doesn't provide accurate results for cross-realm builtins.
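A short demonstration of the faking problem, using only standard JavaScript. The class names Person and Impostor and the static isPerson helper are invented for the example; the private-field check is one way to approximate the "private field exists on the receiver" semantics mentioned earlier in the thread, not the proposal's actual default matcher. The cross-realm issue is not shown because it needs a second realm.

```javascript
class Person {
  #brand = true; // only objects actually constructed by Person get this field
  static isPerson(value) {
    // `#brand in value` throws for non-objects, so guard first.
    return typeof value === "object" && value !== null && #brand in value;
  }
}

// Fake 1: an object that merely inherits from Person.prototype.
const lookalike = Object.create(Person.prototype);

// Fake 2: a class that claims everything is an instance of it.
class Impostor {
  static [Symbol.hasInstance]() {
    return true;
  }
}

console.log(lookalike instanceof Person);  // true, although the constructor never ran
console.log(({}) instanceof Impostor);     // true, decided entirely by Symbol.hasInstance
console.log(Person.isPerson(lookalike));   // false: no private field installed
console.log(Person.isPerson(new Person())); // true
```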

The proposal explicitly and intentionally avoids reusing any part of switch syntax, to increase googleability, and so that switch can finally be put to rest.

@nmn

nmn commented Sep 20, 2023

so that switch can finally be put to rest.

It can never be put to rest since JS is append-only. And I suggested match-case instead of switch-case so that we add one new keyword rather than two. case as a word makes just as much sense as when, so unless there's a technical implementation reason, I think we should try to minimize the number of new things we introduce.

instanceof can be easily faked via Symbol.hasInstance ... cross-realm builtins

Fair enough, let's not use instanceof semantics, but the Person { name } syntax can still work regardless.

@ljharb

ljharb commented Sep 20, 2023

with has been put to rest, despite that it will never be removed from the language. switch will too, since it's horrifically terrible.

Have you read the "priorities" in the readme?
