Skip to content

Instantly share code, notes, and snippets.

@bkietz
Last active December 18, 2023 16:10
Show Gist options
  • Save bkietz/5a1f2027b7f548c619ad052f44f9767a to your computer and use it in GitHub Desktop.
Save bkietz/5a1f2027b7f548c619ad052f44f9767a to your computer and use it in GitHub Desktop.
A sketch of a simple pattern matching syntax for c++

Pattern Matching

The current proposal for pattern matching (P0095R1) seems overly heavy to me. The following recombines existing syntax and concepts into a smaller modification providing a fairly expressive solution for handling variants.

💭
Instead of addressing discriminated unions with a full language level variant, let it be a concept which types like std::variant<...> implement.
// the interface for interacting with discriminated unions is visit()
template <typename T>
concept Visitable = requires(T a) { visit([](auto&&){}, a); };
💭
Add a new statement union switch for visiting variants with less boilerplate.

This will just be syntactic sugar around visit(), in much the same way range-based-for is sugar around begin() and end().

union-switch-statement:

union switch ( variant-expression…​ ) { union-case-statement…​ }

union-case-statement:

case case-template-parameters opt parameter-declaration-clause : statement

case-template-parameters:

< template-parameter-list >

variant-expression:

⇒ satisfies Visitable

(C# has a very similar control structure: switch statement with type pattern.)

Rather than a constant expression, case labels are followed by optionally templated parameter lists. These are used to unpack the variant-expression(s) using the robust mechanism of overload resolution. The union switch statement is evaluated exactly as if the variant were visited by a closure with an overloaded call operator for each union-case-statement. Each call operator’s parameter list and body would be identical to the union-case-statement's parameter list and labeled statement.

✏️
For example:
void obligatory_and_artificial(std::variant<int, double, std::vector<int>, std::string, std::wstring> v)
{
  extern int &evil_counter();

  union switch (v)
  {
  case std::vector<int> &v:
    v.push_back(evil_counter()++);
    std::cout << "the vector was " << v.size() << " long" << std::endl;

  case<typename Char> std::basic_string<Char> const &s:
    std::cout << "the string was " << s.size() << " long" << std::endl;

  case auto int_or_double:
    std::cout << "it was a number: " << int_or_double << std::endl;
  }
}
💩
This could be considered equivalent to:
struct closure
{
  void operator()(std::vector<int> &v) const
  {
    v.push_back(evil_counter()++);
    std::cout << "the vector was " << v.size() << " long" << std::endl;
  }

  template<typename Char>
  void operator()(std::basic_string<Char> const &s) const
  {
    std::cout << "the string was " << s.size() << " long" << std::endl;
  }

  template <typename T>
  void operator()(T int_or_double) const
  {
    std::cout << "it was a number: " << int_or_double << std::endl;
  }

  using evil_counter_t = int&();
  evil_counter_t &evil_counter;
};

void obligatory_and_artificial(std::variant<int, double, std::vector<int>, std::string, std::wstring> v)
{
  extern int &evil_counter();

  visit(closure{ evil_counter }, v);
}

Unlike a switch statement, there is no fall through between cases in union switch; exactly one case’s statement will be executed. As a consequence, a union switch wherein a visited argument list does not match is ill formed (so a union switch with no cases would necessarily be ill formed). As a nice side effect, the lack of fall through means we wouldn’t have to wrap declaration statements in {} because there would be no transfer of control into the scope of variables from different cases.

Unlike a switch statement, a default case should probably not be allowed because it may be impossible to declare a parameter list which matches all visited argument lists but matches them less than any of the explicit parameter lists. The default case would have to be equivalent to case ... or case auto&&... and would conflict with those cases if they were explicitly provided. IMHO the slightly increased expressiveness of a default case does not warrant the added subtlety of remembering the parameter list it implies.

💭
Since Visitable is a concept, we can easily define unboxing for existing types which act like discriminated unions. For example, we could expose nlohmann::json as a discriminated union:
template <typename Visitor>
void visit(Visitor &&vis, json &j)
{
  switch (j.type())
  {
  case json::value_t::object:
    vis(j.get<json::object_t>());
    break;
  case json::value_t::array:
    vis(j.get<json::array_t>());
    break;
  // ...
  }
}

// now we can use union switch on jsons

void print_one_line_summary(std::ostream &os, json &j)
{
  union switch (j)
  {
  case json::object_t const &o:
    os << "object { size=" << o.size() << " }";
  case json::array_t const &a:
    os << "array { size=" << a.size() << " }";
  case json::string_t const &s:
    auto eol = s.find('\n');
    if (eol == json::string_t::npos)
    {
      os << std::quoted(s);
    }
    else
    {
      auto extra_line_count = std::count(s.begin(), s.end(), '\n');
      auto first_line = s.substr(0, eol);
      os << std::quoted(first_line) << " (" << extra_line_count << " more lines)";
    }
  case auto &&other_scalar:
    os << other_scalar;
  }
}

json concatenate(json const &left, json const &right)
{
  json ret;
  union switch (left, right)
  {
  case json::object_t const &left, json::object_t const &right:
    ret = left;
    for (auto it = right.begin(); it != right.end(); ++it)
    {
      ret[it->key()] = it->value();
    }
  case json::array_t const &left, json::array_t const &right:
    ret = left;
    ret.insert(ret.end(), right.begin(), right.end());
  case json::string_t const &left, json::string_t const &right:
    ret = left + right;
  case ...:
    ret = nullptr;
  }
  return ret;
}

Pattern matching like this would also expedite querying for interfaces. For example, if your class hierarchy is wired with LLVM-style RTTI you could implement visit() to take a pointer to the base and call getKind() then switch on the result and cast to a pointer to the appropriate derived class. After this, you can use union switch to down cast a pointer at whatever level you need. For example, if you needed to

union switch (decl)
{
case clang::NamedDecl *ND:
  // do something NamedDecl-specific
  llvm::raw_os_ostream OS(std::cout);
  ND->printQualifiedName(OS);

case clang::TypeDecl *TD:
  // TypeDecl inherits NamedDecl so TD is-a NamedDecl but
  // this case overrides the above case for NamedDecl
  auto T = TD->getTypeForDecl();
  T->dump();

case ...:
  decl->dump();
}
What this doesn’t address
  • Pattern matching is capable of much more than just unboxing variants but the above proposal is not.

  • return and break will require some reinterpretation inside a union-case-statement. If the case statement is transposed unchanged to the body of a call operator as described above return; will terminate the union switch and non-void return would be ill formed. Also break; would not be allowed. This is highly non-intuitive and we’d need to replace return; with a break; at the very least: union switch (v) { case ...: break; }struct closure { void operator()(...) const { return; } };

  • Duplicated types- I feel this is best handled by tags. Actually that’s part of why I wrote rekt

@hyperswine
Copy link

interesting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment