@evancz
Last active October 27, 2023 12:50
The reasoning behind removing user-defined operators

User-Defined Operators

Elm 0.19 removes the syntax for user-defined operators. So it is no longer possible to define things like @@@|> or <#>.

This document presents the reasoning behind this decision.

Theory of Operator Meaning

It is extremely difficult to come up with good operators. I think all of the successful ones have two things in common:

  1. They have some visual relation to their meaning. For example, </> and <?> from the URL parsing library exactly mimic the / and ? symbols in URLs. And |> indicates directionality really clearly. The math operators like + and / directly match the notation taught in every school in the world.

  2. They use visually simple symbols. I have never seen # or % or @ or $ used to create an excellent operator. I think it is partly that they are so busy visually, mixing curves and lines haphazardly. It is also partly that they do not have super strong cultural meanings as infix operators in general. E.g. $4 means four dollars, but what is 3$4?
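As a rough illustration of the first point, here is a minimal sketch of the </> and <?> operators as exported by the Url.Parser module of the elm/url package (assumed here to be the 0.19 home of the URL parsing library mentioned above). The Route type, its constructors, and the /blog/42 and /search?q=elm paths are hypothetical, chosen only to show how each parser reads like the URL it matches.

```elm
module Route exposing (Route(..), fromUrl)

import Url exposing (Url)
import Url.Parser exposing ((</>), (<?>), Parser, int, map, oneOf, s, top)
import Url.Parser.Query as Query


-- Hypothetical routes for a blog, used only for illustration.
type Route
    = Home
    | Post Int -- matches /blog/42
    | Search (Maybe String) -- matches /search?q=elm


-- Each parser is written the way the URL looks:
-- </> stands in for "/" and <?> stands in for "?".
route : Parser (Route -> a) a
route =
    oneOf
        [ map Home top
        , map Post (s "blog" </> int)
        , map Search (s "search" <?> Query.string "q")
        ]


fromUrl : Url -> Maybe Route
fromUrl url =
    Url.Parser.parse route url
```

With this sketch, a URL such as https://example.com/blog/42 should parse to Just (Post 42), because s "blog" </> int lines up segment by segment with /blog/42.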

Between these two things, there are not actually very many viable possibilities, and all of them are taken. Visual bracketing like </> or |+| expands things a bit, allowing reuse of symbols that already have cultural meaning, but at that point, you are basically constrained to making operators for vector math or something.

Usage in Packages

Elm has had this feature for a while, so I studied how it has been used so far in packages.

As of this writing, there are 1031 packages published for Elm 0.18, and 66 of them have ever defined operators. About 15 of those belong to core developers who agree with this choice, which leaves 51. That means under 5% of packages are affected (51 of 1031 is about 4.9%).

Usage Trends

I want to highlight some of the usage trends within that 5%:

  1. Haskell Operators - Some authors really like the following operators: >>=, <$>, <*>, <*, *>, >=>, etc. This seems to be one of the more popular uses.

  2. Invented Operators - A fairly small fraction of authors invent very elaborate operators. For example, one package contains |-~->, |-~>, |=~->, |=~>, |~->, |~>, and many others. Another has @@@|>, ?|>, !+>, and many others. This is pretty uncommon.

  3. Math Operators - Some packages are for math. Vector math. Matrix math. They generally use operators like |+| or |*| that match the cultural norms. This is actually not as common as I would have hoped! More math!

  4. Parser Operators - Parsers in the ML-family of languages can be really lovely. All of the parsing packages I have seen in Elm have special syntax of some sort.
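For the Haskell-style operators in the first trend, Elm's core library offers named functions instead. The sketch below is a hypothetical example (parseAge, parseAges, and inRange are invented names) that chains Maybe values with Maybe.andThen where Haskell code might use >>=, and combines two results with Maybe.map2 where Haskell might write <$> and <*>. The Haskell equivalents appear only in comments.

```elm
module Age exposing (inRange, parseAge, parseAges)

-- Only default imports (String, Maybe, Tuple) are needed here.


-- Hypothetical range check used as the second step of the chain.
inRange : Int -> Maybe Int
inRange n =
    if n >= 0 && n < 150 then
        Just n

    else
        Nothing


-- Haskell, roughly: readMaybe s >>= inRange
parseAge : String -> Maybe Int
parseAge input =
    String.toInt (String.trim input)
        |> Maybe.andThen inRange


-- Haskell, roughly: (,) <$> parseAge a <*> parseAge b
parseAges : String -> String -> Maybe ( Int, Int )
parseAges a b =
    Maybe.map2 Tuple.pair (parseAge a) (parseAge b)
```

The named versions are longer, but they can be read in place without first learning a dedicated operator vocabulary, which is the readability trade-off this document is weighing.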

I may be missing some scenarios, but those were the ones that stood out to me.

Root Design Goal

Each of these categories stems from a reasonable design goal. I will try to outline each design goal, and then point toward a path for achieving that goal in a nicer way:

  1. "I want it to be easier to chain tasks." Haskell has this special syntax called do-notation for sequencing tasks. It is pretty neat once you know it, but it is also a major barrier to entry. I struggled with it for at least six months. Haskell also has a set of operators like >>= that are connected to this special syntax, so I suspect folks settle for having those operators as a personal compromise. Now, I think other languages have actually accomplished the root goal of "chaining tasks is easy" in nicer ways. For example, F# has computation expressions that are a bit more flexible and do not require integration with a type class mechanism. And C# introduced async/await syntax that gives the same capabilities but integrates with the language much more cleanly. And Idris generalized that with bang-notation. So I feel that andThen vs >>= is missing the bigger picture here.

  2. I want to write fewer characters. I think this explains the "invented operators" case, but the line between "concise" and "cryptic" is a matter of taste. Is APL concise or cryptic? Is Ruby concise or cryptic? Is Haskell concise or cryptic? Depends who you ask! In Elm, having explicit and readable code is a major design goal. elm-format helps with that. Using qualified values like List.map and Set.map helps with that. So even if you have the |-~-> operator, Elm is not really designed for minimizing character count and will clash with your root goals in other ways.

  3. I want to do math! I really like this goal. I think languages like Julia have done an excellent job at overloading + and * in a reasonable way. Their approach is really lovely, but we would have to lose Elm’s type system to match them. Point being, rather than making |+| and |*| as a stopgap, perhaps it is possible to think about the broader question in a comprehensive way. Should there be a way to overload + for vector and matrix math? How would that work? How would you multiply a vector by a scalar? Perhaps the best design is to restore user-defined operators for bracketed math operations like |+| and |-| with certain types? Or maybe it can just be done in a really nice way with a library. Worth exploring!

  4. I want to parse! Writing parsers can be tricky, and operators are one way to make things easier. Most parsing packages replicate the Haskell operators, and all of the logic in case (1) applies. Separately, it seems like </> and <?> have been quite successful in elm-lang/url, and |. and |= have been quite successful in elm-lang/parser. These operators are getting special-cased like + and -. What is the deal with that?

    Well, many languages special-case parsers (e.g. regex in Perl, JS, and Ruby), with very specific costs and benefits. The cost is that if regex cannot handle your scenario, it is very annoying. The benefit is that specific knowledge transfers between different codebases and languages. I think Haskell is the best example of a language that does not special-case parsers, and that choice also has specific costs and benefits. The cost is that everyone has to pick between parsec for okay error messages and attoparsec for better performance. Every project I created ended up using both parsec and attoparsec through transitive dependencies. (That was frustrating when I was trying to make the elm binaries smaller, but it is a much bigger deal for JS bundles, where size is super important!) Even though the APIs of parsec and attoparsec are nearly identical at a high level, code does not transfer between them because the details are slightly different. On the other hand, the benefit is that you can get a parser tailored to your exact performance or error-message needs, and if there is some new insight, someone can build a new parser library around it.

    The design of elm-lang/parser uses the same performance insights from attoparsec and improves upon the error message quality of parsec. It appears to be possible to have both under one API. I also think it makes the most sense for the ecosystem to have one option that distills the best known path. Exploration can still happen (I do my exploration in Haskell because it is great for that) and insights can be brought back without fragmenting the Elm ecosystem.
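To make the |. and |= case concrete, here is a minimal sketch in the style of the elm/parser documentation (elm/parser is assumed to be the 0.19 name of elm-lang/parser). The Point type and the "( 3, 4 )" input format are illustrative assumptions. |= keeps the result of the parser on its right and feeds it into the function built up by succeed, while |. runs its parser and throws the result away, so the layout of the pipeline visually tracks which pieces of input become data.

```elm
module Point exposing (Point, point)

import Parser exposing ((|.), (|=), Parser, float, spaces, succeed, symbol)


type alias Point =
    { x : Float
    , y : Float
    }


-- Parses text like "( 3, 4 )".
-- |= keeps the value produced by the parser on its right;
-- |. runs the parser on its right but discards its result.
point : Parser Point
point =
    succeed Point
        |. symbol "("
        |. spaces
        |= float
        |. spaces
        |. symbol ","
        |. spaces
        |= float
        |. spaces
        |. symbol ")"
```

Under 0.19, application code can import these operators like any other exposed value, but it cannot declare new ones of its own.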

The broader message here is that we have some fairly specific design problems, and user-defined operators are often stopgap measures. I think it is important to think about languages on a timescale of decades, and by looking at each case directly, I think we can end up with something nicer in the long run.

Usage in Applications

I know some folks also define operators in their application code. Some people really love custom operators. Some people really hate them. We have found with elm-format that just making a choice is an effective way to help teams minimize these debates and focus more on the application.

This case is a bit borderline for me though, especially if you are working on your own. One thing I learned from discovering The Elm Architecture is that it is really lovely to be able to show up in any codebase and know what is going on. I think custom operators detract from that enough that they are not worth it for the whole ecosystem, even if they are great for specific individuals.

History

But why was this feature added in the first place? As far as I can remember, this is how I implemented + and - while I was working on my thesis. This was a naturally exploratory time, and features did not undergo as much scrutiny as they do now.

When assessing features from that time, I ask myself, "If someone proposed adding this feature today, would it get in?" I cannot see user-defined operators getting in. All of the considerations in this document seem to point to there being more specific problems that would benefit from more specific designs.

Conclusion

I know not everyone agrees with this choice, but I hope this document clarifies some of the thinking behind it.

@wongjiahau

wongjiahau commented Feb 15, 2019

@evancz I think this decision is not really justifiable. Firstly, I would like to define what readability is.

To me, there are two types of readability:

  1. Syntactical readability

  2. Semantical readability

And in this situation, what you are stressing here is syntactical readability. However, I want to show you that syntactical readability and semantical readability are inversely related, meaning that if one increases, the other decreases, and vice versa.

For example, consider the following algebra equation:

2x + 5 = 3x - 99

If I had written it in a more syntactically readable form (readable for people who never learned algebra), it would be much harder to think about its semantics, and definitely not easier to solve, because more brain power would go into processing syntax rather than semantics.

((2 times x) plus 5) equals ((3 times x) minus 99)

So, I want to stress the fact that people define symbolic operators because they want to improve semantical readability; they want to think at a higher level.

It's the very same reason why you use operational semantic notation to describe the type system of Elm in your thesis (see Figure 2) rather than plain English.

Thus, providing a set of predefined operators like +, -, *, =, etc. while not allowing users to define their own operators rests on a very poor assumption: that users would only need mathematical abstractions and would not build their own symbolic abstractions in their codebases.

Moreover, since you pointed out that only 5% of packages use custom operators, why bother removing this feature? Unless you can prove that this feature is being abused in more than 50% of packages and is making Elm a worse platform, it is not justifiable to revoke this privilege from the minority who did not misuse custom operators at all.

In a nutshell, I hope that you will reconsider this decision, as it is not a necessary breaking change but rather an opinionated conclusion.

@mzero

mzero commented Jul 31, 2019

Of all the changes in 0.19, this is the one that hurt my code the most: I have a parser combinator library that used just two custom operators, for the very reasons Evan points out at the top.

Now I learn that elm/parser can, and does, define two operators for parsing, for the same reasons my library did. There are indeed times when custom embedded languages with custom operators are worth the mental effort for the programming staff. Parsing is one of them, which Evan acknowledges and indeed uses in elm/parser.

However, it is not realistic to assume that elm/parser will become the only parsing package we ever need. For one, it only works on String. Parsing over byte arrays is quite common (and is what mine did). Even if elm/parser had been parameterized on the stream type, there are still differing implementation and functionality tradeoffs in parsers (backtracking, error tracking, error recovery, etc.) that make different parser libraries useful even if they support the same stream type.

@dancojocaru2000

If you think about it, users can write unreadable code by writing code. In the next release, let's make Elm accept only blank text files.

@Microtribute

@evanc I want it back! Or I will call the police! :) It's been a while, but yeah, it's a nice-to-have feature. Custom operators are one of the reasons I fell in love with Elm. If keeping them does not make your life difficult, why bother removing them? Removing them will break the packages that were written for pre-0.19 versions.

@Reenuay

Reenuay commented Mar 27, 2020

Instead of constraining the possibilities of the language you should extend it, imho.

@themaxhero

themaxhero commented Jun 13, 2020

You could leave that decision up to the project owner by making a compiler flag.

ghost commented Aug 4, 2020

Haskell* (there is a typo in the post above)
