@evancz
Last active October 27, 2023 12:50
The reasoning behind removing user-defined operators

User-Defined Operators

Elm 0.19 removes the syntax for user-defined operators. So it is no longer possible to define things like @@@|> or <#>.

This document presents the reasoning behind this decision.

Theory of Operator Meaning

It is extremely difficult to come up with good operators. I think all of the successful ones have two things in common:

  1. They have some visual relation to their meaning. For example, </> and <?> from the URL parsing library exactly mimic the / and ? symbols in URLs. And |> indicates directionality really clearly. The math operators like + and / are directly related to the math operators taught in every school in the world.

  2. They use the visually simple symbols. I have never seen # or % or @ or $ used to create an excellent operator. I think it is partly that they are so busy visually, mixing curves and lines haphazardly. It is also partly that they do not have super strong cultural meanings as infix operators in general. E.g. $4 means four dollars, but what is 3$4?

Between these two things, there are not actually very many viable possibilities, and all of them are taken. Visual bracketing like </> or |+| expands things a bit, allowing reuse of symbols that already have cultural meaning, but at that point, you are basically constrained to making operators for vector math or something.

Usage in Packages

Elm has had this feature for a while, so I studied how it has been used so far in packages.

As of this writing, there are 1031 packages published for Elm 0.18 and 66 of them have ever defined operators. Core developers who agree with this choice account for about 15 of those, so there are 51 unaccounted for. That means that under 5% of packages are affected.

Usage Trends

I want to highlight some of the usage trends with that 5%:

  1. Haskell Operators - Some authors really like the following operators: >>=, <$>, <*>, <*, *>, >=>, etc. This seems to be one of the more popular uses.

  2. Invented Operators - A fairly small fraction of authors invent very elaborate operators. For example, one package contains |-~->, |-~>, |=~->, |=~>, |~->, |~>, and many others. Another has @@@|>, ?|>, !+>, and many others. This is pretty uncommon.

  3. Math Operators - Some packages are for math. Vector math. Matrix Math. They generally use operators like |+| or |*| that match the cultural norms. This is not as common as I would have hoped actually! More math!

  4. Parser Operators - Parsers in the ML-family of languages can be really lovely. All of the parsing packages I have seen in Elm have special syntax of some sort.

I may be missing some scenarios, but those were the ones that stood out to me.

Root Design Goal

Each of these categories stems from a reasonable design goal. I will try to outline the design goal, and then point a path towards achieving that goal in a nicer way:

  1. "I want it to be easier to chain tasks." Haskell has this special syntax called do-notation for sequencing tasks. It is pretty neat once you know it, but it is also a major barrier to entry. I struggled with it for about six months at least. Haskell also has a set of operators like >>= that is connected to this special syntax, so I suspect folks settle for having that as a personal compromise. Now, I think other languages have actually accomplished the root goal of "chaining tasks is easy" in nicer ways. For example, F# has computation expressions that are a bit more flexible and do not require integration with a type class mechanism. And C# introduced async/await syntax that gives the same capabilities, but integrates with the language way more cleanly. And Idris generalized that with bang-notation. So I feel that andThen vs >>= is missing the bigger picture here.

  2. I want to write fewer characters. I think this explains the "invented operators" case, but the line between "concise" and "cryptic" is a matter of taste. Is APL concise or cryptic? Is Ruby concise or cryptic? Is Haskell concise or cryptic? Depends who you ask! In Elm, having explicit and readable code is a major design goal. elm-format helps with that. Using qualified values like List.map and Set.map helps with that. So even if you have the |-~-> operator, Elm is not really designed for minimizing character count and will clash with your root goals in other ways.

  3. I want to do math! I really like this goal. I think languages like Julia have done an excellent job at overloading + and * in a reasonable way. Their approach is really lovely, but we would have to lose Elm’s type system to match them. Point being, rather than making |+| and |*| as a stopgap, perhaps it is possible to think about the broader question in a comprehensive way. Should there be a way to overload + for vector and matrix math? How would that work? How would you multiply a vector by a scalar? Perhaps the best design is to restore user-defined operators for bracketed math operations like |+| and |-| with certain types? Or maybe it can just be done in a really nice way with a library. Worth exploring!

  4. I want to parse! Writing parsers can be tricky, and operators are one way to help make things easier. Most parsing packages replicate the Haskell operators, and all of the logic in case (1) applies. Separate from that, it seems like </> and <?> have been quite successful in elm-lang/url, and it seems that |. and |= have been quite successful in elm-lang/parser. These operators are getting special cased like + and -. What is the deal with that?

    Well, many languages special case parsers (e.g. regex in Perl, JS, and Ruby) with very specific costs and benefits. The cost is that if regex cannot handle your scenario, it is very annoying. The benefit is that there is specific knowledge that transfers between different codebases and languages. I think Haskell is the best example of a language that does not special case parsers, also providing specific costs and benefits. The cost is that everyone has to pick between parsec for okay error messages and attoparsec for better perf. Any project I ever created ended up using both parsec and attoparsec through transitive dependencies. (That was frustrating when I was trying to get elm binaries smaller, but it is a much bigger deal for JS bundles where size is super important!) Even though the APIs for parsec and attoparsec are pretty much identical at a high level, code does not transfer between them because the details are slightly different. On the other hand, the benefit is that you can get a parser tailored for your exact performance or error message needs, and if there is some new insight, someone can make a new parser library around it.

    The design of elm-lang/parser uses the same performance insights from attoparsec and improves upon the error message quality of parsec. It appears to be possible to have both under one API. I also think it makes the most sense for the ecosystem to have one option that distills the best known path. Exploration can still happen (I do my exploration in Haskell because it is great for that) and insights can be brought back without fragmenting the Elm ecosystem.
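
    To make the |. and |= operators mentioned above concrete, here is a minimal sketch of a parser for points such as ( 3, 4 ). It assumes Elm 0.19 and the parser package as published for that release under the name elm/parser. The Point alias and the names point and parsePoint are illustrative and do not come from this document.

```elm
module PointParser exposing (Point, parsePoint)

import Parser exposing ((|.), (|=), Parser, float, spaces, succeed, symbol)


-- Illustrative record type (an assumption made for this sketch).
type alias Point =
    { x : Float
    , y : Float
    }


-- `|=` keeps the value produced by the parser on its right and passes it
-- to `Point`. `|.` runs the parser on its right and discards the result.
point : Parser Point
point =
    succeed Point
        |. symbol "("
        |. spaces
        |= float
        |. spaces
        |. symbol ","
        |. spaces
        |= float
        |. spaces
        |. symbol ")"


-- Run the parser on an input string such as "( 3, 4 )".
parsePoint : String -> Result (List Parser.DeadEnd) Point
parsePoint input =
    Parser.run point input
```

    The two operators differ by a single character, and that character says whether a step contributes a value, which is the kind of visual relation to meaning described earlier.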

The broader message here is that we have some fairly specific design problems, and user-defined operators are often stopgap measures. I think it is important to think about languages on a timescale of decades, and by looking at each case directly, I think we can end up with something nicer in the long run.
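
As one example of looking at a case directly, the chaining goal from item 1 can already be met in Elm 0.19 with andThen and the built-in |> operator. The following is only a sketch under that assumption. The names parseAge and checkNonNegative are hypothetical and the validation rule is made up for illustration.

```elm
module Age exposing (parseAge)

-- Hypothetical validation pipeline: each step runs only if the previous
-- one succeeded, which is what `>>=` would express in Haskell.


parseAge : String -> Result String Int
parseAge input =
    String.toInt input
        |> Result.fromMaybe "not a number"
        |> Result.andThen checkNonNegative


-- Reject negative numbers and pass everything else through unchanged.
checkNonNegative : Int -> Result String Int
checkNonNegative n =
    if n >= 0 then
        Ok n

    else
        Err "age must not be negative"
```

With these definitions, parseAge "42" evaluates to Ok 42 and parseAge "abc" evaluates to Err "not a number". The pipeline reads top to bottom and every step has a searchable name.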

Usage in Applications

I know some folks also define operators in their application code. Some people really love custom operators. Some people really hate them. We have found with elm-format that just making a choice is an effective way to help teams minimize these debates and focus more on the application.

This case is a bit borderline for me though, especially if you are working on your own. One thing I learned from discovering The Elm Architecture is that it is really lovely to be able to show up in any codebase and know what is going on. I think custom operators detract from that enough that they are not worth it for the whole ecosystem, even if they are great for specific individuals.

History

But why was this feature added in the first place? As far as I can remember, this is how I implemented + and - while I was working on my thesis. This was a naturally exploratory time, and features did not undergo as much scrutiny as they do now.

When assessing features from that time, I ask myself, "If someone proposed adding this feature today, would it get in?" I cannot see user-defined operators getting in. All of the considerations in this document seem to point to there being more specific problems that would benefit from more specific designs.

Conclusion

I know not everyone agrees with this choice, but I hope this document clarifies some of the thinking behind it.

@joonazan

joonazan commented Aug 21, 2018

I'd solve the issue of vector libraries etc. by allowing overloading like in Idris. That gives a lot of the benefits of type classes without introducing new concepts. ++ currently works for String and List, so why not allow that elsewhere? That would also get rid of comparable, appendable, number and compappend!

EDIT: Then functions would have to be written separately for Int and Float, but as very few functions want to work on both, that would be the most Elm way to do it. Idris is a decent language for the web, so it makes sense for Elm to be as stripped down as possible to compete in a different niche.

@kspeakman

Looking through our code base, we have a handful of files that use >>= and <!> to mean a flipped Result.andThen and Result.map respectively. These are primarily used for validation on Elm. However, we use these same operators heavily in F# on the server side (for Async Result), so they are very familiar to us. And they were themselves chosen specifically since they were commonly accepted operators for these functions -- not chosen arbitrarily.

We will certainly give Elm 0.19 a try, but my first reaction to this change is one of disappointment. The effect on our code is small compared to other breaking changes in this release. But more importantly, just like you explored with operators in your thesis, how can Elm devs explore and discover use cases like those in the URL parser if the freedom to do so is absent? Has every meaningful operator been discovered already? Should no one else even try? Much like the way Elm's github issues are managed, this change feels very controlling and unnecessary.

To balance out the above, I feel compelled to say. I really appreciate the improvements 0.19 has made and all the effort that went into it. Elm remains the best front-end dev experience known to me. Especially when it comes to maintenance and refactoring. Elm has forever impacted the way I go about development (not just the front end). And I'm not shy about saying so in various dev communities.

@tryshchenko

Hi, thank you (and community) for your efforts. Overall impression of 0.19 is mostly positive.

Considering that you totally break compatibility between 0.18 and 0.19, wouldn't you consider making the next version (0.20) a kind of major release, just to set expectations? We found custom operators quite handy and I would definitely miss them. Their absence will also postpone our migration for a while.

However I would partially agree with your arguments:

  • It was painful to learn Elm because it was not clear at first which operators were custom and what had been imported without an explicit declaration. Getting rid of them may reduce overhead for newcomers.
  • <|--->>>==> is bad.

@antarestrader

Someday I would like to see a moderated panel discussion that included Edward Kmett and Evan Czaplicki. Maybe Uncle Bob Martin could be the moderator.

@yuri-martynov

R<|>P elm-community/parser-combinators

I don't want Elm to look like regex or Perl, but I have a rather big DSL with tons of unit tests, so I will have to rewrite my parser.
What do you recommend using in the long run?

@yuri-martynov

Let's get rid of JS double ==, F# is quite happy with just single =
if y = x then

@fosskers

fosskers commented Aug 23, 2018

You might consider megaparsec as an alternative to parsec and attoparsec, as it has good performance, a good API, and good errors.

I'm also generally not thrilled with the removal of custom operators. It's been my experience that humans enjoy symbols more than written words. We can recognize them faster, and with enough familiarity, can "feel" the meaning. (APL follows this belief.) Do any of the core devs have experience with East Asian languages? Letter-centric programming is only more meaningful up-front for English speakers.

Otherwise, great job improving the compiler, a lot of people will benefit.

@Mouvedia

Mouvedia commented Aug 23, 2018

I don't mind the change but it shouldn't be part of a release that removes an operator. This is an overly dogmatic move.
The right way to go about it would have been to first replace the modulo operator and then—once you have collected feedback—pitch the removal of user-defined operator for 0.20. This is brutal and provocative: not smart.

@fstiffo

fstiffo commented Aug 25, 2018

The module elm/parser can use operators because it is core ... a very strange concept of language syntax rules and language semantics

@madnight

s/Haskel/Haskell/gI

@wires

wires commented Aug 26, 2018

I have never seen # or % or @ or $ used to create an excellent operator. I think it is partly that they are so busy visually, mixing curves and lines haphazardly. It is also partly that they do not have super strong cultural meanings as infix operators in general. E.g. $4 means four dollars, but what is 3$4?

Euh ($) : (a -> b) -> a -> b? This whole reasoning is moot. But anyway, not that I care anymore, we are already porting our Elm codebase to Purescript, Idris and ReasonML.

@Bastes

Bastes commented Aug 29, 2018

Using lenses with monocle eases the pain of nested/optional/converted fields a lot, thanks in part to the composition operators, which are less obvious in their meaning but more readable to someone used to them...

If it was possible to use generic lens composition (i.e. like it is in Haskell, using simple function composition) it would be less of a pain, but as it is, codebases using it will suffer a great deal :/

@Erudition

@fosskers says "humans enjoy symbols more than written words. We can recognize them faster, and with enough familiarity [emphasis added], can 'feel' the meaning." Sure, for things that aren't inherently familiar (like +), with enough familiarity, we can "feel the meaning" of any symbol, even non-ascii, while saving keystrokes and recognizing faster - so here's a crazy idea: how about we allow custom operators only if they're composed of single character operators from non-ascii Unicode? We should all be encoding files in unicode anyway, and it's widely supported now. Sure, devs will have to set up macros to type them quickly, since they're not on the keyboard - but look at how much new territory there is! We can use more meaningful symbols like ∮ instead of "contourIntegral", and someone foreign to the code can look at it and instantly know "that's a non-core custom operator". No obnoxious lengths. No conflicting with other languages. The worst that could happen is that you don't know what it means, and that could happen with explicit functions too. Is this too crazy of an idea?

@Erudition

Erudition commented Aug 30, 2018

One Small Step for Elm, One Giant Leap for Readability!

I may be one of the only ones to applaud this change, but here I go. Disclaimer: I'm new to Elm. (but not programming!)

The Good

  • |-~->, |-~>, |=~->, |=~>, |~->, and @@@|> are just plain obnoxious. I won't be so diplomatic as OP: Good riddance.
  • I'd go so far as to say that any operator of more than three characters has little chance at being intuitive. Probably also true for most custom operators with more than one character, honestly (unless it's just a single character surrounded with decoration like |+|).
  • Regarding modulus "I have never seen .. % .. used to create an excellent operator" and "[good operators] have some visual relation to their meaning" are both excellent reasons against the ubiquitous % operator - I never understood the motivation. mod is so short and accurate, and % already has a cultural meaning(/definition?) of "out of a hundred" - but the operator never seems to be used that way in programming. If it was assigned to a function that multiplied the preceding value by 0.01, that'd make a lot more sense.
  • Finally, and most importantly, making sense of a stranger's Elm code without knowing the whole codebase is definitely going to be improved by this. Encountering custom operators is definitely the biggest head-scratcher currently.

Problems

  • |+| and |*| actually make a ton of sense for matrix operations - I haven't seen that before. While I'd prefer to just use + and *, these seem like the next best thing (especially over something like <| matrixAdd now in its place). Since you seem to have no qualms with restricting features to a whitelist, perhaps such operators should have one too.
  • <| and |> have actually been a huge stumbling block for me learning Elm - and I come from Haskell. While I may be one of the few learners of both languages, in Haskell I quickly learned to read $ as "picture a ( here and a ) at the end of everything". While maybe $ isn't the best symbol for that, I have to contradict you and opine that |> is actually worse, precisely because it doesn't do a good job at unambiguously "indicating directionality". It's clear that <| "points" left, but does that mean "the function to the left consumes everything to the right", or "everything to the left gets consumed by the function on the right"? To this day I'm still looking up which is which, because it's now the only way I can make functions infix (since backticks are no longer allowed either). But hey, maybe that's just me. Also, the learning guide didn't mention these operators.
  • "How would you multiply a vector by a scalar?" Um, the way you always do? Multiply the scalar by every cell in the matrix - why should we do it any differently than we do it in math? Is there something I'm missing here?
  • Might want to proofread this announcement. A lot.

@regellosigkeitsaxiom

In our small company we have small 100+ LoC library called "Sugar" for constantly used stuff.
It has about 30 functions inside, of which about 10 are operators. Two of them are really important, as they dramatically increase code readability:

(?) f x = f ( toString x )
infixr 0 ?

(?.) f x = f ( toString x ++ "px" )
infixr 0 ?.

Across two of my significant active projects of about 15000 LoC in total, these occur more than 1500 times. They are mostly used with SVG:

rect
    [ width ? w
    , height ? h - 50
    , x ? 0
    , y ? 50
    ] []

And we use SVG a lot.

As I see it, replacing them with any prefix function will make things worse, because when I look at the code, the first thing I will see is this semantically useless function, not meaningful code:

rect
    [ _s width w
    , _s height <| h - 50
    , _s x 0
    , _s y 50
    ] []

And please don't suggest width <| toString w or width <| _s w, because it's tedious to type. I want to type working code, not formal greetings.

For our current projects 0.18 is good enough and we will seriously consider not moving to 0.19.

@Erudition

Erudition commented Sep 8, 2018

@regellosigkeitsaxiom I understand you like typing less, but if I stumbled upon your code and tried to understand what was going on, there's only one case where that would happen: width <| toString w. All the others would require research.
Now, maybe you don't care about readability (or "foreign" readability if that helps), and maybe you're not writing libraries or stuff that anyone else would use, but others will be - and your experience kind of confirms Evan's observation of "if they can, they will"...

For me, width <| toString w would actually be easier to type (because I write code in entire words at a time), but if the tedium is your primary concern, perhaps a simple macro?

@JulianLeviston

@Erudition maybe a mnemonic helps: <| and |> indicate the flow of the data. So, blah |> x means blah is some data and it's being fed into the function on the right, so x must be a function in that case. f <| x means f is here a function that's having x fed into it as data. So even when we have f <| g y x we must know that f is a function, and g y x evaluates to the data being fed into it.

That's probably why it's hard for you to understand. Everything in terms of function application in Haskell (and even Elm actually) is the other way around... that is data moves conceptually from right to left when functions are applied, . behaves by composing functions where the data flows from right to left. In Elixir, it's the same as Elm... that is... |> indicates the direction of data flow, but there it's a macro so it's something else.

@nilslindemann

nilslindemann commented Oct 26, 2018

First, i love how you extensively help new users, keeping the language pure at the same time. I support this because i am a new user and because i am interested in learning how to solve problems functionally. Haskell is not easy enough to use for me – especially the API docs are not helpful. I love how you tell people to provide concrete practical code examples. How you ask them to use one word for one concept. I love that after the installation there is just an elm.exe, an uninstaller, an icon and two scripts for adding/removing it to the PATH. I love that elm init brings me to a site where everything is explained ('this file does this, that file does that, put your files here'), especially all the (political but very true) statements after 'How do I structure my directories?'. I hate setting up things as well as premature modularisation. It is a big plus to keep things simple for the beginners. The only solution to complexity is simplicity (Well, and grouping dependent things together in folders [1] :-) ).

However, regarding this issue (i want to have shortcuts and infix notation for frequently used functions, but i also want it make easy for new users to read the code) the following looks like a better solution to me:

How to define operators

Illegal syntax (legal in 0.18), just the operator but no readable function name provided:

(=>) : a -> b -> ( a, b )
a => b =
    ( a, b )

Legal syntax, the readable function name must come first:

makeTuple, (=>) : a -> b -> ( a, b )
makeTuple a b =
    ( a, b )

Illegal syntax, the readable function name must be used during the definition. It helps new readers to understand the code:

makeTuple, (=>) : a -> b -> ( a, b )
a => b =
    ( a, b )

Usage

This code ...

a => b

... formatted with elm format <file> just keeps the core operators:

makeTuple a b

With elm format <file> --use-operators it will use all available operators:

a => b

Usage on websites

On websites like elm packages: in code examples: top right: [x] use operators (when disabled (default), formats the source without --use-operators). Alternatively, show the code formatted without --use-operators when hovering an operator.

Python has it

Python (created by Guido van Usability) has operator overloading in the language. One may not like Python (i do) but one has to appreciate that the Pythonists (or at least their VIPs) try to keep the language simple to use and easy to learn. So if operator overloading is allowed in Python it must have a point. The use case given by Valentin Shirokov above is one of those points. I cannot think of a more elegant way to do it, except using macros. Also, despite the fact that this feature is possible, it is not widely used in Python libraries. No one complains about Python being difficult to read because of operator overloading, at best they do because it could happen, but it doesn't. If you want to restrict usage of cryptic operators – very understandable, see below, this in my opinion includes some operators you use in the core language – just allow a restricted set of operators to be created/overloaded, like those seen in Python plus my exceptions from below, plus . (function composition), plus (to make Valentin happy) ?, whatever this operator means.

Personal, language-independent observations about operators

Don't claim that operators like |> <| </> <?> are readable. I won't believe you. To me no operator having more than one character, which has one of <>?|^´§%*~@ in it, is readable. Exceptions are ==> (from this follows, evaluates to), >> or << (shove into), >>> (print), <<< (user input), <type name>, :: (has type) and the usual boolean, comparison and assignment operators. I won't even include -> or <- (returns thing) in this list of exceptions, it is too generic. It just points to the direction but contains no other information (Edit: actually they fit well into Haskell style type descriptions). Further, please, everybody, always, everywhere, avoid the usage of the following operators: $ (except for currency), ? (except maybe for not implemented), ~ (except maybe for cast? I don't even like it in boolean logic). Further, avoid the usage of % in string formatting, i hate it. Use {} instead. Further, i don't like |s at line start and \ or / at line end. This should, if possible, be done with indentation and with line break and indent consuming separators and keywords (like ,, and, or). Further, brackets are underrated. They are the ultimate tool for documenting a context. I love them and i put them on single lines (f you, schemers :-) ). I would never use a $ when i can use brackets. a $ foo $ bar instead of bar(foo(a)) is not more readable. a >> foo >> bar however is.

[1] use version 4.9 of that editor, it is the last which supports indentation sensitive languages.

@osmarks

osmarks commented Feb 2, 2019

I don't like this. It seems that Elm is becoming more and more like Go, where "regular users" can't be "trusted" with useful features which are sometimes a bit misused.

@wongjiahau

wongjiahau commented Feb 15, 2019

@evancz I think this decision is not really justifiable. Firstly, I would like to define what readability is.

To me, there are two types of readability:

  1. Syntactical readability

  2. Semantical readability

And in this situation, what you are stressing here is syntactical readability. However, I want to show you that, syntactical readability and semantical readability are reciprocally related, meaning that if one increases, the other decreases and vice versa.

For example, consider the following algebra equation:

2x + 5 = 3x - 99

If I had written it in a more syntactically readable form (which is readable for people who never learnt algebra), it would be much harder to think about its semantics, and definitely not easier to solve, because more brain power is used for processing syntax rather than semantics.

((2 times x) plus 5) equals ((3 times x) minus 99)

So, I want to stress the fact that people define symbolic operators because they want to improve semantical readability, they want to think at a higher level.

It's the very same reason why you use operational semantics notation to describe the type system of Elm in your thesis (see Figure 2) rather than plain English.

Thus, the fact that you provided a set of predefined operators like +, -, *, = etc. but do not allow users to define their own operators is a very unbrilliant assumption, where you assumed users would only need mathematical abstractions, and they won't build their own symbolic abstraction in their codebase.

Moreover, since you pointed out that only 5% of the packages are using custom operators, then why bother to remove this feature? Unless you could prove to me that this feature is being abused in a wrong way in more than 50% of the packages and is making Elm a worse platform, otherwise it is not justifiable to revoke this privilege from the minorities who didn't misuse custom operators at all.

In a nutshell, I hope that you could revise this decision as this is not a necessary breaking change, but rather an opinionated conclusion.

@mzero

mzero commented Jul 31, 2019

Of all the changes in 0.19, this is the one that most hurt my code: I have a parser combinator library that used just two custom operators, for the very reasons that Evan points out at the top.

Now I learn that elm/parser can, and does, define two operators for parsing, for the same reasons my library had done so. There are indeed times when custom embedded languages with custom operators are worth the mental effort on the programming staff. Parsing is one of them, which Evan acknowledges, and indeed uses in elm/parser.

However, it is not realistic to assume that elm/parser will become the only parsing package we ever need. For one, it only works on String. Parsing over byte arrays is quite common (and is what mine did). Even if elm/parser had been parameterized on the stream type, there are still differing implementation and functionality tradeoffs in parsers (backtracking, error tracking, error recovery, etc.) that make different parser libraries useful even when they support the same stream type.

@dancojocaru2000

If you think about it, users can write unreadable code by writing code. In the next release, let's make elm accept only blank text files.

@Microtribute

@evancz I want it back! or I will call the police! :) It's been a while but yeah, that's a nice-to-have feature. The custom operator is one of the reasons that made me fall in love with Elm. If that does not make your life difficult, why bother removing it? Removing it will break the packages that were written in pre-0.19 versions.

@Reenuay

Reenuay commented Mar 27, 2020

Instead of constraining the possibilities of the language you should extend it, imho.

@themaxhero

themaxhero commented Jun 13, 2020

You could leave that decision up to the project owner by making a compiler flag.

ghost commented Aug 4, 2020

Haskell* (there is typo in above post)
