@evancz /operators.md Secret
Last active Nov 5, 2018

The reasoning behind removing user-defined operators

User-Defined Operators

Elm 0.19 removes the syntax for user-defined operators. So it is no longer possible to define things like @@@|> or <#>.

This document presents the reasoning behind this decision.

Theory of Operator Meaning

It is extremely difficult to come up with good operators. I think all of the successful ones have two things in common:

  1. They have some visual relation to their meaning. For example, </> and <?> from the URL parsing library exactly mimic the / and ? symbols in URLs. And |> indicates directionality really clearly. The math operators like + and / are directly related to the math operators taught in every school in the world.

  2. They use visually simple symbols. I have never seen # or % or @ or $ used to create an excellent operator. I think it is partly that they are so busy visually, mixing curves and lines haphazardly. It is also partly that they do not have super strong cultural meanings as infix operators in general. E.g. $4 means four dollars, but what is 3$4?
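As a rough illustration of point 1, here is a minimal sketch of how </> and <?> read in a route parser. It assumes the elm/url package, which is where the URL parsing library lives in Elm 0.19. The Route type and the paths are invented for the example.

```elm
module Route exposing (Route(..), fromUrl)

import Url exposing (Url)
import Url.Parser exposing ((</>), (<?>), Parser, int, map, oneOf, s, string, top)
import Url.Parser.Query as Query


-- Hypothetical routes for an imaginary blog.
type Route
    = Home
    | Blog Int
    | User String
    | Search (Maybe String)


-- </> sits where a / would sit between path segments,
-- and <?> sits where the ? that starts the query string would sit.
route : Parser (Route -> a) a
route =
    oneOf
        [ map Home top
        , map Blog (s "blog" </> int)
        , map User (s "user" </> string)
        , map Search (s "search" <?> Query.string "q")
        ]


fromUrl : Url -> Maybe Route
fromUrl url =
    Url.Parser.parse route url
```

Each line of `route` can be read against the address it matches. For example, /blog/42 should become `Blog 42` and /search?q=elm should become `Search (Just "elm")`.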

Between these two things, there are not actually very many viable possibilities, and all of them are taken. Visual bracketing like </> or |+| expands things a bit, allowing reuse of symbols that already have cultural meaning, but at that point, you are basically constrained to making operators for vector math or something.

Usage in Packages

Elm has had this feature for a while, so I studied how it has been used so far in packages.

As of this writing, there are 1031 packages published for Elm 0.18, and 66 of them have ever defined operators. Core developers who agree with this choice account for about 15 of those, which leaves 51 packages. That means that under 5% of packages are affected (51 out of 1031 is about 4.9%).

Usage Trends

I want to highlight some of the usage trends with that 5%:

  1. Haskell Operators - Some authors really like the following operators: >>=, <$>, <*>, <*, *>, >=>, etc. This seems to be one of the more popular uses.

  2. Invented Operators - A fairly small fraction of authors invent very elaborate operators. For example, one package contains |-~->, |-~>, |=~->, |=~>, |~->, |~>, and many others. Another has @@@|>, ?|>, !+>, and many others. This is pretty uncommon.

  3. Math Operators - Some packages are for math. Vector math. Matrix math. They generally use operators like |+| or |*| that match the cultural norms. This is not as common as I would have hoped actually! More math!

  4. Parser Operators - Parsers in the ML family of languages can be really lovely. All of the parsing packages I have seen in Elm define special operators of some sort.

I may be missing some scenarios, but those were the ones that stood out to me.

Root Design Goal

Each of these categories stems from a reasonable design goal. I will try to outline the design goal, and then point to a path toward achieving that goal in a nicer way:

  1. "I want it to be easier to chain tasks." Haskell has this special syntax called do-notation for sequencing tasks. It is pretty neat once you know it, but it is also a major barrier to entry. I struggled with it for about six months at least. Haskell also has a set of operators like >>= that are connected to this special syntax, so I suspect folks settle for having those as a personal compromise. Now, I think other languages have actually accomplished the root goal of "chaining tasks is easy" in nicer ways. For example, F# has computation expressions that are a bit more flexible and do not require integration with a type class mechanism. And C# introduced async/await syntax that gives the same capabilities, but integrates with the language way more cleanly. And Idris generalized that with bang-notation. So I feel that debating andThen vs >>= misses the bigger picture.

  2. I want to write fewer characters. I think this explains the "invented operators" case, but the line between "concise" and "cryptic" is a matter of taste. Is APL concise or cryptic? Is Ruby concise or cryptic? Is Haskell concise or cryptic? Depends who you ask! In Elm, having explicit and readable code is a major design goal. elm-format helps with that. Using qualified values like List.map and Set.map helps with that. So even if you have the |-~-> operator, Elm is not really designed for minimizing character count and will clash with your root goals in other ways.

  3. I want to do math! I really like this goal. I think languages like Julia have done an excellent job at overloading + and * in a reasonable way. Their approach is really lovely, but we would have to lose Elm’s type system to match them. Point being, rather than making |+| and |*| as a stopgap, perhaps it is possible to think about the broader question in a comprehensive way. Should there be a way to overload + for vector and matrix math? How would that work? How would you multiply a vector by a scalar? Perhaps the best design is to restore user-defined operators for bracketed math operations like |+| and |-| with certain types? Or maybe it can just be done in a really nice way with a library. Worth exploring!

  4. I want to parse! Writing parsers can be tricky, and operators are one way to help make things easier. Most parsing packages replicate the Haskell operators, and all of the logic in case (1) applies. Separate from that, it seems like </> and <?> have been quite successful in elm-lang/url, and it seems that |. and |= have been quite successful in elm-lang/parser. These operators are getting special cased like + and -. What is the deal with that?

    Well, many languages special-case parsers (e.g. regex in Perl, JS, and Ruby) with very specific costs and benefits. The cost is that if regex cannot handle your scenario, it is very annoying. The benefit is that there is specific knowledge that transfers between different codebases and languages. I think Haskell is the best example of a language that does not special-case parsers, also providing specific costs and benefits. The cost is that everyone has to pick between parsec for okay error messages and attoparsec for better perf. Any project I ever created ended up using both parsec and attoparsec through transitive dependencies. (That was frustrating when I was trying to get elm binaries smaller, but it is a much bigger deal for JS bundles where size is super important!) Even though the APIs for parsec and attoparsec are pretty much identical at a high level, code does not transfer between them because the details are slightly different. On the other hand, the benefit is that you can get a parser tailored for your exact performance or error message needs, and if there is some new insight, someone can make a new parser library around it.

    The design of elm-lang/parser uses the same performance insights from attoparsec and improves upon the error message quality of parsec. It appears to be possible to have both under one API. I also think it makes the most sense for the ecosystem to have one option that distills the best known path. Exploration can still happen (I do my exploration in Haskell because it is great for that) and insights can be brought back without fragmenting the Elm ecosystem.
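To make the |. and |= operators from item 4 concrete, here is a small sketch modeled on the point-parsing example in the elm/parser documentation. elm/parser is the Elm 0.19 name of elm-lang/parser. The Point type is only for illustration.

```elm
module Point exposing (Point, point)

import Parser exposing ((|.), (|=), Parser, float, spaces, succeed, symbol)


type alias Point =
    { x : Float
    , y : Float
    }


-- |= runs the parser on its right and keeps its value,
-- passing it on to the function being built (here the Point constructor).
-- |. runs the parser on its right but ignores its value.
point : Parser Point
point =
    succeed Point
        |. symbol "("
        |. spaces
        |= float
        |. spaces
        |. symbol ","
        |. spaces
        |= float
        |. spaces
        |. symbol ")"
```

Running `Parser.run point "( 3, 4 )"` gives `Ok { x = 3, y = 4 }`. With the "keep" and "skip" operators, the parser reads top to bottom in the same order as the text it consumes.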

The broader message here is that we have some fairly specific design problems, and user-defined operators are often stopgap measures. I think it is important to think about languages on a timescale of decades, and by looking at each case directly, I think we can end up with something nicer in the long run.

Usage in Applications

I know some folks also define operators in their application code. Some people really love custom operators. Some people really hate them. We have found with elm-format that just making a choice is an effective way to help teams minimize these debates and focus more on the application.

This case is a bit borderline for me though, especially if you are working on your own. One thing I learned from discovering The Elm Architecture is that it is really lovely to be able to show up in any codebase and know what is going on. I think custom operators detract from that enough that they are not worth it for the whole ecosystem, even if they are great for specific individuals.

History

But why was this feature added in the first place? As far as I can remember, this is how I implemented + and - while I was working on my thesis. This was a naturally exploratory time, and features did not undergo as much scrutiny as they do now.

When assessing features from that time, I ask myself, "If someone proposed adding this feature today, would it get in?" I cannot see user-defined operators getting in. All of the considerations in this document seem to point to there being more specific problems that would benefit from more specific designs.

Conclusion

I know not everyone agrees with this choice, but I hope this document clarifies some of the thinking behind it.

joonazan commented Aug 21, 2018

I'd solve the issue of vector libraries etc. by allowing overloading like in Idris. That gives a lot of the benefits of type classes without introducing new concepts. ++ currently works for String and List, so why not allow that elsewhere? That would also get rid of comparable, appendable, number and compappend!

EDIT: Then functions would have to be written separately for Int and Float, but as very few functions want to work on both, that would be the most Elm way to do it. Idris is a decent language for the web, so it makes sense for Elm to be as stripped down as possible to compete in a different niche.
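Some readers may not have met comparable, appendable, and number before. They are Elm's built-in constrained type variables, and they are how ++ works on both String and List today. Here is a minimal sketch of how they behave in Elm 0.19. The function names are made up for illustration.

```elm
module Constrained exposing (double, joinTwice, sortedPair)


-- `number` accepts Int or Float, since + is defined for both.
double : number -> number
double n =
    n + n


-- `appendable` accepts String or List a, since ++ is defined for both.
joinTwice : appendable -> appendable
joinTwice x =
    x ++ x


-- `comparable` accepts Int, Float, Char, String,
-- and lists or tuples of comparable values.
sortedPair : comparable -> comparable -> ( comparable, comparable )
sortedPair a b =
    if a <= b then
        ( a, b )

    else
        ( b, a )
```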

kspeakman commented Aug 22, 2018

Looking through our code base, we have a handful of files that use >>= and <!> to mean a flipped Result.andThen and Result.map respectively. These are primarily used for validation in Elm. However, we use these same operators heavily in F# on the server side (for Async Result), so they are very familiar to us. And they were themselves chosen specifically because they were commonly accepted operators for these functions, not chosen arbitrarily.
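Here is a minimal sketch of how such validation code might look in Elm 0.19. The flipped Result.andThen and Result.map are written as |> pipelines over the named functions from the Result module. parseAge, checkRange, and describeAge are hypothetical validators invented for this example. They are not code from the comment.

```elm
module Validation exposing (describeAge)


-- Hypothetical validators, for illustration only.
parseAge : String -> Result String Int
parseAge raw =
    String.toInt raw
        |> Result.fromMaybe ("not a number: " ++ raw)


checkRange : Int -> Result String Int
checkRange age =
    if age >= 0 && age <= 150 then
        Ok age

    else
        Err "age out of range"


-- Where `result >>= f` meant Result.andThen f result and
-- `result <!> f` meant Result.map f result, 0.19 pipes into the named functions.
describeAge : String -> Result String String
describeAge raw =
    parseAge raw
        |> Result.andThen checkRange
        |> Result.map (\age -> "age " ++ String.fromInt age)
```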

We will certainly give Elm 0.19 a try, but my first reaction to this change is one of disappointment. The effect on our code is small compared to other breaking changes in this release. But more importantly, just like you explored with operators in your thesis, how can Elm devs explore and discover use cases like those in the URL parser if the freedom to do so is absent? Has every meaningful operator been discovered already? Should no one else even try? Much like the way Elm's GitHub issues are managed, this change feels very controlling and unnecessary.

To balance out the above, I feel compelled to say that I really appreciate the improvements 0.19 has made and all the effort that went into it. Elm remains the best front-end dev experience known to me, especially when it comes to maintenance and refactoring. Elm has forever impacted the way I go about development (not just the front end). And I'm not shy about saying so in various dev communities.

tryshchenko commented Aug 22, 2018

Hi, thank you (and the community) for your efforts. My overall impression of 0.19 is mostly positive.

Considering that you totally break compatibility between 0.18 and 0.19, wouldn't you consider making the next version (20) a kind of major release? Just to set expectations. We found custom operators quite handy and I would definitely miss them. Also, their absence will postpone our migration for a while.

However, I partially agree with your arguments:

  • It was so painful to learn Elm because at first it was not clear which operators were custom and what was imported without an explicit declaration. Getting rid of them may reduce overhead for newcomers.
  • <|--->>>==> is bad.

antarestrader commented Aug 23, 2018

Someday I would like to see a moderated panel discussion that included Edward Kmett and Evan Czaplicki. Maybe Uncle Bob Martin could be the moderator.

yuri-martynov commented Aug 23, 2018

R<|>P elm-community/parser-combinators

I don't want Elm to look like regex or Perl, but I have a rather big DSL with tons of unit tests, so I have to rewrite my parser.
What do you recommend using in the long run?

yuri-martynov commented Aug 23, 2018

Let's get rid of the JS-style double ==; F# is quite happy with just a single =:
if y = x then

fosskers commented Aug 23, 2018

You might consider megaparsec as an alternative to parsec and attoparsec, as it has good performance, a good API, and good errors.

I'm also generally not thrilled with the removal of custom operators. It's been my experience that humans enjoy symbols more than written words. We can recognize them faster, and with enough familiarity, can "feel" the meaning. (APL follows this belief.) Do any of the core devs have experience with East Asian languages? Letter-centric programming is only more meaningful up-front for English speakers.

Otherwise, great job improving the compiler, a lot of people will benefit.

Mouvedia commented Aug 23, 2018

I don't mind the change, but it shouldn't be part of a release that removes an operator.
The right way to go about it would have been to first replace the modulo operator and then, once you had collected feedback, pitch the removal of user-defined operators for 0.20. This is brutal and provocative: not smart.

fstiffo commented Aug 25, 2018

The module elm/parser can use operators because it is core... a very strange concept of language syntax rules and language semantics.

madnight commented Aug 25, 2018

s/Haskel/Haskell/gI

wires commented Aug 26, 2018

"I have never seen # or % or @ or $ used to create an excellent operator. I think it is partly that they are so busy visually, mixing curves and lines haphazardly. It is also partly that they do not have super strong cultural meanings as infix operators in general. E.g. $4 means four dollars, but what is 3$4?"

Euh ($) : (a -> b) -> a -> b? This whole reasoning is moot. But anyway, not that I care anymore, we are already porting our Elm codebase to Purescript, Idris and ReasonML.

Bastes commented Aug 29, 2018

Using lenses a lot with monocle eases the pain of nested/optional/converted fields a lot, thanks in part to its composition operators, which are less obvious in their meaning but more readable to someone used to them...

If it were possible to use generic lens composition (i.e. as in Haskell, using simple function composition) it would be less of a pain, but as it is, codebases using it will suffer a great deal :/

Erudition commented Aug 30, 2018

@fosskers says "humans enjoy symbols more than written words. We can recognize them faster, and with enough familiarity [emphasis added], can 'feel' the meaning." Sure, for things that aren't inherently familiar (like +), with enough familiarity, we can "feel the meaning" of any symbol, even non-ASCII, while saving keystrokes and recognizing faster. So here's a crazy idea: how about we allow custom operators only if they're composed of single-character operators from non-ASCII Unicode? We should all be encoding files in Unicode anyway, and it's widely supported now. Sure, devs will have to set up macros to type them quickly, since they're not on the keyboard, but look at how much new territory there is! We can use more meaningful symbols like ∮ instead of "contourIntegral", and someone foreign to the code can look at it and instantly know "that's a non-core custom operator". No obnoxious lengths. No conflicting with other languages. The worst that could happen is that you don't know what it means, and that could happen with explicit functions too. Is this too crazy of an idea?

Erudition commented Aug 30, 2018

One Small Step for Elm, One Giant Leap for Readability!

I may be one of the only ones to applaud this change, but here I go. Disclaimer: I'm new to Elm. (but not programming!)

The Good

  • |-~->, |-~>, |=~->, |=~>, |~->, and @@@|> are just plain obnoxious. I won't be so diplomatic as OP: Good riddance.
  • I'd go so far as to say that any operator of more than three characters has little chance at being intuitive. Probably also true for most custom operators with more than one character, honestly (unless it's just a single character surrounded with decoration like |+|).
  • Regarding modulus "I have never seen .. % .. used to create an excellent operator" and "[good operators] have some visual relation to their meaning" are both excellent reasons against the ubiquitous % operator - I never understood the motivation. mod is so short and accurate, and % already has a cultural meaning(/definition?) of "out of a hundred" - but the operator never seems to be used that way in programming. If it was assigned to a function that multiplied the preceding value by 0.01, that'd make a lot more sense.
  • Finally, and most importantly, making sense of a stranger's Elm code without knowing the whole codebase is definitely going to be improved by this. Encountering custom operators is definitely the biggest head-scratcher currently.
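On the modulus point: Elm 0.19 did drop the % operator. It was replaced by two named functions, modBy and remainderBy, which differ in how they treat negative numbers. Here is a minimal sketch. isEven and wrapIndex are hypothetical helpers.

```elm
module Modulo exposing (isEven, wrapIndex)


-- modBy follows the mathematical convention for negative numbers,
-- while remainderBy keeps the sign of the number being divided.
isEven : Int -> Bool
isEven n =
    modBy 2 n == 0


-- Hypothetical helper: wrap an index into the range 0 .. len - 1,
-- so stepping back from index 0 lands on the last position.
wrapIndex : Int -> Int -> Int
wrapIndex len i =
    modBy len i
```

For example, `modBy 4 (-5)` is 3 while `remainderBy 4 (-5)` is -1. This is why `wrapIndex 5 (-1)` gives 4 rather than a negative index.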

Problems

  • |+| and |*| actually make a ton of sense for matrix operations - I haven't seen that before. While I'd prefer to just use + and *, these seem like the next best thing (especially over something like <| matrixAdd now in its place). Since you seem to have no qualms with restricting features to a whitelist, perhaps such operators should have one too.
  • <| and |> have actually been a huge stumbling block for me learning Elm - and I come from Haskell. While I may be one of the few learners of both languages, in Haskell I quickly learned to read $ as "picture a ( here and a ) at the end of everything". While maybe $ isn't the best symbol for that, I have to contradict you and opine that |> is actually worse, precisely because it doesn't do a good job at unambiguously "indicating directionality". It's clear that <| "points" left, but does that mean "the function to the left consumes everything to the right", or "everything to the left gets consumed by the function on the right"? To this day I'm still looking up which is which, because it's now the only way I can make functions infix (since backticks are no longer allowed either). But hey, maybe that's just me. Also, the learning guide didn't mention these operators.
  • "How would you multiply a vector by a scalar?" Um, the way you always do? Multiply the scalar by every cell in the matrix - why should we do it any differently than we do it in math? Is there something I'm missing here?
  • Might want to proofread this announcement. A lot.
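On the scalar question, one plausible reading of the original post is that the hard part is typing an overloaded *, not the arithmetic. Such a * would have to accept both Float * Vec2 and Vec2 * Vec2. The post also names a library as an option. Here is a minimal sketch of that route using plain named functions. Vec2 and its functions are hypothetical, not an existing package.

```elm
module Vec2 exposing (Vec2, add, dot, scale)


-- A hypothetical 2D vector type, for illustration only.
type alias Vec2 =
    { x : Float, y : Float }


add : Vec2 -> Vec2 -> Vec2
add a b =
    { x = a.x + b.x, y = a.y + b.y }


-- Scalar multiplication: multiply every component by the scalar.
scale : Float -> Vec2 -> Vec2
scale k v =
    { x = k * v.x, y = k * v.y }


dot : Vec2 -> Vec2 -> Float
dot a b =
    a.x * b.x + a.y * b.y
```

Putting the scalar first lets scale compose in pipelines: `v |> scale 2 |> add u` computes u + 2v.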
regellosigkeitsaxiom commented Sep 7, 2018

In our small company we have a small 100+ LoC library called "Sugar" for constantly used stuff.
It has about 30 functions inside, of which about 10 are operators. Two of them are really important, as they dramatically increase code readability:

(?) f x = f ( toString x )
infixr 0 ?

(?.) f x = f ( toString x ++ "px" )
infixr 0 ?.

Across two of my significant active projects of about 15000 LoC in total, these occur more than 1500 times. They are mostly used with SVG:

rect
    [ width ? w
    , height ? h - 50
    , x ? 0
    , y ? 50
    ] []

And we use SVG a lot.

As I see it, replacing them with any prefix function will make things worse, because when I look at the code, the first thing I will see will be this semantically useless function, not meaningful code:

rect
    [ _s width w
    , _s height <| h - 50
    , _s x 0
    , _s y 50
    ] []

And please don't suggest width <| toString w or width <| _s w, because it's tedious to type. I want to type working code, not formal greetings.
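For comparison, here is a minimal sketch of the two helpers written as ordinary functions in Elm 0.19. toString no longer exists there, so numbers are converted explicitly with String.fromFloat. It assumes the elm/svg package. num and px are hypothetical names. This version only accepts Float values, so Int values would need toFloat first.

```elm
module Sugar exposing (num, px)

import Svg exposing (Attribute)


-- Hypothetical stand-in for `?`: turn a number into an SVG attribute value.
num : (String -> Attribute msg) -> Float -> Attribute msg
num attr value =
    attr (String.fromFloat value)


-- Hypothetical stand-in for the "px" part of `?.`.
px : Float -> String
px value =
    String.fromFloat value ++ "px"
```

With Svg.Attributes exposing width, height, x, and y, the rect above becomes `rect [ num width w, num height (h - 50), num x 0, num y 50 ] []`. That is essentially the `_s` version. Whether it reads worse than the operator version is exactly what this comment is arguing about.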

For our current projects 0.18 is good enough and we will seriously consider not moving to 0.19.

Erudition commented Sep 8, 2018

@regellosigkeitsaxiom I understand you like typing less, but if I stumbled upon your code and tried to understand what was going on, there's only one version I would understand right away: width <| toString w. All the others would require research.
Now, maybe you don't care about readability (or "foreign" readability if that helps), and maybe you're not writing libraries or stuff that anyone else would use, but others will be - and your experience kind of confirms Evan's observation of "if they can, they will"...

For me, width <| toString w would actually be easier to type (because I write code in entire words at a time), but if the tedium is your primary concern, perhaps a simple macro?

JulianLeviston commented Sep 17, 2018

@Erudition maybe this mnemonic helps: <| and |> indicate the flow of the data. So, blah |> x means blah is some data and it's being fed into the function on the right, so x must be a function in that case. f <| x means f is here a function that's having x fed into it as data. So even when we have f <| g y x we must know that f is a function, and g y x evaluates to the data being fed into it.

That's probably why it's hard for you to understand. Everything about function application in Haskell (and actually in Elm too) goes the other way around: data conceptually moves from right to left when functions are applied, and . composes functions so that the data flows from right to left. Elixir is the same as Elm in that |> indicates the direction of data flow, but there it's a macro, so it's something else.
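Here is a small sketch of the mnemonic above. The three functions below compute the same thing. The pipe operators only change the direction in which the data is written relative to the functions. The function names are invented for illustration.

```elm
module Pipes exposing (viaBackward, viaForward, viaPlain)


-- Plain application: the function comes first, its argument after it.
viaPlain : List Int -> Int
viaPlain xs =
    List.sum (List.map (\n -> n * 2) xs)


-- <| : the data on the right flows leftward into the function on the left.
viaBackward : List Int -> Int
viaBackward xs =
    List.sum <| List.map (\n -> n * 2) xs


-- |> : the data on the left flows rightward into the function on the right.
viaForward : List Int -> Int
viaForward xs =
    xs
        |> List.map (\n -> n * 2)
        |> List.sum
```

Each one returns 12 for `[ 1, 2, 3 ]`.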

heronils commented Oct 26, 2018

First, I love how you extensively help new users while keeping the language pure at the same time. I support this because I am a new user and because I am interested in learning how to solve problems functionally. Haskell is not easy enough for me to use; especially the API docs are not helpful. I love how you tell people to provide concrete, practical code examples, and how you ask them to use one word for one concept. I love that after the installation there is just an elm.exe, an uninstaller, an icon, and two scripts for adding/removing it to/from the PATH. I love that elm init brings me to a site where everything is explained ('this file does this, that file does that, put your files here'), especially all the (political but very true) statements after 'How do I structure my directories?'. I hate setting up things, as well as premature modularisation. It is a big plus to keep things simple for beginners. The only solution to complexity is simplicity (well, and grouping dependent things together in folders [1] :-) ).

However, regarding this issue (I want shortcuts and infix notation for frequently used functions, but I also want the code to be easy for new users to read), the following looks like a better solution to me:

How to define operators

Illegal syntax (legal in 0.18), just the operator but no readable function name provided:

(=>) : a -> b -> ( a, b )
a => b =
    ( a, b )

Legal syntax, the readable function name must come first:

makeTuple, (=>) : a -> b -> ( a, b )
makeTuple a b =
    ( a, b )

Illegal syntax, the readable function name must be used during the definition. It helps new readers to understand the code:

makeTuple, (=>) : a -> b -> ( a, b )
a => b =
    ( a, b )

Usage

This code ...

a => b

... formatted with elm-format <file>, would keep only the core operators:

makeTuple a b

With elm-format <file> --use-operators (a proposed option), it would use all available operators:

a => b

Usage on websites

On websites like the Elm package site, code examples could have a "[x] use operators" checkbox in the top right; when it is disabled (the default), the source is formatted without --use-operators. Alternatively, the code formatted without --use-operators could be shown when hovering over an operator.

Python has it

Python (created by Guido van Usability) has operator overloading in the language. One may not like Python (I do), but one has to appreciate that the Pythonistas (or at least their VIPs) try to keep the language simple to use and easy to learn. So if operator overloading is allowed in Python, it must have a point. The use case given by Valentin Shirokov above is one of those points. I cannot think of a more elegant way to do it, except using macros. Also, even though this feature is available, it is not widely used in Python libraries. No one complains about Python being difficult to read because of operator overloading; at most they complain that it could happen, but it doesn't. If you want to restrict the use of cryptic operators (very understandable; see below, and in my opinion this includes some operators you use in the core language), just allow a restricted set of operators to be created/overloaded, like those seen in Python plus my exceptions from below, plus . (function composition), plus (to make Valentin happy) ?, whatever this operator means.

Personal, language-independent observations about operators

Don't claim that operators like |> <| </> <?> are readable. I won't believe you. To me, no operator having more than one character that contains one of <>?|^´§%*~@ is readable. Exceptions are ==> (from this follows, evaluates to), >> or << (shove into), >>> (print), <<< (user input), <type name>, :: (has type) and the usual boolean, comparison and assignment operators. I won't even include -> or <- (returns thing) in this list of exceptions; it is too generic. It just points in a direction but contains no other information (Edit: actually they fit well into Haskell-style type descriptions). Further, please, everybody, always, everywhere, avoid the following operators: $ (except for currency), ? (except maybe for not implemented), ~ (except maybe for cast? I don't even like it in boolean logic). Further, avoid the use of % in string formatting; I hate it. Use {} instead. Further, I don't like |s at line start and \ or / at line end. This should, if possible, be done with indentation and with line-break- and indent-consuming separators and keywords like ,, and, or. Further, brackets are underrated. They are the ultimate tool for documenting a context. I love them and I put them on single lines (f you, schemers :-) ). I would never use a $ when I can use brackets. a $ foo $ bar instead of bar(foo(a)) is not more readable. a >> foo >> bar however is.

[1] Use version 4.9 of that editor; it is the last one that supports indentation-sensitive languages.
