Welcome in the Sum Type Ménagerie ! Feel free to skip the explanations and tour the specimens, or alternatively, you know... study the explanation.
Each program, or "specimen" as we'll call them here solves the exact same little problem. I'll explain the problem in terms of set theory, although, the concept can be understood from a type theory viewpoint as being a basic implementation of tagged unions, sum type, or even coproduct (yes, the concept has a few names).
Now, let's pick two sets. We'll call them A and B. I want to construct a set that contains both these sets, in such a way that I can "remember" where the elements of this bigger set came from. That is, if some element is a member of both A and B, and I see that element in my combined set, I want to be able to tell either "It came from the A set", or, "It came from the B set". This implies my combined set will contain two distinct values for the values that are contained in both sets.
I'll call this set Either(A, B) for some arbitrary A and B. For every element a of A, I'll write down the corresponding element in Either(A, B) as Left(a). Likewise, every element b of B in Either(A, B) will be denoted as Right(b).
The first basic thing our specimen will have to do is to provide a universal programming representation for any Either set; that is, it should let us pick arbitrary As and Bs and construct values from the Either(A, B) set.
Next, in order to be able to do something useful with our Either(A, B) values, we should get some means of extracting values from it. We'll want to be able to pick any arbitrary set C, any arbitrary function f from A to C and an arbitrary function g from B to C, and somehow, combine f and g into a h function from Either(A, B) to C. We'll write h = Cata(f, g). Cata is a shorthand for catamorphism, from the greek words meaning downward and shape. "To shape downward", since we are going from members of the Either(A, B) set to members of the C set. *
In our specimens, this Cata(f, g) function will generally be called match (because it allows us to conditionally pick either function depending on which shape a value matches with, and matches is somehow a word many programmers prefer to read). Or, if the programming languages already reserves the match word to mean something else, it will be called cata.
By convention, in each specimen, we'll create the value Right("lol") from the Either(Text, Text) set. Then, we'll apply a catamorphism to it, where the left function is a function that evaluates to the empty text for any input text, and the right function will be the function that evaluates to the text that was passed down to it, for any input text. The result of this operation should be displayed on the screen. That is, the program should display "lol".
*: I'm slightly abusing the language here, since catamorphisms in general are related to recursive data types and Either(A, B) is not a recursive type. What I'm describing is still technically a catamorphism though, since a non-recursive data type can be considered a special case of recursive data types. There are great articles detailing the concept online. Have a peek with Google if you're interested.
In a haskell-like language:
data Settlement
= Paypal (Amount USD)
| Check (Amount USD)
| LoyaltyPoints Int
| ...
messages :: Settlement -> List Message
messages (Check amount) | amount < Amount 20 USD = [warn "Check payments of less than $20 incur a $0.50 processing fee"]
messages (LoyaltyPoints _) = [info "Thanks for your loyalty to ACME services !"]
messages _ = []
In a ML-like language (ie, OCaml or F#):
type Settlement =
| Paypal of UsdAmount
| Check of UsdAmount
| LoyaltyPoints of Int
| ...
;;
let messages : Settlement -> Message list = function
| Check(usdAmount) when usdAmount < 20 -> [warn "Check payments of less than $20 incur a $0.50 processing fee"]
| LoyaltyPoints(_) -> [info "Thanks for your loyalty to ACME services !"]
| _ -> []
;;
The specimens fall roughly in two cases (there may be more than one specimen for a given programming language):
- The base case, with a file named
Either.lang
. These are the somewhat "boring" implementations that use the language's idiomatic mechanism for defining and using data types. - The Higher Order Function-based ones, with a file named
Either-hof.lang
. These are the implementation, that, for better or worse, do not use the host language's data definition idioms. They rely on the language's context capture mechanism instead (or in the very specific case of the C language, manually mimics context capture). Most of the time, this leads to shorter code.
- Adga
- C
- C++
- C#
- Ceylon
- Clojure
- CoffeeScript
- D
- Erlang
- F#
- Go
- Haskell
- Idris
- Java
- JavaScript (ES6)
- Kotlin
- LiveScript
- Lua
- Mercury
- OCaml
- PHP
- Perl
- PowerShell
- Prolog
- PureScript
- Python 2 and 3 (specimens use a compatible subset)
- Ruby
- Rust
- Racket
- Scala
- Swift
- Tulip
- TypeScript
- Untyped lambda calculus (encoded through prolog terms and tested with an interpreter I wrote)
- VB.NET: I'm not really looking forward to that one, but if I find myself in lack of a more interesting language to learn about, I'll certainly do this.
- Vala, Genie: Last time I looked, these two compiled down to C with reference counted GObjects. I wonder what their type systems look like and what would be available for me to implement specimens.
- Delphi: I've noticed commercial accounting software sold in France, which uses that language. May be interesting to dig in a bit.
- Groovy: The reasons for its recent apparent rise in popularity on the JVM are somewhat unclear to me. Is it much more than an optionally typed Java with syntactic sugar and meta-programmable closures ? Writing a specimen won't be enough to answer that question, though.
- Nix: A specimen should be trivial to write. But testing it would be an occasion to learn more about Nix tooling instead of just having a dormant NixOS setup on my laptop.
- ATS: I've read very interesting things about this language. But currently, it simply is a language I can't decypher.
- Eiffel: A verified language. The PL design seems dated and somewhat impractical (i.e., it seems to rely heavily on subtyping). It would be interesting to see what is possible within that language's limits, though.
- Emacs Lisp: Because despite having been an emacs user for years, I still don't know how to hello world in emacs lisp.
- Bash: It lacks anything that would remotely look like a closure.
Because that's bloody fun.
Finally had the time to fully read it. Nicely done, although a practical example with an evey day usecase might help :)