Skip to content

Instantly share code, notes, and snippets.

@evancz
Last active April 2, 2024 20:16
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save evancz/07436448b7d6c947f21742dab46d1218 to your computer and use it in GitHub Desktop.
Save evancz/07436448b7d6c947f21742dab46d1218 to your computer and use it in GitHub Desktop.
Recursive values in Elm

Recursive Values

In a functional language, writing recursive functions is common, but it is also possible to write recursive values. Here is a simple example of a recursive value:

ones = 1 :: ones

This is an infinite loop. We just keep growing the ones list to infinity. So #873 logged folks running into this and made it an error. This was especially nice for cases like x = x + 1 where folks were expecting Elm to allow mutation.

It turns out it is more complicated than this, and #1591 logs some reports that came after the initial detection of recursive values.

Problem

Building mutually recursive values is really nice in the context of Json.Decode.Decoder, Random.Generator, Parser.Parser, etc. where you have values that describe some computation you may want to use or combine with others. Most Elm users have probably seen this with recursive JSON decoders, but I see it with recursive parsers like this.

The example from #1529 is a bit odd, but it demonstrates the core problem:

import Random

rollIf : Bool -> Random.Generator Int
rollIf b =
  if b then
    Random.int 1 6
  else
    rollAgain

rollAgain : Random.Generator Int
rollAgain =
  roll

roll : Random.Generator Int
roll =
  Random.bool
    |> Random.andThen rollIf

If I say Random.generate identity roll it will keep rolling booleans until one is True, and then generate an integer between 1 and 6.

These definitions all depend on each other, forming a cycle:

cycle

The problem is, what order do we generate the definitions? There are many possible orders, but only one works:

// BAD ORDER - because `roll` is not defined yet
var rollIf = function(b) { return b ? _Random_int(1,6) : rollAgain; };
var rollAgain = roll;  // RUNTIME ERROR!
var roll = _Random_andThen(_Random_bool, rollIf);

// BAD ORDER - because `rollIf` is not defined yet
var roll = _Random_andThen(_Random_bool, rollIf); // RUNTIME ERROR!
var rollIf = function(b) { return b ? _Random_int(1,6) : rollAgain; };
var rollAgain = roll;

// GOOD ORDER
var rollIf = function(b) { return b ? _Random_int(1,6) : rollAgain; };
var roll = _Random_andThen(_Random_bool, rollIf);
var rollAgain = roll;

Based on the examples here and ones I have constructed in thinking about it, I believe it is impossible (in general) to figure out if a good order exists, and if so, what it is. I assume someone can reduce this to some sort of NP problem.

If you think there is a solution, please consider all of the examples in #1591, this parser, and then work hard to poke holes in your idea.

Possibilities

Before we look at ways to resolve this in Elm, let’s look at how it is resolved in other languages.

In Haskell

In Haskell, all values are lazy. To simplify why this is a solution, it is as if the program was actually generated like this:

function rollIf(b) { return b ? _Random_int(1,6) : rollAgain(); };
function rollAgain() { return roll(); }
function roll() { return _Random_andThen(_Random_bool, rollIf()); }

So instead of trying to actually instantiate roll and rollAgain they are functions that will be called as necessary. So if anyone uses any of these values later, it will work out independent of the particular order of the definitions.

This could work for us, but without laziness, it requires recomputing and reallocating a generator on every use of roll.

In OCaml

In OCaml, they disallow this kind of definition.

(** COMPILE ERROR **)
let rec rollIf b = if b then Random.int 1 6 else rollAgain
and rollAgain = roll
and roll = Random.andThen rollIf Random.bool
;;

You can have recursive functions or nothing. So you can rewrite it to this:

(** GOOD **)
let rec rollIf b = if b then Random.int 1 6 else rollAgain ()
and rollAgain () = roll ()
and roll () = Random.andThen rollIf Random.bool
;;

Notice that roll and rollAgain are now functions with type () -> Random.Generator Int. This is basically the same trick as above, but you have to do it by hand.

Another Way?

There is a way to generate code that will always work and has better performance than the OCaml approach. You can automatically upgrade values to functions, like in the Haskell solution, and then comput the values afterwards:

// automatically convert values to functions
function rollIf(b) { return b ? _Random_int(1,6) : _delayed_rollAgain(); }
function _delayed_roll() { return _Random_andThen(_Random_bool, rollIf); }
function _delayed_rollAgain() { return _delayed_roll(); }

// define `roll` value for everyone else
// redefine `_delayed_roll` so it only does the work once
var roll = _delayed_roll();
_delayed_roll = function() { return roll; };

// same thing for `rollAgain` and `_delayed_rollAgain`
var rollAgain = _delayed_rollAgain();
_delayed_rollAgain = function() { return rollAgain; };

First, notice that when roll is defined, we call _delayed_roll() just one time. After that, _delayed_roll is modified to return roll directly so we do the computation only once. This is in contrast to OCaml which, by forcing the user to rewrite as functions, would redo the computation on every use.

Second, I believe this disallows cyclic data structures, which could be quite important. If cyclic data structures do not exist in Elm at all, it may give us some unique advantages in garbage collection. For example, without cyclic data it is possible to use ARC and not leak memory over time. That means that (1) some future implementation could get away with ARC as a first draft and (2) papers like this one suggest that this may be a nice way to manage older generations in a more thoughtful GC.

Credit: This comes from a discussion with Aaron VonderHaar. I have not seen it done in other ML-family languages.

Solution

It seems like there are two options:

  1. Go the OCaml way and disallow recursive values.

    • Very explicit about performance costs, but by translating to functions by hand, you must pay that cost on every use.
    • Bad recursive values like this are a compiler error!
    • More annoying for decoders, generators, parsers, etc.
  2. Do the conversion automatically.

    • Small hidden performance cost, but is much cheaper than reallocating on every use.
    • Recursive values (mediated by a lambda) are allowed, and may infinite loop at runtime.
    • Works as expected for decoders, generators, parsers, etc.

So to me the big point is about error message quality for bad recursive values.

Exploring Error Message Quality

The example I linked above boils down to this:

sum = List.foldl (\num total -> num + sum) 0 [1,2,3]

Notice that sum is defined in terms of sum in a seemingly delayed way.

OCaml will reject this. Bad recursive value.

Elm will allow it. If there is a lambda before the value depends on itself, we let it through. So if we then automatically convert it to a function, it will infinite loop. It starts computing sum, which needs sum, which needs sum, etc.

In practice, these values would never be tail call optimized. That means they would always result in a stack overflow. If we have a nice error message for stack overflows, perhaps one that links to an overview of recursion and TCE, this may actually be a really nice way to present the error.

Questions

  • Are there languages that use other strategies?
  • Is my analysis correct? Would this avoid cyclic data structures? Would it never TCO?
  • Does it seem like this trick could be done in assembly?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment