@josevalim
Last active December 17, 2021 18:47

This is a proposal for adding a new feature to Elixir. See the discussion here: https://groups.google.com/g/elixir-lang-core/c/jesGwAl8E1s

Update: this proposal has been retracted.

Languages such as Common Lisp, Haskell, and even C# (with LINQ) offer prior art for very powerful comprehensions built into the language. While Elixir comprehensions are already very expressive, allowing you to map, filter, reduce, and collect over multiple enumerables at the same time, they still cannot express other constructs, such as map_reduce.

The challenge here is how to continue adding more expressive power to comprehensions without making the API feel massive. That's why, 7 years after v1.0, only two new options have been added to comprehensions, :uniq and :reduce, for a total of three (:into, :uniq, and :reduce).
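As a quick reference, here is a small sketch of the three existing options in current Elixir. The sample data is made up for illustration.

```elixir
# :into collects the results into any Collectable, here a map.
lengths = for word <- ["a", "bb", "ccc"], into: %{}, do: {word, String.length(word)}

# :uniq drops duplicate results.
doubled = for n <- [1, 1, 2, 3, 3], uniq: true, do: n * 2

# :reduce threads an accumulator through the loop instead of building a list.
sum =
  for n <- 1..4, reduce: 0 do
    acc -> acc + n
  end

IO.inspect({lengths, doubled, sum})
```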

Imperative loops

I have been on the record a couple of times saying that, while many problems are more cleanly solved with recursion, there is a category of problems that are much more elegant with imperative loops. One of those problems has been described in the "nested-data-structures-traversal" repository, with solutions available in many different languages. Please read the problem statement in said repository, as I will assume from now on that you are familiar with it.

Personally speaking, the most concise and clear solution is the Python one, which I reproduce here:

section_counter = 1
lesson_counter = 1

for section in sections:
    if section["reset_lesson_position"]:
        lesson_counter = 1

    section["position"] = section_counter
    section_counter += 1

    for lesson in section["lessons"]:
        lesson["position"] = lesson_counter
        lesson_counter += 1

There are many things that make this solution clear:

  • Reassignment
  • Mutability
  • Sensitive whitespace

Let's compare it with the Elixir solution I wrote and personally prefer. I am pasting an image below which highlights certain aspects:

[Image: screenshot of the Elixir solution, with regions highlighted in red, yellow, green, and blue]
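The screenshot itself is not reproduced in this text. What follows is a minimal sketch of a solution in the shape the notes below describe, using nested Enum.map_reduce/3 calls. It is reconstructed from the description and may differ in detail from the highlighted code. The module name Traversal, the function name number/1, and the sample data are assumptions made for illustration.

```elixir
defmodule Traversal do
  # Assigns "position" to every section and lesson, threading both
  # counters through nested Enum.map_reduce/3 calls.
  def number(sections) do
    {sections, _acc} =
      Enum.map_reduce(sections, {1, 1}, fn section, {section_counter, lesson_counter} ->
        # Both branches of the conditional are needed to rebind the counter.
        lesson_counter = if section["reset_lesson_position"], do: 1, else: lesson_counter

        {lessons, lesson_counter} =
          Enum.map_reduce(section["lessons"], lesson_counter, fn lesson, lesson_counter ->
            {Map.put(lesson, "position", lesson_counter), lesson_counter + 1}
          end)

        # The updated lessons have to be put back into the section explicitly.
        section =
          section
          |> Map.put("lessons", lessons)
          |> Map.put("position", section_counter)

        {section, {section_counter + 1, lesson_counter}}
      end)

    sections
  end
end

# Hypothetical sample input, loosely following the problem statement.
sections = [
  %{"title" => "A", "reset_lesson_position" => false, "lessons" => [%{"name" => "a1"}, %{"name" => "a2"}]},
  %{"title" => "B", "reset_lesson_position" => false, "lessons" => [%{"name" => "b1"}]},
  %{"title" => "C", "reset_lesson_position" => true, "lessons" => [%{"name" => "c1"}]}
]

numbered = Traversal.number(sections)
IO.inspect(numbered)
```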

  • Lack of reassignment: in Elixir, we can't reassign variables, we can only rebind them. The difference is that when you do var = some_value inside an if, for, etc., the value won't "leak" to the outer scope. This implies two things in the snippet above:

    1. We need to use Enum.map_reduce/3 and pass the state in and out (highlighted in red)
    2. When resetting the lesson counter, we need both sides of the conditional (highlighted in yellow)
  • Lack of mutability: even though we set the lesson counter inside the inner map_reduce, we still need to update the lesson inside the section (highlighted in green)

  • Lack of sensitive whitespace: we have two additional lines with end in them (highlighted in blue)
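The rebinding point in the list above can be seen in a few lines of current Elixir. This is a minimal sketch with made-up values: a rebind inside an if stays local to that block, so the new value has to be returned from the expression and bound outside.

```elixir
lesson_counter = 5

if true do
  # This rebinding is only visible inside the if block.
  lesson_counter = 1
  IO.inspect(lesson_counter, label: "inside")
end

# The outer binding is unchanged.
IO.inspect(lesson_counter, label: "outside")

# To carry the new value out, bind the result of the whole expression.
# Both branches are required, otherwise the "else" case would yield nil.
reset? = true
updated_counter = if reset?, do: 1, else: lesson_counter
IO.inspect(updated_counter, label: "updated")
```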

As you can see, do-end blocks add very little noise to the final solution compared to sensitive whitespace. In fact, the only reason why I brought it up is so we can confidently discard it from the discussion from now on. And also because there is zero chance of the language suddenly becoming whitespace sensitive.

There is also zero chance of us introducing reassignment and making mutability first class in Elixir. The reason is that we all agree that, most of the time, lack of reassignment and lack of mutability are features that make our code more readable and understandable in the long term. The snippet above is one of the few examples where we are on the wrong end of the trade-offs.

Therefore, how can we move forward?

Comprehensions

Comprehensions in Elixir have always been syntax sugar for more complex data-structure traversals. Do you want the Cartesian product of all points in x and y? You could write this:

Enum.flat_map(x, fn i ->
  Enum.map(y, fn j -> {i, j} end)
end)

Or with a comprehension:

for i <- x, j <- y, do: {i, j}
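To check that the two forms agree, here is a small sketch with assumed sample values for x and y:

```elixir
# Sample inputs, chosen only for illustration.
x = [1, 2]
y = [:a, :b]

nested = Enum.flat_map(x, fn i -> Enum.map(y, fn j -> {i, j} end) end)
comprehension = for i <- x, j <- y, do: {i, j}

# Both expressions build the same list of pairs, in the same order.
IO.inspect(nested == comprehension)
```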

Or maybe you want to brute force your way into finding Pythagorean Triples?

Enum.flat_map(1..20, fn a ->
  Enum.flat_map(1..20, fn b ->
    1..20
    |> Enum.filter(fn c -> a*a + b*b == c*c end)
    |> Enum.map(fn c -> {a, b, c} end)
  end)
end)

Or with a comprehension:

for a <- 1..20,
    b <- 1..20,
    c <- 1..20,
    a*a + b*b == c*c,
    do: {a, b, c}

There is no question the comprehensions are more concise and clearer, once you understand their basic syntax elements (which are, at this point, common to many languages).
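One more syntax element worth knowing, shown here as a small variation that is not part of the original comparison: later generators can refer to variables bound by earlier ones. Starting b at a and c at b removes mirrored duplicates such as {3, 4, 5} and {4, 3, 5}.

```elixir
# Each generator starts where the previous variable is, so a <= b <= c.
triples =
  for a <- 1..20,
      b <- a..20,
      c <- b..20,
      a * a + b * b == c * c,
      do: {a, b, c}

IO.inspect(triples)
```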

As mentioned in the introduction, we can express map, filter, reduce, and collect inside comprehensions. But how can we represent map_reduce in a clear and concise way?

The :map_reduce option

Since we have :reduce in comprehensions, we could introduce :map_reduce. The solution above would look like this:

{sections, _acc} =
  for section <- sections, map_reduce: {1, 1} do
    {section_counter, lesson_counter} ->
      lesson_counter = if section["reset_lesson_position"], do: 1, else: lesson_counter

      {lessons, lesson_counter} =
        for lesson <- section["lessons"], map_reduce: lesson_counter do
          lesson_counter ->
            {Map.put(lesson, "position", lesson_counter), lesson_counter + 1}
        end

      section =
        section
        |> Map.put("lessons", lessons)
        |> Map.put("position", section_counter)

      {section, {section_counter + 1, lesson_counter}}
  end

While there is a bit less noise compared to the original solution, the reduction of noise mostly happened by the removal of module names and a few tokens, such as fn, (, and ). In terms of implementation, there is still a lot of bookkeeping required to manage the variables. Can we do better?

Introducing :let

Our goal is to declare variables that are automatically looped within the comprehension. So let's introduce a new option that does exactly that: :let. :let expects one or a tuple of variables that will be reused across the comprehension. At the end, :let returns a tuple with the comprehension elements and the let variables.

Here is what the solution would look like:

section_counter = 1
lesson_counter = 1

{sections, _} =
  for section <- sections,
      let: {section_counter, lesson_counter} do
    lesson_counter = if section["reset_lesson_position"], do: 1, else: lesson_counter

    {lessons, lesson_counter} =
      for lesson <- section["lessons"], let: lesson_counter do
        lesson = Map.put(lesson, "position", lesson_counter)
        lesson_counter = lesson_counter + 1
        lesson
      end

    section =
      section
      |> Map.put("lessons", lessons)
      |> Map.put("position", section_counter)

    section_counter = section_counter + 1
    section
  end

The :let option automatically takes care of passing the variables across the comprehension, considerably cutting down the noise, without introducing any mutability into the language. At the end, for+:let returns the result of the comprehension plus the :let variables wrapped in a tuple.
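One way to read :let in terms of today's Elixir is as a shorthand for Enum.map_reduce/3, where the block's last expression becomes the mapped element and the final bindings of the let variables become the new accumulator. The proposal does not specify an expansion, so the sketch below is only an assumption about the intended behaviour, shown for the inner loop with made-up lessons.

```elixir
lessons = [%{"name" => "intro"}, %{"name" => "basics"}]
lesson_counter = 1

# Proposed syntax (not valid Elixir today):
#
#   {lessons, lesson_counter} =
#     for lesson <- lessons, let: lesson_counter do
#       lesson = Map.put(lesson, "position", lesson_counter)
#       lesson_counter = lesson_counter + 1
#       lesson
#     end
#
# Assumed equivalent with Enum.map_reduce/3:
{lessons, lesson_counter} =
  Enum.map_reduce(lessons, lesson_counter, fn lesson, lesson_counter ->
    lesson = Map.put(lesson, "position", lesson_counter)
    lesson_counter = lesson_counter + 1
    # The element and the rebound variable are returned together.
    {lesson, lesson_counter}
  end)

IO.inspect({lessons, lesson_counter})
```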

Extensions

Here are some extensions to the proposal above. Not all of them might be available in the initial implementation.

Let initialization

You can also initialize the variables within let for convenience:

{sections, _} =
  for section <- sections,
      let: {section_counter = 1, lesson_counter = 1} do

This should be available in the initial implementation.

:reduce vs :let

With :let, :reduce becomes somewhat redundant. For example, grouping values by key, in the style of Enum.group_by/3, could be written as:

for {k, v} <- Enum.reverse(list), reduce: %{} do
  acc -> Map.update(acc, k, [v], &[v | &1])
end

with :let:

{_, acc} =
  for {k, v} <- Enum.reverse(list), let: acc = %{} do
    acc = Map.update(acc, k, [v], &[v | &1])
  end

The difference, however, is that :let returns the collection, while :reduce does not. While the Elixir compiler could be smart enough to optimize away building the collection in the :let case if we don't use it, we may want to keep both :let and :reduce options for clarity. If this is the case, I propose to align the syntaxes such that :reduce uses the same semantics as :let. The only difference is the return type:

for {k, v} <- Enum.reverse(list), reduce: acc = %{} do
  acc = Map.update(acc, k, [v], &[v | &1])
end

This can be done in a backwards compatible fashion.
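The first :reduce form in this section is valid Elixir today, so it can be tried directly. The sketch below runs it on an assumed sample keyword list and compares the result with the three-argument Enum.group_by/3, which takes a key function and a value function.

```elixir
# Sample input, chosen only for illustration.
list = [a: 1, b: 2, a: 3]

# Reversing first means prepending with [v | &1] restores the original order.
grouped =
  for {k, v} <- Enum.reverse(list), reduce: %{} do
    acc -> Map.update(acc, k, [v], &[v | &1])
  end

with_group_by = Enum.group_by(list, fn {k, _v} -> k end, fn {_k, v} -> v end)

IO.inspect(grouped)
IO.inspect(grouped == with_group_by)
```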

after

When you look at our solution to the problem using let, we had to introduce temporary variables in order to update our let variables:

{lessons, lesson_counter} =
  for lesson <- section["lessons"], let: lesson_counter do
    lesson = Map.put(lesson, "position", lesson_counter)
    lesson_counter = lesson_counter + 1
    lesson
  end

One extension is to add after to comprehensions, which is computed after the result is returned:

{lessons, lesson_counter} =
  for lesson <- section["lessons"], let: lesson_counter do
    Map.put(lesson, "position", lesson_counter)
  after
    lesson_counter = lesson_counter + 1
  end

This does not need to be part of the initial implementation.

Summary

Feedback on the proposal and extensions is welcome!
