Mix.install([
  {:kino_benchee, git: "https://github.com/livebook-dev/kino_benchee"},
  :enumancer
])
:ok
Let's talk about Enum and performance!
- The Enum module is probably one of the most useful modules, except for Kernel
- Compensates for the lack of loops in Elixir
- More compact and declarative than recursion
- Works for any Enumerable data structure, not just lists
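As a quick sketch of that last point, the same Enum calls work on a range or a map, since both implement the Enumerable protocol:

```elixir
# the same Enum functions work for any Enumerable, not only lists
Enum.map(1..5, &(&1 * 2))
#=> [2, 4, 6, 8, 10]

# iterating a map yields {key, value} tuples
Enum.map(%{a: 1, b: 2}, fn {key, value} -> {key, value * 10} end)
#=> [a: 10, b: 20]
```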
Let's start by generating sample data:
:rand.seed(:exsss, {100, 101, 102})
transactions = Enum.map(1..100, fn i -> %{id: i, amount: :rand.uniform(1000)} end)
[
%{amount: 494, id: 1},
%{amount: 603, id: 2},
%{amount: 309, id: 3},
%{amount: 74, id: 4},
%{amount: 854, id: 5},
%{amount: 387, id: 6},
%{amount: 110, id: 7},
%{amount: 439, id: 8},
%{amount: 680, id: 9},
%{amount: 286, id: 10},
%{amount: 237, id: 11},
%{amount: 746, id: 12},
%{amount: 421, id: 13},
%{amount: 177, id: 14},
%{amount: 46, id: 15},
%{amount: 802, id: 16},
%{amount: 671, id: 17},
%{amount: 617, id: 18},
%{amount: 506, id: 19},
%{amount: 958, id: 20},
%{amount: 497, id: 21},
%{amount: 835, id: 22},
%{amount: 629, id: 23},
%{amount: 638, id: 24},
%{amount: 467, id: 25},
%{amount: 486, id: 26},
%{amount: 851, id: 27},
%{amount: 567, id: 28},
%{amount: 243, id: 29},
%{amount: 379, id: 30},
%{amount: 154, id: 31},
%{amount: 354, id: 32},
%{amount: 420, id: 33},
%{amount: 371, id: 34},
%{amount: 245, id: 35},
%{amount: 591, id: 36},
%{amount: 740, id: 37},
%{amount: 405, id: 38},
%{amount: 147, id: 39},
%{amount: 893, id: 40},
%{amount: 243, id: 41},
%{amount: 365, id: 42},
%{amount: 227, id: 43},
%{amount: 969, id: 44},
%{amount: 657, id: 45},
%{amount: 279, id: 46},
%{amount: 543, id: 47},
%{amount: 770, id: 48},
%{amount: 960, ...},
%{...},
...
]
Say we want the equivalent of "SELECT id FROM transactions WHERE amount > 900".
A typical pipeline could look something like:
transactions
|> Enum.filter(&(&1.amount > 900))
|> Enum.map(& &1.id)
[20, 44, 49, 68, 88, 92]
How about efficiency?
transactions
|> Enum.filter(&(&1.amount > 900))
|> Enum.map(& &1.id)
|> dbg()
[20, 44, 49, 68, 88, 92]
We do build intermediate lists at each step!
In this case, for/1 is perfect:
for t <- transactions, t.amount > 900, do: t.id
[20, 44, 49, 68, 88, 92]
# which is basically:
Enum.reduce(transactions, [], fn t, acc ->
  if t.amount > 900, do: [t.id | acc], else: acc
end)
|> Enum.reverse()
[20, 44, 49, 68, 88, 92]
But for/1 only covers a fixed set of patterns; it lacks flexibility.
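For reference, here is a sketch of what for/1 does cover: multiple generators, filters, and the :into option (the sample maps below are made up for illustration):

```elixir
# multiple generators with a filter
for x <- 1..3, y <- 1..3, x < y, do: {x, y}
#=> [{1, 2}, {1, 3}, {2, 3}]

# collecting into a map with the :into option
for t <- [%{id: 1, amount: 950}, %{id: 2, amount: 100}], t.amount > 900, into: %{} do
  {t.id, t.amount}
end
#=> %{1 => 950}
```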
Now say we want "SELECT id FROM transactions WHERE amount > 900 LIMIT 5".
The following, for instance, cannot be expressed using for/1:
transactions
|> Enum.filter(&(&1.amount > 900))
|> Enum.take(5)
|> Enum.map(& &1.id)
[20, 44, 49, 68, 88]
So, Enum is highly expressive and flexible. But can we avoid the intermediate structures?
...
I know! Let's use Stream!
transactions
|> Stream.filter(&(&1.amount > 900))
|> Stream.take(5)
|> Enum.map(& &1.id)
[20, 44, 49, 68, 88]
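Stream functions return a lazy %Stream{} struct; nothing runs until an Enum function consumes it. A small sketch of that laziness:

```elixir
# building the pipeline computes nothing yet
s = 1..1_000_000 |> Stream.map(fn x -> x * 2 end)

# only the first three elements are ever mapped
Enum.take(s, 3)
#=> [2, 4, 6]
```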
defmodule Bench1 do
  def enum(list) do
    list
    |> Enum.filter(&(&1.amount > 900))
    |> Enum.take(5)
    |> Enum.map(& &1.id)
  end

  def stream(list) do
    list
    |> Stream.filter(&(&1.amount > 900))
    |> Stream.take(5)
    |> Enum.map(& &1.id)
  end
end
Benchee.run(
  [
    Enum: &Bench1.enum/1,
    Stream: &Bench1.stream/1
  ],
  time: 1,
  memory_time: 0.2,
  reduction_time: 0.2,
  inputs: [data: transactions]
)
Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.14.1
Erlang 25.0
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 s
memory time: 200 ms
reduction time: 200 ms
parallel: 1
inputs: data
Estimated total run time: 6.80 s
Benchmarking Enum with input data ...
Benchmarking Stream with input data ...
##### With input data #####
Name ips average deviation median 99th %
Enum 1.06 M 0.94 μs ±621.97% 0.88 μs 1.08 μs
Stream 0.81 M 1.23 μs ±340.39% 1.13 μs 1.46 μs
Comparison:
Enum 1.06 M
Stream 0.81 M - 1.30x slower +0.29 μs
Memory usage statistics:
Name Memory usage
Enum 0.33 KB
Stream 3.57 KB - 10.88x memory usage +3.24 KB
**All measurements for memory usage were the same**
Reduction count statistics:
Name Reduction count
Enum 345
Stream 524 - 1.52x reduction count +179
**All measurements for reduction count were the same**
This is an unexpected result!
- Stream is slower (understandable)
- Stream also uses more memory 🤯
Let's take a simpler example:
defmodule Bench2 do
  def enum(list) do
    list
    |> Enum.map(& &1.amount)
    |> Enum.sum()
  end

  def stream(list) do
    list
    |> Stream.map(& &1.amount)
    |> Enum.sum()
  end
end
Benchee.run(
  [
    Enum: &Bench2.enum/1,
    Stream: &Bench2.stream/1
  ],
  time: 1,
  memory_time: 0.2,
  reduction_time: 0.2,
  inputs: [data: transactions]
)
Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.14.1
Erlang 25.0
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 s
memory time: 200 ms
reduction time: 200 ms
parallel: 1
inputs: data
Estimated total run time: 6.80 s
Benchmarking Enum with input data ...
Benchmarking Stream with input data ...
##### With input data #####
Name ips average deviation median 99th %
Enum 777.46 K 1.29 μs ±289.13% 1.17 μs 1.50 μs
Stream 419.93 K 2.38 μs ±215.43% 1.21 μs 14.58 μs
Comparison:
Enum 777.46 K
Stream 419.93 K - 1.85x slower +1.10 μs
Memory usage statistics:
Name Memory usage
Enum 1.60 KB
Stream 6.78 KB - 4.23x memory usage +5.18 KB
**All measurements for memory usage were the same**
Reduction count statistics:
Name Reduction count
Enum 506
Stream 870 - 1.72x reduction count +364
**All measurements for reduction count were the same**
This is highly unexpected: even when the accumulator is a simple integer, the stream version has a higher memory usage!
Why the overhead? 🧐
Both Enum and Stream are built on top of the Enumerable protocol.
However, Enum functions are optimized for lists:
# lib/elixir/lib/enum.ex
@spec map(t, (element -> any)) :: list
def map(enumerable, fun)

def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end

def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end
For streams, this will basically run something like:
Enumerable.reduce(1..10, {:cont, 0}, &IO.inspect({:cont, &1 + &2}))
{:cont, 1}
{:cont, 3}
{:cont, 6}
{:cont, 10}
{:cont, 15}
{:cont, 21}
{:cont, 28}
{:cont, 36}
{:cont, 45}
{:cont, 55}
{:done, 55}
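The same protocol also supports early termination by returning {:halt, acc} from the reducer, which is how functions like take/2 stop consuming the source. A quick sketch (the cutoff value 6 is arbitrary):

```elixir
# stop reducing as soon as adding the next element would exceed 6
Enumerable.reduce(1..10, {:cont, 0}, fn x, acc ->
  if acc + x > 6, do: {:halt, acc}, else: {:cont, acc + x}
end)
#=> {:halted, 6}
```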
The Stream implementation relies on building many intermediate tuples and anonymous functions, an overhead that can exceed the cost of just building intermediate lists.
Streams are great for processing resources lazily (files, databases, ...).
But they shouldn't be used as a way to optimize an algorithm, at least not without benchmarking.
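For instance, streams let us work with unbounded or external sources without materializing them. A sketch using an infinite stream as a stand-in for a file or database cursor:

```elixir
# an infinite lazy sequence; only the consumed prefix is ever computed
Stream.iterate(0, &(&1 + 1))
|> Stream.filter(&(rem(&1, 2) == 0))
|> Enum.take(3)
#=> [0, 2, 4]
```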
defmodule Bench3 do
  def enum(list), do: list |> Enum.map(& &1.amount) |> Enum.sum()
  def stream(list), do: list |> Stream.map(& &1.amount) |> Enum.sum()
  def reduce(list), do: Enum.reduce(list, 0, &(&1.amount + &2))

  # tail-recursive
  def recursion(list), do: do_recursion(list, 0)
  def do_recursion([], acc), do: acc
  def do_recursion([h | t], acc), do: do_recursion(t, acc + h.amount)

  # https://github.com/sabiwara/enumancer
  import Enumancer

  defenum enumancer(list) do
    list |> map(& &1.amount) |> sum()
  end
end
Benchee.run(
  [
    "1. Enum.map/2 |> Enum.sum/1": &Bench3.enum/1,
    "2. Stream.map/2 |> Enum.sum/1": &Bench3.stream/1,
    "3. Enum.reduce/3": &Bench3.reduce/1,
    "4. Recursion": &Bench3.recursion/1,
    "5. Enumancer map/2 |> sum/1": &Bench3.enumancer/1
  ],
  time: 1,
  memory_time: 0.2,
  reduction_time: 0.2,
  inputs: [data: transactions]
)
Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.14.1
Erlang 25.0
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 s
memory time: 200 ms
reduction time: 200 ms
parallel: 1
inputs: data
Estimated total run time: 17 s
Benchmarking 1. Enum.map/2 |> Enum.sum/1 with input data ...
Benchmarking 2. Stream.map/2 |> Enum.sum/1 with input data ...
Benchmarking 3. Enum.reduce/3 with input data ...
Benchmarking 4. Recursion with input data ...
Benchmarking 5. Enumancer map/2 |> sum/1 with input data ...
##### With input data #####
Name ips average deviation median 99th %
4. Recursion 2.88 M 346.93 ns ±905.56% 333 ns 417 ns
5. Enumancer map/2 |> sum/1 2.82 M 354.71 ns ±1490.72% 333 ns 500 ns
3. Enum.reduce/3 1.90 M 525.85 ns ±1149.47% 500 ns 667 ns
1. Enum.map/2 |> Enum.sum/1 0.79 M 1260.41 ns ±360.91% 1167 ns 1458 ns
2. Stream.map/2 |> Enum.sum/1 0.76 M 1322.68 ns ±431.17% 1167 ns 2375 ns
Comparison:
4. Recursion 2.88 M
5. Enumancer map/2 |> sum/1 2.82 M - 1.02x slower +7.78 ns
3. Enum.reduce/3 1.90 M - 1.52x slower +178.92 ns
1. Enum.map/2 |> Enum.sum/1 0.79 M - 3.63x slower +913.48 ns
2. Stream.map/2 |> Enum.sum/1 0.76 M - 3.81x slower +975.75 ns
Memory usage statistics:
Name Memory usage
4. Recursion 0 B
5. Enumancer map/2 |> sum/1 0 B - 1.00x memory usage +0 B
3. Enum.reduce/3 40 B - ∞ x memory usage +40 B
1. Enum.map/2 |> Enum.sum/1 1640 B - ∞ x memory usage +1640 B
2. Stream.map/2 |> Enum.sum/1 6944 B - ∞ x memory usage +6944 B
**All measurements for memory usage were the same**
Reduction count statistics:
Name Reduction count
4. Recursion 102
5. Enumancer map/2 |> sum/1 104 - 1.02x reduction count +2
3. Enum.reduce/3 303 - 2.97x reduction count +201
1. Enum.map/2 |> Enum.sum/1 506 - 4.96x reduction count +404
2. Stream.map/2 |> Enum.sum/1 870 - 8.53x reduction count +768
**All measurements for reduction count were the same**
- Enum is incredibly flexible and expressive, but pipelines can be suboptimal when you need performance
- Streams are very useful for working with resources (files, ...) to avoid loading all data at once...
- ... but they do come with an overhead and shouldn't be used automatically as an Enum replacement without benchmarking
- Hand-rolled recursion is the fastest and most efficient option, followed by hand-written Enum.reduce/3, but both can lack readability
- Enumancer (experimental) can optimize an "Enum" pipeline at compile-time, without the overhead of a stream
- Final quote: "Premature optimization is the root of all evil"

Check https://github.com/sabiwara/enumancer 😉
| | Enum | Stream | for | recursion | Enumancer |
|---|---|---|---|---|---|
| Readability | ✅ | ✅ | ⭕️ (1) | ❌ | ✅ |
| Flexibility | ✅ | ✅ | ❌ | ✅✅ (2) | ⏳✅ (4) |
| Performance | ❌ | ❌ | ✅ | ✅✅ | ✅✅ |
| Enumerable support | ✅ | ✅ | ✅ | ❌ | ✅ |
| Lazy processing | ❌ | ✅ | ✅ | ❌ (3) | ✅ |
(1) for is readable in some cases, but for more complex cases its syntax can get quite complicated
(2) Recursion is highly flexible since it can implement any arbitrary "loop"
(3) Recursion only works with lists, so it can't really work with streams etc.
(4) Enumancer is still a work in progress, so flexibility is lacking at the moment but expected to increase