Mix.install([
  {:kino_benchee, git: "https://github.com/livebook-dev/kino_benchee"},
  :enumancer
])
:ok
Let's talk about Enum and performance!
- The Enum module is probably one of the most useful modules, except for Kernel
- Compensates for the lack of loops in Elixir
- More compact and declarative than recursion
- Works for any Enumerable data structure, not just lists
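As a quick sketch of that last point, the same Enum calls work on a range or a map, since both implement the Enumerable protocol:

```elixir
# the same Enum functions work for any Enumerable, not only lists
Enum.map(1..5, &(&1 * 2))
#=> [2, 4, 6, 8, 10]

# iterating a map yields {key, value} tuples
Enum.map(%{a: 1, b: 2}, fn {key, value} -> {key, value * 10} end)
#=> [a: 10, b: 20]
```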
Let's start by generating sample data:
:rand.seed(:exsss, {100, 101, 102})
transactions = Enum.map(1..100, fn i -> %{id: i, amount: :rand.uniform(1000)} end)
[
%{amount: 494, id: 1},
%{amount: 603, id: 2},
%{amount: 309, id: 3},
%{amount: 74, id: 4},
%{amount: 854, id: 5},
%{amount: 387, id: 6},
%{amount: 110, id: 7},
%{amount: 439, id: 8},
%{amount: 680, id: 9},
%{amount: 286, id: 10},
%{amount: 237, id: 11},
%{amount: 746, id: 12},
%{amount: 421, id: 13},
%{amount: 177, id: 14},
%{amount: 46, id: 15},
%{amount: 802, id: 16},
%{amount: 671, id: 17},
%{amount: 617, id: 18},
%{amount: 506, id: 19},
%{amount: 958, id: 20},
%{amount: 497, id: 21},
%{amount: 835, id: 22},
%{amount: 629, id: 23},
%{amount: 638, id: 24},
%{amount: 467, id: 25},
%{amount: 486, id: 26},
%{amount: 851, id: 27},
%{amount: 567, id: 28},
%{amount: 243, id: 29},
%{amount: 379, id: 30},
%{amount: 154, id: 31},
%{amount: 354, id: 32},
%{amount: 420, id: 33},
%{amount: 371, id: 34},
%{amount: 245, id: 35},
%{amount: 591, id: 36},
%{amount: 740, id: 37},
%{amount: 405, id: 38},
%{amount: 147, id: 39},
%{amount: 893, id: 40},
%{amount: 243, id: 41},
%{amount: 365, id: 42},
%{amount: 227, id: 43},
%{amount: 969, id: 44},
%{amount: 657, id: 45},
%{amount: 279, id: 46},
%{amount: 543, id: 47},
%{amount: 770, id: 48},
%{amount: 960, ...},
%{...},
...
]
Say we want the equivalent of "SELECT id FROM transactions WHERE amount > 900".
A typical pipeline could look something like:
transactions
|> Enum.filter(&(&1.amount > 900))
|> Enum.map(& &1.id)
[20, 44, 49, 68, 88, 92]
How about efficiency?
transactions
|> Enum.filter(&(&1.amount > 900))
|> Enum.map(& &1.id)
|> dbg()
[20, 44, 49, 68, 88, 92]
We do build intermediate lists at each step!
In this case, for/1 is perfect:
for t <- transactions, t.amount > 900, do: t.id
[20, 44, 49, 68, 88, 92]
# which is basically:
Enum.reduce(transactions, [], fn t, acc ->
  if t.amount > 900, do: [t.id | acc], else: acc
end)
|> Enum.reverse()
[20, 44, 49, 68, 88, 92]
But for/1 only covers a fixed set of patterns; it lacks flexibility.
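For reference, here is a sketch of what for/1 does cover: multiple generators, filters, and the :into option (the sample maps below are made up for illustration):

```elixir
# multiple generators with a filter
for x <- 1..3, y <- 1..3, x < y, do: {x, y}
#=> [{1, 2}, {1, 3}, {2, 3}]

# collecting into a map with the :into option
for t <- [%{id: 1, amount: 950}, %{id: 2, amount: 100}], t.amount > 900, into: %{} do
  {t.id, t.amount}
end
#=> %{1 => 950}
```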
Now say we want "SELECT id FROM transactions WHERE amount > 900 LIMIT 5".
The following, for instance, cannot be expressed using for/1:
transactions
|> Enum.filter(&(&1.amount > 900))
|> Enum.take(5)
|> Enum.map(& &1.id)
[20, 44, 49, 68, 88]
So, Enum is highly expressive and flexible. But can we avoid the intermediate structures?
...
I know! Let's use Stream!
transactions
|> Stream.filter(&(&1.amount > 900))
|> Stream.take(5)
|> Enum.map(& &1.id)
[20, 44, 49, 68, 88]
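Stream functions return a lazy %Stream{} struct; nothing runs until an Enum function consumes it. A small sketch of that laziness:

```elixir
# building the pipeline computes nothing yet
s = 1..1_000_000 |> Stream.map(fn x -> x * 2 end)

# only the first three elements are ever mapped
Enum.take(s, 3)
#=> [2, 4, 6]
```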
defmodule Bench1 do
  def enum(list) do
    list
    |> Enum.filter(&(&1.amount > 900))
    |> Enum.take(5)
    |> Enum.map(& &1.id)
  end

  def stream(list) do
    list
    |> Stream.filter(&(&1.amount > 900))
    |> Stream.take(5)
    |> Enum.map(& &1.id)
  end
end
Benchee.run(
  [
    Enum: &Bench1.enum/1,
    Stream: &Bench1.stream/1
  ],
  time: 1,
  memory_time: 0.2,
  reduction_time: 0.2,
  inputs: [data: transactions]
)
Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.14.1
Erlang 25.0
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 s
memory time: 200 ms
reduction time: 200 ms
parallel: 1
inputs: data
Estimated total run time: 6.80 s
Benchmarking Enum with input data ...
Benchmarking Stream with input data ...
##### With input data #####
Name ips average deviation median 99th %
Enum 1.06 M 0.94 μs ±621.97% 0.88 μs 1.08 μs
Stream 0.81 M 1.23 μs ±340.39% 1.13 μs 1.46 μs
Comparison:
Enum 1.06 M
Stream 0.81 M - 1.30x slower +0.29 μs
Memory usage statistics:
Name Memory usage
Enum 0.33 KB
Stream 3.57 KB - 10.88x memory usage +3.24 KB
**All measurements for memory usage were the same**
Reduction count statistics:
Name Reduction count
Enum 345
Stream 524 - 1.52x reduction count +179
**All measurements for reduction count were the same**
This is an unexpected result!
- Stream is slower (understandable)
- Stream also uses more memory 🤯
Let's take a simpler example:
defmodule Bench2 do
  def enum(list) do
    list
    |> Enum.map(& &1.amount)
    |> Enum.sum()
  end

  def stream(list) do
    list
    |> Stream.map(& &1.amount)
    |> Enum.sum()
  end
end
Benchee.run(
  [
    Enum: &Bench2.enum/1,
    Stream: &Bench2.stream/1
  ],
  time: 1,
  memory_time: 0.2,
  reduction_time: 0.2,
  inputs: [data: transactions]
)
Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.14.1
Erlang 25.0
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 s
memory time: 200 ms
reduction time: 200 ms
parallel: 1
inputs: data
Estimated total run time: 6.80 s
Benchmarking Enum with input data ...
Benchmarking Stream with input data ...
##### With input data #####
Name ips average deviation median 99th %
Enum 777.46 K 1.29 μs ±289.13% 1.17 μs 1.50 μs
Stream 419.93 K 2.38 μs ±215.43% 1.21 μs 14.58 μs
Comparison:
Enum 777.46 K
Stream 419.93 K - 1.85x slower +1.10 μs
Memory usage statistics:
Name Memory usage
Enum 1.60 KB
Stream 6.78 KB - 4.23x memory usage +5.18 KB
**All measurements for memory usage were the same**
Reduction count statistics:
Name Reduction count
Enum 506
Stream 870 - 1.72x reduction count +364
**All measurements for reduction count were the same**
This is highly unexpected: even when the accumulator is a simple integer, the stream version has a higher memory usage!
Why the overhead? 🧐
Both Enum and Stream are built on top of the Enumerable protocol.
However, Enum functions are optimized for lists:
# lib/elixir/lib/enum.ex
@spec map(t, (element -> any)) :: list
def map(enumerable, fun)

def map(enumerable, fun) when is_list(enumerable) do
  :lists.map(fun, enumerable)
end

def map(enumerable, fun) do
  reduce(enumerable, [], R.map(fun)) |> :lists.reverse()
end
For streams, this will basically run something like:
Enumerable.reduce(1..10, {:cont, 0}, &IO.inspect({:cont, &1 + &2}))
{:cont, 1}
{:cont, 3}
{:cont, 6}
{:cont, 10}
{:cont, 15}
{:cont, 21}
{:cont, 28}
{:cont, 36}
{:cont, 45}
{:cont, 55}
{:done, 55}
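The same protocol also supports early termination by returning {:halt, acc} from the reducer, which is how functions like take/2 stop consuming the source. A quick sketch (the cutoff value 6 is arbitrary):

```elixir
# stop reducing as soon as adding the next element would exceed 6
Enumerable.reduce(1..10, {:cont, 0}, fn x, acc ->
  if acc + x > 6, do: {:halt, acc}, else: {:cont, acc + x}
end)
#=> {:halted, 6}
```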
The Stream implementation relies on building many intermediate tuples and anonymous functions, an overhead that can exceed the cost of just building intermediate lists.
Streams are great for processing resources lazily (files, databases, ...).
But they shouldn't be used as a way to optimize an algorithm, at least not without benchmarking.
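For instance, streams let us work with unbounded or external sources without materializing them. A sketch using an infinite stream as a stand-in for a file or database cursor:

```elixir
# an infinite lazy sequence; only the consumed prefix is ever computed
Stream.iterate(0, &(&1 + 1))
|> Stream.filter(&(rem(&1, 2) == 0))
|> Enum.take(3)
#=> [0, 2, 4]
```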
defmodule Bench3 do
  def enum(list), do: list |> Enum.map(& &1.amount) |> Enum.sum()
  def stream(list), do: list |> Stream.map(& &1.amount) |> Enum.sum()
  def reduce(list), do: Enum.reduce(list, 0, &(&1.amount + &2))

  # tail-recursive
  def recursion(list), do: do_recursion(list, 0)
  def do_recursion([], acc), do: acc
  def do_recursion([h | t], acc), do: do_recursion(t, acc + h.amount)

  # https://github.com/sabiwara/enumancer
  import Enumancer

  defenum enumancer(list) do
    list |> map(& &1.amount) |> sum()
  end
end
Benchee.run(
  [
    "1. Enum.map/2 |> Enum.sum/1": &Bench3.enum/1,
    "2. Stream.map/2 |> Enum.sum/1": &Bench3.stream/1,
    "3. Enum.reduce/3": &Bench3.reduce/1,
    "4. Recursion": &Bench3.recursion/1,
    "5. Enumancer map/2 |> sum/1": &Bench3.enumancer/1
  ],
  time: 1,
  memory_time: 0.2,
  reduction_time: 0.2,
  inputs: [data: transactions]
)
Operating System: macOS
CPU Information: Apple M1
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.14.1
Erlang 25.0
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 s
memory time: 200 ms
reduction time: 200 ms
parallel: 1
inputs: data
Estimated total run time: 17 s
Benchmarking 1. Enum.map/2 |> Enum.sum/1 with input data ...
Benchmarking 2. Stream.map/2 |> Enum.sum/1 with input data ...
Benchmarking 3. Enum.reduce/3 with input data ...
Benchmarking 4. Recursion with input data ...
Benchmarking 5. Enumancer map/2 |> sum/1 with input data ...
##### With input data #####
Name ips average deviation median 99th %
4. Recursion 2.88 M 346.93 ns ±905.56% 333 ns 417 ns
5. Enumancer map/2 |> sum/1 2.82 M 354.71 ns ±1490.72% 333 ns 500 ns
3. Enum.reduce/3 1.90 M 525.85 ns ±1149.47% 500 ns 667 ns
1. Enum.map/2 |> Enum.sum/1 0.79 M 1260.41 ns ±360.91% 1167 ns 1458 ns
2. Stream.map/2 |> Enum.sum/1 0.76 M 1322.68 ns ±431.17% 1167 ns 2375 ns
Comparison:
4. Recursion 2.88 M
5. Enumancer map/2 |> sum/1 2.82 M - 1.02x slower +7.78 ns
3. Enum.reduce/3 1.90 M - 1.52x slower +178.92 ns
1. Enum.map/2 |> Enum.sum/1 0.79 M - 3.63x slower +913.48 ns
2. Stream.map/2 |> Enum.sum/1 0.76 M - 3.81x slower +975.75 ns
Memory usage statistics:
Name Memory usage
4. Recursion 0 B
5. Enumancer map/2 |> sum/1 0 B - 1.00x memory usage +0 B
3. Enum.reduce/3 40 B - ∞ x memory usage +40 B
1. Enum.map/2 |> Enum.sum/1 1640 B - ∞ x memory usage +1640 B
2. Stream.map/2 |> Enum.sum/1 6944 B - ∞ x memory usage +6944 B
**All measurements for memory usage were the same**
Reduction count statistics:
Name Reduction count
4. Recursion 102
5. Enumancer map/2 |> sum/1 104 - 1.02x reduction count +2
3. Enum.reduce/3 303 - 2.97x reduction count +201
1. Enum.map/2 |> Enum.sum/1 506 - 4.96x reduction count +404
2. Stream.map/2 |> Enum.sum/1 870 - 8.53x reduction count +768
**All measurements for reduction count were the same**
- Enum is incredibly flexible and expressive, but pipelines can be suboptimal when you need performance
- Streams are very useful for working with resources (files, ...) to avoid loading all data at once...
- ... but they do come with an overhead and shouldn't be used automatically as an Enum replacement without benchmarking
- Hand-rolled recursion is the fastest and most efficient option, followed by hand-written Enum.reduce/3, but both can lack readability
- Enumancer (experimental) can optimize an "Enum" pipeline at compile-time, without the overhead of a stream
- Final quote: "Premature optimization is the root of all evil"

Check https://github.com/sabiwara/enumancer 😉
| | Enum | Stream | for | recursion | Enumancer |
|---|---|---|---|---|---|
| Readability | ✅ | ✅ | ⭕️ (1) | ❌ | ✅ |
| Flexibility | ✅ | ✅ | ❌ | ✅✅ (2) | ⏳✅ (4) |
| Performance | ❌ | ❌ | ✅ | ✅✅ | ✅✅ |
| Enumerable support | ✅ | ✅ | ✅ | ❌ | ✅ |
| Lazy processing | ❌ | ✅ | ✅ | ❌ (3) | ✅ |
(1) for is readable in some cases, but for more complex cases its syntax can get quite complicated
(2) Recursion is highly flexible since it can implement any arbitrary "loop"
(3) Recursion only works with lists, so it can't really work with streams etc.
(4) Enumancer is still a work in progress, so flexibility is lacking at the moment but expected to increase