All three of the benchmarks here attempt to side-step allocation overhead by making a special target "collection" for the `FromIterator`
traversal that just takes the last element from the iteration. (So it's not really a collection at all.)
- In practice this means that the code should just compile into stepping through the iteration, cloning each element and saving it in a temporary variable, and then moving that temporary variable into a `Some` at the end of the iteration.
- (It's possible that the compiler is avoiding some extra copying in some circumstances. It's also possible that we are just getting lucky with respect to register allocation and/or register pressure in some circumstances. It seems to me like the hot loop in `into_last_std` may be achieving both...)
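
To make the target "collection" concrete, here is a minimal sketch of what such a type could look like. The names `Last` and `into_last` are hypothetical and chosen for illustration; they are not taken from PR #59605, and the actual benchmark code may differ in detail.

```rust
use std::iter::FromIterator;

/// Hypothetical stand-in for the benchmark's target "collection":
/// it keeps only the last item it is fed, so collecting into it never allocates.
#[derive(Debug, PartialEq)]
pub struct Last<T>(pub Option<T>);

impl<T> FromIterator<T> for Last<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        // Holds the most recent item seen; overwritten on every step.
        let mut last = None;
        for item in iter {
            last = Some(item);
        }
        Last(last)
    }
}

/// Clones each element and collects through libstd's
/// `impl FromIterator<Result<A, E>> for Result<V, E>`, here with `V = Last<T>`.
/// (Assumed shape of the measured operation, not the PR's exact code.)
pub fn into_last<T: Clone, E: Clone>(input: &[Result<T, E>]) -> Result<Last<T>, E> {
    input.iter().cloned().collect()
}
```

Collecting a slice of `Ok` values through `into_last` yields `Ok(Last(Some(..)))` holding the final element, while the first `Err` in the input short-circuits and is returned as the error.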

Here is an overview of the three benchmarks. (The source code is presented in PR #59605; both the "std" and "new" runs use the same benchmark code, and I just gathered a perf run from before and after the reimplementation of the underlying `impl FromIterator for Result`.)
- `bench_result_from_iter_into_last_old` is an inlined re-implementation of the original libstd `FromIterator for Result`.
  - This takes 665 ns/iter, pretty reliably.
- `bench_result_from_iter_into_last_std` measures the original libstd `FromIterator for Result`.
  - I am seeing this take 369 ns/iter.
  - I assume this roughly 2x improvement over `bench_result_from_iter_into_last_old` is at least in part due to inlining directly into the benchmark driver?
- `bench_result_from_iter_into_last_new` measures the new libstd `FromIterator for Result`.
  - For some reason this takes on the order of 3K ns/iter.
  - That is a 5x to 10x slowdown (depending on which case above you compare against).
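
As a rough check on the ratios quoted above, the arithmetic can be sketched as follows. The 3000 ns figure is an assumption standing in for "on the order of 3K", and the constant and function names are made up for this example.

```rust
/// How many times slower `slow_ns` is than `fast_ns`.
pub fn slowdown(slow_ns: f64, fast_ns: f64) -> f64 {
    slow_ns / fast_ns
}

// Timings in ns/iter as reported above.
pub const OLD_NS: f64 = 665.0;
pub const STD_NS: f64 = 369.0;
// Assumed round figure for "on the order of 3K".
pub const NEW_NS: f64 = 3000.0;

/// Returns (old vs std, new vs old, new vs std) ratios.
pub fn ratios() -> (f64, f64, f64) {
    (
        slowdown(OLD_NS, STD_NS),
        slowdown(NEW_NS, OLD_NS),
        slowdown(NEW_NS, STD_NS),
    )
}
```

Under that assumption the ratios come out at about 1.8x (old vs. std), 4.5x (new vs. old), and 8.1x (new vs. std), which is in line with the "2x" and "5x to 10x" estimates.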