This is a really dumb test, but the try_for_each desugaring comes out about 3.4x faster at this aimless looping exercise on an Apple M1 (roughly 96 µs versus 329 µs per iteration in the run below). If you put some non-trivial code in place of black_box, or use a much more complex iterator, you might get something more useful, so think of this as a benchmark template.
rustc +nightly --edition 2021 optimised.rs -O --test
./optimised --test --bench
running 2 tests
test bench_desugared ... bench: 95,768 ns/iter (+/- 361)
test bench_original ... bench: 328,977 ns/iter (+/- 9,256)
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out; finished in 0.49s
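
For anyone who wants to try the same comparison without nightly, here is a minimal sketch of the two loop shapes on stable Rust. It is not the optimised.rs from the run above: the chained range, the function names (sum_original, sum_desugared, time_ns) and the wrapping-add workload are all assumptions made up for illustration, and a single Instant timing is much noisier than the bench harness.

```rust
use std::hint::black_box;
use std::ops::ControlFlow;
use std::time::Instant;

// Hypothetical workload: a plain `for` loop over a chained iterator.
// `black_box` keeps the optimiser from deleting the loop body.
fn sum_original(n: u64) -> u64 {
    let mut total = 0u64;
    for i in (0..n).chain(0..n) {
        total = total.wrapping_add(black_box(i));
    }
    total
}

// The same loop written with `try_for_each`, which drives the iterator
// internally instead of calling `next()` once per element.
fn sum_desugared(n: u64) -> u64 {
    let mut total = 0u64;
    let _ = (0..n).chain(0..n).try_for_each(|i| {
        total = total.wrapping_add(black_box(i));
        // Always continue; a `break` in the loop body would map to `Break`.
        ControlFlow::<()>::Continue(())
    });
    total
}

// Illustrative helper: run a closure once and return its result together
// with the elapsed wall-clock time in nanoseconds.
fn time_ns<F: FnOnce() -> u64>(f: F) -> (u64, u128) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed().as_nanos())
}
```

Calling time_ns(|| sum_original(100_000)) and time_ns(|| sum_desugared(100_000)) from main in a release build gives a rough comparison. Both functions should return the same total, which is a cheap sanity check that the two forms really do the same work.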