TLDR: I understand what Bend proposes, but when the efficiency cost is so large that an RTX 4090 is only 7x faster, in a near-optimal scenario, than 2 cores of an M3 Max running a language like JavaScript, you should probably not adopt it.
On the other hand, easy-to-use but opt-in parallel languages such as OCaml exist and can compete, so you should probably pick one of those instead. If you need even more performance, Rust could likely beat the RTX 4090 results on a mobile CPU.
Of course, future optimizations should improve Bend's results, but my goal here is to show that the current results are not as impressive as they may look. A JIT will likely make the RTX 4090 results 10x faster, but always keep in mind that an RTX 4090 still uses at least 100 times more power than a single M3 core at any instant, and that in principle GPUs are better suited to purely parallel tasks.
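To put the speed and power figures side by side, here is a rough back-of-the-envelope sketch in Python. It only uses the numbers quoted above, plus one assumption of mine: that two M3 cores draw about twice the power of one. Since "at least 100 times" is a lower bound on the power ratio, the results are upper bounds on how well the GPU does, and the names are purely illustrative.

```python
# Rough figures quoted in the text; the power ratio is a lower bound.
gpu_speedup_vs_two_cores = 7    # RTX 4090 vs 2 M3 Max cores running JavaScript
gpu_power_vs_one_core = 100     # "at least 100 times" a single M3 core
hypothetical_jit_gain = 10      # guessed speedup a JIT could bring to the GPU result

# Assumption (not from the text): two cores draw about twice the power of one.
gpu_power_vs_two_cores = gpu_power_vs_one_core / 2


def relative_work_per_watt(speedup: float, power_ratio: float) -> float:
    """Work done per unit of energy by the GPU, relative to the CPU baseline."""
    return speedup / power_ratio


today = relative_work_per_watt(gpu_speedup_vs_two_cores, gpu_power_vs_two_cores)
with_jit = relative_work_per_watt(
    gpu_speedup_vs_two_cores * hypothetical_jit_gain, gpu_power_vs_two_cores
)

print(f"GPU work per watt vs 2 CPU cores, today: at most {today:.2f}x")
print(f"GPU work per watt vs 2 CPU cores, with a 10x JIT: at most {with_jit:.2f}x")
```

Under these assumptions the GPU currently does at most about a seventh of the work per watt of the two CPU cores, and even a 10x JIT gain would only bring it to roughly parity.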
Also keep in mind that this code is very friendly to parallelism, which counts both against Bend and in its favour. Most real code is not pur